remote sensing
Article
Assessing OpenStreetMap Completeness for
Management of Natural Disaster by Means of Remote
Sensing: A Case Study of Three Small Island States
(Haiti, Dominica and St. Lucia)
Ran Goldblatt 1,*, Nicholas Jones 2 and Jenny Mannix 1
1 New Light Technologies Inc., Washington, DC 20005, USA; jennifer.mannix@nltgis.com
2 Global Facility for Disaster Reduction and Recovery/World Bank, Washington, DC 20433, USA;
njones@worldbankgroup.org
* Correspondence: ran.goldblatt@nltgis.com; Tel.: +1-202-630-0362
Received: 26 November 2019; Accepted: 25 December 2019; Published: 1 January 2020
Abstract:
Over the last few decades, many countries, especially islands in the Caribbean, have been
challenged by the devastating consequences of natural disasters, which pose a significant threat to
human health and safety. Timely information related to the distribution of vulnerable population
and critical infrastructure is key for effective disaster relief. OpenStreetMap (OSM) has repeatedly
been shown to be highly suitable for disaster mapping and management. However, large portions of
the world, including countries exposed to natural disasters, remain incompletely mapped. In this
study, we propose a methodology that relies on remotely sensed measurements (e.g., Visible Infrared
Imaging Radiometer Suite (VIIRS), Sentinel-2 and Sentinel-1) and derived classification schemes (e.g.,
forest and built-up land cover) to predict the completeness of OSM building footprints in three small
island states (Haiti, Dominica and St. Lucia). We find that the combinatorial effects of these predictors
explain up to 94% of the variation of the completeness of OSM building footprints. Our study extends
the existing literature by demonstrating how remotely sensed measurements could be leveraged to
evaluate the completeness of the OSM database, especially in countries with high risk of natural
disasters. Identifying areas that lack coverage of OSM features could help prioritize mapping efforts,
especially in areas vulnerable to natural hazards and where current data gaps pose an obstacle to
timely and evidence-based disaster risk management.
Keywords:
OpenStreetMap; OSM; OpenStreetMap coverage; disaster management; remote sensing
1. Introduction
Over the last few decades, many countries have been challenged by the devastating consequences
of natural disasters which pose a significant threat to human health and safety and impact vulnerable
communities and critical infrastructure globally. Every year, natural disasters impact close to 160 million
people worldwide [1], causing destruction of the physical, biological and social environments, impacting
food security, and causing global losses that amount to over 100 billion dollars [2]. The frequency of
natural disasters has been steadily increasing since 1940 [3] and over the next century, climate change
will likely amplify the number and severity of such disasters [4].
While the impacts of natural disasters are worldwide, some countries have been more vulnerable
to different types of disasters than others [5]. For example, in 2017, Puerto Rico, Sri Lanka and
Dominica topped the list of the countries most affected by natural disasters such as significant
precipitation, floods and landslides. Caribbean island countries are especially exposed to a wide
range of natural disasters [6], and small island developing states, which are frequently characterized
by coastal communities, geographic isolation, and limited technical capacity, are among the
countries most vulnerable to natural disasters and climate change [7].
Remote Sens. 2020,12, 118; doi:10.3390/rs12010118 www.mdpi.com/journal/remotesensing
Recognizing these trends, there is an increasing need for efficient and well-planned disaster
management and disaster relief operations. The term disaster risk management refers to the full
lifecycle of actions aiming to prevent, prepare for, respond to, and recover from disasters. Generally,
disaster risk management consists of four main phases: (1) Mitigation, i.e., activities that reduce the
likelihood and expected adverse impacts of a natural disaster event; (2) Preparedness, i.e., plans or
preparations to strengthen emergency response capabilities; (3) Response, i.e., actions taken to save lives
and prevent property damage in an emergency situation; and (4) Recovery, i.e., interventions aimed at
returning communities and infrastructure to a proper level of functionality following a disaster.
Timely geospatial information indicating the distribution of vulnerable population and the
location, availability and functionality of critical infrastructure is key for effective disaster relief
operations. Until recently, governmental agencies and the commercial sector were the primary sources
for geospatial data for disaster management. In the past decade, however, the public has been
increasingly recognized as a valuable source for geospatial information for disaster management [8].
Recent developments in web mapping technologies have led to disaster management operations that
are more dynamic, transparent, and decentralized, with an increased contribution by individuals and
organizations from both inside and outside the impacted area [9], including by means of geospatial
information that is contributed by volunteers. The term Volunteered Geographic Information (VGI)
refers to geographic information collected by individuals, often on a voluntary basis [8]. This data is
made open, freely accessible [10] and fills the deficiencies of traditional mapping technologies and
sources of data [11,12]. Governments in developing countries increasingly recognize the economic and
social value of VGIs and their potential to provide new ways to interact with the community and for
strengthening civil society [13].
1.1. OpenStreetMap (OSM) for Disaster Management
Created in 2004, OpenStreetMap (OSM) is a collaborative user-generated mapping project aiming
to provide a freely available geographical information database of the world [14]. During the first year
of the project, most mapping efforts focused on road and transportation networks. Today, however,
a variety of geographical features are constantly added to OSM’s database, including buildings
and their functionality, land use and public transportation information [15]. This data allows local
governments and communities to better perform risk assessment and emergency planning [16–18] and
is routinely utilized for various disaster risk management applications [19,20]. As of today, there are
more than 5.5 million OSM users and one million contributors who generate more than 3 million
changes every day. In the context of natural disasters, the coordination of volunteers’ mapping
efforts is operated by the Humanitarian OpenStreetMap Team (HOT), originally formed after the
Haiti earthquake, which conducts activities aimed at enriching OSM data to support emergency relief
operations (https://wiki.openstreetmap.org/wiki/Stats). Often, when disasters occur, there is a lack of
this essential information, which results in mapping campaigns, including Mapathons [21], which are
designed to map the impacted areas.
OSM data is collected by three main means [22]: (1) using GPS records, which can be uploaded to
the database; (2) relying on orthophotos and high-resolution satellite imagery to trace and digitize
features; or (3) importing datasets from external sources such as administrative census data. Recently,
large corporations, including Apple, Microsoft, and Facebook, have been hiring editors to contribute to
the OSM database [23]. Several tools have also been developed to support OSM’s mapping efforts;
one of them is the MapSwipe app (https://mapswipe.org/), which enables volunteers to map and tag
geographical features on mobile phones based on satellite imagery [24]. Other initiatives, such as
Missing Maps (https://www.missingmaps.org/), allow volunteers to trace features based on satellite images
by splitting mapping into small tasks, allowing remote volunteers to work simultaneously on the same
overall area (as of 2018, there were nearly 60,000 mappers contributing to Missing Maps) [25].
1.2. Assessing OSM Completeness and Accuracy
Although OSM road network data is estimated to exceed 80% completeness in relation to
the world’s roads and streets [26], in general, the coverage and completeness of OSM features
(including building footprints) vary significantly—not only between countries, but also within
countries. For example, completeness of coverage of remote and rural areas is often lower than that
of highly populated urban areas [27], and the coverage of developing countries tends to be lower
than that of developed countries [28–32]. These differences are in part due to societal factors, such as
population distribution and population density, distance to major cities and the location of contributing
users [29,33–37].
With the increased utilization of VGIs—including OSM—for disaster preparedness and response,
various methodologies have been proposed to assess the quality and the accuracy of the collected
data [12]; for example, in terms of data completeness, logical consistency, positional, thematic,
semantic and spatial accuracy, temporal quality and usability [28,38–42]. Several approaches have
been proposed to assess the completeness of the OSM database and the completeness of the street
networks [32,43], the land use and the building footprints [44]. The completeness of the coverage can
be assessed by comparing the OSM mapped features with external datasets, for example, national
administrative data [28,44–47]. Such data varies by country and is not always made available, especially
in developing countries.
In this study, we propose a methodology that utilizes remotely sensed observations to estimate
the coverage of OSM mapped features, specifically to identify gaps in the completeness of OSM
building footprints. In the past, expensive satellite imagery and limited computational power only
allowed analysis of small geographical contexts. This model is being replaced thanks to the accessibility
of publicly available and free satellite data that capture every location on earth every few days.
The availability of daytime (e.g., Sentinel-2, Landsat) and nighttime (e.g., DMSP Operational Linescan
System (OLS) or the Visible Infrared Imaging Radiometer Suite (VIIRS)) satellite imagery, together with
advancements in the capabilities of cloud-based computational platforms, now allows for analyzing
Land Use and Land Cover (LULC) characteristics of Earth across a greater geographic and temporal
scale. Land cover refers to the attributes of the Earth’s land surface and its immediate subsurface (e.g.,
biota, soil, topography, surface and groundwater, and human structures). Land use refers to the purpose for
which humans exploit the land cover [48]. Because remotely sensed observations typically capture the
unique reflectance characteristics of physical objects on Earth, most remote sensing applications focus
on detection and classification of Earth’s land cover characteristics. Differentiation between different
types of Land Use (which typically do not hold unique physical characteristics) remains challenging.
With respect to OSM, mapped features can be tagged according to both their land use and their land cover.
Although OSM contributors are free to use their own tags, there is a quasi-official collection of tags that
has been established and agreed upon (for example, “landuse” and “landcover” keys or other more
specific keys such as “building” or “highway”) [49]. Previous studies demonstrated the potential use
of these tags to create detailed LULC maps [50].
The methodology we propose in this study relies on remotely sensed measurements to estimate
the coverage of OSM building footprints and to identify “mapping gaps” (i.e., areas that have not yet
been mapped). Previous studies have utilized OSM data for different remote sensing applications, for
example, for classification of urban areas [51] or for semantic labeling of aerial and satellite images [52].
Despite significant progress in the field of machine learning and the increasing availability of satellite
imagery, there is still a scarcity of studies aiming to utilize remotely sensed observations to estimate the
completeness of OSM building footprints at a given point in time. Identifying areas that lack coverage
of OSM features could help plan and prioritize mapping efforts, especially in areas that are vulnerable
to natural hazards and where current data gaps pose an obstacle to timely and evidence-based disaster
risk management actions.
By its nature, the OSM database is dynamic and is updated daily with thousands of new entries.
However, as discussed above, the frequency and extent of updates vary largely by geographical areas.
Some regions are updated more frequently than others, and developing countries in particular, which are
often the most vulnerable to the impacts of natural disasters, are not fully mapped. The objective
of this study is to propose a methodology to estimate the completeness of OSM building footprints
based on remotely sensed measurements that are available at a global scale and are updated frequently.
We demonstrate our methodology in the case study of three small island states: Haiti, Dominica and
St. Lucia.
The remainder of this article is organized as follows. In Section 2, we discuss the methodology,
the study area and the data we use to predict the coverage of OSM building footprints. In Section 3,
we present and evaluate the results in the case of Haiti and in Section 3.2, we illustrate the applicability
of our approach in the case of Dominica and St. Lucia. In Section 4, we offer a concluding discussion.
2. Materials and Methods
2.1. Study Areas
We demonstrate our methodology in the case of three small island states: Haiti, Dominica and
St. Lucia (Figure 1).
2.1.1. Haiti
Located on the western side of Hispaniola Island, Haiti (27,750 km² in size, with a population of
approximately 11.5 million) is the poorest country in the Western Hemisphere, with a Gross Domestic
Product (GDP) per capita of US$ 870 [53]. Haiti is highly vulnerable to natural disasters; more than 96%
of its population is exposed to different types of natural hazards, particularly hurricanes, coastal and
riverine floods, and earthquakes [53]. More than half of the population lives in cities and towns, a major
shift from the 1950s when approximately 90% of Haitians lived in the countryside [54]. Almost all
of Haiti’s 30 major watersheds experience significant flood events, due to intense seasonal rainfall,
storm surge in the coastal zones, deforestation and erosion, and sediment-laden river channels [55].
Furthermore, large portions of the country’s population (e.g., in the capital, Port-au-Prince) live in
shanty towns built upon steep and exposed hillsides [56]. In 2018 alone, some 2.8 million people were
considered to be in need of humanitarian assistance valued at US$ 252.2 million [57].
2.1.2. St. Lucia
A small windward island state located in the Caribbean Sea and the North Atlantic Ocean, St. Lucia
(616 km² in size) has a population of approximately 165,000 [58] and a GDP per capita of US$ 10,315 [59].
St. Lucia is susceptible to numerous natural hazards, including hurricanes, landslides, flooding, and
volcanic eruptions. Owing to its volcanic origins, its terrain consists mainly of mountains and steep
slopes in the center of the country, with low-lying areas along the coasts [60]. As of 2018, approximately 19%
of the population resides in these low-lying areas [61]. In addition, St. Lucia’s economy is highly
dependent on two sources: the export of bananas and income from tourism. Both have recently been
negatively impacted by natural hazards; in 2016, for example, Hurricane Matthew caused 70% of the
island to lose power and damaged 80% of the country’s banana plantations [60].
2.1.3. Dominica
Dominica (approximately 74,000 people [62]) is located in the Leeward Islands chain in the Lesser
Antilles of the Caribbean Sea, approximately 1200 km southeast of Haiti, with large portions of its
population residing in the capital Roseau (population 14,700) and Portsmouth (population 5200) [62].
Dominica is vulnerable to a wide range of natural hazards, including hurricanes, intense rainfall,
slope instability, volcanic eruptions, seismic activities, and tsunamis [63]. Reflecting a rugged physical
topography, most of the population and infrastructure are located on the coast, making the country
particularly vulnerable to strong winds and high seas [64]. In September 2017, Hurricane Maria, a
Category 5 storm, hit the country, causing losses and damages worth 226 percent of GDP [65].
Figure 1. Locations and size comparisons of the three study areas: Haiti, Dominica, and St. Lucia.
2.2. Analytical Framework
The objective of this study is to identify gaps in the completeness of OSM building footprints in
three small island states (Haiti, St. Lucia and Dominica) based on remotely sensed measurements and
other geospatial features. The procedure involves seven steps.
2.2.1. Step 1: Construct an Artificial Tessellation
We constructed an artificial tessellated grid of cells that spans each of the countries; each cell is
0.25 square km in size (a total of 136,747 grid cells over Haiti, 2796 grid cells over St. Lucia and 3861 grid
cells over Dominica). Each grid cell was treated as an independent unit of analysis.
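As an illustration of Step 1, the following minimal sketch builds a regular grid of square cells over a bounding box. It assumes coordinates in a projected reference system with metre units, so that a 500 m cell side yields the 0.25 square km cell area stated above; the function name `build_grid` and the example extent are hypothetical, and the sketch does not clip cells to a country boundary.

```python
import math


def build_grid(min_x, min_y, max_x, max_y, cell_size=500.0):
    """Return square cells (x0, y0, x1, y1) that cover a bounding box.

    Assumption: coordinates are in metres, so cell_size=500 gives
    cells of 500 m x 500 m = 0.25 square km.
    """
    n_cols = math.ceil((max_x - min_x) / cell_size)
    n_rows = math.ceil((max_y - min_y) / cell_size)
    cells = []
    for row in range(n_rows):
        for col in range(n_cols):
            x0 = min_x + col * cell_size
            y0 = min_y + row * cell_size
            cells.append((x0, y0, x0 + cell_size, y0 + cell_size))
    return cells


# Hypothetical 2 km x 1.5 km extent, for illustration only.
grid = build_grid(0.0, 0.0, 2000.0, 1500.0)
cell_areas_km2 = [((x1 - x0) * (y1 - y0)) / 1e6 for x0, y0, x1, y1 in grid]
```

In practice the cells would then be intersected with each country's outline, which is outside the scope of this sketch.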
2.2.2. Step 2: Download the Current OSM Building Footprints
We downloaded the most up-to-date OSM data for the three countries (data downloaded
in July 2019). For Haiti, we downloaded the data (in a Shapefile format) from Geofabrik
(https://www.geofabrik.de/data/download.html). At the time of the analysis, Geofabrik did not have data
for Dominica and St. Lucia; thus, we downloaded the data for these countries from overpass turbo
(https://overpass-turbo.eu/) in a KML format (this data requires additional pre-processing and we
selected OSM features that are labeled as “building=Yes”). At the time of the analysis, there were
930,000 mapped buildings in Haiti, 38,619 mapped buildings in Dominica and 29,412 mapped buildings
in St. Lucia.
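For readers who wish to reproduce a download of this kind, the sketch below assembles an Overpass QL query that could be pasted into overpass turbo. It is an assumption about how such a query might look, not the query used in this study: the helper name `building_query` is hypothetical, the country is selected by its ISO 3166-1 code, and the filter matches any value of the "building" key (restricting it to a single value would require a filter such as `["building"="yes"]`).

```python
def building_query(iso_code, timeout_s=180):
    """Build an Overpass QL query for building features in one country.

    Assumption: the country area is looked up by its ISO 3166-1
    alpha-2 code (for example "DM" for Dominica, "LC" for St. Lucia).
    """
    return (
        f'[out:json][timeout:{timeout_s}];\n'
        f'area["ISO3166-1"="{iso_code}"][admin_level=2]->.country;\n'
        '(\n'
        '  way["building"](area.country);\n'
        '  relation["building"](area.country);\n'
        ');\n'
        'out geom;'
    )


query_dominica = building_query("DM")
print(query_dominica)
```

The sketch only builds the query text; it does not contact any server.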
2.2.3. Step 3: Calculate Total Area of OSM Building Footprints in a Grid Cell
We calculated the total area of OSM building footprints in each grid cell. This is the value to be
predicted by the explanatory variables (the remotely sensed and geospatial measurements).
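The per-cell footprint area of Step 3 can be sketched as follows. This is a simplified illustration under stated assumptions: footprints are given as rings of (x, y) vertices in metres, the area is computed with the shoelace formula, and each building is assigned whole to the cell containing the average of its vertices. The study does not state how buildings that straddle cell borders were handled, and an exact implementation would instead intersect each polygon with the grid.

```python
from collections import defaultdict


def polygon_area(ring):
    """Planar polygon area via the shoelace formula (ring in metres)."""
    total = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:] + ring[:1]):
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0


def cell_index(x, y, min_x, min_y, cell_size):
    """Column and row of the grid cell that contains point (x, y)."""
    return (int((x - min_x) // cell_size), int((y - min_y) // cell_size))


def area_per_cell(footprints, min_x=0.0, min_y=0.0, cell_size=500.0):
    """Sum footprint areas per cell, assigning each building by its
    vertex average (an approximation of the centroid)."""
    totals = defaultdict(float)
    for ring in footprints:
        cx = sum(x for x, _ in ring) / len(ring)
        cy = sum(y for _, y in ring) / len(ring)
        key = cell_index(cx, cy, min_x, min_y, cell_size)
        totals[key] += polygon_area(ring)
    return dict(totals)


# Three hypothetical rectangular footprints.
footprints = [
    [(10, 10), (20, 10), (20, 20), (10, 20)],          # 100 sq m
    [(100, 100), (110, 100), (110, 105), (100, 105)],  # 50 sq m
    [(510, 10), (530, 10), (530, 20), (510, 20)],      # 200 sq m
]
totals = area_per_cell(footprints)
```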
2.2.4. Step 4: Preprocess and Aggregate the Remotely Sensed and Geospatial Data
We relied on several predictors (explanatory variables) to estimate the coverage of OSM building
footprints in a grid cell and to identify gaps in OSM coverage. We preprocessed the data and
aggregated it to the level of a grid cell (Table 1 provides a description of the evaluated explanatory
variables and the aggregation measures). The preprocessing, analysis and aggregation of the remotely
sensed data were completed by using Google Earth Engine (GEE). GEE is a platform that leverages
cloud-computing services to achieve planetary-scale utility and has been previously used for a wide
range of applications [66], including mapping population [67,68] and urban areas [69,70].
Nighttime Lights (VIIRS): The Visible Infrared Imaging Radiometer Suite (VIIRS) is one of the
key instruments onboard the Suomi National Polar-Orbiting Partnership (Suomi NPP) spacecraft
(launched in 2011). The VIIRS instrument collects visible and infrared imagery and global observations
of land, atmosphere, cryosphere and oceans. This instrument has significant improvements over the
capabilities of the former DMSP-OLS [71], notably its availability on a daily basis and higher spatial
resolution (up to 500 m at the equator). The VIIRS Day/Night Band (DNB) provides global coverage with
a 12-hour revisit time. First, we recorded for each pixel the maximum value of all overlapping pixels (in
the same location) in a stack of seven monthly composites (January–July) of 2019. Then, for each grid
cell, we calculated a Sum of Light (SOL) measure (calculated as the sum of the digital number values of
all overlapping pixels in each cell).
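The two-stage SOL computation described above (a per-pixel maximum across the monthly composites, followed by a per-cell sum) can be sketched in plain Python as follows. The rasters are represented as nested lists, and the pixel-to-cell assignment is passed in as a list of (row, column) positions; both are simplifying assumptions for illustration, since the study performed this step in Google Earth Engine.

```python
def max_composite(stack):
    """Per-pixel maximum over a list of equally sized 2-D rasters."""
    rows, cols = len(stack[0]), len(stack[0][0])
    return [[max(img[r][c] for img in stack) for c in range(cols)]
            for r in range(rows)]


def sum_of_light(dn_max, cell_pixels):
    """Sum the DNmax values of the pixels that fall in one grid cell."""
    return sum(dn_max[r][c] for r, c in cell_pixels)


# Two hypothetical 2 x 2 monthly composites (the study used seven).
monthly = [
    [[1.0, 0.0], [2.0, 5.0]],
    [[3.0, 0.5], [1.0, 4.0]],
]
dn_max = max_composite(monthly)
sol = sum_of_light(dn_max, [(0, 0), (0, 1), (1, 0), (1, 1)])
```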
Sentinel-2-Derived Spectral Indices: The Copernicus Sentinel-2 mission comprises a constellation
of two polar-orbiting satellites that collect multispectral data in 13 spectral bands, with four bands at a
spatial resolution of 10 m and 6 bands at a spatial resolution of 20 m. The revisit period of Sentinel-2 is 5
days at the equator. We calculated four remotely sensed measures sensitive to vegetation and built-up
land cover: Normalized Difference Vegetation Index (NDVI) [72], Soil Adjusted Vegetation Index
(SAVI) [73], Normalized Difference Built-up Index (NDBI) [74] and Urban Index (UI) [75]. For each
grid cell, we calculated a per-index sum value of all pixels overlapping with the grid cell.
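The four indices follow the band-ratio formulas listed in Table 1. The sketch below expresses them as small functions and sums one index over the pixels of a cell. The soil adjustment factor L = 0.5 is an assumption (a commonly used default); the study does not report the value it used, and the reflectance values are invented for illustration.

```python
def ndvi(nir, red):
    return (nir - red) / (nir + red)


def savi(nir, red, L=0.5):
    # L is the soil adjustment factor; 0.5 is an assumed default.
    return (nir - red) / (nir + red + L) * (1 + L)


def ndbi(mir, nir):
    return (mir - nir) / (mir + nir)


def ui(swir2, nir):
    return (swir2 - nir) / (swir2 + nir)


def cell_sum(index_fn, pixels):
    """Sum an index over the pixels of one grid cell.

    pixels: list of band-value tuples in the order index_fn expects.
    """
    return sum(index_fn(*p) for p in pixels)


# Hypothetical (NIR, RED) reflectances for two pixels in one cell.
cell_pixels = [(0.5, 0.1), (0.3, 0.3)]
ndvi_sum = cell_sum(ndvi, cell_pixels)
savi_first = savi(0.5, 0.1)
```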
Sentinel-1 SAR: The Sentinel-1 mission comprises a constellation of two polar-orbiting satellites,
performing C-band synthetic aperture radar imaging, enabling them to acquire imagery in day and
night conditions regardless of the weather. Sentinel-1 has a 12-day repeat cycle, with a spatial resolution
down to 5 m. Similarly to [70], we captured the texture of the surface by utilizing Sentinel-1’s
C-band (single co-polarization vertical transmit and vertical receive (VV) acquisition mode with an
Interferometric Wide Swath (IW) instrument mode, a 250 km swath at 5 m by 20 m spatial resolution
(single look)). From each scene, we removed speckle noise and performed radiometric calibration and
terrain correction. To create the annual composites, we calculated for each location (pixel) the median
value of all overlapping pixels in an entire stack of all scenes captured in 2019. For each grid cell, we
calculated the average value of all pixels incorporated within the area of the grid cell.
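The compositing and aggregation described for Sentinel-1 (a per-pixel median over all scenes, then a per-cell average) can be sketched as below. The nested-list rasters and the backscatter values are hypothetical, and the speckle filtering, calibration and terrain correction mentioned above are assumed to have been applied already.

```python
from statistics import mean, median


def median_composite(stack):
    """Per-pixel median over a list of equally sized 2-D scenes."""
    rows, cols = len(stack[0]), len(stack[0][0])
    return [[median(scene[r][c] for scene in stack) for c in range(cols)]
            for r in range(rows)]


def cell_mean(raster, cell_pixels):
    """Average value of the pixels that fall within one grid cell."""
    return mean(raster[r][c] for r, c in cell_pixels)


# Three hypothetical 1 x 2 VV backscatter scenes (dB).
scenes = [
    [[-10.0, -12.0]],
    [[-8.0, -15.0]],
    [[-9.0, -11.0]],
]
composite = median_composite(scenes)
texture = cell_mean(composite, [(0, 0), (0, 1)])
```

The median is used here, as in the text, because it is less sensitive than the mean to occasional outlier scenes.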
Slope: To capture the topography of the surface, we used the Global SRTM mTPI dataset (available
in GEE in a spatial resolution of 270 m), where a local gradient is calculated for each pixel based on
the global SRTM DEM elevation data (30 m resolution). The mTPI distinguishes ridge from valley
forms and is calculated by subtracting the mean elevation within a neighborhood from the elevation
at each location [76]. For each grid cell, we calculated the average value of all pixels in the grid cell.
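To make the "elevation minus neighborhood mean" idea concrete, the following sketch computes a single-scale topographic position index on a small elevation grid. It is only an illustration of the principle: the mTPI product combines several neighborhood scales, and the square window that includes the centre pixel and is clipped at the raster edge is an assumption of this sketch.

```python
def tpi(elev, radius=1):
    """Elevation minus mean elevation in a square window.

    Assumptions: the window includes the centre pixel and is
    clipped where it extends beyond the raster edge.
    """
    rows, cols = len(elev), len(elev[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [
                elev[i][j]
                for i in range(max(0, r - radius), min(rows, r + radius + 1))
                for j in range(max(0, c - radius), min(cols, c + radius + 1))
            ]
            out[r][c] = elev[r][c] - sum(window) / len(window)
    return out


# Hypothetical 3 x 3 elevation grid with a single peak in the centre.
elevation = [
    [1.0, 1.0, 1.0],
    [1.0, 10.0, 1.0],
    [1.0, 1.0, 1.0],
]
position = tpi(elevation)
```

Positive values mark locations higher than their surroundings (ridges), negative values mark lower locations (valleys).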
Forest Cover: We estimated the extent of forest cover in 2018 based on the Hansen Global Forest
Change v1.6 (2000–2018) [77]. First, we defined a pixel as “forest” in the year 2000 if more than 20% of
it was covered with forest in 2000. We then recorded pixels that experienced a major event of forest cover
loss between 2000 and 2018 and estimated the total area of forest cover in 2018 per grid cell.
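The forest-cover rule above can be sketched as follows. The sketch assumes two input layers in the style of the Hansen dataset: a tree-cover percentage for the year 2000 and a loss-year layer in which 0 means no recorded loss. The nominal pixel area of 900 square meters (a 30 m pixel) is an approximation, forest gain is ignored, and all names and values are illustrative.

```python
def forest_area_2018(treecover2000, lossyear, pixel_area_m2=900.0,
                     threshold=20):
    """Area of pixels that were forest in 2000 and had no recorded loss.

    treecover2000: percent tree cover per pixel (0-100).
    lossyear: 0 for no loss, otherwise the year of loss after 2000.
    """
    area = 0.0
    for cover_row, loss_row in zip(treecover2000, lossyear):
        for cover, loss in zip(cover_row, loss_row):
            if cover > threshold and loss == 0:
                area += pixel_area_m2
    return area


# Hypothetical 2 x 2 cell: one non-forest pixel, one pixel lost in 2005.
cover = [[50, 10], [80, 30]]
loss = [[0, 0], [5, 0]]
remaining = forest_area_2018(cover, loss)
```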
Urban Footprints: We relied on two remotely sensed derived products signifying urban and
rural settlements that were produced by the Earth Observation Center at DLR: The Global Urban
Footprint (GUF) (in a spatial resolution of ~12 m) and the World Settlement Footprint (WSF) (in a
spatial resolution of ~10 m) [78–80].
OSM Transportation Network Features: We calculated the total length of OSM roads in a cell and
the total number of junctions in a cell as additional potential predictors of OSM-building footprints.
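The two transportation predictors can be sketched as below. The study does not define a junction precisely, so this sketch assumes that a junction is a node shared by at least two road ways, and that road coordinates are in metres; clipping roads to cell boundaries is omitted for brevity.

```python
import math
from collections import Counter


def way_length(coords):
    """Length of a polyline given as a list of (x, y) points in metres."""
    return sum(math.dist(a, b) for a, b in zip(coords, coords[1:]))


def count_junctions(ways):
    """Number of nodes used by two or more ways.

    ways: list of node-id lists, one per road way (assumed definition).
    """
    usage = Counter(node for way in ways for node in set(way))
    return sum(1 for n in usage.values() if n >= 2)


# Hypothetical roads: three ways sharing nodes 2 and 3.
ways = [[1, 2, 3], [3, 4], [2, 5]]
junctions = count_junctions(ways)
length = way_length([(0, 0), (3, 4), (3, 10)])
```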
Table 1. The predictors used to predict per-cell area of OpenStreetMap (OSM) building footprints.
| Predictor | Source | Number of Scenes | Per-Cell Statistics |
| --- | --- | --- | --- |
| Nighttime lights | VIIRS | 7 | Sum of Light (SOL): the sum of the DNmax values of all pixels in a cell, where DNmax_i is the maximum digital number (DN) value of the pixel in location i over 7 monthly composites in 2019 |
| NDVI: (NIR - RED)/(NIR + RED) | Sentinel-2 | ~42 | The sum of the NDVI values of all pixels in a grid cell |
| SAVI: (NIR - RED)/(NIR + RED + L) * (1 + L) | Sentinel-2 | ~42 | The sum of the SAVI values of all pixels in a grid cell |
| NDBI: (MIR - NIR)/(MIR + NIR) | Sentinel-2 | ~42 | The sum of the NDBI values of all pixels in a grid cell |
| UI: (SWIR2 - NIR)/(SWIR2 + NIR) | Sentinel-2 | ~42 | The sum of the UI values of all pixels in a grid cell |
| Deforestation | Hansen Global Forest Change v1.6 (2000–2018) | 1 | Total forest cover in a grid cell (2018) |
| Built-up area | GUF | 1 | Total built-up area in a grid cell |
| Built-up area | WSF | 1 | Total built-up area in a grid cell |
| Topography (slope) | SRTM | 1 | Average slope per grid cell |
| Surface texture | Sentinel-1 | ~70 | Average texture per grid cell |
| Roads | OSM | - | Total length of roads in a grid cell |
| Road junctions | OSM | - | Number of junctions in a grid cell |
2.2.5. Step 5: Identify Mapped Grid Cells
We visually assessed the completeness of OSM building footprints in the grid cells in Haiti and
St. Lucia by overlaying the OSM building footprint dataset on the most recent high-resolution base
map image (provided by ESRI, updated as of 2019 [81]). We identified grid cells in Haiti and St. Lucia
where we assessed that at least 75% of the buildings that are visible in the satellite image have been
mapped (we identified 835 grid cells in Haiti and 179 grid cells in St. Lucia). Because most of the area
of Dominica has been mapped, we skipped this step for this country.
2.2.6. Step 6: Perform Correlation Analysis and Prediction
We evaluated the correlation between the remotely sensed and the geospatial measures (the
explanatory variables) and the area of OSM building footprint in a grid cell using a Pearson Correlation
Test, and performed an Ordinary Least Squares (OLS) regression to estimate the potential of the
variables, combined, to explain the observed variation in the area of OSM building footprints in a grid
cell. Additionally, we evaluated the potential of the explanatory variables to predict the area of OSM
building footprints in a grid cell using a regression with Random Forests. Random Forests [82] are
tree-based models that include k decision trees and p randomly chosen predictors for each recursion.
To predict a value for an example, its variables are run through each of the k trees, and the k predictions
are averaged through an arithmetic mean. Each tree is trained using a subset of examples from the
training set, drawn randomly with replacement, with each node’s binary question determined using a
random subset of p input variables. We performed the regression with the 835 grid cells that were
visually assessed as being relatively fully mapped (i.e., more than 75% of the buildings in a grid cell are
assessed as mapped). To evaluate the accuracy of the prediction, we adopted a fivefold cross-validation
method. In each experiment, the examples in one of the data folds were left out for testing and the
examples in the remaining four folds were used to train the model. The performance quality of the
trained model was tested on the examples in the left-out fold, and the overall performance measure was
then averaged over the five folds. We assessed the prediction accuracy with different numbers of
decision trees: 2, 4, 8, 16, 32, 64, 128, 256 and 512, with the minimum size of terminal nodes set to 5.
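The evaluation logic of Step 6 can be sketched in plain Python as below: a Pearson correlation between one predictor and the target, and a fivefold cross-validation loop that averages the coefficient of determination over the held-out folds. To keep the sketch self-contained, a single-variable least-squares line stands in for the Random Forest regressor; this substitution, the synthetic data and all function names are assumptions made for illustration and do not reproduce the study's model.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def fit_line(xs, ys):
    """Least-squares slope and intercept (stand-in for the real model)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx


def k_fold_r2(xs, ys, k=5, seed=0):
    """Average R^2 over k held-out folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for test_idx in folds:
        held_out = set(test_idx)
        train = [i for i in idx if i not in held_out]
        slope, intercept = fit_line([xs[i] for i in train],
                                    [ys[i] for i in train])
        y_true = [ys[i] for i in test_idx]
        y_pred = [slope * xs[i] + intercept for i in test_idx]
        mean_y = sum(y_true) / len(y_true)
        ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
        ss_tot = sum((t - mean_y) ** 2 for t in y_true)
        scores.append(1.0 - ss_res / ss_tot)
    return sum(scores) / k


# Synthetic example: a linear signal with a small alternating error.
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 + (0.5 if i % 2 == 0 else -0.5)
      for i, x in enumerate(xs)]
r = pearson_r(xs, ys)
cv_r2 = k_fold_r2(xs, ys, k=5)
```

With a Random Forest in place of `fit_line`, the same loop would be repeated for each candidate number of trees.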
2.2.7. Step 7: Predict the Coverage of OSM-Building Footprints in Each Entire Country
We used either the grid cells that were visually assessed as relatively fully mapped (in the case
of Haiti and St. Lucia) or all the grid cells (in the case of Dominica) as references for training the
Random Forest regression, and predicted the area of OSM building footprints in all grid
cells in each country. We identified the grid cells that were predicted to incorporate OSM building
footprints, but were not yet mapped.
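The final comparison can be sketched as a simple filter over per-cell values: a cell is flagged as a mapping gap when the model predicts building area but no OSM footprint area has been mapped. The minimum predicted area of 100 square meters is a hypothetical threshold introduced for this sketch; the study does not report a cut-off.

```python
def find_mapping_gaps(predicted, mapped, min_predicted_m2=100.0):
    """Cells with predicted building area but no mapped footprints.

    predicted, mapped: dicts from cell id to footprint area (sq m).
    min_predicted_m2: hypothetical threshold, not taken from the study.
    """
    return sorted(
        cell for cell, area in predicted.items()
        if area >= min_predicted_m2 and mapped.get(cell, 0.0) == 0.0
    )


# Hypothetical predictions and mapped areas for four cells.
predicted = {(0, 0): 850.0, (0, 1): 40.0, (1, 0): 1200.0, (1, 1): 600.0}
mapped = {(0, 0): 790.0, (1, 1): 0.0}
gaps = find_mapping_gaps(predicted, mapped)
```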
3. Results
An examination of the 136,747 cells spanning Haiti shows that only 25.1% of the cells have at least
one mapped building, and only 512 of the 136,747 cells have more than 10% of their area covered with
building footprints (Figure 2 shows a histogram of the distribution of OSM building footprints per
cell). On average, there are 27.5 buildings in a cell (Std = 83.4); 1530 of the cells (i.e., only 1.1% of the
cells spanning Haiti) incorporate more than 100 mapped buildings. In comparison, 8.15% and 6.84% of
the cells incorporate built-up land cover according to WSF and GUF, respectively.
Figure 2. The distribution (histogram) of OSM building footprints area (square meters) per grid cell. Horizontal axis: OSM area per cell (sq m); vertical axis: number of cells.
As discussed above, a visual examination of the completeness of OSM building footprints over
Haiti suggests that large portions of the island remain unmapped (Figure 3a). Figure 3b,c show, as
an illustration, the coverage of OSM building footprints in the capital of Haiti, Port-au-Prince, and
in the adjacent Carrefour commune. While buildings in many areas within these cities
have been mapped, large portions are still not fully mapped. We observe that densely mapped zones
of Port-au-Prince co-exist alongside zones that remain entirely unmapped (Figure 3c), a visual pattern
that may result from the episodic engagement of community mapping volunteers and the definition of
mapping ‘tasks’ on a neighborhood scale through OSM editing tools. Moreover, significant parts of
northern Haiti are not mapped (Figure 4), including, for example, the cities of Gonaïves and Cap-Haitien.
Remote Sens. 2020,12, 118 9 of 25
Figure 3. (a) OSM building footprints coverage in Haiti, (b) in the capital of Haiti, Port-au-Prince, and
(c) in the adjacent Carrefour commune. Yellow indicates OSM building footprints.
Figure 4. OSM building footprints coverage in (a) the city of Cap-Haitien and (b) Gonaïves in
northern Haiti.
A Pearson correlation test indicated a significant (p < 0.01) correlation between the total area of
OSM building footprints in a grid cell and several of the examined explanatory variables. As expected,
there was a positive and significant correlation between the area of OSM building footprints in a grid
cell and the total area of built-up land cover, according to WSF and GUF (r = 0.73 and 0.71, respectively,
p < 0.01), as well as with nighttime lights (VIIRS SOL) (r = 0.63, p < 0.01). We find a significant (p < 0.01)
correlation between OSM building footprints area in a grid cell and the four Sentinel-2 spectral indices,
indicated by a positive correlation with UI and NDBI (r = 0.59 and r = 0.47) and a negative correlation
with both SAVI and NDVI (r = −0.53).
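For readers who want to reproduce this kind of test on their own grid, the Pearson coefficient between per-cell OSM footprint area and a predictor can be computed as sketched below. The helper `pearson_r` and the toy per-cell values are hypothetical and only illustrate the calculation; the sketch returns the coefficient alone and does not compute the significance level (p) reported in the text.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-cell values: OSM footprint area, WSF built-up area, mean NDVI.
osm_area = [0.0, 120.0, 300.0, 800.0, 1500.0]
wsf_area = [0.0, 200.0, 350.0, 900.0, 1400.0]
ndvi = [0.8, 0.6, 0.5, 0.3, 0.2]

r_wsf = pearson_r(osm_area, wsf_area)   # positive: built-up area rises with OSM area
r_ndvi = pearson_r(osm_area, ndvi)      # negative: vegetation falls as OSM area rises
print(r_wsf, r_ndvi)
```

The signs in this toy example mirror the pattern described above: a positive association with a built-up layer and a negative one with a vegetation index.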
We identified 835 grid cells where, according to a visual assessment, at least 75% of the buildings
that were visible in the satellite image are mapped in OSM (Figure 5 shows examples of grid cells
where more than 75% of the structures are mapped). The correlation between the area of OSM building
footprints in a grid cell and the examined predictors was higher compared to the previous experiment,
where all the grid cells (i.e., 136,747 grid cells) were considered (for example, r = 0.78 and r = 0.65
with WSF and VIIRS and r = 0.61 and r = −0.55 with UI and SAVI, respectively) (Table 2), which is
likely due to the fact that large portions of the country are not mapped (i.e., there are grid cells that
lack OSM coverage while actually populated and exhibit LULC characteristics of populated areas).
As expected, there were also similarities and correlations between some of the explanatory variables.
Figure 6a presents pairwise correlation coefficients between the explanatory variables (variables are
ordered according to a hierarchical clustering). The explanatory variables form several similarity
clusters: a cluster composed of vegetation spectral indices (NDVI, SAVI) and forest cover (which
are positively correlated with each other), and a cluster composed of built-up land cover spectral
indices (NDBI, UI), together with VIIRS, WSF, GUF, and OSM road network features. As expected,
there is a negative and significant correlation between the vegetation and the built-up land cover
spectral indices. The dendrogram shown in the figure further highlights hierarchical clusters formed
between the variables, notably, OSM area and VIIRS, UI and NDBI, NDVI and SAVI, and road length
and number of junctions in a grid cell.
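The way a dendrogram groups correlated variables can be illustrated with a small agglomerative clustering sketch. The text does not state which distance or linkage was used, so the choices here (distance 1 − r and average linkage) are assumptions, and the correlation values are made-up placeholders rather than results from the study.

```python
from itertools import combinations


def average_linkage(names, dist):
    """Agglomerative clustering with average linkage.

    names: variable names; dist: dict mapping frozenset({a, b}) -> distance.
    Returns the merges in order as (cluster_a, cluster_b, distance) tuples.
    """
    clusters = [(name,) for name in names]
    merges = []

    def cluster_distance(a, b):
        # Mean pairwise distance between members of the two clusters.
        total = sum(dist[frozenset((i, j))] for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda p: cluster_distance(*p))
        merges.append((a, b, cluster_distance(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Placeholder pairwise correlations (illustrative only).
corr = {
    ("NDVI", "SAVI"): 0.95, ("NDBI", "UI"): 0.90,
    ("NDVI", "NDBI"): -0.60, ("NDVI", "UI"): -0.70,
    ("SAVI", "NDBI"): -0.55, ("SAVI", "UI"): -0.65,
}
# Assumed distance: 1 - r, so strongly positively correlated variables are close.
dist = {frozenset(pair): 1.0 - r for pair, r in corr.items()}

merges = average_linkage(["NDVI", "SAVI", "NDBI", "UI"], dist)
for a, b, d in merges:
    print(a, b, round(d, 3))
```

With these placeholder values the two vegetation indices merge first, the two built-up indices merge second, and the two groups join last at a large distance, which is the qualitative structure described for the dendrogram.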
Figure 5. Examples of grid cells that have more than 75% of their area mapped with OSM
building footprints.
Table 2. Pearson correlation test between the area of OSM building footprints in a grid cell and the
evaluated predictors (this correlation test includes only grid cells that were assessed as mapped, N = 835).

Predictor   VIIRS     GUF            WSF      NDVI       NDBI          SAVI
r           0.654 *   0.76 *         0.78 *   −0.551 *   0.486 *       −0.551 *

Predictor   UI        Forest Cover   SE1      Slope      Road length   OSM junctions
r           0.614 *   −0.388 *       0.16     −0.11      0.69 *        0.60 *

Note: * p < 0.01.