Remote Sens. 2020, 12, 118; doi:10.3390/rs12010118 www.mdpi.com/journal/remotesensing
Article
Assessing OpenStreetMap Completeness for Management of Natural Disaster by Means of Remote Sensing: A Case Study of Three Small Island States (Haiti, Dominica and St. Lucia)
Ran Goldblatt 1,*, Nicholas Jones 2 and Jenny Mannix 3
1 New Light Technologies Inc., Washington, DC 20005, USA
2 World Bank Group, Washington, DC 20433, USA; njones@worldbankgroup.org
3 New Light Technologies Inc., Washington, DC 20005, USA; jennifer.mannix@nltgis.com
* Correspondence: ran.goldblatt@nltgis.com; Tel.: +1-202-630-0362
Received: 26 November 2019; Accepted: 25 December 2019; Published: 1 January 2020
Abstract: Over the last few decades, many countries, especially islands in the Caribbean, have been challenged by the devastating consequences of natural disasters, which pose a significant threat to human health and safety. Timely information related to the distribution of vulnerable population and critical infrastructure is key for effective disaster relief. OpenStreetMap (OSM) has repeatedly been shown to be highly suitable for disaster mapping and management. However, large portions of the world, including countries exposed to natural disasters, remain incompletely mapped. In this study, we propose a methodology that relies on remotely sensed measurements (e.g., Visible Infrared Imaging Radiometer Suite (VIIRS), Sentinel-2 and Sentinel-1) and derived classification schemes (e.g., forest and built-up land cover) to predict the completeness of OSM building footprints in three small island states (Haiti, Dominica and St. Lucia). We find that the combinatorial effects of these predictors explain up to 94% of the variation of the completeness of OSM building footprints. Our study extends the existing literature by demonstrating how remotely sensed measurements could be leveraged to evaluate the completeness of the OSM database, especially in countries with high risk of natural disasters. Identifying areas that lack coverage of OSM features could help prioritize mapping efforts, especially in areas vulnerable to natural hazards and where current data gaps pose an obstacle to timely and evidence-based disaster risk management.
Keywords: OpenStreetMap; OSM; OpenStreetMap coverage; disaster management; remote sensing
1. Introduction
Over the last few decades, many countries have been challenged by the devastating consequences of natural disasters, which pose a significant threat to human health and safety and affect vulnerable communities and critical infrastructure globally. Every year, natural disasters affect close to 160 million people worldwide [1], destroying physical, biological and social environments, threatening food security, and causing global losses of over 100 billion US dollars [2]. The frequency of natural disasters has been steadily increasing since 1940 [3], and over the next century, climate change will likely amplify the number and severity of such disasters [4].
While the impacts of natural disasters are worldwide, some countries have been more vulnerable to different types of disasters than others [5]. For example, in 2017, Puerto Rico, Sri Lanka and Dominica were at the top of the list of countries most affected by natural disasters such as heavy precipitation, floods and landslides. Caribbean island countries are especially exposed to
a wide range of natural disasters [6], and small island developing states, which are frequently characterized by coastal communities, geographic isolation, and limited technical capacity, are among the countries most vulnerable to natural disasters and climate change [7].
Recognizing these trends, there is an increasing need for efficient and well-planned disaster management and disaster relief operations. The term disaster risk management refers to the full life cycle of actions aimed at preventing, preparing for, responding to, and recovering from disasters. Generally, disaster risk management consists of four main phases: (1) Mitigation, i.e., activities that reduce the likelihood and expected adverse impacts of a natural disaster event; (2) Preparedness, i.e., plans or preparations to strengthen emergency response capabilities; (3) Response, i.e., actions taken to save lives and prevent property damage in an emergency situation; and (4) Recovery, i.e., interventions aimed at returning communities and infrastructure to a proper level of functionality following a disaster.
Timely geospatial information indicating the distribution of vulnerable populations and the location, availability and functionality of critical infrastructure is key for effective disaster relief operations. Until recently, governmental agencies and the commercial sector were the primary sources of geospatial data for disaster management. In the past decade, however, the public has been increasingly recognized as a valuable source of geospatial information for disaster management [8]. Recent developments in web mapping technologies have made disaster management operations more dynamic, transparent, and decentralized, with increased contributions from individuals and organizations both inside and outside the affected area [9], including geospatial information contributed by volunteers. The term Volunteered Geographic Information (VGI) refers to geographic information collected by individuals, often on a voluntary basis [8]. These data are open and freely accessible [10] and fill gaps left by traditional mapping technologies and data sources [11,12]. Governments in developing countries increasingly recognize the economic and social value of VGI and its potential to provide new ways to interact with the community and to strengthen civil society [13].
1.1. OpenStreetMap (OSM) for Disaster Management
Created in 2004, OpenStreetMap (OSM) is a collaborative, user-generated mapping project that aims to provide a freely available geographic information database of the world [14]. During the first year of the project, most mapping efforts focused on road and transportation networks. Today, however, a variety of geographic features are constantly added to OSM's database, including buildings and their functionality, land use and public transportation information [15]. These data allow local governments and communities to better perform risk assessment and emergency planning [16–18] and are routinely used for various disaster risk management applications [19,20]. As of today, there are more than 5.5 million OSM users and one million contributors, who generate more than 3 million changes every day (https://wiki.openstreetmap.org/wiki/Stats). In the context of natural disasters, volunteers' mapping efforts are coordinated by the Humanitarian OpenStreetMap Team (HOT), originally formed after the Haiti earthquake, which conducts activities aimed at enriching OSM data to support emergency relief operations. Often, when disasters occur, this essential information is lacking, which prompts mapping campaigns, including Mapathons [21], designed to map the affected areas.
OSM data are collected by three main means [22]: (1) GPS records, which can be uploaded to the database; (2) tracing and digitizing features from orthophotos and high-resolution satellite imagery; or (3) importing datasets from external sources such as administrative census data. Recently, large corporations, including Apple, Microsoft, and Facebook, have been hiring editors to contribute to the OSM database [23]. Several tools have also been developed to support OSM's mapping efforts. One of them is the MapSwipe app (https://mapswipe.org/), which enables volunteers to map and tag geographic features on mobile phones based on satellite imagery [24]. Other initiatives, such as Missing Maps (https://www.missingmaps.org/), allow volunteers to trace features from satellite images by splitting mapping into small tasks, so that remote volunteers can work simultaneously on
the same overall area (as of 2018, there were nearly 60,000 mappers contributing to Missing Maps) [25].
1.2. Assessing OSM Completeness and Accuracy
Although OSM road network data are estimated to exceed 80% completeness relative to the world's roads and streets [26], in general, the coverage and completeness of OSM features (including building footprints) vary significantly, not only between countries but also within countries. For example, the coverage of remote and rural areas is often lower than that of highly populated urban areas [27], and the coverage of developing countries tends to be lower than that of developed countries [28–32]. These differences are partly due to societal factors, such as population distribution and population density, distance to major cities and the location of contributing users [29,33–37].
With the increased use of VGI, including OSM, for disaster preparedness and response, various methodologies have been proposed to assess the quality and accuracy of the collected data [12], for example, in terms of data completeness, logical consistency, positional, thematic, semantic and spatial accuracy, temporal quality and usability [28,38–42]. Several approaches have been proposed to assess the completeness of the OSM database, including the completeness of street networks [32,43], land use and building footprints [44]. Completeness of coverage can be assessed by comparing OSM-mapped features with external datasets, for example, national administrative data [28,44–47]. Such data vary by country and are not always available, especially in developing countries.
In this study, we propose a methodology that uses remotely sensed observations to estimate the coverage of OSM-mapped features, specifically to identify gaps in the completeness of OSM building footprints. In the past, expensive satellite imagery and limited computational power only allowed the analysis of small geographic areas. This model is being replaced thanks to publicly available, free satellite data that capture every location on Earth every few days. The availability of daytime (e.g., Sentinel-2, Landsat) and nighttime (e.g., DMSP Operational Linescan System (OLS) or the Visible Infrared Imaging Radiometer Suite (VIIRS)) satellite imagery, together with advancements in the capabilities of cloud-based computational platforms, now allows Land Use and Land Cover (LULC) characteristics of the Earth to be analyzed across greater geographic and temporal scales. Land cover refers to the attributes of the Earth's land surface and immediate subsurface (e.g., biota, soil, topography, surface and groundwater, and human structures). Land use refers to the purpose for which humans exploit the land cover [48]. Because remotely sensed observations typically capture the unique reflectance characteristics of physical objects on Earth, most remote sensing applications focus on detecting and classifying the Earth's land cover characteristics. Differentiating between types of land use (which typically do not have unique physical characteristics) remains challenging. In OSM, mapped features can be tagged according to both their land use and their land cover. Although OSM contributors are free to use their own tags, there is a quasi-official collection of tags that has been established and agreed upon (for example, the “landuse” and “landcover” keys or more specific keys such as “building” or “highway”) [49]. Previous studies demonstrated the potential use of these tags to create detailed LULC maps [50].
The methodology we propose in this study relies on remotely sensed measurements to estimate the coverage of OSM building footprints and to identify “mapping gaps” (i.e., areas that have not yet been mapped). Previous studies have used OSM data for different remote sensing applications, for example, for the classification of urban areas [51] or for semantic labeling of aerial and satellite images [52]. Despite significant progress in the field of machine learning and the increasing availability of satellite imagery, there is still a scarcity of studies that use remotely sensed observations to estimate the completeness of OSM building footprints at a given point in time. Identifying areas that lack coverage of OSM features could help plan and prioritize mapping efforts, especially in areas that are vulnerable to natural hazards and where current data gaps pose an obstacle to timely and evidence-based disaster risk management actions.
By its nature, the OSM database is dynamic and is updated daily with thousands of new entries. However, as discussed above, the frequency and extent of updates vary greatly by geographic area. Some regions are updated more frequently than others, and developing countries in particular, which are often the most vulnerable to the impacts of natural disasters, are not fully mapped. The objective of this study is to propose a methodology to estimate the completeness of OSM building footprints based on remotely sensed measurements that are available at a global scale and are updated frequently. We demonstrate our methodology in the case study of three small island states: Haiti, Dominica and St. Lucia.
The remainder of this article is organized as follows. In Section 2, we discuss the methodology, the study areas and the data we use to predict the coverage of OSM building footprints. In Section 3, we present and evaluate the results in the case of Haiti, and in Section 3.3, we illustrate the applicability of our approach in the case of Dominica and St. Lucia. In Section 4, we offer a concluding discussion.
2. Materials and Methods
2.1. Study Areas
We demonstrate our methodology in the case of three small island states: Haiti, Dominica and St. Lucia (Figure 1).
2.1.1. Haiti
Located on the western side of the island of Hispaniola, Haiti (27,750 km² in size, with a population of approximately 11.5 million) is the poorest country in the Western Hemisphere, with a Gross Domestic Product (GDP) per capita of US$870 [53]. Haiti is highly vulnerable to natural disasters; more than 96% of its population is exposed to different types of natural hazards, particularly hurricanes, coastal and riverine floods, and earthquakes [53]. More than half of the population lives in cities and towns, a major shift from the 1950s, when approximately 90% of Haitians lived in the countryside [54]. Almost all of Haiti's 30 major watersheds experience significant flood events due to intense seasonal rainfall, storm surge in the coastal zones, deforestation and erosion, and sediment-laden river channels [55]. Furthermore, large portions of the country's population (e.g., in the capital, Port-au-Prince) live in shantytowns built on steep and exposed hillsides [56]. In 2018 alone, some 2.8 million people were considered to be in need of humanitarian assistance, valued at US$252.2 million [57].
2.1.2. St. Lucia
A small island state in the Windward Islands, located between the Caribbean Sea and the North Atlantic Ocean, St. Lucia (616 km² in size) has a population of approximately 165,000 [58] and a GDP per capita of US$10,315 [59]. St. Lucia is susceptible to numerous natural hazards, including hurricanes, landslides, flooding, and volcanic eruptions. Due to its volcanic origins, its terrain consists mainly of mountains and steep slopes in the center of the country, with low-lying areas along the coasts [60]. As of 2018, approximately 19% of the population resided in these low-lying areas [61]. In addition, St. Lucia's economy is highly dependent on two sources: banana exports and tourism income. Both have recently been negatively affected by natural hazards, for example in 2016, when Hurricane Matthew caused 70% of the island to lose power and damaged 80% of the country's banana plantations [60].
2.1.3. Dominica
Dominica (approximately 74,000 people [62]) is located in the Leeward Islands chain in the Lesser Antilles of the Caribbean Sea, approximately 1,200 km southeast of Haiti, with large portions of its population residing in the capital, Roseau (population 14,700), and in Portsmouth (population 5,200) [62]. Dominica is vulnerable to a wide range of natural hazards, including hurricanes, intense rainfall, slope instability, volcanic eruptions, seismic activity, and tsunamis [63]. Reflecting its rugged physical topography, most of the population and infrastructure are located on the coast, making the
country particularly vulnerable to strong winds and high seas [64]. In September 2017, Category 5 Hurricane Maria hit the country, causing losses and damages worth 226 percent of GDP [65].
Figure 1. Locations and size comparisons of the three study areas: Haiti, Dominica, and St. Lucia.
2.2. Analytical Framework
The objective of this study is to identify gaps in the completeness of OSM building footprints in three small island states (Haiti, St. Lucia and Dominica) based on remotely sensed measurements and other geospatial features. The procedure involves seven steps.
2.2.1. Step 1: Construct an Artificial Tessellation
We constructed an artificial tessellated grid of cells spanning each of the countries; each cell is 0.25 km² in size (a total of 136,747 grid cells over Haiti, 2,796 grid cells over St. Lucia and 3,861 grid cells over Dominica). Each grid cell was treated as an independent unit of analysis.
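The tessellation in Step 1 can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes square cells of 500 m × 500 m (which gives the stated 0.25 km² per cell) laid over a bounding box in a metre-based projected coordinate system. The text does not state the cell shape, the projection, or how cells along the coastline were retained, and `make_grid` is a hypothetical name.

```python
import math

CELL_SIZE_M = 500.0  # assumed side length; 500 m x 500 m = 0.25 km^2


def make_grid(xmin, ymin, xmax, ymax, cell=CELL_SIZE_M):
    """Return (col, row, x0, y0, x1, y1) tuples tiling the bounding box.

    Coordinates are assumed to be in metres (projected CRS). Cells on the
    right/top edge may extend past the box so that the whole extent is covered.
    """
    ncols = math.ceil((xmax - xmin) / cell)
    nrows = math.ceil((ymax - ymin) / cell)
    cells = []
    for row in range(nrows):
        for col in range(ncols):
            x0 = xmin + col * cell
            y0 = ymin + row * cell
            cells.append((col, row, x0, y0, x0 + cell, y0 + cell))
    return cells


# A 2 km x 1.2 km box needs 4 x 3 = 12 cells of 0.25 km^2 each.
grid = make_grid(0.0, 0.0, 2000.0, 1200.0)
print(len(grid))  # 12
print((grid[0][4] - grid[0][2]) * (grid[0][5] - grid[0][3]) / 1e6)  # 0.25
```

In a real workflow, cells falling entirely in the sea would typically be dropped by intersecting the grid with the country outline, which is one reason cell counts exceed the land area divided by 0.25 km².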
2.2.2. Step 2: Download the Current OSM Building Footprints
We downloaded the most up-to-date OSM data for the three countries (data downloaded in July 2019). For Haiti, we downloaded the data (in Shapefile format) from Geofabrik (https://www.geofabrik.de/data/download.html). At the time of the analysis, Geofabrik did not have data for Dominica and St. Lucia; thus, we downloaded the data for these countries from overpass-turbo (https://overpass-turbo.eu/) in KML format (these data require additional preprocessing, and we selected the OSM features tagged as “building=yes”). At the time of the analysis, there were 930,000 mapped buildings in Haiti, 38,619 mapped buildings in Dominica and 29,412 mapped buildings in St. Lucia.
2.2.3. Step 3: Calculate the Total Area of OSM Building Footprints in a Grid Cell
We calculated the total area of OSM building footprints in each grid cell. This is the value to be predicted by the explanatory variables (the remotely sensed and geospatial measurements).
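Step 3 amounts to intersecting every footprint with every cell and summing the intersected areas. The pure-Python sketch below does this for one cell using Sutherland-Hodgman polygon clipping and the shoelace area formula. It assumes footprints are simple polygons without holes, already projected to metres; the function names are hypothetical, and the text does not say which tool performed this step.

```python
def shoelace_area(poly):
    """Absolute area of a simple polygon given as [(x, y), ...] in metres."""
    s = 0.0
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


def _clip_edge(points, inside, intersect):
    """One Sutherland-Hodgman pass against a single half-plane."""
    out = []
    for i, cur in enumerate(points):
        prev = points[i - 1]  # for i == 0 this wraps to the last vertex
        if inside(cur):
            if not inside(prev):
                out.append(intersect(prev, cur))
            out.append(cur)
        elif inside(prev):
            out.append(intersect(prev, cur))
    return out


def clip_to_cell(poly, x0, y0, x1, y1):
    """Clip a polygon to the axis-aligned cell [x0, x1] x [y0, y1]."""
    def at_x(x):
        return lambda p, q: (x, p[1] + (q[1] - p[1]) * (x - p[0]) / (q[0] - p[0]))

    def at_y(y):
        return lambda p, q: (p[0] + (q[0] - p[0]) * (y - p[1]) / (q[1] - p[1]), y)

    pts = list(poly)
    for inside, intersect in (
        (lambda p: p[0] >= x0, at_x(x0)),
        (lambda p: p[0] <= x1, at_x(x1)),
        (lambda p: p[1] >= y0, at_y(y0)),
        (lambda p: p[1] <= y1, at_y(y1)),
    ):
        if not pts:
            break
        pts = _clip_edge(pts, inside, intersect)
    return pts


def building_area_in_cell(buildings, cell):
    """Total footprint area (m^2) of the parts of `buildings` inside `cell`."""
    total = 0.0
    for poly in buildings:
        clipped = clip_to_cell(poly, *cell)
        if len(clipped) >= 3:
            total += shoelace_area(clipped)
    return total


cell = (0.0, 0.0, 500.0, 500.0)
buildings = [
    [(10, 10), (30, 10), (30, 20), (10, 20)],          # fully inside: 200 m^2
    [(490, 0), (510, 0), (510, 20), (490, 20)],        # half inside: 200 m^2
    [(600, 600), (610, 600), (610, 610), (600, 610)],  # outside: 0 m^2
]
print(building_area_in_cell(buildings, cell))  # 400.0
```

Clipping, rather than assigning each building to the cell containing its centroid, keeps footprints that straddle cell edges from being double-counted or lost; for country-scale data, a spatial index would be needed to avoid testing every building against every cell.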
2.2.4. Step 4: Preprocess and Aggregate the Remotely Sensed and Geospatial Data
We relied on several predictors (explanatory variables) to estimate the coverage of OSM building footprints in a grid cell and to identify gaps in OSM coverage. We preprocessed the data and aggregated them to the level of grid cells (Table 1 describes the evaluated explanatory variables and the aggregation measures). The preprocessing, analysis and aggregation of the remotely sensed data were completed using Google Earth Engine (GEE). GEE is a platform that leverages cloud computing services to achieve planetary-scale utility and has previously been used for a wide range of applications [66], including mapping population [67,68] and urban areas [69,70].
Nighttime Lights (VIIRS): The Visible Infrared Imaging Radiometer Suite (VIIRS) is one of the key instruments on board the Suomi National Polar-orbiting Partnership (Suomi NPP) spacecraft (launched in 2011). The VIIRS instrument collects visible and infrared imagery and global observations of land, atmosphere, cryosphere and oceans. It offers significant improvements over the capabilities of the former DMSP-OLS [71], notably its daily availability and higher spatial resolution (up to 500 m at the equator). The VIIRS Day/Night Band (DNB) provides global coverage with a 12-hour revisit time. First, we recorded for each pixel the maximum value of all overlapping pixels (in the same location) in a stack of seven monthly composites (January–July) of 2019. Then, for each grid cell, we calculated a Sum of Light (SOL) measure (the sum of the digital number values of all overlapping pixels in the cell).
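The two-stage VIIRS aggregation (a per-pixel maximum over the seven monthly composites, then a per-cell sum) can be illustrated on toy rasters. In this sketch, rasters are nested Python lists and `cell_pixels` (a hypothetical name) is the list of (row, column) indices of the pixels assigned to a cell; in the study the equivalent operations ran in GEE, and the rule for assigning pixels to cells is not specified.

```python
def max_composite(stack):
    """Per-pixel maximum over a stack of equally sized 2-D rasters."""
    rows, cols = len(stack[0]), len(stack[0][0])
    return [[max(img[r][c] for img in stack) for c in range(cols)]
            for r in range(rows)]


def sum_of_lights(composite, cell_pixels):
    """SOL for one grid cell: sum of composite DN values over its pixels."""
    return sum(composite[r][c] for r, c in cell_pixels)


# Seven toy 2 x 2 monthly composites (January to July); values are made up.
months = [[[m, 1.0], [0.5, 2.0 * m]] for m in range(1, 8)]
dn_max = max_composite(months)
print(dn_max)                                   # [[7, 1.0], [0.5, 14.0]]
print(sum_of_lights(dn_max, [(0, 0), (1, 1)]))  # 21.0
```

Taking the maximum first keeps a pixel bright even if some months were cloudy or had missing observations; summing (rather than averaging) per cell makes SOL grow with both light intensity and lit extent.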
Sentinel-2-Derived Spectral Indices: The Copernicus Sentinel-2 mission comprises a constellation of two polar-orbiting satellites that collect multispectral data in 13 spectral bands, with four bands at a spatial resolution of 10 m, six bands at 20 m and three bands at 60 m. The revisit period of Sentinel-2 is 5 days at the equator. We calculated four remotely sensed measures sensitive to vegetation and built-up land cover: the Normalized Difference Vegetation Index (NDVI) [72], the Soil-Adjusted Vegetation Index (SAVI) [73], the Normalized Difference Built-up Index (NDBI) [74] and the Urban Index (UI) [75]. For each grid cell, we calculated a per-index sum of the values of all pixels overlapping the grid cell.
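For reference, the four indices can be written as small functions. The band mapping is an assumption not stated in the text: NIR as Sentinel-2 band B8, red as B4, the "MIR" of the NDBI formula as the shortwave-infrared band B11 and SWIR2 as B12, with reflectances scaled to 0–1. The SAVI soil-brightness factor L = 0.5 is the commonly used default and is also an assumption here.

```python
def normalized_difference(a, b):
    """(a - b) / (a + b), returning 0.0 where the denominator is zero."""
    denom = a + b
    return (a - b) / denom if denom != 0 else 0.0


def ndvi(nir, red):
    return normalized_difference(nir, red)


def savi(nir, red, L=0.5):
    # L: soil-brightness correction factor (0.5 assumed)
    return (nir - red) / (nir + red + L) * (1 + L)


def ndbi(mir, nir):
    return normalized_difference(mir, nir)


def ui(swir2, nir):
    return normalized_difference(swir2, nir)


# Per-cell aggregation as described in the text: sum each index over the cell's pixels.
# Each toy pixel is (red, nir, mir, swir2) reflectance; the values are made up.
pixels = [(0.05, 0.40, 0.20, 0.15), (0.12, 0.20, 0.30, 0.28)]
cell_ndvi = sum(ndvi(nir, red) for red, nir, _, _ in pixels)
cell_ui = sum(ui(swir2, nir) for _, nir, _, swir2 in pixels)
print(round(cell_ndvi, 3), round(cell_ui, 3))  # 1.028 -0.288
```

The first toy pixel behaves like vegetation (high NIR, so NDVI is high and UI negative), the second like a built-up surface (SWIR2 above NIR, so UI is positive), which is why the vegetation and built-up indices end up negatively correlated with each other.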
Sentinel-1 SAR: The Sentinel-1 mission comprises a constellation of two polar-orbiting satellites performing C-band synthetic aperture radar imaging, which enables them to acquire imagery in day and night conditions regardless of the weather. Sentinel-1 has a 12-day repeat cycle, with a spatial resolution down to 5 m. Similar to [70], we captured the texture of the surface using Sentinel-1's C-band (single co-polarization, vertical transmit and vertical receive (VV), acquired in Interferometric Wide Swath (IW) instrument mode, with a 250 km swath at 5 m by 20 m spatial resolution (single look)). From each scene, we removed speckle noise and performed radiometric calibration and terrain correction. To create the annual composite, we calculated for each location (pixel) the median value of all overlapping pixels in the stack of all scenes captured in 2019. For each grid cell, we calculated the average value of all pixels within the area of the grid cell.
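The Sentinel-1 compositing rule (a per-pixel median over all 2019 scenes, then a per-cell mean) can be sketched in the same way. The toy values stand for already calibrated, speckle-filtered and terrain-corrected VV backscatter in dB (those preprocessing steps are not shown), and `cell_pixels` is a hypothetical list of pixel indices per cell. The example shows why a median is a reasonable choice: one anomalous acquisition does not shift the composite.

```python
from statistics import mean, median


def median_composite(stack):
    """Per-pixel median over a stack of co-registered 2-D rasters."""
    rows, cols = len(stack[0]), len(stack[0][0])
    return [[median(img[r][c] for img in stack) for c in range(cols)]
            for r in range(rows)]


def cell_mean(composite, cell_pixels):
    """Average composite value over the pixels assigned to one grid cell."""
    return mean(composite[r][c] for r, c in cell_pixels)


# Three toy 1 x 2 scenes (dB); -30.0 mimics one anomalous acquisition.
scenes = [[[-12.0, -8.0]], [[-30.0, -7.0]], [[-11.0, -9.0]]]
comp = median_composite(scenes)
print(comp)                               # [[-12.0, -8.0]]
print(cell_mean(comp, [(0, 0), (0, 1)]))  # -10.0
```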
Slope: To capture the topography of the surface, we used the Global SRTM multi-scale Topographic Position Index (mTPI) dataset (available in GEE at a spatial resolution of 270 m), in which a local gradient is calculated for each pixel based on the global SRTM DEM elevation data (30 m resolution). The mTPI distinguishes ridge from valley forms and is calculated by subtracting the mean elevation within a neighborhood from the elevation at each location [76]. For each grid cell, we calculated the average value of all pixels in the grid cell.
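The neighborhood-difference idea behind the TPI can be shown on a toy elevation grid. This is a single-scale sketch under stated assumptions: a square window of `radius` pixels, excluding the center pixel and clipped at the raster edge. The mTPI dataset combines several neighborhood sizes, which this sketch does not reproduce, and `tpi` is a hypothetical name.

```python
def tpi(dem, radius=1):
    """Single-scale Topographic Position Index.

    For each pixel: elevation minus the mean elevation of the surrounding
    (2*radius + 1)^2 window, excluding the center pixel and clipped at the
    raster edges. Positive values suggest ridges, negative values valleys.
    """
    rows, cols = len(dem), len(dem[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            neigh = [dem[rr][cc]
                     for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                     for cc in range(max(0, c - radius), min(cols, c + radius + 1))
                     if (rr, cc) != (r, c)]
            if neigh:  # a 1 x 1 raster has no neighbors; leave 0.0
                out[r][c] = dem[r][c] - sum(neigh) / len(neigh)
    return out


peak = [[10, 10, 10], [10, 20, 10], [10, 10, 10]]
valley = [[10, 10, 10], [10, 0, 10], [10, 10, 10]]
print(tpi(peak)[1][1], tpi(valley)[1][1])  # 10.0 -10.0
```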
Forest Cover: We estimated the extent of forest cover in 2018 based on the Hansen Global Forest Change v1.6 (2000–2018) dataset [77]. First, we defined a pixel as “forest” in the year 2000 if more than 20% of it was covered with forest in 2000. We then recorded the pixels that experienced a major forest cover loss event between 2000 and 2018 and estimated the total area of forest cover in 2018 per grid cell.
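The forest rule can be expressed per pixel. This sketch assumes the Hansen dataset's `treecover2000` layer (percent canopy cover in 2000) and `lossyear` layer (0 for no loss, 1 to 18 for loss detected in 2001 to 2018 in v1.6), a nominal 30 m × 30 m pixel (the true pixel area varies with latitude), and ignores forest gain, which the text does not mention. `forest_area_2018` is a hypothetical name.

```python
PIXEL_AREA_M2 = 30.0 * 30.0  # nominal pixel area (assumption)


def forest_area_2018(treecover2000, lossyear, threshold=20):
    """Forest area (m^2) remaining in 2018 among the pixels of one grid cell.

    A pixel is forest in 2000 if its canopy cover exceeds `threshold` percent;
    it is dropped if a loss event was recorded between 2001 and 2018.
    """
    n = sum(1 for tc, ly in zip(treecover2000, lossyear)
            if tc > threshold and not 1 <= ly <= 18)
    return n * PIXEL_AREA_M2


# Four toy pixels: forest, too sparse, forest lost in 2007, forest.
print(forest_area_2018([80, 15, 50, 95], [0, 0, 7, 0]))  # 1800.0
```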
Urban Footprints: We relied on two remote-sensing-derived products signifying urban and rural settlements, both produced by the Earth Observation Center at DLR: the Global Urban Footprint (GUF) (at a spatial resolution of ~12 m) and the World Settlement Footprint (WSF) (at a spatial resolution of ~10 m) [78–80].
OSM Transportation Network Features: We calculated the total length of OSM roads in a cell and the total number of junctions in a cell as additional potential predictors of OSM building footprints.
Table 1. The predictors used to predict the per-cell area of OpenStreetMap (OSM) building footprints.
Predictor | Source | Number of Scenes | Per-Cell Statistics
Nighttime lights | VIIRS | 7 | Sum of Light (SOL): $SOL = \sum_i DN_i^{max}$, the sum of the maximum DN values of all pixels in the cell, where $DN_i^{max}$ is the maximum digital number (DN) value of the pixel in location i over 7 monthly composites in 2019
NDVI: $(NIR - RED)/(NIR + RED)$ | Sentinel-2 | ~42 | Sum of NDVI values of all pixels in a grid cell
SAVI: $(NIR - RED)/(NIR + RED + L) \times (1 + L)$ | Sentinel-2 | ~42 | Sum of SAVI values of all pixels in a grid cell
NDBI: $(MIR - NIR)/(MIR + NIR)$ | Sentinel-2 | ~42 | Sum of NDBI values of all pixels in a grid cell
UI: $(SWIR2 - NIR)/(SWIR2 + NIR)$ | Sentinel-2 | ~42 | Sum of UI values of all pixels in a grid cell
Deforestation | Hansen Global Forest Change v1.6 (2000–2018) | 1 | Total forest cover in a grid cell (2018)
Built-up area | GUF | 1 | Total built-up area in a grid cell
Built-up area | WSF | 1 | Total built-up area in a grid cell
Topography (slope) | SRTM | 1 | Average slope per grid cell
Surface texture | Sentinel-1 | ~70 | Average texture per grid cell
Roads | OSM | – | Total length of roads in a grid cell
Road junctions | OSM | – | Number of junctions in a grid cell
2.2.5. Step 5: Identify Mapped Grid Cells
We adopted a visual interpretation method to assess the completeness of OSM building footprints in the grid cells in Haiti and St. Lucia. We achieved this by overlaying the OSM building footprint dataset on the most recent high-resolution basemap imagery (provided by ESRI, updated as of 2019 [81]). We identified grid cells in Haiti and St. Lucia where we assessed that at least 75% of the buildings visible in the satellite image have been mapped (835 grid cells in Haiti and 179 grid cells in St. Lucia). Because most of Dominica has been mapped, we skipped this step for that country.
2.2.6. Step 6: Perform Correlation Analysis and Prediction
We evaluated the correlation between the remotely sensed and geospatial measures (the explanatory variables) and the area of OSM building footprints in a grid cell using a Pearson correlation test, and performed an Ordinary Least Squares (OLS) regression to estimate the potential of the variables, combined, to explain the observed variation in the area of OSM building footprints in a grid cell. Additionally, we evaluated the potential of the explanatory variables to predict the area of OSM building footprints in a grid cell using Random Forest regression. Random Forests [82] are tree-based ensemble models consisting of k decision trees, with p randomly chosen predictors considered at each recursive split. To predict a new example, its variables are run through each of the k trees, and the k predictions are averaged (arithmetic mean). Each tree is trained on a subset of examples from the training set, drawn randomly with replacement, and each node's binary question is determined using a random subset of p input variables. We performed the regression with the 835 grid cells that were visually assessed as relatively fully mapped (i.e., more than 75% of the buildings in a grid cell are assessed as mapped). To evaluate the accuracy of the prediction, we
adopted a five-fold cross-validation method. In each experiment, the examples in one of the data folds were left out for testing, and the examples in the remaining four folds were used to train the model. The performance of the trained model was tested on the examples in the left-out fold, and the overall performance measure was then averaged over the five folds. We assessed the prediction accuracy with different numbers of decision trees (2, 4, 8, 16, 32, 64, 128, 256 and 512), with the minimum size of terminal nodes set to 5.
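To make the Random Forest description concrete (bootstrap samples, a random subset of predictors at each split, a minimum terminal-node size of 5, averaging of the k tree predictions, and five-fold cross-validation), here is a compact pure-Python sketch. It is an illustration under assumptions, not the authors' implementation, which the text does not specify: splits minimize the summed squared error of the two children, the default number of candidate predictors per split is one third of the predictors (a common default for regression forests), and R² is used as the per-fold performance measure. All function names are hypothetical; a production analysis would use an optimized library.

```python
import random
from statistics import fmean


def best_split(X, y, idx, features, min_leaf):
    """Find the (feature, threshold) split with the lowest summed child SSE."""
    best, best_sse = None, None
    n = len(idx)
    for f in features:
        order = sorted(idx, key=lambda i: X[i][f])
        tot = sum(y[i] for i in order)
        tot_sq = sum(y[i] ** 2 for i in order)
        left = left_sq = 0.0
        for k in range(1, n):
            v = y[order[k - 1]]
            left += v
            left_sq += v * v
            a, b = X[order[k - 1]][f], X[order[k]][f]
            if k < min_leaf or n - k < min_leaf or a == b:
                continue  # a child would be too small, or equal values cannot be split
            right, right_sq = tot - left, tot_sq - left_sq
            sse = (left_sq - left ** 2 / k) + (right_sq - right ** 2 / (n - k))
            if best_sse is None or sse < best_sse:
                best, best_sse = (f, (a + b) / 2.0), sse
    return best, best_sse


def build_tree(X, y, idx, mtry, min_leaf, rng):
    """Grow a regression tree; a leaf is the mean response, a node a tuple."""
    n = len(idx)
    tot = sum(y[i] for i in idx)
    parent_sse = sum(y[i] ** 2 for i in idx) - tot ** 2 / n
    if n >= 2 * min_leaf:
        feats = rng.sample(range(len(X[0])), mtry)  # random predictor subset
        split, sse = best_split(X, y, idx, feats, min_leaf)
        if split is not None and sse < parent_sse - 1e-12:
            f, t = split
            lo = [i for i in idx if X[i][f] <= t]
            hi = [i for i in idx if X[i][f] > t]
            return (f, t, build_tree(X, y, lo, mtry, min_leaf, rng),
                    build_tree(X, y, hi, mtry, min_leaf, rng))
    return tot / n


def predict_tree(node, x):
    while isinstance(node, tuple):
        f, t, lo, hi = node
        node = lo if x[f] <= t else hi
    return node


def fit_forest(X, y, n_trees=64, mtry=None, min_leaf=5, seed=0):
    """Bagged trees: each tree sees a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    mtry = mtry or max(1, len(X[0]) // 3)
    trees = []
    for _ in range(n_trees):
        boot = [rng.randrange(len(y)) for _ in range(len(y))]
        trees.append(build_tree(X, y, boot, mtry, min_leaf, rng))
    return trees


def predict_forest(trees, x):
    return fmean(predict_tree(t, x) for t in trees)  # arithmetic mean of k trees


def cross_validate(X, y, folds=5, seed=0, **forest_kw):
    """Mean R^2 over `folds` folds, each held out once for testing."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    scores = []
    for f in range(folds):
        test = idx[f::folds]
        held = set(test)
        train = [i for i in idx if i not in held]
        trees = fit_forest([X[i] for i in train], [y[i] for i in train], **forest_kw)
        obs = [y[i] for i in test]
        pred = [predict_forest(trees, X[i]) for i in test]
        m = fmean(obs)
        ss_tot = sum((o - m) ** 2 for o in obs)
        ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
        scores.append(1.0 - ss_res / ss_tot)
    return fmean(scores)


# Toy data: one predictor with a step response (not data from the study).
X = [[float(i)] for i in range(100)]
y = [10.0 if i >= 50 else 0.0 for i in range(100)]
trees = fit_forest(X, y, n_trees=16)
print(predict_forest(trees, [90.0]), predict_forest(trees, [5.0]))  # 10.0 0.0
print(cross_validate(X, y, n_trees=16) > 0.9)                       # True
```

On the toy step function, each tree places its split between the largest sampled value below 50 and the smallest sampled value at or above 50, so predictions far from the step are exact and held-out errors appear only for points close to it.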
2.2.7. Step 7: Predict the Coverage of OSM Building Footprints in Each Entire Country
We used either the grid cells that were visually assessed as relatively fully mapped (in the case of Haiti and St. Lucia) or all the grid cells (in the case of Dominica) as references for training the Random Forest regression, and predicted the area of OSM building footprints over all the grid cells in each country. We then identified the grid cells that were predicted to contain OSM building footprints but were not yet mapped.
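The final selection of "predicted but not yet mapped" cells can be sketched as a simple filter that also ranks the candidates for mapping prioritization. The cut-off `min_predicted_m2` is a hypothetical parameter (the text does not state how small predicted areas were handled), as is the rule that a cell counts as unmapped only when its observed OSM footprint area is zero; `mapping_gaps` is likewise a hypothetical name.

```python
def mapping_gaps(cells, min_predicted_m2=100.0):
    """IDs of cells predicted to contain buildings but lacking OSM footprints.

    cells: iterable of (cell_id, observed_osm_area_m2, predicted_area_m2).
    Results are sorted by predicted area, largest first, to help prioritize
    mapping effort.
    """
    gaps = [(pred, cid) for cid, obs, pred in cells
            if obs == 0 and pred >= min_predicted_m2]
    return [cid for pred, cid in sorted(gaps, reverse=True)]


cells = [("a", 0.0, 50.0), ("b", 0.0, 4000.0), ("c", 1200.0, 3000.0), ("d", 0.0, 900.0)]
print(mapping_gaps(cells))  # ['b', 'd']
```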
3. Results
An examination of the 136,747 cells spanning Haiti shows that only 25.1% of the cells have at least one mapped building, and only 512 of the 136,747 cells have more than 10% of their area covered with building footprints (Figure 2 shows a histogram of the distribution of OSM building footprint area per cell). On average, there are 27.5 buildings in a cell (SD = 83.4); 1530 of the cells (i.e., only 1.1% of the cells spanning Haiti) contain more than 100 mapped buildings. In comparison, 8.15% and 6.84% of the cells contain built-up land cover according to WSF and GUF, respectively.
Figure 2. The distribution (histogram) of OSM building footprint area (square meters) per grid cell.
As discussed above, a visual examination of the completeness of OSM building footprints over Haiti suggests that large portions of the country remain unmapped (Figure 3a). As an illustration, Figure 3b,c show the coverage of OSM building footprints in the capital of Haiti, Port-au-Prince, and in the adjacent Carrefour commune. While buildings in many areas within these cities have been mapped, large portions are still not fully mapped. We observe that densely mapped zones of Port-au-Prince coexist alongside zones that remain entirely unmapped (Figure 3c), a visual pattern that may result from the episodic engagement of community mapping volunteers and the definition of mapping ‘tasks’ at a neighborhood scale through OSM editing tools. Moreover, significant parts of northern Haiti are not mapped (Figure 4), including, for example, the cities of Gonaïves and Cap-Haïtien.
[Figure 2 plot: number of cells (y-axis, 0 to 12,000) versus OSM area per cell in square meters (x-axis, 200 to 19,400)]
Figure 3. (a) OSM building footprint coverage in Haiti, (b) in the capital of Haiti, Port-au-Prince, and (c) in the adjacent Carrefour commune. Yellow indicates OSM building footprints.
(a) (b)
Figure 4. OSM building footprint coverage in (a) the city of Cap-Haïtien and (b) Gonaïves in northern Haiti.
A Pearson correlation test indicated a significant (p < 0.01) correlation between the total area of OSM building footprints in a grid cell and several of the examined explanatory variables. As expected, there was a positive and significant correlation between the area of OSM building footprints in a grid cell and the total area of built-up land cover according to WSF and GUF (r = 0.73 and 0.71, respectively, p < 0.01), as well as with nighttime lights (VIIRS SOL) (r = 0.63, p < 0.01). We find a significant (p < 0.01) correlation between OSM building footprint area in a grid cell and the four Sentinel-2 spectral indices, indicated by a positive correlation with UI and NDBI (r = 0.59 and r = 0.47, respectively) and a negative correlation with both SAVI and NDVI (r = −0.53).
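The coefficient and its significance test can be computed directly. The sketch below uses the standard sample Pearson formula and the usual t statistic with n − 2 degrees of freedom for testing zero correlation; these are textbook definitions, and the text does not state which software was used. Plugging in r = 0.78 and N = 835 (the WSF value reported for the visually assessed cells) gives t ≈ 36, far beyond the two-sided critical value of roughly 2.58 at p = 0.01.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic (n - 2 degrees of freedom) for testing H0: rho = 0."""
    return r * math.sqrt((n - 2) / (1 - r * r))


print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
print(round(t_statistic(0.78, 835), 1))       # 36.0
```

With samples this large, even modest coefficients are highly significant, so the size of r is more informative than the p-value when comparing predictors.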
We identified 835 grid cells where, according to a visual assessment, at least 75% of the buildings visible in the satellite image are mapped in OSM (Figure 5 shows examples of grid cells where more than 75% of the structures are mapped). The correlation between the area of OSM building footprints in a grid cell and the examined predictors was higher than in the previous experiment, where all the grid cells (i.e., 136,747 grid cells) were considered (for example, r = 0.78 and r = 0.65 with WSF and VIIRS, and r = 0.61 and r = −0.55 with UI and SAVI, respectively) (Table 2). This is likely because large portions of the country are not mapped (i.e., there are grid cells that lack OSM coverage while actually being populated and exhibiting the LULC characteristics of populated areas). As expected, there were also similarities and correlations between some of the explanatory variables.
Figure 6a presents pairwise correlation coefficients between the explanatory variables (variables are ordered according to a hierarchical clustering). The explanatory variables form several similarity clusters: a cluster composed of the vegetation spectral indices (NDVI, SAVI) and forest cover (which are positively correlated with each other), and a cluster composed of the built-up land cover spectral indices (NDBI, UI), together with VIIRS, WSF, GUF, and the OSM road network features. As expected, there is a negative and significant correlation between the vegetation and the built-up land cover spectral indices. The dendrogram shown in the figure further highlights hierarchical clusters formed between the variables, notably OSM area and VIIRS, UI and NDBI, NDVI and SAVI, and road length and number of junctions in a grid cell.
Figure 5. Examples of grid cells in which more than 75% of the buildings are mapped with OSM building footprints.
Table 2. Pearson correlation test between the area of OSM building footprints in a grid cell and the evaluated predictors (this correlation test includes only grid cells that were assessed as mapped, N = 835).
Predictor | r
VIIRS | 0.654*
GUF | 0.76*
WSF | 0.78*
NDVI | −0.551*
NDBI | 0.486*
SAVI | −0.551*
UI | 0.614*
Forest cover | −0.388*
SE1 (Sentinel-1) | 0.16
Slope | −0.11
Road length | 0.69*
OSM junctions | 0.60*
Note: * p < 0.01.
Figure 6. Pairwise correlation coefficients between the explanatory variables in (a) Haiti, (b) St. Lucia (calculated within visually assessed grid cells) and (c) Dominica (calculated within all grid cells). Variables are ordered according to hierarchical clustering, which is also represented by the dendrogram. The blue line in the legend is a histogram of the distribution of the Pearson correlation coefficients.
An Ordinary Least Squares (OLS) regression shows that nine of the variables together explain up to 82% of the variation of OSM building footprint area in a grid cell (R² = 0.82, F(12,822) = 323.20, p < 0.01) (Table 3). We evaluated the contribution of four types (groups) of features to the model fit using a stepwise regression analysis: (1) only GUF and WSF; (2) with the addition of nighttime lights (VIIRS); (3) with the addition of further remotely sensed measures and derived products; (4) with the addition of OSM road network features. The results show an improvement of the model fit with the addition of each group of predictive variables (Table 4). While GUF and WSF together explain 66% of the variation, the addition of nighttime lights improves the fit of the model (explaining up to 76% of the variation). The addition of further remotely sensed measures (i.e., Sentinel-2-derived spectral indices, slope, texture and forest cover) improves the model fit by a further 5 percentage points (up
to 81% of the variation). With the addition of OSM transportation network features, the fit of the model improves marginally to around 82%.
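The fit statistics reported in Tables 3 and 4 follow the standard OLS definitions: $R^2 = 1 - SS_{res}/SS_{tot}$ and adjusted $R^2 = 1 - (1 - R^2)(n - 1)/(n - p - 1)$ for n observations and p predictors. The sketch below fits OLS through the normal equations and reports both. It only illustrates these definitions; it does not reproduce the stepwise selection procedure, and `ols` is a hypothetical name.

```python
def ols(X, y):
    """Fit y = b0 + b1*x1 + ... + bp*xp by solving the normal equations.

    X is a list of rows without the intercept column. Returns the coefficients
    [b0, ..., bp], R^2 and adjusted R^2.
    """
    A = [[1.0] + list(row) for row in X]  # prepend the intercept column
    n, k = len(A), len(A[0])
    # Augmented matrix [A'A | A'y] of the normal equations.
    M = [[sum(A[r][i] * A[r][j] for r in range(n)) for j in range(k)]
         + [sum(A[r][i] * y[r] for r in range(n))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(k):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    beta = [M[i][k] / M[i][i] for i in range(k)]
    fitted = [sum(b * a for b, a in zip(beta, row)) for row in A]
    mean_y = sum(y) / n
    ss_tot = sum((v - mean_y) ** 2 for v in y)
    ss_res = sum((v - f) ** 2 for v, f in zip(y, fitted))
    r2 = 1.0 - ss_res / ss_tot
    p = k - 1  # number of predictors
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    return beta, r2, adj_r2


beta, r2, adj = ols([[0], [1], [2], [3]], [1, 3, 2, 5])
print([round(b, 3) for b in beta], round(r2, 3), round(adj, 3))  # [1.1, 1.1] 0.691 0.537
```

Solving the normal equations directly is fine for a toy example, but it is numerically fragile when predictors are nearly collinear (as NDVI and SAVI are here); statistical packages typically use QR-based least squares instead.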
Table 3. Model fit for the stepwise regression analysis of nine evaluated predictors.
Step  Variable             R²     Adjusted R²  C(p)    AIC      RMSE
1     WSF                  0.614  0.613        984.9   18235.2  13332.1
2     UI                   0.705  0.704        559.9   18013.3  11666.7
3     GUF                  0.764  0.763        282.9   17828.1  10435.7
4     VIIRS                0.799  0.798        120.2   17696.0  9636.3
5     Road length          0.814  0.813        49.5    17631.2  9264.0
6     FC area              0.820  0.819        23.4    17605.8  9118.9
7     NDBI                 0.822  0.821        17.0    17599.5  9078.8
8     Number of junctions  0.823  0.822        13.1    17595.6  9052.3
9     Median slope         0.824  0.822        11.9    17594.3  9040.2
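Table 3 reports a forward stepwise selection, but the exact selection criterion is not stated in the text. The following sketch therefore assumes greedy forward selection by AIC, stopping when no remaining variable lowers it. AIC is written up to an additive constant, so its values will not match the AIC column, and Mallows' C(p) is not computed. Variable names and data are hypothetical.

```python
import numpy as np

def fit_rss(X, y):
    """RSS and number of fitted parameters for OLS with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid), A.shape[1]

def aic(rss, n, k):
    """Gaussian AIC up to an additive constant: n * ln(RSS / n) + 2k."""
    return n * np.log(rss / n) + 2 * k

def forward_select(X, y, names):
    """Greedy forward selection: add the variable that lowers AIC most; stop when none does."""
    n = len(y)
    tss = float(np.sum((y - y.mean()) ** 2))
    chosen, remaining, steps = [], list(range(X.shape[1])), []
    rss0, k0 = fit_rss(X[:, chosen], y)          # intercept-only model
    current = aic(rss0, n, k0)
    while remaining:
        trials = []
        for j in remaining:
            rss, k = fit_rss(X[:, chosen + [j]], y)
            trials.append((aic(rss, n, k), j, rss))
        best_aic, best_j, best_rss = min(trials)
        if best_aic >= current:
            break
        chosen.append(best_j)
        remaining.remove(best_j)
        current = best_aic
        steps.append((names[best_j], 1.0 - best_rss / tss, best_aic))
    return steps

rng = np.random.default_rng(7)
n = 835
X = rng.normal(size=(n, 4))
y = 4.0 * X[:, 0] + 1.5 * X[:, 2] + rng.normal(size=n)
names = ["WSF", "noise_a", "UI", "noise_b"]      # hypothetical labels
for name, r2, a in forward_select(X, y, names):
    print(f"{name:8s} R² = {r2:.3f}  AIC = {a:.1f}")
```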
Table 4. Four regression model outputs showing an improvement of model fit with the inclusion of four groups of variables: (1) only GUF and WSF; (2) addition of nighttime lights (VIIRS); (3) addition of remotely sensed measures and derived products; (4) addition of OSM road network features.
Step                 (1)          (2)            (3)             (4)
GUF                  0.115***     0.124***       0.127***        0.138***
                     (0.010)      (0.009)        (0.008)         (0.008)
WSF                  0.141***     0.081***       0.050***        0.034***
                     (0.010)      (0.009)        (0.009)         (0.009)
VIIRS                             2,214.671***   1,276.821**     1,073.838***
                                  (124.738)      (127.805)       (126.619)
NDBI                                             37.152***       -17.790
                                                 (11.258)        (11.409)
NDVI                                             72,452.840**    64,046.550**
                                                 (30,894.100)    (29,957.530)
SAVI                                             48,312.220**    -42,708.300**
                                                 (20,600.640)    (19,976.140)
UI                                               46.857***       25.415**
                                                 (9.751)         (10.191)
Forest cover                                     0.060***        0.051***
                                                 (0.012)         (0.011)
Slope                                            179.453         274.377
                                                 (192.545)       (187.222)
Sentinel-1                                       -696.229        -295.342
                                                 (453.517)       (442.270)
Road length                                                      1.470***
                                                                 (0.543)
No. of junctions                                                 60.057**
                                                                 (29.311)
Constant             0.716        -699.988       30,909.080***   17,403.480***
                     (672.122)    (573.990)      (3,897.596)     (4,351.436)
Observations         835          835            835             835
R²                   0.663        0.756          0.813           0.825
Adjusted R²          0.662        0.755          0.811           0.823
Residual Std. Error  12,460.39    10,615.940     9,329.420       9,029.806
F Statistic          818.245***   856.591***     357.937***      323.203***
Note: standard errors in parentheses; * p < 0.1; ** p < 0.05; *** p < 0.01
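The F statistics and adjusted R² values in Table 4 follow directly from R², the number of observations n and the number of predictors k, which gives a quick consistency check. The denominator degrees of freedom, n - k - 1 = 835 - 12 - 1 = 822, match the F(12,822) reported in the text. A short sketch, with function names of our own choosing:

```python
def f_statistic(r2, n, k):
    """Overall F statistic for an OLS model with k predictors plus an intercept."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))

def adjusted_r2(r2, n, k):
    """Adjusted R², penalising R² for the number of predictors k."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

# Column (4) of Table 4: n = 835 cells, k = 12 predictors, R² = 0.825.
print(round(f_statistic(0.825, 835, 12), 1))   # 322.9 (table: 323.203)
print(round(adjusted_r2(0.825, 835, 12), 3))   # 0.822 (table: 0.823)
```

Small differences from the tabulated values are expected because the table's R² is rounded to three decimals.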
3.2. Prediction of OSM Building Footprint Coverage
The results above indicate that the area of OSM building footprints in a grid cell can be explained by several of the remotely sensed and geospatial explanatory variables. To evaluate the potential of these variables to predict the area of OSM building footprints in a cell, we performed a regression with Random Forests.
Random Forest regression explains up to 89% of the variation of OSM building footprint area in a grid cell. Performance improves with the addition of decision trees up to 64 trees, for example, from 81% to 89% of the variation (with 2 and 64 decision trees, respectively; Figure 7). Figure 8a presents a comparison between the actual and the predicted area of OSM building footprints in a grid cell (regression with 64 decision trees). The two most important variables in the model are WSF and GUF, followed by OSM road network features and Sentinel-2 derived spectral indices (indicated by variable importance sensitivity (IncNodePurity), Figure 8b).
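The tree-count experiment in Figure 7 and the importance ranking in Figure 8b can be sketched with scikit-learn's `RandomForestRegressor`. This is a stand-in under stated assumptions, not the authors' pipeline: IncNodePurity is the measure reported by R's randomForest package, whereas scikit-learn's `feature_importances_` is the analogous impurity-based measure normalised to sum to 1. The text does not say how R² was evaluated, so a 25% hold-out split is assumed, and the feature names and data are synthetic.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

def r2_by_tree_count(X, y, tree_counts=(2, 4, 8, 16, 32, 64), seed=0):
    """Hold-out R² of a Random Forest for several ensemble sizes."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=seed)
    scores, model = {}, None
    for n_trees in tree_counts:
        model = RandomForestRegressor(n_estimators=n_trees, random_state=seed)
        model.fit(X_tr, y_tr)
        scores[n_trees] = r2_score(y_te, model.predict(X_te))
    return scores, model   # model is the largest forest fitted

rng = np.random.default_rng(0)
names = ["WSF", "GUF", "VIIRS", "NDVI"]   # hypothetical feature set
X = rng.uniform(size=(835, 4))
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(scale=0.2, size=835)
scores, forest = r2_by_tree_count(X, y)
for n_trees, r2 in scores.items():
    print(f"{n_trees:>3} trees: R² = {r2:.3f}")
# Impurity-based importances of the 64-tree forest, largest first.
for name, imp in sorted(zip(names, forest.feature_importances_), key=lambda t: -t[1]):
    print(f"{name}: {imp:.3f}")
```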
Figure 7. The improvement of R² with the increase from 2 to 64 decision trees in the Random Forest model. [Plot: R² (vertical axis, 0.76 to 0.90) against the number of trees (horizontal axis, 2 to 512).]
Figure 8. Comparison between (a) the actual and predicted area of OSM building coverage in each grid cell (using 64 decision trees) and (b) variable importance sensitivity.
We use the Random Forest model to predict the area of OSM building footprints over all the grid cells in Haiti (i.e., we train the model with the 835 grid cells that were assessed as relatively fully mapped and predict the coverage over the entire dataset). Figure 10b shows the predicted area of OSM building footprints per grid cell over Haiti. The results highlight large portions of Haiti that have not yet been mapped (e.g., the northern cities of Gonaives and Cap-Haitien) as well as patches of unmapped grid cells around major cities (e.g., Port-au-Prince). This analysis allows us to identify areas (grid cells) that are predicted to incorporate large areas of building footprints but are not mapped (Figure 9).
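The train-on-mapped, predict-everywhere step above could look like the following sketch, which uses scikit-learn's `RandomForestRegressor` with 64 trees as in the text. The boolean `assessed` mask stands in for the visual assessment, and the `min_predicted` threshold for "large areas of building footprints" is a hypothetical parameter, not a value from the study. The data are synthetic.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def predict_all_cells(features, actual_area, assessed_mask, n_trees=64, seed=0):
    """Fit on cells judged (near-)completely mapped, then predict every cell."""
    model = RandomForestRegressor(n_estimators=n_trees, random_state=seed)
    model.fit(features[assessed_mask], actual_area[assessed_mask])
    return model.predict(features)

def unmapped_hotspots(actual_area, predicted_area, min_predicted):
    """Indices of cells with no mapped buildings but a large predicted area."""
    return np.flatnonzero((actual_area == 0) & (predicted_area >= min_predicted))

rng = np.random.default_rng(3)
n_cells = 2000
features = rng.uniform(size=(n_cells, 3))
true_area = 10_000.0 * features[:, 0] + 500.0 * features[:, 1]   # synthetic m² per cell
assessed = np.zeros(n_cells, dtype=bool)
assessed[:600] = True                         # cells judged (near-)completely mapped
actual = true_area.copy()
gap = (~assessed) & (rng.uniform(size=n_cells) < 0.5)
actual[gap] = 0.0                             # simulate cells with no OSM buildings
pred = predict_all_cells(features, actual, assessed)
hot = unmapped_hotspots(actual, pred, min_predicted=5_000.0)
print(f"{hot.size} unmapped cells predicted to hold at least 5,000 m² of buildings")
```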
A visual examination shows that the predicted coverage of OSM building footprints (Figure 10b) corresponds more closely with the distribution of built-up land cover (according to GUF, for example) and nighttime lights (VIIRS) (Figure 10c,d, respectively) than with the current distribution of OSM building footprints (Figure 10a).
To further evaluate the accuracy of the model, we perform an OLS regression analysis using the entire Haiti dataset (i.e., 136,747 grid cells). We find that the remotely sensed measurements explain up to 89% of the variation of the predicted area of OSM building footprints in all the grid cells spanning Haiti (R² = 0.89, F(12,136,734) = 90,690, p < 0.01). In comparison, these indicators explain only 48% of the variation of the current area of OSM building footprints in the Haiti dataset (R² = 0.48, F(12,136,734) = 10,330, p < 0.01).
Finally, in order to identify grid cells that are predicted to incorporate building footprints but are actually not mapped, we calculated the ratio between the actual and the predicted area of OSM building footprints in a grid cell (calculated as the actual area of OSM building footprints in a grid cell divided by the predicted area) (Figure 11a,c). This analysis allows us to identify grid cells where this ratio is low (Figure 11b,d), highlighting grid cells that require mapping.
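The per-cell ratio described above can be computed as follows. This sketch assumes the ratio is actual ÷ predicted, so that low values flag under-mapped cells as in Figure 11b,d; cells with zero predicted area are assigned +inf so they are never flagged. The cut-off `max_ratio=0.2` is hypothetical, since Figure 11 uses a threshold of 20 without stating its scale. Function names and values are illustrative only.

```python
import numpy as np

def coverage_ratio(actual, predicted):
    """actual / predicted per cell; cells with zero predicted area get +inf."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ratio = np.full(actual.shape, np.inf)
    has_pred = predicted > 0
    ratio[has_pred] = actual[has_pred] / predicted[has_pred]
    return ratio

def cells_to_map(actual, predicted, max_ratio=0.2):
    """Indices of cells whose mapped area is a small fraction of the predicted area."""
    return np.flatnonzero(coverage_ratio(actual, predicted) < max_ratio)

# Hypothetical building-footprint areas (m²) for five cells.
actual = [0.0, 50.0, 100.0, 10.0, 5.0]
predicted = [1000.0, 100.0, 100.0, 0.0, 400.0]
print(coverage_ratio(actual, predicted))
print(cells_to_map(actual, predicted))   # cells 0 and 4
```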
Figure 9. A comparison between (a) existing OSM building footprints and (b) predicted area of OSM building footprints in a grid cell (Carrefour, Haiti). Areas shown in dark purple are predicted to include a large area of building footprints.
Figure 10. (a) The area of OSM building footprints in a grid cell compared to (b) the area predicted by Random Forest regression. The patterns observed in the predicted map align with the observed distribution of built-up land cover according to (c) GUF and (d) the intensity of nighttime lights according to VIIRS.
Figure 11. (a,c) The ratio between the actual and the predicted area of OSM building footprints in a grid cell (calculated as the actual area of OSM building footprints in a grid cell divided by the predicted area); grid cells with the lowest ratio (lower than 20). (b,d) Grid cells shown in purple are grid cells where the actual area of OSM building footprints is much lower than the predicted area.
3.3. Evaluation of the Method in the Case of Dominica and St. Lucia
The results above suggest the potential of several remotely sensed indicators to predict the coverage of OSM building footprints, at least in the case of Haiti. In order to assess the validity of the method, we performed further analysis in the case of two additional small island states: Dominica and St. Lucia. A visual examination suggests that the coverage of OSM building footprints in Dominica is relatively complete, while large portions of St. Lucia remain unmapped. Areas lacking OSM building footprints include parts of the capital, Castries, and the second largest town, Vieux Fort (Figure 12). These two areas account for approximately 49% of the population of St. Lucia (64,654 and 16,624 people, respectively [83]).
(a)VieuxFort(b)Castries
Figure12.ExamplesofareasmissingOSMbuildingfootprintsinSt.Lucia((a)VieuxFortand(b)the
Castries).
Similar to the methodology described in the case of Haiti, we created a fishnet of grid cells, 0.25 km² in size, spanning the two islands. In the case of Dominica, we found a high, positive and significant correlation between the area of OSM building footprints in a grid cell and several explanatory variables. We found a high positive correlation with GUF, number of junctions and WSF (between r = 0.90 and r = 0.91, p < 0.01 for all three). The correlation between the area of OSM building footprints and VIIRS is somewhat lower (r = 0.75, p < 0.01). With Sentinel-2 derived spectral indices, this correlation ranges between r = 0.35 and r = 0.38 (with UI and NDBI, respectively; p < 0.01 for both). An OLS regression analysis reveals that together, these variables explain 92% of the variation of OSM building footprint area in a grid cell (R² = 0.92, F(12,3846) = 3848, p < 0.01). Random Forest regression (with 64 decision trees) yields similar results, with a high accuracy rate of 88% (regression accuracy assessed using five-fold cross-validation).
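A fishnet of 0.25 km² cells corresponds to 500 m × 500 m squares. A minimal sketch in plain Python, assuming coordinates are already in a projected, metre-based coordinate system; clipping the grid to the coastline and overlaying it with the OSM and raster layers are left out, and the function name is hypothetical.

```python
import math

def fishnet(xmin, ymin, xmax, ymax, cell_size=500.0):
    """Square cells (x0, y0, x1, y1) covering a bounding box in projected metres.

    A 500 m x 500 m cell is 0.25 km². The last row and column may extend past
    the box so that the whole extent is covered.
    """
    nx = math.ceil((xmax - xmin) / cell_size)
    ny = math.ceil((ymax - ymin) / cell_size)
    return [
        (xmin + i * cell_size, ymin + j * cell_size,
         xmin + (i + 1) * cell_size, ymin + (j + 1) * cell_size)
        for j in range(ny)
        for i in range(nx)
    ]

cells = fishnet(0.0, 0.0, 2000.0, 1500.0)
print(len(cells))  # 12 cells: 4 columns x 3 rows
```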
In the case of St. Lucia, when the analysis was done with the entire dataset of grid cells over the country (i.e., N = 2781), the correlation between the area of OSM building footprints and GUF and WSF ranges between 0.70 and 0.75 (p < 0.01), and there is a lower correlation between OSM building footprint area and VIIRS (r = 0.58, p < 0.01). Together, the predictors explain only 66% of the variation in OSM building footprints (R² = 0.66, F(12,2783) = 464.6, p < 0.01). We attribute the lower fit of the model to the fact that large portions of the country have not been mapped. Thus, we visually assessed the completeness of OSM building footprints in St. Lucia grid cells and identified 179 grid cells in which we could assess that more than 75% of their area is mapped. With these visually assessed grid cells, the fit of the model improved, and together, the predictors explain 92% of the variation of OSM building footprint area (R² = 0.92, F(12,166) = 166.4, p < 0.01). Random Forest regression (with 64 decision trees) results in a similar accuracy rate (R² = 92%) (Table 5).
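The five-fold cross-validation used for Dominica, combined with the restriction to cells judged more than 75% mapped that was applied in St. Lucia, could be sketched with scikit-learn's `KFold` and `cross_val_score`. Here `mapped_share` is a hypothetical numeric score standing in for the visual assessment, and the data are synthetic.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score

def cv_r2_on_assessed(features, osm_area, mapped_share, min_share=0.75, n_trees=64, seed=0):
    """Five-fold cross-validated R², using only cells judged more than min_share mapped."""
    keep = mapped_share > min_share
    model = RandomForestRegressor(n_estimators=n_trees, random_state=seed)
    folds = KFold(n_splits=5, shuffle=True, random_state=seed)
    scores = cross_val_score(model, features[keep], osm_area[keep], cv=folds, scoring="r2")
    return scores, int(keep.sum())

rng = np.random.default_rng(1)
n = 700
features = rng.uniform(size=(n, 3))
osm_area = 8_000.0 * features[:, 0] + 3_000.0 * features[:, 1] + rng.normal(scale=300.0, size=n)
mapped_share = rng.uniform(size=n)   # hypothetical visual-assessment score per cell
scores, n_kept = cv_r2_on_assessed(features, osm_area, mapped_share)
print(n_kept, scores.round(3), round(scores.mean(), 3))
```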
Finally, we use Random Forest to predict the area of OSM building footprints in each of the countries. Figure 13 presents the predicted area of OSM building footprints in St. Lucia. Unmapped areas include, for example, areas surrounding the capital, Castries. The central part of the city is densely mapped, while its adjacent neighborhoods lack coverage. Large swaths of the surrounding areas of Charlotte, Vigie, and Bisee completely lack OSM building footprints.
Table 5. Pearson Correlation Test, OLS Regression, and Random Forest Regression for Dominica and St. Lucia.
                      Dominica             St. Lucia
                      Full dataset         Full dataset         Visually assessed cells*
                      (N = 3861)           (N = 2781)           (N = 179)
(I) Pearson Correlation Test
GUF                   r = 0.91*            r = 0.75*            r = 0.89*
No. of junctions      r = 0.90*            r = 0.76*            r = 0.88
WSF                   r = 0.90*            r = 0.70*            r = 0.78
Road length           r = 0.81*            r = 0.68*            r = 0.84*
VIIRS                 r = 0.75*            r = 0.58*            r = 0.72*
NDBI                  r = 0.38*            r = 0.30             r = 0.3
UI                    r = 0.35*            r = 0.26             r = 0.35
Sentinel-1            r = 0.03*            r = 0.00             r = 0.20
NDVI                  r = -0.20*           r = -0.19            r = -0.29
SAVI                  r = -0.20*           r = -0.19            r = -0.29
Slope                 r = -0.16            r = -0.14*           r = 0.10*
Forest cover          r = -0.07            r = -0.35            r = -0.43
(II) OLS              R² = 92%             R² = 66%             R² = 92%
                      F(12,3846) = 3848    F(12,2783) = 464.6   F(12,166) = 166.4
                      p = 0.00             p = 0.00             p = 0.00
(III) Random Forest   R² = 88%                                  R² = 94%
Note: * p < 0.01
Figure 13. (a) Existing OSM building footprints versus (b) predicted area of OSM building footprints in a grid cell. Areas shown in dark purple are predicted to include a large area of building footprints.
4. Discussion
In recent decades, natural disasters have been responsible for an estimated 0.1% of global deaths, killing on average 60,000 people per year [84]. In the last two decades alone, developing countries have accounted for more than half of all reported casualties [85]. Natural disasters often cause significant damage to communities, infrastructure and the environment, and require immediate intervention and the implementation of appropriate measures aiming to save lives.
Accurate and easily accessible geospatial information is key for an effective disaster risk management cycle and for informed decision making during humanitarian response [86]. The increasing availability of geospatial information is revolutionizing disaster research and emergency management. Until recently, much of this essential geospatial information was proprietary, scarce and, in many cases, unavailable during significant disasters.
In the last decade, OSM has repeatedly been shown to be highly suitable for disaster mapping and management. Despite the continuous efforts to improve the completeness of the OSM database, large portions of the world remain unmapped, especially in countries that are prone to natural hazards, reflecting limited internet connectivity and speed, limited availability of GPS devices, a lack of technologically skilled volunteers and limited awareness of VGI technologies [18].
Several methods have been proposed to assess the completeness of the OSM database, including evaluation of the completeness of street networks, land use and buildings; many of them rely on external datasets for accuracy and completeness estimation. The increased availability of free and open source remotely sensed data can be utilized to identify mapping gaps in OSM datasets, find locations that require mapping, and help prioritize and plan mapping campaigns and efforts. Although several applications have recently been proposed to predict the coverage of OSM by means of remotely sensed derived products (such as WorldPop) [87], to the best of our knowledge, no study has yet evaluated the potential use of remotely sensed measurements to predict the completeness of OSM features.
In this study, we demonstrate a methodology to identify areas where building footprints have not yet been mapped in the OSM dataset. The methodology relies on remotely sensed measurements and derived products, and on geospatial information related to the road network, to predict the completeness of OSM building footprints in three small island states (Haiti, Dominica and St. Lucia).
In the case of Haiti, the results show that large portions of the country are still unmapped, despite the continued mapping efforts to maintain a full and up-to-date map that keeps pace with the changing socio-physical characteristics of the country and aids response and recovery in future disasters [88]. We find that in the case of the three countries, the coverage of OSM building footprints is significantly correlated with several remotely sensed measures and indicators. As expected, the coverage is positively correlated with the distribution of built-up land cover (indicated by a Pearson correlation coefficient of between r = 0.78 and r = 0.91 in the case of Haiti and Dominica, respectively). To some extent, this is not surprising, and previous studies have already shown the potential of remotely sensed derived products to predict the coverage of OSM building footprints [87]. However, a limiting factor in the utilization of remotely sensed derived products is that their availability varies in space and time and these products are not always updated on a regular basis. In this study, we demonstrate the potential use of free and open source remotely sensed indicators to estimate the coverage of OSM building footprints. We show that nighttime light luminosity (measured by VIIRS) is highly correlated with the area of OSM building footprints (indicated by a Pearson correlation coefficient of between r = 0.65 and r = 0.75 in the case of Haiti and Dominica, respectively). This finding aligns with previous studies showing that the intensity of nighttime lights is closely correlated with anthropogenic activities and with changes in the distribution of built-up land cover [89,90], as well as with the distribution of geographical features, such as the density of the road network [91]. Further, we show that remotely sensed spectral indices signifying the distribution of vegetation (NDVI, SAVI) and built-up land cover (NDBI, UI) are also correlated with the distribution of OSM building footprints. While previous studies propose utilizing vegetation spectral indices (e.g., NDVI) to estimate the completeness of urban green space mapping in OSM [92], we show that these indices are also significantly correlated with the coverage of OSM building footprints.
Because different sensors record distinct characteristics of the land (e.g., brightness, temperature, height, density, texture), data fusion techniques that exploit the best characteristics of each type of sensor have become a valuable procedure in remote sensing analysis, including for urban applications [70,93] and in mapping built-up land cover [69]. In this study, we show that the combined effect of the explanatory variables explains between 92% and 94% of the variation in the area of OSM building footprints, exceeding the effect of each predictor by itself. We show that in the case of Haiti, the addition of nighttime lights to built-up land cover classification products (GUF, WSF) improves the fit of the model by 10 percentage points, and that with the addition of remotely sensed measures, the fit of the model improves by a further 5 percentage points. Although land cover characteristics are often related to nighttime light luminosity (for example, vegetation density decreases while luminosity increases from the rural area to the urban core), fusing nighttime and daytime remotely sensed measures allows for an increase in the separability between urban and non-urban land [94]. Fusing daytime and nighttime measurements enables feature complementation and compensates for the limitations of single data sources in extracting urban information [95].
We also perform a Random Forest regression to predict the area of OSM building footprints in a cell and find that the regression explains between 88% and 94% of the variation (in Dominica and St. Lucia, respectively). Previous studies suggest that the number of decision trees in a Random Forest is generally proportional to the model's accuracy [96], although they show mixed results for the optimal number of trees in the forest, which varies between 10 [97] and 150 trees [98]. Here, we find that with Random Forests, accuracy improves up to 64 decision trees and then moderately decreases as the number of trees increases.
Finally, we demonstrate the potential of our approach for identifying mapping gaps, or areas that lack OSM coverage. We do this by training the model with the grid cells that are visually assessed as relatively fully mapped and using the trained model to predict the coverage of OSM building footprints throughout the countries. We capture grid cells that lack OSM building footprint coverage (i.e., grid cells where the predicted coverage of OSM building footprints greatly exceeds the actual coverage). These grid cells represent locations where mapping efforts could potentially be targeted.
To summarize, VGI contributions, especially in developing countries, tend to be made in spurts, for example in response to a specific trigger, such as a natural disaster or humanitarian crisis, rather than as a regular, continuous process [13]. Because OSM datasets rely on volunteers, completeness and mapping efforts vary in space and time. There is a need for a systematic tool that would guide and prioritize mapping efforts and mapping campaigns. As more and more remotely sensed data become available to the research community, our study extends the existing literature by demonstrating how these data could be leveraged to conduct novel and critical large-scale assessments of the completeness of the OSM database, especially in areas at high risk of natural disasters.
We note a few limitations of this study. First, we demonstrate our methodology in the case study of three small island states, which, at least to some extent, are characterized by relatively similar geographical conditions and characteristics. An extension of this study would evaluate our methodology in additional countries characterized by diverse geographical conditions (topography, land cover, land use, etc.). Second, in order to create the training examples for the model and to evaluate the relation between the explanatory variables and the area of OSM building footprints, we created a relatively small dataset of examples (grid cells), which were assessed as relatively fully mapped using a subjective visual interpretation method. By its nature, visual interpretation may be subject to idiosyncratic variation across the individuals performing the manual classification. An extension to this study would leverage the crowd to create an extensive dataset of visually assessed and mapped grid cells and would account for agreement between interpreters. Third, the analysis presented in this study was done at a single point in time. By its nature, the OSM database is dynamic and continuously updated, while new remotely sensed measurements are simultaneously being collected. An extension to this study would account for these changes over time and evaluate the completeness of OSM building footprints on an ongoing basis as new data become available.
Extensions to our approach may improve the identification of areas that lack complete OSM coverage by accounting for additional inputs, for example, socioeconomic variables (including WorldPop and Facebook’s High Resolution Settlement Layer (HRSL)) and additional physical/geographical characteristics and spectral indices. Further extensions to our approach may also include the application of other learning algorithms and evaluation with various tuning parameters of the classifiers and the fitted models.
5. Conclusions
Globally, there has been an increase in the frequency and impacts of major natural disaster events; in the next century, climate change is likely to amplify the number and severity of such disasters. While accurate and timely geospatial information is vital for the full cycle of disaster risk management, these data are not always available to the disaster management community when disasters occur. Although VGI platforms, specifically OpenStreetMap (OSM), show great potential to support humanitarian mapping tasks, gaps in VGI data remain a major concern [99]. There is an increasing need for a fully automatic tool that would allow the identification of areas that lack complete mapping of OSM features, especially in areas prone to hazard events. While previous studies have utilized OSM data as a reference for the classification of built-up land cover with satellite imagery [100–103], here we show the potential use of publicly available, remotely sensed data as predictors of the spatial coverage of OSM building footprints. The tool and methodology we present here are time efficient and scalable.
An extension to our approach may improve the accuracy of the prediction of OSM building footprint area by incorporating further remotely sensed measures. Incorporating additional datasets, such as newly developed VIIRS nighttime light products,