Available via license: CC BY
Content may be subject to copyright.
Remote Sens. 2020, 12, 118; doi:10.3390/rs12010118 www.mdpi.com/journal/remotesensing
Article
Assessing OpenStreetMap Completeness for Management of Natural Disaster by Means of Remote Sensing: A Case Study of Three Small Island States (Haiti, Dominica and St. Lucia)
Ran Goldblatt 1,*, Nicholas Jones 2 and Jenny Mannix 3
1 New Light Technologies Inc., Washington, DC 20005, USA
2 World Bank Group, Washington, DC 20433, USA; njones@worldbankgroup.org
3 New Light Technologies Inc., Washington, DC 20005, USA; jennifer.mannix@nltgis.com
* Correspondence: ran.goldblatt@nltgis.com; Tel.: +1-202-630-0362
Received: 26 November 2019; Accepted: 25 December 2019; Published: 1 January 2020
Abstract: Over the last few decades, many countries, especially islands in the Caribbean, have been challenged by the devastating consequences of natural disasters, which pose a significant threat to human health and safety. Timely information on the distribution of vulnerable populations and critical infrastructure is key for effective disaster relief. OpenStreetMap (OSM) has repeatedly been shown to be highly suitable for disaster mapping and management. However, large portions of the world, including countries exposed to natural disasters, remain incompletely mapped. In this study, we propose a methodology that relies on remotely sensed measurements (e.g., Visible Infrared Imaging Radiometer Suite (VIIRS), Sentinel-2 and Sentinel-1) and derived classification schemes (e.g., forest and built-up land cover) to predict the completeness of OSM building footprints in three small island states (Haiti, Dominica and St. Lucia). We find that the combined effects of these predictors explain up to 94% of the variation in the completeness of OSM building footprints. Our study extends the existing literature by demonstrating how remotely sensed measurements could be leveraged to evaluate the completeness of the OSM database, especially in countries with a high risk of natural disasters. Identifying areas that lack coverage of OSM features could help prioritize mapping efforts, especially in areas vulnerable to natural hazards and where current data gaps pose an obstacle to timely and evidence-based disaster risk management.
Keywords: OpenStreetMap; OSM; OpenStreetMap coverage; disaster management; remote sensing
1. Introduction
Over the last few decades, many countries have been challenged by the devastating consequences of natural disasters, which pose a significant threat to human health and safety and affect vulnerable communities and critical infrastructure globally. Every year, natural disasters affect close to 160 million people worldwide [1], destroying physical, biological and social environments, undermining food security, and causing global losses that amount to over 100 billion dollars [2]. The frequency of natural disasters has been steadily increasing since 1940 [3], and over the next century, climate change will likely amplify the number and severity of such disasters [4].
While the impacts of natural disasters are worldwide, some countries have been more vulnerable to different types of disasters than others [5]. For example, in 2017, Puerto Rico, Sri Lanka and Dominica were at the top of the list of countries most affected by natural disasters such as heavy precipitation, floods and landslides. Caribbean island countries are especially exposed to a wide range of natural disasters [6], and small island developing states, which are frequently characterized by coastal communities, geographic isolation, and limited technical capacity, are among the countries most vulnerable to natural disasters and climate change [7].
Recognizing these trends, there is an increasing need for efficient and well-planned disaster management and disaster relief operations. The term disaster risk management refers to the full life cycle of actions aimed at preventing, preparing for, responding to, and recovering from disasters. Generally, disaster risk management consists of four main phases: (1) Mitigation, i.e., activities that reduce the likelihood and expected adverse impacts of a natural disaster event; (2) Preparedness, i.e., plans or preparations to strengthen emergency response capabilities; (3) Response, i.e., actions taken to save lives and prevent property damage in an emergency situation; and (4) Recovery, i.e., interventions aimed at returning communities and infrastructure to a proper level of functionality following a disaster.
Timely geospatial information indicating the distribution of vulnerable populations and the location, availability and functionality of critical infrastructure is key for effective disaster relief operations. Until recently, governmental agencies and the commercial sector were the primary sources of geospatial data for disaster management. In the past decade, however, the public has been increasingly recognized as a valuable source of geospatial information for disaster management [8]. Recent developments in web mapping technologies have led to disaster management operations that are more dynamic, transparent, and decentralized, with an increased contribution by individuals and organizations from both inside and outside the impacted area [9], including through geospatial information contributed by volunteers. The term Volunteered Geographic Information (VGI) refers to geographic information collected by individuals, often on a voluntary basis [8]. These data are made open and freely accessible [10] and fill the deficiencies of traditional mapping technologies and data sources [11,12]. Governments in developing countries increasingly recognize the economic and social value of VGI and its potential to provide new ways to interact with the community and to strengthen civil society [13].
1.1. OpenStreetMap (OSM) for Disaster Management
Created in 2004, OpenStreetMap (OSM) is a collaborative, user-generated mapping project aiming to provide a freely available geographical information database of the world [14]. During the first year of the project, most mapping efforts focused on road and transportation networks. Today, however, a variety of geographical features are constantly added to OSM's database, including buildings and their functionality, land use and public transportation information [15]. These data allow local governments and communities to better perform risk assessment and emergency planning [16–18] and are routinely used for various disaster risk management applications [19,20]. As of today, there are more than 5.5 million OSM users and one million contributors, who generate more than 3 million changes every day (https://wiki.openstreetmap.org/wiki/Stats). In the context of natural disasters, volunteers' mapping efforts are coordinated by the Humanitarian OpenStreetMap Team (HOT), originally formed after the Haiti earthquake, which conducts activities aimed at enriching OSM data to support emergency relief operations. Often, when disasters occur, this essential information is lacking, which leads to mapping campaigns, including Mapathons [21], designed to map the impacted areas.
OSM data are collected by three main means [22]: (1) using GPS records, which can be uploaded to the database; (2) relying on orthophotos and high-resolution satellite imagery to trace and digitize features; or (3) importing datasets from external sources such as administrative census data. Recently, large corporations, including Apple, Microsoft, and Facebook, have been hiring editors to contribute to the OSM database [23]. Several tools have also been developed to support OSM's mapping efforts; one of them is the MapSwipe app (https://mapswipe.org/), which enables volunteers to map and tag geographical features on mobile phones based on satellite imagery [24]. Other initiatives, such as Missing Maps (https://www.missingmaps.org/), allow volunteers to trace features based on satellite images by splitting mapping into small tasks, so that remote volunteers can work simultaneously on the same overall area (as of 2018, there were nearly 60,000 mappers contributing to Missing Maps) [25].
1.2. Assessing OSM Completeness and Accuracy
Although OSM road network data are estimated to exceed 80% completeness relative to the world's roads and streets [26], in general, the coverage and completeness of OSM features (including building footprints) vary significantly, not only between countries but also within countries. For example, the coverage of remote and rural areas is often lower than that of densely populated urban areas [27], and the coverage of developing countries tends to be lower than that of developed countries [28–32]. These differences are partly due to societal factors, such as population distribution and population density, distance to major cities and the location of contributing users [29,33–37].
With the increased use of VGI, including OSM, for disaster preparedness and response, various methodologies have been proposed to assess the quality and accuracy of the collected data [12], for example, in terms of data completeness, logical consistency, positional, thematic, semantic and spatial accuracy, temporal quality and usability [28,38–42]. Several approaches have been proposed to assess the completeness of the OSM database, including the completeness of street networks [32,43], land use and building footprints [44]. The completeness of the coverage can be assessed by comparing OSM-mapped features with external datasets, for example, national administrative data [28,44–47]. Such data vary by country and are not always made available, especially in developing countries.
In this study, we propose a methodology that uses remotely sensed observations to estimate the coverage of OSM-mapped features, specifically to identify gaps in the completeness of OSM building footprints. In the past, expensive satellite imagery and limited computational power only allowed the analysis of small geographical contexts. This paradigm is being replaced thanks to the accessibility of publicly available, free satellite data that capture every location on Earth every few days. The availability of daytime (e.g., Sentinel-2, Landsat) and nighttime (e.g., the DMSP Operational Linescan System (OLS) or the Visible Infrared Imaging Radiometer Suite (VIIRS)) satellite imagery, together with advances in the capabilities of cloud-based computational platforms, now allows the Land Use and Land Cover (LULC) characteristics of Earth to be analyzed across greater geographic and temporal scales. Land cover refers to the attributes of the Earth's land surface and its immediate subsurface (e.g., biota, soil, topography, surface, groundwater and human structures). Land use refers to the purpose for which humans exploit the land cover [48]. Because remotely sensed observations typically capture the unique reflectance characteristics of physical objects on Earth, most remote sensing applications focus on the detection and classification of Earth's land cover characteristics. Differentiating between types of land use (which typically do not have unique physical characteristics) remains challenging. In OSM, mapped features can be tagged according to both their land use and their land cover. Although OSM contributors are free to use their own tags, there is a quasi-official collection of tags that has been established and agreed upon (for example, the "landuse" and "landcover" keys or other, more specific keys such as "building" or "highway") [49]. Previous studies have demonstrated the potential use of these tags to create detailed LULC maps [50].
The methodology we propose in this study relies on remotely sensed measurements to estimate the coverage of OSM building footprints and to identify "mapping gaps" (i.e., areas that have not yet been mapped). Previous studies have used OSM data for different remote sensing applications, for example, for the classification of urban areas [51] or for the semantic labeling of aerial and satellite images [52]. Despite significant progress in the field of machine learning and the increasing availability of satellite imagery, there is still a scarcity of studies that use remotely sensed observations to estimate the completeness of OSM building footprints at a given point in time. Identifying areas that lack coverage of OSM features could help plan and prioritize mapping efforts, especially in areas that are vulnerable to natural hazards and where current data gaps pose an obstacle to timely and evidence-based disaster risk management actions.
By its nature, the OSM database is dynamic and is updated daily with thousands of new entries. However, as discussed above, the frequency and extent of updates vary greatly by geographical area. Some regions are updated more frequently than others, and developing countries, which are often the most vulnerable to the impacts of natural disasters, are especially likely to be incompletely mapped. The objective of this study is to propose a methodology to estimate the completeness of OSM building footprints based on remotely sensed measurements that are available at a global scale and are updated frequently. We demonstrate our methodology through a case study of three small island states: Haiti, Dominica and St. Lucia.
The remainder of this article is organized as follows. In Section 2, we discuss the methodology, the study areas and the data we use to predict the coverage of OSM building footprints. In Section 3, we present and evaluate the results for Haiti, and in Section 3.3, we illustrate the applicability of our approach to Dominica and St. Lucia. In Section 4, we offer a concluding discussion.
2. Materials and Methods
2.1. Study Areas
We demonstrate our methodology in the case of three small island states: Haiti, Dominica and St. Lucia (Figure 1).
2.1.1. Haiti
Located on the western side of the island of Hispaniola, Haiti (27,750 km² in size, with a population of approximately 11.5 million) is the poorest country in the Western Hemisphere, with a Gross Domestic Product (GDP) per capita of US$870 [53]. Haiti is highly vulnerable to natural disasters; more than 96% of its population is exposed to different types of natural hazards, particularly hurricanes, coastal and riverine floods, and earthquakes [53]. More than half of the population lives in cities and towns, a major shift from the 1950s, when approximately 90% of Haitians lived in the countryside [54]. Almost all of Haiti's 30 major watersheds experience significant flood events due to intense seasonal rainfall, storm surge in the coastal zones, deforestation and erosion, and sediment-laden river channels [55]. Furthermore, large portions of the country's population (e.g., in the capital, Port-au-Prince) live in shantytowns built on steep and exposed hillsides [56]. In 2018 alone, some 2.8 million people were considered to be in need of humanitarian assistance, valued at US$252.2 million [57].
2.1.2. St. Lucia
A small Windward Island state located between the Caribbean Sea and the North Atlantic Ocean, St. Lucia (616 km² in size) has a population of approximately 165,000 [58] and a GDP per capita of US$10,315 [59]. St. Lucia is susceptible to numerous natural hazards, including hurricanes, landslides, flooding, and volcanic eruptions. Owing to its volcanic origins, its terrain consists mainly of mountains and steep slopes in the center of the country, with low-lying areas along the coasts [60]. As of 2018, approximately 19% of the population resided in these low-lying areas [61]. In addition, St. Lucia's economy is highly dependent on two sources: the export of bananas and income from tourism. Both have recently been negatively affected by natural hazards, for example in 2016, when Hurricane Matthew caused 70% of the island to lose power and damaged 80% of the country's banana plantations [60].
2.1.3. Dominica
Dominica (approximately 74,000 people [62]) is located in the Leeward Islands chain in the Lesser Antilles of the Caribbean Sea, approximately 1,200 km southeast of Haiti, with large portions of its population residing in the capital, Roseau (population 14,700), and in Portsmouth (population 5,200) [62]. Dominica is vulnerable to a wide range of natural hazards, including hurricanes, intense rainfall, slope instability, volcanic eruptions, seismic activity, and tsunamis [63]. Reflecting its rugged physical topography, most of the population and infrastructure are located on the coast, making the country particularly vulnerable to strong winds and high seas [64]. In September 2017, Category 5 Hurricane Maria hit the country, causing losses and damages worth 226 percent of GDP [65].
Figure 1. Locations and size comparisons of the three study areas: Haiti, Dominica, and St. Lucia.
2.2. Analytical Framework
The objective of this study is to identify gaps in the completeness of OSM building footprints in three small island states (Haiti, St. Lucia and Dominica) based on remotely sensed measurements and other geospatial features. The procedure involves seven steps.
2.2.1. Step 1: Construct an Artificial Tessellation
We constructed an artificial tessellated grid of cells spanning each of the countries; each cell is 0.25 km² in size (a total of 136,747 grid cells over Haiti, 2,796 grid cells over St. Lucia and 3,861 grid cells over Dominica). Each grid cell was treated as an independent unit of analysis.
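The tessellation step can be illustrated with a minimal sketch. It assumes square cells of 500 m × 500 m (which gives the stated 0.25 km²) laid out in a projected coordinate system measured in metres; the cell shape, the row-major indexing and the function names `grid_shape` and `cell_id` are illustrative assumptions, not the authors' implementation.

```python
import math

# Assumption: square cells of 500 m x 500 m (= 0.25 km^2) on a projected
# coordinate system measured in metres.
CELL_SIZE_M = 500.0


def grid_shape(xmin, ymin, xmax, ymax, size=CELL_SIZE_M):
    """Rows and columns needed to cover a projected bounding box."""
    cols = math.ceil((xmax - xmin) / size)
    rows = math.ceil((ymax - ymin) / size)
    return rows, cols


def cell_id(x, y, xmin, ymin, cols, size=CELL_SIZE_M):
    """Row-major index of the cell that contains the point (x, y)."""
    col = int((x - xmin) // size)
    row = int((y - ymin) // size)
    return row * cols + col


if __name__ == "__main__":
    rows, cols = grid_shape(0, 0, 2000, 1000)
    print(rows, cols, rows * cols)           # 2 4 8
    print(cell_id(1250, 600, 0, 0, cols))    # row 1, col 2 -> 6
```

A grid built over a country's bounding box also covers sea; cells entirely offshore could then be discarded, although the text does not say how the study handled them.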
2.2.2. Step 2: Download the Current OSM Building Footprints
We downloaded the most up-to-date OSM data for the three countries (data downloaded in July 2019). For Haiti, we downloaded the data (in Shapefile format) from Geofabrik (https://www.geofabrik.de/data/download.html). At the time of the analysis, Geofabrik did not have data for Dominica and St. Lucia; thus, we downloaded the data for these countries from Overpass Turbo (https://overpass-turbo.eu/) in KML format (these data require additional pre-processing, and we selected OSM features that are labeled as "building=Yes"). At the time of the analysis, there were 930,000 mapped buildings in Haiti, 38,619 mapped buildings in Dominica and 29,412 mapped buildings in St. Lucia.
2.2.3. Step 3: Calculate Total Area of OSM Building Footprints in a Grid Cell
We calculated the total area of OSM building footprints in each grid cell. This is the value to be predicted by the explanatory variables (the remotely sensed and geospatial measurements).
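The per-cell target value can be sketched in plain Python. This is a minimal illustration, assuming footprints are given as rings of projected (x, y) vertices in metres: each polygon's area comes from the shoelace formula, and each building is assigned whole to the cell containing the mean of its vertices. That assignment rule is a simplification assumed here (the text does not say whether footprints straddling a cell edge were split), and `cell_of` is a hypothetical lookup function.

```python
from collections import defaultdict


def polygon_area(ring):
    """Planar area of a simple polygon using the shoelace formula.

    `ring` is a list of (x, y) vertices in projected metres; it may be open
    or closed (first vertex repeated at the end).
    """
    if ring[0] == ring[-1]:
        ring = ring[:-1]
    n = len(ring)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def footprint_area_per_cell(footprints, cell_of):
    """Total footprint area per cell, assigning each building to the cell
    that contains the mean of its vertices (footprints that straddle a
    cell boundary are not split in this simplified version)."""
    totals = defaultdict(float)
    for ring in footprints:
        pts = ring[:-1] if ring[0] == ring[-1] else ring
        cx = sum(x for x, _ in pts) / len(pts)
        cy = sum(y for _, y in pts) / len(pts)
        totals[cell_of(cx, cy)] += polygon_area(ring)
    return dict(totals)


if __name__ == "__main__":
    def cell_of(x, y):
        return (int(x // 500), int(y // 500))  # hypothetical 500 m grid

    buildings = [
        [(0, 0), (10, 0), (10, 10), (0, 10)],                # 100 m^2
        [(600, 0), (620, 0), (620, 5), (600, 5), (600, 0)],  # 100 m^2, closed
    ]
    print(footprint_area_per_cell(buildings, cell_of))
    # {(0, 0): 100.0, (1, 0): 100.0}
```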
2.2.4. Step 4: Preprocess and Aggregate the Remotely Sensed and Geospatial Data
We relied on several predictors (explanatory variables) to estimate the coverage of OSM building footprints in a grid cell and to identify gaps in OSM coverage. We preprocessed the data and aggregated them to the grid-cell level (Table 1 describes the evaluated explanatory variables and the aggregation measures). The preprocessing, analysis and aggregation of the remotely sensed data were completed using Google Earth Engine (GEE). GEE is a platform that leverages cloud-computing services to achieve planetary-scale utility and has previously been used for a wide range of applications [66], including mapping population [67,68] and urban areas [69,70].
Nighttime Lights (VIIRS): The Visible Infrared Imaging Radiometer Suite (VIIRS) is one of the key instruments on board the Suomi National Polar-orbiting Partnership (Suomi NPP) spacecraft (launched in 2011). The VIIRS instrument collects visible and infrared imagery and global observations of land, atmosphere, cryosphere and oceans. It offers significant improvements over the capabilities of the former DMSP-OLS [71], notably its daily availability and higher spatial resolution (up to 500 m at the equator). The VIIRS Day/Night Band (DNB) provides global coverage with a 12-hour revisit time. First, we recorded for each pixel the maximum value of all overlapping pixels (in the same location) in a stack of seven monthly composites (January–July) of 2019. Then, for each grid cell, we calculated a Sum of Light (SOL) measure (the sum of the digital number values of all overlapping pixels in the cell).
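The two-step SOL computation (a per-pixel maximum over the monthly composites, then a per-cell sum) can be shown on toy grids. In Earth Engine this would typically be expressed with a collection-level max reduction followed by a per-cell sum reduction; the plain-Python sketch below only illustrates the arithmetic, and `pixels_in_cell` is a hypothetical list of (row, column) pairs standing in for the cell-to-pixel overlay.

```python
def max_composite(monthly_stack):
    """Per-pixel maximum over a stack of equally shaped 2-D grids."""
    return [
        [max(pixel_values) for pixel_values in zip(*rows)]
        for rows in zip(*monthly_stack)
    ]


def sum_of_lights(composite, pixels_in_cell):
    """SOL: sum of the max-composite DN values of the pixels in one cell."""
    return sum(composite[r][c] for r, c in pixels_in_cell)


if __name__ == "__main__":
    # Three toy 2 x 2 monthly composites (the study used seven, Jan-Jul 2019).
    jan = [[1, 5], [0, 2]]
    feb = [[3, 4], [0, 7]]
    mar = [[2, 6], [1, 1]]
    comp = max_composite([jan, feb, mar])
    print(comp)                                                    # [[3, 6], [1, 7]]
    print(sum_of_lights(comp, [(0, 0), (0, 1), (1, 0), (1, 1)]))   # 17
```

Taking the maximum before summing makes the measure less sensitive to months in which cloud cover or missing data depress the observed radiance.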
Sentinel-2-Derived Spectral Indices: The Copernicus Sentinel-2 mission comprises a constellation of two polar-orbiting satellites that collect multispectral data in 13 spectral bands, with four bands at a spatial resolution of 10 m, six bands at 20 m and the remaining three at 60 m. The revisit period of Sentinel-2 is 5 days at the equator. We calculated four remotely sensed measures sensitive to vegetation and built-up land cover: the Normalized Difference Vegetation Index (NDVI) [72], the Soil Adjusted Vegetation Index (SAVI) [73], the Normalized Difference Built-up Index (NDBI) [74] and the Urban Index (UI) [75]. For each grid cell, we calculated, for each index, the sum of the values of all pixels overlapping with the grid cell.
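The four indices follow the formulas listed in Table 1. The sketch below assumes surface reflectance on a 0–1 scale and the usual Sentinel-2 band choices (B4 for RED, B8 for NIR, B11 for the "MIR"/SWIR1 band, B12 for SWIR2); the SAVI factor L = 0.5 is a common default assumed here, since the value used in the study is not stated.

```python
def normalized_difference(a, b):
    """(a - b) / (a + b), returning 0.0 where the denominator is zero."""
    total = a + b
    return 0.0 if total == 0 else (a - b) / total


def ndvi(nir, red):
    return normalized_difference(nir, red)


def savi(nir, red, L=0.5):
    # L: soil-brightness correction factor. 0.5 is a common default; the
    # value used in the study is not stated (assumption).
    return (nir - red) / (nir + red + L) * (1 + L)


def ndbi(swir1, nir):
    # "MIR" in Table 1; Sentinel-2 band B11 is the usual choice (assumption).
    return normalized_difference(swir1, nir)


def urban_index(swir2, nir):
    # Sentinel-2 band B12 is the usual SWIR2 band (assumption).
    return normalized_difference(swir2, nir)


def cell_sum(index_fn, pixels):
    """Per-cell statistic from Table 1: sum of an index over all pixels."""
    return sum(index_fn(*bands) for bands in pixels)


if __name__ == "__main__":
    # Surface reflectance on a 0-1 scale (NIR = B8, RED = B4 on Sentinel-2).
    print(round(ndvi(0.5, 0.1), 4))   # 0.6667
    print(round(savi(0.5, 0.1), 4))   # 0.5455
```

Because the per-cell statistic is a sum rather than a mean, cells only partly covered by valid pixels (e.g., coastal cells) receive smaller magnitudes, which is worth keeping in mind when comparing cells.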
Sentinel-1 SAR: The Sentinel-1 mission comprises a constellation of two polar-orbiting satellites performing C-band synthetic aperture radar imaging, which enables them to acquire imagery in day and night conditions regardless of the weather. Sentinel-1 has a 12-day repeat cycle, with a spatial resolution down to 5 m. Similarly to [70], we captured the texture of the surface using Sentinel-1's C-band data (single co-polarization, vertical transmit and vertical receive (VV) acquisition mode, with the Interferometric Wide Swath (IW) instrument mode: a 250 km swath at 5 m by 20 m spatial resolution (single look)). From each scene, we removed speckle noise and performed radiometric calibration and terrain correction. To create the annual composites, we calculated for each location (pixel) the median value of all overlapping pixels in the entire stack of scenes captured in 2019. For each grid cell, we calculated the average value of all pixels within the area of the grid cell.
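The compositing and aggregation steps (per-pixel median over the year, then a per-cell mean) can be sketched as follows. The toy scenes are assumed to be already speckle-filtered, calibrated and terrain-corrected VV backscatter in dB; whether the study averaged in dB or in linear units is not stated, and `pixels_in_cell` is again a hypothetical cell-to-pixel lookup.

```python
from statistics import mean, median


def median_composite(scenes):
    """Per-pixel median over a stack of equally shaped 2-D grids."""
    return [
        [median(pixel_values) for pixel_values in zip(*rows)]
        for rows in zip(*scenes)
    ]


def cell_mean(composite, pixels_in_cell):
    """Average composite value over the pixels that fall in one cell."""
    return mean(composite[r][c] for r, c in pixels_in_cell)


if __name__ == "__main__":
    # Three toy 1 x 2 VV backscatter scenes in dB (preprocessing not shown).
    scenes = [[[-10.0, -12.0]], [[-8.0, -20.0]], [[-9.0, -15.0]]]
    comp = median_composite(scenes)
    print(comp)                                # [[-9.0, -15.0]]
    print(cell_mean(comp, [(0, 0), (0, 1)]))   # -12.0
```

The median is used for compositing because it suppresses outlier acquisitions (for example, residual speckle or a single wet-surface scene) better than the mean.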
Slope: To capture the topography of the surface, we used the Global SRTM mTPI dataset (available in GEE at a spatial resolution of 270 m), in which a local gradient is calculated for each pixel based on the global SRTM DEM elevation data (30 m resolution). The mTPI (multi-scale Topographic Position Index) distinguishes ridge from valley forms and is calculated by subtracting the mean elevation within a neighborhood from the elevation at each location [76]. For each grid cell, we calculated the average value of all pixels in the grid cell.
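The topographic position calculation described above (elevation minus the neighbourhood mean) can be sketched on a toy DEM. This is a simplified illustration only: the square window, the exclusion of the centre pixel and the plain averaging over radii in `multiscale_tpi` are assumptions, since the exact multi-scale recipe of the Global SRTM mTPI product is not given in the text.

```python
def tpi(dem, row, col, radius=1):
    """Topographic Position Index at one pixel: its elevation minus the mean
    elevation of the surrounding square window (centre excluded, window
    clipped at the grid edges). Positive values suggest ridges, negative
    values valleys."""
    neighbours = [
        dem[r][c]
        for r in range(max(0, row - radius), min(len(dem), row + radius + 1))
        for c in range(max(0, col - radius), min(len(dem[0]), col + radius + 1))
        if (r, c) != (row, col)
    ]
    return dem[row][col] - sum(neighbours) / len(neighbours)


def multiscale_tpi(dem, row, col, radii=(1, 2)):
    """Mean TPI over several window radii, a simplified multi-scale variant."""
    return sum(tpi(dem, row, col, r) for r in radii) / len(radii)


if __name__ == "__main__":
    ridge = [[10, 10, 10], [10, 20, 10], [10, 10, 10]]
    valley = [[10, 10, 10], [10, 0, 10], [10, 10, 10]]
    print(tpi(ridge, 1, 1), tpi(valley, 1, 1))   # 10.0 -10.0
```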
Forest Cover: We estimated the extent of forest cover in 2018 based on the Hansen Global Forest Change v1.6 (2000–2018) dataset [77]. First, we defined a pixel as "forest" in the year 2000 if more than 20% of its area was covered with forest in 2000. We then recorded pixels that experienced a major forest cover loss event between 2000 and 2018 and estimated the total area of forest cover in 2018 per grid cell.
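The forest-cover rule can be expressed per pixel. The sketch assumes each pixel is given as a (treecover2000, lossyear) pair, mirroring the Hansen layers in which tree cover is a percentage and the loss year is 0 for no loss or 1–18 for loss in 2001–2018. The constant pixel area of 900 m² (about 30 m × 30 m) is an assumption, since the true pixel area varies with latitude, and forest gain is ignored.

```python
def forest_area_2018(pixels, pixel_area_m2=900.0, threshold=20):
    """Area still forested in 2018 for the pixels of one grid cell.

    Each pixel is a (treecover2000, lossyear) pair: treecover2000 is canopy
    cover in percent, lossyear is 0 for no loss or 1-18 for loss in
    2001-2018. A pixel counts as forest in 2000 if treecover2000 exceeds
    the threshold, and as still forested in 2018 if no loss was recorded.
    Forest gain is ignored.
    """
    forested = sum(
        1 for cover, lossyear in pixels if cover > threshold and lossyear == 0
    )
    return forested * pixel_area_m2


if __name__ == "__main__":
    cell = [(50, 0), (10, 0), (80, 5), (21, 0)]
    print(forest_area_2018(cell))   # 2 pixels x 900 m^2 = 1800.0
```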
Urban Footprints: We relied on two remotely sensed derived products that identify urban and rural settlements, produced by the Earth Observation Center at DLR (German Aerospace Center): the Global Urban Footprint (GUF) (at a spatial resolution of ~12 m) and the World Settlement Footprint (WSF) (at a spatial resolution of ~10 m) [78–80].
OSM Transportation Network Features: We calculated the total length of OSM roads in a cell and the total number of junctions in a cell as additional potential predictors of OSM building footprints.
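Both road features can be derived from OSM ways. The sketch below assumes ways already clipped to a cell, with geometry as projected (x, y) vertices in metres and topology as sequences of node ids; defining a junction as a node shared by two or more distinct ways is an assumption, since the text does not define it.

```python
import math
from collections import Counter


def way_length(coords):
    """Length of a polyline given as projected (x, y) vertices in metres."""
    return sum(math.dist(a, b) for a, b in zip(coords, coords[1:]))


def junction_nodes(ways):
    """Node ids referenced by two or more distinct ways.

    `ways` is a list of node-id sequences, one per OSM way.
    """
    counts = Counter(node for way in ways for node in set(way))
    return {node for node, n in counts.items() if n >= 2}


if __name__ == "__main__":
    print(way_length([(0, 0), (3, 4), (3, 10)]))                    # 11.0
    print(sorted(junction_nodes([[1, 2, 3], [3, 4], [4, 5, 1]])))   # [1, 3, 4]
```

Counting each node at most once per way (via `set(way)`) keeps closed ways, whose first and last node coincide, from being counted as junctions with themselves.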
Table 1. The predictors used to predict the per-cell area of OpenStreetMap (OSM) building footprints.

Predictor | Source | Number of scenes | Per-cell statistics
---|---|---|---
Nighttime lights | VIIRS | 7 | Sum of Light (SOL): $\mathrm{SOL} = \sum_{i} DN_i^{\max}$, summed over all pixels i in the cell, where $DN_i^{\max}$ is the maximum digital number (DN) value of the pixel at location i over 7 monthly composites in 2019
NDVI = (NIR − RED)/(NIR + RED) | Sentinel-2 | ~42 | Sum of NDVI values of all pixels in a grid cell
SAVI = (NIR − RED)/(NIR + RED + L) × (1 + L) | Sentinel-2 | ~42 | Sum of SAVI values of all pixels in a grid cell
NDBI = (MIR − NIR)/(MIR + NIR) | Sentinel-2 | ~42 | Sum of NDBI values of all pixels in a grid cell
UI = (SWIR2 − NIR)/(SWIR2 + NIR) | Sentinel-2 | ~42 | Sum of UI values of all pixels in a grid cell
Deforestation | Hansen Global Forest Change v1.6 (2000–2018) | 1 | Total forest cover in a grid cell (2018)
Built-up area | GUF | 1 | Total built-up area in a grid cell
Built-up area | WSF | 1 | Total built-up area in a grid cell
Topography (slope) | SRTM | 1 | Average slope per grid cell
Surface texture | Sentinel-1 | ~70 | Average texture per grid cell
Roads | OSM | - | Total length of roads in a grid cell
Road junctions | OSM | - | Number of junctions in a grid cell

Note: In SAVI, L is the soil-brightness correction factor.
2.2.5. Step 5: Identify Mapped Grid Cells
We adopted a visual interpretation method to assess the completeness of OSM building footprints in the grid cells in Haiti and St. Lucia. We did this by overlaying the OSM building footprint dataset on the most recent high-resolution basemap image (provided by ESRI, updated as of 2019 [81]). We identified grid cells in Haiti and St. Lucia where we assessed that at least 75% of the buildings visible in the satellite image had been mapped (835 grid cells in Haiti and 179 grid cells in St. Lucia). Because most of Dominica has been mapped, we skipped this step for this country.
2.2.6. Step 6: Perform Correlation Analysis and Prediction
We evaluated the correlation between the remotely sensed and geospatial measures (the explanatory variables) and the area of OSM building footprints in a grid cell using a Pearson correlation test, and performed an Ordinary Least Squares (OLS) regression to estimate the ability of the variables, combined, to explain the observed variation in the area of OSM building footprints in a grid cell. Additionally, we evaluated the ability of the explanatory variables to predict the area of OSM building footprints in a grid cell using Random Forest regression. Random Forests [82] are tree-based ensemble models composed of k decision trees, with p randomly chosen predictors considered at each recursive split. When predicting for an example, its variables are run through each of the k trees, and the k predictions are averaged (arithmetic mean). Each tree is trained on a subset of examples from the training set, drawn randomly with replacement, with each node's binary question determined using a random subset of p input variables. We performed the regression with the 835 grid cells in Haiti that were visually assessed as relatively fully mapped (i.e., more than 75% of the buildings in a grid cell were assessed as mapped). To evaluate the accuracy of the prediction, we adopted a fivefold cross-validation method. In each experiment, the examples in one of the data folds were left out for testing, and the examples in the remaining four folds were used to train the model. The performance of the trained model was tested on the examples in the left-out fold, and the overall performance measure was then averaged over the five folds. We assessed the prediction accuracy with different numbers of decision trees (2, 4, 8, 16, 32, 64, 128, 256 and 512), with the minimum size of terminal nodes set to 5.
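The fivefold cross-validation procedure can be sketched independently of the model. In this minimal sketch, `fit` is any function that takes training rows and targets and returns a predictor; a one-predictor least-squares line (`fit_line`) stands in for the Random Forest purely to keep the example short, so it is not the model used in the study. The shuffling seed is an arbitrary assumption.

```python
import random


def k_fold_indices(n, k=5, seed=0):
    """Shuffle indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def r_squared(y_true, y_pred):
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def cross_validate(X, y, fit, k=5, seed=0):
    """Mean held-out R^2 over k folds; fit(X, y) must return a predictor."""
    scores = []
    for test_idx in k_fold_indices(len(y), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        preds = [model(X[i]) for i in test_idx]
        scores.append(r_squared([y[i] for i in test_idx], preds))
    return sum(scores) / len(scores)


def fit_line(X, y):
    """Stand-in model: one-predictor least-squares line (not a Random Forest)."""
    xs = [row[0] for row in X]
    mx, my = sum(xs) / len(xs), sum(y) / len(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(xs, y)) / sum(
        (a - mx) ** 2 for a in xs
    )
    return lambda row: my + slope * (row[0] - mx)


if __name__ == "__main__":
    X = [[i] for i in range(20)]
    y = [2.0 * i + 1.0 for i in range(20)]
    print(round(cross_validate(X, y, fit_line), 6))   # 1.0 (noise-free line)
```

Each example appears in exactly one test fold, so every grid cell contributes once to the held-out evaluation.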
2.2.7. Step 7: Predict the Coverage of OSM Building Footprints Across Each Country
We used either the grid cells that were visually assessed as relatively fully mapped (for Haiti and St. Lucia) or all the grid cells (for Dominica) as references for training the Random Forest regression, and then predicted the area of OSM building footprints over all grid cells in each country. We then identified the grid cells that were predicted to contain OSM building footprints but were not yet mapped.
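The final selection can be sketched as a comparison between predicted and mapped footprint area per cell. Both thresholds in `mapping_gaps` (a minimum predicted area and a maximum mapped share) are illustrative assumptions, not values from the study; sorting by the estimated shortfall is one possible way to prioritize mapping.

```python
def mapping_gaps(predicted, observed, min_predicted_m2=1000.0, max_mapped_share=0.1):
    """Cells predicted to contain buildings but still (almost) unmapped.

    predicted / observed map a cell id to building footprint area in m^2.
    Both thresholds are illustrative assumptions, not values from the study.
    Returns (cell, estimated unmapped area) pairs, largest shortfall first.
    """
    gaps = []
    for cell, pred in predicted.items():
        obs = observed.get(cell, 0.0)
        if pred >= min_predicted_m2 and obs <= max_mapped_share * pred:
            gaps.append((cell, pred - obs))
    return sorted(gaps, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    predicted = {"a": 5000.0, "b": 4000.0, "c": 500.0, "d": 3000.0}
    observed = {"a": 4500.0, "d": 200.0}
    print(mapping_gaps(predicted, observed))   # [('b', 4000.0), ('d', 2800.0)]
```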
3. Results
An examination of the 136,747 cells spanning Haiti shows that only 25.1% of the cells have at least one mapped building, and only 512 of the 136,747 cells have more than 10% of their area covered by building footprints (Figure 2 shows a histogram of the distribution of OSM building footprint area per cell). On average, there are 27.5 buildings in a cell (SD = 83.4); 1530 of the cells (i.e., only 1.1% of the cells spanning Haiti) contain more than 100 mapped buildings. In comparison, 8.15% and 6.84% of the cells contain built-up land cover according to WSF and GUF, respectively.
Figure 2. The distribution (histogram) of OSM building footprint area (square meters) per grid cell.
As discussed above, a visual examination of the completeness of OSM building footprints over Haiti suggests that large portions of the country remain unmapped (Figure 3a). Figure 3b,c illustrate the coverage of OSM building footprints in the capital, Port-au-Prince, and in the adjacent Carrefour commune, respectively. While buildings in many areas within these cities have been mapped, large portions are still not fully mapped. We observe that densely mapped zones of Port-au-Prince coexist alongside zones that remain entirely unmapped (Figure 3c), a visual pattern that may result from the episodic engagement of community mapping volunteers and the definition of mapping "tasks" at a neighborhood scale through OSM editing tools. Moreover, significant parts of northern Haiti are not mapped (Figure 4), including, for example, the cities of Gonaïves and Cap-Haitien.
[Figure 2 chart: y-axis "Number of cells" (0 to 12,000); x-axis "OSM area per cell (Sqm)" (bins from 200 to 19,400 m²).]
Figure 3. (a) OSM building footprint coverage in Haiti, (b) in the capital of Haiti, Port-au-Prince, and (c) in the adjacent Carrefour commune. Yellow indicates OSM building footprints.
Figure 4. OSM building footprint coverage in (a) the city of Cap-Haitien and (b) Gonaïves in northern Haiti.
A Pearson correlation test indicated a significant (p < 0.01) correlation between the total area of OSM building footprints in a grid cell and several of the examined explanatory variables. As expected, there was a positive and significant correlation between the area of OSM building footprints in a grid cell and the total area of built-up land cover according to WSF and GUF (r = 0.73 and 0.71, respectively, p < 0.01), as well as with nighttime lights (VIIRS SOL) (r = 0.63, p < 0.01). We find a significant (p < 0.01) correlation between OSM building footprint area in a grid cell and the four Sentinel-2 spectral indices, indicated by a positive correlation with UI and NDBI (r = 0.59 and r = 0.47, respectively) and a negative correlation with both SAVI and NDVI (r = −0.53).
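For reference, the Pearson coefficient and the t statistic behind its significance test can be computed directly. This is a stdlib-only sketch; obtaining an exact p-value requires the Student t distribution (for example, `scipy.stats.pearsonr` returns both r and p), which is not reproduced here.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of two sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def t_statistic(r, n):
    """t value for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))


if __name__ == "__main__":
    print(round(pearson_r([1, 2, 3, 4], [2, 4, 5, 9]), 3))   # 0.965
    # With N = 835 cells, r = 0.78 gives t of about 36, far beyond the
    # ~2.58 two-sided critical value for p < 0.01 at this sample size.
    print(round(t_statistic(0.78, 835), 1))                   # 36.0
```

With tens of thousands or hundreds of cells, even modest correlations pass the p < 0.01 threshold, so the magnitude of r is the more informative quantity.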
We identified 835 grid cells where, according to a visual assessment, at least 75% of the buildings that were visible in the satellite image are mapped in OSM (Figure 5 shows examples of grid cells where more than 75% of the structures are mapped). The correlation between the area of OSM building footprints in a grid cell and the examined predictors was higher than in the previous experiment, in which all the grid cells (i.e., 136,747 grid cells) were considered (for example, r = 0.78 and r = 0.65 with WSF and VIIRS, and r = 0.61 and r = −0.55 with UI and SAVI, respectively) (Table 2). This is likely because large portions of the country are not mapped (i.e., there are grid cells that lack OSM coverage while actually being populated and exhibiting the LULC characteristics of populated areas). As expected, there were also similarities and correlations between some of the explanatory variables.
Figure 6a presents pairwise correlation coefficients between the explanatory variables (variables are ordered according to a hierarchical clustering). The explanatory variables form several similarity clusters: a cluster composed of vegetation spectral indices (NDVI, SAVI) and forest cover (which are positively correlated with each other), and a cluster composed of built-up land cover spectral indices (NDBI, UI), together with VIIRS, WSF, GUF, and OSM road network features. As expected, there is a negative and significant correlation between the vegetation and the built-up land cover spectral indices. The dendrogram shown in the figure further highlights hierarchical clusters formed between the variables, notably OSM area and VIIRS, UI and NDBI, NDVI and SAVI, and road length and number of junctions in a grid cell.
Figure 5. Examples of grid cells that have more than 75% of their area mapped with OSM building footprints.
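Table 2. Pearson correlation test between the area of OSM building footprints in a grid cell and the evaluated predictors (this correlation test includes only grid cells that were assessed as mapped, N = 835).

Predictor | VIIRS | GUF | WSF | NDVI | NDBI | SAVI
---|---|---|---|---|---|---
r | 0.654* | 0.76* | 0.78* | −0.551* | 0.486* | −0.551*

Predictor | UI | Forest Cover | SE1 (Sentinel-1) | Slope | Road length | OSM junctions
---|---|---|---|---|---|---
r | 0.614* | −0.388* | 0.16 | −0.11 | 0.69* | 0.60*

Note: * p < 0.01.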
Figure 6. Pairwise correlation coefficients between the explanatory variables in (a) Haiti, (b) St. Lucia (calculated within visually assessed grid cells) and (c) Dominica (calculated within all grid cells). Variables are ordered according to hierarchical clustering, which is also represented by the dendrogram. The blue line in the legend is a histogram of the distribution of the Pearson correlation coefficients.
An Ordinary Least Squares (OLS) regression shows that the explanatory variables together explain up to 82% of the variation in OSM building footprint area in a grid cell (R² = 0.82, F(12, 822) = 323.20, p < 0.01; model (4) in Table 4), and a stepwise selection reaches nearly the same fit with nine of the variables (R² = 0.824; Table 3). We evaluated the contribution of four types (groups) of features to the model fit by adding them in sequence: (1) only GUF and WSF; (2) with the addition of nighttime lights (VIIRS); (3) with the addition of further remotely sensed measures and derived products; and (4) with the addition of OSM road network features. The results show an improvement in the model fit with the addition of each group of predictive variables (Table 4). While GUF and WSF together explain 66% of the variation, the addition of nighttime lights improves the fit of the model (explaining up to 76% of the variation). The addition of further remotely sensed measures (i.e., Sentinel-2-derived spectral indices, slope, texture and forest cover) improves the model fit by a further 5 percentage points (up to 81% of the variation). With the addition of OSM transportation network features, the fit of the model improves marginally to around 82%.
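Table 3. Model fit for the stepwise regression analysis of nine evaluated predictors.

Step | Variable | R² | Adjusted R² | C(p) | AIC | RMSE
---|---|---|---|---|---|---
1 | WSF | 0.614 | 0.613 | 984.9 | 18235.2 | 13332.1
2 | UI | 0.705 | 0.704 | 559.9 | 18013.3 | 11666.7
3 | GUF | 0.764 | 0.763 | 282.9 | 17828.1 | 10435.7
4 | VIIRS | 0.799 | 0.798 | 120.2 | 17696.0 | 9636.3
5 | Road length | 0.814 | 0.813 | 49.5 | 17631.2 | 9264.0
6 | FC area | 0.820 | 0.819 | 23.4 | 17605.8 | 9118.9
7 | NDBI | 0.822 | 0.821 | 17.0 | 17599.5 | 9078.8
8 | Number of junctions | 0.823 | 0.822 | 13.1 | 17595.6 | 9052.3
9 | Median slope | 0.824 | 0.822 | 11.9 | 17594.3 | 9040.2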
Table 4. Four regression model outputs showing an improvement of model fit with the inclusion of four groups of variables: (1) only GUF and WSF; (2) addition of nighttime lights (VIIRS); (3) addition of remotely sensed measures and derived products; (4) addition of OSM road network features.

Variable | (1) | (2) | (3) | (4)
---|---|---|---|---
GUF | 0.115*** (0.010) | 0.124*** (0.009) | 0.127*** (0.008) | 0.138*** (0.008)
WSF | 0.141*** (0.010) | 0.081*** (0.009) | 0.050*** (0.009) | 0.034*** (0.009)
VIIRS | | 2,214.671*** (124.738) | 1,276.821** (127.805) | 1,073.838*** (126.619)
NDBI | | | −37.152*** (11.258) | −17.790 (11.409)
NDVI | | | 72,452.840** (30,894.100) | 64,046.550** (29,957.530)
SAVI | | | −48,312.220** (20,600.640) | −42,708.300** (19,976.140)
UI | | | 46.857*** (9.751) | 25.415** (10.191)
Forest cover | | | 0.060*** (0.012) | 0.051*** (0.011)
Slope | | | 179.453 (192.545) | 274.377 (187.222)
Sentinel-1 | | | −696.229 (453.517) | −295.342 (442.270)
Road length | | | | 1.470*** (0.543)
No. of junctions | | | | 60.057** (29.311)
Constant | 0.716 (672.122) | −699.988 (573.990) | 30,909.080*** (3,897.596) | 17,403.480*** (4,351.436)
Observations | 835 | 835 | 835 | 835
R² | 0.663 | 0.756 | 0.813 | 0.825
Adjusted R² | 0.662 | 0.755 | 0.811 | 0.823
Residual Std. Error | 12,460.39 | 10,615.940 | 9,329.420 | 9,029.806
F Statistic | 818.245*** | 856.591*** | 357.937*** | 323.203***

Note: Standard errors in parentheses; * p < 0.1; ** p < 0.05; *** p < 0.01.
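The summary rows of Table 4 are linked by standard OLS identities, which can serve as a quick consistency check. In the sketch below, k is the number of predictors in each model (2, 3, 10 and 12 for models (1) to (4), matching F(12, 822) for the full model) and n = 835. Because the table reports R² to three decimals, the recomputed statistics agree with the published ones only up to rounding.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n observations and k predictors plus an intercept."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def f_statistic(r2, n, k):
    """Overall OLS F statistic, with (k, n - k - 1) degrees of freedom."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


if __name__ == "__main__":
    # Models (1)-(4) of Table 4: rounded R^2 and number of predictors, N = 835.
    for r2, k in [(0.663, 2), (0.756, 3), (0.813, 10), (0.825, 12)]:
        print(k, round(adjusted_r2(r2, 835, k), 3), round(f_statistic(r2, 835, k), 1))
```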
3.2. Prediction of OSM Building Footprint Coverage
The results above indicate that the area of OSM building footprints in a grid cell can be explained by several of the remotely sensed and geospatial explanatory variables. To evaluate the potential of these variables to predict the area of OSM building footprints in a cell, we performed a regression with Random Forests.
Random Forest regression predicts up to 89% of the variation of OSM building footprints in a grid cell. Performance improves with the addition of decision trees up to 64 trees; for example, the explained variation increases from 81% with 2 decision trees to 89% with 64 decision trees (Figure 7). Figure 8a compares the actual and the predicted area of OSM building footprints in a grid cell (regression with 64 decision trees). The two most important variables to the model are WSF and GUF, followed by OSM road network features and Sentinel-2-derived spectral indices (indicated by variable importance sensitivity (IncNodePurity), Figure 8b).
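The tree-count experiment can be illustrated with a small, dependency-free Python sketch. It is not the authors' code: the IncNodePurity measure suggests an R randomForest workflow, which is not reproduced here, and the synthetic grid-cell data, the helper names (`best_split`, `grow_tree`, `fit_forest`, `r_squared`) and the hyperparameters (`max_features=2`, `min_leaf=5`, `max_depth=8`) are hypothetical. Each tree is grown on a bootstrap sample, each split tries a random subset of predictors and picks the threshold that minimizes the children's summed squared error, and the forest averages the trees' predictions; held-out R² is then compared across forest sizes.

```python
import random

def best_split(X, y, features, min_leaf):
    """Return (sse, feature, threshold) minimising the children's summed squared error."""
    best = None
    n = len(y)
    for f in features:
        pairs = sorted(zip([row[f] for row in X], y))
        total = sum(v for _, v in pairs)
        total_sq = sum(v * v for _, v in pairs)
        s = sq = 0.0
        for i in range(n - 1):
            x_i, v = pairs[i]
            s += v
            sq += v * v
            n_left, n_right = i + 1, n - i - 1
            # Split only between distinct values and keep both children large enough.
            if n_left < min_leaf or n_right < min_leaf or x_i == pairs[i + 1][0]:
                continue
            sse = (sq - s * s / n_left) + ((total_sq - sq) - (total - s) ** 2 / n_right)
            if best is None or sse < best[0]:
                best = (sse, f, (x_i + pairs[i + 1][0]) / 2)
    return best

def grow_tree(X, y, rng, max_features, min_leaf=5, max_depth=8):
    """Leaves are floats (mean target); internal nodes are (feature, threshold, left, right)."""
    if len(y) < 2 * min_leaf or max_depth == 0:
        return sum(y) / len(y)
    features = rng.sample(range(len(X[0])), max_features)  # random predictor subset per split
    split = best_split(X, y, features, min_leaf)
    if split is None:
        return sum(y) / len(y)
    _, f, t = split
    left = [i for i, row in enumerate(X) if row[f] <= t]
    right = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t,
            grow_tree([X[i] for i in left], [y[i] for i in left], rng, max_features, min_leaf, max_depth - 1),
            grow_tree([X[i] for i in right], [y[i] for i in right], rng, max_features, min_leaf, max_depth - 1))

def predict_tree(node, row):
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def fit_forest(X, y, n_trees, max_features, seed=0):
    """Bagging: every tree is grown on a bootstrap sample of the training cells."""
    rng = random.Random(seed)
    n = len(y)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(grow_tree([X[i] for i in idx], [y[i] for i in idx], rng, max_features))
    return forest

def predict_forest(forest, row):
    return sum(predict_tree(tree, row) for tree in forest) / len(forest)

def r_squared(y_true, y_pred):
    mean = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical grid cells: two informative predictors and one pure-noise predictor.
data_rng = random.Random(42)
cells = []
for _ in range(300):
    x = [data_rng.random() for _ in range(3)]
    cells.append((x, 3 * x[0] + 2 * x[1] + data_rng.gauss(0, 0.2)))
X_train, y_train = [c[0] for c in cells[:200]], [c[1] for c in cells[:200]]
X_test, y_test = [c[0] for c in cells[200:]], [c[1] for c in cells[200:]]

results = {}
for n_trees in (2, 8, 32, 64):
    forest = fit_forest(X_train, y_train, n_trees, max_features=2)
    preds = [predict_forest(forest, row) for row in X_test]
    results[n_trees] = r_squared(y_test, preds)
    print(f"{n_trees:>3} trees: held-out R2 = {results[n_trees]:.3f}")
```

With real data, scikit-learn's `RandomForestRegressor` (number of trees set by `n_estimators`) or R's `randomForest` (`ntree`) would replace this toy implementation; the sketch only shows the mechanics behind varying the number of trees.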
Figure 7. The improvement of R² with the increase from 2 to 64 decision trees in the Random Forest model.
[Plot: R² (y-axis, 0.76 to 0.90) versus number of trees (x-axis, 2, 4, 8, 16, 32, 64, 128, 256, 512).]
Figure 8. Comparison between the (a) actual and predicted area of OSM building coverage in each grid cell (using 64 decision trees) and (b) variable importance sensitivity.
We use the Random Forest model to predict the area of OSM building footprints over all the grid cells in Haiti (i.e., we train the model with the 835 grid cells that were assessed as relatively fully mapped and predict the coverage over the entire dataset). Figure 10b shows the predicted area of OSM building footprints per grid cell over Haiti. The results highlight large portions of Haiti that have not yet been mapped (e.g., the northern cities of Gonaives and Cap-Haitien), as well as patches of unmapped grid cells around major cities (e.g., Port-au-Prince). This analysis allows us to identify areas (grid cells) that are predicted to incorporate large areas of building footprints but are not mapped (Figure 9).
A visual examination shows that the predicted coverage of OSM building footprints (Figure 10b) corresponds more closely with the distribution of built-up land cover (according to GUF, for example) and nighttime lights (VIIRS) (Figure 10c,d, respectively) than with the current distribution of OSM building footprints (Figure 10a).
To further evaluate the accuracy of the model, we perform an OLS regression analysis using the entire Haiti dataset (i.e., 136,747 grid cells). We find that the remotely sensed measurements explain up to 89% of the variation of the predicted area of OSM building footprints in all the grid cells spanning Haiti (R² = 0.89, F(12, 136,734) = 90,690, p < 0.01). In comparison, these indicators explain only 48% of the variation of the current area of OSM building footprints in the Haiti dataset (R² = 0.48, F(12, 136,734) = 10,330, p < 0.01).
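The reported F statistics can be cross-checked against the R² values with the standard identity for an OLS model with k predictors, an intercept and n observations: F = (R²/k) / ((1 − R²)/(n − k − 1)), or equivalently R² = kF / (kF + n − k − 1). The helper names below are ours; the inputs are the Haiti figures quoted above (k = 12, n = 136,747).

```python
def f_from_r2(r2, k, n):
    """Overall F statistic of an OLS fit with k predictors, an intercept and n observations."""
    df_resid = n - k - 1
    return (r2 / k) / ((1 - r2) / df_resid)

def r2_from_f(f, k, df_resid):
    """Invert the F identity: R^2 = k*F / (k*F + df_resid)."""
    return k * f / (k * f + df_resid)

k, n = 12, 136_747
df_resid = n - k - 1  # residual degrees of freedom, as in F(12, 136,734)

r2_predicted = r2_from_f(90_690, k, df_resid)  # OLS on the predicted area
r2_current = r2_from_f(10_330, k, df_resid)    # OLS on the current (mapped) area
print(df_resid, round(r2_predicted, 3), round(r2_current, 3))

# Feeding the rounded R^2 back in gives an F of the same order as the reported value
print(round(f_from_r2(0.89, k, n)))
```

Because R² is reported to two decimals, the back-computed F differs from the reported 90,690 by a few percent; inverting the identity instead shows that the reported F values correspond to R² values that round to 0.89 and 0.48.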
Finally, in order to identify grid cells that are predicted to incorporate building footprints but are actually not mapped, we calculated the ratio between the actual and the predicted area of OSM building footprints in a grid cell (calculated as the actual area of OSM building footprints in a grid cell divided by the predicted area) (Figure 11a,c). This analysis allows us to identify grid cells where this ratio is low (Figure 11b,d), highlighting grid cells that require mapping.
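A minimal sketch of the gap-flagging step, assuming per-cell actual and predicted footprint areas are already available. The sketch orients the ratio as actual ÷ predicted and expresses it in percent, so that under-mapped cells, including cells with no mapped buildings at all (where the reverse ratio would divide by zero), receive a low value. Reading the "20" cut-off of Figure 11 as a percentage is our assumption, and the function names and cell values are hypothetical.

```python
def mapping_ratio_pct(actual, predicted):
    """Actual OSM footprint area as a percentage of the predicted area (None if nothing is predicted)."""
    if predicted <= 0:
        return None
    return 100.0 * actual / predicted

def flag_undermapped(cells, threshold_pct=20.0):
    """Return ids of grid cells whose mapped area is below threshold_pct of the predicted area."""
    flagged = []
    for cell_id, actual_m2, predicted_m2 in cells:
        ratio = mapping_ratio_pct(actual_m2, predicted_m2)
        if ratio is not None and ratio < threshold_pct:
            flagged.append(cell_id)
    return flagged

# Hypothetical cells: (id, actual OSM footprint area in m^2, predicted area in m^2)
cells = [
    ("A", 0.0, 30_000.0),       # nothing mapped although buildings are predicted
    ("B", 28_000.0, 30_000.0),  # close to fully mapped
    ("C", 5_000.0, 40_000.0),   # only 12.5% of the predicted area is mapped
    ("D", 0.0, 0.0),            # no buildings predicted: ratio undefined, skipped
]
print(flag_undermapped(cells))
```

Cells with no predicted buildings are skipped rather than flagged; a practical variant might also require a minimum predicted area so that cells with only a few square metres of predicted footprint are not prioritized.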
Figure 9. A comparison between (a) existing OSM building footprints and (b) predicted area of OSM building footprints in a grid cell (Carrefour, Haiti). Areas shown in dark purple are predicted to include a large area of building footprints.
Figure 10. (a) The area of OSM building footprints in a grid cell compared to (b) the area predicted by Random Forest Regression. The patterns observed in the predicted map align with the observed distribution of the built-up land cover according to (c) GUF and to (d) the intensity of nighttime lights according to VIIRS.
Figure 11. (a,c) The ratio between the actual and the predicted area of OSM building footprints in a grid cell (calculated as the actual area of OSM building footprints in a grid cell divided by the predicted area); grid cells with the lowest ratio (lower than 20). (b,d) Grid cells shown in purple are grid cells where the actual area of OSM building footprints is much lower than the predicted area.
3.3. Evaluation of the Method in the Case of Dominica and St. Lucia
The results above suggest the potential of several remotely sensed indicators to predict the coverage of OSM building footprints, at least in the case of Haiti. In order to assess the validity of the method, we performed further analysis in the case of two additional small island states: Dominica and St. Lucia. A visual examination suggests that the coverage of OSM building footprints in Dominica is relatively complete, while large portions of St. Lucia remain unmapped. Areas lacking OSM building footprints include parts of the capital, Castries, and the second-largest town, Vieux Fort (Figure 12). These two areas account for approximately 49% of the population of St. Lucia (64,654 and 16,624 people, respectively [83]).
Figure 12. Examples of areas missing OSM building footprints in St. Lucia ((a) Vieux Fort and (b) Castries).
Similar to the methodology described in the case of Haiti, we created a fishnet of grid cells, 0.25 km² in size, spanning the two islands. In the case of Dominica, we found a high, positive and significant correlation between the area of OSM building footprints in a grid cell and several explanatory variables. We found a high positive correlation with GUF, number of junctions and WSF (between r = 0.90 and r = 0.91, p < 0.01 for all three). The correlation between the area of OSM building footprints and VIIRS is somewhat lower (r = 0.75, p < 0.01). With Sentinel-2-derived spectral indices, this correlation ranges between r = 0.35 and r = 0.38 (with UI and NDBI, respectively, p < 0.01 for both). An OLS regression analysis reveals that together, these variables explain 92% of the variation of OSM building footprint area in a grid cell (R² = 0.92, F(12, 3846) = 3848, p < 0.01). Random Forest regression (with 64 decision trees) shows similar trends, indicated by a high accuracy rate of 88% (regression accuracy assessed using fivefold cross-validation).
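A 0.25 km² fishnet corresponds to square cells 500 m on a side. The sketch below, a hypothetical helper rather than the authors' GIS workflow, tiles a bounding box given in projected coordinates in metres (for example, a UTM zone; this projection is an assumption) and leaves out clipping the cells to the island coastline, which a GIS tool would normally handle.

```python
import math

CELL_SIDE_M = 500.0  # 500 m x 500 m = 250,000 m^2 = 0.25 km^2

def fishnet(xmin, ymin, xmax, ymax, side=CELL_SIDE_M):
    """Square cells (col, row, (x0, y0, x1, y1)) covering a bounding box in projected metres."""
    n_cols = math.ceil((xmax - xmin) / side)
    n_rows = math.ceil((ymax - ymin) / side)
    cells = []
    for row in range(n_rows):
        for col in range(n_cols):
            x0 = xmin + col * side
            y0 = ymin + row * side
            cells.append((col, row, (x0, y0, x0 + side, y0 + side)))
    return cells

# Hypothetical 2.0 km x 1.2 km bounding box; the partial last row is covered by a full cell
grid = fishnet(0.0, 0.0, 2000.0, 1200.0)
areas_km2 = {(x1 - x0) * (y1 - y0) / 1e6 for _, _, (x0, y0, x1, y1) in grid}
print(len(grid), areas_km2)
```

Rounding the cell counts up means the grid always covers the whole extent; cells falling entirely in the sea would then be dropped after intersecting the grid with the coastline.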
In the case of St. Lucia, when the analysis was done with the entire dataset of grid cells over the country (i.e., N = 2781), the correlation between the area of OSM building footprints and GUF and WSF ranges between 0.70 and 0.75 (p < 0.01), and there is a lower correlation between OSM building footprint area and VIIRS (r = 0.58, p < 0.01). Together, the predictors explain only 66% of the variation in OSM building footprints (R² = 0.66, F(12, 2783) = 464.6, p < 0.01). We attribute the lower fit of the model to the fact that large portions of the country have not been mapped. Thus, we visually assessed the completeness of OSM building footprints in St. Lucia grid cells and identified 179 grid cells in which we judged that more than 75% of the area is mapped. With these visually assessed grid cells, the fit of the model improved, and together, the predictors explain 92% of the variation of OSM building footprint area (R² = 92%, F(12, 166) = 166.4, p < 0.01). Random Forest regression (with 64 decision trees) results in a similar accuracy rate (R² = 92%) (Table 5).
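The correlations above and in Table 5 are Pearson coefficients. The sketch below computes r from first principles together with the t statistic used to test r against zero, t = r·√((n − 2)/(1 − r²)) with n − 2 degrees of freedom. The helper names and toy values are hypothetical; an exact p-value would require the t distribution, which `scipy.stats.pearsonr` provides together with r.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """Statistic for H0: rho = 0, t-distributed with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))

# Hypothetical toy values: building-footprint area vs. one predictor in five cells
area = [1.0, 2.0, 3.0, 4.0, 5.0]
predictor = [2.0, 4.0, 5.0, 4.0, 5.0]
r = pearson_r(area, predictor)
print(round(r, 4))

# Table 5 reports r = 0.89 for GUF on the N = 179 visually assessed St. Lucia cells
print(round(t_statistic(0.89, 179), 1))
```

With N in the hundreds or thousands, even moderate coefficients give large t values, which is why most correlations in Table 5 clear the p < 0.01 threshold.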
Finally, we use Random Forest to predict the area of OSM building footprints in each of the countries. Figure 13 presents the predicted area of OSM building footprints in St. Lucia. Unmapped areas include, for example, areas surrounding the capital, Castries. The central part of the city is
densely mapped, while its adjacent neighborhoods lack coverage. Large swaths of the surrounding areas of Charlotte, Vigie, and Bisee completely lack OSM building footprints.
Table 5. Pearson Correlation Test, OLS Regression, and Random Forest Regression for Dominica and St. Lucia.
                        Dominica              St. Lucia             St. Lucia
                        Full dataset          Full dataset          Visually assessed cells*
                        (N = 3861)            (N = 2781)            (N = 179)
(I) Pearson Correlation Test
GUF                     r = 0.91*             r = 0.75*             r = 0.89*
Num. of junctions       r = 0.90*             r = 0.76*             r = 0.88
WSF                     r = 0.90*             r = 0.70*             r = 0.78
Road length             r = 0.81*             r = 0.68*             r = 0.84*
VIIRS                   r = 0.75*             r = 0.58*             r = 0.72*
NDBI                    r = 0.38*             r = 0.30              r = 0.3
UI                      r = 0.35*             r = 0.26              r = 0.35
Sentinel-1              r = 0.03*             r = 0.00              r = 0.20
NDVI                    r = -0.20*            r = -0.19             r = -0.29
SAVI                    r = -0.20*            r = -0.19             r = -0.29
Slope                   r = -0.16             r = -0.14*            r = 0.10*
Forest cover            r = -0.07             r = -0.35             r = -0.43
(II) OLS                R² = 92%              R² = 66%              R² = 92%
                        F(12,3846) = 3848     F(12,2783) = 464.6    F(12,166) = 166.4
                        p = 0.00              p = 0.00              p = 0.00
(III) Random Forest     R² = 88%                                    R² = 94%
Note: * p < 0.01
Figure 13. (a) Existing OSM building footprints versus (b) predicted area of OSM building footprints in a grid cell. Areas shown in dark purple are predicted to include a large area of building footprints.
4. Discussion
In recent decades, natural disasters have been responsible for an estimated 0.1% of global deaths, killing on average 60,000 people per year [84]. In the last two decades alone, developing countries have accounted for more than half of all reported casualties [85]. Natural disasters often cause significant damage to communities, infrastructure and the environment, and require immediate intervention and the implementation of appropriate measures to save lives.
Accurate and easily accessible geospatial information is key for an effective disaster risk management cycle and for informed decision-making during humanitarian response [86]. The increasing availability of geospatial information is revolutionizing disaster research and emergency management. Until recently, much of this essential geospatial information was proprietary, scarce and, in many cases, unavailable during significant disasters.
In the last decade, OSM has repeatedly been shown to be highly suitable for disaster mapping and management. Despite the continuous efforts to improve the completeness of the OSM database, large portions of the world remain unmapped, especially in countries that are prone to natural hazards, reflecting limited internet connectivity and speed, limited availability of GPS devices, a lack of technologically skilled volunteers and limited awareness of VGI technologies [18].
Several methods have been proposed to assess the completeness of the OSM database, including evaluation of the completeness of street networks, land use and buildings; many of them rely on external datasets for accuracy and completeness estimation. The increased availability of free and open-source remotely sensed data can be utilized to identify mapping gaps in OSM datasets, find locations that require mapping, and help prioritize and plan mapping campaigns and efforts.
Although several applications have recently been proposed to predict the coverage of OSM by means of remotely sensed derived products (such as WorldPop) [87], to the best of our knowledge, no study has yet evaluated the potential use of remotely sensed measurements to predict the completeness of OSM features.
In this study, we demonstrate a methodology to identify areas where building footprints have not yet been mapped in the OSM dataset. The methodology relies on remotely sensed measurements, derived products and geospatial information related to the road network to predict the completeness of OSM building footprints in three small island states (Haiti, Dominica and St. Lucia).
In the case of Haiti, the results show that large portions of the country are still unmapped, despite continued mapping efforts to maintain a full and up-to-date map that keeps pace with the changing socio-physical characteristics of the country and aids response and recovery in future disasters [88]. We find that in the case of the three countries, the coverage of OSM building footprints is significantly correlated with several remotely sensed measures and indicators. As expected, the coverage is positively correlated with the distribution of built-up land cover (indicated by a Pearson correlation coefficient of between r = 0.78 and r = 0.91 in the case of Haiti and Dominica, respectively). To some extent, this is not surprising, and previous studies have already shown the potential of remotely sensed derived products to predict the coverage of OSM building footprints [87]. However, a limiting factor in the utilization of remotely sensed derived products is that their availability varies in space and time, and these products are not always updated on a regular basis. In this study, we demonstrate the potential use of free and open-source remotely sensed indicators to estimate the coverage of OSM building footprints. We show that nighttime light luminosity (measured by VIIRS) is highly correlated with the area of OSM building footprints (indicated by a Pearson correlation coefficient of between r = 0.65 and r = 0.75 in the case of Haiti and Dominica, respectively). This finding aligns with previous studies showing that the intensity of nighttime lights is closely correlated with anthropogenic activities and with changes in the distribution of built-up land cover [89,90], as well as with the distribution of geographical features, such as the density of the road network [91]. Further, we show that remotely sensed spectral indices signifying the distribution of vegetation (NDVI, SAVI) and built-up land cover (NDBI, UI) are also correlated with the distribution of OSM building footprints. While previous studies propose to utilize vegetation spectral indices (e.g., NDVI) to estimate the completeness of urban green space mapping in OSM [92], we show that these indices are also significantly correlated with the coverage of OSM building footprints.
Because different sensors record distinct characteristics of the land (e.g., brightness, temperature, height, density, texture), data fusion techniques that exploit the best characteristics of each type of sensor have become a valuable procedure in remote-sensing analysis, including for urban applications [70,93] and in mapping built-up land cover [69]. In this study, we show that the combined effect of the explanatory variables explains between 92% and 94% of the variation in the area of OSM building footprints, exceeding the effect of each predictor by itself. We show that in the case of Haiti, adding nighttime lights to the built-up land cover classification products (GUF, WSF) improves the fit of the model by 9.3 percentage points (R² from 0.663 to 0.756; Table 4), and that adding the remotely sensed measures improves it by a further 5.7 percentage points (R² from 0.756 to 0.813). Although land cover characteristics are often related to nighttime light luminosity (for example, vegetation density decreases while luminosity increases from the rural area to the urban core), fusing nighttime and daytime remotely sensed measures allows for an increase in the separability between urban and nonurban land [94]. Fusing daytime and nighttime measurements enables feature complementation and compensates for the limitations of single data sources in extracting urban information [95].
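The model-fit improvements for Haiti are absolute changes in R² between the nested models of Table 4. The short worked check below (ours, not part of the authors' analysis; the `labels` strings are our shorthand for the model steps) separates percentage-point gains from gains relative to the previous model, which are easy to conflate when an improvement is quoted as a percentage.

```python
# R^2 of the four nested OLS models for Haiti (Table 4)
r2 = {1: 0.663, 2: 0.756, 3: 0.813, 4: 0.825}
labels = {2: "+ nighttime lights (VIIRS)", 3: "+ remotely sensed measures", 4: "+ OSM road network"}

gains = {}
for step in (2, 3, 4):
    delta = r2[step] - r2[step - 1]
    abs_pp = delta * 100                  # change in percentage points of R^2
    rel_pct = delta / r2[step - 1] * 100  # change relative to the previous model's R^2
    gains[step] = (round(abs_pp, 1), round(rel_pct, 1))
    print(f"Step {step} {labels[step]}: {gains[step][0]} pp, {gains[step][1]}% relative")
```

For example, adding VIIRS raises R² by 9.3 percentage points, which is a 14.0% relative improvement over the GUF/WSF-only model.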
We also perform a Random Forest Regression to predict the area of OSM building footprints in a cell and find that the regression explains between 88% and 94% of the variation (in Dominica and Haiti, respectively). Previous studies suggest that the accuracy of a Random Forest generally increases with the number of decision trees [96], although they show mixed results for the optimal number of trees in the forest. The number varies between 10 [97] and 150 trees [98].
Here, we find that with Random Forests, accuracy improves up to 64 decision trees and then moderately decreases as the number of trees increases.
Finally, we demonstrate the potential of our approach for identifying mapping gaps, or areas that lack OSM coverage. We do this by training the model with the grid cells that are visually assessed as relatively fully mapped and using the trained model to predict the coverage of OSM building footprints throughout the countries. We capture grid cells that lack OSM building footprint coverage (i.e., grid cells where the predicted coverage of OSM building footprints largely exceeds the actual coverage). These grid cells represent locations where mapping efforts could potentially be targeted.
To summarize, VGI contributions, especially in developing countries, tend to be made in spurts, for example in response to a specific trigger, such as a natural disaster or humanitarian crisis, rather than as a regular, continuous process [13]. Because OSM datasets rely on volunteers, completeness and mapping efforts vary in space and time. There is a need for a systematic tool that would guide and prioritize mapping efforts and mapping campaigns. As more and more remotely sensed data become available to the research community, our study extends the existing literature by demonstrating how these data could be leveraged to conduct novel and critical large-scale assessments of the completeness of the OSM database, especially in areas at high risk of natural disasters.
We note a few limitations to this study. First, we demonstrate our methodology in the case study of three small island states, which, at least to some extent, are characterized by relatively similar geographical conditions and characteristics. An extension of this study would evaluate our methodology in additional countries characterized by diverse geographical conditions (topography, land cover, land use, etc.). Second, in order to create the training examples for the model and to evaluate the relation between the explanatory variables and the area of OSM building footprints, we created a relatively small dataset of examples (grid cells), which were assessed as relatively fully mapped using a subjective visual interpretation method. By its nature, visual interpretation may be subject to idiosyncratic variation across the individuals performing the manual classification. An extension to this study would leverage the crowd to create an extensive dataset of visually assessed and mapped grid cells and would account for agreement between the interpreters. Third, the analysis presented in this study was done at a single point in time. By its nature, the OSM database is dynamic and continuously updated, while new remotely sensed measurements are simultaneously being collected. An extension to this study would account for these changes over time and evaluate the completeness of OSM building footprints on an ongoing basis as new data become available.
Extensions to our approach may improve the identification of areas that lack complete OSM coverage by accounting for additional inputs, for example, socioeconomic and population datasets (such as WorldPop and Facebook's High Resolution Settlement Layer (HRSL)) and additional physical/geographical characteristics and spectral indices. Further extensions to our approach may also include the application of other learning algorithms and the evaluation of various tuning parameters of the classifiers and the fitted models.
5. Conclusions
Globally, there has been an increase in the frequency and impacts of major natural disaster events; in the next century, it is likely that climate change will amplify the number and severity of such disasters. While accurate and timely geospatial information is vital for the full cycle of disaster risk management, these data are not always available to the disaster management community when disasters occur. Although VGI platforms, specifically OpenStreetMap (OSM), show great potential to support humanitarian mapping tasks, gaps in VGI data remain a major concern [99]. There is an increasing need for a fully automatic tool that can identify areas that lack complete mapping of OSM features, especially in areas prone to hazard events. While previous studies have utilized OSM data as a reference for the classification of built-up land cover with satellite imagery [100–103], here we show the potential use of publicly available remotely sensed data as predictors of the spatial coverage of OSM building footprints. The tool and methodology we present here are time-efficient and scalable.
An extension to our approach may improve the accuracy of the prediction of OSM building footprint area by adding further remotely sensed measures. Incorporating additional datasets, such as newly developed VIIRS nighttime light products,