Assessing the Influences of Band Selection and Pretrained Weights on Semantic-Segmentation-Based Refugee Dwelling Extraction from Satellite Imagery

Yunya Gao1, Getachew Workineh Gella1, and Nianhua Liu2
1 Christian Doppler Laboratory for geospatial and EO-based humanitarian technologies (GEOHUM), Department of
Geoinformatics – Z_GIS, Paris Lodron University of Salzburg, Salzburg, Austria
2 Department of Geoinformatics – Z_GIS, Paris Lodron University of Salzburg, Salzburg, Austria
Correspondence: Yunya Gao (yunya.gao@plus.ac.at)
Abstract. This research assessed the influences of four
band combinations and three types of pretrained weights
on the performance of semantic segmentation in
extracting refugee dwelling footprints of the Kule refugee
camp in Ethiopia during a dry season and a wet season
from very high spatial resolution imagery. We chose a
classical network, U-Net with VGG16 as a backbone, for
all segmentation experiments. The selected band
combinations include 1) RGBN (Red, Green, Blue, and
Near Infrared), 2) RGB, 3) RGN, and 4) RNB. The three
types of pretrained weights are 1) randomly initialized
weights, 2) pretrained weights from ImageNet, and 3)
weights pretrained on data from the Bria refugee camp in
the Central African Republic. The results show that
three-band combinations outperform RGBN bands across
all types of weights and seasons. Replacing the B or G
band with the N band improves the performance in
extracting dwellings during the wet season but generally
brings no improvement for the dry season.
Pretrained weights from ImageNet achieve the best
performance. Weights pretrained on data from the Bria
refugee camp produced the lowest IoU and Recall values.
Keywords. Remote sensing, refugee dwellings, semantic
segmentation, band selection, pretrained weights.
1 Introduction
1.1 Background
Sustainable Development Goals (SDGs) 2, 3, 6, and 7
emphasize the significance of distributing adequate living
resources and health care services to refugees and their
host countries based on the commitment “Leave No One
Behind” (UNHCR, 2020b). Population estimation of
refugees in need is essential for logistics planning of the
above resources during humanitarian operations (Çelik et
al., 2012). However, it is usually difficult to collect such
information in the field during conflicts. High-quality and
updated footprints of refugee dwellings from satellite
imagery could be beneficial for refugee population
estimation (Checchi et al., 2013; Spröhnle et al., 2014),
and thus, help achieve the related SDGs.
1.2 Related work
Deep learning approaches, especially Convolutional
Neural Networks (CNN), have attracted researchers’
attention for remote-sensing-based refugee-dwelling
extraction in the last five years. Ghorbanzadeh et al. (2018)
designed a shallow CNN model to extract refugee
dwellings in the Minawao refugee camp. They trained the
model from scratch based on four spectral bands (RGBN)
of WorldView imagery. The results show that CNNs have high
potential for this extraction task from Very High Spatial
Resolution (VHSR) satellite imagery. Ghorbanzadeh et al.
(2021) further combined the designed CNN with Object-
Based Image Analysis (OBIA), which reveals the potential
of combining CNN and expert knowledge for this task.
Quinn et al. (2018) applied a Mask-RCNN model
pretrained on the ImageNet dataset (Deng et al., 2009)
to extract dwellings in thirteen refugee camps. The model
was trained with RGB bands of Google Earth imagery. Lu
& Kwan (2020) compared the performance of two shallow
CNN models, a deep fully convolutional network (FCN)
model based on VGG16, and a Mask-RCNN model with
ResNet-50 as a backbone in extracting refugee dwellings
near the Syria-Jordan border. Both the FCN model and Mask-RCNN
AGILE: GIScience Series, 3, 36, 2022. https://doi.org/10.5194/agile-giss-3-36-2022
Proceedings of the 25th AGILE Conference on Geographic Information Science, 2022.
Editors: E. Parseliunas, A. Mansourian, P. Partsinevelos, and J. Suziedelyte-Visockiene.
This contribution underwent peer review based on a full paper submission.
© Author(s) 2022. This work is distributed under the Creative Commons Attribution 4.0 License.
model were fine-tuned from weights pretrained on
ImageNet. The results show that the FCN model
outperforms the other four models. Wickert et al. (2021)
chose a Faster-RCNN model pretrained on the COCO
dataset (Lin et al., 2014) to count dwelling numbers in nine
refugee camps based on RGB bands of Google Earth
imagery. Tiede et al. (2021) selected an untrained Mask-
RCNN model to extract built-up structures in Sudan based
on RGBN bands of Pléiades-1A satellite imagery. Gella et
al. (2022) applied a Mask-RCNN model pretrained on the
COCO dataset based on RGB bands of WorldView data.
Based on findings from Lu & Kwan (2020), this research
chose a semantic segmentation model for all experiments.
Semantic segmentation algorithms can assign a label to
each pixel in an image and produce a fine-grained
delineation of target objects with embedded spatial
information (Borba et al., 2021). We tested multiple
semantic segmentation models during the preliminary
stage. Eventually, we chose U-Net with VGG16 as a
backbone for all segmentation experiments due to its
effectiveness and efficiency. Besides, U-Net is one of the
most popular architectures for detecting built-up structures
from satellite imagery (Ansari et al., 2020; Jung et al.,
2021; Li et al., 2019).
Most semantic segmentation models are adapted from
deep CNN models pretrained on large image classification
datasets such as ImageNet which consists of more than one
million labelled images (Kemker et al., 2018). Using
pretrained weights (or parameters) from large datasets is
essential because most deep CNN models have millions of
parameters. For example, VGG16 has around 138 million
parameters (Simonyan & Zisserman, 2015). The limited
annotated label data available in remote sensing domains
are usually insufficient to learn suitable values from
randomly initialized weights (Kemker et al., 2018).
Therefore, choosing proper pretrained weights can play an
important role in this extraction task. However, this topic
has not yet been discussed for refugee dwelling extraction.
Furthermore, for multispectral satellite imagery with more
bands than the three RGB bands, band selection is an
important step before feeding data to CNN models (Kemker
et al., 2018). Dixit et al. (2021) compared the performance
of a semantic segmentation model (Dilated-ResUnet) on three
datasets: 1) RGB bands, 2) NRGB bands, and 3) NRG
bands of Sentinel-2 imagery. They found that the dataset
composed of NRG bands outperforms the other two datasets.
This finding inspired us to assess the influence of band
selection on refugee dwelling extraction from VHSR
satellite imagery.
1.3 Research problem
To the best of our knowledge, it is still unknown which band
combination performs best for refugee dwelling
extraction. Besides, it is unknown whether seasonal
changes can influence the performance of various band
combinations. This research aims to fill this gap by testing
the performance of four band combinations (RGB, RGN,
RNB and RGBN) in extracting dwellings in the Kule
refugee camp in Ethiopia during a dry season and a wet
season. Additionally, we tested the influences of pretrained
weights by comparing randomly initialized weights
(RIW), pretrained weights from ImageNet, and weights
trained on data of the Bria refugee camp in the Central
African Republic (CAR). The outcomes of this research
may shed light on the selection of bands and weights for
similar tasks in the future.
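
The four band combinations can be illustrated with a minimal sketch in plain Python. It assumes each pixel is stored as a tuple in the order (R, G, B, N); that order, the `BAND_INDEX` table, and both function names are hypothetical and are not taken from the paper or from the Pléiades product specification.

```python
# Assumed channel order of the four-band input: (R, G, B, N).
# This order is an illustration only; check the actual product metadata.
BAND_INDEX = {"R": 0, "G": 1, "B": 2, "N": 3}


def select_bands(pixel, combo):
    """Return the values of one pixel for a band combination such as 'RGN'."""
    return tuple(pixel[BAND_INDEX[band]] for band in combo)


def select_bands_image(image, combo):
    """Apply select_bands to every pixel of an image given as rows of pixels."""
    return [[select_bands(pixel, combo) for pixel in row] for row in image]


# The four combinations compared in this research.
COMBINATIONS = ("RGBN", "RGB", "RGN", "RNB")
```

For example, `select_bands((10, 20, 30, 40), "RNB")` keeps the red value, then the near-infrared value, then the blue value, so the N band takes the place of the G band in a three-channel input.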
2 Methodology
2.1 Data preparation
The Kule refugee camp, located in the Gambella region,
Ethiopia, was opened in 2014 in response to the major
refugee influx from South Sudan and was fully occupied
in 2016 (UNHCR, 2020a). The Bria refugee camp is located
in eastern CAR, where brutal attacks caused by religious
conflicts displaced over 40,000 people in 2017 (Médecins
Sans Frontières, 2018). Fig. 1 presents examples of
dwellings in the two camps. The appearances of dwellings
and background differ across the two camps and the two
seasons.
Figure 1. Examples of dwellings in the Kule refugee camp
during the dry season (a) and the wet season (b), and in the
Bria refugee camp (c).
We chose satellite imagery from the Pléiades-1 sensor,
pansharpened to a resolution of 0.5 m and stored in GeoTIFF
format, for both camps. Because the area of refugee
dwellings in the two camps mainly ranges from 8 m² to
50 m², the original resolution (2 m) is too coarse for
models to detect small dwellings. The Kule imagery
of the dry season and the wet season was retrieved on 24
March 2017 and 22 June 2018, respectively. The label data
use two classes, “built-up structures” and “background”.
The label data were produced by OBIA and
post-processed by manual correction (Lang et al., 2020).
The testing label data were manually annotated and
checked by two experts in ArcGIS 10.7 software. The
polygon label data were converted to GeoTIFF format
with the same resolution. We eventually created 8286
training patches and 921 validation patches of
128 × 128 pixels (Gella et al., 2021) with an overlap of 32
pixels, and 612 testing patches without any overlap. The data
of the Bria camp were processed in the same way, which
produced 6568 patches used to create initial weights for
the Kule experiments.
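
The patch layout described above (128 × 128 pixels, 32 pixels of overlap for training and validation, no overlap for testing) can be sketched as follows. This is a plain-Python illustration, not the authors' preprocessing code: the function names are hypothetical, and the choice to keep only patches that fit entirely inside the image is an assumption.

```python
def patch_origins(height, width, patch=128, overlap=32):
    """Top-left (row, col) corners of all full patches in a height x width image.

    Neighbouring patches share `overlap` pixels, so the stride is
    patch - overlap. Partial patches at the image border are skipped
    (an assumption made for this sketch).
    """
    if not 0 <= overlap < patch:
        raise ValueError("overlap must satisfy 0 <= overlap < patch")
    stride = patch - overlap
    rows = range(0, height - patch + 1, stride)
    cols = range(0, width - patch + 1, stride)
    return [(r, c) for r in rows for c in cols]


def crop(image, row, col, patch=128):
    """Cut one patch out of an image stored as a list of rows."""
    return [line[col:col + patch] for line in image[row:row + patch]]
```

With these defaults the stride is 96 pixels, so a 320 × 224 image yields patch rows starting at 0, 96 and 192 and patch columns starting at 0 and 96. Calling `patch_origins(h, w, overlap=0)` gives the non-overlapping layout used for the testing patches.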
2.2 Architecture and model set-up
U-Net was first developed for biomedical image
segmentation (Ronneberger et al., 2015) and follows
an encoder-decoder structure. The encoder path is
designed to capture features of input images. The decoder
path is the symmetric expansion of the encoder path,
which enables precise localization. U-Net requires
no dense or fully connected layers and can thus be
trained in an end-to-end fashion. The VGG16
architecture was among the top-performing entries of
ILSVRC 2014 (Simonyan & Zisserman, 2015). We implemented
the model based on the Segmentation Models Python library
(Yakubovskiy, 2019). The structure of the model is shown in
Fig. 2. We selected balanced cross-entropy loss
as the loss function because of the high imbalance between
the two classes (Zhou et al., 2017): built-up structures
account for only around 2% of the pixels.
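
To make the role of the balanced cross-entropy loss concrete, the following is a minimal plain-Python sketch in the spirit of the class-balanced formulation of Zhou et al. (2017), where the positive class is weighted by the fraction of negative pixels. The paper does not give its exact weighting or reduction, so the per-sample weight `beta`, the averaging over pixels, and the function name are assumptions.

```python
import math


def balanced_cross_entropy(y_true, y_pred, eps=1e-7):
    """Class-balanced binary cross-entropy over flat lists of pixels.

    y_true holds 0/1 labels, y_pred holds predicted probabilities.
    beta is the share of background pixels, so the rare positive class
    receives the larger weight (assumed weighting, see lead-in).
    """
    n = len(y_true)
    beta = 1.0 - sum(y_true) / n
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total -= beta * t * math.log(p) + (1.0 - beta) * (1 - t) * math.log(1.0 - p)
    return total / n
```

With about 2% built-up pixels, `beta` is close to 0.98, so a missed dwelling pixel is penalised far more heavily than a false alarm on a background pixel.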
For other hyperparameters, the batch size is 32. The Adam
optimizer was chosen due to its fast convergence
(Bock et al., 2018). The model was trained for 200 epochs
with an initial learning rate of 4 × 10⁻⁴ and a decay
rate of 2 × 10⁻⁶. We used an NVIDIA RTX 3090 GPU to train
and test models in a TensorFlow 2.7 environment.
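
The text does not state how the decay rate enters the learning-rate schedule. One plausible reading, assumed here, is the time-based decay applied per optimizer step by the `decay` argument of legacy Keras optimizers, lr = lr0 / (1 + decay × step). The sketch below only works through that arithmetic; the function name is hypothetical, and the step count assumes one step per batch of 32 over the 8286 training patches.

```python
import math


def decayed_learning_rate(step, initial_lr=4e-4, decay=2e-6):
    """Time-based decay: lr0 / (1 + decay * step). Assumed schedule."""
    return initial_lr / (1.0 + decay * step)


# 8286 training patches in batches of 32 give 259 steps per epoch.
steps_per_epoch = math.ceil(8286 / 32)
total_steps = 200 * steps_per_epoch  # 200 epochs
final_lr = decayed_learning_rate(total_steps)  # roughly 3.6e-4
```

Under this assumption the learning rate falls only by about 10% over the whole training run, so the decay acts as a mild regulariser of the step size rather than an aggressive schedule.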
Figure 2. The structure of U-Net with VGG16 as a backbone.
2.3 Accuracy metrics
We evaluate the results with Precision, Recall, and
Intersection over Union (IoU) of built-up structures (Van
Beers et al., 2019). The metrics are calculated as in
Eqs. (1)–(3), where TP, FP, and FN refer to the
numbers of True Positive, False Positive, and
False Negative pixels for the semantic class.
$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{1}$$
$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{2}$$
$$\mathrm{IoU} = \frac{TP}{TP + FP + FN} \tag{3}$$
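
The three metrics can be computed from a pair of binary masks as in the following plain-Python sketch. The function names are hypothetical, the masks are assumed to be flattened to lists of 0/1 values with 1 marking built-up structures, and returning 0.0 for an empty denominator is a convention chosen for this sketch.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP and FN pixels for the positive (built-up) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, fn


def _ratio(numerator, denominator):
    """Safe division; an empty denominator yields 0.0 (sketch convention)."""
    return numerator / denominator if denominator else 0.0


def precision_recall_iou(y_true, y_pred):
    """Return (Precision, Recall, IoU) following Eqs. (1) to (3)."""
    tp, fp, fn = confusion_counts(y_true, y_pred)
    return _ratio(tp, tp + fp), _ratio(tp, tp + fn), _ratio(tp, tp + fp + fn)
```

For the masks `[1, 1, 1, 0, 0, 0]` (reference) and `[1, 1, 0, 1, 0, 0]` (prediction) there are two TP, one FP and one FN pixel, giving Precision and Recall of 2/3 each and an IoU of 0.5.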
3 Results and Discussion
We present the Precision, Recall, and IoU values of all
experiments in Table 1. The highest and lowest IoU values
are highlighted with red and black bold text, respectively,
for each season. “Bria” refers to weights from models
trained on data of the Bria camp.
Table 1. The Precision, Recall, and IoU values of all implemented experiments.
Firstly, we can observe that ImageNet performs the best,
followed by RIW and then “Bria” in general. ImageNet
models achieve the highest IoU values and the most
balanced Precision and Recall values for all three-band
combinations. In contrast, “Bria” models produce the lowest
IoU values in all combinations except RGB bands.
Additionally, they produce the highest Precision and the
lowest Recall values in all combinations. These results
demonstrate that “Bria” models missed the most dwelling
pixels (i.e., produced the most FN pixels) even though they
produced fewer FP pixels than models with other weights.
However, the results of all experiments expose an imbalance
between Precision and Recall in this extraction task. This
is probably caused by the high imbalance between the two
semantic classes, as mentioned before. Class imbalance is a
critical issue in deep learning (Johnson & Khoshgoftaar,
2019), and more techniques should be applied to address it
and achieve better performance. Besides, it is worth
noting that RIW performs better than “Bria”, which
indicates that fine-tuning pretrained weights from other
refugee camps can be harmful. The explanation is beyond
the scope of this research but deserves attention in
future research, especially on domain adaptation.
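
The link between unbalanced Precision and Recall and a low IoU follows directly from Eqs. (1) to (3). Writing $P$ for Precision and $R$ for Recall, a short derivation (added here as an illustration, with hypothetical values that are not taken from Table 1) gives:

```latex
\frac{1}{P} + \frac{1}{R} - 1
  = \frac{TP + FP}{TP} + \frac{TP + FN}{TP} - \frac{TP}{TP}
  = \frac{TP + FP + FN}{TP}
  = \frac{1}{\mathrm{IoU}}
\quad\Longrightarrow\quad
\mathrm{IoU} = \frac{P\,R}{P + R - P\,R}

% Illustrative values only (not from Table 1):
% P = 0.9, R = 0.5:  IoU = 0.45 / 0.95 \approx 0.47
% P = 0.7, R = 0.7:  IoU = 0.49 / 0.91 \approx 0.54
```

For the same sum of Precision and Recall, the balanced pair yields the higher IoU, which is consistent with a model that has the highest Precision but the lowest Recall also producing a low IoU.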
Secondly, three-band combinations outperform the
four-band combination in general. This suggests that
feeding four-band data directly to a semantic segmentation
model is not recommended under the given conditions of
this research. Besides, using the N band to replace the G or
B band improves IoU by around 0.02 compared
to conventional RGB bands for the wet season but generally
brings no improvement for the dry season. The N band is
important for identifying vegetation (Huang et al., 2021),
which probably makes it more useful for the wet season,
when the surrounding environment is covered by more
vegetation. Therefore, RGN or RNB bands are highly
recommended to replace RGB bands when extracting
refugee dwellings in areas covered by a lot of vegetation.
This finding is consistent with Dixit et al. (2021), who
show that NRG bands outperform RGB and
NRGB bands in terms of the F1-score of the building class
based on Sentinel-2 imagery. These findings indicate that
the choice of input band combination
can influence the performance of semantic
segmentation models.
Fig. 3 presents predicted labels of a subset of testing data
for every band combination and every type of pretrained
weights during the dry season and the wet season. Overall,
we can observe that many FP and FN pixels occur around the
boundaries of built-up structures. This type of error is hard
to avoid, because label data annotated by different experts
can differ slightly at the boundaries of target objects.
Additionally, all models
are incapable of detecting built-up structures occluded by
trees (see the example during the dry season in Fig. 3).
Figure 3. Predicted labels of a subset of testing data for every
band combination, every type of pretrained weights during the
dry season and the wet season. Blue: False Negative pixels; Red:
False Positive pixels.
4 Conclusions
This research compared the performance of four band
combinations (RGBN, RGB, RGN, RNB) and three types
of pretrained weights (RIW, “Bria”, “ImageNet”) in
extracting refugee dwellings in the Kule refugee camp
during the dry season and the wet season. The results
illustrate that ImageNet outperforms RIW and “Bria” in
terms of IoU and Recall values. In contrast, “Bria”
weights produce the lowest IoU and Recall values.
Overall, three-band combinations achieve better results
than four-band combinations. Using the N band to replace
B or G band is recommended for extraction tasks during
the wet season. This finding may be caused by the
significance of N band in identifying vegetation, which
probably makes it more important for the wet season when
the surrounding environment is covered by more
vegetation.
Data and Software Availability
The VHSR satellite imagery and label data are not
publicly available because of license restrictions and the
sensitivity of refugee-related data.
Acknowledgement
This work was supported by the Austrian Federal Ministry for
Digital and Economic Affairs, the National Foundation for
Research, Technology and Development, the Christian
Doppler Research Association (CDG), and Médecins Sans
Frontières (MSF) Austria.
References
Ansari, R. A., Malhotra, R., & Buddhiraju, K. M. (2020).
Identifying informal settlements using contourlet
assisted deep learning. Sensors (Switzerland),
20(9), 1–15. https://doi.org/10.3390/s20092733
Bock, S., Goppold, J., & Weiß, M. (2018). An
improvement of the convergence proof of the
ADAM-Optimizer. 1–5.
Borba, P., de Carvalho Diniz, F., da Silva, N. C., & de
Souza Bias, E. (2021). Building Footprint
Extraction Using Deep Learning Semantic
Segmentation Techniques: Experiments and
Results. 2021 IEEE International Geoscience and
Remote Sensing Symposium IGARSS, 4708–4711.
Çelik, M., Ergun, Ö., Johnson, B., Keskinocak, P., Lorca,
Á., Pekgün, P., & Swann, J. (2012). Humanitarian
logistics. In New directions in informatics,
optimization, logistics, and production (pp. 18–49).
INFORMS.
Checchi, F., Stewart, B. T., Palmer, J. J., & Grundy, C.
(2013). Validity and feasibility of a satellite
imagery-based method for rapid estimation of
displaced populations. International Journal of
Health Geographics, 12.
https://doi.org/10.1186/1476-072X-12-4
Dixit, M., Chaurasia, K., & Kumar Mishra, V. (2021).
Dilated-ResUnet: A novel deep learning
architecture for building extraction from medium
resolution multi-spectral satellite imagery. Expert
Systems with Applications, 184(June), 115530.
https://doi.org/10.1016/j.eswa.2021.115530
Gella, G. W., Wendt, L., Lang, S., & Braun, A. (2021).
Testing Transferability of Deep-Learning-Based
Dwelling Extraction in Refugee Camps. GI_Forum, 9(1),
220–227. https://doi.org/10.1553/giscience2021
Gella, G. W., Wendt, L., Lang, S., Tiede, D., Hofer, B.,
Gao, Y., & Braun, A. (2022). Mapping of Dwellings
in IDP/Refugee Settlements from Very High-
Resolution Satellite Imagery Using a Mask Region-
Based Convolutional Neural Network. Remote
Sensing, 14(3). https://doi.org/10.3390/rs14030689
Ghorbanzadeh, O., Tiede, D., Dabiri, Z., Sudmanns, M.,
& Lang, S. (2018). Dwelling extraction in refugee
camps using CNN - First experiences and lessons
learnt. International Archives of the
Photogrammetry, Remote Sensing and Spatial
Information Sciences - ISPRS Archives, 42(1), 161–
166. https://doi.org/10.5194/isprs-archives-XLII-1-
161-2018
Ghorbanzadeh, O., Tiede, D., Wendt, L., Sudmanns, M.,
& Lang, S. (2021). Transferable instance
segmentation of dwellings in a refugee camp -
integrating CNN and OBIA. European Journal of
Remote Sensing, 54(sup1), 127–140.
https://doi.org/10.1080/22797254.2020.1759456
Huang, S., Tang, L., Hupy, J. P., Wang, Y., & Shao, G.
(2021). A commentary review on the use of
normalized difference vegetation index (NDVI) in
the era of popular remote sensing. Journal of
Forestry Research, 32(1), 1–6.
https://doi.org/10.1007/s11676-020-01155-1
Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., &
Fei-Fei, L. (2009). ImageNet: A large-scale
hierarchical image database. 248–255.
https://doi.org/10.1109/cvprw.2009.5206848
Johnson, J. M., & Khoshgoftaar, T. M. (2019). Survey on
deep learning with class imbalance. Journal of Big
Data, 6(1). https://doi.org/10.1186/s40537-019-
0192-5
Jung, H., Choi, H. S., & Kang, M. (2021). Boundary
Enhancement Semantic Segmentation for Building
Extraction From Remote Sensed Image. IEEE
Transactions on Geoscience and Remote Sensing,
1–12. https://doi.org/10.1109/TGRS.2021.3108781
Kemker, R., Salvaggio, C., & Kanan, C. (2018).
Algorithms for semantic segmentation of
multispectral remote sensing imagery using deep
learning. ISPRS Journal of Photogrammetry and
Remote Sensing, 145(June 2017), 60–77.
https://doi.org/10.1016/j.isprsjprs.2018.04.014
Lang, S., Füreder, P., Riedler, B., Wendt, L., Braun, A.,
Tiede, D., Schoepfer, E., Zeil, P., Spröhnle, K., &
Kulessa, K. (2020). Earth observation tools and
services to increase the effectiveness of
humanitarian assistance. European Journal of
Remote Sensing, 53(sup2), 67–85.
Li, W., He, C., Fang, J., Zheng, J., Fu, H., & Yu, L.
(2019). Semantic segmentation-based building
footprint extraction using very high-resolution
satellite images and multi-source GIS data. Remote
Sensing, 11(4). https://doi.org/10.3390/rs11040403
Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P.,
Ramanan, D., Dollár, P., & Zitnick, C. L. (2014).
Microsoft coco: Common objects in context.
European Conference on Computer Vision, 740–
755.
Lu, Y., & Kwan, C. (2020). Deep Learning for Effective
Refugee Tent. IEEE Geoscience and
Remote Sensing Letters, 18(8), 16–20.
Médecins Sans Frontières. (2018). Renewed violence
threatens people and healthcare in Bria.
https://www.msf.org/central-african-republic-
renewed-violence-threatens-people-and-
healthcare-bria
Quinn, J. A., Nyhan, M. M., Navarro, C., Coluccia, D.,
Bromley, L., & Luengo-Oroz, M. (2018).
Humanitarian applications of machine learning with
remote-sensing data: Review and case study in
refugee settlement mapping. Philosophical
Transactions of the Royal Society A: Mathematical,
Physical and Engineering Sciences, 376(2128).
https://doi.org/10.1098/rsta.2017.0363
Ronneberger, O., Fischer, P., & Brox, T. (2015). U-net:
Convolutional networks for biomedical image
segmentation. International Conference on Medical
Image Computing and Computer-Assisted
Intervention, 234–241.
Simonyan, K., & Zisserman, A. (2015). Very deep
convolutional networks for large-scale image
recognition. 3rd International Conference on
Learning Representations, ICLR 2015 - Conference
Track Proceedings, 1–14.
Spröhnle, K., Tiede, D., Schoepfer, E., Füreder, P.,
Svanberg, A., & Rost, T. (2014). Earth observation-
based dwelling detection approaches in a highly
complex refugee camp environment - A
comparative study. Remote Sensing, 6(10), 9277–
9297. https://doi.org/10.3390/rs6109277
Tiede, D., Schwendemann, G., Alobaidi, A., Wendt, L., &
Lang, S. (2021). Mask R-CNN-based building
extraction from VHR satellite data in operational
humanitarian action: An example related to Covid-19
response in. Transactions in GIS, 1–15.
https://doi.org/10.1111/tgis.12766
UNHCR. (2020a). Kule refugee camp (Issue May).
UNHCR. (2020b). The Sustainable Development Goals
and the Global Compact on Refugees.
https://www.unhcr.org/5efcb5004.pdf
Van Beers, F., Lindström, A., Okafor, E., & Wiering, M.
A. (2019). Deep neural networks with intersection
over union loss for binary image segmentation.
ICPRAM 2019 - Proceedings of the 8th
International Conference on Pattern Recognition
Applications and Methods, Icpram, 438–445.
https://doi.org/10.5220/0007347504380445
Wickert, L., Bogen, M., & Richter, M. (2021). Lessons
Learned on Conducting Dwelling Detection on
VHR Satellite Imagery for the Management of
Humanitarian Operations. Sensors & Transducers,
249(2), 45–53.
Yakubovskiy, P. (2019). Segmentation Models. GitHub
Repository.
Zhou, X., Yao, C., Wen, H., Wang, Y., Zhou, S., He, W.,
& Liang, J. (2017). EAST: An efficient and accurate
scene text detector. Proceedings - 30th IEEE
Conference on Computer Vision and Pattern
Recognition, CVPR 2017, 2017-January, 2642–
2651. https://doi.org/10.1109/CVPR.2017.283
AGILE: GIScience Series, 3, 36, 2022 | https://doi.org/10.5194/agile-giss-3-36-2022
6 of 6
... The radar measurements show lower results of F1_Score is 0.795 and Io_U score is 0.659. In [30,31], it was experimented that many input channels have not provided the best outcomes, along with use of dedicated indices can yield poorer results than a complete application of all accessible channels. (0.893 and 0.805) shows the best combination of features and type of encoder type with highest metric. ...
Article
Floods are unexpected natural disasters that can have a major impact on human life, soil along bank erosion, damage vital infrastructure, road closures, economy standard, and society of various affected regions. An initial step of proper assessment is necessary for flood damage along with accurate measurements to easily restore essential damage of infrastructure, relief, and mitigation as quickly as possible. Nowadays, the rapid development of remote sensing images using deep learning as a most positive tool for accurately estimating the extent of overall flood detection surfaces. The monitoring of flood detections from remote sensing images still extends a few issues due to mostly varying from different weather changes conditions, cloud coverage areas that can have a limit to use of level of visible remote sensing satellite collected data. Moreover, Remote Sensing Satellite based observations may not always be mapped to the distribution's flood point peak, also it is very essential for both the flood extent and flood volume estimation. To overcome this challenge, we have presented a new remote sensing technology that integrates with a high resolution multi-spectral satellite data/information by using an advanced Deep Learning to accurately analyze remote sensing based observations. In our experiment, we use the European Space Agency (ESA) launched Sentinel type-1, Sentinel type-2 data and Digital Elevation Model (DEM) to accurately measure flood monitoring results. In our study, we reviewed a real example of the flood situation that happened in 2019 in Kolhapur. In our results, we evaluated a flood volume estimation at 0.0010 km3 in Kolhapur district. Finally, the proposed methodology provides an effective way to accurately motoring floods using low-cost satellite data and deep learning approaches. 
This project has the potential to improve the more accurate flood detection and mapping which can prevent an exactly timely response and immediate recovery efforts for flood surrounding affected areas.
... The reason why a model trained on a combination of all image channels and indices is not the best option lies in the fact that, when training neural network models, an excessively large feature space can lead to a more complex optimization problem and, consequently, to model divergence and poorer results. For instance, in studies [45,46], it was demonstrated that a greater number of input channels does not always lead to better outcomes, and the use of specialized indices can yield results worse than the application of all available channels. ...
Article
Full-text available
Floods are natural events that can have a significant impacts on the economy and society of affected regions. To mitigate their effects, it is crucial to conduct a rapid and accurate assessment of the damage and take measures to restore critical infrastructure as quickly as possible. Remote sensing monitoring using artificial intelligence is a promising tool for estimating the extent of flooded areas. However, monitoring flood events still presents some challenges due to varying weather conditions and cloud cover that can limit the use of visible satellite data. Additionally, satellite observations may not always correspond to the flood peak, and it is essential to estimate both the extent and volume of the flood. To address these challenges, we propose a methodology that combines multispectral and radar data and utilizes a deep neural network pipeline to analyze the available remote sensing observations for different dates. This approach allows us to estimate the depth of the flood and calculate its volume. Our study uses Sentinel-1, Sentinel-2 data, and Digital Elevation Model (DEM) measurements to provide accurate and reliable flood monitoring results. To validate the developed approach, we consider a flood event occurred in 2021 in Ushmun. As a result, we succeeded to evaluate the volume of that flood event at 0.0087 km3. Overall, our proposed methodology offers a simple yet effective approach to monitoring flood events using satellite data and deep neural networks. It has the potential to improve the accuracy and speed of flood damage assessments, which can aid in the timely response and recovery efforts in affected regions.
... Their experimental results indicated that FCN's semantic segmentation model was superior to CNNs, SAMs, and R-CNN masks in terms of overall accuracy by 4.49%, 3.54%, and 0.88%, respectively. Gao et al. [24] used the U-Net with the VGG16 as a backbone for their evaluations on the effectiveness of extracting refugee dwellings from the VHR imagery. Three types of pretrained weights were used in combination with the four bands. ...
Article
Full-text available
The improvement in computer vision, sensor quality, and remote sensing data availability make satellite imagery increasingly useful for studying human settlements. Several challenges remain to be overcome for some types of settlements, particularly for internally displaced populations (IDPs) and refugee camps. Refugee-dwelling footprints and detailed information derived from satellite imagery are critical for a variety of applications, including humanitarian aid during disasters or conflicts. Nevertheless, extracting dwellings remains difficult due to their differing sizes, shapes, and location variations. In this study, we use U-Net and residual U-Net to deal with dwelling classification in a refugee camp in northern Cameroon, Africa. Specifically, two semantic segmentation networks are adapted and applied. A limited number of randomly divided sample patches is used to train and test the networks based on a single image of the WorldView-3 satellite. Our accuracy assessment was conducted using four different dwelling categories for classification purposes, using metrics such as Precision, Recall, F1, and Kappa coefficient. As a result, F1 ranges from 81% to over 99% and approximately 88.1% to 99.5% based on the U-Net and the residual U-Net, respectively.
Article
Full-text available
For immediate humanitarian response, the presence of up-to-date information plays a key role by providing relevant information on the status and degree of severity of the object or phenomena of interest. When it comes to emergency responses, sometimes, there are situations where onsite ground observations are inefficient or impractical for various reasons. Very high-resolution satellite imagery from space unshackles this limitation. Advances in computer vision, are providing new opportunities for automatic information retrieval from imagery that this study has investigated the potential of instance segmentation model Mask Region-based convolutional neural network (Mask R-CNN) for dwelling detection in IDP/refugee camps. Once the model detection capability is tested, its temporal transferability is also assessed to extract dwelling features from newly obtained unseen images. Given the scarcity of training samples, the study has also investigated the relevance of transfer learning through domain adaptation from sample rich openly available datasets.
Within the constraints of operational work supporting humanitarian organizations in their response to the Covid‐19 pandemic, we conducted building extraction for Khartoum, Sudan. We extracted approximately 1.2 million dwellings and buildings, using a Mask R‐CNN deep learning approach from a Pléiades very high‐resolution satellite image with 0.5 m pixel resolution. Starting from an untrained network, we digitized a few hundred samples and iteratively increased the number of samples by validating initial classification results and adding them to the sample collection. We were able to strike a balance between the need for timely information and the accuracy of the result by combining the output from three different models, each aiming at distinctive types of buildings, in a post‐processing workflow. We obtained a recall of 0.78, precision of 0.77 and F1 score of 0.78, and were able to deliver first results in only 10 days after the initial request. The procedure shows the great potential of convolutional neural network frameworks in combination with GIS routines for dwelling extraction even in an operational setting.
The Normalized Difference Vegetation Index (NDVI), one of the earliest remote sensing analytical products used to simplify the complexities of multispectral imagery, is now the most popular index used for vegetation assessment. This popularity and widespread use stem from the fact that an NDVI can be calculated with any multispectral sensor that has a visible and a near-IR band. The decreasing cost and weight of multispectral sensors mean they can be mounted on satellite, aerial, and, increasingly, Unmanned Aerial Systems (UAS) platforms. While studies have found that the NDVI is effective for expressing vegetation status and quantifying vegetation attributes, its widespread use and popularity, especially in UAS applications, carry inherent risks of misuse by end users who have received little to no remote sensing education. This article summarizes the progress of NDVI acquisition, highlights the areas of NDVI application, and addresses the critical problems and considerations in using NDVI. The detailed discussion mainly covers three aspects: atmospheric effects, the saturation phenomenon, and sensor factors. The use of NDVI can be highly effective as long as its limitations and capabilities are understood. This consideration is particularly important to the UAS user community.
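As a brief illustration of the index summarized above, NDVI is computed per pixel as (NIR - Red) / (NIR + Red). The following is a minimal standard-library Python sketch; the function name `ndvi`, the choice to return `None` for a zero denominator, and the sample reflectance values are assumptions made for illustration and are not taken from the cited article.

```python
def ndvi(nir, red):
    """Return NDVI for one pixel from near-infrared and red reflectance.

    Returns None when NIR + Red is zero (an assumed convention for
    pixels with no signal in either band).
    """
    denom = nir + red
    if denom == 0:
        return None
    return (nir - red) / denom


# Hypothetical reflectance values, for illustration only.
vegetated_pixel = ndvi(0.5, 0.1)   # strong NIR response, positive NDVI
bare_pixel = ndvi(0.3, 0.3)        # equal response in both bands, NDVI of zero
```

Values close to 1 indicate a strong near-infrared response relative to red, which is typical of dense green vegetation; values near zero or below typically correspond to bare surfaces or water.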
As the global urban population grows due to the influx of migrants from rural areas, many cities in developing countries face the emergence and proliferation of unplanned and informal settlements. However, even though the rise of unplanned development influences planning and management of residential land-use, reliable and detailed information about these areas is often scarce. While formal settlements in urban areas are easily mapped due to their distinct features, this does not hold true for informal settlements because of their microstructure, instability, and variability of shape and texture. Therefore, detecting and mapping these areas remains a challenging task. This research will contribute to the development of tools to identify such informal built-up areas by using an integrated approach of multiscale deep learning. The authors propose a composite architecture for semantic segmentation using the U-net architecture aided by information obtained from a multiscale contourlet transform. This work also analyzes the effects of wavelet and contourlet decompositions in the U-net architecture. The performance was evaluated in terms of precision, recall, F-score, mean intersection over union, and overall accuracy. It was found that the proposed method has better class-discriminating power as compared to existing methods and has an overall classification accuracy of 94.9–95.7%.
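The evaluation metrics named in this and the neighbouring abstracts (precision, recall, F-score, intersection over union, and overall accuracy) can all be derived from pixel-level confusion counts. The sketch below is a minimal standard-library Python example for a single class; the function name `segmentation_metrics` and the counts are hypothetical, and the code assumes every denominator is non-zero. A mean IoU, as reported in some of these studies, would average the per-class IoU over all classes.

```python
def segmentation_metrics(tp, fp, fn, tn):
    """Compute common binary segmentation metrics from confusion counts.

    tp, fp, fn, tn are pixel counts of true positives, false positives,
    false negatives, and true negatives. Assumes non-zero denominators.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    iou = tp / (tp + fp + fn)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "iou": iou,
        "accuracy": accuracy,
    }


# Hypothetical counts for a dwelling class, for illustration only.
metrics = segmentation_metrics(tp=80, fp=20, fn=10, tn=890)
```

Because true negatives dominate when dwellings cover only a small share of the image, overall accuracy can stay high even when IoU is modest, which is one reason these studies report several metrics side by side.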
The availability and usage of optical very high spatial resolution (VHR) satellite images for efficient support of refugee/IDP (internally displaced people) camp planning and humanitarian aid are growing. In this research, an integrated approach was used for dwelling classification from VHR satellite images, which applied the preliminary results of a convolutional neural network (CNN) model as input data for an object-based image analysis (OBIA) knowledge-based semantic classification method. Unlike the standard pixel-based classification methods usually applied with CNN models, our integrated approach aggregates CNN results on separately delineated objects as the basic units of a rule-based classification, to include additional prior knowledge and spatial concepts in the final instance segmentation. An object-based accuracy assessment methodology was used to assess the accuracy of the classified dwelling categories at the single-object level. Our findings reveal accuracies of more than 90% for each of precision, recall, and F1-score. We conclude that integrating CNN models with OBIA capabilities can be considered an efficient approach for dwelling extraction and classification, integrating not only sample-derived knowledge but also prior knowledge about refugee/IDP camp situations, such as dwelling size constraints and additional context.
Humanitarian action has rapidly adopted Earth observation (EO) and geospatial technologies, shaping them according to its needs. Protracted crises and large-scale population displacements require up-to-date information in many facets of humanitarian action support, from mission planning, resource deployment and monitoring, to nutrition and vaccination campaigns, camp plotting, damage assessment, etc. Even though nearly all assets of remote sensing apply in such demanding scenarios, it remains a challenge to fully implement and sustain a trustworthy and reliable information service. This paper discusses achievements and open issues in the use and uptake of EO technology, from a technical and organisational point of view, motivated by an information service for Médecins Sans Frontières (MSF) and its extension to other NGOs' information needs in the humanitarian sector. With a focus on EO-based population estimation based on (semi-)automated dwelling counting from very high-resolution optical satellite imagery as well as the exploitation of data integration (including radar sensors), the paper also covers potential service elements with respect to environmental and ground- or surface water monitoring. It investigates workflow elements in relation to information extraction and delivery by illustrating a broad range of application scenarios, and discusses first operational solutions of a customized service portfolio.
The purpose of this study is to examine existing deep learning techniques for addressing class imbalanced data. Effective classification with imbalanced data is an important area of research, as high class imbalance is naturally inherent in many real-world applications, e.g., fraud detection and cancer detection. Moreover, highly imbalanced data poses added difficulty, as most learners will exhibit bias towards the majority class, and in extreme cases, may ignore the minority class altogether. Class imbalance has been studied thoroughly over the last two decades using traditional machine learning models, i.e. non-deep learning. Despite recent advances in deep learning, along with its increasing popularity, very little empirical work in the area of deep learning with class imbalance exists. Having achieved record-breaking performance results in several complex domains, investigating the use of deep neural networks for problems containing high levels of class imbalance is of great interest. Available studies regarding class imbalance and deep learning are surveyed in order to better understand the efficacy of deep learning when applied to class imbalanced data. This survey discusses the implementation details and experimental results for each study, and offers additional insight into their strengths and weaknesses. Several areas of focus include: data complexity, architectures tested, performance interpretation, ease of use, big data application, and generalization to other domains. We have found that research in this area is very limited, that most existing work focuses on computer vision tasks with convolutional neural networks, and that the effects of big data are rarely considered. Several traditional methods for class imbalance, e.g. data sampling and cost-sensitive learning, prove to be applicable in deep learning, while more advanced methods that exploit neural network feature learning abilities show promising results. The survey concludes with a discussion that highlights various gaps in deep learning from class imbalanced data for the purpose of guiding future research.
Image processing via convolutional neural networks (CNNs) has developed rapidly in remote sensing. Moreover, techniques for accurately extracting building footprints from remotely sensed images have attracted considerable interest owing to their wide variety of common applications, including monitoring natural disasters and urban development. Extraction of building footprints can be performed easily by semantic segmentation using U-Net-like CNN architectures. However, obtaining precise boundaries of segmentation masks remains challenging due to various impediments surrounding target objects. In this study, we propose a method to elaborate the edges of buildings detected in remotely sensed images to enhance the boundaries of segmentation masks. The proposed method adopts holistically nested edge detection (HED), which extracts edge features at the encoder of a given architecture. In the proposed boundary enhancement (BE) module, an extracted edge and a segmentation mask are combined, sharing mutual information. To enable the proposed method to adapt efficiently to a wide variety of conditions, we design a distinctive approach adopting a HED unit and BE module, which is applicable to various semantic segmentation networks containing encoder-decoder structures. Experiments were conducted on five different datasets (DeepGlobe, Urban3D, WHU [high-resolution (HR), low-resolution (LR)], and Massachusetts). The results demonstrate that our proposed approaches improved on the performance of prior methods for extracting building footprints. Comparative experiments were conducted on various backbone architectures including U-Net, ResUNet++, TernausNet, and U-shape spatial pyramid pooling (USPP) to ensure the effectiveness of the proposed method. Based on various evaluation metrics and qualitative analysis, our results show that the proposed method achieved improved performance compared with prior methods for all datasets and backbone networks.
Satellite images are now used for identifying built-up areas, urban planning, disaster management, insurance and tax assessment, and many other socio-economic activities. Extracting accurate building footprints in densely populated urban areas from medium-resolution satellite images is still a challenging task that requires the development of new methods. In this paper, a novel Dilated-ResUnet deep learning architecture for building extraction from Sentinel-2 satellite images is proposed. The proposed model has been tested on three novel building datasets prepared for three densely populated cities of India (Delhi, Hyderabad, and Bengaluru) using Sentinel-2 satellite images and Planet OSM. The first FCC (false colour composite) dataset was prepared by merging the NIR, Red, and Green bands; the second FCC dataset by merging the NIR, Red, Green, and Blue bands; and the third, a TCC (true colour composite) dataset, by merging the Red, Green, and Blue bands. The proposed architecture was applied to both FCC datasets and the TCC dataset separately, and the model obtained better building extraction results using the FCC (NIR, Red, Green) dataset. Input satellite image enhancement and extensive experiments to identify the optimal deep learning hyper-parameters using the FCC spatial dataset were also carried out to further improve the performance of the proposed model. The experimental results reveal that the proposed model outperformed the state-of-the-art models available in the literature, achieving an F1-score of 0.4718 and a mean IoU of 0.582 for building extraction from Sentinel-2 satellite images. The outcome of this research can be utilized for urban planning and management and to generate more ground truth for Sentinel-2 satellite images, which can in turn be useful for other societal applications.
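The false-colour and true-colour composites described above amount to selecting and reordering bands from a multispectral stack. The following is a minimal standard-library Python sketch of that selection; the function name `select_bands`, the single-letter band codes, and the toy pixel values are hypothetical, and a real workflow would typically operate on raster arrays rather than nested lists.

```python
def select_bands(image, source_order, target_order):
    """Reorder or subset the bands of every pixel in an image.

    image: list of rows, each row a list of pixels, each pixel a tuple
           of band values stored in `source_order`.
    source_order: band codes as stored, e.g. "RGBN".
    target_order: band codes wanted in the output, e.g. "NRG".
    """
    # Position of each requested band inside the stored pixel tuple.
    index = [source_order.index(band) for band in target_order]
    return [[tuple(pixel[i] for i in index) for pixel in row] for row in image]


# Toy 1 x 2 image with bands stored as (Red, Green, Blue, NIR).
rgbn_image = [[(10, 20, 30, 40), (11, 21, 31, 41)]]

fcc = select_bands(rgbn_image, "RGBN", "NRG")  # false colour: NIR, Red, Green
tcc = select_bands(rgbn_image, "RGBN", "RGB")  # true colour: Red, Green, Blue
```

Keeping the band order explicit in one place makes it straightforward to generate each three-band or four-band variant from the same source image when comparing band combinations.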