Visual and textual analysis of social media and satellite images
for flood detection @ multimedia satellite task MediaEval 2017
Konstantinos Avgerinakis1, Anastasia Moumtzidou1, Stelios Andreadis1,
Emmanouil Michail1, Ilias Gialampoukidis1, Stefanos Vrochidis1, Ioannis Kompatsiaris1
1Centre for Research & Technology Hellas - Information Technologies Institute, Greece
This paper presents the algorithms that the CERTH team deployed to tackle disaster recognition tasks, and more specifically Disaster Image Retrieval from Social Media (DIRSM) and Flood Detection in Satellite Images (FDSI). Visual and textual analysis, as well as late fusion of their similarity scores, were deployed on social media images, while colour analysis of the RGB and near-infrared channels of satellite images was performed in order to discriminate flooded from non-flooded images. A Deep Convolutional Neural Network (DCNN), DBpedia Spotlight and combMAX were used to tackle DIRSM, while Mahalanobis distance-based classification and morphological post-processing were applied to deal with FDSI.
1 Introduction
Security, surveillance and, more specifically, disaster prediction and classification from social media and satellite sources have raised a lot of interest in computer science over the last decade. The unobtrusive and abundant nature of these data has rendered them one of the most valuable sources for extracting early warnings or identifying an ongoing or imminent disaster.
The Multimedia Satellite task is a MediaEval challenge that comprises two subtasks: (a) Disaster Image Retrieval from Social Media (DIRSM) and (b) Flood Detection in Satellite Images (FDSI). DIRSM provides a large amount of social media images (YFCC100M dataset) and their metadata (Flickr), while FDSI comprises a large number of four-channel satellite images (three channels for the RGB spectrum and one for the near-infrared) from PlanetDB [5]. Both tasks ask participants to leverage any available technology to determine whether a flood event occurs in the provided test data. As far as visual data are concerned, a flood event is considered to occur when an image shows an "unexpected high water level in industrial, residential, commercial and agricultural areas". The reader is referred to [1] for further information about the contest and the provided data.
In this work, CERTH presents its algorithms for the DIRSM and FDSI subtasks. For flood recognition in images, CERTH uses the output of the last pooling layer of a trained GoogleNet [4] as a global keyframe representation and trains an SVM classifier to recognize images that are related to a flooding event. Textual information is also retrieved by leveraging the metadata of the social media images, using the DBpedia Spotlight annotation tool [2]. Both of these modalities are fused with a novel multimodal approach which combines non-linear graph-based fusion [3] with combMAX scoring. For the FDSI subtask, CERTH performs a Mahalanobis distance classification and applies several morphological and adaptive filters, so as to separate flood from non-flood areas inside the satellite image scene.

Copyright held by the owner/author(s).
MediaEval'17, 13-15 September 2017, Dublin, Ireland

Figure 1: Block diagram of our multimodal retrieval system

2 Approach
2.1 Flood detection from social media (DIRSM)
Social media were crawled in this task so as to acquire images and text about flood scenarios. For this purpose, two modalities were deployed and fused with a non-linear graph-based fusion approach.
The rst modality concerned visual analysis and more specif-
ically ood detection inside image samples by adopting a Deep
Convolutional Neural Network (DCNN) framework. GoogleNet [
was trained on 5055 ImageNet concepts, and the output of the last
pooling layer with dimension 1024 was used as a global keyframe
representation. The provided development set was then splitted
into two subsets and used to train an SVM classier and dene its
optimal parameters: t(denes the kernel type) and g(gamma in
kernel function). The best results were achieved for
mial function) and
5. The test environment that CERTH built,
included the evaluation of the precomputed features provided from
the Multimedia-Satellite challenge (i.e. acc, gabor, fcth, jcd, cedd,
eh, sc, cl, and tamura) and DCNN features that were produced from
GooдLeN et
network by fusing the features from
the convolutional layers 3
and 3
. SVM classiers were trained for
all of these features and results showed that the proposed DCNN
feature outperformed most of them signicantly.
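The visual pipeline above (a fixed DCNN feature vector per image, fed to an SVM) can be sketched as follows. The paper trains its classifier on 1024-dimensional pool5 features; here a minimal Pegasos-style linear SVM trained on toy 2-dimensional vectors stands in for that step, so all data, dimensions and hyperparameters below are illustrative only, not the paper's actual configuration.

```python
import random

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style sub-gradient training of a linear SVM.

    X: list of feature vectors (stand-ins for pool5 activations),
    y: labels in {-1, +1} (flood / non-flood).
    """
    rng = random.Random(seed)
    d = len(X[0])
    w = [0.0] * d
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(len(X)), len(X)):
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            # Shrink weights; add the hinge-loss sub-gradient if the margin is violated.
            w = [(1 - eta * lam) * wj for wj in w]
            if margin < 1:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w

def predict(w, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1

# Toy 'flood' vs 'non-flood' feature vectors.
X = [[2.0, 0.1], [1.5, 0.3], [-1.8, -0.2], [-2.2, 0.0]]
y = [1, 1, -1, -1]
w = train_linear_svm(X, y)
print([predict(w, x) for x in X])
```

In practice one would extract the pool5 activations with a deep learning framework and train with a tuned kernel SVM, as the paper does; the sketch only shows the feature-vector-in, label-out structure of that stage.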
The second modality concerns the detection of flood-related text in social media metadata. For this purpose, DBpedia Spotlight [2] was adapted so as to detect flood, water and related keyphrases that were acquired from the training set metadata (i.e. title, description, user tags). A disambiguation algorithm followed, comparing the aforementioned phrases with the collection using Jaccard similarities.
The similarity scores of the two modalities were then combined using a late fusion approach that employs non-linear graph-based techniques (random walk, diffusion-based) in a weighted non-linear way [3]. The top-ranked multimodal objects are filtered with respect to textual concepts, leading to similarity matrices and query-based similarity vectors. More specifically, 10 positive examples were selected from the training set as queries, so as to acquire 10 ranked lists, and combMAX late fusion was used to obtain the final list of flood-relevant multimodal objects. The overall block diagram of this approach is depicted in Fig. 1.
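Two of the ingredients described above, Jaccard similarity over keyword sets and combMAX fusion of per-query score lists, can be sketched in a few lines. The item names and scores below are made up for illustration; the paper's full graph-based fusion [3] is considerably more elaborate than this minimal stand-in.

```python
def jaccard(a, b):
    """Jaccard similarity between two keyword sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def comb_max(ranked_lists):
    """combMAX late fusion: each item keeps its best score across all lists."""
    fused = {}
    for scores in ranked_lists:
        for item, s in scores.items():
            fused[item] = max(s, fused.get(item, float("-inf")))
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical similarity scores from two query runs over three objects.
query_runs = [
    {"img1": 0.9, "img2": 0.4, "img3": 0.1},
    {"img1": 0.3, "img2": 0.7, "img3": 0.2},
]
print(comb_max(query_runs))   # img1 keeps 0.9, img2 keeps 0.7, img3 keeps 0.2

flood_terms = {"flood", "water", "river"}
print(jaccard(flood_terms, {"flood", "rain"}))  # 1 shared / 4 total = 0.25
```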
2.2 Flood detection from satellite images (FDSI)
Satellite images were collected from PlanetDB [5] so that we could evaluate our localization algorithm in real-case scenarios. Localization is based on a Mahalanobis classification framework and post-processing morphological operations.
Mahalanobis distances with stratified covariance estimates were computed to train our classifier by randomly selecting 10000 samples (RGB and infrared pixels) from each of the 7 sets of satellite images, leading to a final population of 70000 samples. Linear, diagonal linear, quadratic and diagonal quadratic discriminant functions were also computed, but Mahalanobis distances achieved the highest classification results. For every image of the testing set, all pixels of the image were extracted, creating a four-dimensional (R, G, B, NIR) testing set consisting of 102400 samples (320 × 320 pixels) per image. The final outcome was a binary mask that denoted 1 for flooded pixels and 0 for non-flooded ones.
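The per-pixel decision can be sketched as a nearest-Mahalanobis-class rule over (R, G, B, NIR) vectors. This is a simplified two-class stand-in for the stratified-covariance classifier described above; the class means and pixel values below are synthetic, chosen only so that water is darker in the NIR channel.

```python
import numpy as np

def fit_class_stats(X):
    """Mean and inverse covariance for one class of (R, G, B, NIR) samples."""
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    return mu, np.linalg.inv(cov)

def mahalanobis_mask(pixels, stats_flood, stats_dry):
    """Label a pixel 1 (flood) if it is closer, in squared Mahalanobis
    distance, to the flood class than to the non-flood class."""
    def d2(X, mu, icov):
        diff = X - mu
        # Per-row quadratic form diff^T * icov * diff.
        return np.einsum("ij,jk,ik->i", diff, icov, diff)
    return (d2(pixels, *stats_flood) < d2(pixels, *stats_dry)).astype(np.uint8)

rng = np.random.default_rng(0)
# Synthetic training pixels: flood water darker in NIR, dry land brighter.
flood_px = rng.normal([60, 80, 90, 30], 10, size=(1000, 4))
dry_px = rng.normal([120, 110, 90, 140], 10, size=(1000, 4))
stats = fit_class_stats(flood_px), fit_class_stats(dry_px)

test_px = np.array([[60.0, 80.0, 90.0, 30.0], [120.0, 110.0, 90.0, 140.0]])
print(mahalanobis_mask(test_px, *stats))  # [1 0]
```

Flattening a 320 × 320 image to a (102400, 4) array and calling `mahalanobis_mask` on it would yield the binary mask described above.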
Post-processing was then deployed on the acquired binary masks in order to eliminate erroneous areas that resulted from the noisy nature of the dataset. A global filter was initially applied to the binary mask, so as to discard the flood-denoted pixels when, as a whole, they did not surpass 5% of the image size. A local filter followed, eliminating connected components of flood-denoted areas that did not surpass 10 pixels in size. Image dilation and erosion were finally applied around each pixel and its surrounding area (a circular cell with a radius of 4 pixels) to eliminate small areas that were falsely denoted as flood while preserving the larger ones.
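The global and local filters described above can be sketched as follows (the final dilation/erosion step with the 4-pixel-radius disk is omitted for brevity). The thresholds follow the text: 5% of the image size for the global filter and 10 pixels for the connected-component filter; the mask itself is a toy example.

```python
import numpy as np

def postprocess(mask, global_frac=0.05, min_cc=10):
    """Clean a binary flood mask: zero the whole mask if flooded pixels
    cover less than `global_frac` of the image, then remove connected
    components smaller than `min_cc` pixels (4-connectivity)."""
    mask = mask.astype(np.uint8).copy()
    if mask.sum() < global_frac * mask.size:
        return np.zeros_like(mask)
    h, w = mask.shape
    seen = np.zeros_like(mask, dtype=bool)
    for sy in range(h):
        for sx in range(w):
            if mask[sy, sx] and not seen[sy, sx]:
                # Flood-fill one connected component.
                stack, comp = [(sy, sx)], []
                seen[sy, sx] = True
                while stack:
                    y, x = stack.pop()
                    comp.append((y, x))
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            stack.append((ny, nx))
                if len(comp) < min_cc:
                    for y, x in comp:
                        mask[y, x] = 0
    return mask

# Toy 20x20 mask: one 8x8 flooded block plus a 2-pixel speckle.
m = np.zeros((20, 20), dtype=np.uint8)
m[2:10, 2:10] = 1          # 64 pixels: kept (>= 10 px, and total >= 5% of 400)
m[15, 15] = m[15, 16] = 1  # 2-pixel speckle: removed by the local filter
print(postprocess(m).sum())  # 64
```

In a production setting the same filters are one-liners with `scipy.ndimage.label` and morphological operators; the explicit flood-fill here just keeps the sketch dependency-free beyond NumPy.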
3 Results
Social media results for flood situations (DIRSM) are gathered in Table 1. Two retrieval approaches were used: (a) a single cutoff scheme that returns the top-480 most similar samples, and (b) a multiple cutoff scheme that combines the results from 4 different thresholds by averaging their scores so as to conclude a final list.
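One plausible reading of the multiple cutoff scheme (the exact threshold values are elided in the text) is to average, per item, the binary retrieval votes it receives across several top-k cutoffs of the same ranked list; items that survive every cutoff score highest. The cutoffs and scores below are purely illustrative.

```python
def multi_cutoff(scores, cutoffs):
    """Average an item's 'retrieved' votes across several top-k cutoffs
    of a single ranked list of item -> similarity score."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    votes = {item: 0.0 for item in scores}
    for k in cutoffs:
        for item in ranked[:k]:
            votes[item] += 1.0 / len(cutoffs)
    return sorted(votes.items(), key=lambda kv: kv[1], reverse=True)

scores = {"a": 0.9, "b": 0.8, "c": 0.5, "d": 0.2}
print(multi_cutoff(scores, cutoffs=[1, 2, 4]))  # "a" survives all three cutoffs
```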
It is obvious that multiple cutoffs worked better than a single one. Furthermore, we can observe that the visual modality surpassed the textual one by far, which is mainly attributed to the fact that some keywords related to flood and water may be found in several irrelevant contexts, leading text retrieval to very low accuracy rates. Fusion is also affected by the low performance of the textual modality, which cannot leverage or complement the visual information in the final deduction, leading to lower accuracy rates than the visual modality alone.

Table 1: CERTH results in DIRSM task

Modalities   single cutoff   several cutoffs
Visual       78.82%          92.27%
Textual      36.15%          39.90%
Fusion       68.57%          83.37%

Table 2: CERTH results in FDSI task

loc01    loc02    loc03    loc04    loc05    loc06    loc07
81.71%   68.33%   82.08%   47.01%   45.84%   64.92%   56.27%

Results from satellite images (FDSI) are presented in Table 2. The accuracy rates are quite diverse: we acquired very high rates in some locations, such as loc01 and loc03, while others, such as loc04 and loc05, scored too low. From our point of view, this is attributed to the colour nature of the data in these areas: in the former, the separation of water was clear, while in the latter, non-flood areas had a colour similar to the flooded ones. Furthermore, the groundtruth masks marked some non-flood pixels as flood and, since our algorithm is pixel-wise by nature, these were misclassified as positive samples, leading to poorly performing models. Overall, our classifier led to a 74.67% localization accuracy rate.
4 Conclusion
The Multimedia Satellite challenge gave us the opportunity to test our algorithms in real-case disaster scenarios. Social media and satellite sources proved extremely valuable and helped us separate flood scenarios from other ones. The high average precision rate that the visual features achieved proves that the computer vision community can become ever more helpful in disaster detection, and it is now clear that it can overcome the ambiguity that text can introduce into the decision. On the other hand, satellite images proved quite noisy and require deeper investigation in the future.
As future work, we plan to adopt deeper techniques from the literature to recognize and discriminate places from each other, and to investigate hybrid representations that combine shallow with deep features, so as to achieve even higher precision rates in the visual part of the system. Text approaches should undoubtedly be revised and tailored to disaster-related scenarios, while fusion approaches that include "semantic filtering" stages based on textual concepts will also be revised. Regarding FDSI, we plan to build a shallow/deep representation scheme that will leverage both texture (i.e. LBP) and deep features, so as to learn to separate flood from non-flood areas even more effectively.
Acknowledgments
This work is supported by the beAWARE project, partially funded by the European Commission (H2020-700475).
References
[1] Benjamin Bischke, Patrick Helber, Christian Schulze, Venkat Srinivasan, and Damian Borth. 2017. The Multimedia Satellite Task at MediaEval 2017. (2017).
[2] Joachim Daiber, Max Jakob, Chris Hokamp, and Pablo N. Mendes. 2013. Improving Efficiency and Accuracy in Multilingual Entity Extraction. In Proceedings of the 9th International Conference on Semantic Systems.
[3] Ilias Gialampoukidis, Anastasia Moumtzidou, Dimitris Liparas, Theodora Tsikrika, Stefanos Vrochidis, and Ioannis Kompatsiaris. 2017. Multimedia retrieval based on non-linear graph-based fusion and partial least squares regression. Multimedia Tools and Applications (2017).
[4] Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott E. Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. 2015. Going deeper with convolutions. In CVPR. IEEE Computer Society, 1-9.
[5] Planet Team. 2017. Planet Application Program Interface: In Space for Life on Earth. (2017).