Cite: Hao, H., & Wang, Y.* (2020). Leveraging multimodal social media data for rapid
disaster damage assessment. International Journal of Disaster Risk Reduction, 51, 101760.
Leveraging Multimodal Social Media Data for Rapid Disaster Damage Assessment
1. PhD Student, Department of Urban and Regional Planning and Florida Institute for Built
Environment Resilience, College of Design, Construction and Planning, University of Florida,
1480 Inner Road, Gainesville, FL, 32601, USA; Email:
2*Assistant Professor, Department of Urban and Regional Planning and Florida Institute for Built
Environment Resilience, University of Florida, P.O. Box 115706, Gainesville, FL 32611, U.S.A.
(corresponding author); Tel: +1(352) 294-1484; E-mail:; ORCID: 0000-0002-
Abstract
During the disaster response and recovery stages, stakeholders including governmental agencies
collect information on a disaster's impact to inform disaster relief, resource allocation, and
infrastructure reconstruction. Damage data collected through field surveys and satellite imagery
are often unavailable immediately after a disaster, yet rapid information is crucial for time-sensitive
decision making. Some researchers have turned to social media for real-time situational
information about disaster damage. However, existing damage assessment research has mostly
focused on a single data modality (i.e. text or image) and made coarse-grained predictions, which
limits its practical value in assisting city-level operations. Many studies have outlined the difficulty
of retrieving useful information from vast and noisy social media data. We therefore propose a
data-driven method to locate and assess disaster damage with massive multimodal social media
data. The method splits the two data modalities, i.e. texts and images, and processes them with two
modules. The image analysis module uses five machine learning classifiers organized in a
hierarchical structure; the text analysis module uses a keyword search-based method. Together,
they mine various damage information, including hazard types (e.g. wind and flood), hazard
severities, and damage types (e.g. infrastructure destruction and housing damage). The method is
applied to and evaluated on two recent hurricane events. In practice, it acquires damage
information throughout extreme events and supplements conventional damage assessment
methods, enabling rapid damage information access and disaster response for both first responders
and the general public. This research effort contributes to more transparent and effective disaster
relief activities.
Keywords: computer vision; damage assessment; disaster management; multimodal data analysis;
social media; text mining.
1. Introduction
Each year, different types of disasters, including wildfires, storms, droughts, and flooding, jointly
cause economic losses of up to hundreds of billions of dollars and claim many lives in the United
States (Smith, 2019). Partly due to the impact of climate change and global warming, the past
decade has witnessed more frequent and more severe natural disasters worldwide (McWethy et
al., 2019; Smith, 2019). In response, society invests tremendous effort in minimizing the negative
impacts of natural disasters. Disaster management is thus devoted to reducing disaster risk and
relieving human suffering through four continuous phases, i.e. mitigation, preparedness, response,
and recovery (Gordon, 2015). Disaster damage data, such as the location and extent of damaged
facilities, are critical for disaster management operations and are often collected in the response
and recovery phases. The collected data help official agencies convey situational information to
the general public, evacuate and rescue people in affected areas, allocate resources, and plan future
repair and reconstruction.
Disaster damage data are conventionally collected with field surveys, post-disaster satellite
imagery, or Unmanned Aerial Vehicle (UAV) imagery (Erdelj et al., 2017; FEMA, 2016; Yu et
al., 2018). However, none of these authoritative sources can be accessed immediately after a
disaster occurs because atmospheric or environmental conditions restrict the deployment of labor
and equipment (Zhong et al., 2016). Yet timely knowledge of disaster environments and situations
is crucial for emergency managers to intervene early in the disaster response phase and to plan
time-sensitive operations such as rescuing affected people and optimizing shelter locations.
The need for timely disaster situation knowledge has boosted research exploring user-generated
data such as social media posts for disaster management applications (Anson et al., 2017; Granell
& Ostermann, 2016). Approaches have been developed to analyze the content of social media
posts with text mining and computer vision techniques. Compared to conventional data sources,
social media provides real-time streaming data in various formats. Thus, data can be collected
throughout a disaster event, and the analysis can be performed rapidly, even in real time. In the
context of assessing disaster damage, social media texts can describe the damage that affected
people experienced or observed, while images may capture ground-level scenes comparable to
field assessors' perceptions. Social media data are also less susceptible to adverse environmental
conditions and do not require the extra deployment of field assessors or UAV pilots for data
collection.
Previous studies assessing disaster damage from social media data mostly focus on a single data
modality (e.g. textual or visual data) and mine a single type of damage information, e.g. wildfire
perimeters or inundation depth (Zhong et al., 2016; Eilander et al., 2016). Although massive social
media data are generally considered a source of big data, damage assessment only uses posts that
are on-topic, posted in affected areas, and include location information. Researchers may therefore
still experience data shortages when searching for specific information at fine spatial or temporal
scales. Damage-related information can reside in either texts or images, and people from different
groups may prefer different social media platforms. Prior studies based on a single modality and
source did not maximize the use of available real-time social media data. As a result, these studies
often aggregated and reported results at coarse spatial scales (e.g. state and regional level) (Deng
et al., 2016; Wang & Taylor, 2018), which is insufficient for practical applications such as assisting
city-level emergency operations.
Extracting useful information from vast and noisy background messages for fine-grained damage
mapping is challenging. Moreover, useful information can be delivered in different formats and
describe disaster damage from various perspectives. In this research, we propose a data-driven
method to automatically analyze massive raw crawled social media data and extract various
damage information from social media texts and images. The method divides the overall task into
steps and implements them with two modules. Each step is responsible for a single task such as
filtering, classification, or keyword search. Jointly, they extract various damage information from
social media posts, including hazard types (i.e. wind and flood), hazard severities, and specific
damage types such as power outage, infrastructure destruction, and house/building damage. We
applied the proposed method to pinpointing damage locations and assessing damage extent in two
recent hurricane cases, i.e. the city of Miami impacted by Hurricane Irma and the city of Houston
impacted by Hurricane Harvey. The proposed method offers an additional data acquisition
approach that supplements conventional damage assessment. It identifies eyewitness reports (i.e.
social media posts) of affected people that show the impact of disasters on humans and
communities, which could be useful for humanitarian operations and emergency decision making.
2. Relevant Work
An array of approaches is used to acquire disaster damage information. In practice, official
agencies send human assessors to disaster sites to collect detailed damage information such as the
location, number, type, and severity of damaged buildings and infrastructure (FEMA, 2016). The
field survey yields reliable and detailed damage information but is labor-intensive and time-
consuming; it also inevitably exposes human assessors to dangerous environments. Recently, some
researchers have leveraged high-resolution satellite and UAV imagery for rapid assessment
(Jordan, 2015; Novikov et al., 2018). The broad-view imagery provides overviews of disaster-
affected areas, and the high resolution enables damage assessment for individual structures.
However, satellite and UAV imagery is not always available in the immediate aftermath of a major
disaster. The deployment of UAVs must consider technical issues such as system reliability, power
supply, and physical load (Erdelj et al., 2017). Both data collection methods can be severely
affected by the adverse weather and atmospheric conditions that accompany natural disasters, such
as dense clouds, heavy rains, and strong winds (Erdelj et al., 2017; Robinson et al., 2019). With
the increasingly important role of social media and other Web 2.0 applications in disaster
management, some researchers have taken advantage of user-generated data and considered
citizens as "human sensors" with five senses (i.e. touch, sight, hearing, taste, and smell) to perceive
the external environment (Goodchild, 2007). This novel conceptualization opens new avenues for
disaster management and damage assessment research. We summarize damage assessment work
leveraging social media data in four categories according to the analyzed data modalities:
activity-based, text-based, image-based, and multimodal or fused methods.
2.1 Activity-based methods
Activity-based methods do not investigate the textual or image content of social media posts.
Instead, they depict damage severity indirectly with metrics derived from tweeting frequencies.
Activity-based methods are thus computationally simple and often output results at coarser spatial
levels such as ZCTA (U.S. Census Bureau, n.d.), city, and county level. For example, Kryvasheyeu
et al. (2016) identified a significant positive correlation between the number of hurricane-related
tweets and the economic loss in New Jersey during Hurricane Sandy. Samuels et al. (2018)
considered the possible loss of power or internet connection caused by a disaster and used the
variation of tweeting frequencies as the metric of damage severity. In the work of Zou et al. (2019),
the ratio of disaster-related tweets to background tweets serves as the damage severity metric. In
general, activity-based methods provide limited disaster damage information regarding both the
spatial and the contextual details of the damage.
2.2 Text-based methods
Damage assessment methods based on social media textual posts have been studied most
extensively. Many have used sentiment analysis, topic modeling, and keyword search to retrieve
relevant information. For instance, Wang & Taylor (2018) found a significant negative correlation
between average sentiment scores and earthquake intensities in disaster-affected areas, while other
studies on hurricanes found little association between social media text sentiments and damage
severities (Kryvasheyeu et al., 2016; Zou et al., 2019). Some researchers used topic modeling to
detect and locate trending events with geotagged social media texts (Resch et al., 2018), although
this approach may not pinpoint the damage location accurately. A few recent studies included
geospatial characteristics in topic modeling to locate and track small-scale crises during disasters
(Wang & Taylor, 2019; Yao & Wang, 2019).
Additionally, some keyword search-based studies developed pre-defined keyword lists or tables
to identify useful textual posts. For example, Eilander et al. (2016) used keywords such as
"#(number) cm" and "#(number) m" to mine tweets containing flooding depth information and
constructed a situational inundation map accordingly. Smith et al. (2017) treated tweets including
words such as "knee-deep" and "waist-deep" as qualified reports for verifying simulated flooding
maps. Deng et al. (2016) divided disaster damage and risk information into many subcategories
such as infrastructure destruction, supply demands, and affected activities. With keyword lists
developed for each subcategory, their method can identify and categorize qualified posts for more
comprehensive situational knowledge. In general, keyword search-based methods look for
particular information in the social media text corpus. The colloquial nature of social media
messages makes it expensive to enumerate all possible keywords and phrases related to a topic,
although some researchers have spent substantial time and effort developing large lexicon tables
for tweet collecting and mining (Temnikova et al., 2015).
2.3 Image-based methods
Compared to social media texts, images convey more objective and more useful information (Bica
et al., 2017), but far fewer studies have explored the use of social media images and computer
vision (CV) for damage assessment. The few existing studies relied on transfer learning and
convolutional neural networks (CNNs) to classify damage severity levels and locate damage
contents (Alam et al., 2018b; Nguyen et al., 2017). Transfer learning leverages pre-trained deep
learning models to solve new problems. The pre-training is often conducted on large datasets such
as the well-known ImageNet dataset, which contains more than ten million images in twenty
thousand categories; the pre-trained model is then re-trained on datasets annotated for the new
problem. Both Alam et al. (2018) and Nguyen et al. (2017) annotated more than one hundred
thousand social media images for model retraining. ImageNet is a dataset for object classification
(e.g. pizza, bird, and soccer); however, some researchers argued that social media images include
more "scenes" (e.g. highways, bedrooms, and parks) than "objects". They therefore experimented
with scene-level features obtained from models pre-trained for scene-related CV tasks such as
scene recognition or scene parsing (Ahmad et al., 2019). Scene-level features are also used in some
built environment studies (Liu et al., 2017).
2.4 Multi-modal and fused methods
A few prior works harnessed multimodal social media data for damage type classification
(Mouzannar et al., 2018) and flood detection (Lopez-Fuentes et al., 2017; Huang et al., 2019).
These works treated pre-trained CNNs as feature generators and used two CNNs to extract features
from social media images and texts separately; the extracted features were then concatenated for
the classification task. This concatenation did not consider the correlation between different
modalities. Pouyanfar et al. (2019) accounted for this by fusing the visual and audio features into
a ranking matrix with Multiple Correspondence Analysis, and the final decision model was trained
on the fused matrix. Similarly, Lazaridou et al. (2015) fused the visual and textual features in a
cross-modal mapping matrix for a multimodal skip-gram model, which was used for image labeling
and retrieval. Some research learned the common semantic information of different modalities by
forcing models to learn similar representations for different modalities (You et al., 2016; Feng et
al., 2014), implemented by adjusting the objective function.
Research efforts have also integrated heterogeneous data sources for damage estimation. For
example, Smith et al. (2017) integrated rainfall intensity data and social media posts for rapid flood
mapping. Their method iteratively simulated flooding maps with rainfall intensity data and used
valid social media posts for verification; the simulation result that best conformed to the social
media posts was adopted as the final output. In a method proposed by Huang et al. (2018), satellite
imagery, high-resolution elevation data, and crowdsourced reports were integrated for flood
mapping. The method calculated a flooding probability layer for each validated crowdsourced
report based on the elevation data and satellite imagery; the output flooding map was the weighted
combination of the different flooding probability layers.
In summary, we identified several research gaps in existing social media-based damage assessment
work. First, many methods relied on a single data modality, and text-based methods dominate.
Social media texts are limited by short length, high subjectivity, low information quality, and
colloquial expression (Agarwal & Yiliyasi, 2010), so damage information mined from textual
posts alone can be inadequate and unreliable. Consequently, many activity- and text-based methods
aggregate and report results at state, county, and ZCTA level, using averaging or summation to
alleviate individual estimation errors. Second, estimating disaster damage from social media
images remains challenging due to the loosely-defined forms of disaster damage, the poor signal-
to-noise ratio of raw crawled social media images, and the subjectivity of damage severity levels
(Nguyen et al., 2017). Although some researchers fused the two data modalities in a single model
for prediction, not many social media posts contain both modalities. In addition, the semantic
contents of social media texts and their associated images are weakly correlated in many cases
(Vadicamo et al., 2017). Fusing textual and visual features may not yield promising results in these
cases and may overlook valuable damage information that resides in only one modality. Moreover,
the image-based and multimodal models developed for analyzing social media images have been
tested mostly on lab-developed datasets comprising human-sorted images with relatively balanced
sample distributions. Raw crawled social media images, in contrast, are extremely unbalanced and
include only a small portion of on-topic images for damage assessment (Ning et al., 2020). Models
that perform well on lab-developed datasets still need to be validated on raw crawled social media
images.
Therefore, we propose a method that takes raw crawled multimodal social media data (i.e. both
textual messages and images) and outputs various damage information. Instead of fusing the
modalities in one model, we split and analyze them in different modules, acknowledging that
textual contents and images often convey different levels of information: textual messages are
subjective in describing damage situations, while images cannot convey abstract information such
as a power outage. We use a pre-trained ResNet18 CNN to extract scene-level features from
Twitter and Flickr images. For textual messages, we adapt a keyword search-based method,
considering that disaster damage occupies only a small portion of the raw crawled texts. Other
methods either infer damage severities with indirect metrics (e.g. activity-based methods and
sentiment analysis) or group textual messages for analysis (e.g. topic modeling), in which damage
signals may be drowned out by other dominating disaster-related topics.
3. Multimodal Data-Driven Damage Assessment Method
The proposed method aims to automatically locate and summarize damage information from the
visual and textual contents of massive social media posts. It consists of four modules (Figure 1).
The Data Input module is responsible for data collection. The collected images and texts are
separated for analysis, considering that their contents are often weakly correlated and describe
damage from different perspectives: in the context of disaster events, social media images can
convey situational information such as ambient environmental conditions and hazard severities,
while texts often describe damaged objects and consequent impacts. Images then go through five
image classifiers in the Image Process module, which extracts information including hazard types
(i.e. wind and flood) and their associated severity levels. The Text Process module uses a pre-
defined keyword search table to examine whether a textual message contains any of six pre-defined
damage types commonly shared in the textual part of social media posts: power outage, vehicle
damage, house/building damage, infrastructure destruction, fallen trees, and debris. The Result
module integrates the damage information mined by the Image Process and Text Process modules.
Figure 1. The Pipeline of the Proposed Multimodal Data-Driven Damage Assessment Method.
3.1 Data Input module
In this study, we mainly use two sources of social media data: Twitter and Flickr. Twitter is a
popular social media platform with more than 500 million tweets posted daily around the world.
Twitter allows users to post textual messages of up to 140 characters (expanded to 280 characters
in November 2017) with links to images and videos. Around one percent of the messages are
geotagged. During disaster events, many affected people tweet to report observations, express
urgent needs, and seek help, making Twitter an ideal data source for disaster management research
(Olteanu et al., 2015; Tanev et al., 2017). Flickr is also a popular social media platform, allowing
users to share geotagged photos with optional textual descriptions. Flickr photos are mostly high-
resolution images of the natural and built environment and have thus been used in some tourism
studies (Hu et al., 2015). Both data sources can be accessed via their Application Programming
Interfaces (APIs). As the Twitter streaming API only returns around one percent of tweets due to
its rate limit (Wang et al., 2017), we restrict the crawled tweets to geotagged ones. For each
crawled tweet, we check whether it includes a link referring to images and download the images
if it does. The Flickr API can return archived geotagged photos; we set time and location windows
on the Flickr API to access photos posted during disaster-affected periods and geotagged in
affected areas.
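As a rough sketch of this image-link check, the function below pulls attached photo URLs out of a crawled tweet, assuming the Twitter v1.1-style JSON layout in which image attachments sit under `extended_entities` -> `media`. The field names and the sample tweet are illustrative, not the exact payload used in this study.

```python
def image_urls(tweet: dict) -> list:
    """Collect photo URLs attached to a crawled tweet.

    Assumes the Twitter v1.1-style JSON layout, where image attachments
    are listed under extended_entities -> media. Returns an empty list
    for text-only tweets.
    """
    media = tweet.get("extended_entities", {}).get("media", [])
    return [m["media_url_https"] for m in media if m.get("type") == "photo"]

# Example: a stripped-down geotagged tweet with one attached photo.
tweet = {
    "text": "Street flooding near downtown #Irma",
    "coordinates": {"type": "Point", "coordinates": [-80.19, 25.76]},
    "extended_entities": {
        "media": [{"type": "photo",
                   "media_url_https": "https://pbs.twimg.com/media/example.jpg"}],
    },
}
print(image_urls(tweet))  # -> ['https://pbs.twimg.com/media/example.jpg']
```

Each returned URL can then be downloaded and routed to the Image Process module, while the tweet text goes to the Text Process module.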
3.2 Image Process module
The processing of Twitter and Flickr images starts by converting images into numeric feature
vectors and then uses five classifiers to extract disaster damage information. The five classifiers
are responsible for different tasks within a defined semantic hierarchy. Specifically: 1) one
classifier selects images showing a perceived outdoor environment; 2) one classifier identifies
images showing hazards; 3) one classifier classifies the hazard type; and 4) two classifiers assign
severity levels to identified wind and flood hazards, respectively. The five classifiers are organized
in a hierarchical structure (Figure 1). This design helps locate the limited number of images
showing exposed hazards within the sheer volume of posted images. The hierarchical processing
removes irrelevant images in early steps and keeps the remaining images similar in content, i.e.
displayed scenes and objects. As the succeeding classifiers work on the remaining, less noisy
images, they can better focus on learning the difference between positive and negative samples
with defined semantic labels (Table 1). Therefore, the classifiers can achieve satisfactory
performance even with relatively small training data. In fact, we use 1,795 images to develop the
five classifiers. These images were collected from social media images posted in affected areas
during historical hurricane events and from two existing databases, YFCC100M and CrisisMMD
(Thomee et al., 2016; Alam et al., 2018a). Two researchers worked together to annotate the images.
Table 1 summarizes the counts and labels of annotated images for the different classification tasks.
Some images are used repeatedly in developing different classifiers. Note that we also include false
positive images, as predicted by each preceding classifier, in the training set of its succeeding
classifier. In this way, the succeeding classifier gains a certain ability to remove false positive
predictions and mitigate Type I error within the proposed hierarchical structure (Hao & Wang, in
press).
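The hierarchy described above can be sketched as a short cascade in which an image's feature vector passes through the classifiers in order and drops out as soon as a stage rejects it. The classifier arguments below are toy stand-ins for the five trained models, not the models themselves.

```python
def assess_image(features, outdoor_clf, hazard_clf, type_clf,
                 wind_sev_clf, flood_sev_clf):
    """Run one image's feature vector through the hierarchical cascade.

    Each *_clf argument is a callable standing in for one of the five
    trained classifiers. Returns None when the image is filtered out at
    stage 1, an empty dict when no evident hazard is found, and otherwise
    a dict mapping the detected hazard type(s) to a severity label.
    """
    if not outdoor_clf(features):             # stage 1: outdoor scene?
        return None
    if not hazard_clf(features):              # stage 2: evident hazard?
        return {}
    has_wind, has_flood = type_clf(features)  # stage 3: multi-label type
    result = {}
    if has_wind:
        result["wind"] = wind_sev_clf(features)    # stage 4: wind severity
    if has_flood:
        result["flood"] = flood_sev_clf(features)  # stage 4: flood severity
    return result

# Toy stubs thresholding a one-element "feature vector".
out = assess_image(
    [0.9],
    outdoor_clf=lambda f: f[0] > 0.1,
    hazard_clf=lambda f: f[0] > 0.5,
    type_clf=lambda f: (False, True),
    wind_sev_clf=lambda f: "Minor",
    flood_sev_clf=lambda f: "Severe",
)
print(out)  # {'flood': 'Severe'}
```

Because most crawled images exit at stages 1 or 2, the more expensive severity classifiers only ever see the small on-topic subset.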
Table 1. Images Used for Developing the Different Classifiers
Perceived outdoor environment classifier: Positive (show a perceived outdoor environment) /
Negative (show other contents, e.g. selfies, maps)
Hazard presence classifier: Positive (show evident wind or flood hazard) / Negative (show a
normal environmental condition)
Hazard type classifier: Wind hazard / Flood hazard / Wind and flood hazard
Hazard severity classifier (wind): Little to none / Minor / Severe
Hazard severity classifier (flood): Little to none / Minor / Severe
Feature extraction represents 2-D images as 1-D numeric feature vectors that are used for the
subsequent image classifications. In this study, we adopt a ResNet18 CNN pre-trained on the
Places365 dataset as the feature generator, which yields scene-level features. The ResNet18 CNN
has a relatively compact size and performs well in many CV competitions (He et al., 2016), while
Places365 is a dataset of more than 10 million images for scene recognition (Zhou et al., 2018).
We made this selection after comparing it with two other feature generators: an Inception-v3 CNN
trained on the ILSVRC dataset that extracts object-level features (Szegedy et al., 2016) and an
AutoEncoder trained on the ADE20K dataset that also returns scene-level features (Zhou et al.,
2019). The ResNet18 CNN performed equally well or better than the other two feature generators
on the five classification tasks and was hence selected for this study. Features are extracted as the
output of the penultimate layer of the CNN. We adapted the ResNet18 CNN model with the
PyTorch library in Python (Paszke et al., 2019). The features only need to be extracted once for
each crawled image and are reused by the following image classifiers.
Raw crawled crowdsourced images can include screenshots of texts, maps, selfies, posters,
cartoons, advertisements, and so on (Ning et al., 2020). These images occupy a large portion of
social media images but provide little information about disaster situations. Our first step is
therefore to sort out "informative" images that show perceived real-world environmental
conditions. We further restrict the perceived environment to outdoor environments. Images taken
inside a building may reveal damage conditions for that individual building; however, they vary
too much from outdoor images in terms of exhibited objects and backgrounds, and we found too
few such images to train a separate classifier for them. We use 585 positive samples showing
perceived outdoor environments and 645 negative samples for classifier development (Table 1).
The samples are divided into an 80% training set and a 20% testing set, stratified by label (the
same setting applies to the development of the other image classifiers). We experiment with
different machine learning classifiers, including logistic regression (LR) models, decision trees,
and support vector machines (SVMs), using the Scikit-learn package in Python (Pedregosa et al.,
2011). The SVM with a linear kernel achieves the highest classification accuracy of 94.31%
(Table 2).
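The stratified split and linear-SVM setup for this classifier can be sketched with Scikit-learn as follows; the random features below are stand-ins for the real 512-dimensional scene vectors, and the class sizes are reduced for brevity.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the positive/negative scene features: the real
# inputs would be the 512-d ResNet18 vectors, not random numbers.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (100, 512)),   # "negative" class
               rng.normal(1.0, 1.0, (100, 512))])  # "positive" class
y = np.array([0] * 100 + [1] * 100)

# 80% training / 20% testing split, stratified by label as in the paper.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Linear-kernel SVM, the classifier selected for this stage.
clf = SVC(kernel="linear").fit(X_tr, y_tr)
print(round(clf.score(X_te, y_te), 2))
```

Swapping `SVC` for `LogisticRegression` or `DecisionTreeClassifier` reproduces the model comparison described above on the same split.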
The next step distinguishes images showing evident natural hazards (positive) from images
showing normal outdoor environment views (negative). Common hazard content in social media
images includes inundated roads and uprooted trees. We select the binary LR model with 88.53%
accuracy for this step (Table 2).
The third step identifies the hazard types, a prerequisite for classifying hazard severity levels.
Some previous work modeled disaster damage severity from social media images directly without
considering distinct hazard types. However, images showing different hazard types are often
highly disparate in content: an image showing flood hazards usually contains a water body, while
an image showing wind hazards typically features uprooted trees or roofs with missing tiles.
Hazard type classification is a typical multi-label task, as an image can include wind hazard, flood
hazard, or both. We use an artificial neural network (ANN) model for this task, as an ANN
considers the possible correlation between different labels. The ANN model accurately predicts
both labels for 82.01% of testing images (Table 2).
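A minimal multi-label sketch with Scikit-learn's `MLPClassifier` is shown below; the features and labels are synthetic stand-ins, and the network size is illustrative rather than the configuration used in the paper. Passing a two-column indicator matrix as the target makes the classifier emit one wind flag and one flood flag per image.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Toy multi-label setup: each image can carry a wind label, a flood label,
# both, or neither. Features are random stand-ins for the scene vectors.
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 8))
# Indicator matrix: column 0 = wind hazard, column 1 = flood hazard.
y = np.column_stack([(X[:, 0] > 0).astype(int),
                     (X[:, 1] > 0).astype(int)])

# MLPClassifier handles multi-label targets given an indicator matrix.
ann = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000,
                    random_state=0).fit(X, y)
pred = ann.predict(X[:5])
print(pred.shape)  # one wind flag and one flood flag per image
```

The 82.01% figure reported above corresponds to exact-match accuracy, i.e. both flags predicted correctly for the same image.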
The final steps use two classifiers to determine the severity levels of wind and flood hazards. We
assign each image showing hazard content one of the following three severity levels:
Little to None: the image shows no damage, minor adverse weather conditions, or slight damage
that causes no economic loss and does not impact human activities (e.g. transportation);
Minor: the image shows damage that requires money for repair or recovery and partially affects
human activities; and
Severe: the image shows severe damage that suggests extreme environmental conditions, is
associated with significant economic loss, or severely impacts human activities.
We select multinomial logistic regression models for the severity level classification, achieving
83.94% accuracy for flood hazard severity and 74.24% accuracy for wind hazard severity
(Table 2).
Table 2 summarizes the selected classifiers and their associated accuracies for each classification
task. Note that these accuracies are based on test images.
Table 2. Classifier Selection and Performance for Each Classification Task

| Classification Task | Selected Classifier | Test Accuracy |
| Perceived outdoor environment classifier | | |
| Hazard presence classifier | Binary logistic regression | 88.53% |
| Hazard type classifier | Artificial neural network | 82.01% |
| Wind hazard severity classifier | Multinomial logistic regression | 74.24% |
| Flood hazard severity classifier | Multinomial logistic regression | 83.94% |
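The four-step hierarchy above (outdoor filter → hazard presence → hazard type → per-hazard severity) can be sketched as an early-exit cascade. The lambdas below are illustrative stubs standing in for the trained classifiers; the function and field names are assumptions, not the paper's implementation.

```python
def assess_image(features, outdoor_clf, presence_clf, type_clf,
                 wind_severity_clf, flood_severity_clf):
    """Run one image through the hierarchical filter; stop early when a
    step rules the image out (not outdoor, or no hazard present)."""
    if not outdoor_clf(features):              # step 1: outdoor scene?
        return {"relevant": False}
    if not presence_clf(features):             # step 2: hazard shown?
        return {"relevant": False}
    hazards = type_clf(features)               # step 3: multi-label {wind, flood}
    result = {"relevant": True, "hazards": {}}
    if "wind" in hazards:                      # step 4a: wind severity
        result["hazards"]["wind"] = wind_severity_clf(features)
    if "flood" in hazards:                     # step 4b: flood severity
        result["hazards"]["flood"] = flood_severity_clf(features)
    return result

# Stub predictors for illustration; real ones would be the trained models.
demo = assess_image(
    features=None,
    outdoor_clf=lambda f: True,
    presence_clf=lambda f: True,
    type_clf=lambda f: {"flood"},
    wind_severity_clf=lambda f: "Minor",
    flood_severity_clf=lambda f: "Severe",
)
# demo == {"relevant": True, "hazards": {"flood": "Severe"}}
```

The early exits are what make the hierarchy act as a filter: irrelevant images never reach the more specialized classifiers.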
3.3 Text process module
Social media users can describe their observations with different wording, phrases, and word sequences in textual posts. These colloquial expressions prevent conventional keyword search-based methods, which rely on limited keywords and phrases, from identifying much of the information, while developing a large keyword table takes enormous effort and time. Therefore, we adopt a two-list search method in the Text Process module (Figure 1), designed to detect on-topic texts as completely as possible with little effort spent on enumerating possible word/phrase combinations. The two-list keyword search method identifies pre-defined damage types with one list collecting physically damaged objects (e.g. roadway) and the other collecting descriptive words for the damage (e.g. submerged). A textual post is considered to describe a type of disaster damage when it contains words/phrases from both lists of the same damage type. This matching is performed after we remove punctuation, URLs, emoji, numbers, and stopwords from the texts and stem each word to its root form. We defined six types of damage information that are most discussed on social media platforms: power outage, vehicle damage, house damage, infrastructure destruction, fallen trees, and debris. Fallen trees and debris are not damage per se; however, they indicate the presence of strong winds or flood water at the reported locations and cost money to remove. Fallen trees may also damage property such as houses and vehicles.
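A minimal sketch of the two-list search follows, using an abridged version of the keyword table. Prefix matching against stemmed roots stands in for the stemming step here, and the lists and function names are illustrative, not the full table from the paper.

```python
import re

# Abridged two-list table: (object roots, descriptor roots) per damage type.
# Roots are stemmed, so prefix matching catches inflections ('flooded' ~ 'flood').
DAMAGE_KEYWORDS = {
    "house damage": (["roof", "hous", "home", "wall"],
                     ["flood", "damag", "collaps", "submerg", "blown"]),
    "vehicle damage": (["car", "truck", "vehicl"],
                       ["flood", "submerg", "flip", "wreck"]),
}

def _clean(text):
    text = re.sub(r"https?://\S+", " ", text.lower())   # drop URLs
    return re.sub(r"[^a-z\s]", " ", text).split()       # drop punctuation/digits

def detect_damage(text):
    """Return damage types whose object AND descriptor lists both match."""
    tokens = _clean(text)
    hits = []
    for dmg, (objects, descriptors) in DAMAGE_KEYWORDS.items():
        has_obj = any(t.startswith(root) for t in tokens for root in objects)
        has_desc = any(t.startswith(root) for t in tokens for root in descriptors)
        if has_obj and has_desc:
            hits.append(dmg)
    return hits
```

Requiring a hit in both lists of the same damage type mirrors the rule that an object word and a descriptor word must co-occur before a post counts as a damage report.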
Qualified textual damage reports should be those posted by affected people discussing their own experiences or observations. These texts often express damage with colloquial phrases and details, such as the specific damaged objects (e.g. “roofs” vs. “houses”), locations (e.g. “curbs” vs. “roads”), and damage forms (e.g. “blow” vs. “destroy”). We particularly include these words in the keyword search table. We also refer to sources such as the EMTerms collection (Temnikova et al., 2015) for keyword collection. Table 3 presents the two-list keyword search table. Note that some descriptive words, such as “submerge” and “blown”, indicate the type of hazard that caused the damage; we also collect this information from the textual reports.
Table 3. Keyword (Stemmed) Search Table for Different Damage Types.

Power Outage
Objects: ['power', 'powerlin', 'electr', 'nopow']
Descriptors: ['fix', 'destroy', 'broken', 'damag', 'gone', 'knock', 'lost', 'without power', 'not have power', "don't have", 'restor', 'cut', 'outag', 'no power', 'nopow', 'wait for power', 'power back', 'outta', 'lack of', 'out of', 'went off', 'flick']

Vehicle Damage
Objects: ['car', 'truck', 'van', 'vehicl', 'bu', 'motorcycl']
Descriptors: ['flip', 'overturn', 'smash', 'damag', 'submerg', 'flood-damag', 'flood', 'lost', 'destroy', 'wreck', 'in the water', 'under water', 'in water']

House Damage
Objects: ['roof', 'window', 'hous', 'home', 'wall', 'build', 'basement', 'porch', 'yard', 'door', …]
Descriptors: ['crack', 'lose', 'lost', 'destroy', 'damag', 'destruct', 'corrupt', 'flood-damag', 'flood', 'rip', 'blow', 'reconstruct', 'clean', 'rebuild', 'pull', 'blown', 'collaps', 'submerg', 'shake', 'water in', 'water on', 'water over']

Infrastructure Damage
Objects: ['street', 'road', 'dam', 'bridg', 'power cabl', 'trail', 'parkway', 'rd', 'hwi', 'highway', 'fwi', 'freeway', 'dr', 'drive', 'blvd', 'boulevard', 'ramp', 'lane', 'mainlan', 'curb', 'expressway', 'school', 'church', 'airport', 'chemic plant', …]
Descriptors: ['destroy', 'damag', 'destruct', 'corrupt', 'flood-damag', 'flood', 'collaps', 'submerg', 'fallen', 'high water', 'under water', 'water on', 'water in', 'water over', 'underwat']

Fallen Tree
Objects: ['tree', 'branch', 'limb']
Descriptors: ['fall', 'uproot', 'down', 'fallen']
3.4 Result module
The Result module integrates different types of hazards and damage information mined with the
Image Process and Text Process modules. The outputs are individual-level estimations. Figure 2
shows some example outputs. The percentages in brackets are estimated probabilities.
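A minimal sketch of how the Result module might combine per-post outputs from the two modules is shown below; the field names are assumptions for illustration, not the paper's actual schema.

```python
def merge_post_results(image_result=None, text_result=None):
    """Combine Image Process and Text Process outputs for one post.
    Either modality may be missing; the two modules run independently."""
    record = {"hazards": {}, "damage_types": []}
    if image_result:                          # e.g. from the classifier cascade
        record["hazards"].update(image_result.get("hazards", {}))
    if text_result:                           # e.g. from the keyword search
        record["damage_types"] = list(text_result.get("damage_types", []))
    return record

merged = merge_post_results(
    image_result={"hazards": {"flood": ("Severe", 0.9723)}},
    text_result={"damage_types": ["house damage"]},
)
# merged == {"hazards": {"flood": ("Severe", 0.9723)},
#            "damage_types": ["house damage"]}
```

Because the modalities are processed separately, a post missing either text or image still yields a partial record rather than being discarded.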
Figure 2. Examples of Damage Information Mining Method Output.
4. Empirical Case Study
4.1 Case descriptions and data collections
This section presents two case studies that apply the proposed method to assess damage situations in 1) the city of Miami, Florida, as affected by Hurricane Irma; and 2) the city of Houston, Texas, as affected by Hurricane Harvey. Both cities were severely impacted during the hurricane events. Miami experienced sustained winds of 45–55 kt during Hurricane Irma (Cangialosi et al., 2018). The storm tide and urban runoff caused 3–5 ft. of inundation along the Biscayne Bay shoreline and in downtown Miami, and widespread tree and power pole damage was reported in the metro area (Cangialosi et al., 2018). Houston was less affected by wind hazards during Hurricane Harvey than Miami. However, the exceptional rainfall and storm tides caused massive flooding that inundated nearly one-third of the city, disabled major roads, and cut power connections to households (Blake & Zelinsky, 2018). The severe impacts and diverse damage types make these two cases ideal for evaluating the proposed method.
Text: “The Morning After # cocowalk #coconutgrove #saturday #flood #flooding #waterflood #traffic ……”
Perceived outdoor environment?: Yes (97.75%)
Damage presence?: Yes (99.91%)
Damage type: Flood (100.00%)
Damage severity: Minor (92.92%)
Damage (text): ()

Text: “Justo a esta hora! @ SchenlyPark” (Spanish: “Right at this hour!”)
Perceived outdoor environment?: Yes (99.38%)
Damage presence?: Yes (99.91%)
Damage type: Wind (100.00%)
Damage severity: Minor (92.29%)
Damage (text): ()

Text: “I'm in Houston. House flooded and I have no idea what the status of my @lindsaylohancollection …”
Perceived outdoor environment?: Yes (96.62%)
Damage presence?: Yes (99.67%)
Damage type: Flood (100.00%), Wind (0.00%)
Damage severity: Severe (97.23%)
Damage (text): ('house damage', 'hous', 'flood')
We collected social media posts published within two weeks after the hurricanes’ landfall from Twitter and Flickr. Table 4 shows the data sources, volumes, and temporal spans of the collected data, and Figure 3 plots the spatial distribution of the collected data for the two cases. Retweets were excluded from the analyses.
Table 4. Geotagged Social Media Data Volume

| Case | Source | Count of Records | Temporal Span |
| Miami (Irma) | Twitter | 1,555 images and 4,006 texts | 09/10/2017 – 09/23/2017 |
| Miami (Irma) | Flickr | 94 images and associated textual descriptions | 09/10/2017 – 09/23/2017 |
| Houston (Harvey) | Twitter | 8,642 images and 24,696 texts | 08/25/2017 – 09/07/2017 |
| Houston (Harvey) | Flickr | 1,011 images and associated textual descriptions | 08/25/2017 – 09/07/2017 |
Figure 3. Spatial Distribution of Tweets (left column) and Flickr Photos (right column) in Houston (top row) and Miami (bottom row) during Hurricanes
4.2 Identified damage reports
For each social media post, the method examined whether it contained any of the ten defined types of hazard/damage information. 89 (2.17%) and 793 (3.08%) posts were identified as including at least one type of damage/hazard information (reported by text, image, or both) for the Miami and Houston cases, respectively. We present the counts of identified damage reports in Table 5. Miami suffered from both wind and flood hazards during Hurricane Irma, whereas Houston was mainly affected by flooding, with most identified damage reports (180 images and 563 texts) indicating flood hazards. Figure 4 shows the distribution of identified damage reports as points and as density surfaces. We used a bandwidth of 500 m for the density mapping, with the density weighted by the ratio of damage report counts to raw crowdsourced post counts. Figure 4 shows that the identified damage reports clustered in the central downtown area and were distributed along roadways in the Houston case. In the Miami case, the damage reports were generally located along the shoreline, indicating that these areas experienced noticeable damage during the hurricanes.
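The density mapping described above can be sketched as a weighted Gaussian kernel sum evaluated on a grid, with a 500 m bandwidth and each report weighted by the ratio of damage reports to raw posts. The coordinates and counts below are illustrative, not the case-study data.

```python
import math

def weighted_density(points, weights, grid, bandwidth=500.0):
    """Gaussian kernel density over a grid of (x, y) metres; each report
    contributes its weight (damage reports / raw posts at that location)."""
    dens = []
    for gx, gy in grid:
        total = 0.0
        for (px, py), w in zip(points, weights):
            d2 = (gx - px) ** 2 + (gy - py) ** 2
            total += w * math.exp(-d2 / (2 * bandwidth ** 2))
        dens.append(total)
    return dens

# Two reports near the origin, one far away; density peaks near the cluster.
pts = [(0, 0), (100, 0), (5000, 5000)]
w = [3 / 10, 2 / 10, 1 / 10]     # damage report count / raw post count
grid = [(0, 0), (5000, 5000)]
d = weighted_density(pts, w, grid)
# d[0] > d[1]: the grid cell near the weighted cluster is denser
```

Normalizing by raw post counts keeps densely tweeted areas from dominating the map simply because they produce more posts of any kind.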
We plotted the counts of identified damage reports by day in Figure 5. Irma hit Florida on September 10, 2017, and Harvey struck Texas on the night of August 25, 2017. Most damage reports were posted in the first three to four days following the hurricanes’ landfall, and such reports can be mined as soon as they are posted.
Table 5. Counts of Different Reported Damage Information.

| Damage Type | # of Reports (Miami Case) | # of Reports (Houston Case) |
| Wind Hazard (Image) | 41 | 20 |
| Flood Hazard (Image) | 21 | 180 |
| Power Outage (Text) | | |
| Vehicle Damage (Text) | | |
| House/Building Damage (Text) | | |
| Infrastructure (Text) | | |
| Fallen Tree (Text) | | |
| Debris (Text) | | |
| Flood Hazard (Text) | | 563 |
| Wind Hazard (Text) | | |
Figure 4. Spatial Distribution and Density Map of Damage Reports.
Figure 5. Temporal Distribution of Damage Reports
4.3 Mining hazard types and severity information with Image Process module
More flood hazard reports (180) than wind hazard reports (20) were identified in Houston during Hurricane Harvey. In comparison, Miami received slightly more wind hazard reports (41) than flood hazard reports (21) during Hurricane Irma (Table 5). Figures 6-9 show the density maps of identified wind and flood hazards, together with some representative images, for the two cases. The densities were weighted according to the estimated severity levels. Images showing wind hazards in the Houston case were located very sparsely across the city (Figure 6). In general, wind hazards were represented by fallen trees and wrecked boats (Figures 6 and 8). Some images showing debris piles were falsely identified as wind hazard reports (Figure 6). The proposed method can identify flooding in different environments, including residential areas, downtown, and streets (see Figures 7 and 9).
Figure 5 presents daily counts in two panels: (a) the Miami case and (b) the Houston case.
Figure 6. Density Map of Wind Hazard Reports in Houston during Hurricane Harvey
Figure 7. Density Map of Flood Hazard Reports in Houston during Hurricane Harvey
Figure 8. Density Map of Wind Hazard Reports in Miami during Hurricane Irma
Figure 9. Density Map of Flood Hazard Reports in Miami during Hurricane Irma
4.4 Mining damage types with Text Process module
We mined six types of damage information from textual messages (Table 1). Most textual reports identified with the proposed two-list search approach were related to disaster damage. However, some did not reflect situations at the geotagged locations, and some did not refer to damage experienced or observed by the users who posted them. Table 6 shows representative examples of truly and falsely identified textual damage reports. The falsely identified damage reports included general concerns and comments on the disaster events expressed by affected people (e.g., e2 and e7) or official accounts (e.g., e9), situations of other affected people (e.g., e10), negations (e.g., e3), assumptions (e.g., e13), and advertisements (e.g., e6). Moreover, some texts are too vague to determine whether the content is about the users’ own experiences or observations (e.g. e4, e11, and e14).
Table 6. Examples of Mined Damage Information from Textual Reports.

Power Outage
True positive:
e1: “Almost 10 days after #hurricaneharvey hit #Houston our building is still closed without power…”
False positive:
e2: “I know many are without power, if you know someone that may need help being evacuated…”
e3: “Trying to enjoy the electricity before it goes out @ [user] Houston, Houston, Texa…”
e4: “Some of you can't live without power!!! Always have a good book on hand during storms!”

Vehicle Damage
True positive:
e5: “Not pictured... half of my car sitting in water. @ [user] -…”
False positive:
e6: “This is the new must-have vehicle in Houston! Rule the flood! #hurricaneharvey #houstonstrong.”
e7: “People have lost everything, due to Hurricane Harvey's damage. Homes, lives, memories,…”

House Damage
True positive:
e8: “The work doesn't stop! Carpet cleaning because of roof leaks. Support the long term recovery…”
False positive:
e9: “Flooded Home?? Here are some helpful tips on what to do with your the wet or damaged…”
e10: “This is Baby Susie, my new best friend. Her and her dad, Dennis lost their home in Meyerland…”
e11: “I went out in the back yard in my wellies last night to inspect the flood and came back with…”

Infrastructure Damage
True positive:
e12: “Closed due to flooding. in #Baytown on I-10 Baytown E Fwy Inbound between Crosby Lynchburg and Magnolia…”
False positive:
e13: “Day 4. More rain. Luckily the water runs off to the street. If the street floods, we're screwed…”
e14: “Flooding, what flooding? @ Briargrove Drive Townhouses condominium Association…”

Fallen Tree
True positive:
e15: “Oak Tree down in Memorial Northwest subdivision @ Memorial Northwest, Spring, Texas…”
False positive:
e16: “Water has receded down to the trees. Like the Stars Spangled Banne…”

Debris
True positive:
e17: “Harvey bags and more debris on front yards in the greater Meyerland borough of Houston a…”
False positive:
e18: “Full day of unloading relief supplies & 18-wheeler trailers, cleaning flood debris at a partner…”
e19: “#debris removal guidelines #FEMA #harvey #texas @ Houston, Texas”
We then checked the textual posts identified as conveying damage information. Table 7 records the numbers of identified true positive, false positive, and indeterminate cases. The indeterminate ones are textual posts for which we cannot tell whether the user observed damage based on the text alone (e.g. e4, e11, and e14 in Table 6). Negative cases are not counted, as false negatives can always be reduced by considering more keywords. Table 7 shows that the method works for most predefined damage types, especially “Infrastructure Damage”. However, the prediction of “House Damage” has low precision. We found that most of the false positive predictions are due to
Table 7. Counts and percentages of true and false positive predictions of the Text Process module.

| Damage Type | True positive | False positive | Indeterminate |
| Power Outage | 18 (81.8%) | 3 (13.6%) | 1 (4.5%) |
| Vehicle Damage | 6 (60.0%) | 2 (20.0%) | 2 (20.0%) |
| House Damage | 22 (44.0%) | 21 (42.0%) | 7 (14.0%) |
| Infrastructure Damage | 536 (98.5%) | 5 (0.9%) | 3 (0.6%) |
| Fallen Tree | 3 (100.0%) | 0 (0.0%) | 0 (0.0%) |
| Debris | 7 (87.5%) | 1 (12.5%) | 0 (0.0%) |
4.5 Performance of image classifiers on identifying wind and flood hazards
To evaluate how the hierarchical image classifiers performed with raw crawled data, we annotated the 1,555 Twitter images collected for the Miami case and tracked how these images flowed through the Image Process module. The process is illustrated in Figure 10.
Figure 10. The filtering and classification of Twitter images collected for the Miami case.
Working as an integrated pipeline, the image classifiers correctly identified 29 and 14 images showing wind and flood hazards (true positives) from the 1,555 raw crawled Twitter images, missed only a few images presenting either of the two hazard types (false negatives), and falsely predicted only 5 and 6 images (false positives), respectively. Overall, the image classifiers are effective in dealing with the noisy semantic contents of social media images. We summarize the performance metrics of the hierarchical image classifiers on identifying the two hazards in Table 8.
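From the counts above, precision can be computed directly; recall would additionally require the exact false-negative counts, which the text describes only as “very few”.

```python
def precision(tp, fp):
    """Fraction of positive predictions that are correct."""
    return tp / (tp + fp)

# Counts reported for the Miami evaluation: 34 wind-hazard predictions
# (29 true, 5 false) and 20 flood-hazard predictions (14 true, 6 false).
wind_p = precision(29, 5)    # 29/34 ≈ 0.853
flood_p = precision(14, 6)   # 14/20 = 0.70
```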
Table 8. Performance metrics of the hierarchical image classifiers on identifying hazards
Hazard type
Wind hazard
Flood hazard
5. Discussion
A data-driven method is proposed to assess damage information with multimodal social media data. The method leverages a set of machine learning approaches to process textual messages and visual images separately. While many previous studies relied on a single data modality and aggregated results at coarser spatial levels, our method outputs individual estimations that can assist city-level emergency operations. A few studies (e.g. Lopez-Fuentes et al., 2017; Mouzannar et al., 2018) concatenated image and textual features for disaster damage assessment; such approaches are inefficient for posts with missing modalities or posts in which the two modalities are weakly correlated. Separating the two data modalities also allows the method to incorporate more data sources with visual and textual formats.
Social media textual messages are limited by their short length, subjectivity, low information quality, colloquial expressions, and so forth (Agarwal & Yiliyasi, 2010). We addressed some of
these limitations with the proposed two-list keyword search method, which uses two keyword lists for each pre-defined damage type. We adopted this method to detect damage efficiently from texts with little effort spent on enumerating word/phrase combinations. The results show that the textual damage reports identified with this approach are generally on-topic, though some do not describe the users’ own experiences or observations. Previous related work (Nguyen et al., 2017) also found it difficult to assess damage with social media images due to the poor signal-to-noise ratio and loosely defined damage forms. We address these challenges by devising five image classifiers in a defined semantic hierarchy to find images providing the information of interest, i.e. wind and flood hazards. The hierarchical structure also serves as a robust filter that removes irrelevant and less informative images in early steps, which is effective for noisy and unbalanced social media images. Our method shows satisfactory performance when tested with raw crawled social media data.
The proposed method is possibly limited in a few aspects and is open for further improvements.
First, we used around 1,800 training and testing images for the development of five classifiers.
Though the classifiers show comparable performance to other related works analyzing social
media data (e.g. Nguyen et al., 2017, Alam et al., 2018, Ning et al., 2020), the performance can
always be improved with more annotated images for classifier development. Second, both studied cases found few (2-3%) posts containing damage information and consequently yielded few qualified reports. One reason is that we downloaded images about two years after the events occurred, by which time many image links had become invalid. We also restricted the crawled posts to geotagged ones. Some recent works use approaches such as geoparsing and reverse geocoding to extract location information from textual messages (Middleton et al., 2018); the method could yield more estimations once these approaches are incorporated. Future research in this direction should also keep paying attention to other emerging data sources for disaster management operations. Third, the data-driven damage
assessment approach is built based on the assumption that the crawled eyewitness reports
accurately reflect the ground truth scenarios at the time when the post is created for the location
where the report is geotagged. This assumption may not always hold, as affected populations sometimes do not report their observed damage immediately; they may report it after moving to a safer location (Eilander et al., 2016). However, the results of this study show that most damage reports were posted within two to three days after the hurricanes’ landfall (Figure 5), suggesting that most affected people reported their observations without much time lapse.
Figures 6-9 also show that the built environment scenarios (e.g. residential, commercial, natural) identified from the images correspond to their associated locations, and the infrastructure destruction reports mined from texts were located on major roadways and bridges. These observations demonstrate the overall validity of the assumption at a coarse-grained temporal scale (e.g. a day) and spatial scale (e.g. a neighborhood). Future studies can inspect the temporal and locational consistency between reported damage and ground truth for individual posts once such data are available. Moreover, social media data can suffer from spatial bias (Zhang & Zhu, 2018): regions with no identified damage reports do not necessarily have no damage (Zhong et al., 2016). Potential users of similar data-driven methods should be aware of this limitation when using analysis results derived from social media data. Last, though the method can process social media data rapidly, a real-time deployment would further facilitate its practical applications.
6. Conclusion
A data-driven method is proposed to mine rapid, fine-grained, and comprehensive damage information from multimodal social media data, and its effectiveness is tested with two recent hurricane cases. Emergency managers and first responders frequently engage in time-sensitive decision-making and operations throughout disaster events. Early access to fine-grained disaster damage information can greatly reduce uncertainty in these processes, which in turn improves the rapidity and efficiency of disaster response and reduces consequent losses. Our method offers a supplementary resource for acquiring timely disaster damage information, which can be useful in the absence of authoritative data acquisition approaches. The proposed method also provides experience for future research in this direction on countering noisy, unstructured, and unbalanced social media data, especially for research seeking specific information that resides in only a few social media posts. Beyond the presented case studies, the general research framework also applies to other disasters and extreme events with data of similar modalities from other sources.
Acknowledgement
This material is based upon work supported by the early-career faculty start-up fund and graduate
research assistantships at the University of Florida. Any opinions, findings, and conclusions or
recommendations expressed in this material are those of the authors and do not necessarily reflect
the views of the University of Florida.
Declaration of Interest

References
Agarwal, N., & Yiliyasi, Y. (2010). Information quality challenges in social media. Proceedings
of the 2010 International Conference on Information Quality (ICIQ). 12-14 November 2010
Little Rock, Arkansas, SA.
Ahmad, K., Pogorelov, K., Riegler, M., Ostroukhova, O., Halvorsen, P., Conci, N., & Dahyot, R.
(2019). Automatic detection of passable roads after floods in remote sensed and social
media data. Signal Processing: Image Communication, 74, 110–118.
Alam, F., Ofli, F., & Imran, M. (2018a). CrisisMMD: Multimodal Twitter Datasets from Natural
Disasters. In Proceedings of International AAAI Conference on Web and Social Media
(ICWSM), Stanford, California, USA. 465-473.
Alam, F., Ofli, F., & Imran, M. (2018b). Processing Social Media Images by Combining Human
and Machine Computing during Crises. International Journal of Human-Computer
Interaction, 34(4), 311–327.
Anson, S., Watson, H., Wadhwa, K., & Metz, K. (2017). Analysing social media data for disaster
preparedness: Understanding the opportunities and barriers faced by humanitarian actors.
International Journal of Disaster Risk Reduction, 21, 131–139.
Bica, M., Palen, L., & Bopp, C. (2017). Visual representations of disaster. Proceedings of the
ACM Conference on Computer Supported Cooperative Work, CSCW, 1262–1276.
Blake, E. S., & Zelinsky, D. A. (2018). HURRICANE HARVEY. Retrieved from
Cangialosi, J. P., Latto, A. S., & Berg, R. (2018). HURRICANE IRMA. Retrieved from
Coronese, M., Lamperti, F., Keller, K., Chiaromonte, F., & Roventini, A. (2019). Evidence for
sharp increase in the economic damages of extreme natural disasters. In Proceedings of the
National Academy of Sciences of the United States of America, 116(43), 21450–21455.
Wallemacq, P., and Below, R. (2015). The human cost of natural disasters: A global perspective.
Brussels, Belgium: Centre for Research on the Epidemiology of Disasters. 1-55.
Deng, Q., Liu, Y., Zhang, H., Deng, X., & Ma, Y. (2016). A new crowdsourcing model to assess
disaster using microblog data in typhoon Haiyan. Natural Hazards, 84(2), 1241–1256.
Eilander, D., Trambauer, P., Wagemaker, J., & Van Loenen, A. (2016). Harvesting Social Media
for Generation of Near Real-time Flood Maps. Procedia Engineering, 154, 176–183.
Erdelj, M., Król, M., & Natalizio, E. (2017). Wireless Sensor Networks and Multi-UAV systems
for natural disaster management. Computer Networks, 124, 72–86.
FEMA. (2016). Damage Assessment Operations Manual. Retrieved from
Feng, F., Wang, X., & Li, R. (2014, November). Cross-modal retrieval with correspondence
autoencoder. In Proceedings of the 22nd ACM international conference on Multimedia, 7-
Goodchild, M. F. (2007). Citizens as sensors: The world of volunteered geography. GeoJournal
69(4), 211–221.
Granell, C., & Ostermann, F. O. (2016). Beyond data collection: Objectives and methods of
research using VGI and geo-social media for disaster management. Computers,
Environment and Urban Systems, 59, 231–243.
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In
Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern
Recognition, 770–778.
Hao, H., & Wang, Y. (in press). Hurricane Damage Assessment with Multi-, Crowd-Sourced
Image Data: A Case Study of Hurricane Irma in the City of Miami. In 17th International
Conference on Information System for Crisis Response and Management (ISCRAM).
Hu, Y., Gao, S., Janowicz, K., Yu, B., Li, W., & Prasad, S. (2015). Extracting and understanding
urban areas of interest using geotagged photos. Computers, Environment and Urban
Systems, 54, 240–254.
Huang, X., Wang, C., & Li, Z. (2018). A near real-time flood-mapping approach by integrating
social media and post-event satellite imagery. Annals of GIS, 24(2), 113–123.
Huang, X., Wang, C., Li, Z., & Ning, H. (2019). A visual–textual fused approach to automated
tagging of flood-related tweets during a flood event. International Journal of Digital
Earth, 12(11), 1248-1264.
Gordon, J. A. (2015). Comprehensive Emergency Management for Local Governments: Demystifying Emergency Planning. Brookfield, CT: Rothstein Associates Inc.
Jordan, B. R. (2015). A bird’s-eye view of geology: The use of micro drones/UAVs in geologic
fieldwork and education. GSA Today, 50–52.
Kryvasheyeu, Y., Chen, H., Obradovich, N., Moro, E., Van Hentenryck, P., Fowler, J., &
Cebrian, M. (2016). Rapid assessment of disaster damage using social media activity.
Science Advances, 2(3) 1-11.
Lazaridou, A., Pham, N. T., & Baroni, M. (2015). Combining language and vision with a
multimodal skip-gram model. In 2015 Conference of the North American Chapter of the
Association for Computational Linguistics: Human Language Technologies, 153–163.
Liu, L., Silva, E. A., Wu, C., & Wang, H. (2017). A machine learning-based method for the
large-scale evaluation of the qualities of the urban environment. Computers, Environment
and Urban Systems, 65, 113–125.
Lopez-Fuentes, L., Van De Weijer, J., Bolaños, M., & Skinnemoen, H. (2017). Multi-modal
Deep Learning Approach for Flood Detection. In Proceeding of the MediaEval 2017
Workshop. 1–3.
McWethy, D. B., Schoennagel, T., Higuera, P. E., Krawchuk, M., Harvey, B. J., Metcalf, E. C.,
Schultz, C., Miller, C., Metcalf, A. L., Buma, B., Virapongse, A., Kulig, J. C., Stedman, R.
C., Ratajczak, Z., Nelson, C. R., & Kolden, C. (2019). Rethinking resilience to wildfire.
Nature Sustainability, 2(9), 797–804.
Middleton, S. E., Kordopatis-Zilos, G., Papadopoulos, S., & Kompatsiaris, Y. (2018). Location
extraction from social media: Geoparsing, location disambiguation, and geotagging. ACM
Transactions on Information Systems (TOIS), 36(4), 1-27.
Mouzannar, H., Rizk, Y., & Awad, M. (2018). Damage Identification in Social Media Posts
using Multimodal Deep Learning. In 15th International Conference on Information System
for Crisis Response and Management (ISCRAM), Rochester, NY, May, 529-543.
Nguyen, D. T., Ofli, F., Imran, M., & Mitra, P. (2017). Damage assessment from social media
imagery data during disasters. In Proceedings of the 2017 IEEE/ACM International
Conference on Advances in Social Networks Analysis and Mining, ASONAM 2017, 569–
Ning, Li, Hodgson, & Wang. (2020). Prototyping a Social Media Flooding Photo Screening
System Based on Deep Learning. ISPRS International Journal of Geo-Information, 9(2), 1-
Novikov, G., Trekin, A., Potapov, G., Ignatiev, V., & Burnaev, E. (2018). Satellite imagery
analysis for operational damage assessment in emergency situations. Lecture Notes in
Business Information Processing, 320, 347–358.
Olteanu, A., Vieweg, S., & Castillo, C. (2015). What to expect when the unexpected happens:
Social media communications across crises. In Proceedings of the 18th ACM conference on
computer supported cooperative work & social computing. 994-1009
Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., ... & Desmaison, A. (2019).
Pytorch: An imperative style, high-performance deep learning library. In Advances in
Neural Information Processing Systems. 8026-8037.
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., ... & Vanderplas,
J. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning
Research, 12, 2825-2830.
Pouyanfar, S., Tao, Y., Tian, H., Chen, S. C., & Shyu, M. L. (2019). Multimodal deep learning
based on multiple correspondence analysis for disaster management. World Wide Web,
22(5), 1893–1911.
Resch, B., Usländer, F., & Havas, C. (2018). Combining machine-learning topic models and
spatiotemporal analysis of social media data for disaster footprint and damage assessment.
Cartography and Geographic Information Science, 45(4), 362–376.
Robinson, T. R., Rosser, N., & Walters, R. J. (2019). The Spatial and Temporal Influence of
Cloud Cover on Satellite-Based Emergency Mapping of Earthquake Disasters. Scientific
Reports, 9(1), 1-9.
Samuels, R., Taylor, J., & Mohammadi, N. (2018). The Sound of Silence: Exploring How
Decreases in Tweets Contribute to Local Crisis Identification. In 15th International
Conference on Information System for Crisis Response and Management (ISCRAM),
Rochester, NY, May.
Smith, A. (2019). 2018’s Billion Dollar Disasters in Context | NOAA Climate.Gov.
Retrieved from
Smith, L., Liang, Q., James, P., & Lin, W. (2017). Assessing the utility of social media as a data
source for flood risk management using a real-time modelling framework. Journal of Flood
Risk Management, 10(3), 370–380.
Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., & Wojna, Z. (2016). Rethinking the Inception
Architecture for Computer Vision. In Proceedings of the IEEE Computer Society
Conference on Computer Vision and Pattern Recognition, 2016-December, 2818–2826.
Tanev, H., Zavarella, V., & Steinberger, J. (2017, May). Monitoring disaster impact: detecting
micro-events and eyewitness reports in mainstream and social media. In 14th International
Conference on Information System for Crisis Response and Management (ISCRAM).
Temnikova, I., Castillo, C., & Vieweg, S. (2015). EMTerms 1.0: A terminological resource for
crisis tweets. ISCRAM 2015 Conference Proceedings - 12th International Conference on
Information Systems for Crisis Response and Management, 2015-January, 147–157.
Thomee, B., Shamma, D.A., Friedland, G., Elizalde, B., Ni, K., Poland, D., Borth, D. & Li, L.J.
(2016). YFCC100M: The new data in multimedia research. Communications of the
ACM, 59(2), 64-73.
U.S. Census Bureau, (n.d.). ZIP Code Tabulation Areas (ZCTAs). Retrieved from
Vadicamo, L., Carrara, F., Cimino, A., Cresci, S., Dell’Orletta, F., Falchi, F., & Tesconi, M.
(2017). Cross-Media Learning for Image Sentiment Analysis in the Wild. In Proceedings -
2017 IEEE International Conference on Computer Vision Workshops, ICCVW 2017, 2018-
January, 308–317.
Wang, Y., Wang, Q., & Taylor, J. E. (2017). Aggregated responses of human mobility to severe
winter storms: An empirical study. PLOS ONE, 12(12), e0188734.
Wang, Y., & Taylor, J. E. (2018). Coupling sentiment and human mobility in natural disasters: a
Twitter-based study of the 2014 South Napa Earthquake. Natural Hazards, 92(2), 907–925.
Wang, Y., & Taylor, J. E. (2019). DUET: Data-Driven Approach Based on Latent Dirichlet
Allocation Topic Modeling. Journal of Computing in Civil Engineering, 33(3), 04019023.
Yao, F., & Wang, Y. (2019). Tracking urban geo-topics based on dynamic topic model.
Computers, Environment and Urban Systems, 79, 101419.
You, Q., Luo, J., Jin, H., & Yang, J. (2016). Cross-modality consistent regression for joint
visual-textual sentiment analysis of social multimedia. In Proceedings of the Ninth ACM
International Conference on Web Search and Data Mining, 13–22.
Yu, M., Yang, C., & Li, Y. (2018). Big data in natural disaster management: A review.
Geosciences, 8(5), 165.
Zhang, G., & Zhu, A. X. (2018). The representativeness and spatial bias of volunteered
geographic information: a review. Annals of GIS, 24(3), 151–162.
Zhong, X., Duckham, M., Chong, D., & Tolhurst, K. (2016). Real-time estimation of wildfire
perimeters from curated crowdsourcing. Scientific Reports, 6, 24206.
Zhou, B., Lapedriza, A., Khosla, A., Oliva, A., & Torralba, A. (2018). Places: A 10 Million
Image Database for Scene Recognition. IEEE Transactions on Pattern Analysis and
Machine Intelligence, 40(6), 1452–1464.
Zhou, B., Zhao, H., Puig, X., Xiao, T., Fidler, S., Barriuso, A., & Torralba, A. (2019). Semantic
Understanding of Scenes Through the ADE20K Dataset. International Journal of Computer
Vision, 127(3), 302–321.
Zou, L., Lam, N. S. N., Shams, S., Cai, H., Meyer, M. A., Yang, S., Lee, K., Park, S. J., &
Reams, M. A. (2019). Social and geographical disparities in Twitter use during Hurricane
Harvey. International Journal of Digital Earth, 12(11), 1300–1318.
... the performance of the ML models. Similarly, multimodal social media data in combination with ML models were used by Hao and Wang (2020) for rapid disaster damage assessment [52]. The evaluation measure was only accuracy. ...
... Social media outlets have also been widely explored for a diversified set of applications in disasters and emergency situations [2]. For instance, Hao et al. [9] proposed a multimodal framework utilizing social media imagery and textual information for damage assessment in disaster-hit areas. ...
This paper presents our solutions for the MediaEval 2022 task on DisasterMM. The task is composed of two subtasks, namely (i) Relevance Classification of Twitter Posts (RCTP), and (ii) Location Extraction from Twitter Texts (LETT). The RCTP subtask aims at differentiating flood-related and non-relevant social posts, while LETT is a Named Entity Recognition (NER) task and aims at the extraction of location information from the text. For RCTP, we proposed four different solutions based on BERT, RoBERTa, DistilBERT, and ALBERT, obtaining an F1-score of 0.7934, 0.7970, 0.7613, and 0.7924, respectively. For LETT, we used three models, namely BERT, RoBERTa, and DistilBERT, obtaining an F1-score of 0.6256, 0.6744, and 0.6723, respectively.
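The F1-scores reported above are the harmonic mean of precision and recall on the positive class. As a point of reference, a minimal stdlib sketch of the binary metric (the labels below are illustrative, not DisasterMM data):

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Toy relevance labels: 1 = flood-related post, 0 = not relevant.
y_true = [1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
print(round(f1_score(y_true, y_pred), 4))  # → 0.75
```

For the multi-class LETT results, the per-class F1 values would be averaged (macro-F1) in the same spirit.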
... During disasters, people can facilitate information diffusion, gain situational awareness, and request assistance to enhance disaster response through social media platforms [4][5][6][7][8][9][10]. Such digital platforms can provide a communication channel to disadvantaged communities through the accounts of local residents or volunteers [11][12][13]. For instance, one recent work utilized power outages, pipe bursts, and food accessibility data on Mapbox, SafeGraph, and 311 in Harris County, Texas, during Uri, and the following analysis revealed that low-income and racial/ethnic minority groups were more disrupted [14]. ...
The winter storm Uri that occurred in February 2021 affected many regions in Canada, the United States, and Mexico. The State of Texas was severely impacted due to the failure in the electricity supply infrastructure compounded by its limited connectivity to other grid systems in the United States. The georeferenced estimation of the storm’s impact is crucial for response and recovery. However, such information was not available until several months afterward, mainly due to the time-consuming and costly assessment processes. The latency to provide timely information particularly impacted people in economically disadvantaged communities, who lack resources to ameliorate the impact of the storm. This work explores the potential for disaster impact estimation based on the analysis of instant social media content, which can provide actionable information to assist first responders, volunteers, governments, and the general public. In our prototype, a deep neural network (DNN) uses geolocated social media content (texts, images, and videos) to provide monetary assessments of the damage caused by Uri at the zip code level, achieving up to 70% accuracy. In addition, the performance analysis across geographical regions shows that the fully trained model is able to estimate the damage for economically disadvantaged regions, such as West Texas. Our methods have the potential to promote social equity by guiding the deployment of recovery resources to the regions where they are needed based on damage assessment.
... One such direction is in automated systems for moderating content like hate speech [55,14], violent content [3] and fake news [31,1] on social platforms. Such platforms have also shown how useful they can be in disaster assessment [32] and management [38]. Other interesting research problems analyze content shared on these platforms to understand the dynamics of content likeability and social validation for content creators [59], influence and opinion propagation for social media marketing [12,69], and the components that can make content trend and go viral on social media [65,26]. ...
We present a computational approach for estimating emotion contagion on social media networks. Built on a foundation of psychology literature, our approach estimates the degree to which the perceivers' emotional states (positive or negative) start to match those of the expressors, based on the latter's content. We use a combination of deep learning and social network analysis to model emotion contagion as a diffusion process in dynamic social network graphs, taking into consideration key aspects like causality, homophily, and interference. We evaluate our approach on user behavior data obtained from a popular social media platform for sharing short videos. We analyze the behavior of 48 users over a span of 8 weeks (over 200k audio-visual short posts analyzed) and estimate how contagious the users they engage with are on social media. As per the theory of diffusion, we account for the videos a user watches during this time (inflow) and the daily engagements: liking, sharing, downloading or creating new videos (outflow) to estimate contagion. To validate our approach and analysis, we obtain human feedback on these 48 social media platform users with an online study by collecting responses of about 150 participants. We report that users who interact with a larger number of creators on the platform are 12% less prone to contagion, and those who consume more content of 'negative' sentiment are 23% more prone to contagion. We will publicly release our code upon acceptance.
... There is also a keyword-based data-retrieval method that models the data during the training stage using an SVM, achieving an AUC of 0.937. This approach to detecting alarming posts on social media such as Twitter has been improved using deep learning methods such as Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN) [22][23][24]. ...
Emergency care is one of the cornerstone parts of the World Health Organization’s action plan. Rapid response and immediate care are considered in agile emergency care. Artificial intelligence (AI) and informatics have been applied to fulfill these requirements through automated emergency technology. Machine learning (ML) is one of the main parts of some of these proposed technologies. There are various ML algorithms and techniques which are potentially applicable for different purposes of emergency care. AI-based approaches using classification and clustering algorithms, natural language processing, and text mining are some of the possible techniques that could prove useful for investigating models of emergency prevention and management and proposing improved procedures for handling such critical situations. ML is known as a field of AI which attempts to automatically learn from data and applies that learning to make better decisions. Decision-support tools can apply the results of either supervised or various semi-supervised or unsupervised learning methods to tackle how decisions about emergency situations are typically handled by the best professionals at the scene of an emergency, in pre-hospital settings, and in healthcare facilities. Enhanced and rapid communication at the moment of emergency, with the most effective decision making for triaging to estimate the acute nature of injuries and possible complications, how to keep a patient stable on the way to the care facility, and also avoiding adverse drug reactions, are some of the possible directions for exploring how ML can help to gather the data and to make emergency management more efficient and effective.
The wide range of scenarios present in emergency situations and the complexity of different legal and ethical constraints on what responding personnel are allowed to perform on an injured subject before reaching a hospital makes for a most challenging set of problems for investigating the components of “intelligent” decision support that could help in these highly interactive and humanly tragic situations.
Social media can be a significant tool for transportation and transit agencies providing passengers with real-time information on traffic events. Moreover, COVID-19 and other limitations have compelled the agencies to engage with travelers online to promote public knowledge about COVID-related issues. It is, therefore, important to understand the agencies’ communication patterns. In this original study, the Twitter communication patterns of different transportation actors (types of message, communication sufficiency, consistency, and coordination) were examined using a social media data-driven approach applying text mining techniques and dynamic network analysis. A total of 850,000 tweets from 395 different transportation and transit agencies, starting in 2018 and covering the periods before, during and after the pandemic, were studied. Transportation agencies (federal, state, and city) were found to be less active on Twitter and mostly discussed safety measures, project management, and so forth. By contrast, the transit agencies (local bus and light, heavy, and commuter rail) were more active on Twitter and shared information about crashes, schedule information, passenger services, and so forth. Moreover, transportation agencies shared less pandemic safety information than transit agencies. Dynamic network analysis reveals interaction patterns among different transportation actors that are poorly connected and coordinated among themselves and with different health agencies (e.g., Centers for Disease Control and Prevention [CDC] and the Federal Emergency Management Agency [FEMA]). The outcome of this study provides understanding to improve existing communication plans, critical information dissemination efficacy, and the coordination of different transportation actors in general and during unprecedented health crises.
This study proposed and demonstrated the “Noah's Ark” effect, a concept wherein major disaster scenarios generate radical engagement in disaster preparedness within local communities, as opposed to the “cry wolf” effect. The study setting was the town of Kuroshio in Kochi Prefecture, Japan, where a large tsunami was expected to hit, according to the Cabinet Office's New Estimation in March 2012. The study quantitatively and qualitatively analyzed Japanese newspaper articles on “disaster prevention” in Kuroshio through text mining and performed a comparison between the tsunami disaster caused by the Tohoku Earthquake and the expected tsunami devastation in other areas. The results revealed that the Noah's Ark effect is characterized by five features: (1) increasing disaster preparedness issues to the same level as those in tsunami-affected areas; (2) focusing on preventing a disaster or hazard (disaster as an event) rather than preparing for issues brought about by a disaster (disaster as a process); (3) involving gradual shifts rather than sudden changes observed in the affected areas; (4) resident voluntary action (e.g., evacuation training) rather than dependence on government measures; and (5) promoting and simplifying specific issues. In regard to the negative impact of disaster forecast and estimation, the findings suggest the need to consider not only the cry-wolf effect, which applies to high-frequency but small-scale disasters (e.g., floods), but also the Noah's Ark effect, which applies to low-frequency but large-scale disasters (e.g., tsunamis).
When huge disasters strike, the afflicted areas need help from all sides. However, it is difficult for relevant departments to quickly obtain relief needs of disaster areas and accurately distribute relief materials on demand. To solve this problem, this paper proposes a relief demands urgency evaluation approach which integrates Natural Language Processing (NLP), Analytic Hierarchy Process (AHP), EWM (the entropy weight method), and the Grey Relational Technique for Order Preference by Similarity to Ideal Solution (Grey TOPSIS). First, the evaluation index system of disaster relief demand is constructed from four aspects, emergency support demands, emergency rescue demands, basic life support demands, and public infrastructure support demands. Then, the indices are assigned based on social media data and real-time reports, and the weight is assigned based on AHP and EWM. At last, the Grey TOPSIS is used to evaluate the relief demand urgency of different disaster areas. Due to the high timeliness of social media data, our approach is efficient. Taking Typhoon Lekima as an example, we evaluate the disaster relief needs of cities in Zhejiang Province, Jiangsu Province, and Shandong Province and compare the evaluation values with official post disaster statistics. Results show that the urgency of disaster relief needs calculated by our proposed method is significantly correlated with actual economic losses. Moreover, the method can identify specific disaster relief needs, so as to improve the timeliness and accuracy of emergency rescue.
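As a rough sketch of the weighting-and-ranking pipeline described above, the following combines the entropy weight method (EWM) with a plain TOPSIS closeness score. The grey relational coefficient and the AHP weights of the actual approach are omitted, and the urgency matrix is invented for illustration:

```python
import math

def entropy_weights(matrix):
    """Entropy weight method: criteria with more dispersion across areas
    receive larger weights. matrix[i][j] = score of area i on criterion j, > 0."""
    n, m = len(matrix), len(matrix[0])
    divergences = []
    for j in range(m):
        col = [row[j] for row in matrix]
        total = sum(col)
        p = [x / total for x in col]
        entropy = -sum(q * math.log(q) for q in p if q > 0) / math.log(n)
        divergences.append(1 - entropy)
    s = sum(divergences)
    return [d / s for d in divergences]

def topsis_closeness(matrix, weights):
    """Classic TOPSIS closeness to the ideal solution (benefit criteria only)."""
    m = len(matrix[0])
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(m)]
    v = [[weights[j] * row[j] / norms[j] for j in range(m)] for row in matrix]
    best = [max(col) for col in zip(*v)]
    worst = [min(col) for col in zip(*v)]
    scores = []
    for row in v:
        d_pos = math.sqrt(sum((x - b) ** 2 for x, b in zip(row, best)))
        d_neg = math.sqrt(sum((x - w) ** 2 for x, w in zip(row, worst)))
        scores.append(d_neg / (d_pos + d_neg))
    return scores

# Toy urgency matrix: rows = disaster areas, columns = demand indices
# (e.g., emergency support, rescue, basic life support). Values are invented.
areas = [[8.0, 6.0, 7.0], [3.0, 2.0, 4.0], [5.0, 9.0, 6.0]]
w = entropy_weights(areas)
print(topsis_closeness(areas, w))
```

A higher closeness score indicates a more urgent relief demand; the area with uniformly low scores ranks last, as expected.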
After significant earthquakes, we can see images posted on social media platforms by individuals and media agencies owing to the mass usage of smartphones these days. These images can be utilized to provide information about the shaking damage in the earthquake region both to the public and research community, and potentially to guide rescue work. This paper presents an automated way to extract damaged-building images after earthquakes from social media platforms such as Twitter and thus identify the particular user posts containing such images. Using transfer learning and ~6500 manually labelled images, we trained a deep learning model to recognize images with damaged buildings in the scene. The trained model achieved good performance when tested on newly acquired images of earthquakes at different locations and when run in near real-time on the Twitter feed after the 2020 M7.0 earthquake in Turkey. Furthermore, to better understand how the model makes decisions, we also implemented the Grad-CAM method to visualize the important regions on the images that facilitate the decision.
Information residing in multiple modalities (e.g., text, image) of social media posts can jointly provide more comprehensive and clearer insights into an ongoing emergency. To identify information valuable for humanitarian aid from noisy multimodal data, we first clarify the categories of humanitarian information, and define a multi-label multimodal humanitarian information identification task, which can adapt to the label inconsistency issue caused by modality independence while maintaining the correlation between modalities. We proposed a Multimodal Humanitarian Information Identification Model that simultaneously captures the Correlation and Independence between modalities (CIMHIM). A tailor-made dataset containing 4,383 annotated text-image pairs was built to evaluate the effectiveness of our model. The experimental results show that CIMHIM outperforms both unimodal and multimodal baseline methods by at least 0.019 in macro-F1 and 0.022 in accuracy. The combination of OCR text, object-level features, and the decision rule based on label correlations enhances the overall performance of CIMHIM. Additional experiments on a similar dataset (CrisisMMD) also demonstrate the robustness of CIMHIM. The task, model, and dataset proposed in this study contribute to the practice of leveraging multimodal social media resources to support effective emergency response.
This article aims to implement a prototype screening system to identify flooding-related photos from social media. These photos, associated with their geographic locations, can provide free, timely, and reliable visual information about flood events to decision-makers. This screening system, designed for application to social media images, includes several key modules: tweet/image downloading, flooding photo detection, and a WebGIS application for human verification. In this study, a training dataset of 4800 flooding photos was built based on an iterative method using a convolutional neural network (CNN) developed and trained to detect flooding photos. The system was designed in a way that the CNN can be re-trained by a larger training dataset when more analyst-verified flooding photos are being added to the training set in an iterative manner. The total accuracy of flooding photo detection was 93% in a balanced test set, and the precision ranges from 46–63% in the highly imbalanced real-time tweets. The system is plug-in enabled, permitting flexible changes to the classification module. Therefore, the system architecture and key components may be utilized in other types of disaster events, such as wildfires and earthquakes, for damage/impact assessment.
The ability to rapidly access optical satellite imagery is now an intrinsic component of managing the disaster response that follows a major earthquake. These images provide synoptic data on the impacts, extent, and intensity of damage, which is essential for mitigating further losses by feeding into the response coordination. However, whilst the efficiency of the response can be hampered when cloud cover limits image availability, spatio-temporal variations in cloud cover have never been considered as part of the design of effective disaster mapping. Here we show how annual variations in cloud cover may affect our capacity to respond rapidly throughout the year and consequently contribute to overall earthquake risk. We find that on a global scale when accounting for cloud, the worst time of year for an earthquake disaster is between June and August. During these months, 40% of the global population at risk from earthquakes are obscured from optical satellite view for >3 consecutive days. Southeastern Asia is particularly strongly affected, accounting for the majority of the population at risk from earthquakes that could be obscured by cloud in every month. Our results demonstrate the importance of the timing of earthquakes in terms of our capacity to respond effectively, highlighting the need for more intelligent design of disaster response that is not overly reliant on optical satellite imagery.
Record-breaking fire seasons are becoming increasingly common worldwide, and large wildfires are having extraordinary impacts on people and property, despite years of investments to support social–ecological resilience to wildfires. This has prompted new calls for land management and policy reforms as current land and fire management approaches have been unable to effectively respond to the rapid changes in climate and development patterns that strongly control fire behaviour and continue to exacerbate the risks and hazards to human communities. Promoting social–ecological resilience in rapidly changing, fire-susceptible landscapes requires adoption of multiple perspectives of resilience, extending beyond ‘basic resilience’ (or bouncing back to a similar state) to include ‘adaptive resilience’ and ‘transformative resilience’, which require substantial and explicit changes to social–ecological systems. Clarifying these different perspectives and identifying where they will be most effective helps prioritize efforts to better coexist with wildfire in an increasingly flammable world.
The fast and explosive growth of digital data in social media and the World Wide Web has led to numerous opportunities and research activities in multimedia big data. Among them, disaster management applications have attracted a lot of attention in recent years due to their impacts on society and government. This study targets content analysis and mining for disaster management. Specifically, a multimedia big data framework based on advanced deep learning techniques is proposed. First, a video dataset of natural disasters is collected from YouTube. Then, two separate deep networks, including a temporal audio model and a spatio-temporal visual model, are presented to analyze the audio-visual modalities in video clips effectively. Thereafter, the results of both models are integrated using the proposed fusion model based on the Multiple Correspondence Analysis (MCA) algorithm, which considers the correlations between data modalities and final classes. The proposed multimodal framework is evaluated on the collected disaster dataset and compared with several state-of-the-art single modality and fusion techniques. The results demonstrate the effectiveness of both the visual model and the fusion model compared to the baseline approaches. Specifically, the accuracy of the final multi-class classification using the proposed MCA-based fusion reaches 73% on this challenging dataset.
In recent years, social media such as Twitter have received much attention as a new data source for rapid flood awareness. The timely response and large coverage provided by citizen sensors significantly compensate for the limitations of non-timely remote sensing data and spatially isolated river gauges. However, automatic extraction of flood tweets from a massive tweet pool remains a challenge. Taking the Houston Flood in 2017 as a study case, this paper presents an automated flood tweet extraction approach by mining both the visual and textual information a tweet contains. A CNN architecture was designed to classify the visual content of flood pictures during the Houston Flood. A sensitivity test was then applied to extract flood-sensitive keywords that were further used to refine the CNN classified results. A duplication test was finally performed to trim the database by removing the duplicated pictures to create the flood tweet pool for the flood event. The results indicated that coupling CNN classification results with flood-sensitive words in tweets allows a significant increase in precision while keeping the recall rate at a high level. The elimination of tweets containing duplicated pictures greatly contributes to higher spatio-temporal relevance to the flood.
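The refinement and duplication steps described above can be sketched roughly as follows; the keyword list, the CNN score threshold, and the MD5-based exact-duplicate test are illustrative stand-ins, not the paper's implementation (which derives keywords from a sensitivity test and may use perceptual hashing):

```python
import hashlib

# Illustrative flood-sensitive keywords, not the paper's derived list.
FLOOD_KEYWORDS = {"flood", "flooding", "flooded", "inundation", "rescue"}

def refine(tweets, cnn_scores, threshold=0.5):
    """Keep tweets whose picture the CNN flags as flood-like AND whose text
    contains a flood-sensitive keyword; drop tweets with duplicated pictures."""
    seen = set()
    kept = []
    for tweet, score in zip(tweets, cnn_scores):
        if score < threshold:
            continue  # CNN rejects the picture
        words = set(tweet["text"].lower().split())
        if not words & FLOOD_KEYWORDS:
            continue  # text refinement rejects the tweet
        digest = hashlib.md5(tweet["image_bytes"]).hexdigest()
        if digest in seen:
            continue  # duplicated picture, trim from the pool
        seen.add(digest)
        kept.append(tweet)
    return kept

tweets = [
    {"text": "Street flooding near downtown", "image_bytes": b"img-A"},
    {"text": "Street flooding near downtown", "image_bytes": b"img-A"},  # duplicate
    {"text": "Nice sunset tonight", "image_bytes": b"img-B"},  # no flood keyword
]
print(len(refine(tweets, [0.9, 0.9, 0.8])))  # → 1
```

Coupling the two signals this way trades a little recall for the precision gain the abstract reports, since a tweet must pass both the visual and the textual filter.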
During natural and man-made disasters, people use social media platforms such as Twitter to post textual and multimedia content to report updates about injured or dead people, infrastructure damage, missing or found people, among other information types. Studies have revealed that this online information, if processed in a timely and effective manner, is extremely useful for humanitarian organizations to gain situational awareness and plan relief operations. In addition to the analysis of textual content, recent studies have shown that imagery content on social media can boost disaster response significantly. Despite extensive research that mainly focuses on textual content to extract useful information, limited work has focused on the use of imagery content or the combination of both content types. One of the reasons is the lack of labeled imagery data in this domain. Therefore, in this paper, we aim to tackle this limitation by releasing a large multi-modal dataset from natural disasters collected from Twitter. We provide three types of annotations, which are useful to address a number of crisis response and management tasks for different humanitarian organizations.
Modern cities are facing critical environmental and social problems that are difficult to solve using conventional planning approaches due to the cities' magnitude and complexity. Recent developments in sensing technologies and urban computing, however, integrate new data resources and technologies to tackle these challenges. Popular social networking platforms such as Twitter provide new data sources on important events (e.g., cultural activities, political campaigns, accidents, crises) providing rich knowledge about urban systems and human dynamics. This research is intended to develop a method for effectively monitoring important information during such events and helping with planning and policymaking. We use semantically similar and geographically close geo-topics to represent important local events. This research proposes a data-driven system for detecting and tracking the semantic, spatial, and temporal dynamics of these geo-topics, specifically designed for geo-tagged tweets. The system consists of data preprocessing, geo-topic generation, and geo-topic tracking modules. The preprocessing module can remove robotic and semantically trivial texts. In the geo-topic generation module, we use spatial factors to measure the spatial impacts of geo-tagged tweets by applying an exponential decay function to the pairwise distances between tweets. We then improve the dynamic topic model (DTM) by embedding the spatial factors to enable the generation of geo-topics in semantic, spatial, and temporal dimensions simultaneously. The geo-topic tracking module monitors semantic change by detecting changes in certain keywords' probabilities and the volumes of tweets belonging to different geo-topics. This module also uses radius of gyration and trajectory-pattern mining to track and analyze the movement patterns of geo-topics. We employed the tracking system in three disaster cases in different U.S. cities to track small-scale emergencies and crises. 
These implementations demonstrated the effectiveness of the system for identifying and tracking geo-topics at fine temporal and geographic scales. The system also has strong potential in creating planning-related analyses for policy makers, improving the situational awareness of the general public, and serving as a basis for urban information systems that contribute to smart, agile, and resilient city developments.
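The spatial factor described above, an exponential decay applied to pairwise tweet distances, can be sketched as follows; the decay constant and the planar distance approximation are illustrative assumptions, not the system's actual parameters:

```python
import math

def spatial_factors(coords, decay_km=1.0):
    """Pairwise spatial impact weights w[i][j] = exp(-d_ij / decay_km), so
    nearby geo-tagged tweets reinforce the same geo-topic more than distant ones."""
    def dist(a, b):
        # Planar approximation in km offsets; a real system would use haversine.
        return math.hypot(a[0] - b[0], a[1] - b[1])
    n = len(coords)
    return [[math.exp(-dist(coords[i], coords[j]) / decay_km) for j in range(n)]
            for i in range(n)]

# Three geo-tagged tweets: two close together, one far away (km offsets).
w = spatial_factors([(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)])
print(round(w[0][1], 3), round(w[0][2], 3))  # → 0.905 0.001
```

Embedding weights of this shape into the dynamic topic model is what lets the geo-topics vary jointly in the semantic, spatial, and temporal dimensions.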
Social networking platforms have been widely employed to detect and track physical events in population-dense urban areas. They can be effective tools to understand what happens and when and where it happens, either retrospectively or in real time. Correspondingly, a variety of approaches have been proposed for detecting either targeted or general events. However, neither type of event detection technique has been developed to detect urban emergencies that happen in specific geographic locations and with unpredictable characteristics. Therefore, we propose a spatial and data-driven detecting urban emergencies technique (DUET) for natural hazards, manmade disasters, and other emergencies. The method addresses both geographic and semantic dimensions of events using a geotopic detection module and evaluates their crisis levels on the basis of the intensity of negative sentiment through a ranking module. DUET was designed specifically for georeferenced tweets from a Twitter streaming application programming interface (API). To validate the technique, we conducted multiple experiments with geotagged tweets in different urban environments over a period of four to six consecutive hours. DUET successfully identified emergencies of different types among all the candidate geotopics. Our future work focuses on enabling online-mode detection with high scalability with large volumes of streaming data and providing interactive visualization through a GIS system. DUET can identify emergencies of general types and provide timely emergency reports both to first responders and to the public. The technique contributes to building an efficient and open disaster information system through a crowdsourcing effort and adding agility to urban resilience regarding crisis detection, situation awareness, and information diffusion.
This paper addresses the problem of floods classification and floods aftermath detection based on both social media and satellite imagery. Automatic detection of disasters such as floods is still a very challenging task. The focus lies on identifying passable routes or roads during floods. Two novel solutions are presented, which were developed for two corresponding tasks at the MediaEval 2018 benchmarking challenge. The tasks are (i) identification of images providing evidence for road passability and (ii) differentiation and detection of passable and non-passable roads in images from two complementary sources of information. For the first challenge, we mainly rely on object and scene-level features extracted through multiple deep models pre-trained on the ImageNet and Places datasets. The object and scene-level features are then combined using early, late and double fusion techniques. To identify whether or not it is possible for a vehicle to pass a road in satellite images, we rely on Convolutional Neural Networks and a transfer learning-based classification approach. The evaluation of the proposed methods is carried out on the large-scale datasets provided for the benchmark competition. The results demonstrate significant improvement in performance over recent state-of-the-art approaches.
Social media such as Twitter is increasingly being used as an effective platform to observe human behaviors in disastrous events. However, uneven social media use among different groups of population in different regions could lead to biased consequences and affect disaster resilience. This paper studies the Twitter use during 2017 Hurricane Harvey in 76 counties in Texas and Louisiana. We seek to answer a fundamental question: did social-geographical disparities of Twitter use exist during the three phases of emergency management (preparedness, response, recovery)? We employed a Twitter data mining framework to process the data and calculate two indexes: Ratio and Sentiment. Regression analyses between the Ratio indexes and the social-geographical characteristics of the counties at the three phases reveal significant social and geographical disparities in Twitter use during Hurricane Harvey. Communities with higher disaster-related Twitter use in Harvey generally were communities having better social and geographical conditions. These results of Twitter use patterns can be used to compare with future similar studies to see whether the Twitter use disparities have increased or decreased. Future research is also needed to examine the effects of Twitter use disparities on disaster resilience and to test whether Twitter use can predict community resilience.
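A Ratio index of the kind used above can be approximated as the share of disaster-related tweets among all tweets per county; a minimal sketch with invented keywords and counts (not the study's actual framework):

```python
def ratio_index(county_tweets, keywords):
    """Ratio index: disaster-related tweets / total tweets for each county."""
    out = {}
    for county, texts in county_tweets.items():
        related = sum(1 for t in texts
                      if any(k in t.lower() for k in keywords))
        out[county] = related / len(texts) if texts else 0.0
    return out

# Toy per-county tweet samples (illustrative only).
counties = {
    "Harris": ["Harvey flooding everywhere", "stay safe", "need rescue boats"],
    "Caddo": ["game night", "great coffee"],
}
print(ratio_index(counties, ["harvey", "flood", "rescue"]))
```

Regressing an index like this against county-level socioeconomic variables is what surfaces the disparities the abstract describes.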