Conference Paper

Abstract

Disaster monitoring based on social media posts has attracted considerable interest in computer science over the last decade, mainly due to the wide range of applications in public safety and security and to the pervasiveness of social media not only in daily communication but also in life-threatening situations. Social media can be used as a valuable source for producing early warnings of imminent disasters. This paper presents a framework for analysing multimodal social media content in order to decide whether it is relevant to flooding. This is important because it enhances crisis situational awareness and supports crisis management procedures such as preparedness. Evaluation on a benchmark dataset shows very good performance in both the text and image classification modules.
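The abstract does not describe how the text and image modules are combined; purely as an illustration, the snippet below sketches a simple late-fusion rule in Python, assuming two already trained scikit-learn-style classifiers (`text_clf`, `image_clf`) and a fusion weight, all of which are hypothetical rather than the authors' actual design.

```python
# Minimal late-fusion sketch: combine text and image relevance scores.
# The classifiers, their features and the fusion weight are assumptions,
# not the authors' actual implementation.
import numpy as np

def flood_relevance(text_clf, image_clf, text_features, image_features, w_text=0.5):
    """Return a fused probability that a post is flood-relevant."""
    p_text = text_clf.predict_proba(text_features)[:, 1]     # P(relevant | text)
    p_image = image_clf.predict_proba(image_features)[:, 1]  # P(relevant | image)
    return w_text * p_text + (1.0 - w_text) * p_image        # weighted late fusion

# A post would be flagged as flood-relevant when the fused score exceeds a threshold, e.g.:
# relevant = flood_relevance(text_clf, image_clf, X_text, X_img) > 0.5
```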




... A process constantly crawls new posts from social media, and in particular Twitter, that match predefined criteria, i.e. areas of interest, keywords, and interesting accounts. Before storing the tweets to a MongoDB database, the process also enriches them with information that is derived from analysis techniques about (i) the estimation of their validity to avoid fake news, (ii) the estimation of their relevance to the examined use cases (Moumtzidou, 2018), (iii) the detection of locations mentioned inside the text of tweets in order to geotag them, and (iv) the extraction of visual concepts from the images accompanying the posts. ...
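The quoted pipeline crawls posts, enriches them with several analyses and stores them in MongoDB. The sketch below is a rough Python illustration of that loop using pymongo; the enrichment functions are trivial placeholders, not the real analysis modules.

```python
# Sketch of the crawl-enrich-store loop described in the excerpt above.
# The analysis functions are placeholders standing in for the real modules.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
collection = client["crisis_monitoring"]["tweets"]

def estimate_validity(text):    return 0.9   # placeholder for fake-news / validity estimation
def estimate_relevance(text):   return 0.7   # placeholder for use-case relevance estimation
def extract_locations(text):    return []    # placeholder for location detection / geotagging
def detect_concepts(image_url): return []    # placeholder for visual concept extraction

def enrich_and_store(post: dict) -> None:
    post["validity"] = estimate_validity(post["text"])           # (i)
    post["relevance"] = estimate_relevance(post["text"])         # (ii)
    post["locations"] = extract_locations(post["text"])          # (iii)
    post["visual_concepts"] = [detect_concepts(url)              # (iv)
                               for url in post.get("image_urls", [])]
    collection.insert_one(post)                                  # persist the enriched post
```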
... Crowdsourced content can contain all of this information in the form of unstructured textual data [12]. Ideally, each word from the crowd should be analysed intensively, because it carries important information about how flood prevention and response actions can be planned and executed. ...
Article
Full-text available
This paper proposes a new framework for crisis mapping with a flood prediction model based on crowdsourced data. Crisis mapping is still at an early stage of development and offers opportunities for exploration. In fact, crisis-mapping applications deliver information quickly and provide continuous updates for crises and emergency evacuation using sensors. However, current crisis mapping focuses on the dissemination of flood-related information and lacks flood prediction capability. Therefore, this paper applies an artificial neural network for the flood prediction model in the proposed framework. Sensor data from the crowdsourcing platform can be used to predict flood-related measures and support continuous flood monitoring. In addition, the proposed framework makes use of unstructured data from Twitter to support the dissemination of flood warnings and to locate flooded areas with no sensor installation. Based on the experimental results, the fitted model from the optimization process achieves 90.9% accuracy. The significance of this study is that it provides a new alternative for disseminating flood warnings that can be used to predict and visualize flood occurrence. This prediction helps agencies and authorities to identify flood risk before it occurs, and the crisis maps can be used as an analytics tool for future city planning.
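The paper's neural network architecture and features are not specified here; the following is only a generic Python sketch of a flood classifier trained on sensor-style features with scikit-learn's MLPClassifier, using synthetic data in place of the crowdsourced readings.

```python
# Generic ANN sketch for flood prediction from sensor readings.
# The feature layout, labels and network size are assumptions, not the paper's model.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((500, 3))                                  # e.g. rainfall, river level, rate of rise
y = (X @ np.array([0.2, 0.6, 0.2]) > 0.5).astype(int)     # 1 = flood expected (synthetic labels)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = MLPClassifier(hidden_layer_sizes=(16, 8), max_iter=1000, random_state=0)
model.fit(X_tr, y_tr)
print("held-out accuracy:", model.score(X_te, y_te))
```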
... Kumar et al. [5] modelled the evacuation behaviour of the residents of New York City in the aftermath of Hurricane Sandy. Moumtzidou et al. [6] combined textual and visual information to determine the relevance of social media content with respect to flooding. Ghosh et al. [2] reported that class specific TF-IDF boosting improves classifier performance on microblogs posted during disaster events. ...
Article
Full-text available
The Second Workshop on Exploitation of Social Media for Emergency Relief and Preparedness (SMERP) was held in conjunction with The Web Conference (WWW) 2018 at Lyon, France. A primary aim of the workshop was to promote multi-modal and multi-view information retrieval from the social media content in disaster situations. The workshop programme included keynote talks, a peer-reviewed paper track, and a panel discussion on the relevant research problems in the scope of the workshop.
Article
Full-text available
The idea of 'citizens as sensors' has gradually become a reality over the past decade. Today, Volunteered Geographic Information (VGI) from citizens is heavily involved in acquiring information on natural disasters. In particular, the rapid development of deep learning techniques in computer vision and natural language processing in recent years has allowed more information related to natural disasters to be extracted from social media, such as the severity of building damage and flood water levels. Meanwhile, many recent studies have integrated information extracted from social media with that from other sources, such as remote sensing and sensor networks, to provide comprehensive and detailed information on natural disasters. Given the rapid development of this field, it is therefore of great significance to review the existing work. In this review, we summarized eight common tasks and their solutions in social media content analysis for natural disasters. We also grouped and analyzed studies that make further use of this extracted information, either standalone or in combination with other sources. Based on the review, we identified and discussed challenges and opportunities.
Chapter
Nowadays, one of the most critical challenges is the ongoing climate change, which has multiple and significant impacts on human life at both the financial and the environmental level. As the adverse effects of unexpected, destructive natural extreme events, such as loss of human lives and property, will become more frequent and intense in the future, especially in developing countries, they must be confronted efficiently and in a holistic manner. Hence, there is an urgent need to develop novel tools to enhance awareness and preparedness, assess risks and support decision-making, aiming to increase social resilience to climate change. This work proposes a unified multilayer framework that encapsulates machine learning techniques in the risk assessment process for analysing and dynamically fusing heterogeneous information obtained from the field.
Conference Paper
Full-text available
Thanks to their worldwide extension and speed, online social networks have become a common and effective way of communication during emergencies. The messages posted during a disaster may be either crisis-relevant (alerts, help requests, damage descriptions, etc.) or not (feelings, opinions, etc.). In this paper, we propose a machine learning approach for creating a classifier able to distinguish between informative and non-informative messages, and to understand common patterns within these two classes. We also investigate similarities and differences in the words that most frequently occur across three different natural disasters: fire, earthquake and flood. The results, obtained with real data extracted from Twitter during past emergency events, demonstrate the viability of our approach in providing a filtering service able to deliver only informative content to crisis managers, with a view to improving the operational picture during emergency situations.
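As a minimal illustration of such an informative vs. non-informative classifier, the sketch below trains a TF-IDF plus logistic-regression pipeline on a few invented tweets; the features and model are assumptions, not the paper's configuration.

```python
# Sketch of an informative vs. non-informative tweet classifier.
# The toy dataset and the TF-IDF + logistic regression pipeline are illustrative only.
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

tweets = ["Bridge on 5th street flooded, avoid the area",
          "Need rescue boats near the river bank",
          "So sad watching the news tonight",
          "Thoughts and prayers to everyone"]
labels = [1, 1, 0, 0]                          # 1 = informative, 0 = not informative

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(tweets, labels)
print(clf.predict(["Water level rising fast, road closed"]))
```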
Article
Full-text available
An important source of information at present is social media, which reports any major event, including natural disasters. Social media also includes conversational data, and as a result the volume of data on social media has increased enormously. During natural disasters such as floods, tsunamis, earthquakes and landslides, people require information so that relief operations such as help and medical facilities can save many lives (Bifet et al. in J Mach Learn Res Proc Track 17:5–11, 2011). This article attempts geoparsing, which identifies the places affected by a disaster on a map. Geoparsing is the process of converting free-text descriptions of locations into unambiguous geographic identifiers with the help of longitude and latitude. With geographic coordinates, locations can be mapped and entered into a geographic information system. Twitter messages, a real-time, reliable and robust source of information, can supply a large amount of data. After collecting tweets in real time, we can parse them for the disaster situation and its location; this information helps to identify the exact location of an event. To obtain information on natural disasters, tweets are extracted from Twitter into the R-Studio environment. First, the extracted tweets are parsed in R for "natural disaster"; the parsed tweets are then stored in CSV format in an R database. Tweet counts for all posted data are calculated and stored in a file, and visual analysis of the stored data is performed using the R statistical software. This is further useful for assessing the severity of a natural disaster. Sentiment analysis (Rahmath in IJAIEM 3(5):1–3, 2014) of user tweets is useful for decision making (Rao et al. in Int J Comput Sci Inf Technol 6(3):2923–7, 2015).
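A geoparsing step like the one described can be sketched in Python with spaCy for place-name recognition and the Nominatim geocoder for coordinates; this is a stand-in for the R-based workflow in the article, not a reproduction of it.

```python
# Geoparsing sketch: extract place names from a tweet and resolve them to coordinates.
# spaCy NER + Nominatim geocoding are stand-ins for the paper's R-based workflow.
import spacy
from geopy.geocoders import Nominatim

nlp = spacy.load("en_core_web_sm")                 # requires the small English model to be installed
geolocator = Nominatim(user_agent="disaster-geoparsing-demo")

def geoparse(tweet: str) -> dict:
    doc = nlp(tweet)
    places = [ent.text for ent in doc.ents if ent.label_ in ("GPE", "LOC")]
    coords = {}
    for place in places:
        hit = geolocator.geocode(place)            # may return None for ambiguous names
        if hit is not None:
            coords[place] = (hit.latitude, hit.longitude)
    return coords

print(geoparse("Severe flooding reported in Chennai after heavy rain"))
```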
Article
Full-text available
Increasingly, user-generated content (UGC) in social media postings and the associated metadata, such as time and location stamps, are being used to provide useful operational information during natural hazard events such as hurricanes, storms and floods. The main advantages of these new sources of data are twofold. First, in a purely additive sense, they can provide much denser geographical coverage of the hazard than traditional sensor networks. Second, they provide what physical sensors cannot: by documenting personal observations and experiences, they directly record the impact of a hazard on the human environment. For this reason, interpretation of the content (e.g., hashtags, images, text, emojis) and metadata (e.g., keywords, tags, geolocation) has been a focus of much research into social media analytics. However, as the choices of semantic tags in current methods are usually reduced to the exact name or type of the event (e.g., the hashtags '#Sandy' or '#flooding'), the main limitation of such approaches remains their mere nowcasting capacity. In this study we make use of polysemous tags of images posted during several recent flood events and demonstrate how such volunteered geographic data can be used to provide early warning of an event before its outbreak.
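The early-warning idea rests on tag frequencies rising before an event breaks out. The toy sketch below flags days on which a monitored tag exceeds a multiple of its recent baseline; the counts and the threshold are invented for illustration.

```python
# Toy early-warning sketch: flag days on which the frequency of a monitored image tag
# rises well above its recent baseline. Data and threshold are invented.
import numpy as np

daily_tag_counts = np.array([3, 2, 4, 3, 5, 4, 3, 4, 18, 42])   # e.g. daily posts tagged "river"
window = 7
baseline = np.convolve(daily_tag_counts, np.ones(window) / window, mode="valid")[:-1]  # mean of previous 7 days
current = daily_tag_counts[window:]                              # days with a full preceding window
alerts = current > 3 * baseline                                  # simple 3x-over-baseline rule
print("warning raised on days:", np.nonzero(alerts)[0] + window)
```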
Conference Paper
Full-text available
In this study we compare three different fine-tuning strategies in order to investigate the best way to transfer the parameters of popular deep convolutional neural networks that were trained for a visual annotation task on one dataset, to a new, considerably different dataset. We focus on the concept-based image/video annotation problem and use ImageNet as the source dataset, while the TRECVID SIN 2013 and PASCAL VOC-2012 classification datasets are used as the target datasets. A large set of experiments examines the effectiveness of three fine-tuning strategies on each of three different pre-trained DCNNs and each target dataset. The reported results give rise to guidelines for effectively fine-tuning a DCNN for concept-based visual annotation.
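One of the fine-tuning strategies compared in such studies can be sketched as follows in PyTorch, a stand-in for the original Caffe setup: reuse ImageNet-pretrained weights, freeze the convolutional layers and retrain only a new classification head. The backbone and the number of target concepts below are assumptions.

```python
# Fine-tuning sketch (PyTorch stand-in for the paper's Caffe-era setup):
# reuse ImageNet-pretrained weights, freeze the pretrained layers and
# train only a new classification head for the target concepts.
import torch.nn as nn
from torchvision import models

num_target_concepts = 20                      # e.g. PASCAL VOC-2012 classes (assumption)

model = models.resnet50(weights="IMAGENET1K_V1")
for param in model.parameters():              # strategy shown: freeze all pretrained layers
    param.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, num_target_concepts)  # new head, trained from scratch

# Alternative strategies unfreeze the last convolutional block as well,
# or fine-tune the whole network with a small learning rate.
```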
Article
Full-text available
This paper presents an open platform that collects multimodal environmental data related to air quality from several sources, including official open sources, social media and citizens. Collecting and fusing different sources of air quality data into a unified air quality indicator is a highly challenging problem; the platform leverages recent advances in image analysis, open hardware, machine learning and data fusion, and is expected to result in increased geographical coverage and temporal granularity of air quality data.
Article
Full-text available
We propose a deep convolutional neural network architecture codenamed "Inception", which was responsible for setting the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC 2014). The main hallmark of this architecture is the improved utilization of the computing resources inside the network. This was achieved by a carefully crafted design that allows for increasing the depth and width of the network while keeping the computational budget constant. To optimize quality, the architectural decisions were based on the Hebbian principle and the intuition of multi-scale processing. One particular incarnation used in our submission for ILSVRC 2014 is called GoogLeNet, a 22 layers deep network, the quality of which is assessed in the context of classification and detection.
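A minimal Inception-style block can be sketched in PyTorch as parallel 1x1, 3x3 and 5x5 convolutions plus a pooled branch, concatenated along the channel dimension; the channel counts below are illustrative, not GoogLeNet's exact configuration.

```python
# Minimal Inception-style block (sketch): parallel 1x1, 3x3 and 5x5 convolutions
# plus a pooled branch, concatenated along the channel dimension.
import torch
import torch.nn as nn

class InceptionBlock(nn.Module):
    def __init__(self, in_ch):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, 16, kernel_size=1)
        self.b2 = nn.Sequential(nn.Conv2d(in_ch, 16, 1), nn.ReLU(),
                                nn.Conv2d(16, 32, 3, padding=1))
        self.b3 = nn.Sequential(nn.Conv2d(in_ch, 8, 1), nn.ReLU(),
                                nn.Conv2d(8, 16, 5, padding=2))
        self.b4 = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                nn.Conv2d(in_ch, 16, 1))

    def forward(self, x):
        return torch.cat([self.b1(x), self.b2(x), self.b3(x), self.b4(x)], dim=1)

out = InceptionBlock(64)(torch.randn(1, 64, 28, 28))
print(out.shape)          # torch.Size([1, 80, 28, 28]) -> 16 + 32 + 16 + 16 channels
```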
Article
Full-text available
Caffe provides multimedia scientists and practitioners with a clean and modifiable framework for state-of-the-art deep learning algorithms and a collection of reference models. The framework is a BSD-licensed C++ library with Python and MATLAB bindings for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures. Caffe fits industry and internet-scale media needs by CUDA GPU computation, processing over 40 million images a day on a single K40 or Titan GPU (approx 2 ms per image). By separating model representation from actual implementation, Caffe allows experimentation and seamless switching among platforms for ease of development and deployment from prototyping machines to cloud environments. Caffe is maintained and developed by the Berkeley Vision and Learning Center (BVLC) with the help of an active community of contributors on GitHub. It powers ongoing research projects, large-scale industrial applications, and startup prototypes in vision, speech, and multimedia.
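A typical deployment of a trained Caffe model through its Python bindings looks roughly like the sketch below; the file names and the output blob name ('prob') depend on the specific deploy prototxt and are placeholders here.

```python
# Rough sketch of single-image inference with Caffe's Python bindings.
# "deploy.prototxt", "model.caffemodel", "photo.jpg" and the "prob" blob are placeholders.
import numpy as np
import caffe

caffe.set_mode_cpu()                                   # or caffe.set_mode_gpu()
net = caffe.Net("deploy.prototxt", "model.caffemodel", caffe.TEST)

# Standard Caffe preprocessing: HWC float image -> CHW, BGR channel order, 0-255 scale.
transformer = caffe.io.Transformer({"data": net.blobs["data"].data.shape})
transformer.set_transpose("data", (2, 0, 1))
transformer.set_channel_swap("data", (2, 1, 0))
transformer.set_raw_scale("data", 255)

image = caffe.io.load_image("photo.jpg")               # float RGB image in [0, 1]
net.blobs["data"].data[...] = transformer.preprocess("data", image)
probabilities = net.forward()["prob"][0]               # output blob name depends on the network
print("top class index:", int(np.argmax(probabilities)))
```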
Article
Full-text available
Social media platforms provide active communication channels during mass convergence and emergency events such as disasters caused by natural hazards. As a result, first responders, decision makers, and the public can use this information to gain insight into the situation as it unfolds. In particular, many social media messages communicated during emergencies convey timely, actionable information. Processing social media messages to obtain such information, however, involves solving multiple challenges including: handling information overload, filtering credible information, and prioritizing different classes of messages. These challenges can be mapped to classical information processing operations such as filtering, classifying, ranking, aggregating, extracting, and summarizing. We survey the state of the art regarding computational methods to process social media messages, focusing on their application in emergency response scenarios. We examine the particularities of this setting, and then methodically examine a series of key sub-problems ranging from the detection of events to the creation of actionable and useful summaries.
Article
Full-text available
The proposed social media crisis mapping platform for natural disasters uses locations from gazetteer, street map, and volunteered geographic information (VGI) sources for areas at risk of disaster and matches them to geoparsed real-time tweet data streams. The authors use statistical analysis to generate real-time crisis maps. Geoparsing results are benchmarked against existing published work and evaluated across multilingual datasets. Two case studies compare five-day tweet crisis maps to official post-event impact assessment from the US National Geospatial Agency (NGA), compiled from verified satellite and aerial imagery sources.
Article
Full-text available
The recently introduced continuous Skip-gram model is an efficient method for learning high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships. In this paper we present several extensions that improve both the quality of the vectors and the training speed. By subsampling of the frequent words we obtain significant speedup and also learn more regular word representations. We also describe a simple alternative to the hierarchical softmax called negative sampling. An inherent limitation of word representations is their indifference to word order and their inability to represent idiomatic phrases. For example, the meanings of "Canada" and "Air" cannot be easily combined to obtain "Air Canada". Motivated by this example, we present a simple method for finding phrases in text, and show that learning good vector representations for millions of phrases is possible.
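The skip-gram model with negative sampling and the phrase-detection step are available off the shelf in gensim; the sketch below is an illustrative use of that implementation on a tiny invented corpus, not the paper's own code.

```python
# Skip-gram with negative sampling plus phrase detection, sketched with gensim.
# The corpus and hyper-parameters are invented for illustration.
from gensim.models import Word2Vec
from gensim.models.phrases import Phrases

sentences = [["air", "canada", "flight", "delayed", "by", "flooding"],
             ["heavy", "rain", "floods", "downtown", "streets"],
             ["air", "canada", "cancels", "flights"]] * 50

bigram = Phrases(sentences, min_count=2, threshold=0.1)   # learns collocations such as "air_canada"
phrased = [bigram[s] for s in sentences]

model = Word2Vec(phrased, vector_size=50,
                 sg=1,            # sg=1 -> skip-gram
                 negative=5,      # negative sampling instead of hierarchical softmax
                 sample=1e-3,     # subsampling of frequent words
                 min_count=1, epochs=20)
print(sorted(w for w in model.wv.index_to_key if "_" in w))   # learned phrase tokens
```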
Article
Full-text available
Social media are changing the way people communicate both in their day-to-day lives and during disasters that threaten public health. Engaging with and using such media may help the emergency-management community to respond to disasters.
Technical Report
In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting. Our main contribution is a thorough evaluation of networks of increasing depth, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers. These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured the first and the second places in the localisation and classification tracks respectively.
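The depth comparison can be reproduced in spirit with torchvision's VGG configurations (a modern stand-in for the original Caffe models), counting weight layers and parameters:

```python
# Depth comparison sketch: instantiate the 11-, 16- and 19-layer VGG configurations
# and count their weight layers (conv + fully connected) and parameters.
from torch import nn
from torchvision import models

for name, ctor in [("VGG-11", models.vgg11), ("VGG-16", models.vgg16), ("VGG-19", models.vgg19)]:
    net = ctor(weights=None)                       # random init; weights not needed for counting
    weight_layers = sum(isinstance(m, (nn.Conv2d, nn.Linear)) for m in net.modules())
    params = sum(p.numel() for p in net.parameters())
    print(f"{name}: {weight_layers} weight layers, {params / 1e6:.1f}M parameters")
```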
Conference Paper
This paper presents an open platform, which collects multimodal environmental data related to air quality from several sources including official open sources, social media and citizens. Collecting and fusing different sources of air quality data into a unified air quality indicator is a highly challenging problem, leveraging recent advances in image analysis, open hardware, machine learning and data fusion. The collection of data from multiple sources aims at having complementary information, which is expected to result in increased geographical coverage and temporal granularity of air quality data. This diversity of sources constitutes also the main novelty of the platform presented compared with the existing applications.
Article
With the recent explosive growth of e-commerce and online communication, a new genre of text, short text, has become widespread in many areas, and much research now focuses on short text mining. Classifying short text is challenging because of its inherent characteristics, such as sparseness, large scale, immediacy and non-standard language. Traditional methods struggle with short text classification mainly because the limited number of words in a short text cannot adequately represent the feature space and the relationships between words and documents. Several studies and reviews on text classification have appeared in recent years, but only a few focus on short text classification. This paper discusses the characteristics of short text and the difficulties of short text classification. We then introduce existing popular short text classifiers and models, including short text classification using semantic analysis, semi-supervised short text classification, ensemble short text classification, and real-time classification. Evaluation measures for short text classification are also analysed. Finally, we summarize existing classification techniques and outline development trends in short text classification.
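For very short, noisy texts, character n-gram features are one common way to mitigate the sparseness discussed above; the sketch below is an illustrative scikit-learn pipeline on invented data, not a method from the survey.

```python
# Short-text classification sketch: character n-gram TF-IDF features are less sparse
# than word features for very short, noisy texts. Toy data and model are illustrative.
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

texts = ["free shipping on ur order 2day!!", "order #512 delayed, sorry",
         "flash sale ends tonite", "your parcel has been dispatched"]
labels = ["promo", "status", "promo", "status"]

clf = make_pipeline(TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)), LinearSVC())
clf.fit(texts, labels)
print(clf.predict(["ur parcel ships 2day"]))
```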
Article
Natural disasters on devastating scales are happening more often. As these are becoming a major issue in many countries, there is a growing need for effective countermeasures using information and communications technology (ICT). Fujitsu Laboratories develops technologies for disaster prevention and mitigation, drawing on expert knowledge of specialists in this field. Effective ways to reduce the impact of natural disasters include their early detection, and forecasting of the vulnerable areas and scale of potential damage. In this paper, we first describe an enhanced estimation technique involving social networking services (SNS), in order to quickly identify the locus of a disaster. Reliable information on disaster-stricken areas can be obtained by combining data from SNS with other sources. We then describe a method to optimize the parameters of a flood-forecasting simulator that helps to identify high-risk areas in large river basins. This makes it possible to automate parameter configurations, which had been difficult to do before.
Book
Text mining applications have experienced tremendous advances because of Web 2.0 and social networking applications. Recent advances in hardware and software technology have led to a number of unique scenarios in which text mining algorithms are learned. Mining Text Data introduces an important niche in the text analytics field and is an edited volume contributed by leading international researchers and practitioners focused on social networks and data mining. The book covers a wide swath of topics across social networks and data mining; each chapter contains a comprehensive survey of the key research on its topic and of future research directions. There is a special focus on text embedded with heterogeneous and multimedia data, which makes the mining process much more challenging; a number of methods, such as transfer learning and cross-lingual mining, have been designed for such cases. Mining Text Data presents the material so that advanced-level students, practitioners and researchers in computer science can benefit from it. Academic and corporate libraries, as well as readers focused on information security, electronic commerce, databases, data mining, machine learning, and statistics, are the primary audience for this reference book.
Social media is a vital source of information during any major event, especially natural disasters. However, with the exponential increase in the volume of social media data comes an increase in conversational data that provides no valuable information, especially in the context of disaster events, diminishing people's ability to find the information they need in order to organize relief efforts, find help, and potentially save lives. This project develops a Bayesian approach to the classification of tweets (posts on Twitter) during Hurricane Sandy in order to distinguish "informational" from "conversational" tweets. We designed an effective set of features and used them as input to Naive Bayes classifiers. In comparison to a "bag of words" approach, the new feature set provides similar classification results, yet it contains only 9 features compared with more than 3,000 for "bag of words". When the feature set is combined with "bag of words", accuracy reaches 85.2914%. If integrated into disaster-related systems, our approach can serve as a boon to any person or organization seeking to extract useful information in the midst of a natural disaster.
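The hand-crafted-feature idea can be sketched as a handful of surface cues fed to a Naive Bayes classifier; the five features, the toy tweets and the labels below are illustrative, not the paper's nine-feature set.

```python
# Sketch of hand-crafted surface features fed to a Naive Bayes classifier to separate
# informational from conversational tweets. Features, data and labels are illustrative.
import numpy as np
from sklearn.naive_bayes import BernoulliNB

def features(tweet: str) -> list:
    return [int("http" in tweet),                  # contains a URL
            int("@" in tweet),                     # mentions another user
            int("#" in tweet),                     # uses a hashtag
            int(any(c.isdigit() for c in tweet)),  # contains numbers (addresses, counts)
            int("?" in tweet)]                     # question (often conversational)

tweets = ["Shelter open at 5th Ave school http://t.co/x", "omg is everyone ok??",
          "Red Cross hotline: 1-800-733-2767", "can't believe this storm"]
labels = [1, 0, 1, 0]                              # 1 = informational, 0 = conversational

X = np.array([features(t) for t in tweets])
clf = BernoulliNB().fit(X, labels)
print(clf.predict(np.array([features("Evacuation route map: http://t.co/y")])))
```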
Conference Paper
There has recently been an increased interest in named entity recognition and disambiguation systems at major conferences such as WWW, SIGIR, ACL, KDD, etc. However, most work has focused on algorithms and evaluations, leaving little space for implementation details. In this paper, we discuss some implementation and data processing challenges we encountered while developing a new multilingual version of DBpedia Spotlight that is faster, more accurate and easier to configure. We compare our solution to the previous system, considering time performance, space requirements and accuracy in the context of the Dutch and English languages. Additionally, we report results for 9 additional languages among the largest Wikipedias. Finally, we present challenges and experiences to foment the discussion with other developers interested in recognition and disambiguation of entities in natural language text.
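DBpedia Spotlight exposes a public REST endpoint for annotation; a minimal call looks like the sketch below (service availability and response fields may vary).

```python
# Minimal sketch of annotating text with the hosted DBpedia Spotlight REST endpoint.
# Availability of the demo service and the exact response fields may change over time.
import requests

resp = requests.get(
    "https://api.dbpedia-spotlight.org/en/annotate",
    params={"text": "Flooding hit Lyon during the Web Conference.", "confidence": 0.5},
    headers={"Accept": "application/json"},
    timeout=10,
)
for res in resp.json().get("Resources", []):
    print(res["@surfaceForm"], "->", res["@URI"])
```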
Article
The described system uses natural language processing and data mining techniques to extract situation awareness information from Twitter messages generated during various disasters and crises.
Article
Random forests are a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest. The generalization error for forests converges a.s. to a limit as the number of trees in the forest becomes large. The generalization error of a forest of tree classifiers depends on the strength of the individual trees in the forest and the correlation between them. Using a random selection of features to split each node yields error rates that compare favorably to Adaboost (Y. Freund & R. Schapire, Machine Learning: Proceedings of the Thirteenth International conference, ***, 148–156), but are more robust with respect to noise. Internal estimates monitor error, strength, and correlation and these are used to show the response to increasing the number of features used in the splitting. Internal estimates are also used to measure variable importance. These ideas are also applicable to regression.
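A random forest as described, with out-of-bag error estimation and per-feature importances, is available in scikit-learn; the sketch below uses a built-in dataset purely for illustration.

```python
# Random forest sketch: an ensemble of randomized trees with out-of-bag error
# estimation and per-feature importances (scikit-learn implementation).
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)
forest = RandomForestClassifier(n_estimators=200, max_features="sqrt",
                                oob_score=True, random_state=0)
forest.fit(X, y)
print("out-of-bag accuracy:", round(forest.oob_score_, 3))
print("largest feature importance:", round(float(forest.feature_importances_.max()), 3))
```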
P. Selvaperumal and A. Suruliandi. 2014. A short message classification algorithm for tweet classification. In 2014 International Conference on Recent Trends in Information Technology (ICRTIT). IEEE, 1-3.