CrisICSum: Interpretable Classification and Summarization
Platform for Crisis Events from Microblogs
Thi Huyen Nguyen
L3S Research Center
Hanover, Germany
Miroslav Shaltev
L3S Research Center
Hanover, Germany
Koustav Rudra
Indian Institute of Technology
(Indian School of Mines)
Dhanbad, India
ABSTRACT

Microblogging platforms such as Twitter receive massive numbers of messages during crisis events. Real-time insights are crucial for emergency response. Hence, there is a need for faithful tools to digest this information efficiently. In this paper, we present CrisICSum, a platform for classification and summarization of crisis events. The objective of CrisICSum is to classify user posts during disaster events into different humanitarian classes (e.g., damage, affected people, etc.) and to generate summaries of class-level messages. Unlike existing systems, CrisICSum employs an interpretable-by-design backend classifier, which can generate explanations for its output decisions. Besides, the platform allows user feedback on both the classification and summarization phases. CrisICSum is designed and runs as an easily integrated web application, and its backend models are interchangeable. The system can assist users and humanitarian organizations in improving response efforts during disaster situations. CrisICSum is available
CCS CONCEPTS

• Computing methodologies → Learning from critiques; • Human-centered computing → User studies; • Information systems → Clustering and classification; Summarization.

KEYWORDS

Learning with feedback, classification, summarization, crisis events
ACM Reference Format:
Thi Huyen Nguyen, Miroslav Shaltev, and Koustav Rudra. 2022. CrisICSum:
Interpretable Classification and Summarization Platform for Crisis Events
from Microblogs. In Proceedings of the 31st ACM International Conference on
Information and Knowledge Management (CIKM ’22), October 17–21, 2022,
Atlanta, GA, USA. ACM, New York, NY, USA, 5 pages.
1 INTRODUCTION

During crisis events, a significant amount of information is posted on social media platforms. The real-time information on these sites
Both authors contributed equally to the paper
Figure 1: CrisICSum - Overall Architecture
is essential for situational awareness and aid responses [ ]. However, not all the messages posted during disasters are relevant. The overwhelming volume of data makes it difficult for humans to digest information quickly. Besides detecting crisis-related messages, humanitarian organizations and aid responders usually want to obtain short updates at granular levels, such as damages, affected people, needs, etc. Many research groups [ ] have developed methods and tools that apply machine learning techniques to social media texts for disaster response. However, these works mainly focus on the individual task of classification or summarization. Besides, it is quite opaque how the models arrive at their predictions. There also exist proprietary solutions [ ], but their system behaviors are also non-transparent. Apart from that, it is difficult to customize and update such applications because they are not open source.
Recently, there has been increasing interest in designing interpretable models that offer a trade-off between performance and explainability. Some recent studies proposed interpretable classification and summarization approaches [ ]. These models generate results with high accuracy along with human-understandable explanations behind the model outputs. Generally, accurately predictive and transparent models are more usable and defensible in many application domains, such as health care and societal problems.
Designing efficient models for prediction and summarization of crisis events from microblogs such as Twitter in real time is a challenging task. Tweets are short, noisy, and very different from sentences in news articles. Hence, models originally designed for formal text datasets do not usually perform well on Twitter texts. Besides, crisis events evolve over time, with a large volume of new messages posted every day. The performance of models trained on a limited dataset can drop significantly over time when applied to new data or similar disaster texts. This makes end-users lose their trust in the models.

Figure 2: BERT2BERT architecture

The issue can partly be solved by learning from and updating models with error correction data from user feedback. Furthermore, existing summarization studies mainly apply an unsupervised approach due to the unavailability of labeled data for training. There is also a lack of gold summaries for summarization evaluation. Human coordination can help improve existing models' performance or provide data for supervised approaches and evaluation.
This paper presents CrisICSum, an interpretable open-source tool for emergency response. CrisICSum considers two important tasks for crisis events, classification and summarization, and allows human coordination for error correction of predictions and explanations. The overall architecture of CrisICSum is illustrated in Figure 1. CrisICSum takes messages posted at the time of disaster events and classifies tweets into different humanitarian classes such as ‘infrastructure damage’, ‘caution and advice’, ‘affected people and evacuation’, etc. These classes are defined and used by the United Nations Office for the Coordination of Humanitarian Affairs (UN OCHA) and in many previous studies [ ]. End-users can then select to generate summaries for the data of any specified class. Currently, we use BERT2BERT and RATSUM [10] as back-end classification and summarization models. BERT2BERT predicts class labels and extracts small snippets, so-called rationales, from the original inputs as explanations/evidence for the output decisions. Then, RATSUM uses the input data along with the rationales to generate summaries of the tweets in each class. CrisICSum also allows the integration of other classification and summarization algorithms [7, 9–12] as back-end models.
Our paper is organized as follows. In Section 2, we discuss the methodological background of BERT2BERT and RATSUM. Then, we introduce the CrisICSum architecture in detail in Section 3. Finally, we conclude the paper in Section 4.
2 METHODOLOGICAL BACKGROUND

In this section, we briefly describe the methodology of the BERT2BERT and RATSUM models [10].
Constraint                                   Description
Σ_j t_j · Length(j) ≤ M                      Length(j): number of words in tweet j; M: user-defined summary word length
Σ_{j ∈ T_i} t_j ≥ u_i,  i = [1 · · · U]      T_i: set of tweets containing rationale i
Σ_{i ∈ R_j} u_i ≥ |R_j| × t_j,  j = [1 · · · T]   R_j: set of rationale words in tweet j

Table 1: RATSUM constraints
2.1 Interpretable BERT2BERT classification
BERT2BERT is an interpretable-by-design pipeline model that consists of two prediction stages. The first stage applies a multi-task learning approach: it learns to predict class labels and extract evidence, so-called rationales, simultaneously. The model has a shared BERTweet encoder and two prediction decoders. Input tweets are pre-processed and fed to the shared encoder. The first decoder is a fully connected Softmax layer. It takes the embedding representation of the first [CLS] token from the BERTweet encoder and learns to minimize a cross-entropy loss function L_task1. The second decoder is a binary classifier, which predicts whether a token is part of the rationales. This decoder consists of a GRU (Gated Recurrent Unit) layer and a Sigmoid output layer. Its loss value is computed with a weighted binary cross-entropy function L_task2, where the class weights are the inverse frequencies of rationale and non-rationale tokens in the input texts.
Finally, the multi-task classifier at the first learning stage jointly optimizes the two losses:

L = L_task1 + λ · L_task2        (1)

where λ is used to control the weight between the two losses.
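As a minimal sketch, the two losses and their combination in Eq. (1) can be written in plain Python. The function names and the per-example computation below are our simplified assumptions for illustration, not the authors' BERTweet-based implementation:

```python
import math

def cross_entropy(probs, true_label):
    # L_task1: negative log-likelihood of the true class label
    return -math.log(probs[true_label])

def weighted_bce(token_probs, token_labels):
    # L_task2: weighted binary cross-entropy over tokens. The class
    # weights are the inverse frequencies of rationale (1) and
    # non-rationale (0) tokens; we assume both classes are present.
    n = len(token_labels)
    pos = sum(token_labels)
    neg = n - pos
    w_pos, w_neg = n / pos, n / neg  # inverse-frequency weights
    loss = 0.0
    for p, y in zip(token_probs, token_labels):
        loss += -(w_pos * y * math.log(p) + w_neg * (1 - y) * math.log(1 - p))
    return loss / n

def joint_loss(probs, label, token_probs, token_labels, lam=1.0):
    # Eq. (1): L = L_task1 + lambda * L_task2
    return cross_entropy(probs, label) + lam * weighted_bce(token_probs, token_labels)
```

Setting `lam=0` recovers the pure label-classification loss, while larger values push the shared encoder toward better rationale extraction.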
The second BERT2BERT stage extracts the final classification labels of the input tweets. It takes the rationales extracted in the previous step as inputs and applies a simple BERTweet-based classifier for prediction. This learning phase makes BERT2BERT an interpretable-by-design model: it is transparent to end-users that the rationales are truly the evidence for the output decisions.
2.2 RATSUM Summarization
RATSUM is an extractive summarization model. Given a set of tweets along with class labels and rationales in a user-specified time window, RATSUM extracts the most representative and diverse tweets as a summary. The model applies an Integer Linear Programming (ILP) approach, optimizing the following objective function subject to the constraints in Table 1.
max( Σ_{j=1..T} t_j + Σ_{i=1..U} Score(i) · u_i )

where T is the set of tweets, t_j ∈ {0, 1} specifies if a tweet j is selected, U is the set of unique rationales/numerals in the tweets, u_i ∈ {0, 1} indicates whether a rationale/numeral i is chosen, and Score(i) is the importance score of a word i, determined by the logarithm of its document frequency.
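For a handful of tweets, the ILP can be emulated by exhaustive search. The following Python sketch enforces the length constraint and the two rationale-coverage constraints from Table 1 directly. It is an illustration only: the actual system would use an ILP solver, and our `log(1 + df)` smoothing (which keeps singleton rationales from scoring zero) is an assumption, as the paper states only the logarithm of document frequency:

```python
import math
from itertools import combinations

def summarize(tweets, rationales, max_words):
    """Brute-force stand-in for the RATSUM ILP: choose a subset of tweets
    maximizing (#selected tweets + sum of covered rationale scores)."""
    # Score(i): logarithm of the document frequency of rationale i
    all_rationales = set().union(*rationales)
    df = {r: sum(r in rats for rats in rationales) for r in all_rationales}
    score = {r: math.log(1 + df[r]) for r in df}  # 1 + df: smoothing assumption
    best, best_val = [], -1.0
    for k in range(len(tweets) + 1):
        for subset in combinations(range(len(tweets)), k):
            # Constraint 1: total word length within the budget M
            if sum(len(tweets[j].split()) for j in subset) > max_words:
                continue
            # Constraints 2 and 3 together mean u_i = 1 exactly for the
            # rationales contained in some selected tweet.
            covered = set().union(*(rationales[j] for j in subset)) if subset else set()
            val = len(subset) + sum(score[r] for r in covered)
            if val > best_val:
                best_val, best = val, list(subset)
    return [tweets[j] for j in best]
```

The exhaustive loop is exponential in the number of tweets, which is precisely why the real formulation is handed to an ILP solver.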
Figure 3: CrisICSum - Interactive Classification
3 CRISICSUM ARCHITECTURE

This section presents the architecture of the CrisICSum system.
3.1 System overview
CrisICSum can be run as a web application. The BERT2BERT classifier is trained on an NVIDIA GTX 1080Ti machine; the trained model is saved and loaded on CPU. RATSUM is an unsupervised summarizer that does not require any labeled data for training; output summaries are computed and extracted on CPU. Both BERT2BERT and RATSUM are implemented in Python. Currently, BERT2BERT is trained for two disaster types, typhoons and earthquakes. However, other trained models can be added to the CrisICSum system. Besides, the web application design of CrisICSum ensures that BERT2BERT and RATSUM can be replaced by different classification and summarization models.
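Since the back-end models are meant to be interchangeable, one way to structure this is a small classifier interface behind the web application. The class and method names below are our illustrative assumptions, not the actual CrisICSum code, with a toy keyword matcher standing in for BERT2BERT:

```python
class Classifier:
    """Interface a back-end classification model would implement."""
    def predict(self, tweet):
        """Return (class_label, rationale_tokens) for one tweet."""
        raise NotImplementedError

class KeywordClassifier(Classifier):
    # Toy stand-in for BERT2BERT: matches class-specific keywords and
    # returns the matched words as the rationale/evidence.
    KEYWORDS = {
        "infrastructure damage": {"bridge", "collapsed", "damage"},
        "caution and advice": {"warning", "alert", "stay"},
    }

    def predict(self, tweet):
        tokens = set(tweet.lower().split())
        for label, words in self.KEYWORDS.items():
            hit = tokens & words
            if hit:
                return label, sorted(hit)
        return "other", []

# Registry of interchangeable back-ends; a BERT2BERT wrapper would be
# registered here in the same way.
BACKENDS = {"default": KeywordClassifier()}

def classify(tweet, backend="default"):
    return BACKENDS[backend].predict(tweet)
```

Swapping models then amounts to registering another `Classifier` implementation, which matches the replaceable-back-end design described above.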
Figure 1 shows the overall architecture of CrisICSum. The main objective of our system is to classify messages posted on social media during disaster events into different humanitarian classes and then generate short updates of user-specified length for each class. We allow users to perform interactive classification and summarization. For interactive classification, end-users are asked to enter an input tweet, event type, and event location, as shown in Figure 3. At this point, CrisICSum employs BERT2BERT, which was trained on labeled datasets (Nepal Earthquake or Typhoon Hagupit). When users choose the earthquake or typhoon type, CrisICSum loads the model trained on the Nepal earthquake or Typhoon Hagupit dataset, respectively, for prediction. The current BERT2BERT model trained on one event (e.g., the Nepal earthquake) can effectively make predictions on a similar event (e.g., the Mexico earthquake) [ ]. CrisICSum can be easily customized to load different trained models for various event types. When all the inputs are submitted, CrisICSum returns a class label for the input tweet and rationale snippets as evidence/explanation for the output label (lower part of Figure 3). Users can send feedback on the classification or go back to the main page. For summarization, end-users give the system a file of tweets (Figure 4a). Besides, users are asked to choose the event type and event location and press the classification button. After that, the corresponding classification model is called to classify the input tweets and extract rationales. Users can choose to download the classified tweets for further purposes and leave the system, or go on to summarization. At the summarization step, we ask users to choose a class to be summarized, enter the output length limit, and click the summarization button. Our system returns a word cloud of extracted rationales from the chosen class along with a list of the most informative and diverse tweets as a summary (Figure 4b). Then, users can send feedback on the result, submit other files, or go back to the main page.
3.2 User feedback
Our CrisICSum system allows user feedback on both classification and summarization outputs. Generally, the BERT2BERT classifier was trained on small labeled datasets. The model can make errors, especially when used in long-running disasters or on similar but different datasets. In this case, user feedback can be used to improve performance by learning from errors. Besides, RATSUM returns summaries based on an unsupervised approach, and the model does not rely on any human-annotated data. The results might not satisfy the needs of end-users. Therefore, feedback can be useful for evaluating and fine-tuning the model.
3.2.1 Classification feedback. When classification results are displayed (as shown in Figure 3), users can give feedback on class labels or extracted rationales. Figure 5 illustrates an example of the feedback options:

Feedback on class labels: When BERT2BERT makes a wrong prediction for a given tweet, users can select the correct label from a drop-down list.

Feedback on rationales: The rationales/evidence extracted by BERT2BERT for the predicted class label are shown to the users. In case the rationale words are redundant, incorrect, or not informative enough, users can write down the words that are falsely predicted or dismissed.

Figure 4: CrisICSum - Batch classification (a) and summarization (b)

Figure 5: User feedback - Class label and rationales
3.2.2 Summarization feedback. On the summarization page (Figure 4b), there is an option for users to give feedback on the results. When users click the ‘summarization’ button, the summary extracted by the RATSUM model is displayed as a short list of tweets. We ask the users to evaluate the results from three aspects: (i) informativeness (content richness of the summary), (ii) diversity (ability to cover multiple aspects), and (iii) assistance to responders. Users can give their rating on a scale of 1-5, where 1 means not helpful and 5 means most useful.
Figure 6: User feedback - Summarization
All the feedback is saved to our databases for further purposes, such as model fine-tuning and evaluation.
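A minimal sketch of how a summarization-feedback record could be validated and serialized before being stored. The paper does not describe the database schema, so every field name here is an assumption for illustration:

```python
import json
import time

def make_summary_feedback(informativeness, diversity, assistance):
    """Serialize one summarization-feedback record (1-5 scale per aspect).
    Field names are illustrative; the actual CrisICSum schema is unspecified."""
    for score in (informativeness, diversity, assistance):
        if not 1 <= score <= 5:
            raise ValueError("ratings use a 1-5 scale")
    return json.dumps({
        "type": "summarization",
        "informativeness": informativeness,  # content richness
        "diversity": diversity,              # coverage of multiple aspects
        "assistance": assistance,            # usefulness to responders
        "timestamp": time.time(),
    })
```

Records like this could later be filtered by aspect scores when selecting data for fine-tuning or evaluation.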
4 CONCLUSION

This paper presents CrisICSum, a platform to classify and summarize Twitter messages during crisis events. CrisICSum employs a recent interpretable-by-design classifier and a rationale-based summarizer as back-end models. The platform allows gathering user corrections of errors in class labels and explanations at the classification phase. Besides, users can evaluate different aspects of the summarization results. The user feedback plays an important role in providing data and improving the performance of the models on new and long-running disaster events. Furthermore, we also make CrisICSum an easily integrable system for different classification and summarization models. For future work, we plan to add an option for users to automatically collect data during a specific crisis event for classification and summarization.
ACKNOWLEDGMENTS

This work was partially funded by the DFG Grant NI-1760/1-1, and the European Union's Horizon 2020 research and innovation programme under grant agreement No. 101021866.
REFERENCES

[1] 2018. IncidentEye.
[2] 2019. Everbridge.
[3] Fabian Abel, Claudia Hauff, Geert-Jan Houben, Richard Stronkman, and Ke Tao. 2012. Twitcident: fighting fire with information from social web streams. In Proceedings of the 21st World Wide Web Conference, WWW 2012, Lyon, France, April 16-20, 2012 (Companion Volume). ACM, 305–308.
[4] Mark A. Cameron, Robert Power, Bella Robinson, and Jie Yin. 2012. Emergency situation awareness from twitter for crisis management. In Proceedings of the 21st World Wide Web Conference, WWW 2012, Lyon, France, April 16-20, 2012 (Companion Volume). ACM, 695–698.
[5] Muhammad Imran, Carlos Castillo, Ji Lucas, Patrick Meier, and Sarah Vieweg. 2014. AIDR: artificial intelligence for disaster response. In 23rd International World Wide Web Conference, WWW ’14, Seoul, Republic of Korea, April 7-11, 2014, Companion Volume. ACM, 159–162.
[6] Muhammad Imran, Prasenjit Mitra, and Carlos Castillo. 2016. Twitter as a Lifeline: Human-annotated Twitter Corpora for NLP of Crisis-related Messages. In Proceedings of the Tenth International Conference on Language Resources and Evaluation, LREC 2016, Portorož, Slovenia, May 23-28, 2016. European Language Resources Association (ELRA).
[7] Ling Min Serena Khoo, Hai Leong Chieu, Zhong Qian, and Jing Jiang. 2020. Interpretable Rumor Detection in Microblogs by Attending to User Interactions. In The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, New York, NY, USA, February 7-12, 2020. AAAI Press, 8783–8790.
[8] Rajdeep Mukherjee, Uppada Vishnu, Hari Chandana Peruri, Sourangshu Bhattacharya, Koustav Rudra, Pawan Goyal, and Niloy Ganguly. 2022. MTLTS: A Multi-Task Framework To Obtain Trustworthy Summaries From Crisis-Related Microblogs. In WSDM ’22: The Fifteenth ACM International Conference on Web Search and Data Mining, Virtual Event / Tempe, AZ, USA, February 21-25, 2022. ACM, 755–763.
[9] Dat Tien Nguyen, Kamla Al-Mannai, Shafiq R. Joty, Hassan Sajjad, Muhammad Imran, and Prasenjit Mitra. 2017. Robust Classification of Crisis-Related Data on Social Networks Using Convolutional Neural Networks. In Proceedings of the Eleventh International Conference on Web and Social Media, ICWSM 2017, Montréal, Québec, Canada, May 15-18, 2017. AAAI Press, 632–635.
[10] Thi Huyen Nguyen and Koustav Rudra. 2022. Towards an Interpretable Approach to Classify and Summarize Crisis Events from Microblogs. In WWW ’22: The ACM Web Conference 2022, Virtual Event, Lyon, France, April 25-29, 2022. ACM.
[11] Koustav Rudra, Subham Ghosh, Niloy Ganguly, Pawan Goyal, and Saptarshi Ghosh. 2015. Extracting Situational Information from Microblogs during Disaster Events: a Classification-Summarization Approach. In Proceedings of the 24th ACM International Conference on Information and Knowledge Management, CIKM 2015, Melbourne, VIC, Australia, October 19-23, 2015. ACM, 583–592.
[12] Ritesh Sarkhel, Moniba Keymanesh, Arnab Nandi, and Srinivasan Parthasarathy. 2020. Interpretable Multi-headed Attention for Abstractive Summarization at Controllable Lengths. In Proceedings of the 28th International Conference on Computational Linguistics, COLING 2020, Barcelona, Spain (Online), December 8-13, 2020. International Committee on Computational Linguistics, 6871–6882.
[13] TREC-IS. 2022. TREC Incident Streams: Enabling emergency services with media
[14] István Varga, Motoki Sano, Kentaro Torisawa, Chikara Hashimoto, Kiyonori Ohtake, Takao Kawai, Jong-Hoon Oh, and Stijn De Saeger. 2013. Aid is Out There: Looking for Help from Tweets during a Large Scale Disaster. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, ACL 2013, 4-9 August 2013, Sofia, Bulgaria, Volume 1: Long Papers. The Association for Computer Linguistics, 1619–1629.