CrisICSum: Interpretable Classification and Summarization
Platform for Crisis Events from Microblogs
Thi Huyen Nguyen∗
nguyen@l3s.de
L3S Research Center
Hanover, Germany
Miroslav Shaltev∗
shaltev@l3s.de
L3S Research Center
Hanover, Germany
Koustav Rudra
koustav@iitism.ac.in
Indian Institute of Technology
(Indian School of Mines)
Dhanbad, India
ABSTRACT
Microblogging platforms, such as Twitter, receive massive volumes of messages
during crisis events. Real-time insights are crucial for emergency
response. Hence, there is a need to develop faithful tools for efficiently
digesting information. In this paper, we present CrisICSum,
a platform for classification and summarization of crisis events.
The objective of CrisICSum is to classify user posts during disaster
events into different humanitarian classes (e.g., damage, affected
people) and to generate summaries of class-level messages. Unlike
existing systems, CrisICSum employs an interpretable-by-design
backend classifier that can generate explanations for its output decisions.
Besides, the platform allows user feedback in both the classification and
summarization phases. CrisICSum is designed and run as an easily
integrated web application, and its backend models are interchangeable.
The system can assist users and humanitarian organizations in improving
response efforts during disaster situations. CrisICSum is available
at https://crisicsum.l3s.uni-hannover.de
CCS CONCEPTS
• Computing methodologies → Learning from critiques; • Human-centered computing → User studies; • Information systems → Clustering and classification; Summarization.
KEYWORDS
Learning with feedback, classification, summarization, crisis events
ACM Reference Format:
Thi Huyen Nguyen, Miroslav Shaltev, and Koustav Rudra. 2022. CrisICSum:
Interpretable Classification and Summarization Platform for Crisis Events
from Microblogs. In Proceedings of the 31st ACM International Conference on
Information and Knowledge Management (CIKM ’22), October 17–21, 2022,
Atlanta, GA, USA. ACM, New York, NY, USA, 5 pages. https://doi.org/10.1145/3511808.3557191
1 INTRODUCTION
During crisis events, a significant amount of information is posted
on social media platforms. The real-time information on these sites
is essential for situational awareness and aid response [4, 14]. However,
not all messages posted during disasters are relevant, and the
overwhelming volume of data makes it difficult for humans to digest
information quickly. Besides detecting crisis-related messages, humanitarian
organizations and aid responders usually want short updates at granular
levels, such as damage, affected people, and needs. Many research groups
[3, 5, 8, 13] have developed methods and tools that apply machine learning
techniques to social media texts for disaster response. However, these
works mainly focus on the individual task of either classification or
summarization. Moreover, it is largely opaque how the models arrive at
their predictions. There also exist proprietary solutions [1, 2], but their
system behavior is likewise non-transparent. Apart from that, such
applications are difficult to customize and update because they are not
open source.
∗Both authors contributed equally to the paper.
Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation
on the first page. Copyrights for components of this work owned by others than ACM
must be honored. Abstracting with credit is permitted. To copy otherwise, or republish,
to post on servers or to redistribute to lists, requires prior specific permission and/or a
fee. Request permissions from permissions@acm.org.
CIKM ’22, October 17–21, 2022, Atlanta, GA, USA
© 2022 Association for Computing Machinery.
ACM ISBN 978-1-4503-9236-5/22/10...$15.00
https://doi.org/10.1145/3511808.3557191
Figure 1: CrisICSum - Overall Architecture
Recently, there has been increasing interest in designing interpretable
models that strike a balance between performance and explainability.
Some recent studies have proposed interpretable classification and
summarization approaches [7, 10, 12]. These models generate highly
accurate results along with human-understandable explanations of the
model outputs. In general, models that are both accurate and transparent
are more usable and defensible in many application domains, such as
health care and societal problems.
Designing efficient models for real-time prediction and summarization
of crisis events from microblogs such as Twitter is a challenging task.
Tweets are short, noisy, and very different from sentences in news
articles. Hence, models originally designed for formal text datasets
usually do not perform well on Twitter texts. Besides, crisis events
evolve over time, with a large volume of new messages posted every day.
The performance of models trained on a limited dataset can drop
significantly over time when they are applied to new data or to texts
from similar disasters, which erodes end-users' trust in the models.
This issue can partly be solved by learning and updating models using
error-correction data from user feedback. Furthermore, existing
summarization studies mainly apply unsupervised approaches because
labeled training data are unavailable. There is also a lack of gold
summaries for summarization evaluation. Human coordination can help
improve existing models’ performance or provide data for supervised
approaches and evaluation.
Figure 2: BERT2BERT architecture
This paper presents CrisICSum, an interpretable open-source
tool for emergency response. CrisICSum considers two important
tasks for crisis events, classification and summarization, and allows
human coordination for error correction of predictions and explanations.
The overall architecture of CrisICSum is illustrated in Figure 1.
CrisICSum takes messages posted during disaster events and classifies
tweets into different humanitarian classes such as ‘infrastructure
damage’, ‘caution and advice’, ‘affected people and evacuation’, etc.
These classes are defined and used by the United Nations Office for the
Coordination of Humanitarian Affairs (UN OCHA) and in many previous
studies [6, 10]. End-users can then choose to generate summaries for the
data of any specified class. Currently, we use BERT2BERT and RATSUM [10]
as the back-end classification and summarization models. BERT2BERT
predicts class labels and extracts small snippets, so-called rationales,
from the original inputs as explanations/evidence for the output
decisions. Then, RATSUM uses the input data along with the rationales to
generate summaries of tweets in each class. CrisICSum also allows
integrating other classification and summarization algorithms
[7, 9–12] as back-end models.
The rest of the paper is organized as follows. Section 2 discusses the
methodological background of BERT2BERT and RATSUM. Section 3 then
describes the CrisICSum architecture in detail. Finally, Section 4
concludes the paper.
2 METHODOLOGICAL BACKGROUND
In this section, we briefly describe the methodology of the BERT2BERT
and RATSUM models [10].
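| Constraint | Description |
| --- | --- |
| $\sum_{j=1}^{T} t_j \cdot Length(j) \le M$ | $Length(j)$: number of words in tweet $j$; $M$: user-defined summary word length |
| $\sum_{j \in T_i} t_j \ge u_i, \; i = 1, \dots, U$ | $T_i$: set of tweets containing rationale $i$ |
| $\sum_{i \in R_j} u_i \ge \lvert R_j \rvert \times t_j, \; j = 1, \dots, T$ | $R_j$: set of rationale words in tweet $j$ |

Table 1: RATSUM constraints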
2.1 Interpretable BERT2BERT classification
BERT2BERT is an interpretable-by-design pipeline model consisting of
two prediction stages. The first stage applies a multi-task learning
approach: it learns to predict class labels and to extract evidence,
so-called rationales, simultaneously. The model has a shared BERTweet
encoder and two prediction decoders. Input tweets are pre-processed and
fed to the shared encoder. The first decoder is a fully connected softmax
layer. It takes the embedding representation of the first token, [CLS],
from the BERTweet encoder and learns to minimize a cross-entropy loss
function $\mathcal{L}_{task1}$. The second decoder is a binary classifier
that predicts whether a token is part of a rationale. This decoder
consists of a GRU (Gated Recurrent Unit) layer and a sigmoid output
layer. Its loss is computed with a weighted binary cross-entropy function
$\mathcal{L}_{task2}$, where the class weights are the inverse frequencies
of rationale and non-rationale tokens in the input texts.
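The weighted loss of the second decoder can be illustrated with a minimal pure-Python sketch. It assumes one binary label per token (1 = rationale, 0 = non-rationale) and reads "inverse frequency" as the total number of tokens divided by the number of tokens of that class; this normalization and the function names are our assumptions, not details taken from [10].

```python
import math


def inverse_frequency_weights(labels):
    """Per-class weights for binary token labels (1 = rationale, 0 = non-rationale).

    Each weight is the inverse of the class's relative frequency, so the rarer
    class (usually the rationale tokens) receives the larger weight.
    """
    n = len(labels)
    n_pos = sum(labels)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both rationale and non-rationale tokens are required")
    return {1: n / n_pos, 0: n / n_neg}


def weighted_bce(probs, labels, weights, eps=1e-12):
    """Mean class-weighted binary cross-entropy over tokens.

    probs: sigmoid outputs, i.e. P(token is part of a rationale), one per token.
    """
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -weights[y] * (y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)
```

For labels `[0, 0, 0, 1]`, the single rationale token gets weight 4.0 and each non-rationale token 4/3, so a missed rationale costs three times as much as a false alarm.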
Finally, the multi-task classifier at the first learning stage jointly
optimizes the two losses:
$$\mathcal{L} = \mathcal{L}_{task1} + \lambda \mathcal{L}_{task2} \qquad (1)$$
where $\lambda$ controls the weight between the two losses.
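As a worked illustration of Eq. (1), the sketch below computes $\mathcal{L}_{task1}$ as the softmax cross-entropy of one tweet's class logits and adds $\lambda \mathcal{L}_{task2}$. The logits, the $\lambda$ values, and the function names are illustrative assumptions; the paper does not report the $\lambda$ used.

```python
import math


def softmax_cross_entropy(logits, target):
    """L_task1 for one tweet: -log softmax(logits)[target]."""
    m = max(logits)  # subtract the max logit for numerical stability
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[target]


def joint_loss(loss_task1, loss_task2, lam):
    """Eq. (1): L = L_task1 + lambda * L_task2."""
    return loss_task1 + lam * loss_task2
```

With two equal logits the classification loss is $\ln 2$; a larger $\lambda$ shifts the optimization toward rationale extraction.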
The second BERT2BERT stage predicts the final classification labels
of the input tweets. It takes the rationales extracted in the previous
step as inputs and applies a simple BERTweet-based classifier for
prediction. This learning phase makes BERT2BERT interpretable by
design: it is transparent to end-users that the rationales are truly
the evidence for the output decisions.
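The data flow of the two stages can be sketched as follows: stage 1 yields a rationale probability per token, and stage 2 sees only the tokens kept as rationales, so its decision cannot depend on anything else. The 0.5 threshold and the keyword-based `toy_stage2` (standing in for the BERTweet-based classifier) are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def extract_rationales(tokens: Sequence[str], probs: Sequence[float],
                       threshold: float = 0.5) -> List[str]:
    """Keep the tokens whose stage-1 rationale probability reaches the threshold."""
    return [tok for tok, p in zip(tokens, probs) if p >= threshold]


def two_stage_predict(tokens: Sequence[str], probs: Sequence[float],
                      stage2: Callable[[List[str]], str],
                      threshold: float = 0.5) -> Tuple[str, List[str]]:
    """Stage 2 receives only the rationale tokens, which are thus the sole evidence."""
    rationales = extract_rationales(tokens, probs, threshold)
    return stage2(rationales), rationales


def toy_stage2(rationales: List[str]) -> str:
    """Hypothetical stand-in for the BERTweet-based stage-2 classifier."""
    damage_words = {"collapsed", "destroyed", "damaged"}
    return "infrastructure damage" if damage_words & set(rationales) else "other"
```

For the tokens `["bridge", "collapsed", "near", "kathmandu"]` with probabilities `[0.8, 0.9, 0.1, 0.3]`, the rationales are `["bridge", "collapsed"]` and the label is "infrastructure damage".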
2.2 RATSUM Summarization
RATSUM is an extractive summarization model. Given a set of
tweets along with class labels and rationales in a user-specified
time window, RATSUM extracts the most representative and diverse
tweets as a summary. The model applies an Integer Linear
Programming (ILP) approach, optimizing the following objective
function subject to the constraints in Table 1:
$$\max \left( \sum_{j=1}^{T} t_j + \sum_{i=1}^{U} S(i) \cdot u_i \right) \qquad (2)$$
where $T$ is the set of tweets and $t_j \in \{0, 1\}$ specifies whether
tweet $j$ is selected; $U$ is the set of unique rationales/numerals in
$T$, and $u_i \in \{0, 1\}$ indicates whether rationale/numeral $i$ is
chosen; $S(i)$ is the importance score of word $i$, determined by the
logarithm of its document frequency.
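For intuition, the ILP of Eq. (2) with the Table 1 constraints can be solved by exhaustive search on a handful of tweets. Once a tweet selection $t$ is fixed, the constraints determine $u$: every rationale of a selected tweet must be chosen, and no other rationale may be. This brute-force sketch is only an illustration under those assumptions (rationales as word sets, numerals omitted, $S(i)$ as the natural log of document frequency); it is exponential in the number of tweets, and a real system would pass the same model to an ILP solver.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Set, Tuple


def importance_scores(rationale_sets: Sequence[Set[str]]) -> Dict[str, float]:
    """S(i): log of the number of tweets (document frequency) containing rationale i."""
    df: Dict[str, int] = {}
    for rs in rationale_sets:
        for r in rs:
            df[r] = df.get(r, 0) + 1
    return {r: math.log(n) for r, n in df.items()}


def ratsum_brute_force(tweets: Sequence[str], rationale_sets: Sequence[Set[str]],
                       max_words: int) -> Tuple[List[int], float]:
    """Exhaustively maximize Eq. (2) subject to the Table 1 constraints."""
    scores = importance_scores(rationale_sets)
    lengths = [len(t.split()) for t in tweets]  # Length(j): words in tweet j
    best: List[int] = []
    best_value = 0.0
    n = len(tweets)
    for k in range(1, n + 1):
        for subset in combinations(range(n), k):
            if sum(lengths[j] for j in subset) > max_words:
                continue  # violates the summary length constraint (<= M)
            # u_i = 1 exactly for the rationales covered by the selected tweets
            covered = set().union(*(rationale_sets[j] for j in subset))
            value = len(subset) + sum(scores[r] for r in covered)
            if value > best_value:
                best, best_value = list(subset), value
    return best, best_value
```

Because only unique rationales are rewarded, adding a tweet whose rationales are already covered gains just the $t_j$ term, which is how the objective favors diverse summaries.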
Figure 3: CrisICSum - Interactive Classification
3 SYSTEM ARCHITECTURE
This section presents the architecture of the CrisICSum system.
3.1 System overview
CrisICSum can be run as a web application. The BERT2BERT classifier
is trained on an NVIDIA GTX 1080Ti machine. The trained model
is saved and loaded on CPU. RATSUM is an unsupervised summarizer
that does not require any labeled data for training. Output
summaries are computed and extracted on CPU. Both BERT2BERT
and RATSUM are implemented in Python. Currently, BERT2BERT
is trained for two disaster types: typhoons and earthquakes.
However, other trained models can be added to the CrisICSum
system. Moreover, the web application design of CrisICSum ensures
that BERT2BERT and RATSUM can be replaced by different
classification and summarization models.
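One way to keep back-ends interchangeable is to code against small interfaces and look models up by event type. The sketch below is hypothetical: `Classifier`, `Summarizer`, `BackendRegistry`, and the toy `KeywordClassifier` are our names, not the CrisICSum code base.

```python
from typing import Dict, List, Protocol, Set, Tuple


class Classifier(Protocol):
    def predict(self, tweet: str) -> Tuple[str, List[str]]:
        """Return (class label, rationale snippets) for one tweet."""
        ...


class Summarizer(Protocol):
    def summarize(self, tweets: List[str], rationales: List[List[str]],
                  max_words: int) -> List[str]:
        """Return the tweets selected as the summary."""
        ...


class BackendRegistry:
    """Maps an event type (e.g. 'earthquake') to the classifier trained for it."""

    def __init__(self) -> None:
        self._classifiers: Dict[str, Classifier] = {}

    def register(self, event_type: str, model: Classifier) -> None:
        self._classifiers[event_type] = model

    def classifier_for(self, event_type: str) -> Classifier:
        if event_type not in self._classifiers:
            raise ValueError(f"no model registered for event type {event_type!r}")
        return self._classifiers[event_type]


class KeywordClassifier:
    """Hypothetical toy back-end; any object with a matching predict() fits."""

    def __init__(self, label: str, keywords: Set[str]) -> None:
        self.label = label
        self.keywords = keywords

    def predict(self, tweet: str) -> Tuple[str, List[str]]:
        hits = [w for w in tweet.split() if w in self.keywords]
        return (self.label if hits else "other"), hits
```

Swapping in a different model then means registering another object with the same method signature; the web layer does not change.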
Figure 1 shows the overall architecture of CrisICSum. The main
objective of our system is to classify messages posted on social
media during disaster events into different humanitarian classes and
then generate short updates of user-specified length for each class.
We allow users to perform interactive classification and summarization.
For interactive classification, end-users are asked to enter
an input tweet, the event type, and the event location, as shown in Figure 3.
At this point, CrisICSum employs BERT2BERT, which was trained
on labeled datasets (Nepal Earthquake or Typhoon Hagupit). When
users choose the earthquake or typhoon type, CrisICSum loads the
model trained on the Nepal Earthquake or Typhoon Hagupit dataset,
respectively, for prediction. The current BERT2BERT model trained
on one event (e.g., the Nepal earthquake) can effectively make predictions
on a similar event (e.g., the Mexico earthquake) [10]. CrisICSum
can easily be customized to load different trained models for various
event types. When all the inputs are submitted, CrisICSum
returns a class label for the input tweet and rationale snippets as
evidence/explanation for the output label (lower part of Figure 3).
Users can then send feedback on the classification or go back to the
main page.

For summarization, end-users give the system a file of
tweets (Figure 4a). Users are also asked to choose the event type and
event location and press the classification button. The
corresponding classification model is then called to classify the input
tweets and extract rationales. Users can download the
classified tweets for further use and leave the system, or proceed to
summarization. At the summarization step, we ask users to choose
a class to be summarized, enter the output length limit, and click the
summarization button. Our system returns a word cloud of the
rationales extracted from the chosen class, along with a list of the most
informative and diverse tweets as a summary (Figure 4b). Users
can then send feedback on the result, submit other files, or go back to
the main page.
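The batch path above (classify an uploaded file, group by class, then show rationale frequencies for the chosen class) can be sketched in a few lines. `classify` is any callable returning a label and rationale words; the grouping structure and the use of plain word counts as word-cloud input are our assumptions.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

Prediction = Tuple[str, List[str]]  # (class label, rationale words)


def classify_batch(tweets: Iterable[str],
                   classify: Callable[[str], Prediction]) -> Dict[str, List[Tuple[str, List[str]]]]:
    """Label every tweet and group (tweet, rationales) pairs by predicted class."""
    by_class: Dict[str, List[Tuple[str, List[str]]]] = defaultdict(list)
    for tweet in tweets:
        label, rationales = classify(tweet)
        by_class[label].append((tweet, rationales))
    return by_class


def word_cloud_counts(items: List[Tuple[str, List[str]]]) -> Counter:
    """Frequencies of the extracted rationales in one class (input for a word cloud)."""
    return Counter(word for _, rationales in items for word in rationales)
```

The tweets and rationales grouped under the user's chosen class are what the summarizer would then receive together with the length limit.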
3.2 User feedback
Our CrisICSum system allows user feedback on both classification
and summarization outputs. The BERT2BERT classifier was trained on
small labeled datasets, so it can make errors, especially when applied
to long-running disasters or to similar but different datasets. In such
cases, user feedback can be used to improve performance by learning
from errors. Moreover, RATSUM generates summaries with an unsupervised
approach and does not rely on any human-annotated data, so the results
might not satisfy the needs of end-users. Feedback is therefore useful
for evaluating and fine-tuning the model.
3.2.1 Classification feedback. When classification results are displayed
(as shown in Figure 3), users can give feedback on class
labels or extracted rationales. Figure 5 illustrates an example of the
feedback options:
• Feedback on class labels: When BERT2BERT makes a wrong
prediction for a given tweet, users can select the correct label
from a drop-down list.
• Feedback on rationales: The rationales/evidence extracted by
BERT2BERT for the predicted class label are shown to the
users. If the rationale words are redundant, incorrect,
or not informative enough, users can write down the words
that were falsely predicted or falsely dismissed.
(a) Classification (b) Summarization
Figure 4: CrisICSum - Batch classification and summarization
Figure 5: User feedback - Class label and rationales
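A classification feedback entry could be captured as a small record holding the prediction and the two kinds of correction. The field names and the rule for merging corrections into a rationale list are hypothetical choices for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ClassificationFeedback:
    tweet: str
    predicted_label: str
    predicted_rationales: List[str]
    corrected_label: Optional[str] = None  # set when the user picks another label
    false_rationales: List[str] = field(default_factory=list)   # falsely predicted
    missed_rationales: List[str] = field(default_factory=list)  # falsely dismissed

    def corrected_rationales(self) -> List[str]:
        """Rationale words after applying the user's corrections, in tweet order."""
        keep = (set(self.predicted_rationales) - set(self.false_rationales)) | set(self.missed_rationales)
        return [w for w in self.tweet.split() if w in keep]

    def final_label(self) -> str:
        """The user's label if one was given, otherwise the model's prediction."""
        return self.corrected_label or self.predicted_label
```

Records like this pair each tweet with a corrected label and corrected rationales, i.e. training targets for both BERT2BERT stages.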
3.2.2 Summarization feedback. On the summarization page (Figure
4b), there is an option for users to give feedback on the results. When
users click the ‘summarization’ button, the summary extracted
by the RATSUM model is displayed as a short list of tweets.
We ask users to evaluate the results on three aspects: (i)
informativeness (content richness of the summary), (ii) diversity
(ability to cover multiple aspects), and (iii) assistance to responders.
Users can give their input on a scale of 1–5, where 1 means not very
helpful and 5 means most useful.
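The three-aspect rating can be validated at input time and aggregated across users. This is a minimal sketch; the class, the aspect keys, and averaging as the aggregate are our assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List

ASPECTS = ("informativeness", "diversity", "assistance")


@dataclass(frozen=True)
class SummaryRating:
    informativeness: int
    diversity: int
    assistance: int  # assistance to responders

    def __post_init__(self) -> None:
        # Reject anything outside the 1-5 scale shown to users.
        for aspect in ASPECTS:
            score = getattr(self, aspect)
            if not 1 <= score <= 5:
                raise ValueError(f"{aspect} must be on the 1-5 scale, got {score}")


def mean_scores(ratings: List[SummaryRating]) -> Dict[str, float]:
    """Average each aspect over several users' ratings of one summary."""
    n = len(ratings)
    return {a: sum(getattr(r, a) for r in ratings) / n for a in ASPECTS}
```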
Figure 6: User feedback - Summarization
All feedback is saved to our databases for further purposes,
such as model fine-tuning and evaluation.
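The paper does not describe the storage layer. As one possible sketch, feedback can be kept in a single SQLite table (via Python's built-in `sqlite3` module) with JSON payloads; the schema, table name, and helper functions are assumptions.

```python
import json
import sqlite3
from typing import Any, Dict, List


def open_feedback_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a feedback database with one table for both feedback kinds."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS feedback ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " kind TEXT NOT NULL CHECK (kind IN ('classification', 'summarization')),"
        " payload TEXT NOT NULL)"
    )
    return conn


def save_feedback(conn: sqlite3.Connection, kind: str, payload: Dict[str, Any]) -> int:
    """Insert one feedback record and return its row id."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO feedback (kind, payload) VALUES (?, ?)",
            (kind, json.dumps(payload)),
        )
    return cur.lastrowid


def load_feedback(conn: sqlite3.Connection, kind: str) -> List[Dict[str, Any]]:
    """Return all payloads of one kind in insertion order, e.g. for fine-tuning."""
    rows = conn.execute("SELECT payload FROM feedback WHERE kind = ? ORDER BY id", (kind,))
    return [json.loads(p) for (p,) in rows]
```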
4 CONCLUSION
This paper presents CrisICSum, a platform to classify and summarize
Twitter messages during crisis events. CrisICSum employs a
recent interpretable-by-design classifier and a rationale-based
summarizer as back-end models. At the classification phase, the platform
gathers user corrections of errors in class labels and explanations.
Users can also evaluate different aspects of the summarization results.
User feedback plays an important role in providing data and improving
the performance of models on new and long-running disaster events.
Furthermore, CrisICSum is designed to easily integrate different
classification and summarization models. In future work, we plan to add
an option for users to automatically collect data during a specific
crisis event for classification and summarization.
ACKNOWLEDGMENTS
This work was partially funded by the DFG Grant NI-1760/1-1,
and the European Union’s Horizon 2020 research and innovation
programme under grant agreement No. 101021866.
REFERENCES
[1] 2018. IncidentEye. https://www.incidenteye.com/
[2] 2019. Everbridge. https://www.everbridge.com/products/crisis-management/
[3] Fabian Abel, Claudia Hauff, Geert-Jan Houben, Richard Stronkman, and Ke Tao. 2012. Twitcident: fighting fire with information from social web streams. In Proceedings of the 21st World Wide Web Conference, WWW 2012, Lyon, France, April 16-20, 2012 (Companion Volume). ACM, 305–308. https://doi.org/10.1145/2187980.2188035
[4] Mark A. Cameron, Robert Power, Bella Robinson, and Jie Yin. 2012. Emergency situation awareness from twitter for crisis management. In Proceedings of the 21st World Wide Web Conference, WWW 2012, Lyon, France, April 16-20, 2012 (Companion Volume). ACM, 695–698. https://doi.org/10.1145/2187980.2188183
[5] Muhammad Imran, Carlos Castillo, Ji Lucas, Patrick Meier, and Sarah Vieweg. 2014. AIDR: artificial intelligence for disaster response. In 23rd International World Wide Web Conference, WWW ’14, Seoul, Republic of Korea, April 7-11, 2014, Companion Volume. ACM, 159–162. https://doi.org/10.1145/2567948.2577034
[6] Muhammad Imran, Prasenjit Mitra, and Carlos Castillo. 2016. Twitter as a Lifeline: Human-annotated Twitter Corpora for NLP of Crisis-related Messages. In Proceedings of the Tenth International Conference on Language Resources and Evaluation LREC 2016, Portorož, Slovenia, May 23-28, 2016. European Language Resources Association (ELRA). http://www.lrec-conf.org/proceedings/lrec2016/summaries/842.html
[7] Ling Min Serena Khoo, Hai Leong Chieu, Zhong Qian, and Jing Jiang. 2020. Interpretable Rumor Detection in Microblogs by Attending to User Interactions. In The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, New York, NY, USA, February 7-12, 2020. AAAI Press, 8783–8790. https://ojs.aaai.org/index.php/AAAI/article/view/6405
[8] Rajdeep Mukherjee, Uppada Vishnu, Hari Chandana Peruri, Sourangshu Bhattacharya, Koustav Rudra, Pawan Goyal, and Niloy Ganguly. 2022. MTLTS: A Multi-Task Framework To Obtain Trustworthy Summaries From Crisis-Related Microblogs. In WSDM ’22: The Fifteenth ACM International Conference on Web Search and Data Mining, Virtual Event / Tempe, AZ, USA, February 21 - 25, 2022. ACM, 755–763. https://doi.org/10.1145/3488560.3498536
[9] Dat Tien Nguyen, Kamla Al-Mannai, Shafiq R. Joty, Hassan Sajjad, Muhammad Imran, and Prasenjit Mitra. 2017. Robust Classification of Crisis-Related Data on Social Networks Using Convolutional Neural Networks. In Proceedings of the Eleventh International Conference on Web and Social Media, ICWSM 2017, Montréal, Québec, Canada, May 15-18, 2017. AAAI Press, 632–635. https://aaai.org/ocs/index.php/ICWSM/ICWSM17/paper/view/15655
[10] Thi Huyen Nguyen and Koustav Rudra. 2022. Towards an Interpretable Approach to Classify and Summarize Crisis Events from Microblogs. In WWW ’22: The ACM Web Conference 2022, Virtual Event, Lyon, France, April 25 - 29, 2022. ACM, 3641–3650. https://doi.org/10.1145/3485447.3512259
[11] Koustav Rudra, Subham Ghosh, Niloy Ganguly, Pawan Goyal, and Saptarshi Ghosh. 2015. Extracting Situational Information from Microblogs during Disaster Events: a Classification-Summarization Approach. In Proceedings of the 24th ACM International Conference on Information and Knowledge Management, CIKM 2015, Melbourne, VIC, Australia, October 19 - 23, 2015. ACM, 583–592. https://doi.org/10.1145/2806416.2806485
[12] Ritesh Sarkhel, Moniba Keymanesh, Arnab Nandi, and Srinivasan Parthasarathy. 2020. Interpretable Multi-headed Attention for Abstractive Summarization at Controllable Lengths. In Proceedings of the 28th International Conference on Computational Linguistics, COLING 2020, Barcelona, Spain (Online), December 8-13, 2020. International Committee on Computational Linguistics, 6871–6882. https://doi.org/10.18653/v1/2020.coling-main.606
[13] TREC-IS. 2022. TREC Incident Streams: Enabling emergency services with media data. http://dcs.gla.ac.uk/~richardm/TREC_IS/
[14] István Varga, Motoki Sano, Kentaro Torisawa, Chikara Hashimoto, Kiyonori Ohtake, Takao Kawai, Jong-Hoon Oh, and Stijn De Saeger. 2013. Aid is Out There: Looking for Help from Tweets during a Large Scale Disaster. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, ACL 2013, 4-9 August 2013, Sofia, Bulgaria, Volume 1: Long Papers. The Association for Computer Linguistics, 1619–1629. https://aclanthology.org/P13-1159/