Sentiment Analysis of Media in German on the
Refugee Crisis in Europe
Gerhard Backfried and Gayane Shalunts
SAIL LABS Technology GmbH
Vienna, Austria
{gerhard.backfried, gayane.shalunts}@sail-labs.com
Abstract. Since the summer of 2015, the refugee crisis in Europe has
grown to be one of the biggest challenges Europe has faced since WW2.
The developments of this humanitarian crisis are the topic of discussions
throughout Europe and are covered by media on a daily basis. Germany
in particular has been the focus of migration. Over time, in Germany
and the neighboring German speaking countries a shift could be ob-
served, from the initial hospitable Willkommenskultur (welcome culture),
to more reserved and skeptical points of view. These factors - Germany as
the prime destination for migrants, as well as a shift in public perception
and media coverage - are the motivation for our analysis. The current
article investigates the coverage of this crisis on traditional and social
media, employing sentiment analysis to detect tendencies and relates
these to real-world events. To this end, sentiment analysis was applied
to textual documents of a data-set collected from relevant and highly cir-
culated German, Austrian and Swiss traditional media sources and from
social media in the course of six months from October 2015 to March of
2016.
Key words: Sentiment analysis, media analysis, refugee crisis
1 Introduction
Sentiment Analysis (SA) tackles the problem of determining the objectivity or
polarity of the input. The main parameters defining the scope of an SA method
are the target language, domain and media type (traditional or social media).
The most common application is the monitoring of public opinions in marketing
(product reviews) and politics (election campaigns). Whereas the research field
is active, most publications are limited to the domains of movie and product
reviews in English. SA-approaches can be divided into two broad categories:
machine learning and lexicon-based ones. Machine learning methods are im-
plemented as supervised binary (positive/negative) classification approaches, in
which classifiers are trained on labeled data [1, 2]. The dependency on a labeled
dataset is considered a major drawback, as labeling is usually costly and, in some
cases, impossible. In contrast, lexicon-based methods [3] use a predefined
set of patterns (sentiment dictionary or lexicon) associating each entry with a
specific sentiment and score and do not require any labeled training data. Here
the challenge lies in designing an appropriate lexicon for the target domain. A
comparison of eight state-of-the-art SA methods (SentiWordNet [4], SASA [5],
PANAS-t [6], Emoticons, SentiStrength [7], LIWC [8], SenticNet [9] and Happi-
ness Index [10]) is performed in [1]. All experiments are carried out using two
English datasets of Online Social Networks messages. The authors report that
the examined methods have different levels of applicability on real-world events
and vary widely in their agreement on the predicted polarity. The authors in [11]
also limit their work to English, but target the domain of news. SA is applied
in the context of the refugee crisis to tweets in English by [12]. In general, the
number of SA approaches for languages other than English is limited. SentiWS
[13] and [14] analyze textual data in German. In the present paper, the
state-of-the-art SA tool SentiSAIL [15] is employed as it supports the process-
ing of content in German and has been adapted to the domain of news articles,
specifically to news on disasters and crises [16].
The analysis of sources in German is motivated by the fact that Germany
and Austria are affected by the refugee crisis to a great extent. Swiss sources in
German were also included due to their proximity, even though the situation in
Switzerland is different. The period of time covered by this paper corresponds to
an important period, when the initial, enthusiastic welcome-culture was slowly
fading and being replaced by more concerned opinions as to whether the affected
countries would be able to cope with the massive influx of refugees. The sources
covered reflect traditional and social media and include the leading news outlets
of the three countries, as well as a variety of accounts from Twitter and Facebook
(only publicly available information was processed).
The current article makes the following contributions: it (i) presents an automatically
compiled corpus of texts from traditional and social media in German,
covering the refugee crisis over a period of six months from October 2015 to
March 2016, (ii) investigates the temporal development of sentiment across the
different sources and types of media, (iii) identifies the most prominent sources
and differences in their behavior across the period. The remainder of the paper is
organized as follows: Section 2 clarifies the methodology of SentiSAIL. Section 3
presents the corpora, empirical setup and findings. Section 4 concludes the work
and proposes avenues for further research.
2 SentiSAIL Lexicon-based Approach
SentiSAIL is a multilingual SA tool addressing the domain of general news and
particularly the coverage of disasters/crises [17, 15]. It is based on one of the
state-of-the-art SA methods, SentiStrength [7], and is integrated into the SAIL
LABS Media Mining System (MMS) for Open Source Intelligence (OSINT) [18].
SentiSAIL addresses content from traditional and social media in a variety of lan-
guages (English, German, Russian, Spanish, French and Arabic). Performance
of SentiSAIL on a trilingual traditional media corpus is reported in [15]. Sen-
tiSAIL was also used to analyze social media data in German concerning the
European floods 2013 [17]. Like [3], it employs a lexicon-based approach, us-
ing lexicons of words associated with scores of positive or negative orientation.
Features such as stemming, boosting (intensification or weakening), negation as
well as the scoring of phrases and idioms aim to model the structure and se-
mantics of the language. Whereas SentiStrength is optimized for and evaluated
on social media content, SentiSAIL targets both social and traditional media
data. Social media features are parameterized and may be disabled during tra-
ditional media processing. SentiStrength and SentiSAIL features are compared
in [15] on a proprietary traditional media corpus, reporting SentiSAIL’s perfor-
mance improvement to be moderate for English and considerable for German
and Russian. SentiSAIL, like [19], solves a dual classification task by assigning a
text to one of the following four classes: positive, negative, mixed (both positive
and negative) or neutral (neither positive, nor negative). The dual classification
scheme is motivated by the ability of humans to experience positive and negative
emotions simultaneously [20]. The classification of input text is performed in a
3-step process as outlined in [15].
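The lexicon-based, dual classification scheme described above can be illustrated with a minimal sketch. It is not SentiSAIL's implementation and does not reproduce the 3-step process of [15]: the toy lexicon, the booster and negation lists, the prefix matching used as a stand-in for stemming, and the choice to keep the strongest positive and negative evidence per text are all assumptions made for illustration.

```python
import re

# Toy resources: every entry and score below is an illustrative assumption,
# not part of SentiSAIL's actual lexicons. Stems are matched as word prefixes.
LEXICON = {"gut": 2, "helf": 2, "hilf": 2, "willkommen": 3,
           "angst": -3, "gewalt": -4, "kris": -2}
BOOSTERS = {"sehr": 1, "etwas": -1}   # intensify (+) or weaken (-) the next term
NEGATIONS = {"nicht", "kein", "keine"}


def lookup(token):
    """Return the score of the first lexicon stem that prefixes the token, else 0."""
    for stem, score in LEXICON.items():
        if token.startswith(stem):
            return score
    return 0


def classify(text):
    """Assign one of the four classes: positive, negative, mixed or neutral."""
    tokens = re.findall(r"\w+", text.lower())
    pos, neg = 0, 0              # strongest positive / negative evidence so far
    boost, negate = 0, False
    for tok in tokens:
        if tok in NEGATIONS:
            negate = True
            continue
        if tok in BOOSTERS:
            boost = BOOSTERS[tok]
            continue
        score = lookup(tok)
        if score != 0:
            # Boosting changes the magnitude, negation flips the orientation.
            magnitude = max(abs(score) + boost, 1)
            sign = 1 if score > 0 else -1
            if negate:
                sign = -sign
            if sign > 0:
                pos = max(pos, magnitude)
            else:
                neg = max(neg, magnitude)
        boost, negate = 0, False   # modifiers only affect the next token
    if pos and neg:
        return "mixed"
    if pos:
        return "positive"
    if neg:
        return "negative"
    return "neutral"
```

Under this toy lexicon, `classify("Angst und Hilfe")` yields `mixed`, since the text carries both a negative and a positive term, while `classify("Die Lage ist nicht gut")` yields `negative` because the negation flips the orientation of the positive entry.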
3 Experimental Setup and Results
The corpus of documents in German covering the humanitarian crisis of refugees
in Europe was compiled using the SAIL LABS MMS, a system for the collection
and processing of data from open sources [18]. It spans documents from tradi-
tional media (Web-Feeds and -pages) and from social media (Twitter, Facebook)
covering the period from October 2015 to March 2016. Documents from tradi-
tional media comprise 48733 articles from 68 of the most circulated traditional
media sources in Germany, Austria and Switzerland. The social media corpus
contains 16593 tweets, posts and comments from Twitter and Facebook from a
total of 5996 accounts. All documents were selected by using keywords based
on the German word Flüchtling (refugee). The same words were subsequently
excluded from SA to avoid negative bias.
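The keyword-based selection and the subsequent exclusion of the keywords from SA can be sketched as follows. The exact keyword list is not given in the text, so the single pattern around the stem Flüchtling and the function names `select` and `strip_keywords` are hypothetical.

```python
import re

# Hypothetical keyword pattern built around the stem "Flüchtling"; the actual
# keyword list used for corpus selection is not given in the text.
KEYWORD = re.compile(r"\w*flüchtling\w*", re.IGNORECASE)


def select(documents):
    """Keep only documents that mention a selection keyword."""
    return [doc for doc in documents if KEYWORD.search(doc)]


def strip_keywords(text):
    """Remove the selection keywords so they cannot contribute to the score."""
    without = KEYWORD.sub("", text)
    # Collapse the whitespace left behind by the removed words.
    return re.sub(r"\s+", " ", without).strip()
```

Removing the selection terms before scoring means that a word present in every document by construction cannot shift the whole corpus towards one polarity.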
Fig. 1 and Fig. 2 present a break-down of the percentages of documents
of three sentiment classes for traditional and social media. All texts pertain-
ing to the mixed class are considered as half-positive and half-negative for all
evaluations and visualizations. The figures clearly display the higher percentage of
sentiment-laden content on social media, where the neutral class only accounts
for 21% compared to 67% in traditional media. Fig. 3 displays the percentage
of positive, negative and neutral articles compared to the overall volume of tra-
ditional media articles per day. Over time, a slight upward trend can be observed
for both the negative and positive classes, with a more pronounced rise of negative
documents, possibly indicating more polarized reporting. The percentage of
neutral documents decreases accordingly. Fig. 4 is the equivalent chart of Fig. 3
for social media. A slight rise in negative posts can be observed over time, while
positive posts decline and neutral ones stay on approximately the same level.
The percentages of positive and negative posts are both consistently higher than
on traditional media, confirming that sentiment is generally expressed more ac-
tively on social media (a comments section of traditional news typically behaves
like social media in this respect). Overall, the dominating temporal sentiment
in traditional media is neutral whereas social media are dominated by negative
sentiment. This trend persists throughout the observed period and may be explained
by the tendency of traditional media to provide rather unbiased coverage,
whereas content on social media tends to be sentiment-laden. This tendency is
in line with the findings of [21], who report on social media reactions to news on
traditional media.
Fig. 1. Sentiment distribution in traditional media
Fig. 2. Sentiment distribution in social media
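The percentages discussed in this section, with every mixed document counted as half-positive and half-negative, can be derived from per-document labels as in the following sketch. The function names and the `(day, label)` input format are assumptions for illustration; the sketch makes no claim about how the MMS stores or aggregates its results.

```python
from collections import Counter, defaultdict


def class_shares(labels):
    """Return percentages of positive, negative and neutral documents.

    Each "mixed" document contributes 0.5 to positive and 0.5 to negative.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {"positive": 0.0, "negative": 0.0, "neutral": 0.0}
    half_mixed = counts["mixed"] / 2
    return {
        "positive": 100 * (counts["positive"] + half_mixed) / total,
        "negative": 100 * (counts["negative"] + half_mixed) / total,
        "neutral": 100 * counts["neutral"] / total,
    }


def daily_shares(records):
    """Group (day, label) pairs by day and compute the shares for each day."""
    by_day = defaultdict(list)
    for day, label in records:
        by_day[day].append(label)
    return {day: class_shares(labels) for day, labels in sorted(by_day.items())}
```

Because the mixed documents are split evenly, the three shares always sum to 100%, which allows four classes to be shown as three in the charts.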
It is difficult to precisely relate all positive and negative peaks in sentiment for
traditional and social media to real-world events; however, the following events
may be related to those peaks¹:
– October 29 2015: (negative) Pegida² demonstrations attacking German
politicians Angela Merkel and Sigmar Gabriel
– November 21 2015: (negative) cancelation of the soccer match between Germany
and the Netherlands in Hannover
– December 15 2015: (negative) left-extremist demonstrations and clashes between
demonstrators and police in Leipzig
– January 1 2016: (negative) sexual assaults by migrants during New Year’s
celebrations in Cologne
– February 7 2016: (negative) Pegida Aktionstag (day of action)
– March 22 2016: (negative) terror attacks at Brussels Airport
Of the above events, the sexual assaults committed during the New Year’s
celebrations in Cologne likely had the greatest impact on media coverage and
also resulted in legal action by the German state. However, several other key
events which happened during the period - e.g. Austria’s introduction of upper-
limits (Feb 19), the effective closing of the Balkan route (March 10) or a summit
with Turkey (March 18) - do not seem to have left direct traces on sentiment.
Table 1 displays an overview of the five most active (most articles) sources
per country³. The ratio of negative to positive articles is most pronounced for
Germany, with Swiss and Austrian newspapers exhibiting a much lower ratio.
The percentage of neutral articles is similar for all three countries, indicating
that objectivity is approximately the same for the most active sources. The
percentage of positive articles is approximately the same for the three countries;
the percentage of negative articles is slightly higher for Germany. Passau, at the
border of Germany and Austria, became a hot-spot for migrants crossing into
Germany, which may explain the unusually high number of articles produced
by the Passauer Neue Presse. Based in the Austrian province bordering Germany,
the Oberösterreichische Nachrichten is the Austrian paper with the highest
percentage of positive articles. The General-Anzeiger Bonn and 20 Minuten are
the most positive papers for Germany and Switzerland respectively. The three
papers with the highest circulation - Bild (Germany), Kronenzeitung (Austria)
and Tagesanzeiger (Switzerland) - dominate news distribution and are known
to have a large impact on public opinion. Bild and Kronenzeitung exhibit a
slightly more positive tendency than the top-5 most active papers in the respective
countries, whereas Tagesanzeiger is slightly more negative than the average
Swiss paper.
Fig. 3. Relative sentiment of traditional media
Fig. 4. Relative sentiment of social media
                                    Germany   Austria   Switzerland
Number of articles top-5 sources    13280     7737      3966
Avg ratio negative/positive         120.08    24.31     58.29
Avg % neutral                       65%       69%       70%
Avg % negative                      34%       30%       29%
Avg % positive                      1%        2%        1%
Table 1. Statistics for top-5 active sources per country
¹ Information about these events has been taken from
http://zeitstrahl-flüchtlingskrise.org, which provides excellent coverage and
history of events concerning the refugee crisis (accessed on 2016/06/08).
² Pegida: Patriotische Europäer gegen Islamisierung des Abendlandes (Patriotic
Europeans Against the Islamisation of the West), www.pegida.de
³ Germany: Passauer Neue Presse, Frankfurter Allgemeine, Focus, Welt, Spiegel;
Austria: Der Standard, Kleine Zeitung, Salzburger Nachrichten, Die Presse, Wiener
Zeitung; Switzerland: Neue Zürcher Zeitung, Aargauer Zeitung, Tagesanzeiger,
Basler Zeitung, 20 Minuten
4 Conclusion and Future Work
The paper presented the results of applying SA to a corpus of textual data cov-
ering the European refugee crisis during the period of October 2015 to March
2016. The distribution of sentiment in traditional media was substantially dif-
ferent from that of social media, with more neutral content being published in
traditional media. Both types of media show a tendency for negative content
to increase over time. On social media, a decline of positive posts could be ob-
served. These changes may be related to the general shift of attitudes towards
refugees over the observed period. The percentages of positive, negative and
neutral sentiment for the five most active news sources in Germany, Austria and
Switzerland are similar, with German sources exhibiting slightly more negative
articles. Several real-world events could be connected to the local maxima in sen-
timent values. Other notable events were not directly reflected in the sentiment
of articles and posts. Future work will include the analysis of actors and voices
on social media to gain further insights on how these are linked to traditional
media and on differences detected between actors from the different countries.
References
1. Gonçalves, P., Araújo, M., Benevenuto, F., Cha, M.: Comparing and Combining
Sentiment Analysis Methods. In: Proc. of the 1st ACM Conference on Online
Social Networks (COSN 2013), Boston, USA, ACM (2013) 27–38
2. Pang, B., Lee, L., Vaithyanathan, S.: Thumbs up?: Sentiment Classification Us-
ing Machine Learning Techniques. In: Proc. of the ACL conference on Empirical
Methods in Natural Language Processing (EMNLP ’02), Philadelphia, PA, USA
(2002) 79–86
3. Taboada, M., Brooke, J., Tofiloski, M., Voll, K., Stede, M.: Lexicon-based Methods
for Sentiment Analysis. Computational Linguistics 37(2) (2011) 267–307
4. Esuli, A., Sebastiani, F.: SENTIWORDNET: A Publicly Available Lexical Re-
source for Opinion Mining. In: Proc. of the 5th Conference on Language Resources
and Evaluation (LREC06). (2006) 417–422
5. Wang, H., Can, D., Kazemzadeh, A., Bar, F., Narayanan, S.: A System for Real-
time Twitter Sentiment Analysis of 2012 U.S. Presidential Election Cycle. In: ACL
(System Demonstrations). (2012) 115–120
6. Gonçalves, P., Benevenuto, F., Cha, M.: PANAS-t: A Pychometric Scale for Measuring
Sentiments on Twitter. CoRR abs/1308.1857 (2013)
7. Thelwall, M., Buckley, K., Paltoglou, G., Cai, D., Kappas, A.: Sentiment Strength
Detection in Short Informal Text. J. American Society for Information Science
and Technology 61(12) (2010) 2544–2558
8. Tausczik, Y.R., Pennebaker, J.W.: The Psychological Meaning of Words: LIWC
and Computerized Text Analysis Methods. Journal of Language and Social Psy-
chology 29(1) (2010) 25–54
9. Cambria, E., Speer, R., Havasi, C., Hussain, A.: SenticNet: A Publicly Available
Semantic Resource for Opinion Mining. In: AAAI Fall Symposium: Commonsense
Knowledge. (2010) 14–18
10. Dodds, P.S., Danforth, C.M.: Measuring the happiness of large-scale written ex-
pression: songs, blogs, and presidents. Journal of Happiness Studies 11(4) (2009)
441–456
11. Balahur, A., Steinberger, R., Kabadjov, M., Zavarella, V., van der Goot, E., Halkia,
M., Pouliquen, B., Belyaeva, J.: Sentiment Analysis in the News. In: Proc. of the
7th International Conference on Language Resources and Evaluation (LREC’10),
Valletta, Malta, ELRA (2010)
12. Coletto, M., Esuli, A., Lucchese, C., Muntean, C.I., Nardini, F.M., Perego, R.,
Renso, C.: Sentiment-enhanced multidimensional analysis of online social networks:
Perception of the mediterranean refugees crisis. In: Workshop on Social Network
Analysis Surveillance Technologies (SNAST 16), San Francisco, USA (2016)
13. Remus, R., Quasthoff, U., Heyer, G.: SentiWS - a German-language Resource for
Sentiment Analysis. In: Proc. of the 7th International Conference on Language
Resources and Evaluation (LREC), Valletta, Malta (2010) 1168–1171
14. Momtazi, S.: Fine-grained German Sentiment Analysis on Social Media. In:
Proc. of the 8th International Conference on Language Resources and Evaluation
(LREC’12), Istanbul, Turkey, ELRA (2012) 1215–1220
15. Shalunts, G., Backfried, G.: SentiSAIL: Sentiment Analysis in English, German
and Russian. In: Proc. of the 11th International Conference on Machine Learning
and Data Mining. MLDM ’15, Hamburg, Germany (2015) 87–97
16. Backfried, G., Göllner, J., Quirchmayr, G., Rainer, K., Kienast, G., Thallinger, G.,
Schmidt, C., Peer, A.: Integration of Media Sources for Situation Analysis in the
Different Phases of Disaster Management: The QuOIMA Project. In: Proc. of
European Intelligence and Security Informatics Conference (EISIC ’13), Uppsala,
Sweden (2013) 143–146
17. Shalunts, G., Backfried, G., Prinz, K.: Sentiment Analysis of German Social Media
Data for Natural Disasters. In: Proc. of the 11th International Conference on
Information Systems for Crisis Response and Management (ISCRAM), University
Park, Pennsylvania, USA (2014) 752–756
18. Backfried, G., Schmidt, C., Pfeiffer, M., Quirchmayr, G., Glanzer, M., Rainer,
K.: Open source intelligence in disaster management. In: Proc. of the European
Intelligence and Security Informatics Conference (EISIC), Odense, Denmark, IEEE
(2012) 254–258
19. Wilson, T., Wiebe, J., Hoffmann, P.: Recognizing Contextual Polarity: An Explo-
ration of Features for Phrase-Level Sentiment Analysis. Computational Linguistics
35(3) (2009) 399–433
20. Norman, G.J., Norris, C.J., Gollan, J., Ito, T.A., Hawkley, L.C., Larsen, J.T.,
Cacioppo, J.T., Berntson, G.G.: Current Emotion Research in Psychophysiology:
The Neurobiology of Evaluative Bivalence. Emotion Review 3(3) (2011) 349–359
21. Tan, C., Friggeri, A., Adamic, L.A.: Lost in Propagation? Unfolding News Cycles
from the Source. In: Proc. of the 10th International AAAI Conference Web and
Social Media (ICWSM 2016), Cologne, Germany (2016)