Measuring the Reliability of Hate Speech Annotations:
The Case of the European Refugee Crisis

Björn Ross, Michael Rist, Guillermo Carbonell,
Benjamin Cabrera, Nils Kurowsky, Michael Wojatzki

Research Training Group "User-Centred Social Media"
Department of Computer Science and Applied Cognitive Science
University of Duisburg-Essen
firstname.lastname@uni-due.de
Abstract

Some users of social media are spreading racist, sexist, and otherwise hateful content. For the purpose of training a hate speech detection system, the reliability of the annotations is crucial, but there is no universally agreed-upon definition. We collected potentially hateful messages and asked two groups of internet users to determine whether they were hate speech or not, whether they should be banned or not, and to rate their degree of offensiveness. One of the groups was shown a definition prior to completing the survey. We aimed to assess whether hate speech can be annotated reliably, and the extent to which existing definitions are in accordance with subjective ratings. Our results indicate that showing users a definition caused them to partially align their own opinion with the definition but did not improve reliability, which was very low overall. We conclude that the presence of hate speech should perhaps not be considered a binary yes-or-no decision, and raters need more detailed instructions for the annotation.
1 Introduction

Social media are sometimes used to disseminate hateful messages. In Europe, the current surge in hate speech has been linked to the ongoing refugee crisis. Lawmakers and social media sites are increasingly aware of the problem and are developing approaches to deal with it, for example promising to remove illegal messages within 24 hours after they are reported (Titcomb, 2016).

This raises the question of how hate speech can be detected automatically. Such an automatic detection method could be used to scan the large amount of text generated on the internet for hateful content and report it to the relevant authorities. It would also make it easier for researchers to examine the diffusion of hateful content through social media on a large scale.
From a natural language processing perspective, hate speech detection can be considered a classification task: given an utterance, determine whether or not it contains hate speech. Training a classifier requires a large amount of data that is unambiguously hate speech. This data is typically obtained by manually annotating a set of texts based on whether a certain element contains hate speech.

The reliability of the human annotations is essential, both to ensure that the algorithm can accurately learn the characteristics of hate speech, and as an upper bound on the expected performance (Warner and Hirschberg, 2012; Waseem and Hovy, 2016).
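To make this classification framing concrete, the following is a minimal sketch of such a binary classifier using scikit-learn. It is not the system studied in this paper; the file name and column names are illustrative assumptions.

```python
# Minimal sketch of hate speech detection as binary text classification.
# Assumes a hypothetical file "annotated_tweets.csv" with columns
# "text" and "label" (1 = hate speech, 0 = other); not the actual data.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

df = pd.read_csv("annotated_tweets.csv")
X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["label"], test_size=0.2, random_state=42)

vectorizer = TfidfVectorizer(ngram_range=(1, 2), min_df=2)
clf = LogisticRegression(max_iter=1000)

clf.fit(vectorizer.fit_transform(X_train), y_train)
print(classification_report(y_test, clf.predict(vectorizer.transform(X_test))))
```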
As a preliminary step, six annotators rated 469 tweets. We found that agreement was very low (see Section 3). We then carried out group discussions to find possible reasons. They revealed that there is considerable ambiguity in existing definitions. A given statement may be considered hate speech or not depending on someone's cultural background and personal sensibilities. The wording of the question may also play a role.

We decided to investigate the issue of reliability further by conducting a more comprehensive study across a large number of annotators, which we present in this paper.
Our contribution in this paper is threefold:

• To the best of our knowledge, this paper presents the first attempt at compiling a German hate speech corpus for the refugee crisis.[1]
• We provide an estimate of the reliability of hate speech annotations.
• We investigate how the reliability of the annotations is affected by the exact question asked.

[1] Available at https://github.com/UCSM-DUE/IWG_hatespeech_public
2 Hate Speech

For the purpose of building a classifier, Warner and Hirschberg (2012) define hate speech as "abusive speech targeting specific group characteristics, such as ethnic origin, religion, gender, or sexual orientation". More recent approaches rely on lists of guidelines, such as a tweet being hate speech if it "uses a sexist or racial slur" (Waseem and Hovy, 2016). These approaches are similar in that they leave plenty of room for personal interpretation, since there may be differences in what is considered offensive. For instance, while the utterance "the refugees will live off our money" is clearly generalising and maybe unfair, it is unclear whether it already constitutes hate speech. More precise definitions from law are specific to certain jurisdictions and therefore do not capture all forms of offensive, hateful speech, see e.g. Matsuda (1993). In practice, social media services use their own definitions, which have been subject to adjustments over the years (Jeong, 2016). As of June 2016, Twitter bans hateful conduct.[2]
With the rise in popularity of social media, the presence of hate speech has grown on the internet. Posting a tweet takes little more than a working internet connection, but the tweet may be seen by users all over the world.

Along with the presence of hate speech, its real-life consequences are also growing. It can be a precursor and incentive for hate crimes, and it can be so severe that it even becomes a health issue (Burnap and Williams, 2014). It is also known that hate speech does not only mirror existing opinions in the reader but can also induce new negative feelings towards its targets (Martin et al., 2013). Hate speech has recently gained some interest as a research topic (e.g. Djuric et al., 2014; Burnap and Williams, 2014; Silva et al., 2016), but also as a problem to be dealt with in politics, for example through the No Hate Speech Movement by the Council of Europe.

The current refugee crisis has made it evident that governments, organisations and the public share an interest in controlling hate speech in social media. However, there seems to be little consensus on what hate speech actually is.
[2] "You may not promote violence against or directly attack or threaten other people on the basis of race, ethnicity, national origin, sexual orientation, gender, gender identity, religious affiliation, age, disability, or disease. We also do not allow accounts whose primary purpose is inciting harm towards others on the basis of these categories.", The Twitter Rules
3 Compiling A Hate Speech Corpus

As previously mentioned, there is no German hate speech corpus available for our needs, especially not for the very recent topic of the refugee crisis in Europe. We therefore had to compile our own corpus. We used Twitter as a source as it offers recent comments on current events. In our study we only considered the textual content of tweets that contain certain keywords, ignoring those that contain pictures or links. This section provides a detailed description of the approach we used to select the tweets and subsequently annotate them.

To find a large amount of hate speech on the refugee crisis, we used 10 hashtags[3] that can be used in an insulting or offensive way. Using these hashtags we gathered 13,766 tweets in total, roughly dating from February to March 2016. However, these tweets contained a lot of non-textual content, which we filtered out automatically by removing tweets consisting solely of links or images. We also only considered original tweets, as retweets or replies to other tweets might only be clearly understandable when reading both tweets together. In addition, we removed duplicates and near-duplicates by discarding tweets that had a normalised Levenshtein edit distance smaller than .85 to a tweet already in the collection. A first inspection of the remaining tweets indicated that not all search terms were equally suited for our needs. The search term #Pack (vermin or lowlife) found a potentially large amount of hate speech not directly linked to the refugee crisis. It was therefore discarded. As a last step, the remaining tweets were manually read to eliminate those which were difficult to understand or incomprehensible. After these filtering steps, our corpus consists of 541 tweets, none of which are duplicates, contain links or pictures, or are retweets or replies.
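A minimal sketch of such a near-duplicate filter might look as follows. How the edit distance is normalised is not spelled out above; the sketch assumes division by the length of the longer tweet (an illustrative choice) and uses the third-party python-Levenshtein package.

```python
# Sketch of the near-duplicate filter: a tweet is kept only if its
# normalised Levenshtein distance to every previously kept tweet is
# at least the threshold. Normalising by the longer string's length
# is an assumption made for illustration.
import Levenshtein  # pip install python-Levenshtein

def normalised_distance(a: str, b: str) -> float:
    if not a and not b:
        return 0.0
    return Levenshtein.distance(a, b) / max(len(a), len(b))

def deduplicate(tweets, threshold=0.85):
    kept = []
    for tweet in tweets:
        if all(normalised_distance(tweet, earlier) >= threshold
               for earlier in kept):
            kept.append(tweet)
    return kept
```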
As a first measurement of the frequency of hate speech in our corpus, we annotated the tweets ourselves, based on our previous expertise. The 541 tweets were split into six parts, and each part was annotated by two out of the six annotators in order to determine whether hate speech was present or not. The annotators were rotated so that each pair of annotators only evaluated one part. Additionally, the offensiveness of each tweet was rated on a 6-point Likert scale, the same scale used later in the study.
[3] #Pack, #Aslyanten, #WehrDich, #Krimmigranten, #Rapefugees, #Islamfaschisten, #RefugeesNotWelcome, #Islamisierung, #AsylantenInvasion, #Scharia
Even among researchers familiar with the definitions outlined above, there was still a low level of agreement (Krippendorff's α = .38). This supports our claim that a clearer definition is necessary in order to be able to train a reliable classifier. The low reliability could of course be explained by varying personal attitudes or backgrounds, but it clearly needs more consideration.
4 Methods

In order to assess the reliability of hate speech definitions on social media more comprehensively, we developed two online surveys in a between-subjects design. They were completed by 56 participants in total (see Table 1). The main goal was to examine the extent to which non-experts agree upon their understanding of hate speech given a diversity of social media content. We used the Twitter definition of hateful conduct in the first survey. This definition was presented at the beginning, and again above every tweet. The second survey did not contain any definition. Participants were randomly assigned to one of the two surveys.
The surveys consisted of 20 tweets presented in a random order. For each tweet, each participant was asked three questions. Depending on the survey, participants were asked (1) to answer (yes/no) whether they considered the tweet hate speech, either based on the definition or based on their personal opinion. Afterwards they were asked (2) to answer (yes/no) whether the tweet should be banned from Twitter. Participants were finally asked (3) to rate how offensive they thought the tweet was on a 6-point Likert scale from 1 (Not offensive at all) to 6 (Very offensive). If they answered 4 or higher, participants had the option to state which particular words they found offensive.
After the annotation of the 20 tweets, participants were asked to voluntarily answer an open question regarding the definition of hate speech. In the survey with the definition, they were asked whether the definition provided by Twitter was sufficient. In the survey without the definition, participants were asked to suggest a definition themselves. Finally, sociodemographic data were collected, including age, gender and more specific information regarding the participant's political orientation, migration background, and personal position regarding the refugee situation in Europe.
The surveys were approved by the ethics committee of the Department of Computer Science and Applied Cognitive Science of the Faculty of Engineering at the University of Duisburg-Essen.
5 Preliminary Results and Discussion

Since the surveys were completed by 56 participants, they resulted in 1120 annotations. Table 1 shows some summary statistics.
                      Def.    No def.   p     r
Participants          25      31
Age (mean)            33.3    30.5
Gender (% female)     43.5    58.6
Hate Speech (% yes)   32.6    40.3      .26   .15
Ban (% yes)           32.6    17.6      .01   -.32
Offensive (mean)      3.49    3.42      .55   -.08

Table 1: Summary statistics with p values and effect size estimates from WMW tests. Not all participants chose to report their age or gender.
To assess whether the definition had any effect, we calculated, for each participant, the percentage of tweets they considered hate speech or suggested banning, as well as their mean offensiveness rating. This allowed us to compare the two samples for each of the three questions. Preliminary Shapiro-Wilk tests indicated that some of the data were not normally distributed. We therefore used the Wilcoxon-Mann-Whitney (WMW) test to compare the three pairs of series. The results are reported in Table 1.
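A sketch of this group comparison with SciPy is shown below. The input arrays are placeholders rather than the study data, and the rank-biserial correlation is included only as one common effect-size estimate; it need not match the estimator behind the r values in Table 1.

```python
# Sketch: compare per-participant aggregates of the "definition" and
# "no definition" groups with Shapiro-Wilk checks and a WMW test.
import numpy as np
from scipy.stats import shapiro, mannwhitneyu

def compare_groups(definition: np.ndarray, no_definition: np.ndarray):
    for name, sample in [("definition", definition),
                         ("no definition", no_definition)]:
        stat, p = shapiro(sample)
        print(f"Shapiro-Wilk ({name}): W={stat:.3f}, p={p:.3f}")
    u, p = mannwhitneyu(definition, no_definition, alternative="two-sided")
    # Rank-biserial correlation as an illustrative effect size.
    r_rb = 1 - 2 * u / (len(definition) * len(no_definition))
    print(f"WMW: U={u:.1f}, p={p:.3f}, rank-biserial r={r_rb:.2f}")

# Placeholder per-participant "% rated hate speech" values:
rng = np.random.default_rng(0)
compare_groups(rng.uniform(0, 100, 25), rng.uniform(0, 100, 31))
```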
Participants who were shown the definition were more likely to suggest banning the tweet. In fact, participants in group one very rarely gave different answers to questions one and two (18 of 500 instances, or 3.6%). This suggests that participants in that group aligned their own opinion with the definition.
We chose Krippendorff's α to assess reliability. It is a measure from content analysis, where human coders are required to be interchangeable. It therefore measures agreement instead of association, which leaves no room for the individual predilections of coders. It can be applied to any number of coders and to interval as well as nominal data (Krippendorff, 2004).
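A reliability computation along these lines might look as follows, using the third-party Python package krippendorff; the small rating matrix is illustrative, not the survey data.

```python
# Sketch of computing Krippendorff's alpha (pip install krippendorff).
# Rows are coders, columns are items; np.nan marks items a coder
# did not rate. The matrix below is a toy example.
import numpy as np
import krippendorff

ratings = np.array([
    [1, 0, 1, np.nan, 0],
    [1, 0, 0, 1,      0],
    [0, 0, 1, 1,      np.nan],
])

alpha_nominal = krippendorff.alpha(reliability_data=ratings,
                                   level_of_measurement="nominal")
alpha_interval = krippendorff.alpha(reliability_data=ratings,
                                    level_of_measurement="interval")
print(f"nominal alpha = {alpha_nominal:.2f}, "
      f"interval alpha = {alpha_interval:.2f}")
```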
This allowed us to compare agreement between both groups for all three questions. Figure 1 visualises the results. Overall, agreement was very low, ranging from α = .18 to .29. In contrast, for the purpose of content analysis, Krippendorff recommends a minimum of α = .80, or a minimum of .66 for applications where some uncertainty is unproblematic (Krippendorff, 2004). Reliability did not consistently increase when participants were shown a definition.

Figure 1: Reliability (Krippendorff's α) for the different groups and questions.
To measure the extent to which the annotations using the Twitter definition (question one in group one) were in accordance with participants' opinions (question one in group two), we calculated, for each tweet, the percentage of participants in each group who considered it hate speech, and then calculated Pearson's correlation coefficient. The two series correlate strongly (r = .895, p < .0001), indicating that they measure the same underlying construct.
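A sketch of this per-tweet comparison is given below, assuming the yes/no answers are available as illustrative dictionaries keyed by tweet id (placeholders, not the survey data).

```python
# Sketch: per-tweet percentage of "hate speech" judgements in each
# group, correlated across the two groups with Pearson's r.
from scipy.stats import pearsonr

def percentage_yes(answers_by_tweet):
    # answers_by_tweet: dict mapping tweet id -> list of 0/1 answers
    return {t: 100 * sum(a) / len(a) for t, a in answers_by_tweet.items()}

def correlate(group_definition, group_no_definition):
    p_def = percentage_yes(group_definition)
    p_nodef = percentage_yes(group_no_definition)
    tweets = sorted(set(p_def) & set(p_nodef))
    r, p = pearsonr([p_def[t] for t in tweets],
                    [p_nodef[t] for t in tweets])
    return r, p
```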
6 Conclusion and Future Work

This paper describes the creation of our hate speech corpus and offers first insights into the low agreement among users when it comes to identifying hateful messages. Our results imply that hate speech is a vague concept that requires significantly better definitions and guidelines in order to be annotated reliably. Based on the present findings, we are planning to develop a new coding scheme which includes clear-cut criteria that let people distinguish hate speech from other content.

Researchers who are building a hate speech detection system might want to collect multiple labels for each tweet and average the results. Of course, this approach does not make the original data any more reliable (Krippendorff, 2004). Yet, collecting the opinions of more users gives a more detailed picture of objective (or intersubjective) hatefulness. For the same reason, researchers might want to consider hate speech detection a regression problem, predicting, for example, the degree of hatefulness of a message, instead of a binary yes-or-no classification task.
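As a rough illustration of this regression framing (not an implementation used in this work), one could average the Likert ratings per tweet and fit a simple text regressor; the data below are placeholders.

```python
# Sketch: averaging several raters' offensiveness scores per tweet
# and predicting that continuous value instead of a binary label.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

def mean_ratings(ratings_by_tweet):
    # ratings_by_tweet: dict mapping tweet text -> list of 1-6 scores
    texts = list(ratings_by_tweet)
    targets = np.array([np.mean(ratings_by_tweet[t]) for t in texts])
    return texts, targets

texts, targets = mean_ratings({
    "example tweet one": [2, 3, 2],
    "example tweet two": [5, 6, 5, 4],
})
model = make_pipeline(TfidfVectorizer(), Ridge(alpha=1.0))
model.fit(texts, targets)
```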
In the future, finding the characteristics that make users consider content hateful will be useful for building a model that automatically detects hate speech and users who spread hateful content, and for determining what makes users disseminate hateful content.
Acknowledgments

This work was supported by the Deutsche Forschungsgemeinschaft (DFG) under grant No. GRK 2167, Research Training Group "User-Centred Social Media".
References

Peter Burnap and Matthew Leighton Williams. 2014. Hate Speech, Machine Classification and Statistical Modelling of Information Flows on Twitter: Interpretation and Communication for Policy Decision Making. In Proceedings of IPP 2014, pages 1–18.

Nemanja Djuric, Jing Zhou, Robin Morris, Mihajlo Grbovic, Vladan Radosavljevic, and Narayan Bhamidipati. 2014. Hate Speech Detection with Comment Embeddings. In ICML 2014, volume 32, pages 1188–1196.

Sarah Jeong. 2016. The History of Twitter's Rules. VICE Motherboard.

Klaus Krippendorff. 2004. Reliability in Content Analysis: Some Common Misconceptions and Recommendations. Human Communication Research, 30(3):411–433.

Ryan C. Martin, Kelsey Ryan Coyier, Leah M. VanSistine, and Kelly L. Schroeder. 2013. Anger on the Internet: The Perceived Value of Rant-Sites. Cyberpsychology, Behavior and Social Networking, 16(2):119–122.

Mari J. Matsuda. 1993. Words that Wound: Critical Race Theory, Assaultive Speech, and the First Amendment. Westview Press, New York.

Leandro Silva, Mainack Mondal, Denzil Correa, Fabrício Benevenuto, and Ingmar Weber. 2016. Analyzing the Targets of Hate in Online Social Media. In Proceedings of ICWSM 2016, pages 687–690.

James Titcomb. 2016. Facebook and Twitter promise to crack down on internet hate speech. The Telegraph.

William Warner and Julia Hirschberg. 2012. Detecting Hate Speech on the World Wide Web. In Proceedings of LSM 2012, pages 19–26. ACL.

Zeerak Waseem and Dirk Hovy. 2016. Hateful Symbols or Hateful People? Predictive Features for Hate Speech Detection on Twitter. In Proceedings of NAACL-HLT, pages 88–93.