Temporal effects of Unmoderated Hate speech in Gab
Binny Mathew, Anurag Illendula, Punyajoy Saha, Soumya Sarkar,
Pawan Goyal, Animesh Mukherjee
Indian Institute of Technology (IIT), Kharagpur
binnymathew@iitkgp.ac.in, {aianurag09, punyajoysaha1998}@gmail.com, soumya015@iitkgp.ac.in
{pawang, animeshm}@cse.iitkgp.ac.in
Abstract
With the online proliferation of hate speech, organizations and
governments are trying to tackle this issue, without upsetting
the ‘freedom of speech’. In this paper, we try to understand
the temporal effects of allowing hate speech on a platform
(Gab) as a norm (protected as freedom of speech). We observe
that the amount of hate speech is steadily increasing, with new
users being exposed to hate speech at a faster rate. We also
observe that the language used by the Gab users is aligning
more with the hateful users with time. We believe that the hate
community is evolving a new language culture of their own
in this unmoderated environment and the rest of the (benign)
population is slowly getting adapted to this new language
culture. Our work provides empirical observations relevant to the HCI
questions regarding the freedom of hate speech.
1 Introduction
The question of where the borderline lies, or whether there is indeed any borderline, between ‘free speech’ and ‘hate speech’ is an ongoing subject of debate which has recently gained a lot of attention. With crimes related to hate speech increasing in recent times1, hate speech is considered to be one of the fundamental problems that plague the Internet. The online dissemination of hate speech has even led to real-life tragic events such as the genocide of the Rohingya community in Myanmar, anti-Muslim mob violence in Sri Lanka, and the Pittsburgh shooting. The big tech giants are also unable to control the massive dissemination of hate speech2.
Recently, there has been a lot of research concerning multiple aspects of hate speech such as detection (Davidson et al. 2017; Badjatiya et al. 2017; Zhang, Robinson, and Tepper 2018), analysis (Chandrasekharan et al. 2017a; Olteanu et al. 2018), target identification (Silva et al. 2016; Mondal, Silva, and Benevenuto 2017; ElSherief et al. 2018), counter-hate speech (Gagliardone et al. 2015; Mathew et al. 2018c; Benesch et al. 2016), etc. However, very little is known about
the temporal effects of hate speech in online social media,
especially if it is considered as normative. In order to have
a clear understanding of this, we would need to see the
effects on a platform which allows free flow of hate speech.
1https://www.justice.gov/hatecrimes/hate-crime-statistics
2https://tinyurl.com/facebook-leaked-moderation
To understand the true nature of the hateful users, we need
to study them in an environment that would not stop them
from following/enacting on their beliefs. This led us to focus
our study on Gab (Gab.com). Gab is a social media site that calls itself the ‘champion of free speech’. The site does not prohibit a user from posting any hateful content. This natural environment, in which the only moderation is what the community members impose on themselves, provides a rich platform for our study. Using a large dataset of 21M posts spanning around two years since the inception of the site, we
develop a data pipeline which allows us to study the temporal
effects of hate speech in an unmoderated environment. Our
work adds the temporal dimension to the existing literature
on hate speech and tries to study and characterize hate in
unmoderated online social media.
Despite the importance of understanding hate speech in the
current socio-political environment, there is little HCI work
which looks into the temporal aspects of these issues. This
paper fills an important research gap in understanding how
hate speech evolves in an environment where it is protected
under the umbrella of free speech. This paper also opens up
questions on how new HCI design policies of online plat-
forms should be regulated to minimize/mitigate the problem
of the temporal growth of hate speech. We posit that HCI
research, acknowledging the far-reaching negative consequences
of this problem, should factor it into the ongoing popular
initiative of platform governance3.
Outline of the work
To understand the temporal characteristics, we needed data
from consecutive time points in Gab. As a first step, using a
heuristic, we generate successive graphs which capture the
different time snapshots of Gab at one month intervals. Then,
using the DeGroot model, we assign a hate intensity score
to every user in the temporal snapshot and categorize them
based on their degrees of hate. We then perform several lin-
guistic and network studies on these users across the different
time snapshots.
3https://www.tandfonline.com/eprint/KxDwNEpqTY86MNpRDHE9/full
Research questions
1. RQ1: How can we characterize the growth of hate speech in Gab?
2. RQ2: How have the hate speakers affected the Gab community as a whole?
RQ1 attempts to investigate the general growth of hate speech in Gab. Previous research on Gab (Zannettou et al. 2018) states that it contains 2.4x the hateful content of Twitter. RQ2, on the other hand, attempts to identify how
these hateful users have affected the Gab community. We
study this from two different perspectives: language and net-
work characteristics.
Key observations
For RQ1, we found that the amount of hate speech in Gab is
consistently increasing. This is true for the new users joining
as well. We found that recently joined users take much less time to become hateful as compared to those that joined in earlier time periods. Further, the fraction of users
getting exposed to hate speech is increasing as well.
For RQ2, we found that the language used by the com-
munity is aligning more with the hateful users as compared
to the non-hateful ones. The hateful users also seem to be
playing a pivotal role from the network point of view. We
found that the hateful users reach the core of the community
faster and in larger numbers.
2 Prior Work
Research on hate speech has a substantial literature and has recently gained a lot of attention from the computer science perspective. In the following sections, we will examine the various aspects of research on hate speech. Interested readers can refer to Fortuna and Nunes (2018) and Schmidt and Wiegand (2017) for comprehensive surveys of this subject.
Definition of hate speech
Hate speech lies in a complex confluence of freedom of ex-
pression, individual, group and minority rights, as well as
concepts of dignity, liberty and equality (Gagliardone et al. 2015). Owing to the subjective nature of this issue, deciding if a given piece of text contains hate speech is onerous. In this paper, we use the hate speech definition outlined in the work done by ElSherief et al. (ElSherief et al. 2018). The authors define hate speech as a “direct and serious attack on any protected category of people based on their race, ethnicity, national origin, religion, sex, gender, sexual orientation, disability or disease.” Others have a slightly different definition for hate speech but the spirit is mostly the same. In our
work we shall mostly go by this definition unless otherwise
explicitly mentioned.
Related concepts
Hate speech is a complex phenomenon, intrinsically associated with relationships among groups, and also relying on linguistic nuances (Fortuna and Nunes 2018). It is related to some of the concepts in social science such as incivility (Maity et al. 2018), radicalization (Agarwal and Sureka 2015), cyberbullying (Chen 2011), abusive language (Chandrasekharan et al. 2017b; Nobata et al. 2016), toxicity (Gunasekara and Nejadgholi 2018; Srivastava, Khurana, and Tewari 2018), profanity (Sood, Antin, and Churchill 2012) and extremism (McNamee, Peterson, and Peña 2010). Owing to the overlap between hate speech and these concepts, it sometimes becomes hard to differentiate between them (Davidson et al. 2017). Teh et al. (Teh, Cheng, and Chee 2018) obtained a list of frequently used profane words from comments on YouTube videos and categorized them into 8 different types of hate speech. The authors aim to use these profane words for automatic hate speech detection. Malmasi et al. (Malmasi and Zampieri 2018) attempt to distinguish profanity from hate speech by building models with features such as n-grams, skip-grams and clustering-based word representations.
Effects of hate speech
Previous studies have found that public expressions of hate speech affect the devaluation of minority members (Greenberg and Pyszczynski 1985), the exclusion of minorities from the society (Mullen and Rice 2003), psychological well-being and the suicide rate among minorities (Mullen and Smyth 2004), and the discriminatory distribution of public resources (Fasoli, Maass, and Carnaghi 2015). Frequent and repetitive exposure to hate speech has been shown to desensitize the individual to this form of speech and subsequently to lower evaluations of the victims and greater distancing, thus increasing outgroup prejudice (Soral, Bilewicz, and Winiewski 2018). Olteanu et al. (Olteanu et al. 2018) studied the effect of violent attacks on the volume and type of hateful speech on two social media platforms, Twitter and Reddit. They found that extremist violence tends to lead to an increase in online hate speech, in particular in messages directly advocating violence.
Computational approaches
The research interest in hate speech, from a computer science perspective, is growing. Larger datasets (Davidson et al. 2017; Founta et al. 2018; de Gibert et al. 2018) and different approaches have been devised by researchers to detect hateful social media comments. These methods include techniques such as dictionary-based (Guermazi, Hammami, and Hamadou 2007), distributional semantics (Djuric et al. 2015), multi-feature (Salminen et al. 2018) and neural networks (Badjatiya et al. 2017).
Burnap et al. (Burnap and Williams 2016) used a bag of words approach combined with hate lexicons to build machine learning classifiers. Gitari et al. (Gitari et al. 2015) used sentiment analysis along with subjectivity detection to generate a set of words related to hate speech for hate speech classification. Chau et al. (Chau and Xu 2007) used analysis of hyperlinks among web pages to identify hate group communities. Zhou et al. (Zhou et al. 2005) used the multidimensional scaling (MDS) algorithm to represent the proximity of hate websites and thus capture their level of similarity. Liu et al. (Liu and Forss 2015) incorporated LDA topic modelling to improve the performance of the hate speech detection task. Saleem et al. (Saleem et al. 2017) proposed an approach to detecting hateful speech using self-identifying hate communities as training data for hate speech classifiers. Davidson et al. (Davidson et al. 2017) used crowd-sourcing to label tweets into three categories: hate speech, only offensive language, and those with neither. Waseem et al. (Waseem and Hovy 2016) presented a list of criteria based on critical race theory to identify racist and sexist slurs.
More recently, researchers have started using deep learning methods (Badjatiya et al. 2017; Zhang, Robinson, and Tepper 2018) and graph embedding techniques (Ribeiro et al. 2018) to detect hate speech. Badjatiya et al. (Badjatiya et al. 2017) applied several deep learning architectures and improved the benchmark score by 18 F1 points. Zhang et al. (Zhang, Robinson, and Tepper 2018) used a deep neural network combining convolutional and gated recurrent networks to improve the results on 6 out of 7 datasets. Gao et al. (Gao and Huang 2017) utilized the context information accompanying the text to develop hate speech detection models. Gröndahl et al. (Gröndahl et al. 2018) found that several of the existing state-of-the-art hate speech detection models work well only when tested on the same type of data they were trained on.
While most of the computational approaches focus on detecting if a given text contains hate speech, very few works focus on user account level detection. Qian et al. (Qian et al. 2018) proposed a model that leverages intra-user and inter-user representation learning for hate speech detection.
Gibson (Gibson 2017) studied the moderation policies of Reddit communities and observed that ‘safe spaces’ have higher levels of censorship, which is directly related to the politeness in the community. Studying the effects of hate speech in online social media remains an understudied area in HCI research. By employing our data processing pipeline, we study the temporal effects of hate speech on Gab.
3 Dataset
The Gab social network
Gab is a social media platform launched in August 2016
known for promoting itself as the “Champion of free speech”.
However, it has been criticized for being a shield for alt-right users (Zannettou et al. 2018). The site is very similar to Twitter but has very loose moderation policies. According to the Gab guidelines, the site does not restrain users from using hateful speech4. The site allows users to read and write
posts of up to 3,000 characters. The site employs an upvoting
and downvoting mechanism for posts and categorizes posts
into different topics such as News, Sports, Politics, etc.
Dataset collection
We use the dataset developed by Mathew et al. (Mathew et al. 2018a) for our analysis. For the sake of completeness of the paper, we present the general statistics of the dataset in Table 1. The dataset contains information from August 2016 to July 2018. We do not use the data for the initial two months (August-September 2016) and the last month (July 2018) as they had fewer posts.
4https://gab.com/about/tos
Property                            Value
Number of posts                     21,207,961
Number of reply posts               6,601,521
Number of quote posts               2,085,828
Number of reposts                   5,850,331
Number of posts with attachments    9,669,374
Number of user accounts             341,332
Average followers per account       62.56
Average following per account       60.93
Table 1: Description of the dataset.
Figure 1: Our overall methodology to generate the hate vector
for a user.
4 Methodology
To address our research questions, we need to have a temporal overview of the activity of each user. So, our first task involves generating temporal snapshots to capture the month-wise activity of the users. We develop a pipeline to generate the hate vector of each user for this purpose. A hate vector is a representation used to capture the activity of each user. A higher value in the hate vector is an indication of the hatefulness of a user, whereas a lower value means that the user potentially did not indulge in any hateful activity.
In this section, we will explain the pipeline we used to
study the temporal properties of hate. The pipeline mainly
consists of the following three tasks:
1. Generating temporal snapshots: We divide the data such that a snapshot represents the activities of a particular month.
2. Hate intensity calculation: We calculate the hate intensity score for each user, which represents the hateful activity of a user based on his/her posts, reposts, and network connections.
3. User profiling: We profile each user based on his/her temporal activity of hate speech, which is represented by a vector of his/her hate intensity scores.
Figure 1 shows our overall data processing pipeline.
5 Generating Temporal Snapshots
In order to study the temporal nature of hate speech, we need
a temporal sequence of posts, reposts, users being followed,
and users following the account. Thus, for each snapshot, we
should have a list of the new posts, reposts, followers, and
following of each user. This would allow us to have a better
picture of the user stance/opinion in each snapshot. The Gab
dataset gives us the information regarding the post creation
date, but it does not provide any information about when a
particular user started following another user. Using various
data points we have, we come up with a technique in the
following section to approximate the month in which a user
started following another user.
New followers in each snapshot
While the post creation date is available in the dataset, the
Gab API does not provide us with the information regard-
ing when a particular user started following another user.
Hence, we apply a heuristic by Meeder et al. (Meeder et al. 2011), which was used in previous works (Lang and Wu 2011; Antonakaki, Ioannidis, and Fragopoulou 2015), to get a lower bound on the following link creation date. The heuristic is based on the fact that the API returns the list of followers/friends of a user ordered by the link creation time. We can thus obtain a lower bound on the follow time using the account creation dates of the followers. For instance, if a user U_A is followed by users {U_0, U_1, ..., U_n} (in this order through time) and the users joined Gab on dates {D_0, D_1, ..., D_n}, then we know for certain that U_1 was not following U_A before max(D_0, D_1). We applied this heuristic on our dataset and ordered all of the following relationships accordingly. The authors (Meeder et al. 2011) showed that this heuristic is quite accurate (within several minutes), especially for time periods with high follow rates. Since in our case we have considered a much larger window of one month, it provides a fairly accurate estimate of the link creation time.
Hence, the above heuristic helps us get the list of followers/friends for each month for a particular user. This information, combined with the creation dates of his/her posts, allows us to construct a temporal snapshot of his/her activity each month.
Dynamic graph generation
We consider the Gab graph G as a dynamic graph with no parallel edges. We represent the dynamic graph G as a set of successive time step graphs {G_0, ..., G_tmax}, where G_s = (V_s, E_s) denotes the graph at snapshot s, whose node set V_s is the union of all nodes seen up to snapshot s (∪_{i=0}^{s} V_i) and whose edge set E_s is the union of all edges seen up to snapshot s (∪_{i=0}^{s} E_i). In this paper, we consider the time duration between each successive snapshot to be one month. An example of this dynamic graph is provided in figure 2.
Each snapshot G_s is a weighted directed graph with the users as the nodes and the edges representing the following relationship. The edge weight is calculated based on the user's posting and reposting activity. We shall explain the exact mechanism of calculation of this weight in the following section.
6 Hate Intensity calculation
We make use of the temporal snapshots to calculate the hate
intensity of a user. The notion of hate intensity allows us to
capture the overall hatefulness of a user. A user with a high
value of hate intensity would be considered to be a potential
hateful user as compared to another with lower value. The
hate intensity value ranges from 0 to 1, with 1 representing a highly hateful user and values close to zero representing a non-hateful user.

Figure 2: An example dynamic graph. The nodes represent user accounts and the edges represent the ‘follows’ relationship.
We use the DeGroot model (DeGroot 1974; Golub and Jackson 2010; Ribeiro et al. 2018; Mathew et al. 2018a) to calculate the hate intensity of a user at each snapshot. Similar to Mathew et al. (Mathew et al. 2018a), our purpose of using the
DeGroot model is to capture users who did not use these hate
keywords explicitly, yet have a high potential to spread hate.
We later perform manual evaluation to ensure the quality of
the model.
DeGroot model
In the DeGroot opinion dynamics model (DeGroot 1974),
each individual has a fixed set of neighbours, and the local
interaction is captured by taking the convex combination of
his/her own opinion and the opinions of his/her neighbours
at each time step (Xu, Liu, and Ba
s¸
ar 2015). The DeGroot
model describes how each user repeatedly updates its opinion
to the average of those of its neighbours. Since this model
reflects the fundamental human cognitive capability of tak-
ing convex combinations when integrating related informa-
tion (Anderson 1981), it has been studied extensively in the
past decades (Chamley, Scaglione, and Li 2013). We will
now briefly explain the DeGroot model and how we modify
it to calculate the hate intensity of a user account.
In the DeGroot model, each user starts with an initial belief.
In each time step, the user interacts with its neighbours and
updates his/her belief based on the neighbour’s beliefs. The
readers should remember that each snapshot is a directed
graph G_s = (V_s, E_s), with V_s representing the set of vertices and E_s representing the set of edges at snapshot s. Let N(i) denote the set of neighbours of node i and z_i(t) denote the belief of node i at iteration t. The update rule in this model is the following:

$$z_i(t+1) = \frac{w_{ii}\, z_i(t) + \sum_{j \in N(i)} w_{ij}\, z_j(t)}{w_{ii} + \sum_{j \in N(i)} w_{ij}}$$

where (i, j) ∈ E_s.
For each snapshot, we assign the initial edge weights based
on the following criteria:
$$w_{ij} = \begin{cases} e^{R_{ij}} & \text{if } F_{ij} = 1 \\ e^{R_{ij}} & \text{if } F_{ij} = 0 \text{ and } R_{ij} > 0 \\ 0 & \text{if } F_{ij} = 0 \text{ and } R_{ij} = 0 \\ 1 + P_i & \text{if } i = j \end{cases} \qquad (1)$$

where R_ij denotes the number of reposts done by user i where the original post was made by user j. F_ij represents the following relationship, where F_ij = 1 means that user i is following user j, and F_ij = 0 means that user i is not following user j. Similarly, P_i denotes the number of posts by user i.
We then run the DeGroot model on each snapshot graph for 5 iterations, similar to Mathew et al. (Mathew et al. 2018a), to obtain the hate score for each of the users.
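A minimal sketch of equation (1), assuming the exponential reading e^{R_ij} of the repost term; the `follows`, `reposts`, and `posts` structures are hypothetical stand-ins.

```python
# Sketch of the snapshot edge weights in equation (1), under the assumption
# that the exponent form e^{R_ij} is intended.
import math

def edge_weight(i, j, follows, reposts, posts):
    """follows: set of (i, j) pairs meaning i follows j.
    reposts[(i, j)]: number of times i reposted j's posts.
    posts[i]: number of posts authored by i."""
    if i == j:
        return 1 + posts.get(i, 0)          # self-loop: 1 + P_i
    r_ij = reposts.get((i, j), 0)
    if (i, j) in follows:
        return math.exp(r_ij)               # following: e^{R_ij}
    if r_ij > 0:
        return math.exp(r_ij)               # reposts without following
    return 0.0                              # no interaction at all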
Hate lexicon
We initially started with the lexicon set available in Mathew et al. (Mathew et al. 2018a). These high-precision keywords were selected from Hatebase5 and Urban Dictionary6. To further enhance the quality of the lexicon, we adopt the word embedding method skip-gram (Mikolov et al. 2013) to learn distributed representations of the words in our Gab dataset in an unsupervised manner. This allows us to enhance the hate lexicon with words that are specific to the dataset as well as spelling variations used by the Gab users. For example, we found more than five variants of the derogatory term ni**er used by hateful users in the dataset. We manually went through the words and carefully selected only those words which could be used in a hateful context. This resulted in a final set of 187 phrases which we have made public7 for the use of future researchers. In figure 3, we plot the percentage of posts that contain at least one of the words from this hate lexicon. We can observe from these initial results that the volume of hateful posts on Gab is increasing over time. Further, in order to establish the quality of this lexicon, we collected three posts randomly for each of the words in the hate lexicon. Two of the authors independently annotated these posts for the presence of hate speech, which yielded 88.5% agreement where both annotators found the posts to be hateful. This value indicates that the lexicon developed is of high quality. The annotators were instructed to follow the definition of hate speech used in ElSherief et al. (ElSherief et al. 2018).
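A possible way to realize this lexicon expansion with gensim's skip-gram implementation is sketched below; the `gab_posts` corpus and `seed_lexicon` are placeholders, and the retrieved neighbours would still need the manual screening described above.

```python
# Sketch: expand the hate lexicon with the nearest neighbours of each seed
# word in a skip-gram embedding space trained on the (placeholder) corpus.
from gensim.models import Word2Vec

gab_posts = [["some", "tokenized", "gab", "post"], ["another", "post"]]
seed_lexicon = ["some", "post"]  # placeholder seed words

# sg=1 selects the skip-gram architecture (Mikolov et al. 2013).
model = Word2Vec(sentences=gab_posts, vector_size=100, window=5,
                 min_count=1, sg=1, epochs=10)

candidates = set()
for word in seed_lexicon:
    if word in model.wv:
        for neighbour, _score in model.wv.most_similar(word, topn=10):
            candidates.add(neighbour)

print(sorted(candidates))  # to be manually screened before adding
```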
Calculating the hate score
Using the high precision hate lexicon directly to assign a hate score to a user would be problematic for two reasons: first, we might miss out on a large set of users who do not use any of the words in the hate lexicon directly or use spelling variations, thereby getting a much lower score. Second, many of the users share hateful messages via images, videos and external links. Using the hate lexicon for these users will not work. Instead, we use a variant of the
5https://hatebase.org
6https://www.urbandictionary.com
7https://www.dropbox.com/sh/spidpraeln0qrtj/AACyFRPAWURXT05dbHwH9-Kta?dl=0
Figure 3: The percentage of posts over time that have at least
one of the hate words. The red line shows the increasing trend
of posting such messages on Gab.
methodology used in Ribeiro et al. (Ribeiro et al. 2018) to assign each user in each snapshot a value in the range [0, 1] which indicates the user's propensity to be hateful.
We enumerate the steps of our methodology below. We
apply this procedure for each snapshot to get the hate score
for each user.
• We identify the initial set of potential hateful users as those who have used the words from the hate lexicon in at least two posts. The rest of the users are identified as non-hateful users.
• Using the snapshot graph, we assign the edge weights according to equation 1. We convert this graph into a belief graph by reversing the edges in the original graph and normalizing the edge weights between 0 and 1.
• We then run a diffusion process based on DeGroot's learning model on the belief network. We assign an initial belief value of 1 to the set of potential hateful users identified earlier and 0 to all the other users.
• We observe the belief values of all the users in the network after five iterations of the diffusion process.
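A compact sketch of this diffusion procedure (ours, under the assumptions noted in the comments) is given below.

```python
# Sketch of the belief-diffusion step described above (DeGroot updates on
# the reversed, weight-normalized snapshot graph). `snapshot`, `seeds`,
# and `self_weight` are hypothetical inputs, not the authors' code.
import networkx as nx

def hate_scores(snapshot, seeds, self_weight=None, iterations=5):
    """snapshot: nx.DiGraph whose edges carry a normalized 'weight'.
    seeds: users flagged by the hate lexicon (initial belief 1).
    self_weight: optional dict for w_ii (e.g. 1 + P_i); defaults to 1."""
    belief_graph = snapshot.reverse(copy=True)      # reverse the edges
    beliefs = {u: (1.0 if u in seeds else 0.0) for u in belief_graph}
    w_self = self_weight or {}

    for _ in range(iterations):                     # five iterations used
        updated = {}
        for i in belief_graph:
            wii = w_self.get(i, 1.0)
            num, den = wii * beliefs[i], wii
            for j in belief_graph.successors(i):    # N(i): edges (i, j)
                w = belief_graph[i][j].get("weight", 0.0)
                num += w * beliefs[j]
                den += w
            updated[i] = num / den
        beliefs = updated
    return beliefs                                  # values in [0, 1]
```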
Threshold selection
The DeGroot model will assign each user a hate score in the range [0, 1], with 0 implying the least hateful and 1 implying highly hateful. In order to draw the boundary between the hateful and non-hateful users, we need a threshold value above which we might be able to call a user hateful. The same argument goes for the non-hateful users as well: a threshold value below which the user can be considered to be non-hateful.
In order to select such threshold values, we used k-means (MacQueen and others 1967; Jain, Murty, and Flynn 1999) as a clustering algorithm on the scalar values of the hate score. Briefly, k-means selects k points in space as the initial guess of the k centroids. Remaining points are then allocated to the nearest centroid. The whole procedure is repeated until no points switch cluster assignment or a fixed number of iterations is performed. In our case, we set k = 3, which gives us three regions in the range [0, 1] represented by three centroids C_L, C_M, and C_H, denoting ‘low hate’, ‘medium hate’ and ‘high hate’, respectively. The purpose of having the medium hate category is to capture the ambiguous users. These are the users whose values are neither high enough to be considered hateful nor low enough to be considered non-hateful. We apply the k-means algorithm on the list of hate scores from all the snapshots. Figure 5 shows the fraction of users in each category of hate in each snapshot. The DeGroot model is biased toward non-hate users as, in every snapshot, a substantial fraction of users is initially assigned a value of zero. As shown in figure 4, the centroid values are 0.0421 (C_L), 0.2111 (C_M), and 0.5778 (C_H) for the low, mid, and high hate score users, respectively.

Figure 4: The distribution of hate scores and the centroid values based on the k-means algorithm.

Figure 5: The number of accounts that are labelled as low, mid, and high hate in each of the snapshots.
Figure 6: The hate vector consisting of a sequence of low (L), mid (M), and high (H) hate. (a) An example of a hateful user. (b) An example of a non-hateful user.
7 User profiling
Using the centroid values (C_L, C_M, and C_H), we transform the activities of a user into a sequence of low, medium, and high hate over time. We denote this sequence by a vector V_hate. Each entry in V_hate consists of one of the three values of low, mid, and high hate. This allows us to find the changes in the perspective of a user at multiple time points.
Consider the example given in figure 6a. The vector represents a user who had a high hate score for most of the time period with intermittent low and medium hate scores. Similarly, figure 6b shows a user who had a low hate score for most of the time period. For the purpose of this study, we mainly focus on only two types of user profiles: consistently hateful users and consistently non-hateful users.
We would like to point out here that other types of variations are also possible; for example, a user's hate score might change from one category to another multiple times, but we have not considered such cases here.
In order to find these users, we adopt a conservative approach and categorize the users based on the following criteria:
• Hateful: We call a user hateful if at least 75% of his/her V_hate entries contain an ‘H’.
• Non-hateful: We call a user non-hateful if at least 75% of his/her V_hate entries contain an ‘L’.
In addition, we used the following filters on the user accounts as well:
• The user should have posted at least five times.
• The account should have been created before February 2018 so that there are at least six snapshots available for the user.
We have not considered users with hate scores in the mid-region as they are ambiguous. After the filtering, the number of users in the two different categories is noted in Table 2. In the following section, we will perform textual and network analysis on these types of users and try to characterize them.
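The profiling rule can be sketched as follows; the centroid values are taken from the paper, while the helper names and example scores are illustrative only.

```python
# Sketch: map a user's monthly hate scores to an L/M/H hate vector via the
# k-means centroids, then apply the 75% rule and the posting filter.

def to_hate_vector(monthly_scores, centroids):
    """Assign each monthly score to the nearest centroid label."""
    labels = ("L", "M", "H")                 # ordered low, mid, high
    return [min(zip(centroids, labels), key=lambda c: abs(s - c[0]))[1]
            for s in monthly_scores]

def categorize(hate_vector, num_posts, min_posts=5, threshold=0.75):
    if num_posts < min_posts or not hate_vector:
        return "excluded"
    if hate_vector.count("H") / len(hate_vector) >= threshold:
        return "hateful"
    if hate_vector.count("L") / len(hate_vector) >= threshold:
        return "non-hateful"
    return "ambiguous"

centroids = (0.0421, 0.2111, 0.5778)         # C_L, C_M, C_H from the paper
vec = to_hate_vector([0.02, 0.61, 0.55, 0.58], centroids)
print(vec, categorize(vec, num_posts=12))
```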
Sampling the appropriate set of non-hateful users
We use the non-hateful users as the candidates in the control
group. Our idea of the control group is to find non-hate users
with similar characteristics as the hateful users. For sanity
check purposes, we identify users who have (nearly) the same activity rate as the users in the hate set.

Type          #users
Hateful       1,019
Non-hateful   19,814
Table 2: Number of user accounts in each type of hate.

We define the activity
rate of a user as the sum of the number of posts and reposts
done by the user, divided by the account age as of June
2018. For each hateful user, we identify a user from the
non-hateful set with the nearest activity rate. We repeat this
process for all the users in the hate list. We then performed
a Mann-Whitney U-test (DePuy, Berger, and Zhou 2005) to measure the goodness of the match. We found U = 517,208 and p-value = 0.441. This indicates that the hate and non-hate users have nearly the same distribution of activity rates. By using this subset
of non-hate users, we aim to capture any general trend in
Gab. Our final set consists of 1,019 hateful users and the
corresponding 1,019 non-hateful users who have very similar
activity profiles.
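A sketch of this matching and significance test, on synthetic activity rates rather than the paper's data, might look as follows.

```python
# Sketch of the control-group construction: greedy nearest-activity-rate
# matching followed by a Mann-Whitney U-test. All rates are placeholders.
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)
hate_rates = rng.gamma(2.0, 3.0, size=1_019)
nonhate_rates = rng.gamma(2.0, 3.0, size=19_814)

# For each hateful user, pick the unused non-hateful user with the
# closest activity rate.
available = nonhate_rates.copy()
matched = []
for rate in hate_rates:
    idx = np.argmin(np.abs(available - rate))
    matched.append(available[idx])
    available = np.delete(available, idx)
matched = np.array(matched)

u_stat, p_val = mannwhitneyu(hate_rates, matched, alternative="two-sided")
print(f"U={u_stat:.0f}, p={p_val:.3f}")   # large p => similar distributions
```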
Evaluation of user profiles
We evaluate the quality of the final dataset of hateful and
non-hateful accounts through human judgment. We ask two
annotators to determine if a given account is hateful or non-
hateful as per their perception. Since Gab does not have any
policy for hate speech, we use the definition of hate speech
provided by ElSherief et al. (ElSherief et al. 2018) for this
task. We provide the annotators with a class balanced random
sample of 200 user accounts.
Each account was evaluated by two independent annota-
tors. After the labelling was complete, we only took those
annotations where both the annotators agreed. This gave us
a final set of 258 user accounts, where 135 accounts were
hateful and 123 accounts were non-hateful. We compared
these ground truth annotations with our model predictions
and found that they were in almost 100% agreement.
8 RQ1: How can we characterize the growth
of hate speech in Gab?
The volume of hate speech is increasing
As a first step to measure the growth of hate speech in Gab,
we use the hate lexicon that we generated to find the number
of posts which contain them in each month. We can observe
from figure 3 that the amount of hate speech in Gab is indeed
increasing.
More new users are becoming hateful
Another important aspect about the growth that we consid-
ered was the fraction of new users who are getting exposed
to hate. In this scenario, we say that a user A has become hateful if his/her hate vector has the entry ‘H’ at least N times within T months from the account creation. In figure 7, we plot, for T = 3 and N = 1, 2, 3 ‘H’ entries, the fraction of users in each month who are becoming hateful.
As we can observe, the fraction of users being exposed to
hate speech is increasing over time.
New users are becoming hateful at a faster rate
In figure 8, we show how much time a user takes to have the first, second and third ‘H’ entry8. We observe that with time, the time required for a user to get his/her first exposure to hate decreases in Gab.

Figure 7: The figure shows the fraction of users in each month who got assigned at least (1, 2, 3) ‘H’ within the first three months of their joining.

Figure 8: The figure shows the amount of time (in months) that each user requires to achieve N ‘H’ entries from his/her month of joining.

8The reason for the initial dip in the plot is that some of the users who have 1 ‘H’ do not further account for 2 ‘H’ and 3 ‘H’ since they were never assigned an ‘H’ after the first one.
9 RQ2: What was the impact of the hateful
users on Gab?
Hate users receive replies much faster
In order to understand the user engagement, we define the ‘first reply time’ (FRT), which measures the time taken to get the first comment/reply to a post by a user.
Figure 9: Alluvial diagram to show the core-transition for the users. A lower core value represents that a node is situated deeper in the network. (a) Hateful user movement. (b) Non-hateful user movement.
We define the ‘first reply time’ (FRT) for a set of users U as $FRT_U = \frac{1}{|U|} \sum_{u \in U} T_u$, where T_u represents the average time taken to get the first reply for the posts written by a user u.
We calculated the FRT values for the sets of hateful and non-hateful users and found that the average time for the first reply is 51.32 minutes for non-hate users, whereas it is 44.38 minutes for the hate users (p-value < 0.001). This indicates that the community is engaging with the hateful users faster than with the non-hateful users.
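A minimal sketch of the FRT computation, with a hypothetical per-user map of first-reply delays:

```python
# Sketch of the FRT metric: average, over a user set U, of each user's
# mean time-to-first-reply. `first_reply_minutes` is a placeholder dict
# mapping a user to the per-post first-reply delays (in minutes).

def frt(user_set, first_reply_minutes):
    per_user = []
    for u in user_set:
        delays = first_reply_minutes.get(u, [])
        if delays:                        # T_u: mean over the user's posts
            per_user.append(sum(delays) / len(delays))
    return sum(per_user) / len(per_user) if per_user else float("nan")

first_reply_minutes = {"u1": [30, 60], "u2": [44], "u3": [50, 55, 48]}
print(frt({"u1", "u2"}, first_reply_minutes))   # FRT for a two-user set
```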
Hateful users: lone wolf or clans
In this section, we study the hateful and non-hateful users from a network-centric perspective by leveraging the user-level dynamic graph. This approach has been shown to be effective in extracting anomalous patterns in microblogging platforms such as Twitter and Weibo (Zhang et al. 2015; Shin, Eliassi-Rad, and Faloutsos 2016). Along similar lines, we conduct a unique experiment where we track the influence of hateful and non-hateful users across successive temporal snapshots.
We utilize the node metric k-core, or coreness, to identify influential users in the network (Shin, Eliassi-Rad, and Faloutsos 2016). Nodes with high coreness are embedded in major information pathways. Hence, they have been shown to be influential spreaders that can diffuse information to a large portion of the network (Malliaros, Rossi, and Vazirgiannis 2016; Kitsak et al. 2010). For further details about coreness and its several applications in functional role identification, we refer to Malliaros et al. (Malliaros et al. 2019). We first calculate the coreness of the undirected follower/followee graph for each temporal snapshot using k-core decomposition (Malliaros, Rossi, and Vazirgiannis 2016). In each snapshot, we subdivide all the nodes into 10 buckets, where consecutive buckets comprise increasingly influential nodes, i.e., from the bottom 10 percentile of nodes to the top 10 percentile of nodes in the network. We calculate the proportion of each category of users in all the proposed buckets across multiple dynamic graphs. We further estimate the proportion of migration between different buckets in consecutive snapshots. We illustrate the results as a flow diagram in figure 9. The innermost core is labeled 0, the next one labeled 1, and so on. The bars that have been annotated with the label NA denote the proportion of users who have eventually been detected to be in a particular category but have not yet entered the network at that time point (i.e., the account is not yet created).
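A sketch of this bucketing step using networkx's k-core decomposition; the random graph and the exact percentile convention are assumptions for illustration.

```python
# Sketch: compute k-core numbers on the undirected snapshot graph and
# assign each node to one of 10 percentile buckets (bucket 0 = most
# influential). Inputs are placeholders.
import networkx as nx
import numpy as np

def coreness_buckets(snapshot, n_buckets=10):
    G = nx.Graph(snapshot)                     # undirected follower graph
    G.remove_edges_from(nx.selfloop_edges(G))  # core_number forbids self-loops
    core = nx.core_number(G)
    nodes = list(core)
    ranks = np.argsort(np.argsort([core[n] for n in nodes]))  # low -> high
    buckets = {}
    for node, r in zip(nodes, ranks):
        percentile = r / max(len(nodes) - 1, 1)
        # Highest-coreness decile gets bucket 0, lowest gets bucket 9.
        buckets[node] = min(int((1.0 - percentile) * n_buckets), n_buckets - 1)
    return buckets

G = nx.gnm_random_graph(200, 800, seed=1, directed=True)
print(sorted(set(coreness_buckets(G).values())))
```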
Position of hateful users: We demonstrate the flow of hateful users in figure 9a. The leftmost bar denotes the entire group strength. The following bars indicate consecutive time points, each showcasing the evolution of the network.
We observe several interesting patterns in figure 9a. In the initial three time points, a large proportion of users are confined to the outer shells of the network. This forms a network-centric validation of the hypothesis that newly joined users tend to familiarize themselves with the norms of the community and do not exert considerable influence (Danescu-Niculescu-Mizil et al. 2013). However, in the final time points we observe that the hateful users rapidly rise in rank and the majority of them assimilate into the inner cores. This trend among Gab users is consistent with other microblogging sites like Twitter (Ribeiro et al. 2018), where hate mongers have been found to have higher eigenvector and betweenness centrality compared to normal accounts. There are also surprising cases where a fraction of users who have just joined the network become part of the inner core very quickly. We believe that this is by virtue of their already knowing a lot of ‘inner core’ Gab users even before they join the platform.
Position of non-hateful users: Focusing on figure 9b, which
illustrates the case of non-hateful users, we see a contrasting
trend. The flow diagram shows that users already in influ-
ential buckets continue to stay there over consecutive time
periods. The increase in core size at a time point can be
mostly attributed to the nodes of the nearby cores in the pre-
vious time point. We also observe that in the final snapshot
of the graph all the cores tend to have a similar number of
nodes. These results are in sharp contrast with those observed
for the hateful users (figure 9a).
Acceleration toward the core: We were also interested in understanding the rate at which the users were accelerating toward the core. To this end, we calculated the time it took for the users to reach bucket 0 from their account creation time. We found that a hateful user takes only 3.36 months to do this, whereas a non-hateful user requires 6.94 months to reach an inner core in the network. We test the significance of this result with the Mann-Whitney U-test and found U = 35,203.5 and p-value = 2.68e-06.

Figure 10: The ratio of hateful users to non-hateful users for each month in a specific core of the network.
To further understand the volume of users transitioning between the cores of the network, we compute the ratio of the hateful to the non-hateful users in a given core for each month. Figure 10 plots the ratio values. A value of 1.0 means that an equal number of hateful and non-hateful users occupy the same core in a particular month. A value less than one means that there were more non-hateful users than hateful users in a particular core. We observe that in the initial time periods (October 2016 - July 2017), the non-hateful users occupied the inner core of the network more. However, after this, the fraction of hateful users in the innermost core started increasing, and around August 2017 the fraction of hateful users surpassed the non-hateful ones. We observe similar trends in all the four innermost cores (0, 1, 2, and 3).
Gab community is increasingly using language
similar to users with high hate scores
Gab started in August 2016 with the intent to become the
‘champion of free speech’. Since its inception, it has attracted
several types of users. As the community evolves, so do the members in the community. To understand the temporal nature of the language of the users and the community, we utilize the framework developed by Danescu-Niculescu-Mizil et al. (Danescu-Niculescu-Mizil et al. 2013). In their work, the authors use
language models to track the linguistic change in communi-
ties.
We use KenLM (Heafield et al. 2013) to generate language
models for each snapshot. These ‘Snapshot Language Mod-
els’ (SLM) are generated for each month, and they capture
the linguistic state of a community at one point in time. The SLMs allow us to capture how close a particular utterance is to a community.

Figure 11: Month-wise entropy of the predictions obtained from the H-SLM and N-SLM when the full data for that month is used as the test set.

The ‘Hate Snapshot Language Model’
(H-SLM) is generated using the posts written by the users
with high hate score in a snapshot as the training data. Simi-
larly, we generate the ‘Non-hate Snapshot Language Model’
(N-SLM), which uses the posts written by users with low
hate score in a snapshot for the training data. Note that un-
like in the previous sections where we were building hate
vectors aggregated over different time snapshots to call a
user hateful/non-hateful, here we consider posts of users with
high/low hate scores for a given snapshot to build the snap-
shot-wise training data for the language models9. For a given snapshot, we use the full data for testing. Using these two models, we test them on all the posts of the month and report the average cross-entropy

$$H(d, SLM_{ct}) = -\frac{1}{|d|} \sum_{b_i \in d} \log SLM_{ct}(b_i)$$

where H(.) represents the cross-entropy and SLM_{ct}(b_i) is the probability assigned to bigram b_i from comment d in community-month ct. Here, the community can be hate (H-SLM) or non-hate (N-SLM)10.
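A sketch of how such snapshot cross-entropies could be computed with the kenlm Python bindings, assuming pre-trained model files (hypothetical paths) and approximating the bigram formula above with KenLM's sentence-level log-probability:

```python
# Sketch of the snapshot cross-entropy measurement, assuming two KenLM
# models have already been trained with lmplz on the hateful / non-hateful
# posts of a given month. File paths are hypothetical.
import kenlm

h_slm = kenlm.Model("hate_2017-05.arpa")      # H-SLM for one snapshot
n_slm = kenlm.Model("nonhate_2017-05.arpa")   # N-SLM for the same snapshot

def cross_entropy(model, post, max_words=30):
    """Average negative log-probability per word of a post under a model.
    Posts are truncated to 30 words to control for length effects."""
    words = post.split()[:max_words]
    text = " ".join(words)
    # kenlm returns the log10 probability of the whole sentence.
    return -model.score(text, bos=True, eos=True) / max(len(words), 1)

post = "example gab post text goes here"
print("H-SLM:", cross_entropy(h_slm, post))
print("N-SLM:", cross_entropy(n_slm, post))
```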
A higher value of cross-entropy indicates that the posts of the month deviate from the respective type of the community (hate/non-hate). We plot the entropy values in figure 11. As can be observed, the whole Gab community seems to be more aligned with the language model built using the posts of the users with high hate scores. A remarkable observation is that from around May 2017, the language used by the Gab community started getting closer to the language of users with high hate scores. This may indicate that the hate community is evolving its own linguistic culture and that the other users on the platform are slowly (and possibly unknowingly) aligning themselves with this culture, thus making it the norm. If left unmoderated, one might find in the near future that there is only one language on this platform – the language of hate.

9It is not possible to extend the hate vector concept here as we are building language models snapshot by snapshot.
10We controlled for the spurious length effect by considering only the initial 30 words (Danescu-Niculescu-Mizil et al. 2013); the same controls are used in the cross-entropy calculations.
10 Discussion
Ethical considerations and implications
The ongoing discussion of whether to include hate speech under the umbrella of free speech is a subject of great significance. A major argument used by the supporters of hate speech is that any form of censorship of speech is a violation of freedom of speech. Our work provides a peek into the hate ecosystem that has developed on a platform which does not have any sort of moderation apart from ‘self-censorship’.
We caution against our work being perceived as a means for, or as support of, full-scale censorship. We simply argue that the ‘free flow’ of hate speech should be stopped. We leave it to the platform or government to implement a system that would reduce such speech in the online space.
Moderation
Although the intent of Gab was to provide support for free speech, it is acting as a fertile ground for fringe communities such as the alt-right, neo-Nazis, etc. Freedom of speech which harms others is no longer considered freedom. While banning of users and/or comments is not democratic, platforms/governments would still need to curb such hateful content for the proper functioning of the ‘online’ society. Recently, some works have started looking into alternatives to the banning approach. One of the contenders for this is counterspeech (Benesch 2014; Mathew et al. 2018c; Mathew et al. 2018b). The basic idea is that instead of banning hate speech, we should use crowdsourced responses to reply to these messages. The main advantage of such an approach is that it does not violate the freedom of speech. However, there are some doubts about how applicable/practical this approach is. Large scale studies would need to be done to observe the benefits and costs of such an approach. Understanding the effects of such moderation is an area of future work.
We suggest that social media platforms could start incentive programs for counterspeech. They could also provide an interface for group moderators to identify hateful activities and take precautionary steps. This would allow platforms to stop the spread of hateful messages at an early stage. The effect of hate speech from influential users is also much greater than that from others, and thus targeted campaigns are required to overcome such adverse effects.
Monitoring the growth of hate speech
The platform should have an interface which allows moderators to monitor the growth of hate speech in the community. This could be a crowdsourced effort which could help identify users who are attempting to spread hate speech.
As we have seen, the new users are gravitating toward the hateful community at a faster rate and in larger numbers, so there is a need for methods that could detect and prevent such movement. There could exist radicalization pipelines (Ribeiro et al. 2019) which navigate a user toward hateful content. Platforms should make sure that their user feed and recommendation algorithms are free from such issues. Exposure to such content could also lead to desensitization toward the victim community (Soral, Bilewicz, and Winiewski 2018). We would need methods which take the user network into consideration as well. Instead of waiting for a user to post his/her hateful post after the indoctrination, the platforms will need to be proactive instead of reactive. Some simple methods such as nudges (Thaler and Sunstein 2009), or changing the user feed to reduce polarization (Celis et al. 2019), could be an initial step. Further research is required in this area to study these points more carefully.
Platform governance – the rising role of HCI
All the points that we have discussed above related to moderation and monitoring can be aptly summarized as initiatives toward platform governance. We believe that within this initiative the HCI design principles of the social media platforms need to be completely overhauled. In February 2019, the United Kingdom's Digital, Culture, Media and Sport (DCMS) committee issued a verdict that social media platforms can no longer hide behind the claim that they are merely a ‘platform’ and therefore have no responsibility for regulating the content of their sites11. In fact, the European Union now has the ‘EU Code of Conduct on Terror and Hate Content’ (CoT) that applies to the entire EU region. Despite the increase in toxic content and harassment, Twitter did not have a policy of its own to mitigate these issues until the company created a new organisation – the ‘Twitter Trust and Safety Council’ – in 2015. A common way deployed by the EU to combat such online hate content involves the creation of working groups that combine voices from different avenues including academia, industry and civil society. For instance, in January 2018, 39 experts met to frame the ‘Code of Practice on Online Disinformation’, which was signed by tech giants like Facebook, Google, etc. We believe that HCI practitioners have a lead role to play in such committees and any code of conduct cannot materialize unless the HCI design policies of these platforms are reexamined from scratch.

11https://policyreview.info/articles/analysis/platform-governance-triangle-conceptualising-informal-regulation-online-content
11 Limitations and future works
There are several limitations to our work. We are well aware that studies conducted on only one social media platform such as Gab have certain limitations and drawbacks. In particular, since other social media sites delete/suspend hateful posts and/or users, it becomes hard to conduct similar studies on those platforms. The initial keywords selected for the hate users were in English. This would bias the initial belief value assignment, as users who use non-English hate speech would not be detected directly. However, since these users follow similar hate users and repost several of their posts, we would still detect many of them. We plan to take up the multilingual aspect as an immediate future work. Another major limitation of our work is its high-precision focus, which would leave out several users who could have been hateful.
As part of the future work, we also plan to use the discourse structure of these hateful users for a better understanding of the tactics used by these users in spreading hate speech (Phadke et al. 2018). This would allow us to break down the hate speech discourse into multiple components and study them in detail.
12 Conclusion
In this paper, we perform the first temporal analysis of hate speech in online social media. Using an extensive dataset of 21M posts by 314K users on Gab, we divide the dataset into multiple snapshots and assign a hate score to each user at every snapshot. We then check for variations in the hate score of the users. We characterize these account types on the basis of text and network structure. We observe that a large fraction of hateful users occupy the core of the Gab network and that they reach the core at a much faster rate as compared to non-hateful users. The language of the hate users seems to pervade the whole network, and even benign users unknowingly start to speak the language of hate. Our work would be extremely useful to platform designers to detect hateful users at an early stage and introduce appropriate measures to change the users' stance.
References
[Agarwal and Sureka 2015]
Agarwal, S., and Sureka, A.
2015. Using knn and svm based one-class classifier for
detecting online radicalization on twitter. In International
Conference on Distributed Computing and Internet Technol-
ogy, 431–442. Springer.
[Anderson 1981]
Anderson, N. H. 1981. Foundations of
information integration theory.
[Antonakaki, Ioannidis, and Fragopoulou 2015]
Antonakaki,
D.; Ioannidis, S.; and Fragopoulou, P. 2015. Evolving
twitter: an experimental analysis of graph properties of the
social graph. arXiv preprint arXiv:1510.01091.
[Badjatiya et al. 2017]
Badjatiya, P.; Gupta, S.; Gupta, M.;
and Varma, V. 2017. Deep learning for hate speech detection
in tweets. WWW, 759–760.
[Benesch et al. 2016]
Benesch, S.; Ruths, D.; Dillon, K. P.;
Saleem, H. M.; and Wright, L. 2016. Counterspeech on
twitter: A field study. Dangerous Speech Project. Available
at: https://dangerousspeech.org/counterspeech-on-twitter-a-
field-study.
[Benesch 2014]
Benesch, S. 2014. Countering dangerous
speech: new ideas for genocide prevention. Washington, DC:
US Holocaust Memorial Museum.
[Burnap and Williams 2016]
Burnap, P., and Williams, M. L.
2016. Us and them: identifying cyber hate on twitter across
multiple protected characteristics. EPJ Data Science 5(1):11.
[Celis et al. 2019]
Celis, L. E.; Kapoor, S.; Salehi, F.; and
Vishnoi, N. 2019. Controlling polarization in personal-
ization: An algorithmic framework. In Proceedings of the
Conference on Fairness, Accountability, and Transparency,
160–169. ACM.
[Chamley, Scaglione, and Li 2013]
Chamley, C.; Scaglione,
A.; and Li, L. 2013. Models for the diffusion of beliefs
in social networks: An overview. IEEE Signal Processing
Magazine 30(3):16–29.
[Chandrasekharan et al. 2017a]
Chandrasekharan, E.;
Pavalanathan, U.; Srinivasan, A.; Glynn, A.; Eisenstein, J.;
and Gilbert, E. 2017a. You can’t stay here: The efficacy of
reddit’s 2015 ban examined through hate speech. PACMHCI
1:31:1–31:22.
[Chandrasekharan et al. 2017b]
Chandrasekharan, E.;
Samory, M.; Srinivasan, A.; and Gilbert, E. 2017b. The
bag of communities: Identifying abusive behavior online
with preexisting internet data. In Proceedings of the 2017
CHI Conference on Human Factors in Computing Systems,
3175–3187. ACM.
[Chau and Xu 2007]
Chau, M., and Xu, J. 2007. Mining
communities and their relationships in blogs: A study of on-
line hate groups. International Journal of Human-Computer
Studies 65(1):57–70.
[Chen 2011]
Chen, Y. 2011. Detecting offensive language
in social medias for protection of adolescent online safety.
Ph.D. Dissertation. The Pennsylvania State University.
[Danescu-Niculescu-Mizil et al. 2013]
Danescu-Niculescu-
Mizil, C.; West, R.; Jurafsky, D.; Leskovec, J.; and Potts,
C. 2013. No country for old members: User lifecycle and
linguistic change in online communities. In Proceedings
of the 22nd international conference on World Wide Web,
307–318. ACM.
[Davidson et al. 2017]
Davidson, T.; Warmsley, D.; Macy,
M.; and Weber, I. 2017. Automated hate speech detection and
the problem of offensive language. In Eleventh International
AAAI Conference on Web and Social Media.
[de Gibert et al. 2018]
de Gibert, O.; Perez, N.; Pablos, A. G.;
and Cuadros, M. 2018. Hate speech dataset from a white
supremacy forum. In Proceedings of the 2nd Workshop on
Abusive Language Online (ALW2), 11–20.
[DeGroot 1974]
DeGroot, M. H. 1974. Reaching a consensus.
Journal of the American Statistical Association 69(345):118–
121.
[DePuy, Berger, and Zhou 2005]
DePuy, V.; Berger, V. W.;
and Zhou, Y. 2005. W ilcoxon–m ann–w hitney test. Ency-
clopedia of statistics in behavioral science.
[Djuric et al. 2015]
Djuric, N.; Zhou, J.; Morris, R.; Grbovic,
M.; Radosavljevic, V.; and Bhamidipati, N. 2015. Hate
speech detection with comment embeddings. In Proceedings
of the 24th international conference on world wide web, 29–
30. ACM.
[ElSherief et al. 2018]
ElSherief, M.; Kulkarni, V.; Nguyen,
D.; Wang, W. Y.; and Belding, E. 2018. Hate lingo: A target-
based linguistic analysis of hate speech in social media. In
Twelfth International AAAI Conference on Web and Social
Media.
[Fasoli, Maass, and Carnaghi 2015]
Fasoli, F.; Maass, A.;
and Carnaghi, A. 2015. Labelling and discrimination: Do ho-
mophobic epithets undermine fair distribution of resources?
British Journal of Social Psychology 54(2):383–393.
[Fortuna and Nunes 2018]
Fortuna, P., and Nunes, S. 2018.
A survey on automatic detection of hate speech in text. ACM
Computing Surveys (CSUR) 51(4):85.
[Founta et al. 2018]
Founta, A. M.; Djouvas, C.; Chatzakou,
D.; Leontiadis, I.; Blackburn, J.; Stringhini, G.; Vakali, A.;
Sirivianos, M.; and Kourtellis, N. 2018. Large scale crowd-
sourcing and characterization of twitter abusive behavior. In
Twelfth International AAAI Conference on Web and Social
Media.
[Gagliardone et al. 2015]
Gagliardone, I.; Gal, D.; Alves, T.;
and Martinez, G. 2015. Countering online hate speech.
Unesco Publishing.
[Gao and Huang 2017]
Gao, L., and Huang, R. 2017. Detect-
ing online hate speech using context aware models. arXiv
preprint arXiv:1710.07395.
[Gibson 2017]
Gibson, A. 2017. Safe spaces & free speech:
Effects of moderation policy on structures of online forum
discussions. In Proceedings of the 50th Hawaii International
Conference on System Sciences.
[Gitari et al. 2015]
Gitari, N. D.; Zuping, Z.; Damien, H.; and
Long, J. 2015. A lexicon-based approach for hate speech de-
tection. International Journal of Multimedia and Ubiquitous
Engineering 10(4):215–230.
[Golub and Jackson 2010]
Golub, B., and Jackson, M. O.
2010. Naive learning in social networks and the wisdom
of crowds. American Economic Journal: Microeconomics
2(1):112–49.
[Greenberg and Pyszczynski 1985]
Greenberg, J., and
Pyszczynski, T. 1985. The effect of an overheard ethnic slur
on evaluations of the target: How to spread a social disease.
Journal of Experimental Social Psychology 21(1):61–72.
[Gröndahl et al. 2018]
Gröndahl, T.; Pajola, L.; Juuti, M.; Conti, M.; and Asokan, N. 2018. All you need is "love": Evading hate-speech detection. arXiv preprint arXiv:1808.09115.
[Guermazi, Hammami, and Hamadou 2007]
Guermazi, R.;
Hammami, M.; and Hamadou, A. B. 2007. Using a semi-
automatic keyword dictionary for improving violent web
site filtering. In 2007 Third International IEEE Conference
on Signal-Image Technologies and Internet-Based System,
337–344. IEEE.
[Gunasekara and Nejadgholi 2018]
Gunasekara, I., and Ne-
jadgholi, I. 2018. A review of standard text classification
practices for multi-label toxicity identification of online con-
tent. In Proceedings of the 2nd Workshop on Abusive Lan-
guage Online (ALW2), 21–25.
[Heafield et al. 2013]
Heafield, K.; Pouzyrevsky, I.; Clark,
J. H.; and Koehn, P. 2013. Scalable modified Kneser-Ney lan-
guage model estimation. In Proceedings of the 51st Annual
Meeting of the Association for Computational Linguistics,
690–696.
[Jain, Murty, and Flynn 1999]
Jain, A. K.; Murty, M. N.; and
Flynn, P. J. 1999. Data clustering: a review. ACM computing
surveys (CSUR) 31(3):264–323.
[Kitsak et al. 2010]
Kitsak, M.; Gallos, L. K.; Havlin, S.; Lil-
jeros, F.; Muchnik, L.; Stanley, H. E.; and Makse, H. A. 2010.
Identification of influential spreaders in complex networks.
Nature physics 6(11):888.
[Lang and Wu 2011]
Lang, J., and Wu, S. F. 2011. Anti-preferential attachment: If I follow you, will you follow me? In Privacy, Security, Risk and Trust (PASSAT) and 2011 IEEE Third International Conference on Social Computing (SocialCom), 2011 IEEE Third International Conference on, 339–346. IEEE.
[Liu and Forss 2015]
Liu, S., and Forss, T. 2015. New classi-
fication models for detecting hate and violence web content.
In 2015 7th International Joint Conference on Knowledge
Discovery, Knowledge Engineering and Knowledge Manage-
ment (IC3K), volume 1, 487–495. IEEE.
[MacQueen and others 1967]
MacQueen, J., et al. 1967.
Some methods for classification and analysis of multivariate
observations. In Proceedings of the fifth Berkeley sympo-
sium on mathematical statistics and probability, volume 1,
281–297. Oakland, CA, USA.
[Maity et al. 2018]
Maity, S. K.; Chakraborty, A.; Goyal, P.;
and Mukherjee, A. 2018. Opinion conflicts: An effective
route to detect incivility in twitter. Proceedings of the ACM
on Human-Computer Interaction 2(CSCW):117.
[Malliaros et al. 2019]
Malliaros, F.; Giatsidis, C.; Pa-
padopoulos, A.; and Vazirgiannis, M. 2019. The core decom-
position of networks: Theory, algorithms and applications.
[Malliaros, Rossi, and Vazirgiannis 2016]
Malliaros, F. D.;
Rossi, M.-E. G.; and Vazirgiannis, M. 2016. Locating influ-
ential nodes in complex networks. Scientific reports 6:19307.
[Malmasi and Zampieri 2018]
Malmasi, S., and Zampieri, M.
2018. Challenges in discriminating profanity from hate
speech. Journal of Experimental & Theoretical Artificial
Intelligence 30(2):187–202.
[Mathew et al. 2018a]
Mathew, B.; Dutt, R.; Goyal, P.; and
Mukherjee, A. 2018a. Spread of hate speech in online social
media. arXiv preprint arXiv:1812.01693.
[Mathew et al. 2018b]
Mathew, B.; Kumar, N.; Goyal, P.;
Mukherjee, A.; et al. 2018b. Analyzing the hate and counter
speech accounts on twitter. arXiv preprint arXiv:1812.02712.
[Mathew et al. 2018c]
Mathew, B.; Tharad, H.; Rajgaria, S.;
Singhania, P.; Maity, S. K.; Goyal, P.; and Mukherjee, A.
2018c. Thou shalt not hate: Countering online hate speech.
arXiv preprint arXiv:1808.04409.
[McNamee, Peterson, and Peña 2010]
McNamee, L. G.; Peterson, B. L.; and Peña, J. 2010. A call to educate, participate, invoke and indict: Understanding the communication of online hate groups. Communication Monographs 77(2):257–280.
[Meeder et al. 2011]
Meeder, B.; Karrer, B.; Sayedi, A.; Ravi,
R.; Borgs, C.; and Chayes, J. 2011. We know who you
followed last summer: inferring social link creation times in
twitter. In Proceedings of the 20th international conference
on World wide web, 517–526. ACM.
[Mikolov et al. 2013]
Mikolov, T.; Sutskever, I.; Chen, K.;
Corrado, G. S.; and Dean, J. 2013. Distributed representa-
tions of words and phrases and their compositionality. In
Advances in neural information processing systems, 3111–
3119.
[Mondal, Silva, and Benevenuto 2017]
Mondal, M.; Silva,
L. A.; and Benevenuto, F. 2017. A measurement study of
hate speech in social media. In Proceedings of the 28th ACM
Conference on Hypertext and Social Media, 85–94. ACM.
[Mullen and Rice 2003]
Mullen, B., and Rice, D. R. 2003.
Ethnophaulisms and exclusion: The behavioral consequences
of cognitive representation of ethnic immigrant groups. Per-
sonality and Social Psychology Bulletin 29(8):1056–1067.
[Mullen and Smyth 2004]
Mullen, B., and Smyth, J. M. 2004.
Immigrant suicide rates as a function of ethnophaulisms: Hate
speech predicts death. Psychosomatic Medicine 66(3):343–
348.
[Nobata et al. 2016]
Nobata, C.; Tetreault, J.; Thomas, A.;
Mehdad, Y.; and Chang, Y. 2016. Abusive language de-
tection in online user content. In Proceedings of the 25th
international conference on world wide web, 145–153. Inter-
national World Wide Web Conferences Steering Committee.
[Olteanu et al. 2018]
Olteanu, A.; Castillo, C.; Boy, J.; and
Varshney, K. R. 2018. The effect of extremist violence on
hateful speech online. In ICWSM.
[Phadke et al. 2018]
Phadke, S.; Lloyd, J.; Hawdon, J.;
Samory, M.; and Mitra, T. 2018. Framing hate with hate
frames: Designing the codebook. In Companion of the 2018
ACM Conference on Computer Supported Cooperative Work
and Social Computing, 201–204. ACM.
[Qian et al. 2018]
Qian, J.; ElSherief, M.; Belding, E.; and
Wang, W. Y. 2018. Leveraging intra-user and inter-user
representation learning for automated hate speech detection.
In NAACL, volume 2, 118–123.
[Ribeiro et al. 2018]
Ribeiro, M. H.; Calais, P. H.; Santos,
Y. A.; Almeida, V. A.; and Meira Jr, W. 2018. Charac-
terizing and detecting hateful users on twitter. In Twelfth
International AAAI Conference on Web and Social Media.
[Ribeiro et al. 2019]
Ribeiro, M. H.; Ottoni, R.; West, R.;
Almeida, V. A.; and Meira, W. 2019. Auditing radicalization
pathways on youtube. arXiv preprint arXiv:1908.08313.
[Saleem et al. 2017]
Saleem, H. M.; Dillon, K. P.; Benesch,
S.; and Ruths, D. 2017. A web of hate: Tackling
hateful speech in online social spaces. arXiv preprint
arXiv:1709.10159.
[Salminen et al. 2018]
Salminen, J.; Almerekhi, H.;
Milenković, M.; Jung, S.-g.; An, J.; Kwak, H.; and Jansen,
B. J. 2018. Anatomy of online hate: developing a taxonomy
and machine learning models for identifying and classifying
hate in online news media. In Twelfth International AAAI
Conference on Web and Social Media.
[Schmidt and Wiegand 2017]
Schmidt, A., and Wiegand, M.
2017. A survey on hate speech detection using natural lan-
guage processing. In Proceedings of the Fifth International
Workshop on Natural Language Processing for Social Media,
1–10.
[Shin, Eliassi-Rad, and Faloutsos 2016]
Shin, K.; Eliassi-
Rad, T.; and Faloutsos, C. 2016. CoreScope: Graph mining using k-core analysis - patterns, anomalies and algorithms. In
2016 IEEE 16th International Conference on Data Mining
(ICDM), 469–478. IEEE.
[Silva et al. 2016]
Silva, L.; Mondal, M.; Correa, D.; Ben-
evenuto, F.; and Weber, I. 2016. Analyzing the targets of
hate in online social media. In Tenth International AAAI
Conference on Web and Social Media.
[Sood, Antin, and Churchill 2012]
Sood, S.; Antin, J.; and
Churchill, E. 2012. Profanity use in online communities. In
Proceedings of the SIGCHI Conference on Human Factors
in Computing Systems, 1481–1490. ACM.
[Soral, Bilewicz, and Winiewski 2018]
Soral, W.; Bilewicz,
M.; and Winiewski, M. 2018. Exposure to hate speech
increases prejudice through desensitization. Aggressive be-
havior 44(2):136–146.
[Srivastava, Khurana, and Tewari 2018]
Srivastava, S.; Khu-
rana, P.; and Tewari, V. 2018. Identifying aggression and
toxicity in comments using capsule network. In Proceedings
of the First Workshop on Trolling, Aggression and Cyberbul-
lying (TRAC-2018), 98–105.
[Teh, Cheng, and Chee 2018]
Teh, P. L.; Cheng, C.-B.; and
Chee, W. M. 2018. Identifying and categorising profane
words in hate speech. In Proceedings of the 2nd International
Conference on Compute and Data Analysis, 65–69. ACM.
[Thaler and Sunstein 2009]
Thaler, R. H., and Sunstein, C. R.
2009. Nudge: Improving decisions about health, wealth, and
happiness. Penguin.
[Waseem and Hovy 2016]
Waseem, Z., and Hovy, D. 2016.
Hateful symbols or hateful people? predictive features for
hate speech detection on twitter. In Proceedings of the
NAACL student research workshop, 88–93.
[Xu, Liu, and Başar 2015]
Xu, Z.; Liu, J.; and Başar, T. 2015. On a modified DeGroot-Friedkin model of opinion dynamics.
In 2015 American Control Conference (ACC), 1047–1052.
IEEE.
[Zannettou et al. 2018]
Zannettou, S.; Bradlyn, B.; De Cristo-
faro, E.; Kwak, H.; Sirivianos, M.; Stringhini, G.; and Black-
burn, J. 2018. What is Gab: A bastion of free speech or
an alt-right echo chamber. In Companion of the The Web
Conference 2018 on The Web Conference 2018, 1007–1014.
[Zhang et al. 2015]
Zhang, J.; Tang, J.; Li, J.; Liu, Y.; and
Xing, C. 2015. Who influenced you? predicting retweet via
social influence locality. ACM Transactions on Knowledge
Discovery from Data (TKDD) 9(3):25.
[Zhang, Robinson, and Tepper 2018]
Zhang, Z.; Robinson,
D.; and Tepper, J. 2018. Detecting hate speech on twitter using a convolution-GRU based deep neural network. In
European Semantic Web Conference, 745–760. Springer.
[Zhou et al. 2005]
Zhou, Y.; Reid, E.; Qin, J.; Chen, H.; and
Lai, G. 2005. US domestic extremist groups on the web: link
and content analysis. IEEE intelligent systems 20(5):44–51.