All content in this area was uploaded by Amanda Jane Davies on Jun 15, 2019
Journal of Criminological Research, Policy and Practice
Understanding the expression of grievances in the Arabic Twitter-sphere using machine learning
Yeslam Al-Saggaf, Amanda Davies,
Article information:
To cite this document:
Yeslam Al-Saggaf, Amanda Davies, (2019) "Understanding the expression of grievances in the Arabic Twitter-sphere using
machine learning", Journal of Criminological Research, Policy and Practice, https://doi.org/10.1108/JCRPP-02-2019-0009
Permanent link to this document:
https://doi.org/10.1108/JCRPP-02-2019-0009
References: this document contains references to 39 other documents.
Downloaded by eFADA of Ankabut UAE At 23:13 14 June 2019 (PT)
Understanding the expression
of grievances in the Arabic
Twitter-sphere using machine learning
Yeslam Al-Saggaf and Amanda Davies
Abstract
Purpose – The purpose of this paper is to discuss the design, application and findings of a case study in
which a machine learning algorithm is used to identify the grievances expressed on Twitter in an
Arabian context.
Design/methodology/approach – To understand the characteristics of the Twitter users who expressed
the identified grievances, data mining techniques and social network analysis were utilised. The study
extracted a total of 23,363 tweets, which were stored as a data set. A machine learning algorithm
was applied to this data set, followed by a data mining process to explore the characteristics of the
Twitter users. The network of the users was mapped, and the individual level of interactivity and the
network density were calculated.
Findings – The machine learning algorithm revealed 12 themes, all of which were underpinned by the
blockade of Qatar by a coalition of Arab countries. The data mining analysis revealed that the tweets could be
grouped into four clusters; the main cluster included users with a large number of followers and friends but
who did not mention other users in their tweets. The social network analysis revealed that whilst a large
proportion of users engaged in direct messages with others, the network ties between them were not
registered as strong.
Practical implications – Borum (2011) notes that invoking grievances is the first step in the radicalisation
process. It is hoped that by understanding these grievances, the study will shed light on what radical groups
could invoke to win the sympathy of aggrieved people.
Originality/value – The machine learning algorithm offered insights into the grievances
expressed within the tweets in an Arabian context. The data mining and the social network analyses revealed
the characteristics of the Twitter users, which may inform the identification and management of early
intervention in radicalisation.
Keywords Data mining, Machine learning, Twitter, Social network analysis, Grievances,
Online radicalization
Paper type Research paper
Received 6 February 2019. Revised 16 April 2019. Accepted 19 April 2019.
Yeslam Al-Saggaf is based at the School of Computing and Mathematics, Charles Sturt University, Albury,
Australia. Amanda Davies is based at the School of Policing Studies, Charles Sturt University, Wagga Wagga
Campus, Wagga Wagga, Australia and Policing & Security, Rabdan Academy, Abu Dhabi, United Arab Emirates.
DOI 10.1108/JCRPP-02-2019-0009 © Emerald Publishing Limited, ISSN 2056-3841
Introduction
There are a number of narratives that radicals can invoke to win the sympathy of young people
online, including discrimination (Ungar et al., 2018), a sense of alienation from the surrounding
society (Sabouni et al., 2017), a shared identity in diaspora grounded in a perception of
victimhood, the perception that Western media is conspiring to undermine them and damage their
image (Aly, 2009; Ewart et al., 2017), and the view that the clash of
civilisations between the East and the West is inevitable. Yusoufzai and Emmerling’s (2017)
preliminary findings suggested four factors contributing to Muslims joining terrorist organisations:
identity crisis couched in terms of the struggle to maintain a balance between different cultural
aspects of identity; personal needs such as sensation seeking; economic deprivation, i.e.
receiving less than deserved; and empathy for the suffering of Muslims. The last two offer
support to Torok’s (2016) and Al-Saggaf’s (2016) preliminary findings, which proposed that online
radicalisation discourse appeals especially to those with grievances. In the face of this, it is no
surprise that radicals are increasingly using social media to exploit the grievances of alienated
youth (Baaken and Schlegel, 2017).
To help contextualise the work presented here, the meanings attributed to the terms radicalisation,
terrorism and extremism, as adopted in this work, are offered.
A continuing conundrum surrounding research and discussion of
radicalisation, terrorism and extremism is, as suggested by Schmidt (2016), the lack of a
definition universally accepted by government or academia. Although the debate over agreed
definitions continues, the interconnection between the
terms can be perceived. The extent of this dilemma is exemplified by the work of
Schmidt (2016), which identified more than 100 different definitions/interpretations of the term
terrorism. A helpful definition is offered by the Federal Bureau of Investigation, as referred to by
Maikovich (2005) in quoting Whittaker (2001, p. 3): “[terrorism is] the unlawful use of force or
violence against persons or property to intimidate or coerce a government, civilian population, or
any segment thereof, in furtherance of political or social objectives”. Similarly, the literature
suggests a conclusive definition for “radicalization” is yet to be established (see Moghaddam,
2005; Spencer, 2006). Veldius and Staun, in their work centred on Islamic radicalisation, offer the
following guide to describing radicalisation:
Definitions of radicalisation most often centre around two different foci: 1) on violent radicalisation, where
emphasis is put on the active pursuit or acceptance of the use of violence to attain the stated goal; 2) on a
broader sense of radicalisation, where emphasis is placed on the active pursuit or acceptance of
far-reaching changes in society, which may or may not constitute a danger to democracy and may or may
not involve the threat of or use of violence to attain the stated goals.
The work of Stephens et al. (2019, p. 2), a literature review associated with the
discussion of preventing violent extremism, proffers a working reference for defining extremism
as follows:
An important conceptual distinction is often posited between idealistic and behavioral definitions of
extremism, meaning it can be used to refer to “political ideas that are diametrically opposed to a
society’s core values. […] Or it can mean the methods by which actors seek to realize any political
aim”. The concept of “violent extremism” tends toward a more behavioral than idealistic definition, in
that it places focus on violence as a means, rather than the holding of extreme views themselves.
Baaken and Schlegel (2017) suggest radicals are aware that the youth’s passive consumption
of their propaganda is not enough to radicalise them. For this reason, they engage with the
youth directly (through social media). As digital natives accept social media content more
naturally than older adults and digital natives resonate well with messages communicated to
them by other digital natives, radicals use messengers who are themselves digital natives
(Baaken and Schlegel, 2017). Baaken and Schlegel (2017) proffer that the utilisation of digital
natives in this process enables them to effectively arouse emotions and debate issues which
gives them a perception of legitimacy. Radicals also use homophily (Rowe and Saif, 2016), the
tendency of digital natives to associate themselves with like-minded individuals, to their
advantage. The problem with this is that these individuals tend to ignore news or information
that does not align with their views and beliefs (Del Vicario et al., 2016). This causes their current
mindsets, views and ideas to be reinforced, which, in turn, leads to the formation and spread of
biased views that thrive on misinformation.
Further, Torok’s (2016) and Al-Saggaf’s (2016) observation that grievances are context-specific,
not applying to broad community groups or situations, suggests that culture may play a role in
how grievances are exploited. From a cultural perspective, the group dynamics in the tribal
tradition of eastern societies, such as Arabic societies, is different from that in Western societies.
Hierarchy, or power distance, as Hofstede (1997) calls it, plays an important role in group
dynamics in these societies. In the case of societies with a high-power distance value, such as the
Arabic societies, hierarchy may mean certain members of the group, such as the elders, the
affluent or those who come from a noble bloodline, enjoy special status compared to others
(Al-Saggaf et al., 2002). This, in turn, may limit the challenging of their views by others
during discussions. In some of the Arab societies, for example, older members of the community
usually dominate face-to-face discussions, and younger individuals are seldom given the same
opportunity to express themselves. They are reminded from childhood not to talk in the presence
of elders. This, in turn, sees them grow up lacking confidence in their own thinking and being
more trusting of the ideas of older people. Online anonymity allows individuals to express their
views freely without regard for other people’s status. However, this only holds for as long as
individuals’ real-life identities are not known; once individuals know each other offline, the status
of those who enjoy it offline follows them online (Al-Saggaf et al., 2002).
Few studies have investigated grievances from a specific context, that is, one with set
parameters and criteria. This gap in the literature is the premise for the study. In addition to
qualitative research approaches, such as interviewing (Edwards and Gribbon, 2013;
Torok, 2013), several other strategies have been employed to
study online radicalisation, including social network analysis and sentiment analysis (Bermingham
et al., 2009), investigative data mining (Wadhwa and Bhatia, 2013), an exploratory data mining
approach (Rowe and Saif, 2016), machine learning (Scanlon and Gerber, 2014) and time series
analysis (Scanlon and Gerber, 2015). Bradbury et al. (2017), for example, used a complex
systems approach, specifically the RPAS multi-faceted technique, to separate the authors of
anonymous “radical” social media posts from the authors of non-radical posts and found the technique can
differentiate these posts with precision. The RPAS, which is based on the indicators richness
(R), personal pronouns (P), referential activity power (A) and sensory (S), is a text analysis
technique for creating a stylistic signature of a person based on his/her writing (Bradbury et al.,
2017). The RPAS, which draws on a writer’s personality, is often used in combination with
multiple regression analysis and cross-validation for the purpose of separating people from their
writing. This study used machine learning together with data mining and social network
analyses to identify grievances expressed in Twitter in an Arabian context. The following
sections explain the methodology and data collection applied to the study and the
subsequent findings.
Methodology and data collection
Stage 1
To understand the grievances, a crawl of Twitter using the Twitter Archiving Google Sheet (TAGs)
App (https://tags.hawksey.info/get-tags/) was conducted with the help of an Arabic slang word
(not a hashtag) X. This means that any tweet by any Twitter user that contained the
searched keyword could have been retrieved by the TAGs App. The TAGs App works
by automatically retrieving results from a Twitter search and storing them in a Google Sheet.
For more information on how the TAGs App was used to extract tweets from Twitter, see
Al-Saggaf and Chutikulrungsee (2015). This first stage of data collection, i.e. the first crawl of Twitter,
was performed once by selecting the Run Now function. To enable the TAGs App to retrieve
data, the researcher developed a Twitter App (https://developer.twitter.com/)
that programmatically authorised the TAGs App to search Twitter on behalf of
the researcher.
The data collection began on 22 July 2018. A total of 3,652 tweets containing the searched
keyword were retrieved and stored in a Google Sheet. The 3,652 tweets are not the only tweets
containing the searched keyword that were ever posted to Twitter; rather, they are those posted to
Twitter in the last seven days between 22 July 2018 and 2 August 2018, because the Run Now function
returns only the tweets posted in the last seven days. It should also be noted that Twitter places a limit
on the number of application programme interface (API) requests that a user can make in
any hour, thus restricting the number of tweets that can be retrieved by the API during that hour.
As a result, there may be other tweets related to the searched keyword, posted during the period
of 22 July 2018–2 August 2018, that could not be
retrieved due to the Twitter API restrictions.
Stage 2
From the data set obtained in Stage 1, the top 10 tweeters were identified and their tweets were
manually studied to verify they were “aggrieved”. The verification involved looking for evidence of
heightened emotive words such as “traitor”, “treacherous”, “mercenary”, “worthless”, “unjust”,
“animal”, “garbage bin”, “scum”, “filthy”, “drug lords”, “insect”, “cancer”, as well as expressions
such as “death to the rulers of country X”, “rulers of country X are dogs”, “rulers of country X will
be crushed”, “rulers of country X are thieves”, “may you rot in hell rulers of country X” and “damn
you rulers of country X”. These emotive words and expressions were informed by Al-Saggaf’s
(2016) study. Next, the tweets posted by the usernames of these ten prominent tweeters were
collected every hour from Twitter, also using the TAGs App, over a period of more than two
months (from 22 July 2018 to 26 September 2018). This was achieved by configuring the TAGs
App to collect the tweets posted by these usernames every hour during that period. A total of
23,363 tweets were extracted and stored in a data set. The data collection stopped
on 26 September 2018 because the TAGs App retrieves a maximum of 30,000 records as
per its default configuration.
Stage 3 and Twitter user characteristics
In addition to the tweets, the TAGS App also returned the unique tweet ID, the Twitter username,
the time the tweet was posted, the sender’s language, the sender’s unique user ID, the source of
the tweet, the sender’s profile image URL, the sender’s number of followers, the sender’s
number of friends, the sender’s status URL, the hashtags included in the tweet and the
“in_reply_to_screen_name”. To understand the characteristics of the tracked users, such as the
percentage of retweets to original tweets, the data set was queried using SQLite (2019)
(https://sqlite.org), “a self-contained, serverless, zero-configuration, transactional SQL database engine”.
The characteristics of the tracked users are listed in Table I.
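The kind of query described above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not the authors' query: the table name `tweets`, the columns `from_user` and `text`, the sample rows and the rule that a retweet is a tweet whose text begins with "RT @" are all assumptions made for the example.

```python
import sqlite3

# Hypothetical schema and rows; the study's actual table layout is not given.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tweets (from_user TEXT, text TEXT)")
rows = [
    ("user_a", "RT @someone: example retweet"),
    ("user_a", "an original tweet"),
    ("user_a", "another original tweet"),
    ("user_a", "RT @other: second retweet"),
    ("user_b", "original only"),
]
conn.executemany("INSERT INTO tweets VALUES (?, ?)", rows)

# Assumption: a retweet is identified by the "RT @" prefix in the tweet text.
query = """
SELECT from_user,
       COUNT(*) AS tweets,
       ROUND(100.0 * SUM(CASE WHEN text LIKE 'RT @%' THEN 1 ELSE 0 END)
             / COUNT(*), 1) AS pct_rt
FROM tweets
GROUP BY from_user
ORDER BY from_user
"""
retweet_share = conn.execute(query).fetchall()
for user, total, pct_rt in retweet_share:
    print(user, total, pct_rt)
```

Each output row gives a user, that user's tweet count and the percentage of those tweets that are retweets, which is the shape of the "Tweets" and "Percentage of RT" columns in Table I.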
Analysis of the data set revealed the following:
■ Users’ language: six out of ten users indicated that their language was Arabic, with four users
indicating another language; three users selected English as their language while the fourth
user selected French as the language of tweeting.
■ Geographic location of the users: whilst four users chose not to disclose their location, possibly
for security considerations, the remaining six users listed the country in which they reside.
The user who contributed 15,119 tweets (User 5), which accounted for almost 65 per cent of
the total number of tweets in the data set, listed the location as Qatar. In total, 56 per cent of
this user’s tweets were retweets, suggesting only 6,653 tweets by this user were original
tweets. Also, 54 per cent of User 7’s tweets were retweets. Interestingly, this user also listed
Qatar as the country of residence. It should also be noted that while Users 3, 7, 8 and
10 were found to be among the top tweeters at the time of the data collection (for Stage 1), in
the second stage, they were found to have posted the fewest tweets (38, 28, 26 and
82 tweets, respectively). There are a number of potential explanations, including that their
accounts were suspended, that they were arrested or that they simply stopped tweeting altogether.
■ Users’ number of followers: as presented in Table I, with the exception of User 5, Users 2, 4 and
10 had slightly more followers than the remaining six users. With the exception of two users who
had fewer than 100 followees, all other users had a considerable number of followees, with User 5 having
5,786 followees, suggesting this user, who contributed nearly 65 per cent of the total number of
tweets in the data set, is unlikely to have been a Twitter bot. A Twitter bot is a computer programme
that manages a user’s Twitter account through the Twitter API. The Twitter bot can carry out a
number of tasks autonomously including tweeting, retweeting, liking, following, unfollowing and
sending direct messages to other accounts within the user’s network. While Twitter users have
a choice with regard to whom to follow, they have no choice regarding who follows them. It would
appear from the Table I figures that whilst the users varied in the number of tweets, number of
followers and number of followees, they all engaged in retweeting other users’ tweets.
Table I The characteristics of the tracked users
User      Language   Followers   Friends   Location       Tweets   Percentage of RT
User 1    Arabic            46       232   –               1,772   35
User 2    Arabic           335     1,284   North Yemen       764   32
User 3    Arabic            89       374   Iraq               38   –
User 4    French           241        89   Algérie         1,639   28
User 5    Arabic         8,914     5,786   Qatar          15,119   56
User 6    English           32       138   –                 211   14
User 7    English           40       279   Qatar              28   54
User 8    Arabic            38        63   –                  26   –
User 9    English           49       274   –                 352   13
User 10   Arabic           155       164   Saudi Arabia       82   30
The pattern of tweeting. The TAGS App also generates a count of the tweets over the data
collection period. Examining the tweet volume over time offered insight into the pattern of
tweeting that the tracked users followed. As indicated in Figure 1, tweeting peaked, reaching
more than 650 tweets per day on certain days, when the media reported on a politically significant
event (e.g. the death of soldiers from the coalition of Arab countries, the expelling of the Canadian
ambassador, and the bombing of a bus full of school children). Conversely, the volume of
tweets dropped to under 150 tweets a day when the media moved to a less important story.
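A daily tweet count of the kind plotted in Figure 1 can be derived from the tweet timestamps. The sketch below is illustrative only: the timestamps are invented and are assumed to be ISO 8601 strings, which may not match the format the TAGS App actually exports.

```python
from collections import Counter
from datetime import datetime

# Hypothetical timestamps (assumed ISO 8601); not taken from the study's data set.
timestamps = [
    "2018-07-23T09:15:00",
    "2018-07-23T18:40:00",
    "2018-07-24T07:05:00",
]

# Count tweets per calendar day.
daily_counts = Counter(datetime.fromisoformat(ts).date() for ts in timestamps)

# The busiest day and its volume.
peak_day, peak_count = max(daily_counts.items(), key=lambda item: item[1])
print(peak_day, peak_count)
```

Days with unusually high or low counts could then be compared against the dates of the reported events.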
Data analysis
To explore the interaction between the expression of grievances and the users’ level of
interactivity (the presence of “@” replies in their tweets) and also their network sizes (their number
of followers and number of followees), two data mining techniques, specifically a clustering
algorithm (Al-Saggaf and Islam, 2013) using GenClustPlusPlus (Rahman and Islam, 2018) and a
decision forest algorithm (Al-Saggaf and Islam, 2015) using SysFor (Islam and Giggins, 2011),
were applied to the data set. Data mining techniques have the capacity to discover covert, useful
and interesting patterns (i.e. logic rules) from a given data set without requiring any domain
knowledge (Al-Saggaf and Islam, 2015).
The clustering algorithm clusters the records within the data set such that similar records are grouped
together in one cluster and dissimilar records are placed in different clusters (Al-Saggaf and
Islam, 2013). A decision forest generates a set of decision trees containing logic rules that can reveal
patterns within a data set (Islam and Giggins, 2011). A decision tree is a flowchart-like structure that
discovers a set of logic rules, where each record of a training data set (i.e. the data set which is used
to build the decision tree) falls in one and only one leaf (logic rule) of a tree (Al-Saggaf and Islam, 2015).
Figure 1 The pattern of tweeting (tweet volume over time, max 60 days; vertical axis: count of tweets per day, 0 to 700; horizontal axis: date, 23 July 2018 to 17 September 2018)
SysFor and GenClustPlusPlus techniques were implemented in WEKA (www.cs.waikato.ac.nz/
ml/weka/). SysFor does not support tree visualisation. To display the generated trees,
prefuseTree, which is a WEKA plugin for visualising trees, was used. To prepare the data set for
the data mining analysis, a class attribute (a column in the data set) with two class values,
“A” (assigned to the tracked users) and “U” (assigned to the untracked users), was created.
The following attributes were used in the data mining analysis: the class attribute, with records
labelled as either A (tracked) or U (untracked); “in_reply_to_user_id_str”, with records labelled as
either “Y” (yes, @ reply was present) or “N” (no, @ reply was not present); “user_followers_count”,
whose values ranged from zero to 9,999,999; “user_friends_count”, whose values ranged from
zero to 9,999,999; and “user_lang”, whose values included languages such as “ar”, “fr”, “en”, “tr”,
“ru”, etc. The validity of a rule is measured by two indicators called support and confidence.
Support of a rule represents the ratio of the number of records following the antecedent of the rule to
the total number of records in the data set. Confidence of a rule is the ratio of the number of records
in a leaf in the dominant class to the total number of records in the leaf (Al-Saggaf and Islam, 2015).
In the case of SysFor, since rules with high support and confidence are considered to
be valid (Al-Saggaf and Islam, 2015), only rules having high support and confidence are
reported in this study.
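The definitions of support and confidence given above can be written out as a short Python sketch. The records, the key names (`user_lang`, `friends`, `cls`) and the treatment of the interval boundaries are assumptions made for illustration; this is not SysFor itself, which was run in WEKA. The final two lines apply the same ratios to the counts reported later for the Arabic-language rule (15,086 matching tweets out of 23,363, of which 15,085 were by tracked users).

```python
from collections import Counter

def support_and_confidence(records, antecedent, class_key="cls"):
    """Return (support, confidence, dominant_class) for one logic rule.

    support    = records matching the antecedent / all records
    confidence = matching records in the dominant class / matching records
    """
    matched = [r for r in records if antecedent(r)]
    if not matched:
        return 0.0, 0.0, None
    dominant, dominant_n = Counter(r[class_key] for r in matched).most_common(1)[0]
    return len(matched) / len(records), dominant_n / len(matched), dominant

# Hypothetical toy records: A = tracked user, U = untracked user.
records = [
    {"user_lang": "ar", "friends": 5786, "cls": "A"},
    {"user_lang": "ar", "friends": 5786, "cls": "A"},
    {"user_lang": "ar", "friends": 5700, "cls": "U"},
    {"user_lang": "en", "friends": 138, "cls": "A"},
    {"user_lang": "ar", "friends": 215, "cls": "U"},
]

# Assumed antecedent: Arabic language and friends within (5,619, 5,903].
def rule(record):
    return record["user_lang"] == "ar" and 5619 < record["friends"] <= 5903

support, confidence, dominant = support_and_confidence(records, rule)

# The same ratios applied to the counts reported in the Results section.
reported_support = 15086 / 23363
reported_confidence = 15085 / 15086
print(round(reported_support * 100, 1), round(reported_confidence * 100, 2))
```

On the toy records, three of five records match the rule and two of those three are class A, so the rule has support 0.6 and confidence of about 0.67.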
Social network analysis
To understand the characteristics of the network of the tracked users, the network of these users,
which shows the directed ties from one user to the other, was graphed using Gephi. Other network
characteristics, such as the tracked users’ network density, reciprocity, clustering walktrap and
transitivity, were calculated using both Gephi and the igraph package for R. To understand the
tracked users’ level of interactivity, i.e. the interaction between the tracked users and their network,
which can be found by calculating the percentage of “@” replies in their tweets, the data set was
queried using SQLite. The level of interactivity for the tracked users is listed in Table III.
Machine learning
To identify the main themes within the tweets, a machine learning algorithm, Topic Models using
Latent Dirichlet Allocation (LDA), was applied to the data set. The LDA is a Bayesian mixture
model for discrete data where topics are assumed to be uncorrelated (Grun and Hornik, 2011).
The study used an implementation of Topic Models for R and the algorithm was applied in
accordance with Graham and Ackland’s (2015) tutorial. Topic Models is a machine learning
algorithm that generates themes from a collection of documents. Each topic comprises a cluster
of words that often appear together. Topic Models takes the context into consideration such that
it associates words that have a similar meaning and can differentiate between the usage of words
with multiple meanings (Howes et al., 2013). With the “tm” and Topic Modelling packages loaded,
all the tweets by the ten users were read into R to find the main topics (themes) that can
represent these tweets. Running the Topic Modelling script in R on the data set produced
12 topics. Unlike in Al-Saggaf (2016), where the Topic Modelling analysis did not reveal definite
topics but rather a consistent pattern throughout the topics, the analysis conducted for this study
revealed instantly recognisable themes. A plausible explanation for this is that in this case study, the
tweets were first transliterated and the Topic Models algorithm was applied to the transliterated
version of the data. After the analysis was performed, the data were reverse-transliterated.
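To make the idea of LDA concrete, the following is a minimal collapsed Gibbs sampler written with the Python standard library only. It is a teaching sketch under stated assumptions, not the R script used in the study: the function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, the number of iterations and the toy tokenised documents are all hypothetical, and the inference method of the R implementation may differ.

```python
import random
from collections import Counter

def lda_gibbs(docs, num_topics, iterations=100, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; returns (topic_word, doc_topic) counts."""
    rng = random.Random(seed)
    vocab_size = len({word for doc in docs for word in doc})
    # Start from a random topic assignment for every token.
    assignments = [[rng.randrange(num_topics) for _ in doc] for doc in docs]
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [Counter() for _ in range(num_topics)]
    topic_total = [0] * num_topics
    for d, doc in enumerate(docs):
        for i, word in enumerate(doc):
            k = assignments[d][i]
            doc_topic[d][k] += 1
            topic_word[k][word] += 1
            topic_total[k] += 1
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][word] -= 1
                topic_total[k] -= 1
                # Resample its topic in proportion to document and word affinity.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][word] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][word] += 1
                topic_total[k] += 1
    return topic_word, doc_topic

# Hypothetical tokenised documents standing in for transliterated tweets.
docs = [
    ["blockade", "border", "airspace", "blockade"],
    ["border", "blockade", "airspace"],
    ["school", "bus", "children", "bus"],
    ["children", "school", "bus"],
]
topic_word, doc_topic = lda_gibbs(docs, num_topics=2)
top_words = [[w for w, c in tw.most_common(3) if c > 0] for tw in topic_word]
print(top_words)
```

Each topic is summarised by the words most often assigned to it, which mirrors the description above of a topic as a cluster of words that often appear together.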
Results
Understanding the characteristics of the Twitter users using data mining
The clustering analysis (see Table II) showed the tweets can be clustered in four clusters.
Cluster 1 includes a large group of tweets by the tracked users with a large number of followers
and friends but who did not mention other users in their tweets. Cluster 2 includes a large group
of tweets by the tracked users with a large number of followers and friends but who did mention
other users in their tweets. Cluster 3 includes a small group of tweets by the non-tracked users
with a small number of followers and friends who did not mention other users in their tweets.
Cluster 4 includes a small group of tweets by the non-tracked users with a small number of
followers and friends but who did mention other users in their tweets. The fact that the non-tracked
users who mentioned other users in their tweets had a very small number of followers and friends
(https://help.twitter.com/en/using-twitter/following-faqs) suggests that they may be government
agents. Their small network size suggests they are not established Twitter users and that they may be on
Twitter mainly to engage with the tracked users.
The decision forests analysis (see Figure 2) highlighted a number of interesting rules. For example,
when the user language was Arabic and the number of friends was between 5,619 and 5,903, 15,086 tweets
(64.6 per cent of the total number of tweets) were tweeted by either a tracked user or a non-tracked
user. Out of the 15,086 tweets, 99.99 per cent (i.e. 15,085 tweets) were tweets by tracked users.
This trend in the data is consistent with the result of the cluster analysis in that tracked users enjoyed a
good following (see the number of friends for both Clusters 1 and 2). In contrast, when the
number of friends was less than 5,619 and the number of followers was greater than 443, 1,051
tweets were tweeted by non-tracked users, suggesting the non-tracked users are followed significantly less.
This, again, could be due to their mission in Twitter, i.e. to mainly interact with the tracked users.
Users’level of interactivity
Twitter differentiates between replies and mentions. Tweets starting with the “@” symbol are
“at replies”. Tweets that do not begin with the “@” symbol but include it are mentions
(Al-Saggaf et al., 2016). Since replies are direct messages to users and, as Ackland (2013) noted,
they are a better indication of the existence of social ties between two users than, for example,
their number of followers and friends, the “in_reply_to_user_id_str” column was labelled as either
“Y” (yes, @ reply is not null) or “N” (no, @ reply is null). SQLite was used to query the data set for
information about users’ level of interactivity. As can be observed from Table III, all ten users
interacted with their network, that is, included “@ replies” in their tweets, with Users 3 and
8 interacting at 100 and 96 per cent, respectively.
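The textual distinction between replies and mentions described above can be sketched as a small Python function. This is illustrative only: in the study the Y/N label was derived from whether the “in_reply_to_user_id_str” column was null, not from the tweet text, and the function name and sample tweets are hypothetical.

```python
def classify_tweet(text):
    """Classify a tweet as 'reply', 'mention' or 'plain' from its text alone."""
    stripped = text.lstrip()
    if stripped.startswith("@"):
        return "reply"      # begins with "@": an "at reply"
    if "@" in stripped:
        return "mention"    # contains "@" elsewhere: a mention
    return "plain"

# Hypothetical sample tweets.
sample = [
    "@user_x I disagree with this",
    "as @user_y said earlier today",
    "no handles in this one",
    "@user_z thank you",
]
labels = [classify_tweet(t) for t in sample]

# Level of interactivity: percentage of tweets that are @ replies.
reply_percentage = 100.0 * labels.count("reply") / len(labels)
print(labels, reply_percentage)
```

Note that under this purely textual rule a retweet of the form "RT @user: ..." would be counted as a mention, which is one reason a metadata column is a more reliable source for the label.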
This highlights a significant level of interactivity and indicates these users interacted with other users
largely through directed messages. Interestingly, the non-tracked users included @ replies in a larger
share of their tweets (62 per cent) than the tracked users (45 per cent) (see Table IV). This is consistent
with the suggestion that the non-tracked users’ main mission in Twitter is to communicate with the tracked users.
Table II The clustering analysis
Attribute             Full data (23,363)   Cluster no. 1 (10,971)   Cluster no. 2 (9,060)   Cluster no. 3 (1,261)   Cluster no. 4 (2,071)
@ replies present     No                   No                       Yes                     No                      Yes
Number of followers   9,202                9,241                    9,227                   476                     108
Number of friends     5,806                5,811                    5,807                   353                     215
Class                 Tracked users        Tracked users            Tracked users           Untracked users         Untracked users
Figure 2 The decision forests analysis (decision tree with splits on user_friends_count at 5,619 and 5,903, on user_lang, and on user_followers_count at 443; leaves are labelled A (tracked) or U (untracked) and include A (15,093.76/8.76) and U (1,051.42))
Network graph
The network of the tracked users represents the direct messages between users. Five areas in
the network were significantly crowded, with one area (almost a full circle) extremely crowded. This
suggests there were five influential users. It is clear from Table I (see the tracked users’ number of
tweets) that the most influential user was User 5 (15,119 tweets), followed by User 1 (1,772
tweets), then User 4 (1,639 tweets), followed by User 2 (764 tweets) and User 9 (352 tweets). The
high level of interaction among the users is also evident in the network graph.
Network density
Network density is the number of network ties as a proportion of the maximum possible
number of network ties. For example, while it is very likely that people attending a family reunion
will know each other, it is very unlikely that a group of unrelated people travelling in a plane will
know each other. Network density thus measures the degree of cohesion among the nodes. A network
density of one indicates a very cohesive network in which every member knows every other member.
On the other hand, a network density of zero suggests the network is not cohesive (Al-Saggaf, 2016).
Other network characteristics include reciprocity, which is the number of dyads with reciprocated
(mutual) edges divided by the number of dyads with a single edge; clustering walktrap (group/mod),
which is community identification using the walktrap algorithm, with group referring to the
number of communities; and transitivity, which measures the extent to which a friend of your friend
is also your friend.
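The definitions above can be sketched in a few lines of Python. This is an illustrative sketch only, not the software used in the study: the function names and the toy edge list are assumptions introduced here, and transitivity is computed on the undirected version of the graph, which is one common convention.

```python
from itertools import combinations


def density(num_vertices, num_ties, directed=False):
    """Ties present as a proportion of the maximum possible number of ties."""
    possible = num_vertices * (num_vertices - 1)
    if not directed:
        possible /= 2  # each unordered pair of vertices is counted once
    return num_ties / possible


def reciprocity_ratio(edges):
    """Dyads with mutual edges divided by dyads with a single edge."""
    edge_set = {(u, v) for u, v in edges if u != v}
    dyads = {frozenset(edge) for edge in edge_set}
    mutual = 0
    for dyad in dyads:
        u, v = tuple(dyad)
        if (u, v) in edge_set and (v, u) in edge_set:
            mutual += 1
    single = len(dyads) - mutual
    return mutual / single if single else float("inf")


def transitivity(edges):
    """Closed connected triples divided by all connected triples (direction ignored)."""
    neighbours = {}
    for u, v in edges:
        if u != v:
            neighbours.setdefault(u, set()).add(v)
            neighbours.setdefault(v, set()).add(u)
    triples = closed = 0
    for adjacent in neighbours.values():
        for a, b in combinations(adjacent, 2):
            triples += 1
            if b in neighbours[a]:
                closed += 1
    return closed / triples if triples else 0.0


# Hypothetical toy network: an edge (u, v) means "u sent an @ reply to v".
toy_edges = [("a", "b"), ("b", "a"), ("a", "c"), ("b", "c"), ("c", "d")]
toy_vertices = {node for edge in toy_edges for node in edge}
toy_ties = {frozenset(edge) for edge in toy_edges}  # undirected ties

print(density(len(toy_vertices), len(toy_ties)))
print(reciprocity_ratio(toy_edges))
print(transitivity(toy_edges))
```

In this toy network one dyad out of four is mutual, so the reciprocity ratio is 1/3, and three of the five connected triples are closed, so the transitivity is 0.6.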
As indicated in Table V, the network density (0.001) is much closer to 0 than to 1, which indicates
that the network is not cohesive. Thus, although the overall interactivity (the percentage of
tweets containing @ replies) was 47.64 per cent (see Table IV), showing that nearly half of the
tweets were directed messages to other users, the low density suggests that the heavy interaction
is not among users known to each other. This is also evidenced by the almost zero reciprocity and
the negligible transitivity. The large number of communities identified (577) further suggests that
the interaction is not confined to a small number of users communicating with their strong ties.
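As a rough check, the density reported in Table V can be recomputed from the reported numbers of vertices (4,523) and edges (11,132). The source does not state whether the edges were treated as directed or undirected when density was calculated, so the sketch below shows both conventions as assumptions; the variable names are illustrative.

```python
vertices = 4_523  # from Table V
edges = 11_132    # from Table V

# Maximum possible number of ties under each convention (assumption:
# the source does not say which one was used).
max_directed = vertices * (vertices - 1)
max_undirected = max_directed / 2

density_directed = edges / max_directed
density_undirected = edges / max_undirected

print(round(density_directed, 3), round(density_undirected, 3))
```

Under either convention the value rounds to 0.001, matching Table V: only about one in a thousand possible ties is actually present.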
Table III The tracked users’ level of interactivity
User | Interaction: percentage of @ replies
User 1 | 59.4
User 2 | 33.2
User 3 | 100
User 4 | 66
User 5 | 40
User 6 | 85.7
User 7 | 46.4
User 8 | 96
User 9 | 80.9
User 10 | 46.6
Table IV Overall users’ level of interactivity
Type | Total number of records | Total number of @ replies | Percentage of @ replies (%)
Overall | 23,363 | 11,131 | 47.64
Tracked users | 20,031 | 9,060 | 45.2
Non-tracked users | 3,332 | 2,071 | 62.15
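The percentages in Table IV follow directly from the record and @ reply counts. The short sketch below reproduces the arithmetic; the function and variable names are illustrative assumptions, while the counts are those reported in the table.

```python
def reply_percentage(records, replies):
    """Share of records that contain an @ reply, as a percentage."""
    return 100 * replies / records


# (total records, total @ replies) as reported in Table IV
table_iv = {
    "Overall": (23_363, 11_131),
    "Tracked users": (20_031, 9_060),
    "Non-tracked users": (3_332, 2_071),
}

for user_type, (records, replies) in table_iv.items():
    print(user_type, round(reply_percentage(records, replies), 2))
```

The tracked and non-tracked rows also sum to the overall row (20,031 + 3,332 = 23,363 records; 9,060 + 2,071 = 11,131 @ replies).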
Machine learning (topic models)
A total of 12 themes were revealed from the Topic Models analysis (see Figure 3 and Table VI).
The topics revolved around major regional events that the media reported, such as:
■ the arrest of an economist (Burjes Al-Burges) for allegedly criticising the government;
■ the coalition of Arab countries’ ongoing war with the Houthi rebels, who in 2014 overthrew the
government and took control of most of the northern part of Yemen, and the threats
they posed to the coalition of Arab countries (e.g. the ballistic missile attacks that targeted
Saudi Arabia);
Figure 3 The result of the Topic Models analysis
[Figure: twelve bar-chart panels (Topic 1 to Topic 12) titled “Highest word probabilities for each topic: different words are associated with different topics”; the horizontal axis of each panel runs from 0.00 to 0.03 and the bars are labelled with transliterated Arabic terms]
Table VI The topics
Topic | Details
1 | Crisis with Canada
2 | Keep going (laughing at the thought that anyone who defends Qatar is from Azmy’s cell + dismiss calmly and coldly)
3 | Arrest of an economist + “prayer against”
4 | Houthi threats (ballistic missiles and unmanned planes) on petroleum facilities, oil tankers and airports
5 | Hajj Eid (festivity) – conflict with Qatar still on
6 | Electronic fly
7 | Expelling the Canadian ambassador
8 | Coalition soldiers’ bodies + Yemeni children bus massacre
9 | Report of attack on an airport
10 | Traitors of Sykes–Picot
11 | Fences
12 | Bahrain close alliance
Table V Network density and other characteristics
Density | Vertices | Edges | Reciprocity | Clustering walktrap (group/mod) | Transitivity (clustering coefficient)
0.001 | 4,523 | 11,132 | 0.00198584 | 577/0.607 | 0.007
■ the feud between Saudi Arabia and Canada, which erupted when the Canadian ambassador
tweeted support for a Saudi women’s rights activist; and
■ the revelations regarding the “deal of the century”
(http://studies.aljazeera.net/en/reports/2018/11/181106114236864.html) between Palestine and
Israel as a solution to end the conflict between the two peoples.
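Figure 3 reports, for each topic, the words with the highest probability under that topic. However the model is fitted, reading off such a list amounts to sorting each topic’s word probabilities and keeping the top few. The sketch below illustrates that step only; the function name, topic labels, terms and probabilities are hypothetical placeholders and are not values from the study.

```python
def top_terms(topic_word_probs, n=3):
    """Return the n highest-probability terms for each topic."""
    return {
        topic: sorted(probs, key=probs.get, reverse=True)[:n]
        for topic, probs in topic_word_probs.items()
    }


# Hypothetical topic-word probabilities, for illustration only.
toy_model = {
    "Topic A": {"term1": 0.030, "term2": 0.012, "term3": 0.021, "term4": 0.004},
    "Topic B": {"term5": 0.008, "term6": 0.027, "term7": 0.015},
}

print(top_terms(toy_model, n=2))
```

The ranked terms only hint at a theme; labels such as those in Table VI still depend on a human reading of the terms and of the underlying tweets.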
A qualitative observation of the tweets revealed that all of the 12 topics reflected one major
grievance: the coalition of Arab countries’ blockade of Qatar. The grievance can be seen in
the extreme anger (emotive words in the tweets) directed towards the coalition of Arab
countries. The anger is expressed in the form of shaming, spitefulness, gloating, malice,
mocking, insulting, belittling, derogatory naming, cursing and “praying against”. The qualitative
observation of the tweets also showed that the conversations between the Twitter users
often turned into fierce fights in which all kinds of abusive and profane swear words were
used. This use of language and the extreme anger expressed in the tweets offer
another piece of evidence of the grievance experienced by the studied
Twitter users.
Discussion and conclusion
The clustering data mining analysis suggested that the non-tracked users who mentioned other
users in their tweets, but who had very few followers and friends, may be
government agents whose main task on Twitter is to interact with the tracked users. The decision
forests data mining analysis supported this pattern, revealing that the non-tracked
users have significantly fewer followers than the tracked users.
The social network analysis revealed that both the tracked and non-tracked users
interacted heavily with other users through directed messages, and that the non-tracked users
did so considerably more than the tracked users (62.15 per cent vs 45.2 per cent of tweets; see
Table IV), offering another piece of evidence for the conclusion that the main task of the
non-tracked users is to interact with the tracked users. Of note, despite this heavy interaction,
the overall network density was almost 0, indicating that the interacting users are not close to
each other. The values for the other network characteristics (reciprocity and transitivity) were
also negligible, which is consistent with the network density result.
The machine learning algorithm revealed 12 topics that characterised the analysed conversation.
While the topics revolved around major regional events that the media reported, all the topics
reflected one major grievance, namely, the coalition of Arab countries’ blockade of Qatar. The
grievance was evident in the strong, angry language that the aggrieved users directed
against the coalition of Arab countries.
Practitioners, such as law enforcement authorities, should try to engage with the aggrieved
users online and, where possible, address their grievances, so as not to leave the door open for
radicals to use those grievances as a justification for radicalising them. The widespread adoption
of social media sites has completely changed the environment in which radicals operate.
Social media sites such as YouTube, Facebook, Twitter, Instagram and Snapchat have allowed
radicals to create their own channels to spread their messages and advance their agenda.
Addressing grievances online should be part of practitioners’ radicalisation prevention and
intervention strategies.
Together the machine learning, the data mining and the social network analyses offered
insights into the grievances in an Arabian context and the characteristics of those who
expressed these grievances. In the event that only one technique had been used, it would not
have been possible to cross check the observations made using one technique against the
observations made using another technique. This technique triangulation (Al-Saggaf and
Williamson, 2006) proved to be effective in obtaining deeper insights about the phenomenon
based on data sources supporting one another. Future studies that involve collecting and
analysing social media data could benefit from using machine learning, data mining and
social network analyses together. Technique triangulation, which is common in qualitative
research, could be one approach to establish the credibility of findings from studies
involving big data.
Individuals can become radicalised as a result of a personal grievance (Borum, 2011). This
study sheds light on what radical groups could invoke to win the sympathy of aggrieved
people in an Arabian context. Police and intelligence agencies are monitoring social media
platforms where radicalised individuals try to recruit others and are conscious of the
need to counter the extremist narrative on such platforms. As Upal (2015) notes, the efforts so
far have not been informed by a thorough understanding of the potential sympathisers’
identities, cultures and grievances. This study contributes to answering the calls for
research in this domain by utilising a framework for gathering and analysing grievance trends in
Twitter feeds.
References
Ackland, R. (2013), Web Social Science: Concepts, Data and Tools for Social Scientists in The Digital Age,
Sage, London.
Al-Saggaf, Y. (2016), “Understanding online radicalisation using data science”, International Journal of Cyber
Warfare and Terrorism, Vol. 6 No. 4, pp. 12-27.
Al-Saggaf, Y. and Chutikulrungsee, T.T. (2015), “Twitter usage in Australia and Saudi Arabia and
influence of culture: an exploratory cross-country comparison”, in Paterno, D., Bourk, M. and Matheson,
D. (Eds), Refereed proceedings of the Australian and New Zealand Communication Association
Conference: Rethinking Communication, Space and Identity, The Australian and New Zealand
Communication Association (ANZCA), Thirroul, pp. 1-12, available at: www.anzca.net/conferences/
past-conferences/
Al-Saggaf, Y. and Islam, Z. (2013), “A malicious use of a clustering algorithm to threaten the privacy of
a social networking site user”, World Journal of Computer Application and Technology, Vol. 1 No. 2,
pp. 29-34.
Al-Saggaf, Y. and Islam, Z. (2015), “Data mining and privacy of social network sites’ users: implications of the
data mining problem”, Science and Engineering Ethics, Vol. 21 No. 4, pp. 941-66.
Al-Saggaf, Y. and Williamson, K. (2006), “Doing ethnography from within a constructivist paradigm to explore
virtual communities in Saudi Arabia”, Qualitative Sociology Review, Vol. 2 No. 2, pp. 5-20.
Al-Saggaf, Y., Utz, S. and Lin, R. (2016), “Venting negative emotions on Twitter and the number of followers and
followees”, International Journal of Sociotechnology and Knowledge Development, Vol. 8 No. 1, pp. 45-56.
Al-Saggaf, Y., Williamson, K. and Weckert, J. (2002), “Online communities in Saudi Arabia: an ethnographic
study”, ACIS 2002 Proceedings, p. 62.
Aly, A. (2009), “Online radicalisation and the Muslim diaspora”, Proceedings of the Strategic Policy Forum,
Australian Strategic Policy Institute, Perth, 5 May, pp. 7-8.
Baaken, T. and Schlegel, L. (2017), “Fishermen or swarm dynamics? Should we understand jihadist
online-radicalization as a top-down or bottom-up process?”, Journal for Deradicalization, Vol. 2017 No. 13, pp. 178-212.
Bermingham, A., Conway, M., McInerney, L., O’Hare, N. and Smeaton, A.F. (2009), “Combining social
network analysis and sentiment analysis to explore the potential for online radicalisation: social network
analysis and mining”, IEEE International Conference on Advances in Social Network Analysis and Mining
(ASONAM’09), July, pp. 231-6.
Borum, R. (2011), “Radicalization into violent extremism I: a review of social science theories”, Journal of
Strategic Security, Vol. 4 No. 4, pp. 7-36, available at: http://doi.org/10.5038/1944-0472.4.4.1
Bradbury, R., Bossomaier, T. and Kernot, D. (2017), “Predicting the emergence of self-radicalisation through
social media: a complex systems approach”, in Conway, M., Jarvis, L., Lehane, O., Macdonald, S. and
Nouri, L. (Eds), Terrorists’Use of the Internet: Assessment and Response, IOS Press, Amsterdam,
pp. 379-89.
Del Vicario, M., Bessi, A., Zollo, F., Petroni, F., Scala, A., Caldarelli, G., Stanley, H.E. and Quattrociocchi, W.
(2016), The Spreading of Misinformation Online, PNAS Early Edition, National Academy of Sciences,
Washington, DC, pp. 1-6.
Edwards, C. and Gribbon, L. (2013), “Pathways to violent extremism in the digital era”, The RUSI Journal,
Vol. 158 No. 5, pp. 40-7.
Ewart, J., Cherney, A. and Murphy, K. (2017), “News media coverage of Islam and Muslims in Australia:
an opinion survey among Australian Muslims”, Journal of Muslim Minority Affairs, Vol. 37 No. 2,
pp. 147-63.
Graham, T. and Ackland, R. (2015), “Topic modeling of tweets in R: a tutorial and methodology”, available at:
www.academia.edu/19255535/Topic_Modeling_of_Tweets_in_R_A_Tutorial_and_Methodology (accessed
22 May 2019).
Grun, B. and Hornik, K. (2011), “Topicmodels: an R package for fitting topic models”, Journal of Statistical
Software, Vol. 40 No. 13, pp. 1-30.
Hofstede, G. (1997), Cultures and Organisations: Software of the Mind, McGraw-Hill, New York, NY.
Howes, C., Purver, M. and McCabe, R. (2013), “Investigating topic modelling for therapy dialogue analysis”,
Proceedings of IWCS Workshop on Computational Semantics in Clinical Text (CSCT), pp. 7-16.
Islam, M.Z. and Giggins, H. (2011), “Knowledge discovery through SysFor: a systematically developed forest
of multiple decision trees”, in Vamplew, P., Stranieri, A., Ong, K.-L., Christen, P. and Kennedy, P.J. (Eds),
Proceedings of the Ninth Australasian Data Mining Conference (AusDM 11) CRPIT, ACS, Ballarat, 1–2 December,
pp. 205-10.
Maikovich, A.K. (2005), “A new understanding of terrorism using cognitive dissonance principles”, Journal for
the Theory of Social Behaviour, Vol. 35 No. 4, pp. 373-97.
Moghaddam, F. (2005), “The staircase to terrorism: a psychological exploration”, American Psychologist,
Vol. 60 No. 2, pp. 161-9.
Rahman, M.A. and Islam, M.Z. (2018), “Application of a density based clustering technique on biomedical
datasets”, Applied Soft Computing, Vol. 73 No. 2018, pp. 623-34.
Rowe, M. and Saif, H. (2016), “Mining pro-ISIS radicalisation signals from social media users”, Proceedings of
the Tenth International AAAI Conference on Web and Social Media (ICWSM 2016), pp. 329-38.
Sabouni, S., Cullen, A. and Armitage, L. (2017), “A preliminary radicalisation framework based on social
engineering techniques”, IEEE 2017 International Conference on Cyber Situational Awareness, Data Analytics
and Assessment (Cyber SA), June, pp. 1-5.
Scanlon, J.R. and Gerber, M.S. (2014), “Automatic detection of cyber-recruitment by violent extremists”,
Security Informatics, Vol. 3 No. 1, pp. 1-10.
Scanlon, J.R. and Gerber, M.S. (2015), “Forecasting violent extremist cyber recruitment”, IEEE Transactions
on Information Forensics and Security, Vol. 10 No. 11, pp. 2461-70.
Schmidt, A. (2016), “Research on radicalisation: topics and themes”, Perspectives on Terrorism, Vol. 10 No. 3,
pp. 26-32.
Spencer, A. (2006), “Questioning the concept of ‘new terrorism’”, Peace Conflict & Development, Vol. 2006
No. 8, pp. 1-33.
SQLite (2019), “About”, available at: https://sqlite.org/about.html (accessed 22 May 2019).
Stephens, W., Sieckelinck, S. and Boutellier, H. (2019), “Preventing violent extremism: a review of the
literature”, Studies in Conflict & Terrorism, pp. 1-16, doi: 10.1080/1057610X.2018.1543144.
Torok, R. (2013), “Developing an explanatory model for the process of online radicalisation and terrorism”,
Security Informatics, Vol. 2 No. 1, pp. 1-10.
Torok, R. (2016), “Discourses of terrorism: the role of Internet technologies (social media and online
propaganda) on Islamic radicalisation, extremism and recruitment post 9/11”, unpublished PhD thesis, Edith
Cowan University.
Ungar, M., Hadfield, K., Amarasingam, A., Morgan, S. and Grossman, M. (2018), “The association between
discrimination and violence among Somali Canadian youth”, Journal of Ethnic and Migration Studies, Vol. 44
No. 13, pp. 2273-85.
Upal, M.A. (2015), “Confronting Islamic Jihadist movements”, Journal of Terrorism Research, Vol. 6 No. 2,
pp. 57-69.
Wadhwa, P. and Bhatia, M.P.S. (2013), “Tracking on-line radicalization using investigative data mining”,
IEEE 2013 National Conference on Communication (NCC), February, pp. 1-5.
Yusoufzai, K. and Emmerling, F. (2017), “Explaining violent radicalization in Western Muslims: a four factor
model”, Journal of Terrorism Research, Vol. 8 No. 1, pp. 68-80.
Further reading
Veldhuis, T. and Staun, J. (2009), Islamic Radicalisation: A Root Cause Model, Netherlands Institute of
International Relations Clingendael, The Hague, ISBN/EAN: 978-90-5031-146-5.
Corresponding author
Yeslam Al-Saggaf can be contacted at: yalsaggaf@csu.edu.au