CHAPTER 11
SEARCHING FOR EXTREMIST
CONTENT ONLINE USING THE
DARK CRAWLER AND SENTIMENT
ANALYSIS
Ryan Scrivens, Tiana Gaudette, Garth Davies and
Richard Frank
ABSTRACT
Purpose – This chapter examines how sentiment analysis and web-crawling
technology can be used to conduct large-scale data analyses of extremist
content online.
Methods/approach – The authors describe a customized web-crawler that was
developed for the purpose of collecting, classifying, and interpreting extremist
content online and on a large scale, followed by an overview of a relatively novel
machine learning tool, sentiment analysis, which has sparked the interest of
some researchers in the eld of terrorism and extremism studies. The authors
conclude with a discussion of what they believe is the future applicability of
sentiment analysis within the online political violence research domain.
Findings – In order to gain a broader understanding of online extremism,
or to improve the means by which researchers and practitioners “search for a
needle in a haystack,” the authors recommend that social scientists continue
to collaborate with computer scientists, combining sentiment analysis soft-
ware with other classication tools and research methods, as well as validate
sentiment analysis programs and adapt sentiment analysis software to new
and evolving radical online spaces.
Methods of Criminology and Criminal Justice Research
Sociology of Crime, Law and Deviance, Volume 24, 179–194
Copyright © 2019 by Emerald Publishing Limited
All rights of reproduction in any form reserved
ISSN: 1521-6136/doi:10.1108/S1521-613620190000024016
Originality/value – This chapter provides researchers and practitioners who are
faced with new challenges in detecting extremist content online with insights
regarding the applicability of a specic set of machine learning techniques and
research methods to conduct large-scale data analyses in the eld of terrorism
and extremism studies.
Keywords: Sentiment analysis; web-crawler; machine learning;
terrorism; extremism; internet
INTRODUCTION
Violent extremists and those who subscribe to radical beliefs have left their digital
footprints online since the inception of the World Wide Web. Notable examples
include Anders Breivik, the Norwegian far-right terrorist convicted of killing 77
people in 2011, who was a registered member of a white supremacy web forum
(Southern Poverty Law Center, 2014) and had ties to a far-right wing social media
site (Bartlett & Littler, 2011); Dylann Roof, the 21-year-old who murdered nine
Black parishioners in Charleston, South Carolina, in 2015, and who allegedly
posted messages on a white power website (Hankes, 2015); and Aaron Driver, the
Canadian suspected of planning a terrorist attack in 2016, who showed explicit
support for the so-called “Islamic State” (IS) on several social media platforms
(Amarasingam, 2016).
It should come as little surprise that, in an increasingly digital world, iden-
tifying signs of extremism online sits at the top of the priority list for counter-
extremist agencies (Cohen, Johansson, Kaati, & Mork, 2014), with the current
focus of government-funded research on the development of advanced informa-
tion technologies and risk assessment tools to identify and counter the threat of
violent extremism on the Internet (Sageman, 2014). Within this context, criminol-
ogists have argued that successfully identifying radical content online (i.e., behav-
iors, patterns, or processes), on a large scale, is the rst step in reacting to it (e.g.,
Bouchard, Joffres, & Frank, 2014; Davies, Bouchard, Wu, Joffres, & Frank, 2015;
Frank, Bouchard, Davies, & Mei, 2015; Mei & Frank, 2015; Williams & Burnap,
2015). Yet the number of individuals with access to the Internet is estimated to
have more than tripled (Internet World Stats, 2019), from over 1 billion users in
2005 to more than 3.8 billion as of 2019 (Internet Live Stats, 2019). With all of
these new users, more information has been generated, leading to a flood of data.
It is becoming increasingly difcult, nearly impossible really, to manually
search for violent extremists, potentially violent extremists, or even users who
post radical content online because the Internet contains an overwhelming
amount of information. These new conditions have necessitated guided data l-
tering methods, those that can side-step – and perhaps one day even replace – the
laborious manual methods that traditionally have been used to identify relevant
information online (Brynielsson et al., 2013; Cohen et al., 2014). As a result of
this changing landscape, governments around the globe have engaged researchers
to develop advanced information technologies, machine learning algorithms, and
risk assessment tools to identify and counter extremism through the collection
and analysis of large-scale data made available online (see Chen, Mao, Zhang, &
Leung, 2014). Whether this work involves nding radical users of interest (e.g.,
Klausen, Marks, & Zaman, 2018), measuring digital pathways of radicalization
(e.g., Hung, Jayasumana, & Bandara, 2016), or detecting virtual indicators that
may prevent future terrorist attacks (e.g., Johansson, Kaati, & Sahlgren, 2016),
the urgent need to pinpoint extremist content online is one of the most significant
challenges faced by law enforcement agencies and security officials worldwide
(Sageman, 2014).
We have been part of this growing eld of research at the International
CyberCrime Research Centre (ICCRC), situated in Simon Fraser University’s
School of Criminology.1 Our work has ranged from identifying radical users in
online discussion forums (e.g., Scrivens, Davies, & Frank, 2017) to understand-
ing terrorist organizations’ online recruitment efforts on various online platforms
(e.g., Davies et al., 2015), to evaluating linguistic patterns presented in the online
magazines of terrorist groups (e.g., Macnair & Frank, 2018a, 2018b). These expe-
riences have provided us with insights regarding the applicability of a specific
set of machine learning techniques and research methods to conduct large-scale
data analyses of extremist content online.2 In what follows, we will first describe
a customized web-crawler that was developed at the ICCRC for the purpose of
collecting, classifying, and interpreting extremist content online and on a large
scale. Second, we will provide an overview of a relatively novel machine learning
tool, sentiment analysis, which has sparked the interest of some researchers in
the eld of terrorism and extremism studies who are faced with new challenges in
detecting extremist content online. Third, we conclude with a discussion of what
we believe is the future applicability of sentiment analysis within the online politi-
cal violence research domain.
Before proceeding, however, it is necessary to outline how we conceptualize
extremist content online. We dene it as text-, audio-, and/or video-based online
material containing radical views – counter to mainstream opinion – that may or
may not promote violence in the name of a radical belief. At the ICCRC, we focus
primarily on text-based extremist content that has radical right-wing or jihadi
leanings. For the former, radical right-wing material is characterized by racially,
ethnically and sexually dened nationalism, which is typically framed in terms
of white power and grounded in xenophobic and exclusionary understandings
of the perceived threats posed by such groups as non-whites, Jews, immigrants,
homosexuals, and feminists (see Perry & Scrivens, 2016). For the latter, we dene
jihadi material as supportive of the creation of an expansionist Islamic state or
khalifa, the imposition of sharia law with violent jihad as a central component,
and the use of local, national, and international grievances affecting Muslims (see
Moghadam, 2008).
EXTRACTING EXTREMIST CONTENT ONLINE:
THE DARK CRAWLER
In recent years, researchers have shown a vested interest in developing web-
crawler tools to collect large volumes of content on the Internet. This interest has
made its way into terrorism and extremism studies in the past 10 plus years (e.g.,
Abbasi & Chen, 2005; Bouchard et al., 2014; Chen, 2012; Fu, Abbasi, & Chen,
2010; Zhang et al., 2010; Zhou, Qin, Lai, Reid, & Chen, 2005). Some research-
ers have used standard, off-the-shelf web-crawler tools that are readily available
online for a fee, while others have developed custom-written computer programs
to collect high volumes of information online. The Dark Crawler (TDC) is one
example of this latter approach.
Web-crawlers, also known as “crawlers,” “data scrapers,” and “data parsers,”
are the tools used by all search engines to automatically map and navigate the
Internet as well as collect information about each website and webpage that a
web-crawler visits. There are many off-the-shelf web-crawler solutions available
on the Internet, for purchase or for free, such as Win Web Crawler,3 WebSPHINX,4
Black Widow,5 or the BeautifulSoup parsing library.6 Once an end-user decides on a website from which
parsing will begin, the crawler recursively follows the links from that webpage
until some user-specied condition is met (discussed below), capturing all content
along the way. During this process, the software tracks all the links between it and
other websites and, if an end-user so chooses, the software will follow and retrieve
those links as well. The content, as it is retrieved, is then saved to the hard-drive of
the user for later analysis. In short, the purpose of most web-crawlers is to simply
save the retrieved content onto a hard-drive, essentially “ripping” a webpage or
website because it contains content desired by the end-user. More advanced ana-
lytic capabilities usually are not part of the package.
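To make the crawl-and-save loop concrete, below is a minimal sketch in Python using the requests and beautifulsoup4 libraries (the latter is mentioned above); the seed URL, page limit, and output directory are illustrative placeholders rather than features of any particular product.

```python
import os
from collections import deque
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup

def crawl(seed_url, max_pages=100, out_dir="pages"):
    """Breadth-first crawl: fetch a page, save its HTML, follow its links."""
    os.makedirs(out_dir, exist_ok=True)
    queue = deque([seed_url])
    visited = set()
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        if url in visited:
            continue
        visited.add(url)
        try:
            response = requests.get(url, timeout=10)
        except requests.RequestException:
            continue  # skip unreachable pages
        # Save the retrieved content to the hard-drive for later analysis.
        filename = os.path.join(out_dir, f"page_{len(visited):05d}.html")
        with open(filename, "w", encoding="utf-8") as f:
            f.write(response.text)
        # Recursively follow every hyperlink found on the page.
        soup = BeautifulSoup(response.text, "html.parser")
        for anchor in soup.find_all("a", href=True):
            link = urljoin(url, anchor["href"])
            if urlparse(link).scheme in ("http", "https"):
                queue.append(link)

crawl("https://example.org", max_pages=25)
```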
Similar in spirit, TDC browses the World Wide Web, but since it is a custom-
written computer program, it is much more exible than the abovementioned
process. In particular, TDC is capable of seeking out extremist content online,
among other types of content, based on user-dened keywords and other
parameters (discussed below). As TDC visits each page, it captures all of the
content on that page for later analysis, while simultaneously collecting informa-
tion about the content and making decisions as to whether the page includes
extremist content. The idea of this approach is based on a combination of the
work associated with the Dark Web project at the University of Arizona (see
Chen, 2012) and a previous project at the ICCRC that identied and explored
online child exploitation websites (e.g., Allsup, Thomas, Monk, Frank, &
Bouchard, 2015; Joffres, Bouchard, Frank, & Westlake, 2011; Frank, Westlake,
& Bouchard, 2010; Monk, Allsup, & Frank, 2015; Westlake & Bouchard, 2015;
Westlake, Bouchard, & Frank, 2011). TDC has since demonstrated its benefit
in investigating online networks and communities in general (e.g., Frank,
Macdonald, & Monk, 2016; Macdonald & Frank, 2016, 2017; Macdonald,
Frank, Mei, & Monk, 2015; Mikhaylov & Frank, 2016, 2018; Zulkarnine, Frank,
Monk, Mitchell, & Davies, 2016) and extremist content online in particular
(e.g., Bouchard et al., 2014; Davies et al., 2015; Frank et al., 2015; Levey,
Bouchard, Hashimi, Monk, & Frank, 2016; Mei & Frank, 2015; Scrivens et al.,
2017; Scrivens, Davies, & Frank, 2018; Scrivens & Frank, 2016; Wong, Frank,
& Allsup, 2015).
TDC is a system that can be distributed across multiple virtual machines,
depending on the number of machines that are available. The first of four steps,
as expressed in Fig. 1, is to define a task along with its parameters. TDC can
handle multiple tasks simultaneously, each of which is given a priority specified
by the end-user. The priority of each task determines the number of machines
allocated to it. For example, if tasks I, II, and III are given priority 50,
80, and 70, respectively, then task I will receive 25% of the available resources
(50/(50+80+70) = 0.25 = 25%). If more machines are added to TDC, then the
absolute number of resources available for each task will increase but the relative
amount of resources available for each task will remain the same.
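As a quick illustration of this proportional allocation (an assumed reconstruction for clarity, not TDC's actual code), each task's share is simply its priority divided by the sum of all priorities:

```python
def resource_shares(priorities):
    """Return each task's fraction of available machines, proportional to its priority."""
    total = sum(priorities.values())
    return {task: p / total for task, p in priorities.items()}

# Tasks I, II, and III with priorities 50, 80, and 70 receive 25%, 40%, and 35%.
print(resource_shares({"I": 50, "II": 80, "III": 70}))
```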
Each task consists of four parameters to prevent it from perpetually crawling
the Internet and wandering into websites and webpages unrelated to extremism.
The parameters are as follows:
Fig. 1. Overview of The Dark Crawler.
184 RYAN SCRIVENS ET AL.
Number of Webpages
For practical purposes, since the number of webpages on the Internet is innite,
restrictions must be placed on the number of webpages that are retrieved by the
web-crawler. Theoretically, any web-crawler could crawl for a very long time and
store the entire collection of webpages on the Internet. For our purposes at the
ICCRC, however, this is infeasible for several reasons. First, the amount of stor-
age that would be required to warehouse the extracted data is beyond the scope of
any sensible research project. Second, webpages are created at a much higher rate
than what can be extracted with TDC. Lastly, a copy of the “full Internet” is not
required to draw meaningful conclusions about a particular topic under investi-
gation; extracting large-scale, representative samples is more than adequate.
Number of Domains
The number of Internet domains that TDC will collect data on must be speci-
fied. When limiting a crawl to n pages, the crawler will attempt to distribute the
sampling equally across all websites that it has encountered, meaning that TDC
will sample similar numbers of pages from each of the sites it visits. As a result,
at the end of the task, if w websites are sampled, all sites will have approximately
the same number of pages retrieved (=n/w) and analyzed.
Trusted Domains
A set of trusted domains are then specied by the end-user, which tells the crawler
that all contents on those domains should not be extracted. As an example, it can
be assumed, with a high level of certainty, that a website such as www.microsoft.
com does not include extremist material. Having said that, TDC is trained to
assume that the website does not contain extremist content, and as such, it would
not retrieve any pages from that site. Without having this mechanism in place,
TDC could wander into a search engine, directing it completely off topic and
making the resulting extracted network irrelevant to the specied topic and task.
Keywords
The purpose of TDC is to nd, analyze, and map out the websites and web-
pages that include extremist content. To achieve this, TDC recursively retrieves
all webpages that are linked from the webpage it is currently reviewing. However,
since extremist webpages consist of a very small subset of all the webpages on
the Internet (Frank et al., 2015), it would be expected that, unconstrained, TDC
would very quickly start to retrieve webpages that are completely unrelated to
extremism. As a result, some mechanism must be built into TDC that controls
which webpages it uses in its exploration process. This is done through the use
of keywords, which are user-specied words that have been found to be indica-
tive of extremist content (e.g., Bouchard et al., 2014; Davies et al., 2015; Wong et
al., 2015) and thus indicate to TDC that the pages being retrieved are on-topic.
Within the extremist domain, such keywords could include gun, weapon, or terror.
To make this mechanism more robust, though, a word counter can be added
to TDC, which sets a minimum threshold on the number of keyword occurrences
that must exist on a page before TDC considers it on topic.
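Under the assumption that this test is a simple count of keyword hits against a threshold (the keyword list and threshold below are illustrative, not TDC's actual values), the on-topic check might look like this:

```python
import re

KEYWORDS = {"gun", "weapon", "terror"}  # user-specified, illustrative only
MIN_HITS = 3  # minimum keyword occurrences before a page counts as on-topic

def is_on_topic(page_text, keywords=KEYWORDS, min_hits=MIN_HITS):
    """Count keyword occurrences in the page text and compare against the threshold."""
    tokens = re.findall(r"[a-z']+", page_text.lower())
    hits = sum(1 for token in tokens if token in keywords)
    return hits >= min_hits

print(is_on_topic("The suspect bought a gun and another weapon."))  # False: only 2 hits
```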
Once a task has been decided, each webpage is downloaded by TDC (Fig. 1,
Step 2). If the downloaded page meets the parameters laid out above, then the
page is considered “on topic,” all page content is saved and all page links are fol-
lowed out of it recursively (Step 3). The webpage contents are then stored in the
database, and various reports and analyses can be performed (Step 4).
DATA MINING EXTREMIST CONTENT ONLINE:
SENTIMENT ANALYSIS
The use of keywords presents a useful rst step in identifying large-scale patterns
in extremist content online (e.g., Chalothorn & Ellman, 2012; Bouchard et al.,
2014; Davies et al., 2015; Wong et al., 2015). However, the use of single keywords
may lead to misleading interpretations of content (Mei & Frank, 2015; Scrivens
& Frank, 2016). If, for example, on a particular webpage, the words gun and con-
trol are found within close proximity of each other, it might be concluded that the
page is discussing gun control. Such a page would most likely not be extremist;
rather, it was probably written by a proponent or opponent of gun ownership,
and is therefore not relevant to TDC’s data collection. On the other hand, a page
containing the words gun and control could be discussing “controlling someone
with a gun” in the context of a kidnapping, for example, in which case TDC
should continue with its analysis. In other words, keywords can give an indication
of the content within a webpage but cannot be used to determine exactly what
that content is about. For a more complete understanding of a specific piece of
text, more powerful computational tools, such as sentiment analysis, are required.
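The ambiguity is easy to see in code. A naive proximity check (a hypothetical sketch, not part of TDC) flags both of the following pages, even though only one of them concerns gun-control policy:

```python
def within_proximity(text, word_a, word_b, window=3):
    """Check whether two words occur within `window` tokens of each other."""
    tokens = [t.strip(".,!?").lower() for t in text.split()]
    positions_a = [i for i, t in enumerate(tokens) if t == word_a]
    positions_b = [i for i, t in enumerate(tokens) if t == word_b]
    return any(abs(i - j) <= window for i in positions_a for j in positions_b)

# Both pages trigger the same keyword pair, despite very different meanings.
print(within_proximity("New gun control legislation was debated today", "gun", "control"))
print(within_proximity("He used the gun to control the hostages", "gun", "control"))
```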
Sentiment analysis, also known as “opinion mining,” is a category of comput-
ing science that specializes in evaluating the opinions found in a piece of text by
organizing data into distinct classes and sections, and then assigning a piece of
text with a positive, negative, or neutral polarity value (Abbasi & Chen, 2005).
Sentiment analysis also provides a more targeted view of textual data by allowing
for the demarcation between cases that are sought after and those without any
notable relevance. Sentiment analysis has been used in a wide variety of settings,
including customer review analysis for products (e.g., Feldman, 2013), assess-
ments of attitudes toward events or products on social media platforms (e.g.,
Ghiassi, Skinner, & Zimbra, 2013), and for various analyses of extremist content
online (e.g., Bermingham, Conway, McInerney, O’Hare, & Smeaton, 2009, Chen,
2008; Williams & Burnap, 2015). Sentiment analysis has become increasingly
popular in terrorism and extremism studies because, as the amount of “opinion-
ated data” online grows exponentially, sentiment analysis software offers a wide
range of applications that can help address previously untapped and challenging
research problems (see Liu, 2012). Based on the notion that an author’s opinion
toward a particular topic is reected in the choice and intensity of words he or
she chooses to communicate, sentiment analysis software allows for identication
186 RYAN SCRIVENS ET AL.
and classication of opinions found in a piece of text (Abbasi, Chen, & Salem,
2008). Typically, this process occurs through a two-step process that produces a
“polarity value”:
(1) a body of text is split into sections (sentences) to determine subjective and
objective content and
(2) subjective content is classied by the software as being either positive, neu-
tral, or negative, where positive scores reect positive attitudes and negative
scores reect negative attitudes (see Feldman, 2013).
Worth highlighting, though, is the fact that sentiment analysis is not without
its limitations. It is estimated, for example, that 21% of the time humans cannot
agree among themselves about the sentiment within a given piece of text (Ogneva,
2010), with some individuals unable to understand subtle context or irony.
Understandably, sentiment analysis systems cannot be expected to have 100%
accuracy when compared to the opinions of humans. Sentiment analysis does,
however, provide insight into authors’ expressions and reactions toward certain
events or actions, for example, and one sentiment analysis program that has been
widely used by criminologists in terrorism and extremism studies is SentiStrength
(e.g., Frank et al., 2015; Levey et al., 2016; Macnair & Frank, 2018a, 2018b; Mei &
Frank, 2015; Scrivens et al., 2017, 2018; Scrivens & Frank, 2016).
SentiStrength uses a “lexical approach” – which maintains that an essential
part of understanding a language rests on the ability to understand the patterns
of and meanings associated with language (see Lewis, 1993) – and is based on
human-coded data as well as various lexicons (i.e., dictionaries of phrases and
words). Although the program is designed for short informal online texts, there
are features in place that allow longer texts to be analyzed (Thelwall & Buckley,
2013). SentiStrength analyzes a text by attributing positive, neutral, or negative
values to words within the text, and these values are augmented by “booster
words” that can influence the values assigned to the text, as well as by negating
words, punctuation, and other features that are uniquely suited to studying an online
context (Thelwall & Buckley, 2013). One of the features of SentiStrength is its
ability to analyze the sentiment around any given keyword. For example, the
phrase “I love apples but hate oranges” can be analyzed for the sentiment around
apples (resulting in a positive outcome) as well as oranges (resulting in a negative
outcome). The words around a given set of keywords are compared to the senti-
ment dictionary and their resulting values make up the total sentiment score for
any given text. Logically, a negative value implies negative sentiment for the ideas
expressed in the text while a positive value implies overall support.
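A minimal sketch of this keyword-targeted scoring, under the simplifying assumption that sentiment is summed over a small window of words around each keyword occurrence (SentiStrength itself is a standalone tool with far richer dictionaries and rules, so the tiny lexicon here is purely illustrative):

```python
# A tiny illustrative lexicon; SentiStrength's real dictionaries are far larger
# and also handle boosters, negations, and punctuation.
LEXICON = {"love": 3, "great": 2, "hate": -3, "awful": -2}

def sentiment_around(text, keyword, window=1):
    """Sum lexicon values for words within `window` tokens of each keyword occurrence."""
    tokens = [t.strip(".,!?").lower() for t in text.split()]
    score = 0
    for i, token in enumerate(tokens):
        if token == keyword:
            neighborhood = tokens[max(0, i - window): i + window + 1]
            score += sum(LEXICON.get(t, 0) for t in neighborhood)
    return score

phrase = "I love apples but hate oranges"
print(sentiment_around(phrase, "apples"))   # 3: "love" sits next to "apples"
print(sentiment_around(phrase, "oranges"))  # -3: "hate" sits next to "oranges"
```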
To analyze a given piece of text for multiple keywords, multiple iterations must
be done, with each iteration consisting of an analysis of the same piece of text but
with a different keyword (resulting in the sentiment toward that keyword). Due
to the very specific nature of this procedure, it is necessary to input each form
of a particular word being analyzed. For example, to analyze sentiment around
the word kill, it is necessary to also analyze the words kills, killing, killed, and
all other derivatives of the word. Multiple iterations of SentiStrength are then
Searching for Extremist Content Online 187
applied, each one returning the detected sentiment around the specic keyword
and its derivatives. At the end of this process, each text is linked to multiple senti-
ment values, one for each keyword. Those multiple values are then averaged to
produce a single sentiment score for any given piece of text.
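Building on the sentiment_around sketch above, iterating over each keyword's derivative forms and averaging the per-keyword scores might look like this (the derivative lists are illustrative):

```python
# Illustrative derivative lists; per the procedure described above, every
# inflected form of a keyword must be supplied explicitly.
KEYWORD_FORMS = {
    "kill": ["kill", "kills", "killing", "killed"],
    "attack": ["attack", "attacks", "attacked", "attacking"],
}

def average_sentiment(text, keyword_forms=KEYWORD_FORMS, window=1):
    """Run one pass per keyword (covering all of its derivative forms), then
    average the per-keyword scores into a single value for the text."""
    per_keyword = [
        sum(sentiment_around(text, form, window) for form in forms)
        for forms in keyword_forms.values()
    ]
    return sum(per_keyword) / len(per_keyword)
```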
THE FUTURE OF DETECTING EXTREMIST
SENTIMENT ONLINE
There has been a shift in recent years in how researchers investigate online com-
munities, ranging from the study of how extremists communicate through social
media (e.g., Bermingham et al., 2009) to the analysis of users connecting through
online health forums (e.g., Wang, Kraut, & Levine, 2012), for example. In short,
researchers in this area are shifting from the manual identification of specific online
content to algorithmic techniques that perform similar yet larger-scale tasks. The use of
such analytical approaches is becoming increasingly apparent in criminology and
criminal justice research (see Hannah-Moffat, 2018). This is a symptom of what
some have described as the “big data” phenomenon – that is, a massive increase
in the amount of data that is readily available, particularly online (see Chen et
al., 2014).
Logically, a number of researchers who study how terrorists and extremists use
the Internet have turned to sentiment analysis and other machine learning tech-
niques to identify and, by extension, analyze content of interest on a large scale.
In what follows, we will discuss what we believe is the future of the applicability of
sentiment analysis to explore extremist content online, drawing from the recom-
mendations of the studies discussed below in combination with our own experience
with sentiment analysis. In short, we suggest that the future of sentiment analysis
in exploring extremist content online should: (1) encourage social scientists and
computer scientists to collaborate with one another; (2) consider a combination
of analyses or more features to increase classier’s effectiveness; (3) continue to
validate sentiment analysis programs; and (4) apply and adapt sentiment analysis
to new and evolving radical online spaces.
Collaborations
In order to gain a broader understanding of online extremism, or to improve the
means by which researchers and practitioners “search for a needle in a haystack,”
social scientists and computer scientists must collaborate with one another.
Historically, large-scale data analyses have been conducted by computer scien-
tists and technical experts, which can be problematic in complex elds such as
terrorism and extremism research. Computer and technical experts tend to take a
high-level methodological perspective, measuring levels of – or propensity toward –
radicalization, or ways of identifying violent extremists, or predicting the next
terrorist attack. But searching for radical material online without a fundamen-
tal understanding of the radicalization process or how terrorists and extremists
use the Internet can be counterproductive. Social scientists, on the other hand,
may be well-versed in terrorism and extremism research, but most tend to be
ill-equipped to manage large-scale data – from collecting to formatting to archiv-
ing large volumes of information. Bridging the computer science and social
science approaches to build on the strengths of each discipline offers perhaps
the best chance to construct a useful framework for detecting extremist content
online, as well as assisting authorities in addressing the threat of violent extrem-
ism as it evolves in the online milieu.
Combinations
A wealth of research shows that combining sentiment analysis with other meth-
ods and/or semantic-oriented approaches improves the detection of extremist
content online, on three fronts. This, we argue, is the future of detecting extremist
content online. First, research suggests that sentiment analysis software, in com-
bination with classification software, is an effective method to detect extremist
content online and on a large scale. In particular, combining sentiment analysis
with classification software – which identifies similarities and differences in data-
sets and makes determinations about the data in a decision tree format – can
pinpoint extremist websites (see Mei & Frank, 2015; see also Scrivens & Frank,
2016). In addition, combining sentiment analysis with affect analysis – a machine
learning technique that measures the emotional content of communications – can
detect and measure the intensity levels associated with a broad range of emotions
in text found in online forums (see Chen, 2008; see also Figea, Kaati, & Scrivens,
2016). Research similarly suggests that the effectiveness of sentiment analysis
in detecting extremist content can be significantly boosted with additional clas-
sification feature sets, such as syntactic, stylistic, content-specific, and lexicon
features (Abbasi & Chen, 2005; Abbasi et al., 2008; Yang et al., 2011).
Second, research suggests that combining sentiment analysis with commonly
used research methods and frameworks in criminology and criminal justice
research can aid in the detection of extremist content online. Social network anal-
ysis, for example, in combination with sentiment analysis can be used to identify
users on YouTube who may have had a radicalizing agenda (see Bermingham et
al., 2009), detect extremist content on Twitter (Wei, Singh, & Martin, 2016), and
model online propaganda (see Burnap et al., 2014) and cyberhate (see Williams &
Burnap, 2015) on Twitter following the Woolwich terrorism incident. In addition,
combining sentiment analysis with geolocation software can be used to organ-
ize the opinions found on Twitter accounts using hashtags associated with IS
(Mirani & Sasi, 2016). Lastly, sentiment analysis, in combination with an algo-
rithm that incorporates criminal career measures (i.e., volume, severity, and dura-
tion) developed by Blumstein, Cohen, Roth, and Visher (1986), has been used to
account for unique components of a user’s online posting behavior and detect
the most radical authors in a large-scale sample of online postings (see Scrivens
et al., 2017).
Third, in online political violence research, a growing emphasis has been placed
on the integration of a temporal component with sentiment analysis – a combi-
nation that we believe is key to providing insight into the patterns and trends
that characterize the development of extremist communities over time online.
Searching for Extremist Content Online 189
For example, combining sentiment analysis with survival analysis has been a use-
ful way to measure propaganda surges (see Burnap et al., 2014) and levels of
cyberhate (Williams & Burnap, 2015) on Twitter over time after a terrorist attack.
Other approaches that have proven to be effective in understanding temporal pat-
terns in radical online communities include: (1) mapping users’ sentiment and
affect changes in extremist forums (see Figea et al., 2016); (2) identifying signifi-
cant temporal spikes in extremist forums that coincide with real-world events (see
Park, Beck, Fletche, Lam, & Tsang, 2016); and (3) measuring the evolution of senti-
ment found in IS-produced online propaganda magazines (see Macnair & Frank,
2018b; see also Vergani & Bliuc, 2015). Most recently, combining sentiment
analysis with semi-parametric group-based modeling to measure the evolution of
radical posting trajectories has been shown to be an effective way to detect large-scale
temporal patterns in radical online communities (see Scrivens et al., 2018).
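As a simple illustration of adding a temporal component (a generic sketch, not any of the cited models), per-post sentiment scores can be binned by month to expose trends over time; the column names below are assumptions:

```python
import pandas as pd

# Assumed input: one row per forum post, with a timestamp and a sentiment score.
posts = pd.DataFrame({
    "posted_at": pd.to_datetime(["2015-01-03", "2015-01-17", "2015-02-08", "2015-02-21"]),
    "sentiment": [-1.5, -0.5, -2.0, -3.0],
})

# Average sentiment per calendar month can reveal temporal patterns,
# such as spikes that coincide with real-world events.
monthly = posts.set_index("posted_at")["sentiment"].resample("MS").mean()
print(monthly)
```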
Validation
Another common thread that binds together the aforementioned studies is the
need to assess and potentially improve the classication accuracy and content
identication offered by sentiment analysis software. Researchers, for example,
have proposed that future work includes a “comparative human evaluation”
component to validate a sentiment program’s classications (e.g., Chalothorn &
Ellman, 2012; Figea et al., 2016). Macnair and Frank (2018a) further added that:
computer-assisted techniques such as sentiment analysis, in conjunction with human oversight,
can aid in the overall process of locating, identifying, and eventually, countering the narratives
that exist within extremist media. (p. 452)
This technique would have humans rate the opinions in sentences and compare the
results to those of a sentiment analysis program. By comparing how a human clas-
sifies a piece of text with how a sentiment analysis program classifies it, researchers
can gain insight into the accuracy of the program’s classifications. Future studies should also
integrate a qualitative understanding of how machine learning tools in general
and sentiment analysis software in particular make decisions about the content
that the tools analyze. Doing so may increase the reliability of the results and
increase the likelihood of identifying radical content online (Scrivens & Frank,
2016).
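One common way to quantify such human-versus-program agreement (our illustration, not a method prescribed by the cited studies) is Cohen's kappa over matched labels:

```python
from sklearn.metrics import cohen_kappa_score

# Labels assigned to the same ten texts by a human coder and by a sentiment program.
human = ["pos", "neg", "neg", "neu", "pos", "neg", "pos", "neu", "neg", "neg"]
program = ["pos", "neg", "neu", "neu", "pos", "neg", "neg", "neu", "neg", "pos"]

# Kappa corrects raw agreement for the agreement expected by chance;
# values near 1 indicate strong agreement, values near 0 chance-level agreement.
print(cohen_kappa_score(human, program))
```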
Also, it is not yet clear which sentiment analysis program is the most accu-
rate or effective overall in detecting extremist content online. Some research
does, however, draw comparisons between the performance of several sentiment
methods (i.e., SentiWordNet, SASA, PANAS-t, Emoticons, SentiStrength, LIWC,
SenticNet, and Happiness Index) (see Gonçalves, Benevenuto, Araújo, & Cha,
2013), but comparisons of this sort have yet to be explored within the online
political violence domain. One notable exception is an exploration of the linguis-
tic patterns on Twitter following the Manchester and Las Vegas terrorist attacks
(see Kostakos, Nykänen, Martinviita, Pandya, & Oussalah,
2018). Comparing the results of NLTK+SentiWordNet and SentiStrength soft-
ware, the authors concluded that SentiStrength performed at a higher level than
the other applications. Building from that study, future work should continue to
explore and test the wide variety of programs currently available to determine if
there is indeed one ‘superior’ method, or if the appropriate methodology is con-
text-specific. A combination of sentiment analysis tools could also be integrated
into an analysis, in an attempt to cross-validate each other (Scrivens et al., 2017).
Adaptation
The ways in which violent extremists communicate online will continue to
evolve, shifting from traditional discussion forums and social media
platforms to lesser-known spaces on the Internet. For example, in addition to
the use of dedicated extreme right forums and all major social media platforms,
a diversity of more general online forums or forum-like online spaces are also
hosting increasing amounts of extreme right content. These include the popu-
lar social news aggregation, web content rating, and discussion site Reddit and
the image-based bulletin board and comment site 4chan (Scrivens & Conway, in
press). Sites such as these, which, contrary to most mainstream social media
platforms, do not have clear-cut anti-hate speech policies (see Gaudette, Davies,
& Scrivens, 2018), may provide unique insight into the expressions and mani-
festations of virtual hate, especially on a large scale using the sentiment analysis
tools outlined in this chapter. In addition, a new generation of right-wing
extremists are moving to more overtly hateful, yet to some extent more hidden
platforms, including the likes of 8chan, Voat, Gab, and Discord (see Davey &
Ebner, 2017). Similarly, between 2013 and 2016, IS’s media production out-
lets developed content that was largely distributed via major (and some minor)
social media and other online platforms. These included prominent IS presences
not only on Facebook, Twitter, and YouTube, but also on Ask.fm, JustPaste.it,
and the Internet Archive (Scrivens & Conway, in press). Having said that, as the
ways in which extremists communicate online will undoubtedly evolve, so too
must the ways in which researchers detect extremist content online. Sentiment
analysis software, in combination with other means of analyzing data, should
be applied to these increasingly popular platforms.
CONCLUSION
Since the advent of the Internet, violent extremists and those who subscribe
to radical views from across the globe have exploited online resources to build
transnational “virtual communities.” The Internet is a fundamental medium that
facilitates these radical communities, not only in “traditional” hate sites, web
forums, and commonly used social media sites, but in lesser known, oftentimes
more hidden spaces online as well (Scrivens & Conway, in press). Researchers and
practitioners have attempted to identify and monitor extremist content online but
increasingly have been overwhelmed by the sheer volume of data in these growing
spaces. Simply put, the manual analysis of online content has become increas-
ingly less feasible.
As a result, researchers and practitioners have sought to develop different
methods of extracting data, especially through the use of web-crawlers, as well
as different methods for managing this large-scale phenomenon to sift
through and detect extremist content. A relatively novel machine learning tool,
sentiment analysis, has sparked the interest of some researchers in the eld of
terrorism and extremism studies who face new challenges in detecting the spread
of extremist content online. Though this area of research is in its infancy, senti-
ment analysis is showing signs of success and may represent the future of how
researchers and practitioners study extremism online – particularly on a large
scale. This, however, will require that social scientists continue to collabo-
rate with computer scientists, combining sentiment analysis software with
other classification tools and research methods, validating sentiment analysis
programs, and adapting sentiment analysis software to new and evolving radical
online spaces.
NOTES
1. For more information, see https://www.sfu.ca/iccrc.html.
2. An array of machine learning techniques is used in the online political violence
research terrain that are not discussed in detail in this chapter. For a list of some of these
techniques, see Chen (2012).
3. For more information, see http://www.winwebcrawler.com.
4. For more information, see http://www.cs.cmu.edu/rcm/websphinx.
5. For more information, see http://sbl.net.
6. For more information, see https://pypi.org/project/beautifulsoup4/.
REFERENCES
Abbasi, A., & Chen, H. (2005). Applying authorship analysis to extremist-group web forum messages.
Intelligent Systems, 20(5), 67–75.
Abbasi, A., Chen, H., & Salem, A. (2008). Sentiment analysis in multiple languages: Feature selec-
tion for opinion classication in web forums. ACM Transactions on Information Systems, 26(3),
1–34.
Allsup, R., Thomas, E., Monk, B., Frank, R., & Bouchard, M. (2015). Networking in child exploi-
tation – Assessing disruption strategies using registrant information. In Proceedings of the
2015 IEEE/ACM international conference on advances in social networks analysis and mining
(ASONAM), Paris, France (pp. 400–407).
Amarasingam, A. (2016). What Aaron told me: An expert on extremism shares his conversations with
the terror suspect. National Post, August 11. Retrieved from https://nationalpost.com/news/
canada/what-aaron-told-me-an-expert-on-extremism-shares-his-conversations-with-the-terror-
suspect. Accessed on January 23, 2019.
Bartlett, J., & Littler, M. (2011). Inside the EDL: Populist politics in the digital era. London: Demos.
Bermingham, A., Conway, M., McInerney, L., O’Hare, N., & Smeaton, A. F. (2009). Combining social
network analysis and sentiment analysis to explore the potential for online radicalisation. In
Proceedings of the 2009 international conference on advances in social network analysis mining
(ASONAM), Athens, Greece (pp. 231–236).
Blumstein, A., Cohen, J., Roth, J. A., & Visher, C. A. (1986). Criminal careers and ‘career criminals.’
Washington, DC: National Academy Press.
Bouchard, M., Joffres, K., & Frank, R. (2014). Preliminary analytical considerations in designing
a terrorism and extremism online network extractor. In V. Mago & V. Dabbaghian (Eds.),
Computational models of complex systems (pp. 171–184). New York, NY: Springer.
Brynielsson, J., Horndahl, A., Johansson, F., Kaati, L., Martenson, C., & Svenson, P. (2013). Analysis
of weak signals for detecting lone wolf terrorists. Security Informatics, 2(11), 1–15.
Burnap, P., Williams, M. L., Sloan, L., Rana, O., Housley, W., Edwards, A., …, Voss, A. (2014).
Tweeting the terror: Modelling the social media reaction to the Woolwich terrorist attack.
Social Network Analysis and Mining, 4, 1–14.
Chalothorn, T., & Ellman, J. (2012). Using SentiWordNet and sentiment analysis for detecting radical
content on web forums. In Proceedings of the 6th conference on software, knowledge, information
management and application (SKIMA), Chengdu, China (pp. 9–11).
Chen, H. (2008). Sentiment and affect analysis of dark web forums: Measuring radicalization on the
Internet. In Proceedings of the 2008 IEEE international conference on intelligence and security
informatics (ISI), Taipei, Taiwan (pp. 104–109).
Chen, H. (2012). Dark web: Exploring and data mining the dark side of the web. New York, NY: Springer.
Chen, M., Mao, S., Zhang, Y., & Leung, V. C. (2014). Big data: Related technologies, challenges and
future prospects. New York, NY: Springer.
Cohen, K., Johansson, F., Kaati, L., & Mork, J. (2014). Detecting linguistic markers for radical vio-
lence in social media. Terrorism and Political Violence, 26(1), 246–256.
Davey, J., & Ebner, J. (2017). The fringe insurgency: Connectivity, convergence and mainstreaming of the
extreme right. London: Institute for Strategic Dialogue.
Davies, G., Bouchard, M., Wu, E., Joffres, K., & Frank, R. (2015). Terrorist and extremist organiza-
tions’ use of the Internet for recruitment. In M. Bouchard (Ed.), Social networks, terrorism and
counter-terrorism: Radical and connected (pp. 105–127). New York, NY: Routledge.
Feldman, R. (2013). Techniques and applications for sentiment analysis. Communications of the ACM,
56(4), 82–89.
Figea, L., Kaati, L., & Scrivens, R. (2016). Measuring online affects in a white supremacy forum. In
Proceedings of the 2016 IEEE international conference on intelligence and security informatics
(ISI), Tucson, Arizona, USA (pp. 85–90).
Frank, R., Bouchard, M., Davies, G., & Mei, J. (2015). Spreading the message digitally: A look into
extremist content on the Internet. In R. G. Smith, R. C.-C. Cheung, & L. Y.-C. Lau (Eds.),
Cybercrime risks and responses: Eastern and western perspectives (pp. 130–145). London:
Palgrave Macmillan.
Frank, R., Macdonald, M., & Monk, B. (2016). Location, location, location: Mapping potential
Canadian targets in online hacker discussion forums. In Proceedings of the 2016 European intel-
ligence and security informatics conference (EISIC), Uppsala, Sweden (pp. 16–23).
Frank, R., Westlake, B. G., & Bouchard, M. (2010). The structure and content of online child exploita-
tion networks. In Proceedings of the 10th ACM SIGKDD workshop on intelligence and security
informatics (ISI-KDD), Washington, DC, USA, Article 3.
Fu, T., Abbasi, A., & Chen, H. (2010). A focused crawler for dark web forums. Journal of American
Society for Information Science and Technology, 61(6), 1213–1231.
Gaudette, T., Davies, G., & Scrivens, R. (2018). Upvoting extremism, part I: An assessment of extreme
right discourse on Reddit. VOX-Pol Network of Excellence Blog. Retrieved from https://www.
voxpol.eu/upvoting-extremism-part-i-an-assessment-of-extreme-right-discourse-on-reddit.
Accessed on January 23, 2019.
Ghiassi, M., Skinner, J., & Zimbra, D. (2013). Twitter brand sentiment analysis: A hybrid system using
n-gram analysis and dynamic articial neutral network. Expert Systems with Applications,
40(16), 6266–6282.
Gonçalves, P., Benevenuto, F., Araújo, M., & Cha, M. (2013). Comparing and combining sentiment
analysis methods. In Proceedings of the 1st ACM conference on online social networks, Boston,
MA, USA (pp. 27–38).
Hankes, K. (2015). Dylann Roof may have been a regular commenter at neo-Nazi website The
Daily Stormer. Southern Poverty Law Center. Retrieved from http://www.splcenter.org/
blog/2015/06/22/dylann-roof-may-have-been-a-regular-commenter-at-neo-nazi-website-the-
daily-stormer. Accessed on January 23, 2019.
Hannah-Moffat, K. (2018). Algorithmic risk governance: big data analytics, race and information
activism in criminal justice debates. Theoretical Criminology.
Hung, B. W. K., Jayasumana, A. P., & Bandara, V. W. (2016). Detecting radicalization trajectories using
graph pattern matching algorithms. In Proceedings of the 2016 IEEE international conference on
intelligence and security informatics (ISI), Tucson, Arizona, USA (pp. 313–315).
Internet Live Stats. (2019). Total number of websites. Retrieved from http://www.internetlivestats.com/
total-number-of-websites. Accessed on January 23, 2019.
Internet World Stats. (2019). Internet growth statistics. Retrieved from http://www.internetworldstats.
com/emarketing.htm. Accessed on January 23, 2019.
Joffres, K., Bouchard, M., Frank, R., & Westlake, B. G. (2011). Strategies to disrupt online child por-
nography networks. In Proceedings of the European intelligence and security informatics confer-
ence (EISIC), Athens, Greece (pp. 163–170).
Johansson, J., Kaati, L., & Sahlgren, M. (2016). Detecting linguistic markers of violent extremism
in online environments. In M. Khader, L. S. Neo, G. Ong, E. T. Mingyi, & J. Chin (Eds.),
Combating violent extremism and radicalization in the digital era (pp. 374–390). Hershey, PA:
Information Science Reference.
Klausen, J., Marks, C. E., & Zaman, T. (2018). Finding extremists in online social networks. Operations
Research, 66(4), 957–976.
Kostakos, P., Nykänen, M., Martinviita, M., Pandya, A., & Oussalah, M. (2018). Meta-terrorism: iden-
tifying linguistic patterns in public discourse after an attack. In Proceedings of the 2018 IEEE/
ACM international conference on advances in social networks analysis and mining (ASONAM),
Barcelona, Spain (pp. 1079–1083).
Levey, P., Bouchard, M., Hashimi, S., Monk, B., & Frank, R. (2016). The emergence of violent narra-
tives in the life-course trajectories of online forum participants. Canadian Network for Research
on Terrorism, Security and Society Report, Waterloo, ON, Canada.
Lewis, M. (1993). The lexical approach: The state of ELT and the way forward. Hove: Language
Teaching Publications.
Liu, B. (2012). Sentiment analysis and opinion mining. San Rafael, CA: Morgan and Claypool.
Macdonald, M., & Frank, R. (2016). The network structure of malware development, deployment and
distribution. Global Crime, 18(1), 49–69.
Macdonald, M., & Frank, R. (2017). Shufe up and deal: Use of a capture–recapture method to esti-
mate the size of stolen data markets. American Behavioral Scientist, 61(11), 1313–1340.
Macdonald, M., Frank, R., Mei, J., & Monk, B. (2015). Identifying digital threats in a hacker web
forum. In Proceedings of the 2015 international symposium on foundations of open source intel-
ligence and security informatics (FOSINT), Paris, France (pp. 926–933).
Macnair, L., & Frank, R. (2018a). The mediums and the messages: Exploring the language of Islamic
State media through sentiment analysis. Critical Studies on Terrorism, 11(3), 438–457.
Macnair, L., & Frank, R. (2018b). Changes and stabilities in the language of Islamic State magazines:
A sentiment analysis. Dynamics of Asymmetric Conict, 11(2), 109–120.
Mei, J., & Frank, R. (2015). Sentiment crawling: Extremist content collection through a sentiment
analysis guided web-crawler. In Proceedings of the international symposium on foundations of
open source intelligence and security informatics (FOSINT), Paris, France (pp. 1024–1027).
Mikhaylov, A., & Frank, R. (2016). Cards, money and two hacking forums: An analysis of online
money laundering schemes. In Proceedings of the 2016 European intelligence and security infor-
matics conference (EISIC), Uppsala, Sweden (pp. 80–83).
Mikhaylov, A., & Frank, R. (2018). Illicit payments for illicit goods: Noncontact drug distribution on
Russian online drug marketplaces. Global Crime, 19(2), 146–170.
Mirani, T. B., & Sasi, S. (2016). Sentiment analysis of ISIS related tweets using absolute location. In
Proceedings of the 2016 international conference on computational science and computational
intelligence (CSCI), Las Vegas, NV, USA (pp. 1140–1145).
Monk, B., Allsup, R., & Frank, R. (2015). LECENing places to hide: Geo-mapping child exploitation
material. In Proceedings of the 2015 IEEE intelligence and security informatics (ISI), Baltimore,
MD, USA (pp. 73–78).
Moghadam, A. (2008). The Sala-jihad as a religious ideology. CTC Sentinel, 1(3), 14–16.
Ogneva, M. (2010). How companies can use sentiment analysis to improve their business. Retrieved
from http://mashable.com/2010/04/19/sentiment-analysis. Accessed on January 23, 2019.
Park, A. J., Beck, B., Fletche, D., Lam, P., & Tsang, H. H. (2016). Temporal analysis of radical dark
web forum users. In Proceedings of the 2016 IEEE/ACM international conference on advances
in social networks analysis and mining (ASONAM), San Francisco, CA, USA (pp. 880–883).
Perry, B., & Scrivens, R. (2016). Uneasy alliances: A look at the right-wing extremist movement in
Canada. Studies in Conict and Terrorism, 39(9), 819–841.
Sageman, M. (2014). The stagnation in terrorism research. Terrorism and Political Violence, 26(4),
565–580.
Scrivens, R., & Conway, M. (in press). The roles of ‘old’ and ‘new’ media tools and technologies in
the facilitation of violent extremism and terrorism. In R. Leukfeldt & T. J. Holt (Eds.),
Cybercrime: The Human Factor. New York, NY: Routledge.
Scrivens, R., Davies, G., & Frank, R. (2017). Searching for signs of extremism on the web: An introduc-
tion to sentiment-based identication of radical authors. Behavioral Sciences of Terrorism and
Political Aggression, 10(1), 39–59.
Scrivens, R., Davies, G., & Frank, R. (2018). Measuring the evolution of radical right-wing posting
behaviors online. Deviant Behavior.
Scrivens, R., & Frank, R. (2016). Sentiment-based classication of radical text on the web. In
Proceedings of the 2016 European intelligence and security informatics conference (EISIC),
Uppsala, Sweden (pp. 104–107).
Southern Poverty Law Center. (2014). White homicide worldwide. Retrieved from https://www.splcenter.org/20140331/white-homicide-worldwide. Accessed on January 23, 2019.
Thelwall, M., & Buckley, K. (2013). Topic-based sentiment analysis for the social web: The role of
mood and issue-related words. Journal of the American Society for Information Science and
Technology, 64(8), 1608–1617.
Vergani, M., & Bliuc, A.-M. (2015). The evolution of the ISIS’ language: A quantitative analysis of the
language of the first year of Dabiq magazine. Security, Terrorism, and Society, 2, 7–20.
Wang, Y-C., Kraut, R., & Levine, J. M. (2012). To stay or leave? The relationship of emotional and
informational support to commitment in online health support groups. In Proceedings of the
ACM 2012 conference on computer supported cooperative work, Seattle, WA, USA (pp. 833–842).
Wei, Y., Singh, L., & Martin, S. (2016). Identication of extremism on Twitter. In Proceedings of the
2016 IEEE/ACM international conference on advances in social networks analysis and mining
(ASONAM), San Francisco, CA, USA (pp. 1251–1255).
Westlake, B. G., & Bouchard, M. (2015). Criminal careers in cyberspace: Examining website failure
within child exploitation networks. Justice Quarterly, 33(7), 1154–1181.
Westlake, B. G., Bouchard, M., & Frank, R. (2011). Finding the key players in online child exploitation
networks. Policy and Internet, 3(2), 1–25.
Williams, M. L., & Burnap, P. (2015). Cyberhate on social media in the aftermath of Woolwich: A
case study in computational criminology and big data. British Journal of Criminology, 56(2),
211–238.
Wong, M., Frank, R., & Allsup, R. (2015). The supremacy of online white supremacists – An analysis
of online discussions of white supremacists. Information and Communications Technology Law,
24(1), 41–73.
Yang, M., Kiang, M., Ku, Y., Chiu, C., & Li, Y. (2011). Social media analytics for radical opinion
mining in hate group web forums. Journal of Homeland Security and Emergency Management,
8(1), 1547–7355.
Zhang, Y., Zeng, S., Huang, C.-N., Fan, L., Yu, X., Dang, Y., …, Chen, H. (2010). Developing a dark
web collection and infrastructure for computational and social sciences. In Proceedings of the
2010 IEEE international conference on intelligence and security informatics (ISI), Atlanta, GA,
USA (pp. 59–64).
Zhou, Y., Qin, J., Lai, G., Reid, E., & Chen, H. (2005). Building knowledge management system for
researching terrorist groups on the web. In Proceedings of the 11th Americas conference on infor-
mation systems (AMCIS), Omaha, NE, USA (pp. 2524–2536).
Zulkarnine, A., Frank, R., Monk, B., Mitchell, J., & Davies, G. (2016). Surfacing collaborated networks
in dark web to nd illicit and criminal content. In Proceedings of the 2016 IEEE international
conference on intelligence and security informatics (ISI), Tucson, AZ, USA (pp. 109–114).
... Doing so enabled us to develop theoretical propositions that explain how individuals behave online once they have moved "up" the radicalization "staircase" (Moghaddam, 2005), and passed the psychological tipping point for engaging in extremist action. Thus, our research aimed to contribute toward developing theoretical propositions to help identify the "needle in the haystack" and extend the possibilities of using digital trace data to prevent terrorist attacks (Macdonald et al., 2019;Quiggin, 2017;Sageman, 2011;Scrivens, 2021;Scrivens et al., 2019). ...
... The perpetrator of the Christchurch terrorist attack in New Zealand shared his plans on Twitter and 4Chan in the days and hours preceding the attack, before livestreaming his actions on Facebook (Macklin, 2019). While these incidents are indicative of the challenges associated with efforts to monitor online material, they also highlight the potential of using data from the online communications of terror actors to examine online signals of mobilization and further understand this process (Scrivens et al., 2019). This is especially pertinent in the case of right-wing terrorist attacks, in which individuals tend to act alone, using the internet in their preparation and planning, but rarely display the offline behaviors that have historically been indicative of future violence (e.g., coordinated planning across group members, reconnaissance activities; Gill, 2015). ...
Article
Full-text available
Psychological theories of mobilization tend to focus on explaining people's motivations for action, rather than mobilization ("activation") processes. To investigate the online behaviors associated with mobilization, we compared the online communications data of 26 people who subsequently mobilized to right-wing extremist action and 48 people who held similar extremist views but did not mobilize (N = 119,473 social media posts). In a three-part analysis, involving content analysis (Part 1), topic modeling (Part 2), and machine learning (Part 3), we showed that communicating ideological or hateful content was not related to mobilization, but rather mobilization was positively related to talking about violent action, operational planning, and logistics. Our findings imply that to explain mobilization to extremist action, rather than the motivations for action, theories of collective action should extend beyond how individuals express grievances and anger, to how they equip themselves with the "know-how" and capability to act.
... The organizational structure of extremist groups has also been examined to identify variations in their practices, particularly the ways their relationships change with the growth of the Internet and social media platforms (Asal & Rethemeyer, 2008;Chermak et al., 2013;Hsu et al., 2018;Pantucci, 2011). Several studies have also explored the ways that the process of radicalization to ideological extremism occurs in both off-and online spaces Gill et al., 2014;Hegghammer, 2013;Pantucci, 2011;Scrivens et al., 2019Scrivens et al., , 2023Weimann, 2011). ...
Article
There has been a dramatic increase in research on terrorism and extremist activities over the last two decades. Despite this growth, the majority of studies focus on either the harm caused by ideologically-motivated violence in physical spaces, or the ways in which individuals radicalize and organize in online spaces. There is growing evidence that traditional extremist groups and terrorists engage in cyberattacks, such as computer hacking, in support of their ideological beliefs. Little is known about the degree to which ideologically-motivated cyberattacks cause harm to victims, and the correlates of harm depending on the nature of the attack. This study attempts to address this gap in the literature through a quantitative analysis of 425 victims of 246 cyberattacks captured in the open-source Extremist CyberCrime Database (ECCD). Using situational crime prevention, this analysis attempts to identify the significant factors associated with the loss of time, data, and financial harm experienced by victims of cyberattacks performed by ideological actors with and without state sponsorship. The findings demonstrate that the forms of attack reported, as well as the unique attack methods, such as zero-day vulnerabilities, are more likely to lead victims to report the loss of time to the victim, as well as sensitive data and financial losses. The target type is also associated with the loss of both time and sensitive data, however there is no relationship between targets and the financial losses reported from cyberattacks. Additionally, financial harm was more likely to result from non-state sponsored ideological actors, such as racial and ethnically motivated individuals and jihadists. This analysis demonstrates support for the application of situational crime prevention frameworks traditionally used for physical terrorism to virtual ideological attacks. Further, this study demonstrates the importance of assessing cyberattacks as a form of ideologically-motivated crime. Finally, the findings demonstrate the need for increased resources to improve the state of cybersecurity for individuals, businesses, and government agencies to reduce the risk of harm associated with cyberattacks performed by both nation-state sponsored and non-state ideological actors alike.
... Their proposed model is one of the first models based on deep neural networks to process textual data on the dark Web. There are also some studies that proposed tools to support the collection of specific information, such as a focused crawler by Iliou et al. [30], new crawling frameworks for Tor by Zhang et al. [14], the collection of extremist content using The Dark Crawler and sentiment analysis by Scrivens et al. [59], and advanced crawling and indexing systems like LIGHTS by Ghosh et al. [60]. ...
Article
Full-text available
Content on the World Wide Web that is not indexable by standard search engines defines a category called the deep Web. Dark networks are a subset of the deep Web. They provide services of great interest to users who seek online anonymity during their search on the Internet. Tor is the most widely used dark network around the world. It requires unique application-layer protocols and authorization schemes to access. The present evidence reveals that, in spite of great efforts to investigate Tor, our understanding is limited to work on either the information or the structure of this network. Moreover, the interplay between information and structure, which plays an important role in evaluating socio-technical systems such as Tor, has not been given the attention it deserves. In this article, we review and classify the present work on Tor to improve our understanding of this network and shed light on new directions for evaluating Tor. The related work can be categorized into proposals that (1) study security and privacy on Tor, (2) characterize Tor's structure, (3) evaluate the information hosted on Tor, and (4) review the related work on Tor from 2014 to the present.
... Data collection and sampling efforts proceeded in four stages. First, a custom-written computer program that was designed to collect vast amounts of information online captured all open-source content on Stormfront Canada, which resulted in approximately 125,000 sub-forum posts made by approximately 7,000 authors between September 12, 2001 and October 12, 2016. Second, to pinpoint users in the sub-forum who were violent or non-violent RWEs offline, a former violent extremist voluntarily reviewed a list of 7,000 users who posted in the sub-forum and selected those who matched one of the two user types. ...
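As a rough illustration of what the first stage involves, a minimal forum crawler in Python is sketched below. The Dark Crawler itself is a far more elaborate system; the URL pattern and CSS selectors here are hypothetical placeholders for whatever forum is being archived.

import time
import requests
from bs4 import BeautifulSoup

BASE = "https://forum.example.com/subforum?page={}"  # hypothetical URL pattern

def crawl(pages):
    posts = []
    for page in range(1, pages + 1):
        html = requests.get(BASE.format(page), timeout=30).text
        soup = BeautifulSoup(html, "html.parser")
        for div in soup.select("div.post"):  # hypothetical selector
            posts.append({
                "author": div.select_one(".author").get_text(strip=True),
                "date": div.select_one(".date").get_text(strip=True),
                "text": div.select_one(".body").get_text(strip=True),
            })
        time.sleep(1.0)  # be polite to the server between requests
    return posts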
Article
Full-text available
Despite the ongoing need for practitioners to identify violent extremists online before their engagement in violence offline, little is empirically known about their digital footprints in general, or about how their posting behaviors differ from those of their non-violent counterparts in particular, especially on high-frequency posting days. Content analysis was used to examine postings from a unique sample of violent and non-violent right-wing extremists, as well as from a sample of postings within a sub-forum of the largest white supremacy forum, during peak and non-peak posting days for comparison purposes. Several noteworthy posting behaviors were identified that may assist in identifying credible threats online.
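One simple way to operationalize the peak versus non-peak comparison is to count posts per calendar day and flag days above a high percentile, as in the pandas sketch below. The column names and the threshold are assumptions for illustration, not the authors' procedure.

import pandas as pd

posts = pd.DataFrame({
    "author": ["a", "a", "b", "a", "c"],  # hypothetical posts
    "timestamp": pd.to_datetime([
        "2016-01-01 09:00", "2016-01-01 11:30", "2016-01-01 12:15",
        "2016-01-02 08:00", "2016-01-03 20:45",
    ]),
})

# Count posts per calendar day, then flag days in the top 5% as "peak" days.
daily = posts.groupby(posts["timestamp"].dt.date).size()
peak_days = daily[daily >= daily.quantile(0.95)].index
print("peak posting days:", list(peak_days))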
... Research concerning the creation of automated web crawlers to collect data from extremist websites (Elovici et al., 2010; Scrivens et al., 2019) is also considered in these works. ...
Article
Full-text available
In recent years, there has been a noticeable increase in both individuals and organizations utilizing social networks for illicit purposes. This trend can be viewed as a potential threat to national security. In this article, the authors examine how various extremist organizations use social networks in their activities, and offer LSTM-based models for classifying extremist texts in Kazakh on web resources. The main purpose of the article is to classify Kazakh texts in social networks into extremist and non-extremist classes. The authors employed techniques such as Tf-Idf, Word2Vec, Bag of Words (BoW), and n-grams in their experiments. A list of extremist keywords in the Kazakh language and, accordingly, a corpus of extremist texts in the Kazakh language were created for training and testing machine learning methods. As a result, the authors introduced a model that demonstrated superior performance across all evaluation metrics for detecting extremist texts in the Kazakh language. The theoretical significance of this study lies in its comprehensive exploration of methods and algorithms for detecting extremist activities and organizations. The foundational findings derived from this research can contribute valuable insights to the global scientific community. The practical implications include a developed methodology that can be utilized by authorized entities to enhance information security, safeguard critical infrastructure, and combat online extremism.
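To make the modeling step concrete, here is a minimal sketch of an LSTM text classifier of the kind described, written with Keras. The two-line corpus is a placeholder; the actual study trained on a purpose-built corpus of Kazakh extremist and non-extremist texts.

import numpy as np
from tensorflow.keras.layers import Dense, Embedding, LSTM
from tensorflow.keras.models import Sequential
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.preprocessing.text import Tokenizer

# Placeholder corpus: 1 = extremist, 0 = non-extremist.
texts = ["placeholder extremist text", "placeholder neutral text"] * 8
labels = np.array([1, 0] * 8)

# Integer-encode tokens and pad every document to a fixed length.
tok = Tokenizer(num_words=5000)
tok.fit_on_texts(texts)
X = pad_sequences(tok.texts_to_sequences(texts), maxlen=50)

model = Sequential([
    Embedding(input_dim=5000, output_dim=64),
    LSTM(32),
    Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, labels, epochs=3, verbose=0)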
... Data collection and sampling efforts proceeded in two stages. First, all open-source content on Stormfront Canada was captured using a custom-written computer program that was designed to collect vast amounts of information online (for more information on the web-crawler, see Scrivens et al., 2019). In total, the web-crawler extracted approximately 125,000 sub-forum posts made by approximately 7,000 authors between 12 September 2001 and 29 October 2017. ...
Article
Full-text available
Little is known about the online behaviors of violent extremists generally, or about differences compared to non-violent extremists who share their ideological beliefs. Even less is known about desistance from posting behavior. A sample of 99 violent and non-violent right-wing extremists was analyzed to compare their online patterns of desistance within a sub-forum of the largest white supremacy web-forum. A probabilistic model of desistance was tested to determine the validity of the criteria set for users reaching posting desistance. Findings indicated that the criteria predicted “true” desistance, with 5% misidentification. Each consecutive month without posting in the sub-forum resulted in a 7.6% increase in the odds of posting desistance. There were no significant differences in effects for violent versus non-violent users, though statistical power was low.
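Because an odds ratio per silent month compounds multiplicatively, the reported 7.6% effect grows quickly over a long gap. A toy calculation of that compounding, using a hypothetical baseline probability (the study does not report one here):

# Hypothetical baseline probability of desistance, for illustration only.
base_p = 0.10
base_odds = base_p / (1 - base_p)

for months in (0, 6, 12, 24):
    # Each silent month multiplies the odds by 1.076 (the reported effect).
    odds = base_odds * 1.076 ** months
    print(f"{months:2d} silent months -> P(desistance) = {odds / (1 + odds):.2f}")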
... Articles were included if they used machine learning to develop a predictive model that could be applied to addressing or preventing violent extremism. Therefore, articles were excluded if they focused on other aspects of artificial intelligence that did not result in the development of a predictive model, such as the development of Web crawlers [49]. Articles were also excluded if they were only peripherally related to violent extremism. ...
Article
Full-text available
The purpose of this scoping review is to highlight the machine learning tools used in research to address and prevent violent extremism. To achieve this goal, the following objectives guide this study: (1) describe outcomes that have been studied; (2) summarize the data sources used; and, (3) determine whether the reporting of machine learning predictive models aligns with the established reporting guidelines for reporting of prediction models. ProQuest, Compendex, IEEE, JStor and PubMed were searched from June to July 2022. Based on the Preferred Reporting Items for Systematic reviews and Meta-Analyses extension for Scoping Reviews (PRISMA-ScR) guidelines, databases were searched for articles related to machine learning models applied to addressing and preventing violent extremism. Following standards established by reporting guidelines, findings were extracted from published articles, including general study characteristics, aspects of model development, and reporting of results. Of 53 unique articles identified by the search, 18 were included in the review. Most articles were published between 2016 and 2022 (n=16, 88.8%). Studies focused on violent extremism worldwide, with the majority of studies not specifically focused on a distinct region (n=11, 61.1%). The most frequently used machine learning algorithms were support vector machines (n=9, 50%) followed by random forests (n=5, 27.7%), natural language processing (n=4, 22.2%), and deep learning (n=4, 22.2%). The number of features used varied greatly, ranging from 17 to 7,556. Many studies did not report an epistemological or theoretical framework which guided their machine learning approaches or interpretation of findings (n=8, 44.4%). Many studies did not incorporate the TRIPOD or any other recommended guidelines for the reporting of predictive models. Future research in this field should prioritize evaluating the impact of prediction models on decisions for addressing and preventing violent extremism.
... 1686) and many sites within the manosphere are rife with image-based messages in the forms of memes and gifs. As online content related to the topic of misogynistic extremism proliferates, adopting Natural Language Processing techniques, along with the use of hate speech detection tools (e.g., HateXplain) and customized web crawlers, may be beneficial (Chen et al., 2022; Mathew et al., 2021; Scrivens et al., 2019). ...
Article
Full-text available
In recent years, the concept of "misogynistic extremism" has emerged as a subject of interest among scholars, governments, law enforcement personnel, and the media. Yet a consistent understanding of how misogynistic extremism is defined and conceptualized has not yet emerged. Varying epistemological orientations may contribute to the current conceptual muddle of this topic, reflecting long-standing and on-going challenges with the conceptualization of its individual components. To address the potential impact of misogynistic extremism (i.e., violent attacks), a more precise understanding of what this phenomenon entails is needed. To summarize the existing knowledge base on the nature of misogynistic extremism, this scoping review analyzed publications within English-language peer-reviewed and gray literature sources. Seven electronic databases and citation indexes were systematically searched using the Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for scoping reviews (PRISMA-ScR) checklist and charted using the 2020 PRISMA flow diagram. Inclusion criteria included English peer-reviewed articles and relevant gray literature publications, which contained the term "misogynistic extremism" and other closely related terms. No date restrictions were imposed. The search strategy initially yielded 475 publications. After exclusion of ineligible articles, 40 publications remained for synthesis. We found that misogynistic extremism is most frequently conceptualized in the context of misogynistic incels, male supremacism, far-right extremism, terrorism, and the black pill ideology. Policy recommendations include increased education among law enforcement and Countering and Preventing Violent Extremism experts on male supremacist violence and encouraging legal and educational mechanisms to bolster gender equality. Violence stemming from misogynistic worldviews must be addressed by directly acknowledging and challenging socially embedded systems of oppression such as white supremacy and cisheteropatriarchy.
Chapter
Full-text available
This chapter describes and discusses the roles of media tools and technologies in the facilitation of violent extremism and terrorism. Rather than focusing on how media report on terrorism, we investigate how extremist and terrorist groups and movements themselves have exploited various ‘traditional’ and ‘new’ media tools, from print to digital, outlining the significance these tools have had for extremists’ ability to mark territory, intimidate some audiences, connect with other (sympathetic) audiences, radicalize, and even recruit. Underlined is that violent extremists and terrorists of all stripes have, over time, used every means at their disposal to forward their communicative goals. Also worth noting is that ‘old’ media tools are not extinct: although ‘new’ media play a prominent role in contemporary violent extremism and terrorism, ‘old’ tools, everything from murals to magazines, continue to be utilized in tandem with the new.
Article
Full-text available
Online discussion forums have been identified as an online social milieu that may facilitate the radicalization process, or the development of violent narratives for a minority of participants, notably youth. Yet, very little is known on the nature of the conversations youth have online, the emotions they convey, and whether or how the sentiments expressed in online narratives may change over time. Using Life Course Theory (LCT) and General Strain Theory (GST) as theoretical guidance, this article seeks to address the development of negative emotions in an online context, specifically whether certain turning points (such as entry into adulthood) are associated with a change in the nature of sentiments expressed online. A mixed methods approach is used, where the content of posts from a sample of 96 individuals participating in three online discussion forums focused on Islamic issues is analyzed quantitatively and qualitatively to assess the nature and evolution of negative emotions. The results show that 1) minors have a wider range of sentiments than adults, 2) adults are more negative overall when compared to minors, and 3) both groups tended to become more negative over time. However, the most negative users of the sample did not show as much change as the others, remaining consistent in their narratives from the beginning to the end of the study period.
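A sketch of how sentiment can be tracked over time in such a design is shown below; VADER stands in for whichever sentiment tool the study actually used, and the users, dates, and posts are hypothetical placeholders.

import pandas as pd
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

posts = pd.DataFrame({
    "user": ["u1", "u1", "u2"],
    "date": pd.to_datetime(["2010-03-01", "2013-07-15", "2011-01-10"]),
    "text": ["I am hopeful about this", "Everything is terrible now", "Not sure yet"],
})

# Score each post; VADER's compound score ranges from -1 (negative) to +1.
sia = SentimentIntensityAnalyzer()
posts["compound"] = posts["text"].map(lambda t: sia.polarity_scores(t)["compound"])

# Average sentiment per user per year exposes drift toward negativity.
trend = posts.groupby(["user", posts["date"].dt.year])["compound"].mean()
print(trend)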
Article
Full-text available
Researchers have previously explored how right-wing extremists build a collective identity online by targeting their perceived “threat,” but little is known about how this “us” versus “them” dynamic evolves over time. This study uses a sentiment analysis-based algorithm that adapts criminal career measures, as well as semi-parametric group-based modeling, to evaluate how users’ anti-Semitic, anti-Black, and anti-LGBTQ posting behaviors develop on a sub-forum of the most conspicuous white supremacy forum. The results highlight the extent to which authors target their key adversaries over time, as well as the applicability of a criminal career approach in measuring radical posting trajectories online.
Conference Paper
Full-text available
When a terror-related event occurs, there is a surge of traffic on social media comprising informative messages, emotional outbursts, helpful safety tips, and rumors. It is important to understand the behavior manifested on social media sites to gain a better understanding of how to govern and manage in a time of crisis. We undertook a detailed study of Twitter during two recent terror-related events: the Manchester attacks and the Las Vegas shooting. We analyze the tweets during these periods using (a) sentiment analysis, (b) topic analysis, and (c) fake news detection. Our analysis demonstrates the spectrum of emotions evinced in reaction and the way those reactions spread over the event timeline. With respect to topic analysis, we also find “echo chambers”: groups of people interested in similar aspects of the event. Encouraged by our results on these two event datasets, the paper seeks to enable a holistic analysis of social media messages in a time of crisis.
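As an illustration of the topic-analysis step, the sketch below runs LDA over a toy set of tweets to surface coarse themes. sklearn's implementation stands in for whatever the authors used, and the tweets are placeholders.

from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

tweets = [
    "thoughts and prayers for the victims",
    "stay away from the arena police advise",
    "this is fake news do not share",
    "donate blood at the local hospital",
] * 5  # repeated so the toy corpus has enough documents to fit

# LDA works on raw term counts rather than TF-IDF weights.
vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(tweets)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

# Print the highest-weight terms per topic.
terms = vec.get_feature_names_out()
for i, topic in enumerate(lda.components_):
    top = [terms[j] for j in topic.argsort()[-5:]]
    print(f"topic {i}: {top}")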
Article
Full-text available
The distribution or consumption of traditional drugs has become the subject of stringent penalties throughout most of the world, and synthetic designer drugs have become the alternative. Novel psychoactive substances, also called ‘legal highs’, are highly varied in terms of chemical composition. These substances are advertised and distributed as an alternative to traditional drugs on the Internet, making the identification of new substances, and enforcement, difficult. For this article, we downloaded and analysed 28 Russian-language online drug marketplaces which distribute traditional and novel psychoactive substances. All marketplaces used a noncontact drug dealing method in which the seller and the buyer communicate through the Internet to arrange for payment and delivery of drugs without meeting face-to-face. Geographic information, price, amount, substance type, and payment method data were extracted. Findings indicate that such marketplaces are able to operate due to the ability of their clients to pay anonymously with virtual currencies, namely Qiwi and Bitcoin.
Article
This study applies the semi-automated method of sentiment analysis in order to examine any quantifiable changes in the linguistic, topical, or narrative patterns that are present in the English-language Islamic State-produced propaganda magazines Dabiq (15 issues) and Rumiyah (10 issues). Based on a sentiment analysis of the textual content of these magazines, it was found that the overall use of language has remained largely consistent between the two magazines and across a timespan of roughly three years. However, while the majority of the language within these magazines is consistent, a small number of significant changes with regard to certain words and phrases were found. Specifically, the language of Islamic State magazines has become increasingly hostile towards certain enemy groups of the organization, while the language used to describe the Islamic State itself has become significantly more positive over time. In addition to identifying the changes and stabilities of the language used in Islamic State magazines, this study endeavours to test the effectiveness of the sentiment analysis method as a means of examining and potentially countering extremist media moving forward.
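One simple way to test whether language shifted between the two magazines is to score each issue and compare the two distributions, for example with Welch's t-test. The per-issue scores below are hypothetical placeholders, not values from the study.

from scipy.stats import ttest_ind

# Hypothetical mean sentiment score per issue (more negative = more hostile).
dabiq_scores = [-0.42, -0.31, -0.55, -0.38, -0.47]
rumiyah_scores = [-0.51, -0.60, -0.49, -0.58, -0.62]

# Welch's t-test does not assume equal variances across the two magazines.
t, p = ttest_ind(dabiq_scores, rumiyah_scores, equal_var=False)
print(f"Welch's t = {t:.2f}, p = {p:.3f}")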
Article
This study applies the method of sentiment analysis to the online media released by the Islamic State (IS) in order to distinguish the ways in which IS uses language within their media, and potential ways in which this language differs across various online platforms. The data used for this sentiment analysis consist of transcripts of IS-produced videos, the text of IS-produced online periodical magazines, and social media posts from IS-affiliated Twitter accounts. It was found that the language and discourse utilised by IS in their online media is of a predominantly negative nature, with the language of videos containing the highest concentration of negative sentiment. The words and phrases with the most extreme sentiment values are used as a starting point for the identification of specific narratives that exist within online IS media. The dominant narratives discovered with the aid of sentiment analysis were: 1) the demonstrated strength of the IS, 2) the humiliation of IS enemies, 3) continuous victory, and 4) religious righteousness. Beyond the identification of IS narratives, this study serves to further explore the utility of the sentiment analysis method by applying it to mediums and data that it has not traditionally been applied to, specifically, videos and magazines.
Article
Often overlooked in the measurement of crime is the underlying size of offender populations. This holds true for online property crimes involving the sale, purchase, and use of stolen financial data. Though available data suggests that online frauds are steadily increasing, there are currently no estimates of the scope of this offender population. The current study addresses this issue by using capture–recapture methods to estimate the size of the population participating in stolen data markets over a calendar year. Data analysis involved samples collected from three websites that facilitate financial crimes and frauds. Findings suggest that markets are much larger in size than what can otherwise be observed, are heterogeneous, and that buyers outnumber vendors.
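Capture-recapture in its simplest form is the Lincoln-Petersen estimator: two samples are drawn from the population, and the overlap between them scales up to an estimate of the total. The sketch below uses Chapman's bias-corrected variant with hypothetical counts; the study itself applies a more elaborate design across three markets.

def lincoln_petersen(n1: int, n2: int, m2: int) -> float:
    """n1 = first sample size, n2 = second sample size, m2 = seen in both."""
    # Chapman's correction avoids division by zero and reduces small-sample bias.
    return (n1 + 1) * (n2 + 1) / (m2 + 1) - 1

# Hypothetical counts: 400 actors seen in sample 1, 350 in sample 2, 60 in both.
print(lincoln_petersen(n1=400, n2=350, m2=60))  # ~2,306 estimated actors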
Conference Paper
Twitter is a free broadcast service that allows registered members to post publicly in messages limited to 140 characters, which may include text, photos, videos, and hyperlinks. People share news, opinions, and information in support of or in opposition to what is reported in the media. Among the most terrifying topics are the ISIS terrorist attacks taking place around the world. ISIS takes advantage of social media to communicate continuously using coded words and to establish an indirect presence. Hashtags associated with ISIS can be analyzed to capture the sentiment of the corresponding tweets. This paper presents a novel process for sentiment analysis of ISIS-related tweets and for organizing the opinions with their geolocations. The Jeffrey Breen algorithm is used for sentiment analysis. Data mining algorithms such as Support Vector Machines, Random Forests, Bagging, Decision Trees, and Maximum Entropy are applied for polarity-based classification of ISIS-related tweets. The results are compared and presented.
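The Jeffrey Breen algorithm referenced here is lexicon-based: a tweet's score is its count of positive words minus its count of negative words. A minimal sketch, with tiny placeholder word lists standing in for a full opinion lexicon:

import re

# Placeholder word lists; Breen's original used the Hu & Liu opinion lexicon.
positive = {"good", "great", "safe", "peace"}
negative = {"attack", "terror", "kill", "fear"}

def breen_score(tweet: str) -> int:
    # Tokenize to lowercase words, then count lexicon hits on each side.
    words = re.findall(r"[a-z']+", tweet.lower())
    return sum(w in positive for w in words) - sum(w in negative for w in words)

print(breen_score("Fear after the attack, stay safe"))  # -> -1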