
Searching for Extremist Content Online Using The Dark Crawler and Sentiment Analysis



Purpose – This chapter examines how sentiment analysis and web-crawling technology can be used to conduct large-scale data analyses of extremist content online. Methods/approach – The authors describe a customized web-crawler that was developed for the purpose of collecting, classifying, and interpreting extremist content online and on a large scale, followed by an overview of a relatively novel machine learning tool, sentiment analysis, which has sparked the interest of some researchers in the field of terrorism and extremism studies. The authors conclude with a discussion of what they believe is the future applicability of sentiment analysis within the online political violence research domain. Findings – In order to gain a broader understanding of online extremism, or to improve the means by which researchers and practitioners “search for a needle in a haystack,” the authors recommend that social scientists continue to collaborate with computer scientists, combining sentiment analysis software with other classification tools and research methods, as well as validate sentiment analysis programs and adapt sentiment analysis software to new and evolving radical online spaces.
Ryan Scrivens, Tiana Gaudette, Garth Davies and
Richard Frank
Methods of Criminology and Criminal Justice Research
Sociology of Crime, Law and Deviance, Volume 24, 179–194
Copyright © 2019 by Emerald Publishing Limited
All rights of reproduction in any form reserved
ISSN: 1521-6136/doi:10.1108/S1521-613620190000024016
Originality/value – This chapter provides researchers and practitioners who are
faced with new challenges in detecting extremist content online with insights
regarding the applicability of a specic set of machine learning techniques and
research methods to conduct large-scale data analyses in the eld of terrorism
and extremism studies.
Keywords: Sentiment analysis; web-crawler; machine learning;
terrorism; extremism; internet
Violent extremists and those who subscribe to radical beliefs have left their digital
footprints online since the inception of the World Wide Web. Notable examples
include Anders Breivik, the Norwegian far-right terrorist convicted of killing 77
people in 2011, who was a registered member of a white supremacy web forum
(Southern Poverty Law Center, 2014) and had ties to a far-right wing social media
site (Bartlett & Littler, 2011); Dylann Roof, the 21-year-old who murdered nine
Black parishioners in Charleston, South Carolina, in 2015, and who allegedly
posted messages on a white power website (Hankes, 2015); and Aaron Driver, the
Canadian suspected of planning a terrorist attack in 2016, who showed explicit
support for the so-called “Islamic State” (IS) on several social media platforms
(Amarasingam, 2016).
It should come as little surprise that, in an increasingly digital world, identifying
signs of extremism online sits at the top of the priority list for counter-extremist
agencies (Cohen, Johansson, Kaati, & Mork, 2014), with the current
focus of government-funded research on the development of advanced information
technologies and risk assessment tools to identify and counter the threat of
violent extremism on the Internet (Sageman, 2014). Within this context, criminologists
have argued that successfully identifying radical content online (i.e., behaviors,
patterns, or processes), on a large scale, is the first step in reacting to it (e.g.,
Bouchard, Joffres, & Frank, 2014; Davies, Bouchard, Wu, Joffres, & Frank, 2015;
Frank, Bouchard, Davies, & Mei, 2015; Mei & Frank, 2015; Williams & Burnap,
2015). Yet in the last 10 years alone, it is estimated that the number of individuals
with access to the Internet has increased threefold (Internet World Stats, 2019),
from over 1 billion users in 2005 to more than 3.8 billion as of 2019 (Internet Live
Stats, 2019). With all of these new users, more information has been generated,
leading to a ood of data.
It is becoming increasingly difcult, nearly impossible really, to manually
search for violent extremists, potentially violent extremists, or even users who
post radical content online because the Internet contains an overwhelming
amount of information. These new conditions have necessitated guided data l-
tering methods, those that can side-step – and perhaps one day even replace – the
laborious manual methods that traditionally have been used to identify relevant
information online (Brynielsson et al., 2013; Cohen et al., 2014). As a result of
this changing landscape, governments around the globe have engaged researchers
to develop advanced information technologies, machine learning algorithms, and
risk assessment tools to identify and counter extremism through the collection
and analysis of large-scale data made available online (see Chen, Mao, Zhang, &
Leung, 2014). Whether this work involves finding radical users of interest (e.g.,
Klausen, Marks, & Zaman, 2018), measuring digital pathways of radicalization
(e.g., Hung, Jayasumana, & Bandara, 2016), or detecting virtual indicators that
may prevent future terrorist attacks (e.g., Johansson, Kaati, & Sahlgren, 2016),
the urgent need to pinpoint extremist content online is one of the most significant
challenges faced by law enforcement agencies and security officials worldwide
(Sageman, 2014).
We have been part of this growing field of research at the International
CyberCrime Research Centre (ICCRC), situated in Simon Fraser University’s
School of Criminology.1 Our work has ranged from identifying radical users in
online discussion forums (e.g., Scrivens, Davies, & Frank, 2017) to understand-
ing terrorist organizations’ online recruitment efforts on various online platforms
(e.g., Davies et al., 2015), to evaluating linguistic patterns presented in the online
magazines of terrorist groups (e.g., Macnair & Frank, 2018a, 2018b). These experiences
have provided us with insights regarding the applicability of a specific
set of machine learning techniques and research methods to conduct large-scale
data analyses of extremist content online.2 In what follows, we will first describe
a customized web-crawler that was developed at the ICCRC for the purpose of
collecting, classifying, and interpreting extremist content online and on a large
scale. Second, we will provide an overview of a relatively novel machine learning
tool, sentiment analysis, which has sparked the interest of some researchers in
the eld of terrorism and extremism studies who are faced with new challenges in
detecting extremist content online. Third, we conclude with a discussion of what
we believe is the future applicability of sentiment analysis within the online politi-
cal violence research domain.
Before proceeding, however, it is necessary to outline how we conceptualize
extremist content online. We dene it as text-, audio-, and/or video-based online
material containing radical views – counter to mainstream opinion – that may or
may not promote violence in the name of a radical belief. At the ICCRC, we focus
primarily on text-based extremist content that has radical right-wing or jihadi
leanings. For the former, radical right-wing material is characterized by racially,
ethnically and sexually dened nationalism, which is typically framed in terms
of white power and grounded in xenophobic and exclusionary understandings
of the perceived threats posed by such groups as non-whites, Jews, immigrants,
homosexuals, and feminists (see Perry & Scrivens, 2016). For the latter, we dene
jihadi material as supportive of the creation of an expansionist Islamic state or
khalifa, the imposition of sharia law with violent jihad as a central component,
and the use of local, national, and international grievances affecting Muslims (see
Moghadam, 2008).
In recent years, researchers have shown a vested interest in developing web-
crawler tools to collect large volumes of content on the Internet. This interest has
made its way into terrorism and extremism studies in the past 10 plus years (e.g.,
Abbasi & Chen, 2005; Bouchard et al., 2014; Chen, 2012; Fu, Abbasi, & Chen,
2010; Zhang et al., 2010; Zhou, Qin, Lai, Reid, & Chen, 2005). Some research-
ers have used standard, off-the-shelf web-crawler tools that are readily available
online for a fee, while others have developed custom-written computer programs
to collect high volumes of information online. The Dark Crawler (TDC) is one
example of this latter approach.
Web-crawlers, also known as “crawlers,” “data scrapers,” and “data parsers,”
are the tools used by all search engines to automatically map and navigate the
Internet as well as collect information about each website and webpage that a
web-crawler visits. There are many off-the-shelf web-crawler solutions available
on the Internet, some for purchase and others free of charge, such as Win Web
Crawler,3 WebSPHINX,4 Black Widow,5 or BeautifulSoup.6 Once an end-user
decides on a website from which
parsing will begin, the crawler recursively follows the links from that webpage
until some user-specied condition is met (discussed below), capturing all content
along the way. During this process, the software tracks all the links between it and
other websites and, if an end-user so chooses, the software will follow and retrieve
those links as well. The content, as it is retrieved, is then saved to the hard-drive of
the user for later analysis. In short, the purpose of most web-crawlers is to simply
save the retrieved content onto a hard-drive, essentially “ripping” a webpage or
website because it contains content desired by the end-user. More advanced ana-
lytic capabilities usually are not part of the package.
Similar in spirit, TDC browses the World Wide Web, but since it is a custom-written
computer program, it is much more flexible than the abovementioned
process. In particular, TDC is capable of seeking out extremist content online,
among other types of content, based on user-defined keywords and other
parameters (discussed below). As TDC visits each page, it captures all of the
content on that page for later analysis, while simultaneously collecting informa-
tion about the content and making decisions as to whether the page includes
extremist content. The idea of this approach is based on a combination of the
work associated with the Dark Web project at the University of Arizona (see
Chen, 2012) and a previous project at the ICCRC that identified and explored
online child exploitation websites (e.g., Allsup, Thomas, Monk, Frank, &
Bouchard, 2015; Joffres, Bouchard, Frank, & Westlake, 2011; Frank, Westlake,
& Bouchard, 2010; Monk, Allsup, & Frank, 2015; Westlake & Bouchard, 2015;
Westlake, Bouchard, & Frank, 2011). TDC has since demonstrated its benefit
in investigating online networks and communities in general (e.g., Frank,
Macdonald, & Monk, 2016; Macdonald & Frank, 2016, 2017; Macdonald,
Frank, Mei, & Monk, 2015; Mikhaylov & Frank, 2016, 2018; Zulkarnine, Frank,
Monk, Mitchell, & Davies, 2016) and extremist content online in particular
(e.g., Bouchard et al., 2014; Davies et al., 2015; Frank et al., 2015; Levey,
Bouchard, Hashimi, Monk, & Frank, 2016; Mei & Frank, 2015; Scrivens et al.,
2017; Scrivens, Davies, & Frank, 2018; Scrivens & Frank, 2016; Wong, Frank,
& Allsup, 2015).
TDC is a system that can be distributed across multiple virtual machines,
depending on the number of machines that are available. The first of four steps,
as expressed in Fig. 1, is to define a task along with its parameters. TDC can
handle multiple tasks simultaneously, each of which is given a priority specified
by the end-user. The priority of each task determines the share of
machines allocated to it. For example, if tasks I, II, and III are given priority 50,
80, and 70, respectively, then task I will receive 25% of the available resources
(50/(50+80+70) = 0.25 = 25%). If more machines are added to TDC, then the
absolute number of resources available for each task will increase but the relative
amount of resources available for each task will remain the same.
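The proportional allocation described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic in the example; the function names are hypothetical and nothing here is taken from TDC's actual code.

```python
def resource_shares(priorities):
    """Return each task's fraction of the available machines.

    `priorities` maps a task name to its user-assigned priority.
    """
    total = sum(priorities.values())
    if total <= 0:
        raise ValueError("priorities must sum to a positive number")
    return {task: p / total for task, p in priorities.items()}


def machines_per_task(priorities, n_machines):
    """Fractional number of machines each task would receive (hypothetical helper)."""
    return {task: share * n_machines
            for task, share in resource_shares(priorities).items()}


# The example from the text: tasks I, II, and III with priorities 50, 80, and 70.
shares = resource_shares({"I": 50, "II": 80, "III": 70})
print(shares["I"])  # task I receives 50 / 200 of the resources
```

Doubling `n_machines` in `machines_per_task` doubles every task's absolute allocation while `resource_shares` stays unchanged, which is the behavior the paragraph describes.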
Each task consists of four parameters to prevent it from perpetually crawling
the Internet and wandering into websites and webpages unrelated to extremism.
The parameters are as follows:
Fig. 1. Overview of The Dark Crawler.
Number of Webpages
For practical purposes, since the number of webpages on the Internet is infinite,
restrictions must be placed on the number of webpages that are retrieved by the
web-crawler. Theoretically, any web-crawler could crawl for a very long time and
store the entire collection of webpages on the Internet. For our purposes at the
ICCRC, however, this is infeasible for several reasons. First, the amount of stor-
age that would be required to warehouse the extracted data is beyond the scope of
any sensible research project. Second, webpages are created at a much higher rate
than what can be extracted with TDC. Lastly, a copy of the “full Internet” is not
required to draw meaningful conclusions about a particular topic under investigation;
extracting large-scale, representative samples is more than adequate.
Number of Domains
The number of Internet domains that TDC will collect data on must be specified.
When limiting a crawl to n pages, the crawler will attempt to distribute the
sampling equally across all websites that it has encountered, meaning that TDC
will sample similar numbers of pages from each of the sites it visits. As a result,
at the end of the task, if w websites are sampled, all sites will have approximately
the same number of pages retrieved (=n/w) and analyzed.
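As a rough illustration of the n/w arithmetic, the sketch below splits a page budget evenly across the sites encountered. How TDC balances its sampling while a crawl is still in progress is not described here, so the function (a hypothetical name) only models the intended end state.

```python
def pages_per_site(n_pages, sites):
    """Split a budget of n_pages as evenly as possible across the given sites."""
    if not sites:
        return {}
    base, extra = divmod(n_pages, len(sites))
    # The first `extra` sites get one additional page so the total is exactly n_pages.
    return {site: base + (1 if i < extra else 0) for i, site in enumerate(sites)}


# Placeholder site names, for illustration only.
quota = pages_per_site(100, ["site-a.example", "site-b.example", "site-c.example"])
print(quota)
```

With n = 100 and w = 3, every site ends up with 33 or 34 pages, that is, approximately n/w each.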
Trusted Domains
A set of trusted domains are then specied by the end-user, which tells the crawler
that all contents on those domains should not be extracted. As an example, it can
be assumed, with a high level of certainty, that a website such as
com does not include extremist material. Having said that, TDC is trained to
assume that the website does not contain extremist content, and as such, it would
not retrieve any pages from that site. Without having this mechanism in place,
TDC could wander into a search engine, directing it completely off topic and
making the resulting extracted network irrelevant to the specified topic and task.
The purpose of TDC is to find, analyze, and map out the websites and webpages
that include extremist content. To achieve this, TDC recursively retrieves
all webpages that are linked from the webpage it is currently reviewing. However,
since extremist webpages consist of a very small subset of all the webpages on
the Internet (Frank et al., 2015), it would be expected that, unconstrained, TDC
would very quickly start to retrieve webpages that are completely unrelated to
extremism. As a result, some mechanism must be built into TDC that controls
which webpages it uses in its exploration process. This is done through the use
of keywords, which are user-specied words that have been found to be indica-
tive of extremist content (e.g., Bouchard et al., 2014; Davies et al., 2015; Wong et
al., 2015) and thus indicate to TDC that the pages being retrieved are on-topic.
Within the extremist domain, such keywords could include gun, weapon, or terror.
To make this mechanism more robust, though, a word counter can be included
to TDC, which indicates a minimum threshold on the number of keywords that
must exist on a page before TDC considers it on topic.
Once a task has been decided, each webpage is downloaded by TDC (Fig. 1–
Step 2). If the downloaded page meets the parameters laid out above, then the
page is considered “on topic,” all page content is saved and all page links are fol-
lowed out of it recursively (Step 3). The webpage contents are then stored in the
database, and various reports and analyses can be performed (Step 4).
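The four parameters and the download, check, and follow loop can be illustrated with a small breadth-first crawler. This is a sketch under stated assumptions, not TDC itself: the function names, the tokenization, and the in-memory `WEB` dictionary (which stands in for real HTTP downloads) are all hypothetical.

```python
import re
from collections import deque
from urllib.parse import urlparse


def keyword_hits(text, keywords):
    """Count how many tokens in `text` are one of the user-specified keywords."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(1 for token in tokens if token in keywords)


def crawl(seed, fetch, keywords, min_hits, max_pages, max_domains, trusted_domains):
    """Breadth-first crawl that only follows links out of on-topic pages.

    `fetch(url)` returns (text, links) or None; it is a stand-in for a download.
    """
    queue, seen = deque([seed]), {seed}
    saved, domains = {}, set()
    while queue and len(saved) < max_pages:        # parameter: number of webpages
        url = queue.popleft()
        domain = urlparse(url).netloc
        if domain in trusted_domains:              # parameter: trusted domains
            continue
        if domain not in domains and len(domains) >= max_domains:
            continue                               # parameter: number of domains
        page = fetch(url)                          # Step 2: download the page
        if page is None:
            continue
        text, links = page
        if keyword_hits(text, keywords) < min_hits:
            continue                               # off topic: not saved, not followed
        domains.add(domain)
        saved[url] = text                          # Steps 3-4: keep content for analysis
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return saved


# Toy in-memory "web" used instead of network access (an assumption for the demo).
WEB = {
    "http://forum.example/a": ("gun weapon terror talk",
                               ["http://forum.example/b", "http://search.example/q",
                                "http://news.example/x"]),
    "http://forum.example/b": ("weapon and terror again", []),
    "http://search.example/q": ("gun weapon terror results", ["http://other.example/z"]),
    "http://news.example/x": ("a story about a gun control vote", []),
}
pages = crawl("http://forum.example/a", WEB.get, {"gun", "weapon", "terror"},
              min_hits=2, max_pages=10, max_domains=5,
              trusted_domains={"search.example"})
print(sorted(pages))
```

In this toy run the trusted search domain is never retrieved, and the news page is dropped because it contains only one keyword, so its links would not be followed either.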
The use of keywords presents a useful first step in identifying large-scale patterns
in extremist content online (e.g., Chalothorn & Ellman, 2012; Bouchard et al.,
2014; Davies et al., 2015; Wong et al., 2015). However, the use of single keywords
may lead to misleading interpretations of content (Mei & Frank, 2015; Scrivens
& Frank, 2016). If, for example, on a particular webpage, the words gun and control
are found within close proximity of each other, it might be concluded that the
page is discussing gun control. This would most likely not indicate an extremist
page but more likely that the page was written by a proponent or opponent of
gun ownership. This page, of course, is not relevant to TDC’s data collection. On
the other hand, a page containing the words gun and control could be discussing
“controlling someone with a gun” within the context of a kidnapping, for example,
in which case TDC should continue with its analysis. In other words, keywords
can give an indication of the content within a webpage but cannot be used
to determine exactly what that content is about. For a more complete understanding
of the content of a specific piece of text, more powerful computational tools,
such as sentiment analysis, are required.
Sentiment analysis, also known as “opinion mining,” is a category of comput-
ing science that specializes in evaluating the opinions found in a piece of text by
organizing data into distinct classes and sections, and then assigning a piece of
text with a positive, negative, or neutral polarity value (Abbasi & Chen, 2005).
Sentiment analysis also provides a more targeted view of textual data by allowing
for the demarcation between cases that are sought after and those without any
notable relevance. Sentiment analysis has been used in a wide variety of settings,
including customer review analysis for products (e.g., Feldman, 2013), assess-
ments of attitudes toward events or products on social media platforms (e.g.,
Ghiassi, Skinner, & Zimbra, 2013), and for various analyses of extremist content
online (e.g., Bermingham, Conway, McInerney, O’Hare, & Smeaton, 2009; Chen,
2008; Williams & Burnap, 2015). Sentiment analysis has become increasingly
popular in terrorism and extremism studies because, as the amount of “opinion-
ated data” online grows exponentially, sentiment analysis software offers a wide
range of applications that can help address previously untapped and challenging
research problems (see Liu, 2012). Based on the notion that an author’s opinion
toward a particular topic is reected in the choice and intensity of words he or
she chooses to communicate, sentiment analysis software allows for identication
and classication of opinions found in a piece of text (Abbasi, Chen, & Salem,
2008). Typically, this process occurs through a two-step process that produces a
“polarity value”:
(1) a body of text is split into sections (sentences) to determine subjective and
objective content and
(2) subjective content is classied by the software as being either positive, neu-
tral, or negative, where positive scores reect positive attitudes and negative
scores reect negative attitudes (see Feldman, 2013).
Worth highlighting, though, is the fact that sentiment analysis is not without
its limitations. It is estimated, for example, that 21% of the time humans cannot
agree among themselves about the sentiment within a given piece of text (Ogneva,
2010), with some individuals unable to understand subtle context or irony.
Understandably, sentiment analysis systems cannot be expected to have 100%
accuracy when compared to the opinions of humans. Sentiment analysis does,
however, provide insight into authors’ expressions and reactions toward certain
events or actions, for example, and one sentiment analysis program that has been
widely used by criminologists in terrorism and extremism studies is SentiStrength
(e.g., Frank et al., 2015; Levey et al., 2016; Macnair & Frank, 2018a, 2018b; Mei &
Frank, 2015; Scrivens et al., 2017, 2018; Scrivens & Frank, 2016).
SentiStrength uses a “lexical approach” – which maintains that an essential
part of understanding a language rests on the ability to understand the patterns
of and meanings associated with language (see Lewis, 1993) – and is based on
human-coded data as well as various lexicons (i.e., dictionaries of phrases and
words). Although the program is designed for short informal online texts, there
are features in place that allow longer texts to be analyzed (Thelwall & Buckley,
2013). SentiStrength analyzes a text by attributing positive, neutral, or negative
values to words within the text, and these values are augmented by “booster
words” that can influence the values assigned to the text as well as negating words,
punctuation, and other features that are uniquely suited for studying an online
context (Thelwall & Buckley, 2013). One of the features of SentiStrength is its
ability to analyze the sentiment around any given keyword. For example, the
phrase “I love apples but hate oranges” can be analyzed for the sentiment around
apples (resulting in a positive outcome) as well as oranges (resulting in a negative
outcome). The words around a given set of keywords are compared to the senti-
ment dictionary and their resulting values make up the total sentiment score for
any given text. Logically, a negative value implies negative sentiment for the ideas
expressed in the text while a positive value implies overall support.
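The idea of scoring the sentiment around a keyword can be illustrated with a simple word-window approach. This is not SentiStrength's algorithm: the tiny lexicon, its values, and the one-word window are assumptions chosen so that the apples and oranges example behaves as described, and booster words and negation are ignored.

```python
import re

# Illustrative lexicon values (assumptions, not SentiStrength's dictionary).
LEXICON = {"love": 3, "like": 2, "dislike": -2, "hate": -3}


def sentiment_around(text, keyword, window=1):
    """Sum lexicon values of words within `window` tokens of each keyword occurrence."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, token in enumerate(tokens):
        if token != keyword:
            continue
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        score += sum(LEXICON.get(t, 0) for t in tokens[lo:hi])
    return score


phrase = "I love apples but hate oranges"
apples = sentiment_around(phrase, "apples")    # positive: "love" is adjacent
oranges = sentiment_around(phrase, "oranges")  # negative: "hate" is adjacent
print(apples, oranges)
```

A wider window would pull "hate" into the context of apples, which shows why the choice of context size matters for keyword-centered scoring.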
To analyze a given piece of text for multiple keywords, multiple iterations must
be done, with each iteration consisting of an analysis of the same piece of text but
with a different keyword (resulting in the sentiment toward that keyword). Due
to the very specic nature of this procedure, it is necessary to input each form
of a particular word being analyzed. For example, to analyze sentiment around
the word kill, it is necessary to also analyze the words kills, killing, killed, and
all other derivatives of the word. Multiple iterations of SentiStrength are then
applied, each one returning the detected sentiment around the specific keyword
and its derivatives. At the end of this process, each text is linked to multiple sentiment
values, one for each keyword. Those multiple values are then averaged to
produce a single sentiment score for any given piece of text.
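The iterate-then-average procedure can be sketched as follows. The scoring function is passed in as a parameter so that any keyword-centered scorer could be plugged in; the stub used in the demo and its values are invented for illustration. How the scores for a keyword's derivatives are merged into one value per keyword is not specified above, so averaging them is an assumption.

```python
from statistics import mean


def average_keyword_sentiment(text, keyword_forms, score_fn):
    """Run one scoring pass per keyword form, then average across keywords.

    keyword_forms: maps a base keyword to the list of forms to search for.
    score_fn(text, form): returns a number, or None if the form does not occur.
    """
    per_keyword = {}
    for base, forms in keyword_forms.items():
        scores = [s for s in (score_fn(text, form) for form in forms) if s is not None]
        if scores:
            per_keyword[base] = mean(scores)  # assumption: average over derivatives
    overall = mean(list(per_keyword.values())) if per_keyword else None
    return per_keyword, overall


# Stub scorer with invented values, standing in for a real sentiment program.
TOY_SCORES = {"kill": -4, "killing": -2, "hero": 3}


def toy_score(text, form):
    return TOY_SCORES.get(form) if form in text.lower().split() else None


forms = {"kill": ["kill", "kills", "killing", "killed"], "hero": ["hero", "heroes"]}
per_keyword, overall = average_keyword_sentiment(
    "they praise the hero for killing", forms, toy_score)
print(per_keyword, overall)
```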
There’s been a shift in recent years in how researchers investigate online com-
munities, ranging from the study of how extremists communicate through social
media (e.g., Bermingham et al., 2009) to the analysis of users connecting through
online health forums (e.g., Wang, Kraut, & Levine, 2012), for example. In short,
researchers in this area are shifting from manual identification of specific online
content to algorithmic techniques to do similar yet larger-scale tasks. The use of
such analytical approaches is becoming increasingly apparent in criminology and
criminal justice research (see Hannah-Moffat, 2018). This is a symptom of what
some have described as the “big data” phenomenon – that is, a massive increase
in the amount of data that is readily available, particularly online (see Chen et
al., 2014).
Logically, a number of researchers who study how terrorists and extremists use
the Internet have turned to sentiment analysis and other machine learning tech-
niques to identify and, by extension, analyze content of interest on a large scale.
In what follows, we will discuss what we believe is the future of the applicability of
sentiment analysis to explore extremist content online, drawing from the recom-
mendations of the below-listed studies in combination with our own experience
with sentiment analysis. In short, we suggest that the future of sentiment analysis
in exploring extremist content online should: (1) encourage social scientists and
computer scientists to collaborate with one another; (2) consider a combination
of analyses or more features to increase a classifier’s effectiveness; (3) continue to
validate sentiment analysis programs; and (4) apply and adapt sentiment analysis
to new and evolving radical online spaces.
In order to gain a broader understanding of online extremism, or to improve the
means by which researchers and practitioners “search for a needle in a haystack,”
social scientists and computer scientists must collaborate with one another.
Historically, large-scale data analyses have been conducted by computer scientists
and technical experts, which can be problematic in complex fields such as
terrorism and extremism research. Computer and technical experts tend to take a
high-level methodological perspective, measuring levels of – or propensity toward –
radicalization, or ways of identifying violent extremists, or predicting the next
terrorist attack. But searching for radical material online without a fundamen-
tal understanding of the radicalization process or how terrorists and extremists
use the Internet can be counterproductive. Social scientists, on the other hand,
may be well-versed in terrorism and extremism research, but most tend to be
ill-equipped to manage large-scale data – from collecting to formatting to archiv-
ing large volumes of information. Bridging the computer science and social
science approaches to build on the strengths of each discipline offers perhaps
the best chance to construct a useful framework for detecting extremist content
online, as well as assisting authorities in addressing the threat of violent extrem-
ism as it evolves in the online milieu.
A myriad of research shows that combining sentiment analysis with other meth-
ods and/or semantic-oriented approaches improves the detection of extremist
content online, on three fronts. This, we argue, is the future in detecting extremist
content online. First, research suggests that sentiment analysis software, in combination
with classification software, is an effective method to detect extremist
content online and on a large scale. In particular, combining sentiment analysis
with classification software – which identifies similarities and differences in datasets
and makes determinations about the data in a decision tree format – can
pinpoint extremist websites (see Mei & Frank, 2015; see also Scrivens & Frank,
2016). In addition, combining sentiment analysis with affect analysis – a machine
learning technique that measures the emotional content of communications – can
detect and measure the intensity levels associated with a broad range of emotions
in text found in online forums (see Chen, 2008; see also Figea, Kaati, & Scrivens,
2016). Research similarly suggests that the effectiveness of the sentiment analysis
in detecting extremist content can be signicantly boosted with additional clas-
sication feature sets, such as syntactic, stylistic, content-specic, and lexicon fea-
tures (Abbasi & Chen, 2005; Abbasi et al., 2008; Yang et al., 2011).
Second, research suggests that combining sentiment analysis with commonly
used research methods and frameworks in criminology and criminal justice
research can aid in the detection of extremist content online. Social network anal-
ysis, for example, in combination with sentiment analysis can be used to identify
users on YouTube who may have had a radicalizing agenda (see Bermingham et
al., 2009), detect extremist content on Twitter (Wei, Sing, & Martin, 2016), and
model online propaganda (see Burnap et al., 2014) and cyberhate (see Williams &
Burnap, 2015) on Twitter following the Woolwich terrorism incident. In addition,
combining sentiment analysis with geolocation software can be used to organ-
ize the opinions found on Twitter accounts using hashtags associated with IS
(Mirani & Sasi, 2016). Lastly, sentiment analysis, in combination with an algo-
rithm that incorporates criminal career measures (i.e., volume, severity, and dura-
tion) developed by Blumstein, Cohen, Roth, and Visher (1986), has been used to
account for unique components of a user’s online posting behavior and detect
the most radical authors in a large-scale sample of online postings (see Scrivens
et al., 2017).
Third, in online political violence research, a growing emphasis has been placed
on the integration of a temporal component with sentiment analysis – a combi-
nation that we believe is key to providing insight into the patterns and trends
that characterize the development of extremist communities over time online.
For example, combining sentiment analysis with survival analysis has been a use-
ful way to measure propaganda surges (see Burnap et al., 2014) and levels of
cyberhate (Williams & Burnap, 2015) on Twitter over time after a terrorist attack.
Other approaches that have proven to be effective in understanding temporal pat-
terns in radical online communities include: (1) mapping users’ sentiment and
affect changes in extremist forums (see Figea et al., 2016); (2) identifying significant
temporal spikes in extremist forums that coincide with real-world events (see
Park, Beck, Fletche, Lam, & Tsang, 2016); and (3) measuring the evolution of sentiment
found in IS-produced online propaganda magazines (see Macnair & Frank,
2018b; see also Vergani & Bluic, 2015). Most recently, combining sentiment
analysis with semi-parametric group-based modeling to measure the evolution of
radical posting trajectories has been shown to be an effective way to detect large-scale
temporal patterns in radical online communities (see Scrivens et al., 2018).
Another common thread that binds together the aforementioned studies is the
need to assess and potentially improve the classification accuracy and content
identification offered by sentiment analysis software. Researchers, for example,
have proposed that future work includes a “comparative human evaluation”
component to validate a sentiment program’s classications (e.g., Chalothorn &
Ellman, 2012; Figea et al., 2016). Macnair and Frank (2018a) further added that:
computer-assisted techniques such as sentiment analysis, in conjunction with human oversight,
can aid in the overall process of locating, identifying, and eventually, countering the narratives
that exist within extremist media. (p. 452)
This technique would have humans rate opinions in sentences and compare the
results to a sentiment analysis program. By comparing how a human might classify
a piece of text to a sentiment analysis program, researchers can gain insight
into the accuracy of sentiment analysis’ classifications. Future studies should also
integrate a qualitative understanding of how machine learning tools in general
and sentiment analysis software in particular make decisions about the content
that the tools analyze. Doing so may increase the reliability of the results and
increase the likelihood of identifying radical content online (Scrivens & Frank,
Also, it is not yet clear which sentiment analysis program is the most accurate
or effective overall in detecting extremist content online. Some research
does, however, compare the performance of several sentiment
methods (i.e., SentiWordNet, SASA, PANAS-t, Emoticons, SentiStrength, LIWC,
SenticNet, and Happiness Index) (see Gonçalves, Benevenuto, Araújo, & Cha,
2013), but comparisons of this sort have yet to be explored within the online
political violence domain. One notable exception is the exploration of the linguistic
patterns on Twitter following the terrorist attacks in Manchester and Las Vegas
(see Kostakos, Nykänen, Martinviita, Pandya, & Oussalah,
2018). Comparing the results of the NLTK+SentiWordNet and SentiStrength
software, the authors concluded that SentiStrength performed at a higher level than
the alternative. Building from that study, future work should continue to
explore and test the wide variety of programs currently available to determine whether
there is indeed one ‘superior’ method or whether the appropriate methodology is
context-specific. A combination of sentiment analysis tools could also be integrated
into an analysis so that the tools cross-validate one another (Scrivens et al., 2017).
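Both validation ideas discussed above, comparing a program's labels with human ratings and cross-checking one tool against another, come down to measuring agreement between two sets of labels for the same texts. The following is a minimal sketch of one common agreement statistic, Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The chapter does not prescribe this statistic; it is an assumption made here for illustration, as are the function name `cohens_kappa`, the three-way label scheme, and the example labels.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length lists of categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: from each rater's marginal label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical labels for the same eight posts.
human = ["neg", "neg", "pos", "neu", "neg", "pos", "neu", "neg"]
tool_a = ["neg", "neg", "pos", "neu", "pos", "pos", "neu", "neg"]
tool_b = ["neg", "neu", "pos", "neu", "pos", "pos", "neg", "neg"]

print("human vs tool A:", round(cohens_kappa(human, tool_a), 3))
print("human vs tool B:", round(cohens_kappa(human, tool_b), 3))
print("tool A vs tool B:", round(cohens_kappa(tool_a, tool_b), 3))
```

Values near 1 indicate strong agreement and values near 0 indicate agreement no better than chance. Low agreement between a tool and human coders, or between two tools, would suggest that the classifications need closer qualitative inspection before being relied upon.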
The ways in which violent extremists communicate online will continue to
evolve, shifting from traditional discussion forums and social media
platforms to lesser known spaces on the Internet. For example, in addition to
the use of dedicated extreme right forums and all major social media platforms,
a diversity of more general online forums or forum-like online spaces are also
hosting increasing amounts of extreme right content. These include the popular
social news aggregation, web content rating, and discussion site Reddit and the
image-based bulletin board and comment site 4chan (Scrivens & Conway, in
press). Sites such as these, which, contrary to most mainstream social media
platforms, do not have clear-cut anti-hate speech policies (see Gaudette, Davies,
& Scrivens, 2018), may provide unique insight into the expressions and
manifestations of virtual hate, especially on a large scale using the sentiment
analysis tools outlined in this chapter. In addition, a new generation of right-wing
extremists is moving to more overtly hateful, yet to some extent more hidden,
platforms, including the likes of 8chan, Voat, Gab, and Discord (see Davey &
Ebner, 2017). Similarly, between 2013 and 2016, IS’s media production outlets
developed content that was largely distributed via major (and some minor)
social media and other online platforms. These included prominent IS presences
not only on Facebook, Twitter, and YouTube, but also on,,
and the Internet Archive (Scrivens & Conway, in press). Having said that, as the
ways in which extremists communicate online will undoubtedly evolve, so too
must the ways in which researchers detect extremist content online. Sentiment
analysis software, in combination with other means of analyzing data, should
be applied to these increasingly popular platforms.
Since the advent of the Internet, violent extremists and those who subscribe
to radical views from across the globe have exploited online resources to build
transnational “virtual communities.” The Internet is a fundamental medium that
facilitates these radical communities, not only in “traditional” hate sites, web
forums, and commonly used social media sites, but in lesser known, oftentimes
more hidden spaces online as well (Scrivens & Conway, in press). Researchers and
practitioners have attempted to identify and monitor extremist content online but
increasingly have been overwhelmed by the sheer volume of data in these growing
spaces. Simply put, the manual analysis of online content has become increas-
ingly less feasible.
As a result, researchers and practitioners have sought to develop different
methods of extracting data, especially through the use of web-crawlers, as well
as different methods for managing this large-scale phenomenon to sift
through and detect extremist content. A relatively novel machine learning tool,
sentiment analysis, has sparked the interest of some researchers in the field of
terrorism and extremism studies who face new challenges in detecting the spread
of extremist content online. Though this area of research is in its infancy,
sentiment analysis is showing signs of success and may represent the future of how
researchers and practitioners study extremism online – particularly on a large
scale. This, however, will require that social scientists continue to collaborate
with computer scientists, combining sentiment analysis software with
other classification tools and research methods, validating sentiment analysis
programs, and adapting sentiment analysis software to new and evolving radical
online spaces.
1. For more information, see
2. An array of machine learning techniques is used in the online political violence
research terrain that are not discussed in detail in this chapter. For a list of some of these
techniques, see Chen (2012).
3. For more information, see
4. For more information, see
5. For more information, see
6. For more information, see
Abbasi, A., & Chen, H. (2005). Applying authorship analysis to extremist-group web forum messages.
Intelligent Systems, 20(5), 67–75.
Abbasi, A., Chen, H., & Salem, A. (2008). Sentiment analysis in multiple languages: Feature selection
for opinion classification in web forums. ACM Transactions on Information Systems, 26(3),
Allsup, R., Thomas, E., Monk, B., Frank, R., & Bouchard, M. (2015). Networking in child exploi-
tation – Assessing disruption strategies using registrant information. In Proceedings of the
2015 IEEE/ACM international conference on advances in social networks analysis and mining
(ASONAM), Paris, France (pp. 400–407).
Amarasingam, A. (2016). What Aaron told me: An expert on extremism shares his conversations with
the terror suspect. National Post, August 11. Retrieved from
suspect. Accessed on January 23, 2019.
Bartlett, J., & Littler, M. (2011). Insight the EDL: Populist politics in the digital era. London: Demos.
Bermingham, A., Conway, M., McInerney, L., O’Hare, N., & Smeaton, A. F. (2009). Combining social
network analysis and sentiment analysis to explore the potential for online radicalisation. In
Proceedings of the 2009 international conference on advances in social network analysis mining
(ASONAM), Athens, Greece (pp. 231–236).
Blumstein, A., Cohen, J., Roth, J. A., & Visher, C. A. (1986). Criminal careers and ‘career criminals.’
Washington, DC: National Academy Press.
Bouchard, M., Joffres, K., & Frank, R. (2014). Preliminary analytical considerations in designing
a terrorism and extremism online network extractor. In V. Mago & V. Dabbaghian (Eds.),
Computational models of complex systems (pp. 171–184). New York, NY: Springer.
Brynielsson, J., Horndahl, A., Johansson, F., Kaati, L., Martenson, C., & Svenson, P. (2013). Analysis
of weak signals for detecting lone wolf terrorists. Security Informatics, 2(11), 1–15.
Burnap, P., Williams, M. L., Sloan, L., Rana, O., Housley, W., Edwards, A., …, Voss, A. (2014).
Tweeting the terror: Modelling the social media reaction to the Woolwich terrorist attack.
Social Network Analysis and Mining, 4, 1–14.
Chalothorn, T., & Ellman, J. (2012). Using SentiWordNet and sentiment analysis for detecting radical
content on web forums. In Proceedings of the 6th conference on software, knowledge, information
management and application (SKIMA), Chengdu, China (pp. 9–11).
Chen, H. (2008). Sentiment and affect analysis of dark web forums: Measuring radicalization on the
Internet. In Proceedings of the 2008 IEEE international conference on intelligence and security
informatics (ISI), Taipei, Taiwan (pp. 104–109).
Chen, H. (2012). Dark web: Exploring and data mining the dark side of the web. New York, NY: Springer.
Chen, M., Mao, S., Zhang, Y., & Leung, V. C. (2014). Big data: Related technologies, challenges and
future prospects. New York, NY: Springer.
Cohen, K., Johansson, F., Kaati, L., & Mork, J. (2014). Detecting linguistic markers for radical vio-
lence in social media. Terrorism and Political Violence, 26(1), 246–256.
Davey, J., & Ebner, J. (2017). The fringe insurgency: Connectivity, convergence and mainstreaming of the
extreme right. London: Institute for Strategic Dialogue.
Davies, G., Bouchard, M., Wu, E., Joffres, K., & Frank, R. (2015). Terrorist and extremist organiza-
tions’ use of the Internet for recruitment. In M. Bouchard (Ed.), Social networks, terrorism and
counter-terrorism: Radical and connected (pp. 105–127). New York, NY: Routledge.
Feldman, R. (2013). Techniques and applications for sentiment analysis. Communications of the ACM,
56(4), 82–89.
Figea, L., Kaati, L., & Scrivens, R. (2016). Measuring online affects in a white supremacy forum. In
Proceedings of the 2016 IEEE international conference on intelligence and security informatics
(ISI), Tucson, Arizona, USA (pp. 85–90).
Frank, R., Bouchard, M., Davies, G., & Mei, J. (2015). Spreading the message digitally: A look into
extremist content on the Internet. In R. G. Smith, R. C.-C. Cheung, & L. Y.-C. Lau (Eds.),
Cybercrime risks and responses: Eastern and western perspectives (pp. 130–145). London:
Palgrave Macmillan.
Frank, R., Macdonald, M., & Monk, B. (2016). Location, location, location: Mapping potential
Canadian targets in online hacker discussion forums. In Proceedings of the 2016 European intel-
ligence and security informatics conference (EISIC), Uppsala, Sweden (pp. 16–23).
Frank, R., Westlake, B. G., & Bouchard, M. (2010). The structure and content of online child exploita-
tion networks. In Proceedings of the 10th ACM SIGKDD workshop on intelligence and security
informatics (ISI-KDD), Washington, DC, USA, Article 3.
Fu, T., Abbasi, A., & Chen, H. (2010). A focused crawler for dark web forums. Journal of American
Society for Information Science and Technology, 61(6), 1213–1231.
Gaudette, T., Davies, G., & Scrivens, R. (2018). Upvoting extremism, part I: An assessment of extreme
right discourse on Reddit. VOX-Pol Network of Excellence Blog. Retrieved from https://www.
Accessed on January 23, 2019.
Ghiassi, M., Skinner, J., & Zimbra, D. (2013). Twitter brand sentiment analysis: A hybrid system using
n-gram analysis and dynamic artificial neural network. Expert Systems with Applications,
40(16), 6266–6282.
Gonçalves, P., Benevenuto, M., Araújo, F., & Cha, M. (2013). Comparing and combining sentiment
analysis methods. In Proceedings of the 1st ACM conference on online social networks, Boston,
MA, USA (pp. 27–38).
Hankes, K. (2015). Dylann Roof may have been a regular commenter at neo-Nazi website The
Daily Stormer. Southern Poverty Law Center. Retrieved from
daily-stormer. Accessed on January 23, 2019.
Hannah-Moffat, K. (2018). Algorithmic risk governance: big data analytics, race and information
activism in criminal justice debates. Theoretical Criminology.
Hung, B. W. K., Jayasumana, A. P., & Bandara, V. W. (2016). Detecting radicalization trajectories using
graph pattern matching algorithms. In Proceedings of the 2016 IEEE international conference on
intelligence and security informatics (ISI), Tucson, Arizona, USA (pp. 313–315).
Internet Live Stats. (2019). Total number of websites. Retrieved from
total-number-of-websites. Accessed on January 23, 2019.
Internet World Stats. (2019). Internet growth statistics. Retrieved from http://www.internetworldstats.
com/emarketing.htm. Accessed on January 23, 2019.
Joffres, K., Bouchard, M., Frank, R., & Westlake, B. G. (2011). Strategies to disrupt online child por-
nography networks. In Proceedings of the European intelligence and security informatics confer-
ence (EISIC), Athens, Greece (pp. 163–170).
Johansson, J., Kaati, L., & Sahlgren, M. (2016). Detecting linguistic markers of violent extremism
in online environments. In M. Khader, L. S. Neo, G. Ong, E. T. Mingyi, & J. Chin (Eds.),
Combating violent extremism and radicalization in the digital era (pp. 374–390). Hershey, PA:
Information Science Reference.
Klausen, J., Marks, C. E., & Zaman, T. (2018). Finding extremists in online social networks. Operations
Research, 66(4), 957–976.
Kostakos, P., Nykänen, M., Martinviita, M., Pandya, A., & Oussalah, M. (2018). Meta-terrorism: iden-
tifying linguistic patterns in public discourse after an attack. In Proceedings of the 2018 IEEE/
ACM international conference on advances in social networks analysis and mining (ASONAM),
Barcelona, Spain (pp. 1079–1083).
Levey, P., Bouchard, M., Hashimi, S., Monk, B., & Frank, R. (2016). The emergence of violent narra-
tives in the life-course trajectories of online forum participants. Canadian Network for Research
on Terrorism, Security and Society Report, Waterloo, ON, Canada.
Lewis, M. (1993). The lexical approach: The state of ELT and the way forward. Hove: Language
Teaching Publications.
Liu, B. (2012). Sentiment analysis and opinion mining. San Rafael, CA: Morgan and Claypool.
Macdonald, M., & Frank, R. (2016). The network structure of malware development, deployment and
distribution. Global Crime, 18(1), 49–69.
Macdonald, M., & Frank, R. (2017). Shuffle up and deal: Use of a capture–recapture method to estimate
the size of stolen data markets. American Behavioral Scientist, 61(11), 1313–1340.
Macdonald, M., Frank, R., Mei, J., & Monk, B. (2015). Identifying digital threats in a hacker web
forum. In Proceedings of the 2015 international symposium on foundations of open source intel-
ligence and security informatics (FOSINT), Paris, France (pp. 926–933).
Macnair, L., & Frank, R. (2018a). The mediums and the messages: Exploring the language of Islamic
State media through sentiment analysis. Critical Studies on Terrorism, 11(3), 438–457.
Macnair, L., & Frank, R. (2018b). Changes and stabilities in the language of Islamic State magazines:
A sentiment analysis. Dynamics of Asymmetric Conflict, 11(2), 109–120.
Mei, J., & Frank, R. (2015). Sentiment crawling: Extremist content collection through a sentiment
analysis guided web-crawler. In Proceedings of the international symposium on foundations of
open source intelligence and security informatics (FOSINT), Paris, France (pp. 1024–1027).
Mikhaylov, A., & Frank, R. (2016). Cards, money and two hacking forums: An analysis of online
money laundering schemes. In Proceedings of the 2016 European intelligence and security infor-
matics conference (EISIC), Uppsala, Sweden (pp. 80–83).
Mikhaylov, A., & Frank, R. (2018). Illicit payments for illicit goods: Noncontact drug distribution on
Russian online drug marketplaces. Global Crime, 19(2), 146–170.
Mirani, T. B., & Sasi, S. (2016). Sentiment analysis of ISIS related tweets using absolute location. In
Proceedings of the 2016 international conference on computational science and computational
intelligence (CSCI), Las Vegas, NV, USA (pp. 1140–1145).
Monk, B., Allsup, R., & Frank, R. (2015). LECENing places to hide: Geo-mapping child exploitation
material. In Proceedings of the 2015 IEEE intelligence and security informatics (ISI), Baltimore,
MD, USA (pp. 73–78).
Moghadam, A. (2008). The Salafi-jihad as a religious ideology. CTC Sentinel, 1(3), 14–16.
Ogneva, M. (2010). How companies can use sentiment analysis to improve their business. Retrieved
from Accessed on January 23, 2019.
Park, A. J., Beck, B., Fletche, D., Lam, P., & Tsang, H. H. (2016). Temporal analysis of radical dark
web forum users. In Proceedings of the 2016 IEEE/ACM international conference on advances
in social networks analysis and mining (ASONAM), San Francisco, CA, USA (pp. 880–883).
Perry, B., & Scrivens, R. (2016). Uneasy alliances: A look at the right-wing extremist movement in
Canada. Studies in Conflict and Terrorism, 39(9), 819–841.
Sageman, M. (2014). The stagnation in terrorism research. Terrorism and Political Violence, 26(4),
Scrivens, R., & Conway, M. (in press). The roles of ‘old’ and ‘new’ media tools and technologies in
the facilitation of violent extremism and terrorism. In R. Leukfeldt & T. J. Holt (Eds.),
Cybercrime: The Human Factor. New York, NY: Routledge.
Scrivens, R., Davies, G., & Frank, R. (2017). Searching for signs of extremism on the web: An introduction
to sentiment-based identification of radical authors. Behavioral Sciences of Terrorism and
Political Aggression, 10(1), 39–59.
Scrivens, R., Davies, G., & Frank, R. (2018). Measuring the evolution of radical right-wing posting
behaviors online. Deviant Behavior.
Scrivens, R., & Frank, R. (2016). Sentiment-based classification of radical text on the web. In
Proceedings of the 2016 European intelligence and security informatics conference (EISIC),
Uppsala, Sweden (pp. 104–107).
Southern Poverty Law Center. (2014). White homicide worldwide. Retrieved from https://www.spl- Accessed on January 23, 2019.
Thelwall, M., & Buckley, K. (2013). Topic-based sentiment analysis for the social web: The role of
mood and issue-related words. Journal of the American Society for Information Science and
Technology, 64(8), 1608–1617.
Vergani, M., & Bluic, A. M. (2015). The evolution of the ISIS’ language: A quantitative analysis of the
language of the first year of Dabiq magazine. Security, Terrorism, and Society, 2, 7–20.
Wang, Y-C., Kraut, R., & Levine, J. M. (2012). To stay or leave? The relationship of emotional and
informational support to commitment in online health support groups. In Proceedings of the
ACM 2012 conference on computer supported cooperative work, Seattle, WA, USA (pp. 833–842).
Wei, Y., Singh, L., & Martin, S. (2016). Identification of extremism on Twitter. In Proceedings of the
2016 IEEE/ACM international conference on advances in social networks analysis and mining
(ASONAM), San Francisco, CA, USA (pp. 1251–1255).
Westlake, B. G., & Bouchard, M. (2015). Criminal careers in cyberspace: Examining website failure
within child exploitation networks. Justice Quarterly, 33(7), 1154–1181.
Westlake, B. G., Bouchard, M., & Frank, R. (2011). Finding the key players in online child exploitation
networks. Policy and Internet, 3(2), 1–25.
Williams, M. L., & Burnap, P. (2015). Cyberhate on social media in the aftermath of Woolwich: A
case study in computational criminology and big data. British Journal of Criminology, 56(2),
Wong, M., Frank, R., & Allsup, R. (2015). The supremacy of online white supremacists – An analysis
of online discussions of white supremacists. Information and Communications Technology Law,
24(1), 41–73.
Yang, M., Kiang, M., Ku, Y., Chiu, C., & Li, Y. (2011). Social media analytics for radical opinion
mining in hate group web forums. Journal of Homeland Security and Emergency Management,
8(1), 1547–7355.
Zhang, Y., Zeng, S., Huang, C.-N., Fan, L., Yu, X., Dang, Y., …, Chen, H. (2010). Developing a dark
web collection and infrastructure for computational and social sciences. In Proceedings of the
2010 IEEE international conference on intelligence and security informatics (ISI), Atlanta, GA,
USA (pp. 59–64).
Zhou, Y., Qin, J., Lai, G., Reid, E., & Chen, H. (2005). Building knowledge management system for
researching terrorist groups on the web. In Proceedings of the 11th Americas conference on infor-
mation systems (AMCIS), Omaha, NE, USA (pp. 2524–2536).
Zulkarnine, A., Frank, R., Monk, B., Mitchell, J., & Davies, G. (2016). Surfacing collaborated networks
in dark web to find illicit and criminal content. In Proceedings of the 2016 IEEE international
conference on intelligence and security informatics (ISI), Tucson, AZ, USA (pp. 109–114).
... The first, including all years aside from 2011 to 2013, was collected by the Southern Poverty Law Center (SPLC) and provided to the authors. The second, covering 2011-2013 was collected by other scholars (Scrivens et al., 2019) and provided to the authors. The nature of the Stormfront forum enables complete data collection of each forum, both sources utilized web-crawlers in the forum collecting all the open-access forums and sub-forums, thus ensuring the data utilized in the study is all of the forums in the specified time period (Scrivens, 2021;Scrivens et al., 2019Scrivens et al., , 2020Scrivens et al., , 2021. ...
... The second, covering 2011-2013 was collected by other scholars (Scrivens et al., 2019) and provided to the authors. The nature of the Stormfront forum enables complete data collection of each forum, both sources utilized web-crawlers in the forum collecting all the open-access forums and sub-forums, thus ensuring the data utilized in the study is all of the forums in the specified time period (Scrivens, 2021;Scrivens et al., 2019Scrivens et al., , 2020Scrivens et al., , 2021. After screening for posts including the term "vaccin*," we retained a corpus of 8892 posts for analysis. ...
Introduction Research has indicated a growing resistance to vaccines among U.S. conservatives and Republicans. Following past successes of the far-right in mainstreaming health misinformation, this study tracks almost two decades of vaccine discourse on the extremist, white nationalist (WN) online message-board Stormfront. We examine the argumentative repertoire around vaccines on the forum, and whether it assimilated to or challenged common arguments for and against vaccines, or extended it in ways unique to the racist WN agenda. Methods We use a mixed-methods approach, combining unsupervised machine learning of 8892 posts including the term “vaccin*“, published on Stormfront between 2001 and 2017. We supplemented the computational analysis with a manual coding of randomly sampled 500 posts, evaluating the prevalence of pro- and anti-vaccine sentiment, previously identified pro- and anti-vaccine arguments, and WN-specific arguments. Results Discourse was dynamic, increasing around specific events, such as outbreaks and following legal debates about vaccine mandates. We identified four themes: conspiracies, science, race and white innovation. The prominence of themes over time was relatively stable. Our manual coding identified levels of anti-vaccine sentiment that were much higher than found in the past on mainstream social media. Most anti-vaccine posts relied on common anti-vaccine tropes and not on WN conspiracy theories. Pro-vaccination posts, however, were supported by unique race-based arguments. Conclusion We find a high volume of anti-vaccine sentiment among WN on Stormfront, but also identify unique pro-vaccine arguments that echo the group's racist ideology. Public health implication As with past health-related conspiracy theories, high levels of anti-vaccine sentiment in online far-right sociotechnical information systems could threaten public health, especially if it ‘spills-over’ to mainstream media. 
Many pro-vaccine arguments on the forum relied on racist, WN reasoning, thus preventing the authors from recommending the use of these unethical arguments in future public health communications.
... 1686) and many sites within the manosphere are rife with image-based messages in the forms of memes and gifs. As online content related to the topic of misogynistic extremism proliferates, adopting Natural Language Processing techniques, along with the use of hate speech detection tools (e.g., HateXplain) and customized web crawlers may be beneficial (Chen et al., 2022;Mathew et al., 2021;Scrivens et al., 2019). ...
Full-text available
In recent years, the concept of "misogynistic extremism" has emerged as a subject of interest among scholars, governments, law enforcement personnel, and the media. Yet a consistent understanding of how misogynistic extremism is defined and conceptualized has not yet emerged. Varying epistemological orientations may contribute to the current conceptual muddle of this topic, reflecting long-standing and on-going challenges with the conceptualization of its individual components. To address the potential impact of misogynistic extremism (i.e., violent attacks), a more precise understanding of what this phenomenon entails is needed. To summarize the existing knowledge base on the nature of misogynistic extremism, this scoping review analyzed publications within English-language peer-reviewed and gray literature sources. Seven electronic databases and citation indexes were systematically searched using the Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for scoping reviews (PRISMA-ScR) checklist and charted using the 2020 PRISMA flow diagram. Inclusion criteria included English peer-reviewed articles and relevant gray literature publications, which contained the term "misogynistic extremism" and other closely related terms. No date restrictions were imposed. The search strategy initially yielded 475 publications. After exclusion of ineligible articles, 40 publications remained for synthesis. We found that misogynistic extremism is most frequently conceptualized in the context of misogynistic incels, male supremacism, far-right extremism, terrorism, and the black pill ideology. Policy recommendations include increased education among law enforcement and Countering and Preventing Violent Extremism experts on male supremacist violence and encouraging legal and educational mechanisms to bolster gender equality. 
Violence stemming from misogynistic worldviews must be addressed by directly acknowledging and challenging socially embedded systems of oppression such as white supremacy and cisheteropatriarchy.
... Data were captured using a custom-written computer program that was designed to collect vast amounts of information online. 58 It is worth highlighting here that Fascist Forge's "multi-stage application process" to vet its users before they are given access to post or visit and participate in private channels there (discussed below) did not affect the data collection process for the current study, as only the postings from the open-access section of the forum were extracted. In other words, those who were not vetted could view the open-access sections of the forum and, by extension, extract the open source content from the platform. ...
... For this study, publicly accessible Reddit data consisting of comments made within the r/Incels subreddit were used. All open-source content during this time frame on r/Incels was captured by one of the authors using a custom-written computer program that was designed to collect vast amounts of information online (for more information on the web-crawler, see Scrivens et al. 2019). In total, the web-crawler extracted approximately 1.4 million comments made by approximately 45,000 authors between January 2014 and November 2017. ...
Full-text available
The online presence of incels, or involuntary celibates, has been an increasing security concern for researchers, practitioners, and policymakers in recent years, given that self-identified incels – including Alek Minassian and Elliot Rodger – used the Internet to disseminate incel ideology and manifestos prior to committing acts of violence. However, little is empirically known about the incel movement in general or their online communities in particular. The present study draws from a set of comments from r/Incels, a now defunct but once popular subreddit dedicated to the incel community, and compares the most highly-upvoted comments (n = 500) to a random set of other comments (n = 500) in the subreddit. This qualitative analysis focuses on identifying subcultural discourse that is widely supported and engaged with by members of the online community and the extent to which incels utilize this online space to reaffirm deviant behavior. Our study underscores the importance, as well as the difficulties, of drawing from online sources like web-forums to generate new knowledge on deviant communities and behaviors. We conclude with a discussion of the implications of this analysis, its limitations, and avenues for future research.
Full-text available
Strong encryption algorithms and reliable anonymity routing have made cybercrime investigation more challenging. Hence, one option for law enforcement agencies (LEAs) is to search through unencrypted content on the Internet or anonymous communication networks (ACNs). The capability of automatically harvesting web content from web servers enables LEAs to collect and preserve data prone to serve as potential leads, clues, or evidence in an investigation. Although scientific studies have explored the field of web crawling soon after the inception of the web, few research studies have thoroughly scrutinised web crawling on the "dark web" or via ACNs such as I2P, IPFS, Freenet, and Tor. The current paper presents a systematic literature review (SLR) that examines the prevalence and characteristics of dark web crawlers. From a selection of 58 peer-reviewed articles mentioning crawling and the dark web, 34 remained after excluding irrelevant articles. The literature review showed that most dark web crawlers were programmed in Python, using either Selenium or Scrapy as the web scraping library. The knowledge gathered from the systematic literature review was used to develop a Tor-based web crawling model into an already existing software toolset customised for ACN-based investigations. Finally, the performance of the model was examined through a set of experiments. The results indicate that the developed crawler was successful in scraping web content from both clear and dark web pages, and scraping dark marketplaces on the Tor network. The scientific contribution of this paper entails novel knowledge concerning ACN-based web crawlers. Furthermore, it presents a model for crawling and scraping clear and dark websites for the purpose of digital investigations. The conclusions include practical implications of dark web content retrieval and archival, such as investigation clues and evidence, and the related future research topics.
Criminal law might sometimes be perceived to be at the margins of the automation process that involves increasing sectors of our society. While the expansion of automated-driven cars is by now an established fact, the technological upgrade of criminal justice mechanisms still tends to evocate sci-fi images and Minority-report-style dystopian scenarios to the non-specialists.However, the high variety of applications that can already be counted in this domain, and the acceleration of this process in the last few years, clearly speaks for a tangible expansion of AI technology also in the field. In particular, against some more-established scenarios, especially related to phenomena of so-called predictive policing, Multi-Agent Systems (MAS) open today the perspective of a much deeper involvement of automated technologies in the very shaping of investigative proceedings.The chapter offers an analysis of such potential, both with respect to their possible contribution to the efficiency of investigations (in their preventive and repressive dimension), and to avoid or reduce certain negative biases typical of “purely human” investigation processes, first of all, the tunnel vision effect.
Full-text available
Although many researchers, practitioners, and policymakers are concerned about identifying and characterizing online posting patterns of violent extremists prior to their engagement in violence offline, little is empirically known about their online patterns generally or differences in their patterns compared to their non-violent counterpart particularly. In this study, we drew from a unique sample of violent and non-violent right-wing extremists to develop and compare their online posting typologies (i.e., super-posters, committed, engaged, dabblers, and non-posters) in the largest white supremacy web-forum. We identified several noteworthy posting patterns that may assist law enforcement and intelligence agencies in identifying credible threats online.
Right-wing extremists, among other extremists, continue to exploit the power of the Internet and associated technologies by connecting with the like-minded from around the globe and developing a sense of identity there. A growing body of literature has been dedicated to exploring this phenomenon, with an interest in how the online identities of these adherents develop over time. Overlooked in these discussions, however, has been an assessment of how the development of identities of violent right-wing adherents compares with that of their non-violent counterparts. This study explores how 49 violent and 50 non-violent right-wing extremists frame their identities over time on a popular online space of the extreme right, Stormfront. The results highlight the extent to which the collective identities of both groups take shape over time. We conclude with a discussion of implications of this analysis and avenues for future research.
There is an ongoing need for researchers, practitioners, and policymakers to detect and assess online posting behaviors of violent extremists prior to their engagement in violence offline, but little is empirically known about their online behaviors generally or the differences in their behaviors compared with nonviolent extremists who share similar ideological beliefs particularly. In this study, we drew from a unique sample of violent and nonviolent right-wing extremists to compare their posting behaviors in the largest White supremacy web-forum. We used logistic regression and sensitivity analysis to explore how users’ time of entry into the lifespan of an extremist sub-forum and their cumulative posting activity predicted their violence status. We found a number of significant differences in the posting behaviors of violent and nonviolent extremists which may inform future risk factor frameworks used by law enforcement and intelligence agencies to identify credible threats online.
As the media landscape changes and the range of available online offerings continues to diversify, not only everyday life but also the activities of extremist actors are increasingly shifting into the digital sphere. As a result of technological and societal developments (e.g., the growing propensity for violence at Covid-19 demonstrations), concerns that the Internet could foster radicalization are moving to the center of scholarly and public debate. The pervasiveness of the Internet in everyday life is therefore also central to the analysis, discussion, and prevention of radicalization dynamics. The exact role of the Internet in radicalization processes depends on various factors. Based on a systematic literature review of 216 publications on radicalization on the Internet, an overview of the research field is generated. The literature is systematized on three levels: (1) distinguishing mechanisms of effect at the micro, meso, and macro levels, (2) modeling radicalization dynamics along the communication process (communicators, content, medium, recipients), and (3) a differentiated examination of different digital spaces in terms of the uses they afford (affordances) to extremist actors. Building on this, research gaps and potential for future studies are derived, along with recommendations for action for practitioners and policymakers.
This chapter describes and discusses the roles of media tools and technologies in the facilitation of violent extremism and terrorism. Rather than focusing on how media report on terrorism, we investigate how extremist and terrorist groups and movements themselves have exploited various ‘traditional’ and ‘new’ media tools, from print to digital, outlining the significance that these tools have had for extremists’ ability to mark territory, intimidate some audiences, connect with other (sympathetic) audiences, radicalize, and even recruit. We underline that violent extremists and terrorists of all stripes have, over time, used every means at their disposal to forward their communicative goals. Also worth noting is that ‘old’ media tools are not extinct: although ‘new’ media play a prominent role in contemporary violent extremism and terrorism, ‘old’ tools, from murals to magazines, continue to be used in tandem with them.
Online discussion forums have been identified as an online social milieu that may facilitate the radicalization process, or the development of violent narratives, for a minority of participants, notably youth. Yet very little is known about the nature of the conversations youth have online, the emotions they convey, and whether or how the sentiments expressed in online narratives may change over time. Using Life Course Theory (LCT) and General Strain Theory (GST) as theoretical guidance, this article seeks to address the development of negative emotions in an online context, specifically whether certain turning points (such as entry into adulthood) are associated with a change in the nature of sentiments expressed online. A mixed methods approach is used, where the content of posts from a sample of 96 individuals participating in three online discussion forums focused on Islamic issues is analyzed quantitatively and qualitatively to assess the nature and evolution of negative emotions. The results show that 1) minors have a wider range of sentiments than adults, 2) adults are more negative overall when compared to minors, and 3) both groups tended to become more negative over time. However, the most negative users of the sample did not show as much change as the others, remaining consistent in their narratives from the beginning to the end of the study period.
Researchers have previously explored how right-wing extremists build a collective identity online by targeting their perceived “threat,” but little is known about how this “us” versus “them” dynamic evolves over time. This study uses a sentiment analysis-based algorithm that adapts criminal career measures, as well as semi-parametric group-based modeling, to evaluate how users’ anti-Semitic, anti-Black, and anti-LGBTQ posting behaviors develop on a sub-forum of the most conspicuous white supremacy forum. The results highlight the extent to which authors target their key adversaries over time, as well as the applicability of a criminal career approach in measuring radical posting trajectories online.
Conference Paper
When a terror-related event occurs, there is a surge of traffic on social media comprising informative messages, emotional outbursts, helpful safety tips, and rumors. It is important to understand the behavior manifested on social media sites in order to better understand how to govern and manage in a time of crisis. We undertook a detailed study of Twitter during two recent terror-related events: the Manchester attacks and the Las Vegas shooting. We analyze the tweets during these periods using (a) sentiment analysis, (b) topic analysis, and (c) fake news detection. Our analysis demonstrates the spectrum of emotions evinced in reaction and the way those reactions spread over the event timeline. With respect to topic analysis, we also find “echo chambers”: groups of people interested in similar aspects of the event. Encouraged by our results on these two event datasets, the paper seeks to enable a holistic analysis of social media messages in a time of crisis.
The distribution or consumption of traditional drugs has become the subject of stringent penalties throughout most of the world and synthetic designer drugs have become the alternative. Novel psychoactive substances, also called ‘legal highs’, are highly varied in terms of chemical composition. These substances are advertised and distributed as an alternative to traditional drugs on the Internet, making identification of new substances and enforcement difficult. For this article, we downloaded and analysed 28 Russian-language online drug marketplaces which distribute traditional and novel psychoactive substances. All marketplaces used a noncontact drug dealing method where the seller and the buyer communicate through the Internet to arrange for payment and delivery of drugs without meeting face-to-face. Geographic information, price, amount, substance type and payment method data were extracted. Findings indicate such marketplaces are able to operate due to the ability of their clients to pay anonymously with the virtual currencies – Qiwi and Bitcoin.
This study applies the semi-automated method of sentiment analysis in order to examine any quantifiable changes in the linguistic, topical, or narrative patterns that are present in the English-language Islamic State-produced propaganda magazines Dabiq (15 issues) and Rumiyah (10 issues). Based on a sentiment analysis of the textual content of these magazines, it was found that the overall use of language has remained largely consistent between the two magazines and across a timespan of roughly three years. However, while the majority of the language within these magazines is consistent, a small number of significant changes with regard to certain words and phrases were found. Specifically, the language of Islamic State magazines has become increasingly hostile towards certain enemy groups of the organization, while the language used to describe the Islamic State itself has become significantly more positive over time. In addition to identifying the changes and stabilities of the language used in Islamic State magazines, this study endeavours to test the effectiveness of the sentiment analysis method as a means of examining and potentially countering extremist media moving forward.
This study applies the method of sentiment analysis to the online media released by the Islamic State (IS) in order to distinguish the ways in which IS uses language within their media, and potential ways in which this language differs across various online platforms. The data used for this sentiment analysis consist of transcripts of IS-produced videos, the text of IS-produced online periodical magazines, and social media posts from IS-affiliated Twitter accounts. It was found that the language and discourse utilised by IS in their online media is of a predominantly negative nature, with the language of videos containing the highest concentration of negative sentiment. The words and phrases with the most extreme sentiment values are used as a starting point for the identification of specific narratives that exist within online IS media. The dominant narratives discovered with the aid of sentiment analysis were: 1) the demonstrated strength of the IS, 2) the humiliation of IS enemies, 3) continuous victory, and 4) religious righteousness. Beyond the identification of IS narratives, this study serves to further explore the utility of the sentiment analysis method by applying it to mediums and data that it has not traditionally been applied to, specifically, videos and magazines.
Often overlooked in the measurement of crime is the underlying size of offender populations. This holds true for online property crimes involving the sale, purchase, and use of stolen financial data. Though available data suggests that online frauds are steadily increasing, there are currently no estimates of the scope of this offender population. The current study addresses this issue by using capture–recapture methods to estimate the size of the population participating in stolen data markets over a calendar year. Data analysis involved samples collected from three websites that facilitate financial crimes and frauds. Findings suggest that markets are much larger in size than what can otherwise be observed, are heterogeneous, and that buyers outnumber vendors.
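For readers unfamiliar with capture-recapture, the simplest two-sample form of the idea can be sketched as follows. This is only an illustration of the general technique (the Lincoln-Petersen estimator and Chapman's bias-corrected variant); the abstract above does not say which capture-recapture model the study used, and the function names and user IDs here are hypothetical.

```python
def lincoln_petersen(first_sample, second_sample):
    """Estimate population size from two samples of observed user IDs.

    Illustrative assumption: each sample is a set of identifiers seen
    during one observation window, and the population is closed.
    """
    n1, n2 = len(first_sample), len(second_sample)
    recaptured = len(first_sample & second_sample)  # seen in both windows
    if recaptured == 0:
        raise ValueError("no overlap between samples; estimate undefined")
    return n1 * n2 / recaptured


def chapman(first_sample, second_sample):
    """Chapman's bias-corrected variant; defined even with zero overlap."""
    n1, n2 = len(first_sample), len(second_sample)
    recaptured = len(first_sample & second_sample)
    return (n1 + 1) * (n2 + 1) / (recaptured + 1) - 1


# Hypothetical user IDs observed in two observation windows.
window_a = {"u1", "u2", "u3", "u4"}
window_b = {"u3", "u4", "u5", "u6", "u7"}

print(lincoln_petersen(window_a, window_b))
print(chapman(window_a, window_b))
```

The smaller the overlap between the two samples, the larger the estimated hidden population, which is consistent with the finding that markets appear much larger than what can be observed directly.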
Conference Paper
Twitter is a free service that lets registered members broadcast to the public messages limited to 140 characters, which may include text, photos, videos, and hyperlinks. People share news, opinions, and information in support of or against what appears in the media. The most frightening topic is the ISIS terrorist attacks taking place around the world. ISIS takes advantage of social media to communicate continuously using coded words or to establish an indirect presence. Hashtags associated with ISIS can be analyzed to capture the sentiment of the tweets. This paper presents a novel process for performing sentiment analysis on ISIS-related tweets and organizing the opinions by geolocation. The Jeffrey Breen algorithm is used for sentiment analysis. Data mining algorithms such as Support Vector Machine, Random Forest, Bagging, Decision Trees, and Maximum Entropy are applied for polarity-based classification of ISIS-related tweets. The results are compared and presented.
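The lexicon-based scoring commonly attributed to Jeffrey Breen is usually described as counting the positive-word matches in a text and subtracting the negative-word matches. A minimal sketch of that idea is given below; the tiny word lists and the function name are hypothetical stand-ins for a full opinion lexicon, and the paper's actual preprocessing is not specified in the abstract.

```python
import re

# Hypothetical miniature lexicons; a real analysis would load a full
# opinion lexicon with thousands of entries.
POSITIVE_WORDS = {"good", "safe", "helpful", "peace"}
NEGATIVE_WORDS = {"bad", "attack", "fear", "threat"}


def polarity_score(text, positive=POSITIVE_WORDS, negative=NEGATIVE_WORDS):
    """Return (# positive matches) - (# negative matches) for a text."""
    tokens = re.findall(r"[a-z']+", text.lower())  # crude word tokenizer
    positive_hits = sum(token in positive for token in tokens)
    negative_hits = sum(token in negative for token in tokens)
    return positive_hits - negative_hits


def polarity_label(score):
    """Map a numeric score to a coarse polarity class."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


tweet = "Good, helpful tips after the attack"
score = polarity_score(tweet)
print(score, polarity_label(score))
```

Labels produced this way could then serve as targets for the polarity-based classifiers the abstract lists, although how the paper combined the two steps is not stated.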