Characterizing Social Bots Spreading
Financial Disinformation
Serena Tardelli1, Marco Avvenuti2, and Stefano Cresci3
1 IIT-CNR and Department of Information Engineering, University of Pisa, Pisa, Italy
2 Department of Information Engineering, University of Pisa, Pisa, Italy
3 IIT-CNR, Pisa, Italy
Abstract. Despite the existence of several studies on the characteristics and role of social bots in spreading disinformation related to politics, health, science and education, financial social bots remain a largely unexplored topic. We aim to shed light on this issue by investigating the activities of large social botnets on Twitter, involved in discussions about stocks traded in the main US financial markets. We show that the largest discussion spikes are in fact caused by mass-retweeting bots. Then, we focus on characterizing the activity of these financial bots, finding that they are involved in speculative campaigns aimed at promoting low-value stocks by exploiting the popularity of high-value ones. We conclude by highlighting the peculiar features of these accounts, comprising similar account creation dates, similar screen names, biographies, and profile pictures. These accounts appear as untrustworthy and quite simplistic bots, likely aiming to fool automatic trading algorithms rather than human investors. Our findings pave the way for the development of accurate detection and filtering techniques for financial spam. In order to foster research and experimentation on this novel topic, we make our dataset publicly available for research purposes.
Keywords: Social bots · Disinformation · Deception · Financial spam · Stock markets · Twitter
1 Introduction

Nowadays, social bots play a pivotal role in shaping the content of online social
media [25]. Their involvement in the spread of disinformation ranges from the
promotion of low-credibility content, astroturfing, and fake endorsements, to
the propagation of hate speech propaganda, in attempts to manipulate public
opinion and to increase societal polarization [26]. Indeed, recent studies have
observed the presence of artificial tampering in a wide variety of online topic debates, including political discussions, terrorist propaganda, and health controversies [8].

© Springer Nature Switzerland AG 2020. G. Meiselwitz (Ed.): HCII 2020, LNCS 12194, pp. 376–392, 2020. https://doi.org/10.1007/978-3-030-49570-1_26
Online financial discussions constitute yet another ecosystem that social bots now pervade [14,24]. Indeed, such an ecosystem has proven to be of
great interest as a valuable ground to entice investors. Although the leverage
of social media content for predicting trends in the stock market has promising
potential [6], the presence of social bots in such scenarios poses serious con-
cerns over the reliability of financial information. Examples of repercussions of
financial spam on unaware investors and automated trading systems include the
real-world event known as the Flash Crash – the one-day collapse of the Dow
Jones Industrial Average in 2010 induced by an error in the estimation of online
information by automated trading systems [19]. Another notable example is the hacking of the US International Press Officer's official Twitter account in 2013, when a bot reported the injury of President Obama following a terrorist attack, causing a major stock market drop in a short time. Finally, we witnessed the abrupt rise of Cynk Technology in 2014 from an unknown, unprosperous small company to a billion-dollar company, due to a social bot orchestration that lured automatic trading algorithms into investing in the company's shares based on a fake social discussion, which ultimately resulted in severe losses.
Therefore, investigating such manipulations and characterizing them is of the
utmost importance in order to protect our markets from manipulation and to
safeguard our investments.
Contributions. In an effort to shed light on the little-studied activity of social
bots tampering with online financial discussions, we analyze a rich dataset of
9M tweets discussing stocks of the five main US financial markets. Our dataset
is complemented with financial information collected from Google Finance, for
each of the stocks mentioned in our tweets. By comparing social and financial
information, we report on the activity of large botnets perpetrating speculative
campaigns aimed at promoting low-value stocks by exploiting the popularity of
high-value ones. We highlight the main characteristics of these financial bots,
which appear as untrustworthy, simplistic accounts. Based on these findings, we
conclude that their activity is likely aimed at fooling automatic trading algo-
rithms rather than human investors.
Our main contributions are summarized as follows:
– We document the activity of social bots spreading financial disinformation on Twitter.
– We uncover the existence of several large botnets, actively involved in artificially promoting specific stocks.
– We characterize social bots tampering with financial discussions from various perspectives, including their content, temporal, and social facets.
Our findings provide an important contribution towards understanding the
role and impact of social bots in the financial domain, and pave the way for the
development of accurate detection and filtering techniques for financial spam.
2 Related Work
As anticipated, in the present study we are interested in analyzing the activity,
the behavior, and the characteristics of financial social bots. For this reason, in
this section we do not survey previous works related to the detection of social
bots – which is a different topic covered by many thorough studies [8] – but we
rather focus on those works related to the characterization of malicious accounts.
Given the many issues caused by malicious accounts to our online social
ecosystems, a large body of work analyzed the behavior of bots and trolls in dis-
information campaigns aimed at influencing a variety of debates. To understand
how trolls tampered with the 2016 US Presidential elections, previous work char-
acterized the content they disseminated, and their influence on the information
ecosystem [30]. Among other findings, authors discovered that trolls were cre-
ated a few weeks before important world events, and that they are more likely
to retweet political content from normal Twitter users, rather than other news
sources. In [31], authors evaluated the behavior and strategy changes over time
of Russian and Iranian state-sponsored trolls. By exposing the way Iranian trolls
changed behavior over time and started retweeting each other, they highlighted
how strategies employed by trolls adapt and evolve to new campaigns. Authors
in [27] detected and characterized Twitter bots and user interactions by analyz-
ing their retweet and mention strategies, and observed a high correlation between
the number of friends and followers of accounts and their bot-likelihood. In [1],
authors characterized Arabic social bots spreading religious hatred on Twitter,
and discovered they have a longer life, a higher number of followers, and an
activity more geared towards creating original content than retweets, compared
to English bots [5]. The previous works remarked the importance of understand-
ing the inherent characteristics of bots and trolls. In fact, despite showing signs
of bot-likelihood, bots do not often get caught in time, thus potentially affecting
the polarization and outcome of essential debates. Moreover, previous works also
highlighted how bots and trolls evolve and adapt to new contexts. Despite such
consistency in previous results, the characteristics of social bots disseminating
financial information are yet to be explored. The few previous works that tackled
automation and disinformation in online financial discussions, went as far as pro-
viding evidence of the presence of financial spam in stock microblogs and raised
concerns over the reliability of such information [13,14]. However, the detection
and impact estimation of such bots in social media financial discussions still rep-
resent largely unexplored fields of study. Conversely, the leverage of social bots
in other sectors has been extensively examined, with previous works focusing on
the interference of bots in health issues [2], terrorist propaganda [3], and politi-
cal election campaigns in the US [5], France [16], Italy [10], and Germany [7], to
name but a few.
In this work, we aim at filling in the missing piece of the puzzle – that is, the
characterization of social bots in online financial conversations, with a focus on
how they are organized and how they operate.
Table 1. Statistics about the financial and social composition of our dataset. Companies, Median cap., and Total cap. are financial data; Users, Tweets, and Retweets are Twitter data.

Markets    Companies  Median cap. ($)  Total cap. ($B)  Users    Tweets     Retweets (%)
NASDAQ     3,013      365,780,000      10,521           252,587  4,017,158  1,017,138 (25%)
NYSE       2,997      1,810,000,000    28,692           265,618  4,410,201  923,123 (21%)
NYSEARCA   726        245,375,000      2,227            56,101   298,445    157,101 (53%)
NYSEMKT    340        78,705,000       256              22,614   196,545    63,944 (33%)
OTCMKTS    22,956     31,480,000       45,457           64,628   584,169    446,293 (76%)
Total      30,032     –                87,152           467,241  7,855,518  1,802,705 (23%)
3 Data Collection

By leveraging Twitter's Streaming API [15], we collected all tweets mentioning at least one of the 6,689 stocks listed on the official NASDAQ Web site. Companies quoted in the stock market are easily identified on Twitter by means of cashtags – strings composed of a dollar sign followed by the ticker symbol of the company (e.g., $AAPL is the cashtag of Apple, Inc.). Just like hashtags, cashtags serve as beacons to find, filter, and collect relevant content [18].

Our data collection covered a period of five months, from May to September 2017, and resulted in the retrieval of more than 9M tweets. We also extended the dataset by gathering additional financial information (e.g., capitalization and industrial classification) about the companies mentioned in our tweets, by leveraging the Google Finance Web site. Table 1 shows summary statistics about our dataset, which is publicly available online for research purposes.
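As an illustration of cashtag-based filtering, the following sketch extracts tracked tickers from tweet text. The regular expression and the three-symbol ticker list are simplifying assumptions made for this example only; they are not the actual collection pipeline.

```python
import re

# Cashtags: a dollar sign followed by a short ticker symbol, optionally
# with a class suffix (e.g., "$BRK.A"). This pattern is an assumption for
# illustration; Twitter's own tokenization rules may differ.
CASHTAG_RE = re.compile(r"\$([A-Za-z]{1,6}(?:\.[A-Za-z])?)\b")

# Hypothetical stand-in for the 6,689-stock NASDAQ list used in the paper.
TRACKED = {"AAPL", "TSLA", "FB"}

def extract_cashtags(text):
    """Return the tracked tickers mentioned in a tweet, uppercased."""
    return [m.upper() for m in CASHTAG_RE.findall(text) if m.upper() in TRACKED]

print(extract_cashtags("$AAPL beats estimates; watch $xyz and $TSLA"))
# ['AAPL', 'TSLA']
```

In a real pipeline, the tracked set would be the full exchange listing, and matching would run on the stream of tweets returned by the API's track filter.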
4 Uncovering Speculative Campaigns

In this section we describe the various analyses that allowed us to uncover
widespread speculative campaigns perpetrated by several botnets. For additional
details on the subsequent analyses, we point interested readers to [14].
4.1 Dataset Overview
Each tweet in our dataset mentions at least one of the 6,689 stocks of the NASDAQ list. Companies from this list typically feature a large market capitalization and are traded in the 4 main US financial markets – namely, NASDAQ, NYSE,
NYSEARCA, and NYSEMKT. However, among tweets mentioning our 6,689 stocks, we
also found many mentions of other, less known, stocks. In particular, as shown
in Table 1, overall tweets of our dataset also mention 22,956 stocks traded in
the OTCMKTS market. Contrary to the main stock exchanges, OTCMKTS has less
stringent constraints and mainly hosts stocks with a small capitalization. Unsur-
prisingly, if we analyze our whole dataset, no company from OTCMKTS appears
among those that are discussed the most. In fact, the most tweeted companies
in our dataset are in line with those found in previous works [18], and include
well-known and popular stocks such as $AAPL, $TSLA, and $FB. Nonetheless, a few concerns arise if we consider the rate of retweets for OTCMKTS stocks, which happens to be as high as 76% and in sharp contrast with the much lower rates measured for all other markets. Since automated mass-retweets have been frequently exploited by bots and trolls to artificially boost content popularity [23], this result might hint at the possibility of a manipulation related to OTCMKTS stocks.
Fig. 1. Examples of tweets in which a few high-capitalization companies (green-colored)
co-occur with many low-capitalization ones (red-colored). (Color figure online)
4.2 Investigating Financial Discussion Spikes
In order to deepen our analysis of financial conversations, we now focus our
attention on discussion spikes about the 6,689 stocks of our starting list. For
identifying discussion spikes, we compute for each stock the hourly time series of
the volume of tweets mentioning that stock, to which we apply a simple anomaly
detection technique. In detail, we label as anomalies those peaks of discussion
that exceed the mean hourly volume of tweets by more than 10 standard devia-
tions, finding in total 1,926 financial discussion spikes.
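The thresholding described above can be sketched in a few lines. This is a deliberately minimal detector matching the description in the text (mean plus 10 standard deviations on one stock's hourly volume series); the toy series is illustrative.

```python
from statistics import mean, stdev

def discussion_spikes(hourly_volumes, n_std=10):
    """Indices of hours whose tweet volume exceeds mean + n_std * std.

    A simple anomaly detector mirroring the thresholding in the text;
    hourly_volumes is the hourly tweet-count series for one stock.
    """
    mu = mean(hourly_volumes)
    sigma = stdev(hourly_volumes)
    threshold = mu + n_std * sigma
    return [i for i, v in enumerate(hourly_volumes) if v > threshold]

# A long, mostly flat series with one burst: only the burst is flagged.
series = [3] * 200
series[120] = 250
print(discussion_spikes(series))  # [120]
```

Note that a 10-sigma threshold is only reachable on long series, since a single outlier also inflates the standard deviation it is tested against.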
Within the discussion spikes, we found more retweets than in the rest of
the dataset – namely, 60% retweets for spikes vs 23% for the whole dataset,
on average. This finding alone does not necessarily imply a coordinated inau-
thentic activity, since also organic surges of interest in social media typically
result in many retweets. What is unusual however, is that tweets posted during
the identified discussion spikes contain, on average, many more cashtags (i.e.,
mentioned stocks) than the ones in the rest of the dataset. Moreover, such co-
occurring stocks seem largely unrelated, and the authors of those tweets do not
provide any information to explain the co-occurrences, as shown in the examples
of Fig. 1.
Fig. 2. Entropy of the industrial classes of co-occurring stocks in discussion spikes. As shown, the high measured entropy implies that co-occurring companies are largely unrelated.

Fig. 3. Standard deviation of the capitalization of co-occurring companies in discussion spikes, and comparison with a bootstrap. The large measured standard deviation implies that high-cap companies co-occur with low-cap ones.
4.3 Co-occurring Stocks
To investigate the reasons behind this large number of co-occurring stocks, we
follow two different hypotheses: (i) stocks might co-occur because of a similar
industrial sector (i.e., companies involved in the same business are more likely to
be mentioned together) or (ii) they might co-occur because of a similar market
value (i.e., high capitalization companies are more likely to be compared to
others with similar capitalization).
To assess whether our co-occurring stocks have a similar industrial sector,
we leverage the Thomson Reuters Business Classification (TRBC) – a 5-level hierarchical sector and industry classification, widely used in the financial domain for computing sector-specific indices (see the Wikipedia page "Thomson Reuters Business Classification"). In particular, we compute the normalized Shannon entropy between the TRBC classes of co-occurring stocks for each tweet that contributes to a discussion spike. This analysis is repeated for all 5 TRBC levels. Each entropy value measured for a given
TRBC level for discussion spikes is then compared with the corresponding one
computed out of the whole dataset. Results of this analysis are shown in Fig. 2
and depict a situation characterized by large entropy values (i.e., close to 1, which is
the maximum possible value of normalized entropy). In turn, this implies that
co-occurring companies in discussion spikes are almost completely unrelated with
regards to their industrial classification. Moreover, entropy values measured for
discussion spikes are always higher than those measured for the whole dataset.
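The entropy computation can be sketched as follows. Normalizing by the log of the number of distinct observed classes is our assumption for this example, since the exact normalization is not spelled out; the class labels are placeholders, not real TRBC codes.

```python
from collections import Counter
from math import log2

def normalized_entropy(labels):
    """Normalized Shannon entropy of a list of class labels, in [0, 1].

    Values near 1 mean the labels are spread uniformly (co-occurring
    stocks are unrelated); 0 means they all share one class.
    """
    counts = Counter(labels)
    n = len(labels)
    h = -sum((c / n) * log2(c / n) for c in counts.values())
    # Normalize by the maximum entropy achievable with the observed
    # number of distinct classes (an assumption of this sketch).
    max_h = log2(len(counts)) if len(counts) > 1 else 1.0
    return h / max_h

print(normalized_entropy(["Energy", "Tech", "Health", "Finance"]))  # 1.0
print(normalized_entropy(["Tech", "Tech", "Tech", "Energy"]))       # < 1
```

Applied per tweet to the TRBC classes of its co-occurring cashtags, values near 1 across a spike indicate industrially unrelated stocks, as measured in Fig. 2.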
Regarding financial value, we assess the extent to which co-occurring com-
panies have a similar value by measuring the standard deviation of their market
capitalizations. To understand whether the measured standard deviation is due
to the intrinsic characteristics of our dataset (i.e., the underlying statistical dis-
tribution of capitalization) or to other external factors, we compared mean values
of our empiric measurements with a bootstrap. Results are shown in Fig. 3 and
highlight a large empiric standard deviation between the capitalization of co-
occurring companies, such that a random bootstrap baseline – accounting for the
intrinsic characteristics of our dataset – cannot explain it. These results mean
that not only high-capitalized companies indeed mostly co-occur with small-
capitalized ones, as shown in Fig. 1, but also that this phenomenon is rather the
consequence of some external action.
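A bootstrap baseline of this kind can be sketched as follows: random groups of companies are repeatedly drawn from the dataset-wide capitalization distribution, so that the baseline reflects only the intrinsic spread of the data. The capitalizations and group size below are toy values, not the paper's figures.

```python
import random
import statistics

def bootstrap_baseline(all_caps, group_size, n_iter=1000, seed=42):
    """Mean standard deviation of capitalizations over random groups.

    Each iteration samples `group_size` companies at random, mimicking
    unco-ordinated co-mentions; the mean of the per-group standard
    deviations serves as the null baseline.
    """
    rng = random.Random(seed)
    stds = [
        statistics.stdev(rng.sample(all_caps, group_size))
        for _ in range(n_iter)
    ]
    return statistics.mean(stds)

# Toy capitalizations (in $M): mostly small caps plus two giants.
caps = [30, 45, 60, 80, 120, 250, 900_000, 1_100_000]
# A suspicious tweet systematically pairing tiny stocks with the giants:
observed = statistics.stdev([30, 45, 900_000, 1_100_000])
print(observed > bootstrap_baseline(caps, group_size=4))  # True
```

When the observed standard deviation exceeds the bootstrap mean by a wide margin, as in Fig. 3, chance co-occurrence is an unlikely explanation.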
In summary, we demonstrated that tweets responsible for generating financial
discussion spikes mention a large number of unrelated stocks, some of which are
high-cap stocks while the others are low-cap ones.
Fig. 4. Kernel density estimation investigating the relation between social and financial
importance, for stocks of the 5 considered markets. OTCMKTS stocks have a suspiciously
high social importance despite their low financial importance.
4.4 Financial vs Social Importance
Several existing systems for forecasting stock prices leverage the positive cor-
relation between discussion volumes on social media around a given stock, and
its market value [22]. In other words, it is generally believed that stocks with a
high capitalization (i.e., high financial importance) are discussed more in social
media (i.e., high social importance) than those with a low capitalization. In this
section we verify whether this expected positive relation exists also for the stocks
in our dataset.
In fact, over our whole dataset, we measure a moderate positive Spearman's rank correlation coefficient of ρ = 0.4871 between social and financial importance, thus confirming previous findings. However, when focusing on discussion spikes only, we measure a suspicious behavior related to OTCMKTS stocks, which feature a negative ρ = −0.2658, meaning that low-value OTCMKTS stocks are more likely to appear in discussion spikes than high-value ones. To thoroughly understand the relation between social and financial importance, in Fig. 4 we report
the results of a bi-dimensional kernel density estimation of social and financial
importance for stocks of the five considered markets. Confirming previous con-
cerns, OTCMKTS stocks feature a suspiciously high social importance, despite their
low financial importance, in contrast with stocks of all other markets.
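Spearman's rank correlation is usually computed with `scipy.stats.spearmanr`; a dependency-free sketch using the no-ties formula is shown below. The toy numbers are ours and merely reproduce the pattern of the OTCMKTS anomaly: the two low-cap "stocks" are the most tweeted, which drives the coefficient negative.

```python
def spearman(x, y):
    """Spearman's rank correlation via the no-ties formula.

    Assumes all values in x and in y are distinct (a stand-in for
    scipy.stats.spearmanr, which also handles ties).
    """
    n = len(x)
    rx = {v: i for i, v in enumerate(sorted(x), start=1)}
    ry = {v: i for i, v in enumerate(sorted(y), start=1)}
    d2 = sum((rx[a] - ry[b]) ** 2 for a, b in zip(x, y))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Financial importance (cap) vs social importance (tweet volume):
caps =   [900, 750, 400, 200, 5, 3]
tweets = [880, 700, 350, 150, 940, 910]
print(spearman(caps, tweets))  # negative: tiny caps dominate the chatter
```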
Fig. 5. Number of accounts created per week in 2017. Bot accounts display coordinated creation activities, while humans are more evenly spread across the year.

Fig. 6. Distribution of the number of followers and friends. Bot accounts show a lower number of followers and friends with respect to human accounts.
5 Characterizing Financial Social Bots

In the previous section, we described several suspicious phenomena related to
stock microblogs. In detail, discussion spikes about high-value stocks are filled
with mentions of low-value (mainly OTCMKTS) ones. Such mentions are not
explained by real-world stock relatedness. Moreover, the discussion spikes are
largely caused by mass retweets.
5.1 Bot Detection
In order to understand whether the previously described disorders in financial
microblogs are caused by organic (i.e., human-driven) or rather by synthetic
activity, here we discuss results of the application of a bot detection technique
to all users that contributed to at least one of the top-100 largest discussion
spikes. In this way, we analyzed roughly 50% of all our dataset, both in terms
of tweets and users, in search for social bots.
To perform bot detection, we employ the state-of-the-art technique described
in [10], which is based on the analysis of the sequences of actions performed by
the investigated accounts. Strikingly, the technique classified as much as 71% of
all analyzed users as bots. Moreover, 48% of the users classified as bots were
also later suspended by Twitter, corroborating our results. Given these impor-
tant findings, we conclude that social bots were responsible for perpetrating the
financial disinformation campaigns that promoted OTCMKTS low-value stocks by
exploiting the popularity of high-value ones. In the remainder of this section we
report on the general characteristics of the 18,509 users classified as financial
social bots and we compare them to other bots and trolls previously studied in
literature as well as to the 7,448 accounts classified as humans.
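The detector of [10] models each account as a sequence of actions. The following is only a loose sketch of that idea: the three-letter encoding alphabet and the pairwise longest-common-substring comparison are illustrative choices of ours, not the actual technique, which looks for long substrings shared across many accounts.

```python
def encode_timeline(actions):
    """Encode an account's timeline as a string, one character per action."""
    alphabet = {"tweet": "A", "retweet": "C", "reply": "T"}
    return "".join(alphabet[a] for a in actions)

def longest_common_substring(s, t):
    """Length of the longest substring shared by two encoded timelines."""
    best = 0
    prev = [0] * (len(t) + 1)
    for ch_s in s:
        cur = [0] * (len(t) + 1)
        for j, ch_t in enumerate(t, start=1):
            if ch_s == ch_t:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

bot_a = encode_timeline(["retweet"] * 8 + ["tweet"])
bot_b = encode_timeline(["retweet"] * 7 + ["reply", "retweet"])
human = encode_timeline(["tweet", "reply", "retweet", "tweet", "reply"])
print(longest_common_substring(bot_a, bot_b))  # 7: long shared retweet run
print(longest_common_substring(bot_a, human))  # 2: little in common
```

Accounts whose encoded timelines share suspiciously long substrings behave in lockstep, which is the hallmark of automation that this family of detectors exploits.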
Fig. 7. Examples of a subset of users classified as bots. The accounts show similarities in their names, screen names, numbers of followers and followings, and in their description. Such similarities support the hypothesis that these accounts are part of large, organized botnets.
5.2 Profile Characteristics of Financial Bots
The creation date is an unforgeable characteristic of a social media account that
has been frequently used to spot groups of coordinated malicious accounts (e.g.,
bots and trolls) [29]. Its usefulness lies in the impossibility to counterfeit or
to masquerade it, combined with the fact that “masters” typically create their
bot and troll armies in short time spans. As a consequence, large numbers of
accounts featuring almost identical creation dates might represent botnets or
troll armies. Given this picture, the first characteristic of financial social bots
that we analyze is the distribution of their account creation dates. The creation
dates of the accounts in our dataset are distributed between 2007 and 2017.
However, the majority of bots (53%) were created in 2017, as opposed to humans
(12%). Figure 5 shows the distribution of creation dates of bots and humans in
2017, at a weekly granularity. Interestingly, bots display coordinated creation
activities, while the creation of human accounts is more evenly distributed across
the year. In detail, 45% of bots were created between February and April, with
a particularly significant spike of 1,346 bots created on March 2. These findings
further confirm the manufactured nature of the accounts classified as bots, and
their pervasive presence in stock microblogs.
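Bucketing creation dates by ISO week, as done for Fig. 5, is straightforward; the registration dates below are toy values echoing the March 2 burst described above.

```python
from collections import Counter
from datetime import date

def weekly_creation_counts(creation_dates):
    """Count account creations per (ISO year, ISO week).

    Bot armies are typically registered in short time spans, so a
    handful of weeks dominating this histogram is a red flag.
    """
    return Counter(d.isocalendar()[:2] for d in creation_dates)

# Toy data: a burst of registrations on 2017-03-02 amid scattered signups.
dates = [date(2017, 3, 2)] * 5 + [
    date(2017, 1, 10), date(2017, 6, 21), date(2017, 9, 4),
]
counts = weekly_creation_counts(dates)
print(counts.most_common(1))  # the burst week dominates
```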
Table 2. Top-5 words and 3-grams used in account descriptions by bots and humans. Descriptions for humans are more heterogeneous and repetitions are less frequent.

Social bots:
Word      Freq.    3-gram                     Freq.
Trading   8,138    Day trading making         848
Day       4,195    Trading making money       848
Money     4,173    Investing day trading      838
Stocks    4,056    Trading stocks investing   821
Trading   4,047    Investing trading stocks   814

Humans:
Word      Freq.    3-gram                     Freq.
Love      314      Follow follow back         7
Life      209      Never say never            7
Follow    168      Always strive prosper      6
Music     164      Live life fullest          5
Like      112      Stock market investor      4
Colluding groups of bots and trolls have also been associated to peculiar
patterns in their screen names [21]. This is because they represent fictitious
identities whose names and usernames are typically generated algorithmically.
Looking for artificial patterns in the screen name, we first analyze the distribu-
tion of the screen name length. Interestingly, 50% of bots have a screen name
length between 14 and 15 characters, while only 26% of humans share such char-
acteristic. By examining the structure of suspiciously long bot screen names, we
observe two main patterns. The first denotes the presence of screen names com-
posed of a given name, followed by a family name. Such users also use the given
name, which in almost all the cases is a female English name, as their display
name. The second pattern exposes bots with a screen name composed of exactly
15 random alpha-numeric characters, accompanied by a given name as a display
name. Such a phenomenon has been observed before for numerous bot accounts involved in two different politics-related events [4], and it is a strong confirmation of the malicious nature of our accounts labelled as bots. Figure 7 provides some examples of such bots. Moreover, by cross-checking information related to the creation dates, we observe that 11% of such bots were created on the same day.
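The two screen-name patterns can be captured with simple heuristics. The regular expressions below are our own hedged approximations of the patterns just described (a camel-cased given name plus family name, and exactly 15 random alphanumeric characters); they would also match some legitimate users and are meant only to illustrate the idea.

```python
import re

# Hedged heuristics for the two artificial patterns described above.
NAME_SURNAME = re.compile(r"^[A-Z][a-z]+[A-Z][a-z]+$")  # e.g. "MaryJohnson"
RANDOM_15 = re.compile(r"^[A-Za-z0-9]{15}$")            # 15 alphanumerics

def suspicious_screen_name(screen_name):
    """True if the screen name matches either artificial pattern."""
    return bool(NAME_SURNAME.match(screen_name) or RANDOM_15.match(screen_name))

for name in ["MaryJohnson", "x7Kq9ZtL0pR3aVb", "john_doe"]:
    print(name, suspicious_screen_name(name))
```

Such per-account signals are weak in isolation, but become telling when thousands of accounts created in the same weeks all match the same template.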
Next, we inspect account descriptions (also known as biographies). We find
other users. In other words, there are 174 small groups of at least 3 users having
in common the same description. Such repeated descriptions follow a specific
pattern – in particular, they are composed of a famous quote or law, and of a set
of financial keywords that are totally unrelated with the rest of the description.
Interestingly, the use of famous quotes by bots to attract genuine users has
already been documented before, for bots acting in the political domain [11].
We find 373 occurrences of such a pattern, and none amongst the users with this pattern is classified as human. Some bot accounts exhibiting this characteristic are shown in Fig. 7. Table 2 summarizes the words and 3-grams most used in account descriptions by bots and humans. As shown, striking differences emerge.
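Statistics of the kind shown in Table 2 can be reproduced with a few lines. The tokenization choices here (lowercasing, alphabetic tokens only) are assumptions of this sketch, since the exact preprocessing is not specified; the biographies are invented.

```python
from collections import Counter

def top_ngrams(descriptions, n=3, k=5):
    """Top-k word n-grams across account biographies (cf. Table 2)."""
    counts = Counter()
    for text in descriptions:
        words = [w for w in text.lower().split() if w.isalpha()]
        counts.update(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
    return counts.most_common(k)

bios = [
    "Day trading making money every day",   # template-like bot biographies
    "Day trading making money fast",
    "Love music and life",                   # a more human-looking one
]
print(top_ngrams(bios, n=3, k=2))
```

Repeated templated n-grams across many biographies, as in the left half of Table 2, are another fingerprint of mass-produced accounts.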
In summary, all previous findings support the hypothesis that users classified as bots did not act individually, but that they are rather part of large, organized and coordinated botnets.
Finally, we measure differences between bots and humans with respect to their social relationships. In particular, Fig. 6 shows differences in the distributions of followers and followings. Bots are characterized by a significantly lower number of both followers and followings, indicating accounts with few social relationships. It has been demonstrated that accounts with many social relationships in online social platforms are perceived as more trustworthy and credible [9]. Thus, in this regard, our financial bots appear as rather untrustworthy and simplistic accounts. Having few social connections also implies a difficulty
in amplifying and propagating messages. In other words, only few users can read
– and possibly re-share – what these bots post.
Fig. 8. Distribution of the number of tweets per user. Although bots and humans feature similar volumes of shared tweets, financial bots tend to retweet rather than to create original content.

Fig. 9. Edge weights distribution in the user similarity network. The distribution approximates a power law, with notable exceptions marked by red circles. (Color figure online)
5.3 Tweeting Characteristics of Financial Bots
Studying general profile characteristics, as we have done in the previous subsection, allows us to assess the credibility and trustworthiness of financial bots (or
lack thereof). Instead, in the remainder of this section we focus on their tweeting
Fig. 10. A portion of the user similarity network. Nodes are colored according to their classification as bots or humans. Several different botnets are clearly visible as dense clusters. Bots are typically connected to a single human-labeled user, with which they share the majority of their mentioned companies.
activity. Our aim for the following analyses is to understand the likely target of
financial bots as well as their inner organization.
We first analyze the distribution of the number of tweets posted by bots and
humans, for each possible type of tweet (that is, original tweets, retweets and
replies). As displayed in Fig. 8, financial bots and humans share a comparable
total number of tweets. In other words, financial bots do not seem to post exces-
sively (i.e., to spam), as other simplistic types of bot do [8], but instead they
have an overall content production that is similar to that of humans. However,
bots exhibit a strong preference for retweeting rather than for creating original
content or for replying. Therefore, retweets are the primary mechanism used by
financial bots to propagate content. It is worth noting, however, that the focus of these financial bots is likely placed on the retweet itself, rather than on retweets as an efficient means to rapidly reach broader audiences [17,23]. This is because
financial bots are characterized by few social relationships, as discussed in the
previous section. As such, few users would be exposed to their retweets. This
strategy, applied to the financial context, may nonetheless deceive trading algo-
rithms listening to social conversations in search for hot stocks to invest in. As
a consequence, synchronized mass-retweets of stock microblogs may contribute
to artificially overstate the interest associated with specific stocks.
We conclude our analyses by studying the use of cashtags by bots and
humans. Here, we are particularly interested in identifying groups of users that
systematically tweet about the same stocks, because this might reveal the inner
structure of financial disinformation botnets. Interesting questions are related to whether we are witnessing a single huge botnet or whether there are multiple botnets individually promoting different sets of stocks.
To answer these questions, we first build the bipartite network of users (com-
prising both bots and humans) and companies. In detail: Twitter users are one
set of nodes, companies represent the other set of nodes, and a link connects
a user to a company if that user mentioned that company in one of its tweets.
This bipartite network is directed and weighted based on the number of times a
user mentions given companies. In order to study similarities between groups of
users, we then project our bipartite network onto the set of users. This process
results in two users being linked to one another if they both mentioned at least
one common company. The projected network, henceforth called user similarity
network, is undirected and weighted. The weight of a link connecting two users
measures the number of companies mentioned by both users.
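The projection just described can be sketched without any graph library: for every pair of users, the edge weight is the size of the intersection of their mentioned-company sets. The users and cashtags below are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def user_similarity_network(mentions):
    """Project a user-company bipartite graph onto users.

    `mentions` maps each user to the set of companies it tweeted about;
    the returned edge weights count companies mentioned by both users,
    as in the user similarity network described above.
    """
    weights = defaultdict(int)
    for u, v in combinations(sorted(mentions), 2):
        shared = len(mentions[u] & mentions[v])
        if shared:
            weights[(u, v)] = shared
    return dict(weights)

mentions = {
    "bot_1": {"$AAPL", "$XYZQ", "$ABCD"},  # hypothetical cashtags
    "bot_2": {"$AAPL", "$XYZQ", "$ABCD"},
    "human": {"$AAPL", "$TSLA"},
}
print(user_similarity_network(mentions))
```

In this toy network the two bots are tied by a heavy edge (all three companies in common), while the human is only weakly connected, which is exactly the structure visible as dense clusters in Fig. 10.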
For the sake of clarity, in the following we report results of the analysis of
a subset of the user similarity network. In particular, Fig. 9 shows the distribution of edge weights in the considered portion of the network. As shown, the
edge weights distribution approximates a power law, with 2 notable exceptions
marked in the figure with red circles. Peculiar patterns that deviate from the general law for specific portions of a network distribution have been previously
associated with malicious activities [20]. For this reason, we focus subsequent
analyses on the network nodes and edges that are responsible for the deviations
highlighted in Fig. 9. In particular, Fig. 10 shows the resulting user similarity
network, visualized via a force-directed layout, where nodes are colored accord-
ing to their classification as bots or humans. Interestingly, the vast majority of
nodes in this network were previously labeled as bots, during our bot detection
step. This explains the deviations observed in the edge weights distribution plot.
In addition, the vast majority of bots are organized in a few large, distinct clusters. Each cluster of bots is typically connected to a single human-labeled user, with which the bots share the majority of their mentioned companies. In other words, the visualization in Fig. 10 makes it possible to identify several distinct botnets, as well as the accounts that they are promoting. The few human-labeled users in the network show more diverse patterns of network connections. They are not organized in dense clusters and, in general, feature more heterogeneous connectivity patterns than the bots, confirming previous results from the literature [12]. A few interesting portions of the network are magnified in the A, B, and C insets of Fig. 10, which allow us to identify the users to which the botnets are connected (including the @YahooFinance account visible in inset B), as well as the similar names (e.g., all English and female, as shown in insets A and B) of the accounts that constitute the botnets.
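The visual reading of Fig. 10 can also be approximated programmatically: keep only the anomalously strong ties, then read each connected component as a candidate botnet plus the account it promotes. The graph, labels, and weight threshold below are invented for illustration:

```python
import networkx as nx

# Hypothetical user similarity network: edge weights = number of
# co-mentioned companies; labels mimic a prior bot-detection step.
G = nx.Graph()
G.add_weighted_edges_from([
    ("bot1", "bot2", 14), ("bot2", "bot3", 14), ("bot1", "bot3", 13),
    ("bot1", "promoted_user", 12), ("bot2", "promoted_user", 12),
    ("bot3", "promoted_user", 11),
    ("human1", "human2", 1),
])
labels = {"bot1": "bot", "bot2": "bot", "bot3": "bot",
          "promoted_user": "human", "human1": "human", "human2": "human"}

# Keep only strong ties (anomalously many shared companies), then read
# each connected component as a candidate botnet and its target account.
strong = nx.Graph((u, v, d) for u, v, d in G.edges(data=True)
                  if d["weight"] >= 10)
for component in nx.connected_components(strong):
    bots = {n for n in component if labels[n] == "bot"}
    humans = component - bots
    print(f"candidate botnet: {sorted(bots)} promoting {sorted(humans)}")
```

With these toy values, the three bot accounts and the single human they all mention end up in one component, while the weakly tied human pair drops out entirely.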
The results of our investigations highlight the widespread existence of financial disinformation in Twitter. In particular, we documented a speculative campaign where many financially unimportant (low-cap) stocks are massively mentioned in tweets together with a few financially important (high-cap) ones. In previous work, this fraud was dubbed cashtag piggybacking, since the low-value stocks are piggybacked “on top of the shoulders” of the high-value ones [14]. Considering the already demonstrated relation between social and financial importance [22], a possible outcome expected by the perpetrators of this advertising practice is an increase in the financial importance of the low-value stocks, obtained by exploiting the popularity of the high-value ones. In this regard, promising directions for future research involve assessing whether these kinds of malicious activity are correlated to, or can influence, stock price fluctuations, the stock market’s performance, or even macroeconomic stability [14].
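A simple co-mention statistic captures the piggybacking pattern: for each low-cap stock, measure how often it appears in tweets alongside high-cap stocks. The tickers, cap tiers, and threshold below are all invented; in practice the tiers would come from external market data:

```python
from collections import Counter

# Illustrative tweet cashtag sets and made-up market-cap tiers.
HIGH_CAP = {"$AAPL", "$GOOG"}
LOW_CAP = {"$TINY", "$PENNY"}
tweets = [
    {"$AAPL", "$TINY"}, {"$AAPL", "$TINY"}, {"$GOOG", "$TINY"},
    {"$AAPL"}, {"$PENNY"}, {"$GOOG"},
]

# For each low-cap stock, count how often it is co-mentioned with any
# high-cap stock: a high co-mention ratio is a piggybacking signal.
co_mentions = Counter()
total = Counter()
for cashtags in tweets:
    for low in cashtags & LOW_CAP:
        total[low] += 1
        if cashtags & HIGH_CAP:
            co_mentions[low] += 1

suspicious = {s for s in total if co_mentions[s] / total[s] > 0.8}
print(suspicious)
```

In this toy sample, `$TINY` never appears except alongside a high-cap stock, so it is flagged, whereas `$PENNY` is not.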
Analyses of the suspicious users involved in financial discussion spikes revealed that the speculative campaigns are perpetrated by large groups of coordinated social bots, organized in several distinct botnets. We showed that the financial bots involved in these manipulative activities present very simple accounts, with few details and few social connections. Among the available details, many signs of fictitious information emerge, such as suspicious profile descriptions in which financial keywords are mixed with unrelated content. The simplistic characteristics of these bots, their relatively recent and bursty creation dates, and their limited number of social connections give the overall impression of untrustworthy accounts. The financial social bots discovered in our study have different characteristics with respect to the much more sophisticated social bots that recently emerged in worldwide political discussions [8,11]. Financial social bots thus appear to be a rather easy target for automatic detection and removal, as also confirmed by the large number of such bots that have already been banned by Twitter.
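The traits listed above (bursty creation dates, near-duplicate screen names, very few connections) lend themselves to crude heuristic screening. The sketch below is not the chapter's detection method; the account records, thresholds, and scoring are all assumptions for illustration:

```python
from datetime import date
from difflib import SequenceMatcher

# Hypothetical account records mimicking typical Twitter profile metadata.
accounts = [
    {"name": "emma_stocks_01", "created": date(2019, 3, 2), "friends": 3},
    {"name": "emma_stocks_02", "created": date(2019, 3, 2), "friends": 5},
    {"name": "emma_stocks_03", "created": date(2019, 3, 3), "friends": 2},
    {"name": "random_trader",  "created": date(2015, 7, 21), "friends": 410},
]

def suspicion_score(acct, cohort):
    """Crude 0-3 score: bursty creation date, near-duplicate screen name,
    and very few social connections (assumed thresholds, not tuned)."""
    score = 0
    # Creation-date burstiness: many cohort accounts created within a day.
    same_day = sum(1 for a in cohort if a is not acct and
                   abs((a["created"] - acct["created"]).days) <= 1)
    if same_day >= 2:
        score += 1
    # Near-duplicate screen names, measured by string similarity.
    similar = sum(1 for a in cohort if a is not acct and
                  SequenceMatcher(None, a["name"], acct["name"]).ratio() > 0.8)
    if similar >= 2:
        score += 1
    # Very few social connections.
    if acct["friends"] < 10:
        score += 1
    return score

scores = {a["name"]: suspicion_score(a, accounts) for a in accounts}
print(scores)
```

On this toy cohort the three templated accounts score the maximum while the long-standing, well-connected one scores zero.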
Based on these findings, we conclude that these bots should not pose a serious threat to human investors (e.g., noise traders) looking for fresh information on Twitter. However, the aim of financial bots could be to fool automatic trading algorithms. In fact, to the best of our knowledge, the majority of existing systems that feed on social information for predicting stock prices do not filter out possibly fictitious content. As such, these systems could be vulnerable to coordinated malicious practices such as cashtag piggybacking. The fact that no study nor existing system hunted financial bots before our present work could also explain the simplistic characteristics of these bots. Indeed, it has been largely demonstrated that recent social bots became evolved and sophisticated as an evasion mechanism against the plethora of existing bot detection techniques [8]. In other words, financial bots could be this simple just because nobody ever hunted them. If this proves to be the case, however, we should expect financial bots to become much more sophisticated in the near future, a scenario that would pose a heavier burden on our side with regard to their detection and removal.
The user-centric classification approach that we adopted in this study demands the availability and analysis of large amounts of data, and requires intensive, time-consuming computations. This is because, in order to assess the veracity of a discussion spike, all users involved in that discussion must be analyzed, which can easily mean analyzing tens of thousands of accounts to evaluate a single discussion spike. A more favorable scenario could instead involve classifying the discussion spikes themselves. In other words, future financial spam detection systems could analyze high-level characteristics of discussion spikes (e.g., their burstiness, the number of distinct accounts that participate, market information about the discussed stocks, etc.), with the goal of promptly detecting promoted, fictitious, or made-up discussions. This approach, previously applied to other scenarios [28], is still unexplored in the online financial domain. As such, it represents another promising avenue for future research and experimentation.
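To make the spike-level idea concrete, one could summarize each spike with a handful of aggregate features before classification. The data layout, field names, and feature definitions below are illustrative assumptions, not the chapter's specification:

```python
import statistics

# Toy representation of a discussion spike: per-minute tweet counts plus
# the list of tweet authors (field names are invented).
spike = {
    "tweets_per_minute": [2, 3, 1, 148, 152, 140, 2, 1],
    "authors": ["u%d" % i for i in range(180)],
    "tweet_count": 449,
}

def spike_features(spike):
    """High-level features of a discussion spike, in the spirit of
    spike-level (rather than user-level) classification."""
    rates = spike["tweets_per_minute"]
    distinct = len(set(spike["authors"]))
    return {
        # Burstiness: peak rate relative to the median background rate.
        "burstiness": max(rates) / max(statistics.median(rates), 1),
        "distinct_authors": distinct,
        # Many tweets per author suggests mass-posting accounts.
        "tweets_per_author": spike["tweet_count"] / distinct,
    }

features = spike_features(spike)
print(features)
```

A downstream classifier would consume such feature vectors, together with market information about the discussed stocks, to label a spike as organic or promoted.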
Our work investigated the presence and the characteristics of financial disinformation in Twitter. We documented a speculative practice aimed at promoting low-value stocks, mainly from the OTCMKTS financial market, by exploiting the popularity of high-value (e.g., NASDAQ) ones. An in-depth analysis of the accounts involved in this practice revealed that 71% of them are bots. Moreover, 48% of the accounts classified as bots have since been banned by Twitter. Finally, the bots involved in financial disinformation turned out to be rather simplistic and untrustworthy, in contrast with recent political bots, which are much more sophisticated.
Our findings about the characteristics of fake financial discussion spikes, as well as those about the characteristics of financial bots, could be leveraged in the future as features for designing novel financial spam filtering systems. Hence, this work lays the foundations for the development of specific – yet still unavailable – methods to detect online financial disinformation before it harms the pockets of unaware investors.
Acknowledgements. This work is partially supported by the European Community’s H2020 Program under the scheme INFRAIA-1-2014-2015: Research Infrastructures, grant agreement #654024 SoBigData: Social Mining and Big Data Ecosystem, and the scheme INFRAIA-01-2018-2019: Research and Innovation action, grant agreement #871042 SoBigData++: European Integrated Infrastructure for Social Mining and Big Data Analytics, and by the Italian Ministry of Education and Research in the framework of the CrossLab Project (Cloud Computing, Big Data & Cybersecurity), Departments of Excellence.
References

1. Albadi, N., Kurdi, M., Mishra, S.: Hateful people or hateful bots? Detection and characterization of bots spreading religious hatred in Arabic social media. In: Proceedings of the ACM on Human-Computer Interaction (HCI), vol. 3, no. CSCW, pp. 1–25 (2019)
2. Allem, J.P., Ferrara, E.: Could social bots pose a threat to public health? Am. J. Public Health (AJPH) 108(8), 1005 (2018)
3. Berger, J.M., Morgan, J.: The ISIS Twitter census: defining and describing the population of ISIS supporters on Twitter. The Brookings Project on US Relations with the Islamic World, vol. 3, no. 20, pp. 1–4 (2015)
4. Beskow, D.M., Carley, K.M.: Its all in a name: detecting and labeling bots by their name. Comput. Math. Organ. Theory (CMOT) 25(1), 24–35 (2019)
5. Bessi, A., Ferrara, E.: Social bots distort the 2016 U.S. Presidential election online discussion. First Monday 21 (2016)
6. Bollen, J., Mao, H., Zeng, X.: Twitter mood predicts the stock market. J. Comput. Sci. (JCS) 2(1), 1–8 (2011)
7. Brachten, F., Stieglitz, S., Hofeditz, L., Kloppenborg, K., Reimann, A.: Strategies and influence of social bots in a 2017 German state election – a case study on Twitter. In: The 28th Australasian Conference on Information Systems (ACIS 2017) (2017)
8. Cresci, S.: Detecting malicious social bots: story of a never-ending clash. In: Grimme, C., Preuss, M., Takes, F.W., Waldherr, A. (eds.) MISDOOM 2019. LNCS, vol. 12021, pp. 77–88. Springer, Cham (2020)
9. Cresci, S., Di Pietro, R., Petrocchi, M., Spognardi, A., Tesconi, M.: Fame for sale: efficient detection of fake Twitter followers. Decis. Support Syst. (DSS) 80, 56–71 (2015)
10. Cresci, S., Di Pietro, R., Petrocchi, M., Spognardi, A., Tesconi, M.: Social fingerprinting: detection of spambot groups through DNA-inspired behavioral modeling. IEEE Trans. Dependable Secure Comput. (TDSC) 15(4), 561–576 (2017)
11. Cresci, S., Di Pietro, R., Petrocchi, M., Spognardi, A., Tesconi, M.: The paradigm-shift of social spambots: evidence, theories, and tools for the arms race. In: The 26th International Conference on World Wide Web Companion (WWW 2017 Companion), pp. 963–972. IW3C2 (2017)
12. Cresci, S., Di Pietro, R., Petrocchi, M., Spognardi, A., Tesconi, M.: Emergent properties, models and laws of behavioral similarities within groups of Twitter users. Comput. Commun. 150, 47–61 (2020)
13. Cresci, S., Lillo, F., Regoli, D., Tardelli, S., Tesconi, M.: $FAKE: evidence of spam and bot activity in stock microblogs on Twitter. In: The 12th International AAAI Conference on Web and Social Media (ICWSM 2018). AAAI (2018)
14. Cresci, S., Lillo, F., Regoli, D., Tardelli, S., Tesconi, M.: Cashtag piggybacking: uncovering spam and bot activity in stock microblogs on Twitter. ACM Trans. Web (TWEB) 13(2), 11:1–11:27 (2019)
15. Cresci, S., Minutoli, S., Nizzoli, L., Tardelli, S., Tesconi, M.: Enriching digital libraries with crowdsensed data. In: Manghi, P., Candela, L., Silvello, G. (eds.) IRCDL 2019. CCIS, vol. 988, pp. 144–158. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-11226-4_12
16. Ferrara, E.: Disinformation and social bot operations in the run up to the 2017 French Presidential election. First Monday 22(8) (2017)
17. Giatsoglou, M., Chatzakou, D., Shah, N., Faloutsos, C., Vakali, A.: Retweeting activity on Twitter: signs of deception. In: Cao, T., Lim, E.-P., Zhou, Z.-H., Ho, T.-B., Cheung, D., Motoda, H. (eds.) PAKDD 2015. LNCS (LNAI), vol. 9077, pp. 122–134. Springer, Cham (2015)
18. Hentschel, M., Alonso, O.: Follow the money: a study of cashtags on Twitter. First Monday 19(8) (2014)
19. Hwang, T., Pearce, I., Nanis, M.: Socialbots: voices from the fronts. Interactions 19(2), 38–45 (2012)
20. Jiang, M., Cui, P., Beutel, A., Faloutsos, C., Yang, S.: CatchSync: catching synchronized behavior in large directed graphs. In: The 20th SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD 2014), pp. 941–950. ACM (2014)
21. Lee, S., Kim, J.: Early filtering of ephemeral malicious accounts on Twitter. Comput. Commun. 54, 48–57 (2014)
22. Mao, Y., Wei, W., Wang, B., Liu, B.: Correlating S&P 500 stocks with Twitter data. In: The 1st International Workshop on Hot Topics on Interdisciplinary Social Networks Research (SIGKDD 2012 Workshops), pp. 69–72. ACM (2012)
23. Mazza, M., Cresci, S., Avvenuti, M., Quattrociocchi, W., Tesconi, M.: RTbust: exploiting temporal patterns for botnet detection on Twitter. In: The 11th International ACM Web Science Conference (WebSci 2019), pp. 183–192. ACM (2019)
24. Nizzoli, L., Tardelli, S., Avvenuti, M., Cresci, S., Tesconi, M., Ferrara, E.: Charting the landscape of online cryptocurrency manipulation. arXiv preprint arXiv:2001.10289 (2020)
25. Shao, C., Ciampaglia, G.L., Varol, O., Yang, K.C., Flammini, A., Menczer, F.: The spread of low-credibility content by social bots. Nat. Commun. 9(1), 4787 (2018)
26. Stella, M., Ferrara, E., De Domenico, M.: Bots increase exposure to negative and inflammatory content in online social systems. Proc. Natl. Acad. Sci. 115(49), 12435–12440 (2018)
27. Varol, O., Ferrara, E., Davis, C.A., Menczer, F., Flammini, A.: Online human-bot interactions: detection, estimation, and characterization. In: The 11th International AAAI Conference on Web and Social Media (ICWSM 2017). AAAI (2017)
28. Varol, O., Ferrara, E., Menczer, F., Flammini, A.: Early detection of promoted campaigns on social media. EPJ Data Sci. 6(1), 1–19 (2017)
29. Viswanath, B., et al.: Strength in numbers: robust tamper detection in crowd computations. In: The 2015 ACM Conference on Online Social Networks (COSN 2015), pp. 113–124. ACM (2015)
30. Zannettou, S., Caulfield, T., De Cristofaro, E., Sirivianos, M., Stringhini, G., Blackburn, J.: Disinformation warfare: understanding state-sponsored trolls on Twitter and their influence on the Web. In: The 2019 World Wide Web Conference Companion (WWW 2019 Companion), pp. 218–226. IW3C2 (2019)
31. Zannettou, S., Caulfield, T., Setzer, W., Sirivianos, M., Stringhini, G., Blackburn, J.: Who let the trolls out? Towards understanding state-sponsored trolls. In: The 11th International ACM Web Science Conference (WebSci 2019), pp. 353–362. ACM (2019)
... The strength of this approach is that it focused on the following links of the graph to detect bots accounts. However, a recent study [53] found that bot accounts show a lower number of followers and friends with respect to humans. This contrasted with the fact that early generations of bots had a higher ratio of following rate [35]. ...
Full-text available
Twitter, as a popular social network, has been targeted by different bot attacks. Detecting social bots is a challenging task, due to their evolving capacity to avoid detection. Extensive research efforts have proposed different techniques and approaches to solving this problem. Due to the scarcity of recently updated labeled data, the performance of detection systems degrades when exposed to a new dataset. Therefore, semi-supervised learning (SSL) techniques can improve performance, using both labeled and unlabeled examples. In this paper, we propose a framework based on the multi-view graph attention mechanism using a transfer learning (TL) approach, to predict social bots. We called the framework ‘Bot-MGAT’, which stands for bot multi-view graph attention network. The framework used both labeled and unlabeled data. We used profile features to reduce the overheads of the feature engineering. We executed our experiments on a recent benchmark dataset that included representative samples of social bots with graph structural information and profile features only. We applied cross-validation to avoid uncertainty in the model’s performance. Bot-MGAT was evaluated using graph SSL techniques: single graph attention networks (GAT), graph convolutional networks (GCN), and relational graph convolutional networks (RGCN). We compared Bot-MGAT to related work in the field of bot detection. The results of Bot-MGAT with TL outperformed, with an accuracy score of 97.8%, an F1 score of 0.9842, and an MCC score of 0.9481.
... The characteristics of these identified accounts may then be systematically evaluated across relevant case studies of interest to inform counter-strategies for enhanced societal resilience to information operations writ large [12,[26][27][28]. This supervised paradigm of scholarship has uncovered valuable insights about the impacts of information operations, spanning a range of domains like politics [5,29], finance [30,31], and public health [14,32]; as well as in diverse national and international contexts around the world [12,[33][34][35][36]. Across this vast literature, important findings include the quantification of links between the activity of social bots and the spread of low-credibility information or fake news [37]; how automated accounts increase human exposure to more inflammatory content [4]; how bots can spread hate most effectively in groups that are denser and more isolated from mainstream dialogue [7]; and how they may even attenuate the influence of more traditional opinion leaders in online conversations [15]. ...
Full-text available
This paper presents a new computational framework for mapping state-sponsored information operations into distinct strategic units. Utilizing a novel method called multi-view modularity clustering (MVMC), we identify groups of accounts engaged in distinct narrative and network information maneuvers. We then present an analytical pipeline to holistically determine their coordinated and complementary roles within the broader digital campaign. Applying our proposed methodology to disclosed Chinese state-sponsored accounts on Twitter, we discover an overarching operation to protect and manage Chinese international reputation by attacking individual adversaries (Guo Wengui) and collective threats (Hong Kong protestors), while also projecting national strength during global crisis (the COVID-19 pandemic). Psycholinguistic tools quantify variation in narrative maneuvers employing hateful and negative language against critics in contrast to communitarian and positive language to bolster national solidarity. Network analytics further distinguish how groups of accounts used network maneuvers to act as balanced operators, organized masqueraders, and egalitarian echo-chambers. Collectively, this work breaks methodological ground on the interdisciplinary application of unsupervised and multi-view methods for characterizing not just digital campaigns in particular, but also coordinated activity more generally. Moreover, our findings contribute substantive empirical insights around how state-sponsored information operations combine narrative and network maneuvers to achieve interlocking strategic objectives. This bears both theoretical and policy implications for platform regulation and understanding the evolving geopolitical significance of cyberspace.
... The focus of these studies was to evaluate content diffusion on social networks as a way to measure the influence of bots on public stance towards a topic. Most of these studies evaluated the spread of the misleading and false information by social bots to measure the effect of these accounts on the discussion of an event (Tardelli et al.;Santia et al. 2019;Shao et al. 2018); for instance, study by (Santia et al. 2019) and (Tardelli et al.) evaluated the spread of misleading content on Facebook by bots, and spreading misinformation related to financial topics, respectively. Similarly, the work of Shao et al. (2018) found that bots amplified the spread of fake news within a 10-month period between 2016 and 2017. ...
Full-text available
There is a rising concern with social bots that imitate humans and manipulate opinions on social media. Current studies on assessing the overall effect of bots on social media users mainly focus on evaluating the diffusion of discussions on social networks by bots. Yet, these studies do not confirm the relationship between bots and users’ stances. This study fills in the gap by analyzing if these bots are part of the signals that formulated social media users’ stances towards controversial topics. We analyze users’ online interactions that are predictive to their stances and identify the bots within these interactions. We applied our analysis on a dataset of more than 4000 Twitter users who expressed a stance on seven different topics. We analyzed those users’ direct interactions and indirect exposures with more than 19 million accounts. We identify the bot accounts for supporting/against stances, and compare them to other types of accounts, such as the accounts of influential and famous users. Our analysis showed that bot interactions with users who had specific stances were minimal when compared to the influential accounts. Nevertheless, we found that the presence of bots was still connected to users’ stances, especially in an indirect manner, as users are exposed to the content of the bots they follow, rather than by directly interacting with them by retweeting, mentioning, or replying.
... Articles 3 and 4 of the Treaty on the Functioning of the European Union.424 Tardelli, S., Avvenuti, M., Tesconi, M. and Cresci, S., 'Characterizing Social Bots Spreading Financial Disinformation', Meiselwitz, G. (eds), Social Computing and Social Media. Design, Ethics, User Behavior, and Social Network Analysis, HCII 2020, Lecture Notes in Computer Science, vol 12194, Springer, 2020.425 ...
Technical Report
Full-text available
This study, commissioned by the European Parliament's Policy Department for Citizens' Rights and Constitutional Affairs at the request of the LIBE Committee, aims at finding the balance between regulatory measures to tackle disinformation and the protection of freedom of expression. It explores the European legal framework and analyses the roles of all stakeholders in the information landscape. The study offers recommendations to reform the attention-based, data-driven information landscape and regulate platforms' rights and duties relating to content moderation.
Today, implications of automation in social media, specifically whether social bots can be used to manipulate people’s thoughts and behaviors are discussed. Some believe that social bots are simple tools that amplify human-created content, while others claim that social bots do not exist at all and that the research surrounding them is a conspiracy theory. This paper discusses the potential of automation in online media and the challenges that may arise as technological advances continue. The authors believe that automation in social media exists, but acknowledge that there is room for improvement in current scientific methodology for investigating this phenomenon. They focus on the evolution of social bots, the state-of-the-art content generation technologies, and the perspective of content generation in games. They provide a background discussion on the human perception of content in computer-mediated communication and describe a new automation level, from which they derive interdisciplinary challenges.KeywordsSocial mediaAutomationBotsArtificial intelligenceContent generation
Full-text available
To study the various factors influencing the process of information sharing on Twitter is a very active research area. This paper aims to explore the impact of numerical features extracted from user profiles in retweet prediction from the real-time raw feed of tweets. The originality of this work comes from the fact that the proposed model is based on simple numerical features with the least computational complexity, which is a scalable solution for big data analysis. This research work proposes three new features from the tweet author profile to capture the unique behavioral pattern of the user, namely “Author total activity”, “Author total activity per year”, and “Author tweets per year”. The features set is tested on a dataset of 100 million random tweets collected through Twitter API. The binary labels regression gave an accuracy of 0.98 for user-profile features and gave an accuracy of 0.99 when combined with tweet content features. The regression analysis to predict the retweet count gave an R-squared value of 0.98 with combined features. The multi-label classification gave an accuracy of 0.9 for combined features and 0.89 for user-profile features. The user profile features performed better than tweet content features and performed even better when combined. This model is suitable for near real-time analysis of live streaming data coming through Twitter API and provides a baseline pattern of user behavior based on numerical features available from user profiles only.
Online social networks (OSNs) are a major component of societal digitalization. OSNs alter how people communicate, make decisions, and form or change their beliefs, attitudes, and behaviors. Thus, they can now impact social groups, financial systems, and political communication at scale. As one type of OSN, social media platforms, such as Facebook, Twitter, and YouTube, serve as outlets for users to convey information to an audience as broad or targeted as the user desires. Over the years, these social media platforms have been infected with automated accounts, or bots, that are capable of hijacking conversations, influencing other users, and manipulating content dissemination. Although benign bots exist to facilitate legitimate activities, we focus on bots designed to perform malicious acts through social media platforms. Bots that mimic the social behaviors of humans are referred to as social bots. Social bots help automate sociotechnical behaviors, such as "liking" tweets, tweeting/retweeting a message, following users, and coordinating with or even competing against other bots. Some advanced social bots exhibit highly sophisticated traits of coordination and communication with complex organizational structures. This article presents a detailed survey of social bots, their types and behaviors, and how they impact social media, identification algorithms, and their coordination strategies in OSNs. The survey also discusses coordination in areas such as biological systems, interorganizational networks, and coordination games. Existing research extensively studied bot detection, but bot coordination is still emerging and requires more in-depth analysis. The survey covers existing techniques and open research issues on the analysis of social bots, their behaviors, and how social network theories can be leveraged to assess coordination during online campaigns.
Misleading information is an emerging cyber risk. It includes misinformation, disinformation, and fake news. Digital transformation and COVID-19 have exacerbated it. While there has been much discussion about the effects of misinformation, disinformation, and fake news on the political process, the consequences of misleading information on businesses have been far less, and it can be argued insufficiently, examined. The article offers a primer on misleading information and cyber risks aimed at business executives and leaders across an array of industries, organizations, and nations. Misleading information can have a profound effect on business. I analyze different misleading information types and identify associated cyber risks to help businesses think about these emerging threats. I examine in general the cyber risk posed by misleading information on business, and I explore in more detail the impact on healthcare, media, financial markets, and elections and geopolitical risks. Finally, I offer a set of practical recommendations for organizations to respond to these new challenges and to manage risks.
Online financial content is widespread on social media, especially on Twitter. The possibility to access open, real-time data about stock market information and firms’ public reputation can bring competitive advantages to industry insiders. However, as many studies extensively demonstrated before, manipulative campaigns by social bots do not spare the financial sector either. In this work, we show that the more viral a stock is on Twitter, the more that virality is artificially caused by social bots. This result is also confirmed when considering accounts suspended by Twitter instead of bots. Starting from this finding, we then propose two methods for detecting the presence and the extent of financial disinformation on Twitter, via classification and regression. Our systems exploit hundreds of features to encode the characteristics of viral discussions, including features about: participating users, textual content of shared posts, temporal patterns of diffusion, and financial information about stocks. We experiment with different combinations of algorithms and features, achieving excellent results for the detection of financial disinformation (F1=0.97) and promising results for the challenging task of estimating the extent of inorganic activity within financial discussions (R2=0.81, MAE=4.9%). Our compelling results pave the way for the deployment of novel systems for protecting against financial disinformation.
Full-text available
Cryptocurrencies represent one of the most attractive markets for financial speculation. As a consequence, they have attracted unprecedented attention on social media. Besides genuine discussions and legitimate investment initiatives, several deceptive activities have flourished. In this work, we chart the online cryptocurrency landscape across multiple platforms. To reach our goal, we collected a large dataset, composed of more than 50M messages published by almost 7M users on Twitter, Telegram and Discord, over three months. We performed bot detection on Twitter accounts sharing invite links to Telegram and Discord channels, and we discovered that more than 56% of them were bots or suspended accounts. Then, we applied topic modeling techniques to Telegram and Discord messages, unveiling two different deception schemes – “pump-and-dump” and “Ponzi” – and identifying the channels involved in these frauds. Whereas on Discord we found a negligible level of deception, on Telegram we retrieved 296 channels involved in pump-anddump and 432 involved in Ponzi schemes, accounting for a striking 20% of the total. Moreover, we observed that 93% of the invite links shared by Twitter bots point to Telegram pump-and-dump channels, shedding light on a little-known social bot activity. Charting the landscape of online cryptocurrency manipulation can inform actionable policies to fight such abuse.
Conference Paper
Full-text available
Within OSNs, many of our supposedly online friends may instead be fake accounts called social bots, part of large groups that purposely re-share targeted content. Here, we study retweeting behaviors on Twitter, with the ultimate goal of detecting retweeting social bots.We collect a dataset of 10M retweets. We design a novel visualization that we leverage to highlight benign and malicious patterns of retweeting activity. In this way, we uncover a ?normal" retweeting pattern that is peculiar of human-operated accounts, and suspicious patterns related to bot activities. Then, we propose a bot detection technique that stems from the previous exploration of retweeting behaviors. Our technique, called Retweet-Buster (RTbust), leverages unsupervised feature extraction and clustering. An LSTM autoencoder converts the retweet time series into compact and informative latent feature vectors, which are then clustered with a hierarchical density-based algorithm. Accounts belonging to large clusters characterized by malicious retweeting patterns are labeled as bots. RTbust obtains excellent detection results, with F1=0.87, whereas competitors achieve F1?0.76.Finally, we apply RTbust to a large dataset of retweets, uncovering 2 previously unknown active botnets with hundreds of accounts.
Conference Paper
Full-text available
Recent evidence has emerged linking coordinated campaigns by state-sponsored actors to manipulate public opinion on the Web. Campaigns revolving around major political events are enacted via mission-focused ?trolls." While trolls are involved in spreading disinformation on social media, there is little understanding of how they operate, what type of content they disseminate, how their strategies evolve over time, and how they influence the Web's in- formation ecosystem. In this paper, we begin to address this gap by analyzing 10M posts by 5.5K Twitter and Reddit users identified as Russian and Iranian state-sponsored trolls. We compare the behavior of each group of state-sponsored trolls with a focus on how their strategies change over time, the different campaigns they embark on, and differences between the trolls operated by Russia and Iran. Among other things, we find: 1) that Russian trolls were pro-Trump while Iranian trolls were anti-Trump; 2) evidence that campaigns undertaken by such actors are influenced by real-world events; and 3) that the behavior of such actors is not consistent over time, hence detection is not straightforward. Using Hawkes Processes, we quantify the influence these accounts have on pushing URLs on four platforms: Twitter, Reddit, 4chan's Politically Incorrect board (/pol/), and Gab. In general, Russian trolls were more influential and efficient in pushing URLs to all the other platforms with the exception of /pol/ where Iranians were more influential. Finally, we release our source code to ensure the reproducibility of our results and to encourage other researchers to work on understanding other emerging kinds of state-sponsored troll accounts on Twitter.
Over the past couple of years, anecdotal evidence has emerged linking coordinated campaigns by state-sponsored actors with efforts to manipulate public opinion on the Web, often around major political events, through dedicated accounts, or “trolls.” Although they are often involved in spreading disinformation on social media, there is little understanding of how these trolls operate, what type of content they disseminate, and most importantly their influence on the information ecosystem. In this paper, we shed light on these questions by analyzing 27K tweets posted by 1K Twitter users identified as having ties with Russia’s Internet Research Agency and thus likely state-sponsored trolls. We compare their behavior to a random set of Twitter users, finding interesting differences in terms of the content they disseminate, the evolution of their account, as well as their general behavior and use of Twitter. Then, using Hawkes Processes, we quantify the influence that trolls had on the dissemination of news on social platforms like Twitter, Reddit, and 4chan. Overall, our findings indicate that Russian trolls managed to stay active for long periods of time and to reach a substantial number of Twitter users with their tweets. When looking at their ability of spreading news content and making it viral, however, we find that their effect on social platforms was minor, with the significant exception of news published by the Russian state-sponsored news outlet RT (Russia Today).
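Both troll studies above quantify cross-platform influence with Hawkes processes, whose core idea is that past events raise the probability of future ones. A minimal sketch of the conditional intensity of a univariate Hawkes process with an exponential kernel follows; the parameter values are illustrative, not those fitted in the papers.

```python
import math

def hawkes_intensity(t, events, mu=0.1, alpha=0.5, beta=1.0):
    """Conditional intensity of a univariate Hawkes process with an
    exponential kernel:

        lambda(t) = mu + sum_{t_i < t} alpha * exp(-beta * (t - t_i))

    mu is the background rate; each past event (e.g. a troll posting a URL)
    adds a decaying self-excitation term, so bursts of activity make further
    activity more likely. Parameters here are hypothetical.
    """
    return mu + sum(alpha * math.exp(-beta * (t - ti)) for ti in events if ti < t)
```

In the multivariate version used for influence estimation, events on one platform (e.g. Twitter) also excite the intensities of the other platforms, and the fitted cross-excitation weights quantify who influences whom.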
SoBigData is a Research Infrastructure (RI) aiming to provide an integrated ecosystem for ethic-sensitive scientific discoveries and advanced applications of social data mining. A key milestone of the project focuses on data, methods and results sharing, in order to ensure the reproducibility, review and re-use of scientific works. For this reason, the Digital Library paradigm is implemented within the RI, providing users with virtual environments where datasets, methods and results can be collected, maintained, managed and preserved, granting full documentation, access and the possibility to re-use.
Automated social media bots have existed almost as long as the social media environments they inhabit. Their emergence has triggered numerous research efforts to develop increasingly sophisticated means to detect these accounts. These efforts have resulted in a cat-and-mouse cycle in which detection algorithms evolve trying to keep up with ever-evolving bots. As part of this continued evolution, our research proposes a multi-model 'tool-box' approach in order to conduct detection at various tiers of data granularity. To support this toolbox approach, this research also uses random string detection applied to user names to filter Twitter streams for bot accounts, and uses this as labeled training data for follow-on research.
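The random-string filter on user names mentioned above can be approximated with simple surface heuristics. The sketch below is an illustrative stand-in, not the paper's detector: the thresholds, and the choice of "long consonant run or many digits" as randomness signals, are assumptions.

```python
def looks_random(name, max_consonant_run=4, digit_ratio=0.4):
    """Heuristic flag for random-looking screen names (illustrative only).

    Two weak signals of machine-generated handles:
      - a high fraction of digits (e.g. 'user82730194'), or
      - a long run of consonants, unpronounceable in most languages.
    """
    if not name:
        return False
    lowered = name.lower()
    if sum(ch.isdigit() for ch in lowered) / len(lowered) >= digit_ratio:
        return True
    run = 0
    for ch in lowered:
        if ch.isalpha() and ch not in "aeiou":
            run += 1
            if run > max_consonant_run:
                return True
        else:
            run = 0  # vowel, digit, or separator breaks the run
    return False
```

A production filter would instead learn character n-gram statistics from known human handles, but the flagged accounts serve the same purpose: cheap, high-precision seeds for labeled training data.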
Microblogs are increasingly exploited for predicting prices and traded volumes of stocks in financial markets. However, it has been demonstrated that much of the content shared in microblogging platforms is created and publicized by bots and spammers. Yet, the presence (or lack thereof) and the impact of fake stock microblogs has never systematically been investigated before. Here, we study 9M tweets related to stocks of the 5 main financial markets in the US. By comparing tweets with financial data from Google Finance, we highlight important characteristics of Twitter stock microblogs. More importantly, we uncover a malicious practice perpetrated by coordinated groups of bots and likely aimed at promoting low-value stocks by exploiting the popularity of high-value ones. Our results call for the adoption of spam and bot detection techniques in all studies and applications that exploit user-generated content for predicting the stock market.
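The promotion scheme described above — low-value stocks riding the popularity of high-value ones — leaves a footprint in cashtag co-mentions. A minimal sketch of counting co-occurring cashtags follows; the function and regex are illustrative, not the study's actual analysis code.

```python
import re
from collections import Counter
from itertools import combinations

# Cashtags on Twitter are '$' followed by a short ticker, e.g. $AAPL
CASHTAG = re.compile(r"\$[A-Za-z]{1,6}\b")

def cooccurring_cashtags(tweets):
    """Count pairs of cashtags mentioned together in the same tweet.

    Pairs that dominate the counts while mixing a popular, high-value
    ticker with an obscure one are candidates for the piggy-backing
    scheme: bots attach the obscure stock to the popular one's audience.
    """
    pairs = Counter()
    for text in tweets:
        tags = sorted({t.upper() for t in CASHTAG.findall(text)})
        pairs.update(combinations(tags, 2))
    return pairs
```

Cross-referencing the flagged pairs with market data (as the study does with Google Finance) then separates legitimate sector chatter from coordinated promotion.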
Recently, studies on the characterization and detection of social bots were published at an impressive rate. By looking back at over ten years of research and experimentation on social bot detection, in this paper we aim at understanding past, present, and future research trends in this crucial field. In doing so, we discuss one of the nastiest features of social bots – that is, their evolutionary nature. Then, we highlight the switch from supervised bot detection techniques – focusing on feature engineering and on the analysis of one account at a time – to unsupervised ones, where the focus is on proposing new detection algorithms and on the analysis of groups of accounts that behave in a coordinated and synchronized fashion. These unsupervised, group-analysis techniques currently represent the state-of-the-art in social bot detection. Going forward, we analyze the latest research trend in social bot detection in order to highlight a promising new development of this crucial field.
Arabic Twitter space is crawling with bots that fuel political feuds, spread misinformation, and proliferate sectarian rhetoric. While efforts have long existed to analyze and detect English bots, Arabic bot detection and characterization remains largely understudied. In this work, we contribute new insights into the role of bots in spreading religious hatred on Arabic Twitter and introduce a novel regression model that can accurately identify Arabic language bots. Our assessment shows that existing tools that are highly accurate in detecting English bots don't perform as well on Arabic bots. We identify the possible reasons for this poor performance, perform a thorough analysis of linguistic, content, behavioral and network features, and report on the most informative features that distinguish Arabic bots from humans, as well as the differences between Arabic and English bots. Our results mark an important step toward understanding the behavior of malicious bots on Arabic Twitter and pave the way for more effective Arabic bot detection tools.
DNA-inspired online behavioral modeling techniques have been proposed and successfully applied to a broad range of tasks. In this paper, we investigate the fundamental laws that drive the occurrence of behavioral similarities among Twitter users, employing a DNA-inspired technique. Our findings are multifold. First, we demonstrate that, despite apparently featuring little to no similarities, the online behaviors of Twitter users are far from being uniformly random. Secondly, we benchmark different behavioral models through a number of simulations. We characterize the main properties of such models and we identify those models that better resemble human behaviors in Twitter. Then, we demonstrate that the number and the extent of behavioral similarities within a group of Twitter users obey a log-normal law, and we leverage this characterization to propose a novel bot detection system. In a nutshell, the results shed light on the fundamental properties that drive the online behaviors of groups of Twitter users, through the lenses of DNA-inspired behavioral modeling techniques. This study is based on a wealth of data gathered over several months that, for the sake of reproducibility, are publicly available for research purposes.
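The DNA-inspired technique above encodes each account's timeline as a string over a small alphabet and compares accounts by their shared substrings. A minimal sketch follows; the three-letter action alphabet is one common convention in this line of work, and the helper names are illustrative.

```python
def behavioral_dna(actions):
    """Encode a timeline as a string over a small alphabet, in the spirit of
    DNA-inspired behavioral modeling: A = tweet, C = reply, T = retweet.
    (The mapping is an illustrative convention, not the paper's exact one.)"""
    code = {"tweet": "A", "reply": "C", "retweet": "T"}
    return "".join(code[a] for a in actions)

def longest_common_substring(s1, s2):
    """Length of the longest common substring, by dynamic programming.
    Long substrings shared by many accounts indicate identical bursts of
    behavior, i.e. the coordinated activity that bot detectors look for."""
    best = 0
    prev = [0] * (len(s2) + 1)
    for ch1 in s1:
        cur = [0] * (len(s2) + 1)
        for j, ch2 in enumerate(s2, 1):
            if ch1 == ch2:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best
```

The paper's finding is statistical: across a group of genuine users, the number and extent of such similarities follow a log-normal law, so groups whose shared-substring lengths deviate far above that law are flagged as likely botnets.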