The spread of fake news by social bots
Chengcheng Shao, Giovanni Luca Ciampaglia, Onur Varol,
Alessandro Flammini, and Filippo Menczer
Indiana University, Bloomington
Abstract
The massive spread of fake news has been identified as a major global
risk and has been alleged to influence elections and threaten democracies.
Communication, cognitive, social, and computer scientists are engaged
in efforts to study the complex causes for the viral diffusion of digital
misinformation and to develop solutions, while search and social media
platforms are beginning to deploy countermeasures. However, to date,
these efforts have been mainly informed by anecdotal evidence rather than
systematic data. Here we analyze 14 million messages spreading 400 thou-
sand claims on Twitter during and following the 2016 U.S. presidential
campaign and election. We find evidence that social bots play a key role in
the spread of fake news. Accounts that actively spread misinformation are
significantly more likely to be bots. Automated accounts are particularly
active in the early spreading phases of viral claims, and tend to target
influential users. Humans are vulnerable to this manipulation, retweeting
bots who post false news. Successful sources of false and biased claims
are heavily supported by social bots. These results suggest that curbing
social bots may be an effective strategy for mitigating the spread of online
misinformation.
1 Introduction
If you get your news from social media, as most Americans do [7], you are ex-
posed to a daily dose of false or misleading content — hoaxes, rumors, conspiracy
theories, fabricated reports, click-bait headlines, and even satire. We refer to
this misinformation collectively as false or fake news. The incentives are well
understood: traffic to fake news sites is easily monetized through ads [16], but
political motives can be equally or more powerful [18, 23]. The massive spread
of false news has been identified as a major global risk [11]. Claims that fake
news can influence elections and threaten democracies [8] are hard to prove.
Yet we have witnessed abundant demonstrations of real harm caused by mis-
information spreading on social media, from dangerous health decisions [10] to
manipulations of the stock market [6].
A complex mix of cognitive, social, and algorithmic biases contribute to our
vulnerability to manipulation by online misinformation. Even in an ideal world
where individuals tend to recognize and avoid sharing low-quality information,
information overload and finite attention limit the capacity of social media to
discriminate information on the basis of quality. As a result, online misinforma-
tion is just as likely to go viral as reliable information [22]. Of course, we do not
live in such an ideal world. Our online social networks are strongly polarized and
segregated along political lines [3, 2]. The resulting “echo chambers” [28, 21]
provide selective exposure to news sources, biasing our view of the world [20].
Furthermore, social media platforms are designed to prioritize engaging rather
than trustworthy posts. Such algorithmic popularity bias may well hinder the
selection of quality content [24, 9, 19]. All of these factors play into confirmation
bias and motivated reasoning [26, 14], making the truth hard to discern.
While fake news is not a new phenomenon [15], the online information
ecosystem is particularly fertile ground for sowing misinformation. Social me-
dia can be easily exploited to manipulate public opinion thanks to the low cost
of producing fraudulent websites and high volumes of software-controlled pro-
files or pages, known as social bots [23, 6, 27, 29]. These fake accounts can post
content and interact with each other and with legitimate users via social connec-
tions, just like real people. People tend to trust social contacts [12] and can be
manipulated into believing and spreading content produced in this way [1]. To
make matters worse, echo chambers make it easy to tailor misinformation and
target those who are most likely to believe it. Moreover, the amplification of
fake news through social bots overloads our fact-checking capacity due to our
finite attention, as well as our tendencies to attend to what appears popular
and to trust information in a social setting [13].
The fight against fake news requires a grounded assessment of the mecha-
nism by which misinformation spreads online. If the problem is mainly driven
by cognitive limitations, we need to invest in news literacy education; if social
media platforms are fostering the creation of echo chambers, algorithms can be
tweaked to broaden exposure to diverse views; and if malicious bots are respon-
sible for many of the falsehoods, we can focus attention on detecting this kind of
abuse. Here we focus on gauging the latter effect. There is plenty of anecdotal
evidence that social bots play a role in the spread of fake news. The earliest man-
ifestations were uncovered in 2010 [18, 23]. Since then, we have seen influential
bots affect online debates about vaccination policies [6] and participate actively
in political campaigns, both in the U.S. [1] and other countries [32]. However,
a quantitative analysis of the effectiveness of misinformation-spreading attacks
based on social bots is still missing.
A large-scale, systematic analysis of the spread of fake news and its manip-
ulation by social bots is now feasible thanks to two tools developed in our lab:
the Hoaxy platform to track the online spread of claims [25] and the Botome-
ter machine learning algorithm to detect social bots [4, 29]. Let us examine
how social bots promoted hundreds of thousands of fake news articles spreading
through millions of Twitter posts during and following the 2016 U.S. presidential
campaign.
Figure 1: Weekly tweeted claim articles, tweets/article ratio and articles/site
ratio. The collection was briefly interrupted in October 2016. In December 2016
we expanded the set of claim sources, from 71 to 122 websites.
2 Results
We crawled the articles published by seven independent fact-checking organiza-
tions and 122 websites that, according to established media, routinely publish
false and/or misleading news. The present analysis focuses on the period from
mid-May 2016 to the end of March 2017. During this time, we collected 15,053
fact-checking articles and 389,569 unsubstantiated or debunked claims. Using
the Twitter API, Hoaxy collected 1,133,674 public posts that included links
to fact checks and 13,617,425 public posts linking to claims. See Methods for
details.
As shown in Fig. 1, fake news websites each produced approximately 100
articles per week, on average. The virality of these claims increased to approxi-
mately 30 tweets per article per week, on average. However, success is extremely
heterogeneous across articles. Fig. 2(a) illustrates an example of viral claim.
Whether we measure success by number of people sharing an article or number
of posts containing a link, we find a very broad distribution of popularity span-
ning several orders of magnitude: while the majority of articles go unnoticed, a
significant fraction go viral (Fig. 2(b,c)). Unfortunately, and consistently with
prior analysis using Facebook data [22], we find that the popularity profiles
of false news are indistinguishable from those of fact-checking articles.
Most claims are spread through original tweets and especially retweets, while
few are shared in replies (Fig. 3).
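
To make the popularity measurement concrete, the sketch below computes the complementary cumulative distribution of tweets per article, the quantity plotted in Fig. 2(b). It is a minimal illustration: the record format is a hypothetical stand-in for the actual Hoaxy data.

```python
import numpy as np
from collections import Counter

def ccdf(counts):
    """Complementary cumulative distribution: fraction of articles whose
    popularity is at least x, for each observed popularity value x."""
    values = np.sort(np.asarray(counts))
    return values, 1.0 - np.arange(len(values)) / len(values)

# Hypothetical records: one (article_url, tweet_id) pair per collected post
records = [("http://example.com/claim-1", 101),
           ("http://example.com/claim-1", 102),
           ("http://example.com/claim-2", 103)]
tweets_per_article = Counter(url for url, _ in records)
x, p = ccdf(list(tweets_per_article.values()))
for xi, pi in zip(x, p):
    print(f"P(popularity >= {xi}) = {pi:.2f}")
```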
The claim-posting patterns shown in Fig. 4(a) highlight inorganic support.
The points aligned along the diagonal lines (on the left of the plot) indicate
that for many articles, one or two accounts are responsible for the entirety
of the activity. Furthermore, some accounts share the same claim up to 100
Figure 2: Virality of fake news. (a) Diffusion network for the article titled
“Spirit cooking”: Clinton campaign chairman practices bizarre occult ritual,
published by the conspiracy site Infowars.com four days before the 2016 U.S.
election. Over 30 thousand tweets shared this claim; only the largest connected
component of the network is shown. Nodes and links represent Twitter accounts
and retweets of the claim, respectively. Node size indicates account influence,
measured by the number of times an account is retweeted. Node color represents
bot score, from blue (likely human) to red (likely bot); yellow nodes cannot be
evaluated because they have either been suspended or deleted all their tweets.
An interactive version of this network is available online
(iunetsci.github.io/HoaxyBots/). The two charts plot the probability distributions (density
functions) of (b) number of tweets per article and (c) number of users per
article, for claims and fact-checking articles.
Figure 3: Distribution of types of tweet spreading claims. Each article is mapped
along three axes representing the percentages of different types of messages that
share it: original tweets, retweets, and replies. Color represents the number of
articles in each bin, on a log-scale.
times or more. The ratio of tweets per user decreases for more viral claims,
indicating more organic spreading. But Fig. 4(b) demonstrates that for the most
viral claims, much of the spreading activity originates from a small portion of
accounts.
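
The concentration reported in Fig. 4(b) is a Gini coefficient over the per-account tweet counts of each article. A generic implementation (not necessarily the exact formula used in the study) is sketched below.

```python
import numpy as np

def gini(counts):
    """Gini coefficient of per-account tweet counts for one article
    (0 = activity evenly spread across accounts, values near the maximum
    (n-1)/n = a single account posted almost everything)."""
    x = np.sort(np.asarray(counts, dtype=float))
    n = len(x)
    if n == 0 or x.sum() == 0:
        return 0.0
    # Standard rank-based formula on the sorted counts
    ranks = np.arange(1, n + 1)
    return (2 * np.sum(ranks * x) / (n * x.sum())) - (n + 1) / n

# Example: one account posted a claim 98 times, two others once each
print(gini([98, 1, 1]))   # high: activity dominated by one account
print(gini([1, 1, 1, 1])) # 0.0: perfectly even
```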
We suspect that these super-spreaders of fake news are social bots that
automatically post links to articles, retweet other accounts, or perform more
sophisticated autonomous tasks, like following and replying to other users. To
test this hypothesis, we used the Botometer service to evaluate the Twitter
accounts that posted links to claims. For each user we computed a bot score,
which can be interpreted as the likelihood that the account is controlled by
software. Details of our detection systems can be found in Methods.
Fig. 5 confirms that the super-spreaders are significantly more likely to be
bots compared to the population of users who share claims. We hypothesize that
these bots play a critical role in driving the viral spread of fake news. To test this
conjecture, we examined the accounts that post viral claims at different phases
of their spreading cascades. As shown in Fig. 6, bots actively share links in
the first few seconds after they are first posted. This early intervention exposes
many users to the fake news article, effectively boosting its viral diffusion.
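
The statistical comparison behind Fig. 5 is a Welch (unequal-variances) t-test between the bot scores of the super-spreaders and those of a random sample of claim-sharing accounts. A minimal SciPy sketch with synthetic score arrays standing in for the real samples:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Placeholder bot-score samples in [0, 1]; in the study these would be the
# 961 super-spreaders and a random sample of 915 claim-sharing accounts.
super_spreaders = rng.beta(5, 3, size=961)   # skewed toward higher scores
random_sample   = rng.beta(2, 5, size=915)   # skewed toward lower scores

# equal_var=False selects Welch's unequal-variances t-test
t_stat, p_value = stats.ttest_ind(super_spreaders, random_sample, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.2e}")
```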
Another strategy used by bots is illustrated in Fig. 7(a): influential users are
often mentioned in tweets that link to debunked claims. Bots seem to employ
this targeting strategy repetitively; for example, a single account mentioned
@realDonaldTrump in 18 tweets linking the claim shown in the figure. For a
systematic investigation, let us use the number of followers of a Twitter user as a
Figure 4: Concentration of claim-sharing activity. (a) Scatter plot of
tweets/account ratio versus number of tweets sharing a claim. The darkness
of a point represents the number of claims. (b) Source concentration for claims
with different popularity. We consider a collection of articles shared by a mini-
mum number of tweets as a popularity group. For claims in each of these groups,
we show the distribution of Gini coefficients. A high coefficient indicates that a
small subset of accounts was responsible for a large portion of the posts. In this
and the following violin plots, the width of a contour represents the probability
of the corresponding value, and the median is marked by a colored line.
Figure 5: Bot score distributions for a random sample of 915 users who posted
at least one link to a claim, and for the 961 accounts that most actively share
fake news (super-spreaders). The two groups have significantly different scores
(p < 10⁻⁴ according to a Welch’s unequal-variances t-test).
Figure 6: Temporal evolution of bot score distributions for a sample of 60,000
accounts that participate in the spread of the 1,000 most viral claims. We
focus on the first hour since a fake news article appears, and divide this early
spreading phase into logarithmic lag intervals.
[Figure 7(a) node labels: @TheEllenShow, @YouTube, @BarackObama, @BBCBreaking, @nytimes, @CNN, @ladygaga, @realDonaldTrump, @cnnbrk]
Figure 7: (a) Example of targeting for the claim Report: three million votes
in presidential election cast by illegal aliens, published by Infowars.com on
November 14, 2016 and shared over 18 thousand times on Twitter. Only a por-
tion of the diffusion network is shown. Nodes stand for Twitter accounts, with
size representing number of followers. Links illustrate how the claim spreads:
by retweets and quoted tweets (blue), or by replies and mentions (red). (b) Dis-
tributions of the number of followers for Twitter users who are mentioned or
replied to in posts that link to the most viral 1000 claims. The distributions are
grouped by bot score of the account that creates the mention or reply.
Figure 8: Scatter plot of bot activity vs. difference between actual and predicted
vote margin by U.S. states. For each state, we compared the vote margin
with forecasts based on the final polls on election day. A positive percentage
indicates a larger Republican margin or smaller Democratic margin. To gauge
fake news sharing activity by bots, we considered tweets posting links to claims
by accounts with bot score above 0.6 that reported a U.S. state location in their
profile. We compared the tweet frequencies by states with those expected from
a large sample of tweets about the elections in the same period. Ratios above
one indicate states with higher than expected bot activity. We also plot a linear
regression (red line). Pearson’s correlation is ρ = 0.15.
proxy for their influence. We consider tweets that mention or reply to a user and
include a link to a viral fake news story. Tweets tend to mention popular people,
of course. However, Fig. 7(b) shows that when accounts with the highest bot
scores share these links, they tend to target users with a higher median number
of followers and lower variance. In this way bots expose influential people, such
as journalists and politicians, to a claim, creating the appearance that it is
widely shared and the chance that the targets will spread it.
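
The targeting analysis only requires the mention metadata embedded in each tweet. The sketch below assumes classic Twitter API v1.1 tweet dictionaries together with separately collected bot-score and follower-count lookups; the function and field handling are illustrative rather than the study's actual pipeline.

```python
def mention_targets(tweets, bot_scores, follower_counts):
    """Yield (poster_bot_score, target_follower_count) pairs for each
    mention or reply in a list of Twitter API v1.1 tweet dicts.

    bot_scores and follower_counts are lookups keyed by user id string,
    assumed to have been collected separately (e.g., via Botometer and
    the users/lookup endpoint)."""
    for tw in tweets:
        score = bot_scores.get(tw["user"]["id_str"])
        if score is None:
            continue  # poster was not scored (e.g., suspended account)
        # In v1.1 objects, replied-to users typically also appear here
        for mention in tw.get("entities", {}).get("user_mentions", []):
            followers = follower_counts.get(mention["id_str"])
            if followers is not None:
                yield score, followers
```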
We examined whether bots tended to target voters in certain states by cre-
ating the appearance of users posting claims from those locations. To this end,
we considered accounts with high bot scores that shared claims in the three
months before the election, and focused on those with a state location in their
profile. The location is self-reported and thus trivial to fake. As a baseline, we
extracted state locations from a large sample of tweets about the elections in
the same period (see details in Methods). A χ² test indicates that the location
patterns produced by bots are inconsistent with the geographic distribution of
political conversations on Twitter (p < 10⁻⁴). Given the widespread but un-
proven allegations that fake news may have influenced the 2016 U.S. elections,
Figure 9: Joint distribution of the bot scores of accounts that retweeted links
to claims and accounts that had originally posted the links. Color represents
the number of retweeted messages in each bin, on a log scale. Projections show
the distributions of bot scores for retweeters (top) and for accounts retweeted
by humans (left).
we explored the relationship between bot activity and voting data. The ratio of
bot frequencies with respect to state baselines provides an indication of claim-
sharing activity by state. Fig. 8 shows a weak correlation between this ratio and
the change in actual vote margin with respect to state forecasts (see Methods).
Naturally this correlation does not imply that voters were affected by bots shar-
ing fake news; many other factors can explain the election outcome. However
it is remarkable that states most actively targeted by misinformation-spreading
bots tended to have more surprising election results.
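
Both statistics in this analysis are standard: a χ² goodness-of-fit test of the bots' per-state tweet counts against the baseline geographic distribution, and a Pearson correlation between relative bot activity and the polling surprise. A sketch with placeholder counts (the real analysis covered all 51 state locations):

```python
import numpy as np
from scipy import stats

# Placeholder per-state counts; the study used 50 states plus DC
bot_tweets      = np.array([120,  40,  80,  15])   # claim tweets by likely bots
baseline_tweets = np.array([900, 700, 500, 400])   # #election... sample tweets

# Chi-square test: are bot locations consistent with the baseline geography?
expected = baseline_tweets / baseline_tweets.sum() * bot_tweets.sum()
chi2, p = stats.chisquare(bot_tweets, f_exp=expected)
print(f"chi2 = {chi2:.1f}, p = {p:.3g}")

# Correlation between relative bot activity and the polling surprise
activity_ratio = (bot_tweets / bot_tweets.sum()) / (baseline_tweets / baseline_tweets.sum())
margin_surprise = np.array([2.1, -0.5, 3.4, 0.7])  # actual minus predicted margin (%)
rho, p_corr = stats.pearsonr(activity_ratio, margin_surprise)
print(f"Pearson rho = {rho:.2f}, p = {p_corr:.2f}")
```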
Having found that bots are employed to drive the viral spread of fake news,
let us explore how humans interact with the content shared by bots, which may
provide insight into whether and how bots are able to affect public opinion.
Fig. 9 shows that humans do most of the retweeting, and they retweet claims
posted by bots as much as by other humans. This suggests that humans can be
successfully manipulated through social bots.
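
The joint distribution in Fig. 9 can be assembled as a two-dimensional histogram over pairs of bot scores, one for the retweeting account and one for the account being retweeted. A minimal NumPy sketch, assuming the score pairs have already been extracted:

```python
import numpy as np

def retweet_score_histogram(score_pairs, bins=20):
    """2-D histogram of (retweeter bot score, retweeted bot score) pairs,
    both in [0, 1]; log-scaling for display is left to the plotting code."""
    retweeter, retweeted = zip(*score_pairs)
    hist, xedges, yedges = np.histogram2d(
        retweeter, retweeted, bins=bins, range=[[0, 1], [0, 1]])
    return hist, xedges, yedges

# Example with a few hypothetical score pairs
hist, _, _ = retweet_score_histogram([(0.1, 0.8), (0.2, 0.1), (0.15, 0.75)])
print(hist.sum())  # 3 retweets binned
```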
Finally, we compared the extent to which social bots successfully manipulate
the information ecosystem in support of different sources of online misinforma-
Figure 10: Popularity and bot support for the top 20 fake news websites. Popu-
larity is measured by total tweet volume (horizontal axis) and median number of
tweets per claim (circle area). Bot support is gauged by the median bot score of
the 100 most active accounts posting links to articles from each source (vertical
axis).
tion. We considered the most popular sources in terms of median and aggregate
article posts, and measured the bot scores of the accounts that most actively
spread their claims. As shown in Fig. 10, one site (beforeitsnews.com) stands
out in terms of manipulation, but other well-known sources also have many bots
among their promoters. At the bottom we find satire sites like The Onion.
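
The bot-support measure in Fig. 10, i.e. the median bot score of each source's 100 most active sharers, is a simple aggregation. A pandas sketch over an assumed (site, account, n_posts, bot_score) table follows; the column names are illustrative.

```python
import pandas as pd

def bot_support(df, top_n=100):
    """Median bot score of the top_n most active accounts per source.

    df is assumed to have one row per (site, account) with columns
    'site', 'account', 'n_posts', and 'bot_score'."""
    top = (df.sort_values("n_posts", ascending=False)
             .groupby("site", group_keys=False)
             .head(top_n))
    return top.groupby("site")["bot_score"].median()
```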
3 Discussion
Our analysis provides quantitative empirical evidence of the key role played by
social bots in the viral spread of fake news online. Relatively few accounts are
responsible for a large share of the traffic that carries misinformation. These
accounts are likely bots, and we uncovered several manipulation strategies they
use. First, bots are particularly active in amplifying fake news in the very early
spreading moments, before a claim goes viral. Second, bots target influential
users through replies and mentions. Finally, bots may disguise their geographic
locations. People are vulnerable to these kinds of manipulation, retweeting bots
who post false news just as much as other humans. Successful sources of fake
news in the U.S., including those on both ends of the political spectrum, are
heavily supported by social bots. As a result, the virality profiles of false news
are indistinguishable from those of fact-checking articles. Social media platforms
are beginning to acknowledge these problems and deploy countermeasures, al-
though their effectiveness is hard to evaluate [31, 17].
Our findings demonstrate that social bots are an effective tool to manipulate
social media and deceive their users. Although our spreading data is collected
from Twitter, there is no reason to believe that the same kind of abuse is not
taking place on other digital platforms as well. In fact, viral conspiracy theories
spread on Facebook [5] among the followers of pages that, like social bots, can
easily be managed automatically and anonymously. Furthermore, just like on
Twitter, false claims on Facebook are as likely to go viral as reliable news [22].
While the difficulty of accessing spreading data on platforms like Facebook is a
concern, the growing popularity of ephemeral social media like Snapchat may
make future studies of this abuse all but impossible.
The results presented here suggest that curbing social bots may be an effec-
tive strategy for mitigating the spread of online misinformation. Progress in this
direction may be accelerated through partnerships between social media plat-
forms and academic research. For example, our lab and others are developing
machine learning algorithms to detect social bots [6, 27, 29]. The deployment
of such tools is fraught with peril, however. While platforms have the right
to enforce their terms of service, which forbid impersonation and deception,
algorithms do make mistakes. Even a single false-positive error leading to the
suspension of a legitimate account may foster valid concerns about censorship.
This justifies current human-in-the-loop solutions, which unfortunately do not
scale with the volume of abuse that is enabled by software. It is therefore
imperative to support research on improved abuse detection technology.
An alternative strategy would be to employ CAPTCHAs [30], challenge-
response tests to determine whether a user is human. CAPTCHAs have been
deployed widely and successfully to combat email spam and other types of online
abuse. Their use to limit automatic posting or resharing of news links could
stem bot abuse, but also add undesirable friction to benign applications of
automation by legitimate entities, such as news media and emergency response
coordinators. These are hard trade-offs that must be studied carefully as we
contemplate ways to address the fake news epidemic.
4 Methods
The online article-sharing data was collected through Hoaxy, an open plat-
form developed at Indiana University to track the spread of fake news and
fact checking on Twitter [25]. A search engine, interactive visualizations, and
open-source software are freely available (hoaxy.iuni.iu.edu). The data is
accessible through a public API.
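
For readers who want to retrieve similar data, the snippet below sketches what a query against such an API looks like. The base URL and parameter names are hypothetical placeholders; the actual endpoints should be taken from the Hoaxy documentation.

```python
import requests

# Hypothetical endpoint and parameters -- consult the Hoaxy documentation
# at hoaxy.iuni.iu.edu for the actual API routes and authentication.
BASE_URL = "https://example-hoaxy-host/api"

def fetch_article_tweets(article_url):
    """Return the JSON payload listing tweets that shared a given article."""
    resp = requests.get(f"{BASE_URL}/tweets",
                        params={"url": article_url}, timeout=30)
    resp.raise_for_status()
    return resp.json()
```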
The links to the stories considered here were crawled from websites that rou-
tinely publish unsubstantiated or debunked claims, according to lists compiled
by reputable third-party news and fact-checking organizations. We started the
collection in mid-May 2016 with 71 sites and added 51 more in mid-December
2016. The full list of sources is available on the Hoaxy website. The collection
period for the present analysis extends until the end of March 2017. During this
time, we collected 389,569 claims. We also tracked 15,053 stories published by
independent fact-checking organizations, such as snopes.com, politifact.com,
and factcheck.org.
Using Twitter’s public streaming API, we collected 13,617,425 public posts
that included links to claims and 1,133,674 public posts linking to fact checks.
We extracted metadata about the source of each link, the account that shared it,
the original poster in case of retweet or quoted tweet, and any users mentioned
or replied to in the tweet.
We transformed URLs to their canonical forms to merge different links re-
ferring to the same article. This happens mainly due to shortening services
(44% of links are redirected) and extra parameters (34% of URLs contain analyt-
ics tracking parameters), but we also found websites that use duplicate domains
and snapshot services. Canonical URLs were obtained by resolving redirection
and removing analytics parameters.
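
A simplified version of this canonicalization step, following redirects and stripping common analytics parameters, is sketched below; the list of tracking parameters is illustrative, not the exact set handled by Hoaxy.

```python
import requests
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

TRACKING_PREFIXES = ("utm_", "fbclid", "gclid")  # illustrative, not exhaustive

def canonicalize(url):
    """Follow redirects, then drop analytics query parameters and fragments."""
    final_url = requests.head(url, allow_redirects=True, timeout=10).url
    scheme, netloc, path, query, _fragment = urlsplit(final_url)
    kept = [(k, v) for k, v in parse_qsl(query, keep_blank_values=True)
            if not k.lower().startswith(TRACKING_PREFIXES)]
    return urlunsplit((scheme, netloc, path, urlencode(kept), ""))
```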
We apply no editorial judgment about the truthfulness of individual claims;
some may be accurate (false positives) and some fake news may be missed (false
negatives). The great majority of claims are misleading, including fabricated
news, hoaxes, rumors, conspiracy theories, click bait, and politically biased
content. We did not exclude satire because many fake-news sources label their
content as satirical, making the distinction problematic. Furthermore, viral
satire is often mistaken for real news. The Onion is the satirical source with the
highest total volume of shares. We repeated our analyses of most viral claims
(e.g., Fig. 6) with articles from theonion.com excluded and the results were not
affected.
The bot score of Twitter accounts was computed using the Botometer ser-
vice, developed at Indiana University and available through a public API
(botometer.iuni.iu.edu). Botometer evaluates the extent to which an ac-
count exhibits similarity to the characteristics of social bots [4]. We use the
Twitter Search API to collect up to 200 of an account’s most recent tweets and
up to 100 of the most recent tweets mentioning the account. From this data we
extract features capturing various dimensions of information diffusion as well
as user metadata, friend statistics, temporal patterns, part-of-speech and senti-
ment analysis. These features are fed to a machine learning algorithm trained
on thousands of examples of human and bot accounts. The system has high
accuracy [29] and is widely adopted, serving over 100 thousand requests daily.
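
Programmatic scoring looks roughly like the following. The botometer Python client exists, but its constructor arguments and key names have changed across versions, so the values below are placeholders to be checked against the current documentation.

```python
import botometer  # pip install botometer

# Placeholder credentials; key-name conventions differ between Botometer versions
twitter_auth = {
    "consumer_key": "...",
    "consumer_secret": "...",
    "access_token": "...",
    "access_token_secret": "...",
}
bom = botometer.Botometer(rapidapi_key="...", wait_on_ratelimit=True, **twitter_auth)

result = bom.check_account("@SomeAccount")
# The response contains per-category scores; which field holds the overall
# score depends on the API version, so inspect the returned dictionary.
print(result)
```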
The location analysis in Fig. 8 is based on 3,971 tweets that meet four
conditions: they were shared in the period between August and October 2016,
included a link to a claim, originated from an account with high bot score (above
0.6), and included one of the 51 U.S. state names or abbreviations (including
District of Columbia) in the location metadata. The baseline frequencies were
obtained from a 10% sample of public posts from the Twitter streaming API.
We considered 164,276 tweets in the same period that included hashtags with
the prefix #election and a U.S. state location. 2016 election forecast data was
obtained from FiveThirtyEight (projects.fivethirtyeight.com/2016-election-forecast/)
and vote margin data from the Cook Political Report (cookpolitical.com/story/10174).
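
Mapping free-text profile locations to states can be done with a lookup over state names and postal abbreviations; the sketch below is a simplified illustration of such matching, not the exact rules used for the 3,971 tweets above.

```python
import re

# Illustrative subset; the full analysis used all 50 states plus DC
STATES = {"texas": "TX", "indiana": "IN", "california": "CA",
          "district of columbia": "DC"}
ABBREVS = set(STATES.values())

def match_state(location):
    """Return a state code if the free-text location mentions a U.S. state."""
    if not location:
        return None
    text = location.lower()
    for name, code in STATES.items():
        if name in text:
            return code
    # Fall back to standalone two-letter tokens, e.g. "Austin, TX"
    for token in re.findall(r"\b[A-Za-z]{2}\b", location):
        if token.upper() in ABBREVS:
            return token.upper()
    return None

print(match_state("Bloomington, Indiana"))  # IN
print(match_state("Austin, TX"))            # TX
```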
Acknowledgments. We are grateful to Ben Serrette and Valentin Pentchev
of the Indiana University Network Science Institute (iuni.iu.edu), as well as
Lei Wang for supporting the development of the Hoaxy platform. Clayton A.
Davis developed the Botometer API. C.S. thanks the Center for Complex Net-
works and Systems Research (cnets.indiana.edu) for the hospitality during
his visit at the Indiana University School of Informatics and Computing. He
was supported by the China Scholarship Council. G.L.C. was supported by
IUNI. The development of the Botometer platform was supported in part by
DARPA (grant W911NF-12-1-0037). A.F. and F.M. were supported in part
by the James S. McDonnell Foundation (grant 220020274) and the National
Science Foundation (award CCF-1101743). The funders had no role in study
design, data collection and analysis, decision to publish or preparation of the
manuscript.
References
[1] A. Bessi and E. Ferrara. Social bots distort the 2016 U.S. presidential election
online discussion. First Monday, 21(11), 2016.
[2] M. Conover, J. Ratkiewicz, M. Francisco, B. Gonçalves, A. Flammini, and
F. Menczer. Political polarization on Twitter. In Proc. 5th International
AAAI Conference on Weblogs and Social Media (ICWSM), 2011.
[3] M. D. Conover, B. Gonçalves, A. Flammini, and F. Menczer. Partisan
asymmetries in online political activity. EPJ Data Science, 1:6, 2012.
[4] C. A. Davis, O. Varol, E. Ferrara, A. Flammini, and F. Menczer. BotOrNot:
A system to evaluate social bots. In Proceedings of the 25th International
Conference Companion on World Wide Web, pages 273–274, 2016.
[5] M. Del Vicario, A. Bessi, F. Zollo, F. Petroni, A. Scala, G. Caldarelli, H. E.
Stanley, and W. Quattrociocchi. The spreading of misinformation online.
Proc. National Academy of Sciences, 113(3):554–559, 2016.
[6] E. Ferrara, O. Varol, C. Davis, F. Menczer, and A. Flammini. The rise of
social bots. Comm. ACM, 59(7):96–104, 2016.
[7] J. Gottfried and E. Shearer. News use across social media platforms 2016.
White paper, Pew Research Center, May 2016.
[8] L. Gu, V. Kropotov, and F. Yarochkin. The fake news machine: How
propagandists abuse the internet and manipulate the public. TrendLabs
research paper, Trend Micro, 2017.
[9] N. O. Hodas and K. Lerman. How limited visibility and divided attention
constrain social contagion. In Proc. ASE/IEEE International Conference
on Social Computing, 2012.
[10] P. J. Hotez. Texas and its measles epidemics. PLOS Medicine, 13(10):1–5,
October 2016.
[11] L. Howell et al. Digital wildfires in a hyperconnected world. In Global
Risks. World Economic Forum, 2013.
[12] T. Jagatic, N. Johnson, M. Jakobsson, and F. Menczer. Social phishing.
Communications of the ACM, 50(10):94–100, October 2007.
[13] Y. Jun, R. Meng, and G. V. Johar. Perceived social presence reduces fact-
checking. Proceedings of the National Academy of Sciences, 114(23):5976–
5981, 2017.
[14] M. S. Levendusky. Why do partisan media polarize viewers? American
Journal of Political Science, 57(3):611–623, 2013.
[15] W. Lippmann. Public Opinion. Harcourt, Brace and Company, 1922.
[16] B. Markines, C. Cattuto, and F. Menczer. Social spam detection. In Proc.
5th International Workshop on Adversarial Information Retrieval on the
Web (AIRWeb), 2009.
[17] A. Mosseri. News Feed FYI: Showing more informative links in News Feed.
Press release, Facebook, June 2017.
[18] E. Mustafaraj and P. T. Metaxas. From obscurity to prominence in minutes:
Political speech and real-time search. In Proc. Web Science Conference:
Extending the Frontiers of Society On-Line, 2010.
[19] A. Nematzadeh, G. L. Ciampaglia, F. Menczer, and A. Flammini. How al-
gorithmic popularity bias hinders or promotes quality. Preprint 1707.00574,
arXiv, 2017.
[20] D. Nikolov, D. F. M. Oliveira, A. Flammini, and F. Menczer. Measuring
online social bubbles. PeerJ Computer Science, 1(e38), 2015.
[21] E. Pariser. The filter bubble: How the new personalized Web is changing
what we read and how we think. Penguin, 2011.
[22] X. Qiu, D. F. M. Oliveira, A. S. Shirazi, A. Flammini, and F. Menczer.
Limited individual attention and online virality of low-quality information.
Nature Human Behaviour, 1:0132, 2017.
[23] J. Ratkiewicz, M. Conover, M. Meiss, B. Gonçalves, A. Flammini, and
F. Menczer. Detecting and tracking political abuse in social media. In
Proc. 5th International AAAI Conference on Weblogs and Social Media
(ICWSM), 2011.
[24] M. J. Salganik, P. S. Dodds, and D. J. Watts. Experimental study of
inequality and unpredictability in an artificial cultural market. Science,
311(5762):854–856, 2006.
[25] C. Shao, G. L. Ciampaglia, A. Flammini, and F. Menczer. Hoaxy: A
platform for tracking online misinformation. In Proceedings of the 25th
International Conference Companion on World Wide Web, pages 745–750,
2016.
[26] N. Stroud. Niche News: The Politics of News Choice. Oxford University
Press, 2011.
[27] V. Subrahmanian, A. Azaria, S. Durst, V. Kagan, A. Galstyan, K. Lerman,
L. Zhu, E. Ferrara, A. Flammini, F. Menczer, A. Stevens, A. Dekhtyar,
S. Gao, T. Hogg, F. Kooti, Y. Liu, O. Varol, P. Shiralkar, V. Vydiswaran,
Q. Mei, and T. Hwang. The DARPA Twitter bot challenge. IEEE Computer,
49(6):38–46, 2016.
[28] C. R. Sunstein. Going to Extremes: How Like Minds Unite and Divide.
Oxford University Press, 2009.
[29] O. Varol, E. Ferrara, C. A. Davis, F. Menczer, and A. Flammini. Online
human-bot interactions: Detection, estimation, and characterization. In
Proc. Intl. AAAI Conf. on Web and Social Media (ICWSM), 2017.
[30] L. von Ahn, M. Blum, N. J. Hopper, and J. Langford. CAPTCHA: Using
hard AI problems for security. In E. Biham, editor, Advances in Cryptol-
ogy — Proceedings of EUROCRYPT 2003: International Conference on
the Theory and Applications of Cryptographic Techniques, pages 294–311.
Springer, 2003.
[31] J. Weedon, W. Nuland, and A. Stamos. Information operations and Face-
book. White paper, Facebook, April 2017.
[32] S. C. Woolley and P. N. Howard. Computational propaganda worldwide:
Executive summary. Working Paper 2017.11, Oxford Internet Institute,
2017.