“This is Fake! Shared it by Mistake”:
Assessing the Intent of Fake News Spreaders
Xinyi Zhou
Syracuse University
Syracuse, NY, USA
zhouxinyi@data.syr.edu
Kai Shu
Illinois Institute of Technology
Chicago, IL, USA
kshu@iit.edu
Vir V. Phoha
Syracuse University
Syracuse, NY, USA
vvphoha@syr.edu
Huan Liu
Arizona State University
Tempe, AZ, USA
huan.liu@asu.edu
Reza Zafarani
Syracuse University
Syracuse, NY, USA
reza@data.syr.edu
ABSTRACT
Individuals can be misled by fake news and spread it unintentionally
without knowing it is false. This phenomenon has been frequently
observed but has not been investigated. Our aim in this work is to
assess the intent of fake news spreaders. To distinguish between
intentional versus unintentional spreading, we study the psychological
explanations of unintentional spreading. With this foundation,
we then propose an influence graph, using which we assess the
intent of fake news spreaders. Our extensive experiments show
that the assessed intent can help significantly differentiate between
intentional and unintentional fake news spreaders. Furthermore,
the estimated intent can significantly improve the current techniques
that detect fake news. To our best knowledge, this is the
first work to model individuals' intent in fake news spreading.
CCS CONCEPTS
• Human-centered computing → Social media; • Applied computing → Law, social and behavioral sciences; • Computing methodologies → Artificial intelligence.
KEYWORDS
Fake news, intent, social media
ACM Reference Format:
Xinyi Zhou, Kai Shu, Vir V. Phoha, Huan Liu, and Reza Zafarani. 2022.
“This is Fake! Shared it by Mistake”: Assessing the Intent of Fake News
Spreaders. In Proceedings of the ACM Web Conference 2022 (WWW ’22), April
25–29, 2022, Virtual Event, Lyon, France. ACM, New York, NY, USA, 10 pages.
https://doi.org/10.1145/3485447.3512264
1 INTRODUCTION
A frequently observed and discussed phenomenon is that individuals can be misled by fake news and can unintentionally spread
it [24, 30, 47]. Thankfully, research has pointed out that (1) correction and (2) nudging can effectively prevent such users from spreading fake news; that is, by informing them of the news falsehood, or simply requesting that they pay attention to news accuracy before spreading the news [24, 36]. Such findings encourage social media platforms to develop gentler strategies for these unintentional fake news spreaders to reasonably and effectively combat fake news. Clearly, such strategies should differ from the aggressive deactivation and suspension strategies that platforms adopt for inauthentic or toxic accounts (e.g., Twitter¹ and Facebook²). For example, platforms can present such unintentional fake news spreaders with useful facts, motivating the need for new recommendation algorithms. Such algorithms not only recommend topics to these users that they enjoy reading the most (or users they are similar to), but also facts or users active in fact-checking (see Figure 1 for an example) [18, 36, 47].
To determine (1) if correction or nudging is needed for a fake news spreader, (2) whether the spreader should be suspended or deactivated, or (3) which users should be targeted by fact-presenting recommendation algorithms, one needs to assess the intent of fake news spreaders. Furthermore, knowing that some users had malicious intent in the past provides a strong signal indicating that their future posts are also potentially fake. This information can be immensely useful for fake news detection [47]. While determining the intent is extremely important, it is yet to be investigated.
This work: Assessing Spreading Intent. We aim to assess the intent of individuals spreading fake news. Our approach to assessing the intent of fake news spreaders relies on fundamental social science theories and exploits advanced machine learning techniques. In particular, we first look into psychological factors that can contribute to the unintentional spreading of fake news (see Section 2.1). These factors can be categorized as internal influence and external influence [47]. To capture these factors, and in turn, quantify intent, we propose an influence graph: a directed, weighted, and attributed graph. The degree to which fake news spreaders are intentional/unintentional can be assessed with this graph. To evaluate our assessment, we first extend two fake news datasets by introducing annotated intent data of fake news spreaders (intentional or unintentional) due to the unavailability of ground truth.
¹ https://help.twitter.com/en/rules-and-policies/twitter-rules
² https://transparency.fb.com/policies/
Figure 1: An Example of a Fact-checking Post

Figure 2: An Illustration of a Post 𝑝𝑗 = (𝑎𝑗, 𝑐𝑗, 𝑡𝑗, 𝑢𝑗), consisting of the shared news article 𝑎𝑗, post content 𝑐𝑗, posting time 𝑡𝑗, and user 𝑢𝑗
With this data, we validate the assessed intent and show that it can strongly differentiate between intentional and unintentional fake news spreaders. We further show through experiments that the assessed intent can significantly enhance fake news detection.
The innovations and contributions of this work are:
(1) Modeling Fake News Spreading Intent: To our best knowledge, this is the first work to assess the degree to which fake news spreaders are intentional/unintentional. To this end, we conduct an interdisciplinary study that endows our work with a theoretical foundation and explainability. A new influence graph is proposed that captures factors that contribute to spreading intent as well as multimodal news information.
(2) New Datasets on Intent: We leverage manual and automatic annotation mechanisms to introduce ground truth on the intent of fake news spreaders in two large-scale real-world datasets. These are the first two datasets that provide intent information. We conduct extensive experiments using these datasets to validate the assessed intent of fake news spreaders.
(3) Combating Fake News: Our work helps combat fake news from two perspectives. First, we demonstrate that by assessing intent, we can successfully distinguish between malicious fake news spreaders (who should be blocked) and benign ones (who should be presented with facts or nudged). Second, we present the effectiveness of the assessed spreader intent (and the proposed influence graph) in fake news detection.
The rest of the paper is organized as follows. A literature review is first conducted in Section 2. In Section 3, we specify the method to assess the intent of fake news spreaders, followed by the method evaluation in Section 4. We demonstrate the value of assessing intent in combating fake news in Section 5. Finally, we conclude in Section 6 with a discussion of our future work.
2 RELATED WORK
We rst review fundamental social science theories that have been
connected to fake news spreading (see Section 2.1). Next, we review
the methods developed to combat fake news (see Section 2.2) as we
will later utilize the assessed spreader intent to detect fake news.
2.1 Social Science Foundation of
Unintentional Fake News Spreading
Extensive social science research has been conducted on fake news.
We particularly review studies that focus on the psychological
factors that contribute to the unintentional spreading of fake news.
Lazer et al. [15] attribute this phenomenon to "individuals prefer information that confirms their preexisting attitudes (selective exposure), view information consistent with their preexisting beliefs as more persuasive than dissonant information (confirmation bias), and are inclined to accept information that pleases them (desirability bias)." Scheufele and Krause [30] summarize these factors as confirmation bias, selective exposure, and motivated reasoning (i.e., people tend to use emotionally biased reasoning to reach their most desired conclusions rather than those that accurately reflect the evidence).
Grouping the aforementioned psychological factors as an internal influence, Zhou and Zafarani [47] further discuss how external influence on individuals can contribute to their unintentional spreading of fake news. Such social influence can be reflected via, e.g., the availability cascade (i.e., individuals tend to adopt insights expressed by others when such insights are gaining more popularity) [14], social identity theory (i.e., individuals conform to the behavior of others to be liked and accepted by the community and society) [3, 13], and the validity effect (i.e., individuals tend to believe information is correct after repeated exposures) [6, 23].
This work shares the social science foundation presented in [15, 30, 47]. Besides understanding why individuals can be misled by fake news and unintentionally spread it, we further conduct quantitative research to assess user intent.
2.2 Methodologies to Combat Fake News
The unprecedented growth of fake news and its detrimental impacts on democracies, economies, and public health have increased the demand for automatic methodologies to combat fake news [47]. With extensive recent contributions by the research community, automatic fake news detection has significantly improved in efficiency and explainability. In general, fake news detection methods can be content-based or propagation-based, depending on whether the method focuses on investigating news content or on how the news spreads on social media.
As news articles are mostly text, content-based methods start with manually extracting linguistic features for news representation; LIWC (Linguistic Inquiry and Word Count) [22] has often been employed as a comprehensive feature extractor [7, 25, 27]. Common classifiers, such as SVMs (Support Vector Machines), are then used to predict fake news. With advances in deep learning, recent attention has been paid to employing multimodal (textual and visual) information of news content to detect fake news (see related work such as [1, 26, 39, 43, 46]). On the other hand, propagation-based methods utilize auxiliary social-media information to predict fake news. Some examples of such information include post stances [33], post-repost relationships [37], user comments [32], and profiles [9].
There have been other strategies proposed to combat fake news. For example, education and nudging have been emphasized to improve individuals' ability to recognize misinformation [15, 17, 30]. Pennycook et al. further provide empirical evidence that unintentional fake news spreading can be reduced by asking individuals to assess the accuracy of news before attempting to spread it [24]. Lazer et al. suggest incorporating information quality into algorithmic rankings or recommendations of online platforms [15]. Studies have also demonstrated that connecting users active in fact-checking with fake news spreaders on social networks is an effective way to combat fake news [18, 36].
3 MODELING THE INTENT OF FAKE NEWS
SPREADERS ON SOCIAL MEDIA
As presented in Section 2.1, the psychological factors that contribute to individuals' unintentional fake news spreading can be summarized as two: (1) internal influence and (2) external influence [15, 30, 47]. Hence, an individual is more unintentional in spreading a news article if his or her spreading behavior receives more internal and external influence. Specifically, both confirmation bias [20, 21] and selective exposure [12, 19] point out that the more consistent an individual's preexisting attitudes and beliefs are with the fake news, the higher the probability that the individual would believe the fake news and unintentionally spread it (internal influence) [15, 30, 47]. As the availability cascade [14] and social identity theory [3, 13] suggest, individuals can be affected by others as well. An individual would be more unintentional in spreading a fake news article if the spreading follows a herd behavior; i.e., the individual's participation matches the extensive participation of others and his or her attitude conforms to the attitudes of most participants (external influence) [47].
Problems then arise on social media: where can one find the preexisting attitudes and beliefs of a user, the participation of users, and their attitudes towards a news article? We note that a user's preexisting attitudes and beliefs can be reflected in his or her historical activities on social media. For most social media sites, such historical activities include past posts, likes, and comments. Similarly, the participation of users often takes the form of posting, liking, and commenting. Hence, mining the content of posts and comments allows understanding users' attitudes. For simplicity, we start with posts in this work to determine users' preexisting beliefs and participation. In sum, a user spreads a fake news article in his or her post more unintentionally if the post is more similar to or influenced by (1) the user's past posts (internal influence), and (2) the posts of other users (external influence).
A natural approach to capture the influence among posts is to construct an influence graph of posts. In this graph, a (directed) edge between two posts indicates the (external or internal) influence flow from one post to the other. The edge weight implies the amount of the influence flow. With this graph, the overall influence that a post receives from other posts can be assessed by looking at its corresponding incoming edges and their weights. The more influence a post containing fake news receives, the more unintentionally the user posting it is spreading this fake news.
Figure 3: Pairwise Influence of Posts 𝑝𝑖 and 𝑝𝑗: (a) decides if there is an edge from 𝑝𝑖 to 𝑝𝑗 in an influence graph (without vs. with influence); (b) identifies the edge attribute (external vs. internal influence); and (c) determines the edge weight (large vs. small volume of influence).

To concretely present our proposed influence graph formed by a group of posts, we start with a pair of posts 𝑝𝑖 and 𝑝𝑗, which are represented as tuples (𝑎𝑖, 𝑐𝑖, 𝑡𝑖, 𝑢𝑖) and (𝑎𝑗, 𝑐𝑗, 𝑡𝑗, 𝑢𝑗), respectively. An example is presented in Figure 2. In the tuple representing post 𝑝𝑖, 𝑎𝑖 denotes the news article shared by post 𝑝𝑖. For simplicity, we first assume that each post can only share one news article (we will consider a more general case later in this section); user 𝑢𝑖 and time 𝑡𝑖 refer to the user and posting time of 𝑝𝑖; and content 𝑐𝑖 is the post content, often containing the attitude and opinion of 𝑢𝑖 regarding 𝑎𝑖. Next, we discuss how (A) internal and (B) external influence between 𝑝𝑖 and 𝑝𝑗 can be modeled, respectively.
A. Modeling Internal Influence between 𝑝𝑖 and 𝑝𝑗. If 𝑝𝑖 internally influences 𝑝𝑗, 𝑝𝑖 should be posted earlier than 𝑝𝑗 and by the same user as 𝑝𝑗 (to capture preexisting beliefs of the user), i.e., 𝑡𝑖 < 𝑡𝑗 and 𝑢𝑖 = 𝑢𝑗. The amount of influence flowing from 𝑝𝑖 to 𝑝𝑗 can be determined by how similar the news articles and attitudes in 𝑝𝑗 are to those of 𝑝𝑖; in other words, how similar 𝑎𝑗 and 𝑐𝑗 are to 𝑎𝑖 and 𝑐𝑖 [47]. However, evidence has indicated that the same user spreading the same news, especially fake news, is often a sign of intentional rather than unintentional spreading [31]. Therefore, we exclude internal influence from 𝑝𝑖 to 𝑝𝑗 if 𝑎𝑖 = 𝑎𝑗.
B. Modeling External Influence between 𝑝𝑖 and 𝑝𝑗. If 𝑝𝑖 externally influences 𝑝𝑗, 𝑝𝑖 should, at least, be posted by a different user from that of 𝑝𝑗 (to capture "external") and earlier than 𝑝𝑗 (otherwise, 𝑝𝑖 is not observable to 𝑝𝑗); i.e., 𝑡𝑖 < 𝑡𝑗 and 𝑢𝑖 ≠ 𝑢𝑗. We further consider two questions in assessing external influence. First, can a user's post spreading one news article externally influence a post of another user spreading a different news article; in other words, if 𝑎𝑖 ≠ 𝑎𝑗, can 𝑝𝑖 possibly influence 𝑝𝑗 externally with 𝑡𝑖 < 𝑡𝑗 and 𝑢𝑖 ≠ 𝑢𝑗? Two news articles that differ in text or image may discuss the same event and express the same political stance; hence, this scenario is possible but depends on the similarity between the two news articles [5]. Second, can a user's post possibly be influenced by another user's post if the two users are not socially connected on social media? Due to platforms' diverse recommendations and services (e.g., trending topics on Twitter and Weibo), this scenario is also possible, but the amount of influence depends on how similar the news articles and attitudes in 𝑝𝑗 are to those of 𝑝𝑖 [44, 47].
We summarize the above discussions by answering the following
three questions:
(1) Edge existence: Can 𝑝𝑖 possibly influence 𝑝𝑗? As discussed, it is barely possible for 𝑝𝑖 to (internally or externally) affect 𝑝𝑗 if it is posted later than 𝑝𝑗. Hence, in an influence graph, a directed edge can possibly exist from 𝑝𝑖 to 𝑝𝑗 if 𝑝𝑖 is posted earlier than 𝑝𝑗 (i.e., 𝑡𝑖 < 𝑡𝑗); if 𝑡𝑖 ≥ 𝑡𝑗, no edge exists from 𝑝𝑖 to 𝑝𝑗. Therefore, there can be either no edge or only one directed edge between two posts. See Figure 3(a) for an illustration. Note that whether an edge ultimately exists between two posts also depends on the edge weight (which we specify below in (3)); a zero weight can make an edge "disappear."
(2) Edge attribute: What type of influence (internal vs. external) is flowing between 𝑝𝑖 and 𝑝𝑗? We define the influence as external if 𝑝𝑖 and 𝑝𝑗 are posted by different users, i.e., 𝑢𝑖 ≠ 𝑢𝑗 [44]. The influence is internal if 𝑝𝑖 and 𝑝𝑗 are posted by the same user and do not share the same news, i.e., 𝑢𝑖 = 𝑢𝑗 and 𝑎𝑖 ≠ 𝑎𝑗 [31]. See Figure 3(b) for an illustration.
(3) Edge weight: How much influence flows from 𝑝𝑖 to 𝑝𝑗? We assume that the amount of influence flow is affected by three factors. The first, as discussed, is the news articles shared by 𝑝𝑖 and 𝑝𝑗 (𝑎𝑖 versus 𝑎𝑗); basically, if 𝑝𝑖 and 𝑝𝑗 spread the same news, the influence flow between them should be greater than if they spread completely different news articles [5, 47]. The second, as discussed, is the attitudes held by 𝑝𝑖 and 𝑝𝑗 on the news (𝑐𝑖 versus 𝑐𝑗); basically, if two posts agree with each other, the influence flow between them should be greater than if they disagree with each other [47]. Furthermore, we consider the time interval between 𝑝𝑖 and 𝑝𝑗 (𝑡𝑖 versus 𝑡𝑗); instead of "remembering all," users forget past news articles and their corresponding posts over time (with some decay) [40]. Thus, a greater amount of influence flow is assigned between two posts when one is published close in time to the other, compared to posts published farther apart.
Next, we formalize the proposed influence graph (see Definition 3.1) and introduce how the intent of (fake) news spreaders can be quantified based on this graph. Clearly, in a real-world scenario, it is possible for a post to contain more than one news article (e.g., multiple URLs). Hence, in this formalization, we no longer assume that each post can only share one news article and generalize to a set of articles, i.e., (𝑎𝑖, 𝑐𝑖, 𝑡𝑖, 𝑢𝑖) becomes (𝐴𝑖, 𝑐𝑖, 𝑡𝑖, 𝑢𝑖).
Denition 3.1 (Inuence Graph). Given a set of news articles,
denoted as
𝐴={𝑎1, 𝑎2,· · · , 𝑎𝑚}
, we denote user posts that share
these news articles on social media as
𝑃={𝑝1, 𝑝2,· · · , 𝑝𝑛}
. Each
post
𝑝𝑖(𝑖=1,2,· · · , 𝑛 )
is represented as a tuple
(𝐴𝑖, 𝑐𝑖, 𝑡𝑖, 𝑢𝑖)
,
where
𝐴𝑖
,
𝑐𝑖
,
𝑡𝑖
, and
𝑢𝑖
respectively refer to a set of news arti-
cles (can be one article) shared by the post (i.e.,
𝐴𝑖𝐴
), the post
content, the posting time, and the user.
Inuence graph, denoted as
𝐺=(𝑉 , 𝐸, W)
, is formed by user
posts, i.e.,
𝑉=𝑃
. Edges exist from
𝑝𝑖
to
𝑝𝑗
if (i)
𝑝𝑖
is posted earlier
than
𝑝𝑗
, and (ii)
𝑝𝑖
and
𝑝𝑗
do not share the same news when posted
by the same user. In other words,
(𝑝𝑖, 𝑝 𝑗) ∈ 𝐸
if (i)
𝑡𝑖<𝑡𝑗
and (ii)
𝐴𝑖𝐴𝑗for 𝑢𝑖=𝑢𝑗. The edge weight for (𝑝𝑖, 𝑝 𝑗)is
W𝑖𝑗 =¯
S(𝐴𝑖, 𝐴𝑗) · S (𝑐𝑖,𝑐 𝑗) · T (Δ𝑡𝑖𝑗 ),(1)
where
S(∗𝑖,𝑗)
assesses the similarity between
𝑖
and
𝑗
,
T (Δ𝑡𝑖 𝑗 )
for
Δ𝑡𝑖 𝑗 =𝑡𝑗𝑡𝑖
is a self-dened monotonically decreasing decay
function to capture users’ forgetting, and
¯
S(𝐴𝑖, 𝐴𝑗)
computes the
average pairwise similarity among news pairs
(𝑎𝑖, 𝑎 𝑗) ∈ 𝐴𝑖×𝐴𝑗
.
Formally,
¯
S(𝐴𝑖, 𝐴𝑗)=Í(𝑎𝑘,𝑎𝑙) ∈𝐴𝑖×𝐴𝑗S(𝑎𝑘, 𝑎𝑙)
|𝐴𝑖|×|𝐴𝑗|;(2)
hence, ¯
S(𝐴𝑖, 𝐴𝑗)=S (𝑎𝑘, 𝑎𝑙)if 𝐴𝑖={𝑎𝑘}and 𝐴𝑗={𝑎𝑙}.
Based on the above graph, the overall influence on each post, which we denote as the affected degree, is computed as

    f_j = \sum_{(p_i, p_j) \in E} W_{ij},    (3)

where the external and internal influence, respectively, refer to

    f_j^{\mathrm{External}} = \sum_{(p_i, p_j) \in E,\ u_i \neq u_j} W_{ij}; \qquad f_j^{\mathrm{Internal}} = \sum_{(p_i, p_j) \in E,\ u_i = u_j} W_{ij}.    (4)
For posts sharing fake news articles, greater values of f_j^{External}, f_j^{Internal}, and f_j indicate that the user posting 𝑝𝑗 receives more external, internal, and combined (external+internal) influence when spreading the fake news article, i.e., the user engages more unintentionally. Conversely, smaller values of f_j^{External}, f_j^{Internal}, and f_j indicate that the user is affected less and engages more intentionally in fake news spreading.³
Customized Implementation Details. The implementation of the influence graph has several customizable parts; it can be modified by defining different T, developing different techniques to represent news articles and user posts, and designing ways to compute their similarities. Below are our implementations and justifications.
To represent news articles and posts, we investigate both textual and visual information within the content. Textual features are extracted using transformers, which have performed excellently in understanding the semantics of text and in various NLP (Natural Language Processing) tasks such as machine translation and sentiment analysis [35, 41]. As user posts are often short and within 512 words (e.g., on Twitter, a post cannot exceed 280 characters),⁴ we use a pre-trained Sentence-RoBERTa model, which modifies RoBERTa with a Siamese network, to obtain the post embedding [28]; the model performs best on the task of semantic textual similarity.⁵ Differently, as news articles are often long and over 512 words,⁶ we employ Longformer [4] to derive semantically meaningful text embeddings of news articles. Longformer addresses the 512-token limitation of BERT by reducing the quadratic scaling (with the input sequence length) to linear [4]. For visual features, we extract them using a pre-trained DeepRanking model particularly designed for the task of fine-grained image similarity computation [38].
]. With textual features of news (or post) denoted as
t
, and
its visual features denoted as
v
, we dene the similarity between a
news (or post) pair as
S(∗𝑖,𝑗)=𝜇^cos(t𝑖,t𝑗)+(1𝜇)^cos (v𝑖,v𝑗),(5)
where
=𝑎(for news)
or
𝑝(for posts)
;
^cos(., .)=[1cos (., .)]/2
;
and
𝜇
,
^cos(., .)
,
S(., .) ∈ [0,1]
. In our experiments, we determine
the value of 𝜇by varying it from 0.1 to 0.9 with a step size 0.1; we
set 𝜇=0.8that leads to the best evaluation and prediction results.
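A small sketch of Equation (5), assuming the textual and visual feature vectors of the two items have already been extracted (e.g., as above):

import numpy as np

def rescaled_cos(x, y):
    """cos-hat in Eq. (5): map cosine similarity from [-1, 1] to [0, 1]."""
    c = float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))
    return (1.0 + c) / 2.0

def pair_similarity(t_i, t_j, v_i, v_j, mu=0.8):
    """S(*_i, *_j) = mu * cos-hat(text) + (1 - mu) * cos-hat(visual), Eq. (5)."""
    return mu * rescaled_cos(t_i, t_j) + (1.0 - mu) * rescaled_cos(v_i, v_j)

With 𝜇 = 0.8, textual similarity dominates the score, matching the setting reported above.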
As for the decay function T, we define it as

    T(\Delta t_{ij}) = e^{1 - \Delta t_{ij}},    (6)

which is inspired by [40]. Here, Δ𝑡𝑖𝑗 = 𝑡𝑗 − 𝑡𝑖, and 𝑡𝑖 indicates the chronological ranking of post 𝑝𝑖 (i.e., 𝑡𝑖 ∈ ℤ⁺); hence, T(·) ∈ (0, 1] due to 𝑡𝑗 > 𝑡𝑖.
³ The statement also holds for posts sharing true news articles.
⁴ https://developer.twitter.com/en/docs/counting-characters
⁵ https://github.com/UKPLab/sentence-transformers
⁶ As [45] suggests, the number of words in news articles published by mainstream and fake news media has a mean value of around 800 and a median value of around 600.
The benefit of such a T is twofold. First, it helps normalize the affected degree for any influence graph. Specifically, let f_j^{*} denote any of f_j, f_j^{Internal}, or f_j^{External}, and let \hat{f}_j^{*} denote the normalized version of f_j^{*}, i.e., \hat{f}_j^{*} ∈ [0, 1] (more precisely, \hat{f}_j^{*} ∈ [0, 1)). Then, for f_j^{*} we have

    f_j^{*} \le \sum_{(p_i, p_j) \in E} \bar{S}(A_i, A_j) \cdot S(c_i, c_j) \cdot T(\Delta t_{ij}) \le \sum_{(p_i, p_j) \in E} T(\Delta t_{ij}) < \sum_{k=1}^{\infty} e^{1-k} = e(e-1)^{-1}.    (7)
In other words, the upper bound of the affected degree, denoted by 𝑓max, is 𝑒(𝑒 − 1)⁻¹. Strictly speaking, 𝐾 posts (𝐾 > 1) can be posted at the same time in a real-world scenario, i.e., their ranking, denoted by 𝑡𝑋, is the same. We point out that the upper bound 𝑓max still holds in this case if the ranking value after 𝑡𝑋 is 𝑡𝑋 + 𝐾 rather than 𝑡𝑋 + 1. Finally, the normalized affected degree \hat{f}_j^{*} for post 𝑝𝑗 is

    \hat{f}_j^{*} = \frac{1}{f_{\max}} f_j^{*} = \frac{e-1}{e} f_j^{*}.    (8)
Second, in the worst case, the influence graph can be a tournament, taking up much space. Such a T facilitates graph sparsification while maintaining task performance (see details in Appendix A). Lastly, we note that we have tested Δ𝑡𝑖𝑗 (the time interval) with various units (seconds, minutes, hours, days) in addition to chronological rankings; the ranking performs best in all experiments.
4 METHOD EVALUATION
In this section, we evaluate the proposed method in assessing the intent of fake news spreaders. To this end, evaluation data is required that contains the ground-truth label on
• News credibility, i.e., whether a news article is fake news or true news; and
• Spreader intent, i.e., whether a user spreads a fake news article intentionally or unintentionally on social media.
We point out that this work is the first to model individuals' intent in fake news propagation. Therefore, no data exists that contains the ground-truth label on spreader intent, let alone on both news credibility and spreader intent. Next, we first detail how this problem is addressed in Section 4.1, followed by the method evaluation results in Section 4.2.
4.1 Datasets and Annotations
Our experiments to evaluate the proposed method are based on two datasets developed for news credibility research: MM-COVID [16] and ReCOVery [45]. Generally speaking, both datasets collect news information verified by domain experts (labeled as true or fake) and how the news spreads on Twitter. The corresponding data statistics are in Table 1(a); we focus on the news with social context information, and on the English news and tweets to which all pre-trained models can be applied.
Although the ground-truth label on news credibility is available, neither dataset provides annotations on the intent of fake news spreaders. We first consider manual annotation to address this problem. Specifically, we invite one expert knowledgeable in the misinformation area and one graduate student generally aware of the area. We randomly sample 300 posts (unique in tweet ID and user ID) from MM-COVID and ReCOVery that contain fake news (i.e., the users of these posts are all fake news spreaders). Before annotating, we first inform the annotators of the definition and general characteristics of unintentional fake news spreaders. That is, as presented in Section 1: these spreaders are misled by fake news, barely recognize that it is fake, and tend to believe it; meanwhile, if informed of the news falsehood or presented with facts, their spreading behavior can be reduced or even stopped. In annotating, we present the two annotators with
• A link to the tweet that spreads fake news, which allows annotators to access the tweet details (as illustrated in Figure 2); and
• A link to the user who posts the tweet, which allows annotators to access the user's profile and historical activities.
For each post, we ask the two annotators to
(1) Annotate whether the user spreads the fake news unintentionally (with an answer of yes or no);
(2) Present their confidence level (detailed below);
(3) Explain the annotation with evidence; and
(4) Provide an estimate of the time spent on the annotation.
We provide three optional levels of confidence. 0 indicates that the annotation result is a random guess: no evidence is found to help the annotation, or half the evidence supports but the other half rejects the annotation result. 0.5 indicates medium-level confidence: among all the evidence that the annotator finds, some rejects but most supports the annotation result. 1 indicates high-level confidence: all the evidence that the annotator finds supports the annotation result.
With the returned annotations, we compute the agreement between the two annotators using Cohen's 𝜅 coefficient [10]. Removing annotations with zero confidence, 𝜅 = 0.61; in other words, the two annotators substantially agree with each other [10]. To further obtain the ground truth, we only consider the annotations with a confidence score ≥ 0.5 on which the two annotators agree. Finally, 119 posts sharing fake news have a ground-truth label on their users' intent, among which 59 are unintentional and 60 are intentional.
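A sketch of how the reported agreement and the final ground-truth subset could be derived from the two annotators' outputs; the record format (label and confidence fields) is a hypothetical illustration of the procedure described above.

from sklearn.metrics import cohen_kappa_score

# Each annotation: {"label": "yes"/"no" (unintentional?), "conf": 0, 0.5, or 1}.
def agreement_and_ground_truth(ann_expert, ann_student):
    pairs = [(a, b) for a, b in zip(ann_expert, ann_student)
             if a["conf"] > 0 and b["conf"] > 0]          # drop zero-confidence guesses
    kappa = cohen_kappa_score([a["label"] for a, _ in pairs],
                              [b["label"] for _, b in pairs])
    ground_truth = [a["label"] for a, b in pairs
                    if a["conf"] >= 0.5 and b["conf"] >= 0.5 and a["label"] == b["label"]]
    return kappa, ground_truth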
We point out that annotating the intent of fake news spreaders is a time-consuming and challenging task. Around five minutes is required on average to annotate each instance. Understanding the user intent behind a post demands evaluating the tweet content and studying the user based on his or her historical behavior on social media. Such manual annotation for large-scale data is hence impractical, which drives us to consider algorithmic annotation that accurately simulates manual annotation in an automatic manner. Interestingly, we observe that annotators are more confident in identifying intentional fake news spreaders than unintentional ones. Specifically, the expert annotator is at a 0.93 confidence level in identifying intentional fake news spreaders and at a 0.75 confidence level in identifying unintentional fake news spreaders. For the graduate student annotator, the confidence scores are 0.84 and 0.57, respectively. Both results have 𝑝 ≪ 0.001 with the Mann–Whitney U test. To conduct algorithmic annotation that can accurately simulate manual annotation, we thus start by asking "what kind of fake news spreaders can be intentional?"
Table 1: Data Statistics

(a) On News Credibility
                                    MM-COVID   ReCOVery
# News     Fake                          355        535
           True                          448      1,231
# Tweets   Sharing Fake News          16,500     26,657
           Sharing True News          20,905    117,087

(b) On Intent of Fake News Spreaders
                                    MM-COVID   ReCOVery
# Fake News Spreaders
           Unintentional               9,237      7,911
           Intentional                 4,285      7,327
             Bots                      3,195      6,266
             Trolls                    1,024      2,687
             Correctors                  463          6
# Tweets Sharing Fake News
           by Unintentional           10,519     10,733
           by Intentional              5,953     12,502
             by Bots                   4,530     11,035
             by Trolls                 1,360      4,240
             by Correctors               789          8
With the explanations given by the annotators, we can reasonably assume that bots and trolls who have engaged in fake news propagation are intentional fake news spreaders. As inauthentic and toxic accounts, bots and trolls have often been suspended or deactivated by social media platforms (e.g., Twitter and Facebook) regardless of whether they spread fake news. In fact, they have played a significant role in fake news dissemination [11, 31, 34, 47]. In comparison, unintentional fake news spreaders deserve a "gentle" strategy developed by social media platforms: nudging and fact-presenting recommendation are more reasonable than suspension and deactivation, as we specified in Section 1. Therefore, we separate bots and trolls from unintentional fake news spreaders. We further notice that users active in fact-checking can spread fake news as well, in a correction manner; i.e., they clarify that the news is false (objectively, not aggressively) and inform other users of it in their spreading. We call the corresponding posts that spread fake news correction posts and these users correctors later in the paper. These correctors evidently recognize the news falsehood; we thus separate them from unintentional fake news spreaders.
We identify bots and trolls by collecting data from two well-established and widely accepted platforms, Botometer [29]⁷ and Bot Sentinel.⁸ Ultimately, each Twitter user is assigned a bot score (denoted as 𝑏) and a troll score (denoted as 𝑟), where 𝑏, 𝑟 ∈ [0, 1]. To identify correctors, we first annotate each tweet as a correction or non-correction tweet. Then, we assign each fake news spreader a corrector score (denoted as 𝑐, where 𝑐 ∈ [0, 1]) by computing the proportion of the user's correction tweets to his or her total tweets that share fake news.
⁷ https://botometer.osome.iu.edu/
⁸ https://botsentinel.com/
Table 2: Performance of Algorithmic Annotations on Intent of Fake News Spreaders

                         AUC Score   Cohen's 𝜅
MM-COVID + ReCOVery         0.8824      0.7482
MM-COVID                    0.8857      0.7520
ReCOVery                    0.8000      0.6484
With a threshold value 𝜃 ∈ [0, 1], each fake news spreader can be classified as (i) a bot (if 𝑏 ∈ [𝜃, 1]) or a non-bot (if 𝑏 ∈ [0, 𝜃)), (ii) a troll (if 𝑟 ∈ [𝜃, 1]) or a non-troll (if 𝑟 ∈ [0, 𝜃)), and (iii) a corrector (if 𝑐 ∈ [𝜃, 1]) or a non-corrector (if 𝑐 ∈ [0, 𝜃)).
With the identified bots, trolls, and correctors (here, we use 0.5 as the threshold, i.e., 𝜃 = 0.5), the algorithmic annotation of the intent of fake news spreaders is conducted at two levels: (i) the tweet level and (ii) the user level. At the tweet level, the algorithm labels all correction tweets and all tweets of bots and trolls that share fake news as intentional spreading. The tweet-level annotation captures the user intent behind each fake news spreading action. At the user level, the algorithm labels all bots, trolls, and correctors as intentional spreaders. The user-level annotation captures the general user intent when spreading fake news. Table 1(b) summarizes the corresponding data statistics.
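A minimal sketch of these annotation rules, assuming bot, troll, and corrector scores in [0, 1] are already available for each fake news spreader (variable names are ours):

def annotate_user(bot_score, troll_score, corrector_score, theta=0.5):
    """User-level rule: label the spreader 'intentional' if any score reaches theta."""
    if bot_score >= theta or troll_score >= theta or corrector_score >= theta:
        return "intentional"      # bot, troll, or corrector
    return "unintentional"

def annotate_tweet(is_correction_tweet, author_is_bot, author_is_troll):
    """Tweet-level rule: correction tweets and bot/troll tweets sharing fake news."""
    if is_correction_tweet or author_is_bot or author_is_troll:
        return "intentional"
    return "unintentional"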
Evaluating Algorithmic Annotations. We compare the algorithmic annotation results with the manual annotations. Results are shown in Table 2; results are the same at both the tweet and user levels. We observe that the algorithmic annotation effectively simulates the manual annotation, with an AUC score above 0.8 on the sampled MM-COVID and/or ReCOVery data. The automatic and manual annotations have substantial agreement, with a Cohen's 𝜅 coefficient above 0.64 [10].
4.2 Experimental Results
With the annotated intent (intentional or unintentional) of fake news spreaders, we verify whether the assessed intent (i.e., affected degree) differs between intentional and unintentional fake news spreaders and whether such a difference is statistically significant. In particular, our assessed intent can be validated if the affected degrees of intentional fake news spreaders are significantly less than those of unintentional fake news spreaders, i.e., if we estimate fake news spreaders who are annotated as unintentional to be more unintentional than those who are annotated as intentional.
As specified in the last section, annotations are conducted at both the tweet and user levels. Correspondingly, affected degrees are computed at the two levels; we obtain the user-level affected degree by averaging the affected degrees of the user's posts sharing fake news. Here we present tweet-level verification results; results at the two levels reveal the same pattern, from which we can draw the same conclusions.
Figure 4: Distribution of Affected Degree: Intentional vs. Unintentional Fake News Spreaders. (a) MM-COVID (𝑝 ≪ 0.001 with t-test); (b) ReCOVery (𝑝 < 0.01 with t-test).

Figure 5: Affected Degree of Bots, Trolls, Correctors, and Others (First Three: Intentional Fake News Spreaders; Others: Unintentional Fake News Spreaders). (a) MM-COVID (𝑝 ≪ 0.001 by ANOVA); (b) ReCOVery (𝑝 < 0.01 by ANOVA).

Figure 6: Relation between Affected Degree and (L) Bot Score, (M) Troll Score, and (R) Corrector Score. 𝜌: Spearman's Correlation Coefficient. ***: 𝑝 < 0.001; **: 𝑝 < 0.01; *: 𝑝 < 0.05. (a) MM-COVID; (b) ReCOVery (𝑝 ≪ 0.001 using t-test for the right panel).

Figure 7: Method Performance with Various Thresholds (***: 𝑝 < 0.001; **: 𝑝 < 0.01; *: 𝑝 < 0.05). (a) MM-COVID; (b) ReCOVery.

First, we present the distribution of affected degrees for intentional and unintentional fake news spreaders (see Figure 4). We observe that, in general, the affected degree of intentional fake news spreaders is less than that of unintentional fake news spreaders. Specifically, the average normalized affected degree of intentional fake news spreaders is 0.55 with MM-COVID data and 0.61 with ReCOVery data. For unintentional fake news spreaders, the value is 0.58 and 0.62, respectively. Such difference is statistically significant with a 𝑝-value ≪ 0.001 on MM-COVID and < 0.01 on ReCOVery using the 𝑡-test. Therefore, the results validate our assessment. We conduct the same experiment on the subset of data annotated by humans, where we can draw the same conclusion.
Second, we compare the affected degree of bots, trolls, and correctors, all of which are annotated as intentional fake news spreaders, with that of others, who are annotated as unintentional fake news spreaders. The results, shown in Figure 5, indicate that bots, trolls, and correctors all have a lower affected degree compared to unintentional fake news spreaders. The results are statistically significant with a 𝑝-value ≪ 0.001 on MM-COVID and < 0.01 on ReCOVery using the ANOVA test. Meanwhile, Figure 6 presents the relationship between affected degree and (i) bot score, (ii) troll score, and (iii) corrector score. The results reveal the same pattern: the affected degree drops with an increasing bot, troll, or corrector score. In particular, both bot and troll scores are negatively correlated with affected degrees, with a Spearman's correlation coefficient 𝜌 ∈ [−0.32, −0.24] for bots and 𝜌 ∈ [−0.58, −0.36] for trolls. These results, again, validate our proposed method. Note that when investigating the relationship between affected degree and, e.g., bot score, we remove trolls and correctors to reduce noise.
Third, we assess the robustness of the results. As mentioned before, a fake news spreader is labeled as an unintentional spreader when the bot (troll, or corrector) score is less than a threshold value (i.e., 𝑋 ∈ [0, 𝜃) for 𝑋 ∈ {𝑏, 𝑟, 𝑐}); otherwise, he or she is an intentional spreader (i.e., 𝑋 ∈ [𝜃, 1]). Varying 𝜃 among {0.4, 0.5, 0.6}, we compare again the affected degree of intentional and unintentional fake news spreaders. Results are presented in Figure 7 (the left column). We observe that slightly adjusting the threshold value does not change the observations and conclusions made in the first experiment (i.e., the result is robust).
We lastly evaluate the proposed method as follows: we label a fake news spreader whose 𝑋 ∈ [0, 𝜃) as an unintentional spreader, and one whose 𝑋 ∈ [1 − 𝜃, 1] as an intentional spreader. By decreasing 𝜃, a fake news spreader is required to have a lower bot (troll, or corrector) score to be unintentional and a higher bot (troll, or corrector) score to be intentional. In other words, a smaller 𝜃 corresponds to a stricter annotation (intentional or unintentional) of fake news spreaders. We vary 𝜃 among {0.5, 0.3, 0.1} (correspondingly, 1 − 𝜃 varies among {0.5, 0.7, 0.9}) and compare the affected degree of intentional and unintentional fake news spreaders.
Table 3: Method Performance with Hand-crafted Features in Fake News Detection. Here, 𝐾: the first (earliest) 𝐾 posts spreading the news available for news representation; Ranking: feature importance ranking of the affected degree of posts in the prediction model.

                 K     AUC Score        Ranking
MM-COVID         10    0.918 (±0.009)   2
                 20    0.912 (±0.015)   2
                 30    0.927 (±0.021)   2
                 40    0.923 (±0.012)   2
                 All   0.935 (±0.005)   3
ReCOVery         10    0.891 (±0.007)   5
                 20    0.898 (±0.007)   3
                 30    0.903 (±0.004)   3
                 40    0.909 (±0.014)   4
                 All   0.925 (±0.009)   5
Results are presented in Figure 7 (the right column). We observe that the affected degree of intentional fake news spreaders is always less than that of unintentional fake news spreaders across the various thresholds. More importantly, this pattern becomes more significant with a smaller 𝜃 (i.e., a stricter annotation), which validates the effectiveness of our assessment.
Finally, we point out that we experiment with (i) the external affected degree, (ii) the internal affected degree, (iii) the combined (external+internal) affected degree, and (iv) the combined affected degree where the external one exists only between post pairs sharing the same news. The combined one (i.e., (iii)) is the one for which significant and consistent patterns are discovered on both datasets.
5 UTILIZING INTENT OF NEWS SPREADERS TO COMBAT FAKE NEWS
Using the MM-COVID and ReCOVery data, we evaluate the effectiveness of user intent in news propagation for detecting fake news. We first employ the assessed affected degree of posts in news propagation within a traditional machine learning framework. Then, we utilize the proposed influence graph within a deep learning framework.
I. Combating Fake News by Affected Degree. For each news article, we manually extract over 100 (propagation and content) features as its representation. Propagation features include the average (internal, external, and combined) affected degree of posts spreading the news and a set of widely accepted propagation features. Content features are extracted using LIWC [22]. See Appendix B for feature details. Five-fold cross-validation and XGBoost [8] are then used with these features for training and classifying news articles. Results indicate that this method correctly identifies fake news with an AUC score of around 0.93. As a comparison, dEFEND [32], a state-of-the-art method that detects fake news using news content and propagation information, performs at around 0.90. Furthermore, we observe that, as presented in Table 3, the proposed method performs above 0.89 with limited propagation information of news articles, i.e., at an early stage of news dissemination on social media. Notably, the internal affected degree of posts greatly contributes to detecting fake news; its feature importance assessed by XGBoost ranks in the top five throughout.
II. Combating Fake News by Influence Graph. We construct the news-post heterogeneous graph (shown in Figure 8); a post is connected with a news article if the post shares the news, and the relation among posts is modeled by the proposed influence graph 𝐺. Then, we train the HetGNN (Heterogeneous Graph Neural Network) model [42] with this news-post graph to learn news representations, with which XGBoost [8] is further utilized to predict fake news. Varying the percentage of labeled news from 20% to 80%, this method performs with an AUC score ranging from 0.83 (with small-scale training data) to 0.91 (with relatively large-scale training data) on the two datasets. To further evaluate the proposed influence graph 𝐺, we consider two variant groups of the constructed heterogeneous graph as baselines. One replaces 𝐺 with a random version (𝐺Random): based on our graph sparsification strategy (see Appendix A), we construct the random graph by randomly selecting a hundred posts for each post, ensuring that no self-loops are formed in the graph. The other replaces 𝐺 with one of its subgraphs: (i) with internal influence only (𝐺Internal); (ii) with external influence only (𝐺External); or (iii) with internal and external influence, where the latter exists only between two posts sharing the same news (𝐺Same News). Table 4 presents the full results; 𝐺Subgraph in the table refers to 𝐺Same News, which performs best among all subgraphs. We observe that, in general, the proposed influence graph outperforms its variants in detecting fake news, especially with limited training data. See Appendix B for other implementation details.
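The structure of the news-post graph fed to HetGNN can be sketched as follows, reusing the Post tuples and influence edges from the earlier sketch in Section 3 (a simplified construction with networkx; the HetGNN training itself follows [42] and is omitted here):

import networkx as nx

def build_news_post_graph(articles, posts, influence_edges):
    """articles: iterable of article ids; posts: list of Post tuples (as above);
    influence_edges: (i, j, w, kind) edges of the influence graph G."""
    g = nx.DiGraph()
    for a in articles:
        g.add_node(("news", a), node_type="news")
    for idx, p in enumerate(posts):
        g.add_node(("post", idx), node_type="post")
        for a in p.articles:                       # post-news edge if the post shares it
            g.add_edge(("post", idx), ("news", a), edge_type="shares")
    for i, j, w, kind in influence_edges:          # post-post edges from G
        g.add_edge(("post", i), ("post", j), weight=w, edge_type=kind)
    return g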
6 CONCLUSION AND FUTURE WORK
We look into the phenomenon that social media users can spread fake news unintentionally. With social science foundations, we propose the influence graph, with which we assess the degree to which fake news spreaders are unintentional (denoted as the affected degree). Strategies to sparsify the influence graph and to normalize the affected degree by determining its upper bound are presented as well. We develop manual and automatic annotation mechanisms to obtain the ground-truth intent (intentional or unintentional) of fake news spreaders for the MM-COVID and ReCOVery data. We observe that the affected degree of intentional fake news spreaders is significantly less than that of unintentional ones, which validates our assessments. This work helps combat fake news from two perspectives. First, our assessed intent helps determine whether a fake news spreader needs to be nudged or recommended facts (or users active in sharing facts). Second, we show that the assessed spreader intent and the proposed influence graph effectively help detect fake news, with an AUC score of around 0.9.
Limitations and Future Work: We effectively assess the degree to which fake news spreaders are unintentional, but the task of classifying a fake news spreader as intentional or unintentional remains. We point out that merely determining a threshold for the affected degree is hardly enough. To address this problem, we aim to propose a more sophisticated classification model in the near future, one that involves non-posting behaviors (e.g., commenting, liking, and following) of news spreaders.
Figure 8: News-post Graph. News nodes (𝑎1, 𝑎2, 𝑎3) are connected to the posts (𝑝1, ..., 𝑝5) that share them; relations among posts are given by the influence graph 𝐺.
Table 4: Method Performance (AUC Scores) with Heterogeneous Graph Neural Networks (HetGNN) in Fake News Detection

                          MM-COVID                        ReCOVery
% Labeled News     20%     40%     60%     80%      20%     40%     60%     80%
𝐺Random           0.829   0.856   0.876   0.902    0.647   0.654   0.660   0.674
𝐺Subgraph         0.817   0.861   0.890   0.915    0.820   0.845   0.869   0.908
𝐺                 0.869   0.864   0.902   0.905    0.825   0.863   0.883   0.881
ACKNOWLEDGMENTS
This research was supported in part by the National Science Foundation under award CAREER IIS-1942929. We sincerely appreciate the positive and constructive comments of the reviewers. We also thank Chang Liu, Shengmin Jin, and Hao Tian for their useful suggestions in data annotation.
REFERENCES
[1]
Anton Abilov, Yiqing Hua, Hana Matatov, Ofra Amir, and Mor Naaman. 2021.
VoterFraud2020: A Multi-modal Dataset of Election Fraud Claims on Twitter. In
Proceedings of the International AAAI Conference on Web and Social Media, Vol. 15.
901–912.
[2]
Alan Akbik, Tanja Bergmann, Duncan Blythe, Kashif Rasul, Stefan Schweter, and
Roland Vollgraf. 2019. FLAIR: An Easy-to-Use Framework for State-of-the-Art
NLP. In Proceedings of the 2019 Conference of the North American Chapter of the
Association for Computational Linguistics (Demonstrations). 54–59.
[3]
Blake E Ashforth and Fred Mael. 1989. Social Identity Theory and the Organiza-
tion. Academy of Management Review 14, 1 (1989), 20–39.
[4]
Iz Beltagy, Matthew E Peters, and Arman Cohan. 2020. Longformer: The Long-
Document Transformer. arXiv preprint arXiv:2004.05150 (2020).
[5]
Alessandro Bessi, Fabio Petroni, Michela Del Vicario, Fabiana Zollo, Aris Anag-
nostopoulos, Antonio Scala, Guido Caldarelli, and Walter Quattrociocchi. 2015.
Viral Misinformation: The Role of Homophily and Polarization. In Proceedings of
the 24th International Conference on World Wide Web. 355–356.
[6] Lawrence E Boehm. 1994. The Validity Effect: A Search for Mediating Variables. Personality and Social Psychology Bulletin 20, 3 (1994), 285–293.
[7]
Sonia Castelo, Thais Almeida, Anas Elghafari, Aécio Santos, Kien Pham, Eduardo
Nakamura, and Juliana Freire. 2019. A Topic-Agnostic Approach for Identify-
ing Fake News Pages. In Companion Proceedings of the 2019 World Wide Web
Conference. ACM, 975–980.
[8]
Tianqi Chen and Carlos Guestrin. 2016. XGBoost: A Scalable Tree Boosting
System. In Proceedings of the 22nd ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining. 785–794.
[9]
Lu Cheng, Ruocheng Guo, Kai Shu, and Huan Liu. 2021. Causal Understanding
of Fake News Dissemination on Social Media. In Proceedings of the 27th ACM
SIGKDD Conference on Knowledge Discovery & Data Mining. 148–157.
[10] Jacob Cohen. 1960. A Coefficient of Agreement for Nominal Scales. Educational and Psychological Measurement 20, 1 (1960), 37–46.
[11] Emilio Ferrara. 2020. What types of COVID-19 conspiracies are populated by Twitter bots? First Monday 25, 6 (2020). https://doi.org/10.5210/fm.v25i6.10633
[12]
Jonathan L Freedman and David O Sears. 1965. Selective exposure. In Advances
in Experimental Social Psychology. Vol. 2. Elsevier, 57–97.
[13] Michael A Hogg. 2020. Social Identity Theory. Stanford University Press.
[14]
Timur Kuran and Cass R Sunstein. 1999. Availability Cascades and Risk Regula-
tion. Stanford Law Review 51, 4 (1999), 683–768.
[15]
David MJ Lazer, Matthew A Baum, Yochai Benkler, Adam J Berinsky, Kelly M
Greenhill, Filippo Menczer, Miriam J Metzger, Brendan Nyhan, Gordon Penny-
cook, David Rothschild, et al
.
2018. The science of fake news. Science 359, 6380
(2018), 1094–1096.
[16] Yichuan Li, Bohan Jiang, Kai Shu, and Huan Liu. 2020. MM-COVID: A Multilingual and Multimodal Data Repository for Combating COVID-19 Disinformation. arXiv:2011.04088 [cs.SI]
[17]
Philipp Lorenz-Spreen, Stephan Lewandowsky, Cass R Sunstein, and Ralph Her-
twig. 2020. How behavioural sciences can promote truth, autonomy and demo-
cratic discourse online. Nature Human Behaviour 4, 11 (2020), 1102–1109.
[18]
Drew B Margolin, Aniko Hannak, and Ingmar Weber. 2018. Political fact-checking
on Twitter: When do corrections have an eect? Political Communication 35, 2
(2018), 196–219.
[19]
Miriam J Metzger, Ethan H Hartsell, and Andrew J Flanagin. 2020. Cognitive
dissonance or credibility? A comparison of two theoretical explanations for
selective exposure to partisan news. Communication Research 47, 1 (2020), 3–28.
[20]
Sendhil Mullainathan and Andrei Shleifer. 2005. The market for news. American
Economic Review 95, 4 (2005), 1031–1053.
[21] Raymond S Nickerson. 1998. Confirmation bias: A ubiquitous phenomenon in many guises. Review of General Psychology 2, 2 (1998), 175–220.
[22]
James W Pennebaker, Ryan L Boyd, Kayla Jordan, and Kate Blackburn. 2015. The
development and psychometric properties of LIWC2015. Technical Report.
[23]
Gordon Pennycook, Tyrone D Cannon, and David G Rand. 2018. Prior exposure
increases perceived accuracy of fake news. Journal of Experimental Psychology:
General 147, 12 (2018), 1865.
[24]
Gordon Pennycook, Ziv Epstein, Mohsen Mosleh, Antonio A Arechar, Dean
Eckles, and David G Rand. 2021. Shifting attention to accuracy can reduce
misinformation online. Nature 592, 7855 (2021), 590–595.
[25]
Verónica Pérez-Rosas, Bennett Kleinberg, Alexandra Lefevre, and Rada Mihalcea.
2018. Automatic Detection of Fake News. In Proceedings of the 27th International
Conference on Computational Linguistics. 3391–3401.
[26]
Shengsheng Qian, Jinguang Wang, Jun Hu, Quan Fang, and Changsheng Xu.
2021. Hierarchical Multi-modal Contextual Attention Network for Fake News
Detection. In Proceedings of the 44th International ACM SIGIR Conference on
Research and Development in Information Retrieval. 153–162.
[27]
Hannah Rashkin, Eunsol Choi, Jin Yea Jang, Svitlana Volkova, and Yejin Choi.
2017. Truth of Varying Shades: Analyzing Language in Fake News and Political
Fact-Checking. In Proceedings of the 2017 Conference on Empirical Methods in
Natural Language Processing. 2931–2937.
[28]
Nils Reimers and Iryna Gurevych. 2019. Sentence-BERT: Sentence Embeddings
using Siamese BERT-Networks. In Proceedings of the 2019 Conference on Empirical
Methods in Natural Language Processing and the 9th International Joint Conference
on Natural Language Processing (EMNLP-IJCNLP). 3973–3983.
[29] Mohsen Sayyadiharikandeh, Onur Varol, Kai-Cheng Yang, Alessandro Flammini, and Filippo Menczer. 2020. Detection of Novel Social Bots by Ensembles of Specialized Classifiers. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management. 2725–2732.
[30]
Dietram A Scheufele and Nicole M Krause. 2019. Science audiences, misinfor-
mation, and fake news. Proceedings of the National Academy of Sciences 116, 16
(2019), 7662–7669.
[31]
Chengcheng Shao, Giovanni Luca Ciampaglia, Onur Varol, Kai-Cheng Yang,
Alessandro Flammini, and Filippo Menczer. 2018. The spread of low-credibility
content by social bots. Nature Communications 9, 1 (2018), 4787.
[32] Kai Shu, Limeng Cui, Suhang Wang, Dongwon Lee, and Huan Liu. 2019. dEFEND: Explainable Fake News Detection. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 395–405.
[33] Panayiotis Smeros, Carlos Castillo, and Karl Aberer. 2019. SciLens: Evaluating the Quality of Scientific News Articles Using Social Media and Scientific Literature Indicators. In Proceedings of the International Conference on World Wide Web. ACM, 1747–1758.
[34]
Kate Starbird. 2019. Disinformation’s spread: bots, trolls and all of us. Nature
571, 7766 (2019), 449–450.
[35]
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones,
Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention Is All
You Need. In Proceedings of the 31st International Conference on Neural Informa-
tion Processing Systems. 5998–6008.
[36]
Nguyen Vo and Kyumin Lee. 2018. The Rise of Guardians: Fact-checking URL
Recommendation to Combat Fake News. In Proceedings of the 41st International
ACM SIGIR Conference on Research & Development in Information Retrieval. 275–
284.
[37]
Soroush Vosoughi, Deb Roy, and Sinan Aral. 2018. The spread of true and false
news online. Science 359, 6380 (2018), 1146–1151.
[38] Jiang Wang, Yang Song, Thomas Leung, Chuck Rosenberg, Jingbin Wang, James Philbin, Bo Chen, and Ying Wu. 2014. Learning fine-grained image similarity
with deep ranking. In Proceedings of the IEEE Conference on Computer Vision and
Pattern Recognition. 1386–1393.
[39]
Yaqing Wang, Fenglong Ma, Zhiwei Jin, Ye Yuan, Guangxu Xun, Kishlay Jha, Lu
Su, and Jing Gao. 2018. EANN: Event Adversarial Neural Networks for Multi-
Modal Fake News Detection. In Proceedings of the 24th ACM SIGKDD International
Conference on Knowledge Discovery & Data Mining. 849–857.
[40]
Piotr A Woźniak, Edward J Gorzelańczyk, and Janusz A Murakowski. 1995. Two
components of long-term memory. Acta Neurobiologiae Experimentalis 55, 4
(1995), 301–305.
[41]
Da Yin, Tao Meng, and Kai-Wei Chang. 2020. SentiBERT: A Transferable
Transformer-Based Architecture for Compositional Sentiment Semantics. In
Proceedings of the 58th Annual Meeting of the Association for Computational Lin-
guistics. 3695–3706.
[42]
Chuxu Zhang, Dongjin Song, Chao Huang, Ananthram Swami, and Nitesh V
Chawla. 2019. Heterogeneous graph neural network. In Proceedings of the 25th
ACM SIGKDD International Conference on Knowledge Discovery & Data Mining.
793–803.
[43]
Jiaxuan Zhang, Sarah Ita Levitan, and Julia Hirschberg. 2020. Multimodal De-
ception Detection Using Automatically Extracted Acoustic, Visual, and Lexical
Features. In INTERSPEECH. 359–363.
[44]
Xin Zhang, Ding-Ding Han, Ruiqi Yang, and Ziqiao Zhang. 2017. Users’ partici-
pation and social influence during information spreading on Twitter. PLoS ONE 12,
9 (2017), e0183290.
[45]
Xinyi Zhou, Apurva Mulay, Emilio Ferrara, and Reza Zafarani. 2020. ReCOVery: A
Multimodal Repository for COVID-19 News Credibility Research. In Proceedings
of the 29th ACM International Conference on Information & Knowledge Manage-
ment. 3205–3212.
[46]
Xinyi Zhou, Jindi Wu, and Reza Zafarani. 2020. SAFE: Similarity-Aware Multi-
Modal Fake News Detection. In Advances in Knowledge Discovery and Data Mining,
Vol. 12085. Springer, 354.
[47]
Xinyi Zhou and Reza Zafarani. 2020. A survey of fake news: Fundamental
theories, detection methods, and opportunities. ACM Computing Surveys (CSUR)
53, 5 (2020), 1–40.
A SPARSIFICATION OF INFLUENCE GRAPH
Inuence graph can be a tournament in the worst case, taking much
space. To sparsify the graph, we add one more constraint in the
graph construction:
(𝑝𝑖, 𝑝 𝑗) ∈ 𝐸
if
Δ𝑡𝑖 𝑗 𝜃𝑡
. Thus, we assume that
each node (post) can be connected with (aected by) at most
𝜃𝑡
previous nodes (posts), which can be viewed as an extension of the
Markov property. We vary
𝜃𝑡
in {1, 10, 100, 1000} and ultimately set
𝜃𝑡=100 as all experimental results converge at this point.
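For concreteness, the following is a minimal sketch of this sparsified construction. It assumes posts are given in chronological order, and the influences predicate is a hypothetical placeholder for the full edge criterion used in the paper; only the θ_t cutoff is taken from this appendix.

```python
# Minimal sketch of the sparsified influence-graph construction (assumptions noted above).
from typing import Callable, List, Tuple

def build_sparse_influence_graph(
    posts: List[str],                         # posts in chronological order (assumption)
    influences: Callable[[str, str], bool],   # placeholder for the full edge criterion
    theta_t: int = 100,                       # cutoff chosen in this appendix
) -> List[Tuple[int, int]]:
    """Connect each post to at most `theta_t` of its immediate predecessors."""
    edges = []
    for j in range(len(posts)):
        # Only the theta_t most recent earlier posts are candidate influencers of post j.
        for i in range(max(0, j - theta_t), j):
            if influences(posts[i], posts[j]):
                edges.append((i, j))          # directed edge (p_i, p_j)
    return edges
```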
B REPRODUCIBILITY DETAILS IN FAKE
NEWS DETECTION
We have 109 hand-crafted (linguistic and propagation) features. Propagation features include the average external, internal, and combined affected degree of posts sharing the news; the average sentiment score (assessed by flair [2])⁹ and the average number of reposts, favorites, hashtags, mentions, symbols, quotes, and replies of posts sharing the news; and the average number of followers, friends, favorites, list memberships, and status updates of users spreading the news. Content features include all features that can be extracted by LIWC [22], each of which falls into one of the following categories: word count, summary language variables, linguistic dimensions, other grammar, and psychological processes.
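As an illustration of how the per-news average sentiment score could be computed with flair, the sketch below uses flair's off-the-shelf English sentiment classifier and maps each prediction to a signed confidence; the model choice and the signed-score convention are our assumptions, not details given in the paper.

```python
# A minimal sketch, assuming flair's 'en-sentiment' classifier; the signed-confidence
# scoring convention below is illustrative, not the paper's exact definition.
from flair.data import Sentence
from flair.models import TextClassifier

classifier = TextClassifier.load("en-sentiment")

def average_sentiment(posts):
    """Average sentiment of the posts sharing one news story."""
    scores = []
    for text in posts:
        sentence = Sentence(text)
        classifier.predict(sentence)
        label = sentence.labels[0]              # POSITIVE or NEGATIVE with a confidence
        sign = 1.0 if label.value == "POSITIVE" else -1.0
        scores.append(sign * label.score)       # signed score in [-1, 1]
    return sum(scores) / len(scores) if scores else 0.0
```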
With HetGNN, we use pre-trained transformers to extract content features of nodes (Longformer [4] for news stories and Sentence-BERT [28] for tweets). Each news node is associated with the news embedding and the average embedding of its connected posts. Each post node is associated with the post embedding, the average embedding of its connected news, and the average embedding of its connected posts. Hence, the Bi-LSTM length of the news content encoder is two, and that of the post content encoder is three. For both datasets, the embedding dimension of HetGNN is 1024, the size of the sampled neighbor set for each node is 23 (3 news nodes plus 20 post nodes), the learning rate is 0.0001, and the maximum number of training iterations is 50. The other hyperparameters are set as in [42].
⁹ https://github.com/flairNLP/flair
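To make the node-feature construction concrete, the sketch below assembles a post node's three content inputs from Sentence-BERT embeddings; the checkpoint name is our assumption (chosen because it yields 1024-dimensional embeddings), and, per the description above, news embeddings would come from Longformer rather than the single encoder used here to keep the sketch short.

```python
# A minimal sketch, assuming the sentence-transformers package and the
# 'all-roberta-large-v1' checkpoint (1024-d); the paper does not name the exact
# checkpoints, and it encodes news with Longformer rather than Sentence-BERT.
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-roberta-large-v1")

def post_node_features(post_text, connected_news_texts, connected_post_texts):
    """Return the three content features of a post node: its own embedding, the
    average embedding of its connected news, and the average embedding of its
    connected posts (hence a post content-encoder length of three)."""
    own = encoder.encode(post_text)
    news_avg = np.mean(encoder.encode(connected_news_texts), axis=0)
    posts_avg = (np.mean(encoder.encode(connected_post_texts), axis=0)
                 if connected_post_texts else np.zeros_like(own))
    return np.stack([own, news_avg, posts_avg])   # shape: (3, 1024)
```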