Reply-Aided Detection of Misinformation
via Bayesian Deep Learning
Qiang Zhang
University College London
London, United Kingdom
qiang.zhang.16@ucl.ac.uk
Aldo Lipani
University College London
London, United Kingdom
aldo.lipani@ucl.ac.uk
Shangsong Liang
Sun Yat-sen University
Guangzhou, China
liangshangsong@gmail.com
Emine Yilmaz
University College London
London, United Kingdom
emine.yilmaz@ucl.ac.uk
ABSTRACT
Social media platforms are rife with misinformation, and its potential negative influence on the public is a growing concern. This concern has drawn the attention of the research community to developing mechanisms to detect misinformation. The task of misinformation detection consists of classifying whether a claim is True or False. Most research concentrates on developing machine learning models, such as neural networks, that output a single value in order to predict the veracity of a claim. One of the major problems faced by these models is their inability to represent the uncertainty of the prediction, which is due to incomplete or finite available information about the claim being examined. We address this problem by proposing a Bayesian deep learning model. The Bayesian model outputs a distribution used to represent both the prediction and its uncertainty. In addition to the claim content, we also encode auxiliary information given by people’s replies to the claim. First, the model encodes a claim to be verified and generates a prior belief distribution from which we sample a latent variable. Second, the model encodes all the people’s replies to the claim in temporal order through a Long Short Term Memory network in order to summarize their content. This summary is then used to update the prior belief, generating the posterior belief. Moreover, in order to train this model, we develop a Stochastic Gradient Variational Bayes algorithm to approximate the analytically intractable posterior distribution. Experiments conducted on two public datasets demonstrate that our model outperforms the state-of-the-art detection models.
CCS CONCEPTS
• Information systems → Web mining; • Computing methodologies → Information extraction.
KEYWORDS
misinformation detection, bayesian analysis, deep learning
This paper is published under the Creative Commons Attribution 4.0 International
(CC-BY 4.0) license. Authors reserve their rights to disseminate the work on their
personal and corporate Web sites with the appropriate attribution.
WWW ’19, May 13–17, 2019, San Francisco, CA, USA
© 2019 IW3C2 (International World Wide Web Conference Committee), published
under Creative Commons CC-BY 4.0 License.
ACM ISBN 978-1-4503-6674-8/19/05.
https://doi.org/10.1145/3308558.3313718
ACM Reference Format:
Qiang Zhang, Aldo Lipani, Shangsong Liang, and Emine Yilmaz. 2019.
Reply-Aided Detection of Misinformation via Bayesian Deep Learning.
In Proceedings of the 2019 World Wide Web Conference (WWW ’19), May
13–17, 2019, San Francisco, CA, USA. ACM, New York, NY, USA, 11 pages.
https://doi.org/10.1145/3308558.3313718
1 INTRODUCTION
Although digital news consumption has increased in the last decade, the growing amount of misinformation and fake news has certainly not improved its quality. Unlike traditional media, where news is published by reputable organizations, online news on social media platforms such as Facebook and Twitter is shared by individuals and/or organizations without careful checking or with malicious intent. In Figure 1 we show a false claim posted on Twitter about an alleged shooting in Ottawa. While some users showed surprise and asked for further clarifications in their replies, other users believed the claim and re-tweeted it as if it was true. Such misinformation, when spread on a large scale, can influence the public by depicting a false picture of reality. Hence, detecting misinformation effectively has become one of the biggest challenges faced by social media platforms [17, 27].
A valuable attempt at curbing this epidemic of false claims has been made by some news websites, such as Snopes¹, PolitiFact², and Emergent³, which employ professional journalists to manually check and verify every potentially false news item. However, such a manual approach is very expensive and far too slow to check all the claims generated daily on the web. Thus, automatic tools are greatly needed to speed up this verification process.
In this paper, we tackle the automatic misinformation detection task, which consists in classifying whether a claim is True or False. Most existing models employ feature engineering or deep learning to extract features from the claims’ content and from auxiliary information such as people’s replies. However, these models generate deterministic mappings to capture the difference between true and false claims. A major limitation of these models is their inability to represent the uncertainty caused by incomplete or finite available data about the claim being examined.
¹ https://www.snopes.com/
² https://www.politifact.com/
³ http://www.emergent.info/
[Figure 1: screenshot of a Twitter conversation thread. The source tweet, posted by CTV News (@CTVNews) at 11:26 AM on 22 Oct 2014, reads: “Police have clarified that there were two shootings in Ottawa today, not three: at the War Memorial and Parliament Hill.” It shows 231 retweets and 72 likes and is followed by four user replies.]
Figure 1: An example of a false claim and people’s replies to it. From the replies, 231 users chose to trust the claim and re-tweeted it as if it was true, while only 4 users asked for further clarifications.
We address this problem by proposing a Bayesian deep learning model, which incorporates stochastic factors to capture complex relationships between the latent distribution and the observed variables. The proposed model makes use of both the claim content and the replies’ content. First, to represent the claim content, we employ a neural model to extract textual features from claims. To deal with the ambiguity of the language used in claims and obtain salient credibility information, the model generates a latent distribution based on the extracted linguistic features. Since no auxiliary information has been used so far, we interpret this latent distribution as a prior belief of the claim being true. Second, to extract auxiliary information from the content of people’s replies, we rank all the replies to the claim in temporal order and summarize them using a Long Short Term Memory neural network (LSTM). Finally, after updating the prior belief with the aid of the LSTM output, the model computes the veracity prediction and its uncertainty. This updated prior belief distribution is interpreted as the posterior belief.
In order to train the proposed Bayesian deep learning model,
due to the analytical intractability of the posterior distribution, we
develop a Stochastic Gradient Variational Bayes (SGVB) algorithm.
A tractable Evidence Lower BOund (ELBO) objective function of
our model is derived to approximate the intractable distribution.
The model is optimized along the direction of maximizing the ELBO
objective function.
Our model has two advantages: first, the model incorporates a latent distribution, which enables it to represent uncertainty and promotes robustness; second, the Bayesian model formulates all of its prior knowledge about the claim being examined in the form of a prior, which can be updated as more auxiliary information is added, generating more accurate detection results. To sum up, the proposed model advances state-of-the-art methods in four aspects:
(1) An effective representation of uncertainty due to incomplete/finite available data;
(2) A temporal order-based approach to extract auxiliary information from people’s replies;
(3) An SGVB algorithm to infer latent distributions;
(4) A systematic experimentation of our model on two real-world datasets.
The remainder of the paper is organized as follows: § 2 summarizes the related work; § 3 defines the misinformation detection task; § 4 details the proposed Bayesian deep learning model; § 5 derives the Stochastic Gradient Variational Bayes optimization algorithm; § 6 describes the datasets used and the experimental setup; § 7 is devoted to the experimental results; and § 8 concludes the paper.
2 RELATED WORK
Misinformation has existed for centuries in different forms of media, such as printed newspapers and television. Recently, online social media platforms have also been suffering from the same issue. Recent work on misinformation detection has tried to understand the differences between true and false claims in various aspects: claim content, information source, multimedia such as affiliated images and videos, and other users’ engagement.
2.1 Textual Content
The text of a claim can provide linguistic features to help predict its veracity. Since misinformation and false claims are created for financial or political purposes rather than to report an objective event, they often contain opinionated or inflammatory language [6]. In order to reveal linguistic differences between true and false claims, lexical and syntactic features at the character, word, sentence and document level have been exploited [1, 11, 33, 36]. Wawer et al. [43] compute psycholinguistic features using a bag-of-words paradigm. Rashkin et al. [34] compare the language of true claims with that of satire, hoaxes, and propaganda to find linguistic characteristics of untrustworthy text. Kakol et al. [21] construct a content credibility corpus and examine a list of language factors that might affect web content credibility, based on which a predictive model is developed. Bountouridis et al. [3] compare heterogeneous articles of the same story and reveal that cross-referenced pieces of information are more likely to be credible. Derczynski et al. [9] extract features from claim tweets including bag-of-words, presence of URLs, and presence of hashtags. A Support Vector Machine (SVM) is then used to distinguish between true and false claims. Guacho et al. [14] leverage a tensor decomposition to derive concise claim embeddings that capture contextual information from each claim, and use these embeddings to create a claim-by-claim graph on which the labels propagate. Textual content has been shown empirically to be a strong indicator of claim veracity, and thus can be used as a prior probability.
2.2 Source Credibility Analysis
The credibility analysis of the sources of a claim provides important auxiliary information. As misinformation is usually published by untrustworthy individuals or automatic bots, credibility plays a crucial role in message communication [18, 32]. Accurate and timely identification of such accounts inhibits the proliferation of misinformation at an early stage. Tseng and Fogg [40] identify two components of source credibility, namely trustworthiness and expertise. Trustworthiness is generally taken to mean truthful, unbiased and well intentioned. Expertise instead is understood as knowledgeable, experienced and competent. Thus, features that can reveal the trustworthiness and expertise of information sources are strong indicators of source credibility. With the aid of information sources, Thomson et al. [39] examine the credibility of tweets related to the Fukushima Daiichi nuclear disaster in Japan. They found that tweets from highly credible institutions and individuals are mostly correct. Useful account features can be derived from the account demographics, such as the integrity of personal information and the number of followers and followees [5]. Besides, aggregated features of a group of accounts, such as the percentage of verified user accounts [28] and the average number of followers [26], are indicative, since spreaders of true and false claims might come from different communities [44]. However, account demographics can easily be altered to blur the distinction between credible and non-credible sources.
2.3 Multimedia Features
Multimedia features have been shown to be an important manipulator for propaganda based on misinformation [4]. Online misinformation exploits the individual vulnerabilities of people and thus often relies on sensational or even fake images to provoke anger or other emotional responses in consumers. Visual features are extracted from images and videos to capture the different characteristics of misinformation. Fake images are identified based on various user-level and tweet-level hand-crafted features [15]. Recently, various visual and statistical features have been extracted for news verification [20]. Yang et al. [45] develop a convolutional neural network to extract text and visual features simultaneously. Visual features include clarity score, coherence score, diversity score, and clustering score. Statistical features include count, image ratio, multi-image ratio, hot image ratio, long image ratio, etc. This approach suffers from the problem that some misinformation on social media does not contain multimedia content.
2.4 Social Engagement
The news spreading process over time on social media involves user-driven engagement. Auxiliary information can also be derived from such engagement to improve the detection of claim veracity. Ma et al. [29] propose to learn discriminative features by following the non-sequential propagation structure of tweets. A top-down and a bottom-up recursive neural network are proposed to predict claim veracity. Glenski et al. [12] seek to better understand how users react to trusted and deceptive news sources across two popular, and very different, social media platforms. Significant differences have been observed in the speed and the type of reactions between trusted and deceptive news sources on Twitter, but far smaller differences on Reddit. People react to a claim by expressing their stances or emotions in social media posts. Stances can be categorized as supportive, opposing, and neutral, which can be used to infer claim veracity [19, 46, 47]. Kochkina et al. [25] propose a neural multi-task model that leverages the relationship between veracity detection and stance detection in a joint learning setup. Another common post feature is the topic distribution, which indicates the central point of the relevant affairs and is derived by topic models [2]. Post features are expanded in two ways: via aggregation with relevant posts for a specific affair, and via the temporal evolution of post features. The first way relies on the “wisdom of crowds” to locate potential misinformation [5], while the second way captures the periodic fluctuations of shock cycles [26] or the temporal pattern of user activities, such as the number of engaged users and the time intervals between engagements [37]. Yet, semantic coherence and temporal changes between users’ replies are not fully explored by existing methods.
3 PROBLEM STATEMENT
The task of misinformation detection is to predict the veracity of claims, given their content and people’s replies to them.
Let $C = \{c_1, c_2, \ldots, c_N\}$ be a set of $N$ claims. The claim $c_i$ is commented on by a set of $M$ user replies $D_i = \{d_{i,1}, d_{i,2}, \ldots, d_{i,M}\}$. We use $y_i$ to denote the binary veracity label of the claim $c_i$, which is either $y_i = 1$ for true or $y_i = 0$ for false. The tuple of a claim and people’s replies, i.e., $\{c_i, D_i\}$, forms a data instance from which to predict the claim veracity $y_i$. For the sake of clarity, in the following we omit the subscript $i$ when describing a single instance: $\{c, D, y\}$.
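The notation above maps naturally onto a small data structure. The following Python sketch is only illustrative: the class and field names (`Reply`, `ClaimInstance`, `timestamp`) are assumptions introduced here and are not taken from the authors’ software.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Reply:
    """One user reply d_m to a claim (field names are hypothetical)."""
    text: str
    timestamp: float  # posting time, usable later to order replies


@dataclass
class ClaimInstance:
    """A data instance {c, D, y}: claim text, its replies, and its label."""
    claim: str                                          # c
    replies: List[Reply] = field(default_factory=list)  # D = {d_1, ..., d_M}
    label: int = 0                                      # y: 1 = true, 0 = false

    def __post_init__(self) -> None:
        # The task is binary, so reject any other label value.
        if self.label not in (0, 1):
            raise ValueError("label must be 0 (false) or 1 (true)")


# Placeholder strings only; no real dataset content is reproduced here.
example = ClaimInstance(
    claim="example claim text",
    replies=[Reply("first reply", timestamp=1.0),
             Reply("second reply", timestamp=2.0)],
    label=0,
)
```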
4 BAYESIAN DEEP LEARNING
In this section, we present our proposed Bayesian deep learning model that effectively integrates a claim and people’s replies. We first introduce how to encode the claim content with deep learning and generate a latent distribution that is interpreted as a prior belief of claim veracity. We then describe the temporally ordered approach to encoding people’s replies, which captures semantic variation along the time line. Finally, we correct the prior belief with the aid of people’s replies; the result of this process is interpreted as the posterior belief of claim veracity. Figure 2 describes the proposed model.
4.1 Encoding a Claim
As content is a strong indicator of claim veracity [42], we apply deep learning to extract linguistic features from the claim $c$. To deal with the ambiguity of claims and obtain salient credibility information, we generate a latent distribution based on the extracted linguistic features. The output of this claim encoder is the prior belief of the veracity of the claim.
Let each claim $c$ be a sequence of discrete words or tokens, i.e., $c = [w_1, w_2, \ldots, w_l]$, where each $w_l \in \mathbb{R}^d$ is a $d$-dimensional word embedding vector. Based on the sequence of word embeddings, textual features are extracted via a Bidirectional Long Short Term Memory (BiLSTM) neural network [13]. The BiLSTM captures long and short semantic dependencies both from previous time steps and future time steps via forward and backward states. The BiLSTM takes as input $c$, converts the sequence of word embeddings into a dense representation, and outputs the concatenation of two hidden states capturing past and future information:
$$h_c = \mathrm{BiLSTM}(c), \tag{1}$$
where $h_c$ denotes the concatenated hidden states.
[Figure 2: architecture diagram with a claim encoder (word embedding, BiLSTM, MLP, linear transformations $l_1$ and $l_2$, $\mathcal{N}(\mu, \mathrm{diag}\,\sigma^2)$) and a replies encoder (word embedding, BiLSTM, LSTM), with variables $c$, $d_1, \ldots, d_m$, $h_c$, $h_D$, $z$ and $y$.]
Figure 2: Framework of the Bayesian deep learning model. The framework consists of two parts, the claim encoder (§ 4.1) and the replies encoder (§ 4.2), the concatenation of which determines the posterior belief of claim veracity. Blocks and nodes represent computation modules and variables. Grey nodes are observed variables while blank nodes are latent variables (as in Figure 3). Note that blocks of the same color denote the same module.
To deal with the ambiguity of claims, instead of a deterministic non-linear transformation, we generate a latent distribution, from which we sample a latent stochastic variable $z$. To embed linguistic information into the latent variable, we make the latent variable conditional on $h_c$:
$$z \sim p_\theta(z \mid h_c), \tag{2}$$
where $p$ is a latent distribution and $\theta$ denotes the non-linear transformation of $h_c$ that generates the parameters of $p$. This non-linear transformation is essential to capture higher-level representations of $h_c$; we implement it via a Multi-Layer Perceptron (MLP).
We assume that the latent variable $z$ is continuous and follows a multivariate Gaussian distribution. The distribution of $z$ is parameterized as follows:
$$p_\theta(z \mid h_c) = \mathcal{N}(z \mid \mu_\theta, \mathrm{diag}(\sigma_\theta^2)), \tag{3}$$
where $\mu_\theta$ and $\mathrm{diag}(\sigma_\theta^2)$ are the mean and the covariance matrix of the multivariate Gaussian distribution. Since the variable $z$ is conditional on the claim hidden states $h_c$, we derive these two parameters of the Gaussian distribution from $h_c$ through a deep neural network:
$$\pi_\theta = f_\theta(h_c), \tag{4}$$
$$\mu_\theta = l_1(\pi_\theta), \qquad \ln(\sigma_\theta) = l_2(\pi_\theta), \tag{5}$$
where $f_\theta$ denotes an MLP, and $l_1$ and $l_2$ denote two Linear Transformations (LTs). Since an LT can generate negative values, we obtain $\sigma_\theta$ by exponentiating the result of $l_2$.
In order to make the sampling of $z$ differentiable with respect to $\mu_\theta$ and $\sigma_\theta$, so that the loss can be backpropagated through the latent distribution $p$, the following reparameterization trick is used:
$$z = \epsilon \cdot \sigma_\theta + \mu_\theta, \qquad \epsilon \sim \mathcal{N}(0, I), \tag{6}$$
where $0$ is a vector of zeros and $I$ is the identity matrix. By making use of the latent variable $z$, our model is able to capture complex noisy patterns in the data.
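Eqs. 5 and 6 can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors’ implementation: the helper names (`linear`, `sample_latent`) and all weight values are hypothetical, and the vector `pi` stands in for the MLP output $\pi_\theta$ of Eq. 4.

```python
import math
import random
from typing import List, Sequence


def linear(weights: Sequence[Sequence[float]], bias: Sequence[float],
           x: Sequence[float]) -> List[float]:
    """Linear transformation W x + b (stands in for l1 and l2 in Eq. 5)."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def sample_latent(pi: Sequence[float],
                  w_mu: Sequence[Sequence[float]], b_mu: Sequence[float],
                  w_logsig: Sequence[Sequence[float]], b_logsig: Sequence[float],
                  rng: random.Random) -> List[float]:
    """Draw z = eps * sigma + mu with eps ~ N(0, I), as in Eq. 6."""
    mu = linear(w_mu, b_mu, pi)                 # mu = l1(pi)
    log_sigma = linear(w_logsig, b_logsig, pi)  # ln(sigma) = l2(pi)
    sigma = [math.exp(v) for v in log_sigma]    # exponentiate so that sigma > 0
    eps = [rng.gauss(0.0, 1.0) for _ in mu]     # noise independent of parameters
    return [e * s + m for e, s, m in zip(eps, sigma, mu)]
```

Because the randomness enters only through `eps`, the sample is a deterministic function of `mu` and `sigma`, which is what allows gradients to flow to both parameters in an automatic differentiation framework.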
4.2 Encoding People’s Replies
We now present the people’s replies encoder used to obtain auxiliary information. This auxiliary information is claim-specific and is used to generate the posterior belief by correcting the prior belief of the claim veracity.
Replies on social media platforms are listed along the time line as shown in Figure 1, where the earliest reply appears at the top of the list. The truth about an event can gradually become manifest as more evidence emerges; thus we assume that the latest replies tend to be more reliable and more important than the earlier replies. Based on this assumption, we design a two-layer recurrent neural network to encode replies: the first layer applies a BiLSTM to summarize the semantic information of each reply and the second layer applies an LSTM to capture the temporal semantic variation of the replies.
Given a claim $c$ commented on by a list of replies $D = \{d_1, \ldots, d_m, \ldots, d_M\}$, these replies are ranked based on their temporal order. The content of a reply $d_m$ consists of a sequence of words $d_m = [w_1, \ldots, w_k]$. To project the claim and replies into the same semantic space, we use the same pre-trained word embeddings for both claims and replies. Hence, $w_k \in \mathbb{R}^d$ is a $d$-dimensional vector like the word embedding vectors used to encode the claim. For the sake of semantic coherence, we also employ the same BiLSTM to encode both the claim and its replies. Taking the reply $d_m \in D$ as an example, the concatenation of the hidden states from the forward and backward directions is denoted as:
$$h_{d_m} = \mathrm{BiLSTM}(d_m), \tag{7}$$
where $h_{d_m}$ is the summary of the reply $d_m$.
In order to capture the semantic information of all replies, we sequentially input the concatenated hidden states of each reply into an LSTM. We use the LSTM rather than a BiLSTM because the former gives high weights to recent input, which matches our assumption on the relative importance of the latest reply. Specifically, the LSTM takes the hidden states of each reply as input in a sequential way:
$$h_D = \mathrm{LSTM}(h_d^M), \tag{8}$$
where $h_d^M = \{h_{d_1}, \ldots, h_{d_m}, \ldots, h_{d_M}\}$.
The last hidden state $h_D$ contains both the linguistic information of each reply and the semantic changes between replies along the time line. This $h_D$ is then used to generate the posterior belief by correcting the prior belief of the claim veracity.
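The temporal ordering and sequential folding of reply summaries can be illustrated with a toy example. Note that the update rule below is a deliberately simplified recurrent cell, not an LSTM, and that the function name `summarize_replies` and the `decay` parameter are assumptions made for this sketch; it only shows how the order of the inputs shapes the final state.

```python
import math
from typing import List, Sequence, Tuple


def summarize_replies(reply_vectors: Sequence[Tuple[float, Sequence[float]]],
                      decay: float = 0.5) -> List[float]:
    """Fold reply summaries h_{d_m} into one state, oldest reply first.

    `reply_vectors` holds (timestamp, vector) pairs. The update is a toy
    recurrent cell, NOT an LSTM: h <- tanh(decay * h + x).
    """
    ordered = sorted(reply_vectors, key=lambda pair: pair[0])  # temporal order
    if not ordered:
        return []
    state = [0.0] * len(ordered[0][1])
    for _, vec in ordered:
        # Older information is damped by `decay`; the newest input enters undamped.
        state = [math.tanh(decay * h + x) for h, x in zip(state, vec)]
    return state
```

In this toy cell the contribution of an earlier reply is damped at every later step, so the most recent reply has the largest influence on the final state, which mirrors the assumption stated above about the importance of the latest replies.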
4.3 Veracity Modeling
In § 4.1, we developed a prior belief of the claim veracity. In this section we show how to correct this prior belief by including the replies to the claim.
The posterior belief is generated by combining the claim and reply information via an MLP. The strong non-linearity of MLPs makes them suitable for finding complex relationships between the claim and its replies. Specifically, the MLP input is the latent claim variable $z$ concatenated with the hidden state of the replies $h_D$:
$$y = \mathrm{MLP}(z, h_D). \tag{9}$$
This is the final prediction of our Bayesian deep learning model for misinformation detection.
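As a rough illustration of Eq. 9, the sketch below concatenates $z$ and $h_D$ and passes them through a small MLP. The architecture is an assumption made only for this example (one tanh hidden layer and a sigmoid output); the text does not specify the number of layers or the activations, and the names `dense` and `predict_veracity` are hypothetical.

```python
import math
from typing import List, Sequence


def dense(weights: Sequence[Sequence[float]], bias: Sequence[float],
          x: Sequence[float]) -> List[float]:
    """Affine layer W x + b."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def predict_veracity(z: Sequence[float], h_replies: Sequence[float],
                     w1: Sequence[Sequence[float]], b1: Sequence[float],
                     w2: Sequence[float], b2: float) -> float:
    """y = MLP([z ; h_D]) with an assumed tanh hidden layer and sigmoid output."""
    x = list(z) + list(h_replies)                    # concatenate z and h_D
    hidden = [math.tanh(v) for v in dense(w1, b1, x)]
    logit = dense([w2], [b2], hidden)[0]
    return 1.0 / (1.0 + math.exp(-logit))            # value in (0, 1)
```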
5 OPTIMIZATION
The stochastic variables of our model are non-linear and non-conjugate [41]. Hence, the posterior distribution cannot be derived analytically. To approximate the posterior distribution, we construct an inference model parameterized by $\phi$ to approximate the intractable true posterior $p_\theta(z \mid h_c)$; then we derive an objective function to measure how well $p_\theta(z \mid h_c)$ is approximated; finally we exploit the Stochastic Gradient Variational Bayes (SGVB) method [24, 35] to learn the inference model parameters $\phi$ together with the generative model parameters $\theta$. Figure 3 shows the graphical representation of the generative model and the inference model.
[Figure 3: plate diagram over the $N$ claims with nodes $h_c$, $z$, $h_D$ and $y$, and edges labeled $\theta$ and $\phi$.]
Figure 3: The directed graphical model. Grey nodes represent observed variables while blank nodes represent latent variables. Solid lines denote the generative model $p_\theta(z \mid h_c)\, p(y \mid z, h_D)$, dashed lines denote the variational approximation $q_\phi(z \mid y, h_c, h_D)$ to the intractable posterior $p_\theta(z \mid h_c)$. The variational parameters are learned together with the generative model parameters.
5.1 Inference Model
Following the neural variational inference approach [24], we construct an inference model (as in Figure 3) parameterized by $\phi$ to compute an approximate posterior distribution, called the variational distribution. Given the observed variables, we define a variational distribution $q_\phi(z \mid y, h_c, h_D)$ to approximate the true posterior distribution $p_\theta(z \mid h_c)$. As in the Variational Auto-Encoder (VAE) [24], and similarly to Eq. 3 for $p_\theta(z \mid h_c)$, the variational distribution is chosen to be a multivariate Gaussian distribution:
$$q_\phi(z \mid y, h_c, h_D) = \mathcal{N}(z \mid \mu_\phi, \mathrm{diag}(\sigma_\phi^2)), \tag{10}$$
where $\mu_\phi$ and $\mathrm{diag}(\sigma_\phi^2)$ are the mean and the covariance matrix of the multivariate Gaussian distribution. We use a deep neural network to derive these two parameters from the observed variables:
$$\pi_\phi = f_\phi(y, h_c, h_D), \tag{11}$$
$$\mu_\phi = l_3(\pi_\phi), \qquad \ln(\sigma_\phi) = l_4(\pi_\phi), \tag{12}$$
where $f_\phi$ denotes an MLP, and $l_3$ and $l_4$ denote two LTs. Note that in the inference model we compute $\mu_\phi$ and $\ln(\sigma_\phi)$ from $y$, $h_c$ and $h_D$, and not only from $h_c$ as in the generative model.
5.2 Objective Function
In the following we derive the objective function of our Bayesian deep learning model following the variational principle. To maximize the log-likelihood $\ln p(y \mid h_c, h_D)$, we derive an Evidence Lower Bound (ELBO) objective function, which ensures a correct approximation of the true posterior. To simplify the notation of the derivation, we make the following substitutions: $p_\theta(y) = p_\theta(y \mid z, h_D)$, $p_\theta(z) = p_\theta(z \mid h_c)$, $q_\phi(z) = q_\phi(z \mid y, h_c, h_D)$. The objective function is derived as follows:
$$\begin{aligned}
\ln p(y \mid h_c, h_D) &= \ln \int p_\theta(y)\, p_\theta(z)\, dz \\
&\geq \int q_\phi(z) \ln\!\left( \frac{p_\theta(z)}{q_\phi(z)}\, p_\theta(y) \right) dz \\
&= \mathbb{E}_{q_\phi(z)}[\ln p_\theta(y)] - D_{KL}[q_\phi(z) \,\|\, p_\theta(z)] \\
&= \mathcal{L}(\theta, \phi \mid y, h_c, h_D),
\end{aligned} \tag{13}$$
where $\mathcal{L}(\theta, \phi \mid y, h_c, h_D)$ is the ELBO objective function. The second line of the derivation follows from Jensen’s inequality [16]. Since the ELBO objective function is a lower bound of the log-likelihood $\ln p(y \mid h_c, h_D)$, maximizing it maximizes a lower bound on the log-likelihood.

Algorithm 1: Optimization of the proposed model.
Input: claim hidden state $h_c$, replies hidden state $h_D$, veracity label $y$, the number of claims $N$ and the learning rate $\eta$.
begin
    $\theta, \phi \leftarrow$ initialize parameters;
    repeat
        randomly draw a minibatch of $B$ claims;
        for $i = 1, \ldots, B$ do
            $\tilde{z} \leftarrow$ randomly draw $S$ samples from $\mathcal{N}(z \mid \mu_\phi, \mathrm{diag}(\sigma_\phi^2))$;
            $\mathcal{L}(\theta, \phi \mid y_i, h_{c_i}, h_{D_i}) \leftarrow$ compute the loss for one claim according to Eq. 14;
        $\mathcal{L}(\theta, \phi \mid y, h_c, h_D) \leftarrow$ compute the loss for the full dataset according to Eq. 15;
        $\theta \leftarrow \theta - \eta \cdot \nabla_\theta \mathcal{L}(\theta, \phi \mid y, h_c, h_D)$;
        $\phi \leftarrow \phi - \eta \cdot \nabla_\phi \mathcal{L}(\theta, \phi \mid y, h_c, h_D)$;
    until $\theta, \phi$ converge;
    return $\theta, \phi$;
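Because both $q_\phi(z)$ (Eq. 10) and $p_\theta(z)$ (Eq. 3) are Gaussians with diagonal covariance, the KL term of Eq. 13 has a standard closed form, summed over dimensions: $\ln(\sigma_p / \sigma_q) + (\sigma_q^2 + (\mu_q - \mu_p)^2) / (2\sigma_p^2) - 1/2$. The sketch below evaluates it in plain Python; the function name `kl_diag_gaussians` is an assumption of this example and does not come from the paper.

```python
import math
from typing import Sequence


def kl_diag_gaussians(mu_q: Sequence[float], sigma_q: Sequence[float],
                      mu_p: Sequence[float], sigma_p: Sequence[float]) -> float:
    """KL[ N(mu_q, diag(sigma_q^2)) || N(mu_p, diag(sigma_p^2)) ].

    All standard deviations must be strictly positive.
    """
    total = 0.0
    for mq, sq, mp, sp in zip(mu_q, sigma_q, mu_p, sigma_p):
        total += (math.log(sp / sq)
                  + (sq ** 2 + (mq - mp) ** 2) / (2.0 * sp ** 2)
                  - 0.5)
    return total
```

The divergence is zero when the two distributions coincide and grows as the variational distribution moves away from the prior, which is how this term regularizes the inference model during training.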
5.3 Gradient Estimation
Large-scale inference requires minibatch optimization. Thus, we derive a minibatch-based SGVB estimator to differentiate and optimize the ELBO objective function $\mathcal{L}(\theta, \phi \mid y, h_c, h_D)$ with respect to both the inference parameters $\phi$ and the generative parameters $\theta$.
We compute the expectation part of the ELBO objective function through Monte Carlo estimation. Let $B$ be the minibatch size and, for each claim $c_i$ with $i \in [1, B]$, let $S$ be the number of samples $\tilde{z}$ drawn from the variational posterior distribution, $\tilde{z} \sim q_\phi$. Given a subset of claims, we can construct an estimator of the ELBO objective function for the full dataset based on minibatches as follows:
$$\mathcal{L}(\theta, \phi \mid y_i, h_{c_i}, h_{D_i}) = \frac{1}{S} \sum_{s=1}^{S} \left[ \ln p_\theta(y_i \mid \tilde{z}^{(s)}, h_{D_i}) \right] - D_{KL}[q_\phi(z) \,\|\, p_\theta(z)], \tag{14}$$
$$\mathcal{L}(\theta, \phi \mid y, h_c, h_D) \approx \frac{N}{B} \sum_{i=1}^{B} \mathcal{L}(\theta, \phi \mid y_i, h_{c_i}, h_{D_i}), \tag{15}$$
where $\mathcal{L}(\theta, \phi \mid y_i, h_{c_i}, h_{D_i})$ denotes the estimate based on the $i$-th claim and $N$ is the total number of claims. Algorithm 1 shows the minibatch gradient descent optimization process for both the generative ($\theta$) and inference ($\phi$) parameters. Note that the gradient steps in Algorithm 1 can easily be replaced by a more powerful optimizer such as the Adam algorithm [23].
Both $q_\phi(z \mid y, h_c, h_D)$ and $p_\theta(z \mid h_c)$ are modeled as parameterized Gaussian distributions. The former is an approximation of the latter that only functions during learning. The latter, instead, is the learned distribution from which samples are generated in order to classify claim veracity.
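The two estimators in Eqs. 14 and 15 can be sketched as follows. This is a minimal illustration that assumes a Bernoulli likelihood for the binary label, which the text does not state explicitly; the function names `elbo_single` and `elbo_dataset` are hypothetical, and the KL term is passed in as a precomputed number.

```python
import math
from typing import Sequence


def elbo_single(y: int, probs: Sequence[float], kl: float) -> float:
    """Eq. 14: mean log-likelihood over S samples minus the KL term.

    `probs[s]` is the predicted probability that the claim is true for the
    s-th sample z^(s); each value must lie strictly between 0 and 1.
    An assumed Bernoulli likelihood gives ln p(y | z, h_D).
    """
    log_liks = [math.log(p if y == 1 else 1.0 - p) for p in probs]
    return sum(log_liks) / len(log_liks) - kl


def elbo_dataset(per_claim: Sequence[float], n_claims: int) -> float:
    """Eq. 15: rescale the sum over a minibatch of B claims by N / B."""
    return n_claims / len(per_claim) * sum(per_claim)
```

The factor $N/B$ makes the minibatch sum an unbiased estimate of the sum over all $N$ claims when the minibatch is drawn uniformly at random.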
5.4 Prediction
After training, we compute the posterior distribution $p_\theta(z \mid h_c)$ through the generative network. The actual prediction of a claim’s veracity is given by taking the expectation over $S$ samples:
$$y = \frac{1}{S} \sum_{s=1}^{S} \mathrm{MLP}(z_s, h_D), \tag{16}$$
where $z_s$ denotes the samples drawn from the true posterior distribution $p_\theta(z \mid h_c)$.
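A minimal sketch of the sampling-based prediction in Eq. 16 is given below. The function name `predict` and the `mlp` callable are assumptions of this example: `mlp` stands in for the trained network of Eq. 9, and `mu` and `sigma` stand in for the Gaussian parameters produced by the generative network.

```python
import random
from typing import Callable, List, Sequence


def predict(mu: Sequence[float], sigma: Sequence[float],
            h_replies: Sequence[float],
            mlp: Callable[[List[float], Sequence[float]], float],
            n_samples: int, rng: random.Random) -> float:
    """Eq. 16: average MLP(z_s, h_D) over S samples z_s ~ N(mu, diag(sigma^2))."""
    total = 0.0
    for _ in range(n_samples):
        z = [rng.gauss(m, s) for m, s in zip(mu, sigma)]  # one draw of z
        total += mlp(z, h_replies)
    return total / n_samples
```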
6 EXPERIMENT SETUP
In this section we start by introducing six research questions. We then present the methodology used to answer them. The software used to run the experiments in this paper is available on the website of the first author.
6.1 Research Questions
We seek to answer the following six research questions, which will guide the remainder of the paper:
RQ1 Does our model outperform the state-of-the-art misinformation detection baselines?
RQ2 Does the incorporation of the latent distribution outperform a deterministic counterpart?
RQ3 Does the auxiliary information from people’s replies produce a more accurate posterior belief of claim veracity?
RQ4 Is the temporal order better than a random order when encoding replies?
RQ5 Is it beneficial to incorporate a latent variable to encode replies?
RQ6 How does the dimension of the latent variable $z$ affect the model’s performance?
6.2 Datasets
In order to compare the performance of our proposed model against the baselines, we experimented with two real-world benchmark datasets, the RumourEval [9] and the Pheme [48] datasets. Both datasets contain Twitter conversation threads about news (like the example shown in Figure 1). A conversation thread consists of a tweet making a true or false claim, and branches of people’s replies expressing their opinion about it. A summary of the dataset statistics is available in Table 1.

Table 1: Statistics of the datasets.

| Subset | Veracity | RumourEval #Claims | RumourEval #Replies | Pheme #Claims | Pheme #Replies |
|---|---|---|---|---|---|
| Training | True | 83 | 1,949 | 861 | 24,438 |
| Training | False | 70 | 1,504 | 625 | 17,676 |
| Training | Total | 153 | 3,453 | 1,468 | 42,114 |
| Validation | True | 10 | 101 | 95 | 1,154 |
| Validation | False | 12 | 141 | 115 | 1,611 |
| Validation | Total | 22 | 242 | 210 | 2,765 |
| Testing | True | 9 | 412 | 198 | 3,077 |
| Testing | False | 12 | 437 | 219 | 3,265 |
| Testing | Total | 21 | 849 | 417 | 6,342 |

The RumourEval dataset was developed for the SemEval-2017 Task 8 competition. This dataset consists of 325 source tweets and 5,568 user reply tweets. The veracity of each source tweet can be true (45%), false (23%) or unverified (32%). Since we aim to distinguish only true and false claims, we filter out the unverified tweets. We divide the filtered dataset into a training subset, a validation subset and a testing subset. The training subset contains 153 claims with 3,453 replies, the validation subset contains 22 claims with 242 replies, and the test subset contains 21 claims with 849 replies.
The Pheme dataset was constructed to help understand how users treat online rumours before and after the news is determined to be true or false. As with the RumourEval dataset, we divide the Pheme dataset into a training subset, a validation subset and a testing subset. Specifically, 70% of the claims are randomly selected as training instances, 10% as validation instances and the rest as testing instances. Users’ replies are divided according to the claims.
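The claim-level split described above can be sketched as follows. The function name, the seed, and the use of integer claim identifiers are hypothetical; the sketch only illustrates that the split is made over claims, so that every reply follows its claim into the same subset.

```python
import random

def split_claims(claim_ids, train_frac=0.7, val_frac=0.1, seed=0):
    """Randomly split claim ids into training, validation and testing subsets."""
    ids = list(claim_ids)
    random.Random(seed).shuffle(ids)
    n_train = round(train_frac * len(ids))
    n_val = round(val_frac * len(ids))
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]  # the remaining claims
    return train, val, test

# Replies are then assigned to the subset of the claim they respond to.
train_ids, val_ids, test_ids = split_claims(range(100))
```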
6.3 Evaluation Measures
The misinformation detection task is a binary classification task. Such tasks are commonly evaluated by the following evaluation measures: Accuracy, F1, Precision, and Recall.
Accuracy is a common evaluation measure for classification tasks. However, it is less reliable when datasets suffer from class imbalance. The evaluation measures Precision, Recall and F1 complement Accuracy because they do not suffer from this problem.
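For concreteness, the four measures can be computed from the entries of a confusion matrix as in the sketch below. All counts are illustrative values chosen by us and are not reported results: the first call uses a test subset with 9 True and 12 False claims, and the second shows how Accuracy can look strong on an imbalanced subset while Recall and F1 expose a classifier that never predicts the positive class.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, Precision, Recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom > 0 else 0.0
    return accuracy, precision, recall, f1

# Illustrative counts: 9 positive (True) and 12 negative (False) claims.
acc, prec, rec, f1 = binary_metrics(tp=7, fp=2, fn=2, tn=10)
print(f"{100 * acc:.2f} {100 * prec:.2f} {100 * rec:.2f} {100 * f1:.2f}")
# prints: 80.95 77.78 77.78 77.78

# Illustrative imbalanced case: 5 positives, 95 negatives, all predicted negative.
acc_imb, prec_imb, rec_imb, f1_imb = binary_metrics(tp=0, fp=0, fn=5, tn=95)
print(f"{100 * acc_imb:.2f} {100 * rec_imb:.2f} {100 * f1_imb:.2f}")
# prints: 95.00 0.00 0.00
```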
6.4 Hyperparameters Setting
The activation function of the three LSTMs is tanh. The activation function of the MLPs is ReLU.
The hyperparameters tuned on the validation subset are:
the dimension of the hidden layer of all three LSTMs is 30;
the dimension of the latent variables is 10;
the minibatch size is 32;
the number of samples used in Monte Carlo estimates is 20.
State-of-the-art techniques have been employed to optimize the objective function: Dropout [38] is applied to improve the training of the neural networks, L2-norm regularization is imposed on the weights of the neural networks, the Adam optimizer [23] is used for fast convergence, and stepwise exponential learning rate decay is adopted to anneal the variations of convergence.
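The settings above can be collected as in the sketch below, which also illustrates what a stepwise (staircase) exponential learning rate decay looks like. The dictionary values are those listed in the hyperparameter list; the key names, the initial learning rate, the decay rate and the decay interval are hypothetical placeholders, since they are not specified in this section.

```python
# Values from the hyperparameter list above; the key names are hypothetical.
HYPERPARAMS = {
    "lstm_hidden_dim": 30,
    "latent_dim": 10,
    "batch_size": 32,
    "mc_samples": 20,
}

def stepwise_exponential_lr(step, initial_lr=1e-3, decay_rate=0.9, decay_steps=100):
    """Staircase decay: multiply the rate by `decay_rate` every `decay_steps` updates.

    The default values are placeholders for illustration only.
    """
    return initial_lr * decay_rate ** (step // decay_steps)
```

With these placeholder defaults the rate stays constant for the first 100 updates and is then multiplied by 0.9 at every subsequent block of 100 updates.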
6.5 Baselines
We test our Bayesian deep learning model against six state-of-the-art models. In order to have a fair comparison, only those models using the claim content and users’ replies have been selected.
Support Vector Machine (SVM). This model evaluates the performance of manually extracted features. The features extracted from claim content include: a bag-of-words representation, the presence of URLs, the presence of hashtags, and the proportion of supporting and denying responses [9]. These features are then input to a linear Support Vector Machine classifier. This classifier achieved the highest misinformation detection performance in SemEval-2017 Task 8 (footnote 4);
Convolutional Neural Networks (CNN). This model evaluates the performance of CNNs on the veracity detection task. Apart from sequential approaches such as BiLSTM, the convolutional model is another powerful neural architecture for natural language understanding [7, 8, 10, 22, 45]. The CNN takes as input pre-trained word embeddings generated with Word2Vec [30] trained on the Google News dataset. To capture features similar to n-grams, we apply different convolutional window sizes. A max pooling layer is applied to compress the output information of the convolutional layers [7];
Tensor Embeddings (TE). This model leverages tensor decomposition to derive concise claim embeddings, which are used to create a claim-by-claim graph on which labels propagate [14];
Evidence-Aware Deep Learning (DeClarE). This model retrieves evidence from replies using claims as queries [31]. Both the claims and the retrieved replies are then input into a deep neural network with an attention mechanism. Claim veracity is computed by aggregating over the predictions generated by every pair of claim and retrieved reply;
Multitask Learning (Multitask). This model leverages the relationship between two tasks of the veracity detection pipeline [25], namely stance detection and veracity prediction. The model is trained on both tasks jointly. We apply the hard parameter sharing mechanism, where different tasks share the same hidden LSTM layers. Task-specific layers take the shared hidden information and generate per-task predictions;
Tree-structured RNN (TRNN). This model learns discriminative features from the content of replies by following their non-sequential propagation structure. Among the two proposed structures, we select the top-down structure for tweet representation learning because it performs marginally better than the bottom-up structure [29].
7 RESULTS AND DISCUSSION
This section answers the research questions proposed in § 6.
7.1 Performance Comparison (RQ1)
Table 2 summarizes the classification performance of the baselines and our Bayesian deep learning framework on the RumourEval and Pheme datasets. We can observe that: (1) In terms of Accuracy and F1, most deep learning-based models, such as ours, TRNN and Multitask, outperform the feature engineering-based model, i.e., SVM. This demonstrates that deep neural networks indeed help to learn better hidden representations of claims and replies. (2) Methods exploring relationships between a claim and its replies, such as ours, TRNN and Multitask, achieve better performance than claim content-based methods like TE and CNN. This demonstrates the significance of utilizing people’s replies in the misinformation detection task. (3) Our model achieves state-of-the-art performance on both measures and both datasets, demonstrating the effectiveness of our model in the misinformation detection task. This holds even though our model is not always the best on Precision or Recall alone: taken individually, these two measures do not offer a clear picture of the performance of a model, since one can be increased at the expense of the other.

Table 2: Performance comparison of the proposed Bayesian deep learning framework against the baselines.

| Dataset | Measure | SVM | CNN | TE | DeClarE | Multitask | TRNN | Ours |
|---|---|---|---|---|---|---|---|---|
| RumourEval | Accuracy (%) | 71.42 | 61.90 | 66.67 | 66.67 | 66.67 | 76.19 | 80.95 |
| RumourEval | Precision (%) | 66.67 | 54.54 | 60.00 | 58.33 | 57.14 | 70.00 | 77.78 |
| RumourEval | Recall (%) | 66.67 | 66.67 | 66.67 | 77.78 | 88.89 | 77.78 | 77.78 |
| RumourEval | F1 (%) | 66.67 | 59.88 | 63.15 | 66.67 | 69.57 | 73.68 | 77.78 |
| Pheme | Accuracy (%) | 72.18 | 59.23 | 65.22 | 67.87 | 74.94 | 78.65 | 80.33 |
| Pheme | Precision (%) | 78.80 | 56.14 | 63.05 | 64.68 | 68.77 | 77.11 | 78.29 |
| Pheme | Recall (%) | 75.75 | 64.64 | 64.64 | 71.21 | 87.87 | 78.28 | 79.29 |
| Pheme | F1 (%) | 72.10 | 60.09 | 63.83 | 67.89 | 77.15 | 77.69 | 78.78 |

Footnote 4: http://alt.qcri.org/semeval2017/task8/
Specifically, our model achieves the highest Accuracy (80.95%), Precision (77.78%) and F1 (77.78%) on the RumourEval test subset, and the highest Accuracy (80.33%) and F1 (78.78%) on the Pheme test subset. TRNN is the strongest baseline, achieving the second highest Accuracy and F1 on both the RumourEval (76.19% and 73.68%) and Pheme (78.65% and 77.69%) test subsets.
7.2 Ablation of the Latent Distribution (RQ2)
In this subsection, we evaluate the impact of incorporating a latent distribution into the claim encoder on the misinformation detection task. To evaluate the impact of the latent distribution $p$, we ablate $p$ in our model and compare the classification performance of the ablated model against the full model. Specifically, the ablation is done by taking the output of the BiLSTM hidden states, i.e., $h_c$, and giving it directly as input to the output MLP. The rest of the model remains unchanged. Since no latent distribution is involved, the ablated model is optimized by conventional Softmax loss minimization.
In Figure 4(a) and 4(b) we show the classification performance of the ablated model against the full model on the RumourEval and Pheme test subsets. We observe that the full model outperforms the ablated one by at least 7.77 percentage points on every evaluation measure. This demonstrates the better representation quality achieved by the use of the latent distribution.
7.3 Ablation of People’s Replies (RQ3)
We now evaluate the contribution of people’s replies to the misinformation detection task. In order to examine their contribution, we compare our full model with and without replies. Specifically, we ablate the input coming from the replies to the final MLP, which is now used only to perform a non-linear transformation of the latent variable $z$.
In Figure 5(a) and 5(b) we show the classification performance of the ablated model against the full model on the RumourEval and the Pheme test subsets. Here, we observe that the auxiliary information extracted from people’s replies has a large impact on the final performance of our model. In fact, every evaluation measure increases by at least 10.11 percentage points.
7.4 Random vs. Temporal Ordered Replies (RQ4)
The proposed model orders people’s replies according to their temporal order. In this subsection, we analyze the contribution of this ordering by comparing it against a random order. Specifically, we randomize the order of the reply representations $h_d^M$ before they are input to the LSTM.
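The two orderings compared here can be sketched as below. This is only an illustration: representing each reply as a dictionary with a `timestamp` field, and the function name itself, are assumptions made for the example.

```python
import random

def order_replies(replies, mode="temporal", seed=0):
    """Return the replies in temporal order or in a seeded random order."""
    if mode == "temporal":
        return sorted(replies, key=lambda reply: reply["timestamp"])
    if mode == "random":
        shuffled = list(replies)
        random.Random(seed).shuffle(shuffled)
        return shuffled
    raise ValueError(f"unknown mode: {mode}")

# Hypothetical replies to one claim; only the order fed to the LSTM differs.
replies = [
    {"timestamp": 30, "text": "source?"},
    {"timestamp": 10, "text": "this is false"},
    {"timestamp": 20, "text": "confirmed by officials"},
]
temporal = order_replies(replies, mode="temporal")
shuffled = order_replies(replies, mode="random")
```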
In Figure 6(a) and 6(b) we show the performance comparison of these two orders. We observe that the temporally ordered replies achieve better performance than the randomly ordered ones. Besides, the randomly ordered model is still worse than TRNN yet better than Multitask. This is probably because TRNN incorporates the temporal structure of replies into the model, while Multitask fails to involve temporal information.
7.5 A Latent Distribution for Replies (RQ5)
Considering the improved performance brought by the latent distribution for claims, in this subsection we examine whether it would be beneficial to incorporate a latent distribution also for replies. In order to answer this research question, we expand our model by adding a new latent distribution in the reply encoder. Similarly to what is done for the claim encoder, the new latent distribution is designed as a multidimensional Gaussian distribution with mean and covariance matrix derived from the LSTM output $h_D$ (as in Eqs. 3, 4 and 5). A new latent variable is sampled similarly to Eq. 6 and input to the MLP to predict the veracity of the claim being examined.
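A minimal sketch of how such a reply-side latent variable could be sampled is given below. It assumes, as is common for Gaussian latent variable models, that the mean and log-variance are affine functions of $h_D$, that the covariance is diagonal, and that sampling uses the reparameterization trick; the weight names are hypothetical, and the exact forms of Eqs. 3 to 6 are not restated here.

```python
import numpy as np

def sample_reply_latent(h_d, w_mu, b_mu, w_logvar, b_logvar, rng):
    """Sample a latent variable for replies from a diagonal Gaussian (assumed form)."""
    mu = w_mu @ h_d + b_mu              # mean derived from the LSTM output
    logvar = w_logvar @ h_d + b_logvar  # log of the diagonal covariance entries
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * logvar) * eps  # reparameterized sample
    return z, mu, logvar

rng = np.random.default_rng(0)
hidden_dim, latent_dim = 30, 10          # sizes reported in the hyperparameter settings
h_d = rng.standard_normal(hidden_dim)    # placeholder for the reply summary h_D
w_mu = 0.1 * rng.standard_normal((latent_dim, hidden_dim))
w_logvar = 0.1 * rng.standard_normal((latent_dim, hidden_dim))
z, mu, logvar = sample_reply_latent(
    h_d, w_mu, np.zeros(latent_dim), w_logvar, np.zeros(latent_dim), rng
)
```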
In Figure 7(a) and 7(b) we show the model performance comparison. We observe that the new latent distribution does not yield a consistent improvement in the performance of the model across the evaluation measures and dataset test subsets. Based on this analysis, we conclude that the incorporation of the additional latent distribution for replies does not provide any additional improvement in performance.
Figure 4: The impact of the latent distribution p on the model performance. In both figures we show the performance change on all the evaluation measures of the model with (Full) or without (Ablated) p. Figure (a) shows it for the RumourEval test subset and Figure (b) shows it for the Pheme test subset. Bar values (%):

| Panel | Model | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|---|
| (a) RumourEval | Ablated | 71.42 | 63.63 | 70.01 | 66.67 |
| (a) RumourEval | Full | 80.95 | 77.78 | 77.78 | 77.78 |
| (b) Pheme | Ablated | 68.11 | 65.41 | 69.69 | 67.47 |
| (b) Pheme | Full | 80.33 | 78.29 | 79.29 | 78.78 |
Figure 5: The impact of people’s replies on the model performance. In both figures we show the performance change on all the evaluation measures of the model with (Full) or without (Ablated) replies information. Figure (a) shows it for the RumourEval test subset and Figure (b) shows it for the Pheme test subset. Bar values (%):

| Panel | Model | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|---|
| (a) RumourEval | Ablated | 67.67 | 61.01 | 67.67 | 64.23 |
| (a) RumourEval | Full | 80.95 | 77.78 | 77.78 | 77.78 |
| (b) Pheme | Ablated | 64.26 | 61.84 | 64.64 | 63.21 |
| (b) Pheme | Full | 80.33 | 78.29 | 79.29 | 78.78 |
Figure 6: The effect of the temporal order of the reply encoder on model performance. In both figures we show the performance change on all the evaluation measures of the model with random and temporal ordered people’s replies. Figure (a) shows it for the RumourEval test subset and Figure (b) shows it for the Pheme test subset. Bar values (%):

| Panel | Order | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|---|
| (a) RumourEval | Random | 76.19 | 75.01 | 66.67 | 70.59 |
| (a) RumourEval | Temporal | 80.95 | 77.78 | 77.78 | 77.78 |
| (b) Pheme | Random | 78.41 | 76.73 | 78.28 | 77.49 |
| (b) Pheme | Temporal | 80.33 | 78.29 | 79.29 | 78.78 |
Figure 7: The effect of an additional latent distribution for people’s replies on the model performance. In both figures we show the performance change on all the evaluation measures of the model with (Expanded) and without (Full) an additional latent variable for the people’s replies. Figure (a) shows it for the RumourEval test subset and Figure (b) shows it for the Pheme test subset. Bar values (%):

| Panel | Model | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|---|
| (a) RumourEval | Expanded | 76.19 | 70.01 | 77.78 | 73.69 |
| (a) RumourEval | Full | 80.95 | 77.78 | 77.78 | 77.78 |
| (b) Pheme | Expanded | 77.93 | 79.49 | 75.48 | 77.34 |
| (b) Pheme | Full | 80.33 | 78.29 | 79.29 | 78.78 |
Figure 8: The effect of the latent variable dimension on model performance. In both figures we show how the accuracy and F1 scores change when varying the dimension of z. Figure (a) shows it for the RumourEval test subset and Figure (b) shows it for the Pheme test subset. Each panel plots two curves, Accuracy and F1, with the horizontal axis ranging from 0 to 25 and the vertical axis from 60 to 85.
7.6 Sensitivity Analysis (RQ6)
In this subsection we evaluate the effect of the dimension of the latent variable $z$. To do this, after setting a dimension for $z$, we optimize the rest of the hyperparameters on the validation subset.
In Figure 8(a) and 8(b) we show the effect of the dimension of $z$ on performance on both datasets. We observe that the results are similar for both evaluation measures, accuracy and F1. Varying the dimension from 1 to 5 brings a larger performance improvement than varying it from 5 to 25. When the dimension is 15, the model obtains the highest accuracy, 81.22%, on the RumourEval test subset, while when the dimension is 10 the model obtains the highest F1, 78.78%, on the RumourEval test subset, and the highest accuracy, 80.33%, and F1, 78.78%, on the Pheme test subset. These results also show that an increase in model capacity does not necessarily lead to an improvement in performance. The reason may lie in the limited size of the datasets, which might cause overfitting when the model is too complex.
8 CONCLUSIONS
In this paper, we study the problem of misinformation detection on social media platforms. One major problem faced by existing machine learning methods is the inability to represent uncertainty due to incomplete or finite available information. We address the problem by proposing a Bayesian deep learning model. When encoding claim content, we incorporate a latent distribution accounting for uncertainty and randomness caused by noisy patterns in the finite dataset. This latent distribution provides a prior belief of claim veracity. We also encode auxiliary information from people’s replies in temporal order through an LSTM. Such auxiliary information is then used to update the prior belief, generating a posterior belief. In order to optimize the Bayesian model, we derive a minibatch-based gradient estimation algorithm. Systematic experimentation has demonstrated the superiority of our approach over the state-of-the-art approaches in the misinformation detection task.
Despite encouraging experimental results, online misinformation detection is still a challenging problem with many open questions. In this paper, auxiliary information comes from people’s replies alone. We argue that the proposed model can be enriched by utilizing other auxiliary information, such as source credibility. Also, the reply stances are a strong veracity indicator for a claim, since false claims are usually controversial and accompanied by opposing stances. We leave for future work the combination of features extracted from credibility analysis and reply stances.
ACKNOWLEDGMENTS
This project was funded by the EPSRC Fellowship titled "Task Based
Information Retrieval", grant reference number EP/P024289/1. We
acknowledge the support of NVIDIA Corporation with the donation
of the Titan Xp GPU used for this research.
REFERENCES
[1] Sadia Afroz, Michael Brennan, and Rachel Greenstadt. 2012. Detecting Hoaxes, Frauds, and Deception in Writing Style Online. In Proceedings of the 2012 IEEE Symposium on Security and Privacy (SP ’12). IEEE Computer Society, Washington, DC, USA, 461–475. https://doi.org/10.1109/SP.2012.34
[2] David M Blei, Andrew Y Ng, and Michael I Jordan. 2003. Latent dirichlet allocation. Journal of machine Learning research 3, Jan (2003), 993–1022.
[3] Dimitrios Bountouridis, Mónica Marrero, Nava Tintarev, and Claudia Hauff. 2018. Explaining Credibility in News Articles using Cross-Referencing. In SIGIR workshop on ExplainAble Recommendation and Search (EARS).
[4] Carlos Castillo, Mohammed El-Haddad, Jürgen Pfeffer, and Matt Stempeck. 2014. Characterizing the life cycle of online news stories using social media reactions. In Proceedings of the 17th ACM conference on Computer supported cooperative work & social computing. ACM, 211–223.
[5] Carlos Castillo, Marcelo Mendoza, and Barbara Poblete. 2011. Information credibility on twitter. In Proceedings of the 20th international conference on World wide web. ACM, 675–684.
[6] Yimin Chen, Niall J Conroy, and Victoria L Rubin. 2015. Misleading online content: Recognizing clickbait as false news. In Proceedings of the 2015 ACM on Workshop on Multimodal Deception Detection. ACM, 15–19.
[7] Yi-Chin Chen, Zhao-Yang Liu, and Hung-Yu Kao. 2017. IKM at SemEval-2017 Task 8: Convolutional neural networks for stance detection and rumor verification. In Proceedings of the 11th International Workshop on Semantic Evaluation. 465–469.
[8] Yann N. Dauphin, Angela Fan, Michael Auli, and David Grangier. 2017. Language Modeling with Gated Convolutional Networks. In Proceedings of the 34th International Conference on Machine Learning. 933–941.
[9] Leon Derczynski, Kalina Bontcheva, Maria Liakata, Rob Procter, Geraldine Wong Sak Hoi, and Arkaitz Zubiaga. 2017. SemEval-2017 Task 8: RumourEval: Determining rumour veracity and support for rumours. In Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017). Association for Computational Linguistics, 69–76. https://doi.org/10.18653/v1/S17-2006
[10] Cicero dos Santos and Maira Gatti. 2014. Deep convolutional neural networks for sentiment analysis of short texts. In Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers. 69–78.
[11] Johannes Fürnkranz. 1998. A study using n-gram features for text categorization. Austrian Research Institute for Artificial Intelligence 3, 1998 (1998), 1–10.
[12] Maria Glenski, Tim Weninger, and Svitlana Volkova. 2018. Identifying and Understanding User Reactions to Deceptive and Trusted Social News Sources. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 176–181.
[13] Alex Graves, Santiago Fernández, and Jürgen Schmidhuber. 2005. Bidirectional LSTM networks for improved phoneme classification and recognition. In International Conference on Artificial Neural Networks. Springer, 799–804.
[14] Gisel Bastidas Guacho, Sara Abdali, Neil Shah, and Evangelos E. Papalexakis. 2018. Semi-supervised Content-Based Detection of Misinformation via Tensor Embeddings. In IEEE/ACM 2018 International Conference on Advances in Social Networks Analysis and Mining. 322–325. https://doi.org/10.1109/ASONAM.2018.8508241
[15] Aditi Gupta, Hemank Lamba, Ponnurangam Kumaraguru, and Anupam Joshi. 2013. Faking sandy: characterizing and identifying fake images on twitter during hurricane sandy. In Proceedings of the 22nd international conference on World Wide Web. ACM, 729–736.
[16] Frank Hansen and Gert K Pedersen. 2003. Jensen’s operator inequality. Bulletin of the London Mathematical Society 35, 4 (2003), 553–564.
[17] Del Harvey and Yoel Roth. 2018. An Update On Our Elections Integrity Work. https://blog.twitter.com/official/en_us/topics/company/2018/an-update-on-our-elections-integrity-work.html
[18] Carl I Hovland and Walter Weiss. 1951. The influence of source credibility on communication effectiveness. Public opinion quarterly 15, 4 (1951), 635–650.
[19] Zhiwei Jin, Juan Cao, Yongdong Zhang, and Jiebo Luo. 2016. News verification by exploiting conflicting social viewpoints in microblogs. In Thirtieth AAAI Conference on Artificial Intelligence.
[20] Zhiwei Jin, Juan Cao, Yongdong Zhang, Jianshe Zhou, and Qi Tian. 2017. Novel visual and statistical image features for microblogs news verification. IEEE transactions on multimedia 19, 3 (2017), 598–608.
[21] Michal Kakol, Radoslaw Nielek, and Adam Wierzbicki. 2017. Understanding and predicting Web content credibility using the Content Credibility Corpus. Information Processing & Management 53, 5 (2017), 1043–1061.
[22] Yoon Kim. 2014. Convolutional Neural Networks for Sentence Classification. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing. 1746–1751.
[23] Diederik P. Kingma and Jimmy Ba. 2014. Adam: A Method for Stochastic Optimization. CoRR abs/1412.6980 (2014).
[24] Diederik P. Kingma and Max Welling. 2013. Auto-Encoding Variational Bayes. CoRR abs/1312.6114 (2013). arXiv:1312.6114
[25] Elena Kochkina, Maria Liakata, and Arkaitz Zubiaga. 2018. All-in-one: Multi-task Learning for Rumour Verification. In Proceedings of the 27th International Conference on Computational Linguistics. Association for Computational Linguistics, 3402–3413. http://aclweb.org/anthology/C18-1288
[26] Sejeong Kwon, Meeyoung Cha, Kyomin Jung, Wei Chen, et al. 2013. Prominent features of rumor propagation in online social media. In International Conference on Data Mining. IEEE.
[27] Tessa Lyons. 2018. Increasing Our Efforts to Fight False News | Facebook Newsroom. https://newsroom.fb.com/news/2018/06/increasing-our-efforts-to-fight-false-news/
[28] Jing Ma, Wei Gao, Zhongyu Wei, Yueming Lu, and Kam-Fai Wong. 2015. Detect rumors using time series of social context information on microblogging websites. In Proceedings of the 24th ACM International on Conference on Information and Knowledge Management. ACM, 1751–1754.
[29] Jing Ma, Wei Gao, and Kam-Fai Wong. 2018. Rumor Detection on Twitter with Tree-structured Recursive Neural Networks. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, 1980–1989.
[30] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems. 3111–3119.
[31] Kashyap Popat, Subhabrata Mukherjee, Andrew Yates, and Gerhard Weikum. 2018. DeClarE: Debunking Fake News and False Claims using Evidence-Aware Deep Learning. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. 22–32.
[32] Chanthika Pornpitakpan. 2004. The persuasiveness of source credibility: A critical review of five decades’ evidence. Journal of applied social psychology 34, 2 (2004), 243–281.
[33] Martin Potthast, Johannes Kiesel, Kevin Reinartz, Janek Bevendorff, and Benno Stein. 2018. A Stylometric Inquiry into Hyperpartisan and Fake News. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, 231–240.
[34] Hannah Rashkin, Eunsol Choi, Jin Yea Jang, Svitlana Volkova, and Yejin Choi. 2017. Truth of varying shades: Analyzing language in fake news and political fact-checking. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. 2931–2937.
[35] Danilo Jimenez Rezende, Shakir Mohamed, and Daan Wierstra. 2014. Stochastic Backpropagation and Approximate Inference in Deep Generative Models. In Proceedings of the 31st International Conference on Machine Learning (Proceedings of Machine Learning Research), Eric P. Xing and Tony Jebara (Eds.), Vol. 32. PMLR, Beijing, China, 1278–1286.
[36] Victoria Rubin, Niall Conroy, Yimin Chen, and Sarah Cornwell. 2016. Fake news or truth? using satirical cues to detect potentially misleading news. In Proceedings of the Second Workshop on Computational Approaches to Deception Detection. 7–17.
[37] Natali Ruchansky, Sungyong Seo, and Yan Liu. 2017. Csi: A hybrid deep model for fake news detection. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management. ACM, 797–806.
[38] Nitish Srivastava. 2013. Improving neural networks with dropout. University of Toronto 182 (2013), 566.
[39] Robert Thomson, Naoya Ito, Hinako Suda, Fangyu Lin, Yafei Liu, Ryo Hayasaka, Ryuzo Isochi, and Zian Wang. 2012. Trusting tweets: The Fukushima disaster and information source credibility on Twitter. In Proceedings of the 9th International ISCRAM Conference. Vancouver: Simon Fraser University, 1–10.
[40] Shawn Tseng and BJ Fogg. 1999. Credibility and computing technology. Commun. ACM 42, 5 (1999), 39–44.
[41] Chong Wang and David M. Blei. 2013. Variational Inference in Nonconjugate Models. J. Mach. Learn. Res. 14, 1 (April 2013), 1005–1031.
[42] William Yang Wang. 2017. “Liar, Liar Pants on Fire”: A New Benchmark Dataset for Fake News Detection. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). Association for Computational Linguistics, 422–426. https://doi.org/10.18653/v1/P17-2067
[43] Aleksander Wawer, Radoslaw Nielek, and Adam Wierzbicki. 2014. Predicting webpage credibility using linguistic features. In Proceedings of the 23rd international conference on world wide web. ACM, 1135–1140.
[44] Fan Yang, Yang Liu, Xiaohui Yu, and Min Yang. 2012. Automatic detection of rumor on Sina Weibo. In Proceedings of the ACM SIGKDD Workshop on Mining Data Semantics. ACM, 13.
[45] Yang Yang, Lei Zheng, Jiawei Zhang, Qingcai Cui, Zhoujun Li, and Philip S Yu. 2018. TI-CNN: Convolutional Neural Networks for Fake News Detection. arXiv preprint arXiv:1806.00749 (2018).
[46] Qiang Zhang, Shangsong Liang, Aldo Lipani, Zhaochun Ren, and Emine Yilmaz. 2019. From Stances’ Imbalance to Their Hierarchical Representation and Detection. In Companion Proceedings of the The Web Conference 2019. ACM Press.
[47] Qiang Zhang, Emine Yilmaz, and Shangsong Liang. 2018. Ranking-based Method for News Stance Detection. In Companion Proceedings of the The Web Conference 2018. ACM Press.
[48] Arkaitz Zubiaga, Maria Liakata, Rob Procter, Geraldine Wong Sak Hoi, and Peter Tolmie. 2016. Analysing how people orient to and spread rumours in social media by looking at conversational threads. PloS one 11, 3 (2016), e0150989.
... Stances can be categorized as supportive, opposing, and neutral, which can be used to infer statement veracity.As the attention mechanism has led to improved performance in various areas [35,36,38], this observation motivated the development of an attention-based recurrent neural network (RNN) model. Zhang et al. [37] propose a probabilistic deep learning model to utilize user replies as auxiliary evidence. Yang et al. [31] consider statement veracity and user credibility as latent variables to predict their stances towards statement veracity. ...
... RQ4 Ablation study: what is the influence of the variable on the model's performance? For evaluation purposes, we use three well-established benchmark datasets, i.e., Twitter15, Twitter16 [16] and Pheme [37]. Statement veracity is manually annotated by fact-checkers. ...
Preprint
Full-text available
The quality of digital information on the web has been disquieting due to the lack of careful manual review. Consequently, a large volume of false textual information has been disseminating for a long time since the prevalence of social media. The potential negative influence of misinformation on the public is a growing concern. Therefore, it is strongly motivated to detect online misinformation as early as possible. Few-shot-few-clue learning applies in this misinformation detection task when the number of annotated statements is quite few (called few shots) and the corresponding evidence is also quite limited in each shot (called few clues). Within the few-shot-few-clue framework, we propose a Bayesian meta-learning algorithm to extract the shared patterns among different topics (i.e.different tasks) of misinformation. Moreover, we derive a scalable method, i.e., amortized variational inference, to optimize the Bayesian meta-learning algorithm. Empirical results on three benchmark datasets demonstrate the superiority of our algorithm. This work focuses more on optimizing parameters than designing detection models, and will generate fresh insights into data-efficient detection of online misinformation at early stages.
... Various methods have been used to tackle the misinformation problem. Content-based misinformation analysis models apply natural language processing tools to the text content of claims [23]. Alone, content-based models fail to trace the dynamics of spread for tasks such as early detection or spread forecasting. ...
Conference Paper
Misinformation takes the form of a false claim under the guise of fact. It is necessary to protect social media against misinformation by means of effective misinformation detection and analysis. To this end, we formulate misinformation propagation as a dynamic graph, then extract the temporal evolution patterns and geometric features of the propagation graph based on Temporal Point Processes (TPPs). TPPs provide the appropriate modelling framework for a list of stochastic, discrete events. In this context, that is a sequence of social user engagements. Furthermore, we forecast the cumulative number of engaged users based on a power law. Such forecasting capabilities can be useful in assessing the threat level of misinformation pieces. By jointly considering the geometric and temporal propagation patterns, our model has achieved comparable performance with state-of-the-art baselines on two well known datasets.
... A fraction of the PHEME dataset was used in the RumourEval task [19], having only 325 threads of conversations and 145 claims from tweets labeled as true, 74 as false and 106 as unverified. Misinformation detection methods operating both on PHEME and on the RumourEval data sets used either a sifted multi-task learning model with a shared structure for misinformation and stance detection [20], Bayesian Deep Learning models [21] or Deep Markov Random Fields [22]. However, none of these benchmark datasets contain any misinformation about COVID-19 or the vaccines used to protect against it. ...
Article
Enormous hope in the efficacy of vaccines became recently a successful reality in the fight against the COVID-19 pandemic. However, vaccine hesitancy, fueled by exposure to social media misinformation about COVID-19 vaccines became a major hurdle. Therefore, it is essential to automatically detect where misinformation about COVID-19 vaccines on social media is spread and what kind of misinformation is discussed, such that inoculation interventions can be delivered at the right time and in the right place, in addition to interventions designed to address vaccine hesitancy. This paper is addressing the first step in tackling hesitancy against COVID-19 vaccines, namely the automatic detection of known misinformation about the vaccines on Twitter, the social media platform that has the highest volume of conversations about COVID-19 and its vaccines. We present CoVaxLies, a new dataset of tweets judged relevant to several misinformation targets about COVID-19 vaccines on which a novel method of detecting misinformation was developed. Our method organizes CoVaxLies in a Misinformation Knowledge Graph as it casts misinformation detection as a graph link prediction problem. The misinformation detection method detailed in this paper takes advantage of the link scoring functions provided by several knowledge embedding methods. The experimental results demonstrate the superiority of this method when compared with classification-based methods, widely used currently.
... Some works include rich user information and describe the type of user-article interaction. Users are represented by user-generated texts such as their comment/reply to a news article or tweet [23,29,37] and the retweet in which they comment on the original tweet [32]. Although such user representations contain a user's stance and opinion towards an article, users are now merely represented by a single short text that is related to the article in question. ...
Preprint
User-generated content (e.g., tweets and profile descriptions) and shared content between users (e.g., news articles) reflect a user's online identity. This paper investigates whether correlations between user-generated and user-shared content can be leveraged for detecting disinformation in online news articles. We develop a multimodal learning algorithm for disinformation detection. The latent representations of news articles and user-generated content allow that during training the model is guided by the profile of users who prefer content similar to the news article that is evaluated, and this effect is reinforced if that content is shared among different users. By only leveraging user information during model optimization, the model does not rely on user profiling when predicting an article's veracity. The algorithm is successfully applied to three widely used neural classifiers, and results are obtained on different datasets. Visualization techniques show that the proposed model learns feature representations of unseen news articles that better discriminate between fake and real news texts.
... Besides bias detection, rumor and fake news detection are also hot topics employing embedding-based representation. They usually use RNN models as basic frameworks to encode the text representation [134,177], with VAE [178], GAN [179], and Bayesian model [180] further improving the performance. Detection of other misinformation and misbehavior, such as toxicity triggers [181], abusive language [139], and hate speech [137], applies deep neural networks to obtain the text representation as well. ...
Article
Computational Social Science (CSS), aiming at utilizing computational methods to address social science problems, is a recently emerging and fast-developing field. The study of CSS is data-driven and significantly benefits from the availability of online user-generated content and social networks, which contain rich text and network data for investigation. However, these large-scale and multi-modal data also present researchers with a great challenge: how to represent data effectively to mine the meanings we want in CSS? To explore the answer, we give a thorough review of data representations in CSS for both text and network. Specifically, we summarize existing representations into two schemes, namely symbol-based and embedding-based representations, and introduce a series of typical methods for each scheme. Afterwards, we present the applications of the above representations based on the investigation of more than 400 research articles from 6 top venues involved with CSS. From the statistics of these applications, we unearth the strength of each kind of representation and discover the trend that embedding-based representations are emerging and obtaining increasing attention over the last decade. Finally, we discuss several key challenges and open issues for future directions. This survey aims to provide a deeper understanding and more advisable applications of data representations for CSS researchers.
... Metadata-based methods focus on extracting features surrounding sources [27], posts [28], [29], comments [9], users [30], [31], and propagation networks [8] for fake news detection. Concretely, the rapid development of social media has given every anonymous user the opportunity to become a publisher of information, which has greatly increased the measurement complexity of the credibility of information sources, making it difficult to obtain better performance for the credibility of source-based methods. ...
Article
Existing data-driven approaches typically capture credibility-indicative representations from relevant articles for fake news detection, such as skeptical and conflicting opinions. However, these methods still have several drawbacks: 1) due to the difficulty of collecting fake news, the capacity of the existing datasets is relatively small; and 2) there is considerable unverified news that lacks conflicting voices in relevant articles, which makes it difficult for the existing methods to identify its credibility. In particular, the differences between true and fake news are not limited to whether there are conflict features in their relevant articles, but also include more extensive hidden differences at the linguistic level, such as emotional expression (like extreme emotion in fake news), writing style (like the shocking title in clickbait), and so on; existing methods struggle to fully capture these differences. To capture more general and wide-ranging differences between true and fake news, in this paper we propose a Category-controlled Encoder-Decoder model (CED) that works directly from the different categories of news itself to generate examples with category-differentiated features and extend the dataset capacity, achieving a data enhancement effect and thus enhancing fake news detection. The experimental results on three datasets demonstrate the superiority of CED.
... A fraction of the PHEME dataset was used in the RumourEval task [18], having only 325 threads of conversations and 145 claims from tweets labeled as true, 74 as false and 106 as unverified. Misinformation detection methods operating both on PHEME and on the RumourEval data sets used either a sifted multi-task learning model with a shared structure for misinformation and stance detection [19], Bayesian Deep Learning models [20] or Deep Markov Random Fields [21]. However, none of these benchmark datasets contain any misinformation about COVID-19 or the vaccines used to protect against it. ...
Preprint
The enormous hope placed in the efficacy of vaccines recently became a successful reality in the fight against the COVID-19 pandemic. However, vaccine hesitancy, fueled by exposure to social media misinformation about COVID-19 vaccines, became a major hurdle. Therefore, it is essential to automatically detect where misinformation about COVID-19 vaccines is spread on social media and what kind of misinformation is discussed, such that inoculation interventions can be delivered at the right time and in the right place, in addition to interventions designed to address vaccine hesitancy. This paper addresses the first step in tackling hesitancy against COVID-19 vaccines, namely the automatic detection of misinformation about the vaccines on Twitter, the social media platform that has the highest volume of conversations about COVID-19 and its vaccines. We present CoVaxLies, a new dataset of tweets judged relevant to several misinformation targets about COVID-19 vaccines, on which a novel method of detecting misinformation was developed. Our method organizes CoVaxLies in a Misinformation Knowledge Graph, as it casts misinformation detection as a graph link prediction problem. The misinformation detection method detailed in this paper takes advantage of the link scoring functions provided by several knowledge embedding methods. The experimental results demonstrate the superiority of this method when compared with the classification-based methods that are currently widely used.
... Ignoring the uncertainty caused by unreliable relations could lead to lacking robustness and make it risky for rumor detection. Inspired by valuable research (Zhang et al., 2019a) that modeled uncertainty caused by finite available textual contents, this paper makes the first attempt to consider the uncertainty caused by unreliable relations in the propagation structure for rumor detection. ...
Conference Paper
Stance detection has gained increasing interest from the research community due to its importance for fake news detection. The goal of stance detection is to categorize the overall position of a subject towards an object into one of four classes: agree, disagree, discuss, and unrelated. One of the major problems faced by current machine learning models used for stance detection is caused by a severe class imbalance among these classes. Hence, most models fail to correctly classify instances that fall into minority classes. In this paper, we address this problem by proposing a hierarchical representation of these classes, which combines the agree, disagree, and discuss classes under a new related class. Further, we propose a two-layer neural network that learns from this hierarchical representation and controls the error propagation between the two layers using the Maximum Mean Discrepancy regularizer. Compared with conventional four-way classifiers, this model has two advantages: (1) the hierarchical architecture mitigates the class imbalance problem; (2) the regularization helps the model better discern between the related and unrelated stances. Extensive experimentation demonstrates state-of-the-art accuracy of the proposed model for stance detection.
Conference Paper
Automatic rumor detection is technically very challenging. In this work, we try to learn discriminative features from tweet content by following their non-sequential propagation structure and generate more powerful representations for identifying different types of rumors. We propose two recursive neural models based on bottom-up and top-down tree-structured neural networks for rumor representation learning and classification, which naturally conform to the propagation layout of tweets. Results on two public Twitter datasets demonstrate that our recursive neural models 1) achieve much better performance than state-of-the-art approaches; 2) demonstrate superior capacity for detecting rumors at a very early stage.
Conference Paper
A valuable step towards news veracity assessment is to understand the stance of different information sources, a process known as stance detection. Specifically, stance detection aims to detect four kinds of stances ("agree", "disagree", "discuss" and "unrelated") of the news towards a claim. Existing methods tried to tackle the stance detection problem with classification-based algorithms. However, classification-based algorithms make a strong assumption that there is a clear distinction between any two stances, which may not hold in the context of stance detection. Accordingly, we frame the detection problem as a ranking problem and propose a ranking-based method to improve detection performance. Compared with the classification-based methods, the ranking-based method compares the true stance and false stances and maximizes the difference between them. Experimental results demonstrate the effectiveness of our proposed method.