Reply-Aided Detection of Misinformation
via Bayesian Deep Learning
Qiang Zhang
University College London
London, United Kingdom
qiang.zhang.16@ucl.ac.uk
Aldo Lipani
University College London
London, United Kingdom
aldo.lipani@ucl.ac.uk
Shangsong Liang
Sun Yat-sen University
Guangzhou, China
liangshangsong@gmail.com
Emine Yilmaz
University College London
London, United Kingdom
emine.yilmaz@ucl.ac.uk
ABSTRACT
Social media platforms are rife with misinformation, and its potential negative influence on the public is a growing concern. This concern has drawn the attention of the research community to developing mechanisms to detect misinformation. The task of misinformation detection consists of classifying whether a claim is True or False. Most research concentrates on developing machine learning models, such as neural networks, that output a single value in order to predict the veracity of a claim. One of the major problems faced by these models is their inability to represent the uncertainty of the prediction, which is due to the incomplete or finite information available about the claim being examined. We address this problem by proposing a Bayesian deep learning model. The Bayesian model outputs a distribution used to represent both the prediction and its uncertainty. In addition to the claim content, we also encode auxiliary information given by people's replies to the claim. First, the model encodes a claim to be verified, and generates a prior belief distribution from which we sample a latent variable. Second, the model encodes all the people's replies to the claim in temporal order through a Long Short-Term Memory network in order to summarize their content. This summary is then used to update the prior belief, generating the posterior belief. Moreover, in order to train this model, we develop a Stochastic Gradient Variational Bayes algorithm to approximate the analytically intractable posterior distribution. Experiments conducted on two public datasets demonstrate that our model outperforms the state-of-the-art detection models.
CCS CONCEPTS
• Information systems → Web mining; • Computing methodologies → Information extraction.
KEYWORDS
misinformation detection, Bayesian analysis, deep learning
This paper is published under the Creative Commons Attribution 4.0 International
(CC-BY 4.0) license. Authors reserve their rights to disseminate the work on their
personal and corporate Web sites with the appropriate attribution.
WWW ’19, May 13–17, 2019, San Francisco, CA, USA
© 2019 IW3C2 (International World Wide Web Conference Committee), published under Creative Commons CC-BY 4.0 License.
ACM ISBN 978-1-4503-6674-8/19/05.
https://doi.org/10.1145/3308558.3313718
ACM Reference Format:
Qiang Zhang, Aldo Lipani, Shangsong Liang, and Emine Yilmaz. 2019.
Reply-Aided Detection of Misinformation via Bayesian Deep Learning.
In Proceedings of the 2019 World Wide Web Conference (WWW ’19), May
13–17, 2019, San Francisco, CA, USA. ACM, New York, NY, USA, 11 pages.
https://doi.org/10.1145/3308558.3313718
1 INTRODUCTION
Although digital news consumption has increased in the last decade, the growing amount of misinformation and fake news has certainly not improved its quality. Different from traditional media, where news is published by reputable organizations, online news on social media platforms such as Facebook and Twitter is shared by individuals and/or organizations without careful checking or with malicious intent. In Figure 1 we show a false claim posted on Twitter about an alleged shooting in Ottawa. While some users showed surprise and asked for further clarification in their replies, other users believed the claim and re-tweeted it as if it were true. Such misinformation, when spread on a large scale, can influence the public by depicting a false picture of reality. Hence, detecting misinformation effectively has become one of the biggest challenges faced by social media platforms [17, 27].
A valuable attempt at rectifying this epidemic of false claims has been made by some news websites, such as Snopes¹, Politifact², and Emergent³, which have employed professional journalists to manually check and verify every potentially false news story. However, such a manual approach is very expensive and far too slow to check all the claims generated daily on the web. Thus, automatic tools are in great need to speed up this verification process.
In this paper, we tackle the automatic misinformation detection task, which consists in classifying whether a claim is True or False. Most existing models employ feature engineering or deep learning to extract features from claims' content and from auxiliary information such as people's replies. However, these models generate deterministic mappings to capture the difference between true and false claims. A major limitation of these models is their inability to represent uncertainty caused by the incomplete or finite data available about the claim being examined.
1 https://www.snopes.com/
2 https://www.politifact.com/
3 http://www.emergent.info/
[Figure 1 shows a screenshot of a tweet by CTV News (@CTVNews, 22 Oct 2014): "Police have clarified that there were two shootings in Ottawa today, not three: at the War Memorial and Parliament Hill." (231 retweets, 72 likes), followed by user replies, e.g., "@CTVNews surprising to learn not all security is armed at the Hill..Y Not ??" and "@CTVNews @ctvsaskatoon so what happened at Rideau? Nothing?".]
Figure 1: An example of a false claim and people's replies to it. From the replies, 231 users chose to trust the claim and re-tweeted it as if it was true, while only 4 users asked for further clarifications.
We address this problem by proposing a Bayesian deep learning model, which incorporates stochastic factors to capture complex relationships between the latent distribution and the observed variables. The proposed model makes use of the claim content and the replies' content. First, to represent the claim content, we employ a neural model to extract textual features from claims. To deal with the ambiguity of the language used in claims and obtain salient credibility information, the model generates a latent distribution based on the extracted linguistic features. Since no auxiliary information has been used so far, we interpret this latent distribution as a prior belief of the claim being true. Second, to extract auxiliary information from the content of people's replies, we rank all the replies to the claim in temporal order, and summarize them using a Long Short-Term Memory neural network (LSTM). Finally, after updating the prior belief with the aid of the LSTM output, the model computes the veracity prediction and its uncertainty. This updated prior belief distribution is interpreted as the posterior belief.
In order to train the proposed Bayesian deep learning model, given the analytical intractability of the posterior distribution, we develop a Stochastic Gradient Variational Bayes (SGVB) algorithm. A tractable Evidence Lower BOund (ELBO) objective function of our model is derived to approximate the intractable distribution. The model is optimized in the direction of maximizing the ELBO objective function.
Our model inherits two advantages: first, the model incorporates a latent distribution, which enables it to represent uncertainty and promotes robustness; second, the Bayesian model formulates all of its prior knowledge about a claim being examined in the form of a prior, which can be updated by added auxiliary information, generating more accurate detection results. To sum up, the proposed model advances state-of-the-art methods in four aspects:
(1) An effective representation of uncertainty due to incomplete/finite available data;
(2) A temporal order-based approach to extract auxiliary information from people's replies;
(3) An SGVB algorithm to infer latent distributions;
(4) A systematic experimentation of our model on two real-world datasets.
The remainder of the paper is organized as follows: § 2 summarizes the related work; § 3 defines the misinformation detection task; § 4 details the proposed Bayesian deep learning model; § 5 derives the Stochastic Gradient Variational Bayes optimization algorithm; § 6 describes the datasets used and the experimental setup; § 7 is devoted to the experimental results; and § 8 concludes the paper.
2 RELATED WORK
Misinformation has existed for centuries in different forms of media, such as printed newspapers and television. Recently, online social media platforms have been suffering from the same issues. Recent work on misinformation detection has tried to understand the differences between true and false claims in various aspects: claim content, information source, multimedia such as affiliated images and videos, and other users' engagement.
2.1 Textual Content
The text of a claim can provide linguistic features that help predict its veracity. Since misinformation and false claims are created for financial or political purposes rather than to report an objective event, they often contain opinionated or inflammatory language [6]. In order to reveal linguistic differences between true and false claims, lexical and syntactic features at the character, word, sentence and document level have been exploited [1, 11, 33, 36]. Wawer et al. [43] compute psycholinguistic features using a bag-of-words paradigm. Rashkin et al. [34] compare the language of true claims with that of satire, hoaxes, and propaganda to find linguistic characteristics of untrustworthy text. Kakol et al. [21] construct a content credibility corpus and examine a list of language factors that might affect web content credibility, based on which a predictive model is developed. Bountouridis et al. [3] compare heterogeneous articles of the same story and reveal that pieces of information that are cross-referenced are more likely to be credible. Derczynski et al. [9] extract features from claim tweets including bag-of-words, presence of URLs, and presence of hashtags. A Support Vector Machine (SVM) is then used to distinguish between true and false claims. Guacho et al. [14] leverage a tensor decomposition to derive concise claim embeddings that capture contextual information from each claim, and use these embeddings to create a claim-by-claim graph on which the labels propagate. Textual content has been empirically proven to be a strong indicator of claim veracity, and thus can be used as a prior probability.
2.2 Source Credibility Analysis
The credibility analysis of the sources of a claim provides important auxiliary information. As misinformation is usually published by low-credibility individuals or automated bots, credibility plays a crucial role in message communication [18, 32]. Accurate and timely discrimination of such accounts inhibits the proliferation of misinformation at an early stage. Tseng and Fogg [40] identify two components of source credibility, namely trustworthiness and expertise. Trustworthiness is generally taken to mean truthful, unbiased and well intentioned. Expertise instead is understood as knowledgeable, experienced and competent. Thus, features that can reveal the trustworthiness and expertise of information sources are strong indicators of source credibility. With the aid of information sources, Thomson et al. [39] examine the credibility of tweets related to the Fukushima Daiichi nuclear disaster in Japan. They found that tweets from highly credible institutions and individuals are mostly correct. Useful account features can be derived from account demographics, such as the completeness of personal information and the number of followers and followees [5]. Besides, aggregated features over a group of accounts are indicative, since spreaders of true and false claims might come from different communities [44]; examples include the percentage of verified user accounts [28] and the average number of followers [26]. However, account demographics can easily be manipulated, making non-credible sources resemble credible ones.
2.3 Multimedia Features
Multimedia features have been shown to be an important vehicle for misinformation-based propaganda [4]. As we have characterized, online misinformation exploits the individual vulnerabilities of people and thus often relies on sensational or even fake images to provoke anger or other emotional responses in consumers. Visual-based features are extracted from images and videos to capture the distinct characteristics of misinformation. Fake images have been identified based on various user-level and tweet-level hand-crafted features [15]. Recently, various visual and statistical features have been extracted for news verification [20]. Yang et al. [45] develop a convolutional neural network to extract text and visual features simultaneously. Visual features include clarity score, coherence score, diversity score, and clustering score. Statistical features include count, image ratio, multi-image ratio, hot image ratio, long image ratio, etc. This approach suffers from the problem that some misinformation on social media does not contain multimedia content.
2.4 Social Engagement
The news spreading process over time on social media involves user-driven engagement. Auxiliary information can also be derived from such engagement to improve claim veracity detection. Ma et al. [29] propose to learn discriminative features by following the non-sequential propagation structure of tweets. A top-down and a bottom-up recursive neural network are proposed to predict claim veracity. Glenski et al. [12] seek to better understand how users react to trusted and deceptive news sources across two popular, and very different, social media platforms. Significant differences have been observed in the speed and the type of reactions between trusted and deceptive news sources on Twitter, but far smaller differences on Reddit. People react to a claim by expressing their stances or emotions in social media posts. Stances can be categorized as supportive, opposing, and neutral, and can be used to infer claim veracity [19, 46, 47]. Kochkina et al. [25] propose a neural multi-task model that leverages the relationship between veracity detection and stance detection in a joint learning setup. Another common post feature is the topic distribution that indicates the central point of relevant affairs, which is derived by topic models [2]. Post features are expanded in two ways: via aggregation with relevant posts for a specific affair, and via the temporal evolution of post features. The first way relies on the "wisdom of crowds" to locate potential misinformation [5], while the second way captures the periodic fluctuations of shock cycles [26] or temporal patterns of user activities, such as the number of engaged users and the time intervals between engagements [37]. Yet, semantic coherence and temporal changes between users' replies are not fully explored by existing methods.
3 PROBLEM STATEMENT
The task of misinformation detection is to predict the veracity of claims, given their content and people's replies to them.
Let C = {c_1, c_2, ..., c_N} be a set of N claims. The claim c_i is commented on by a set of M user replies D_i = {d_{i,1}, d_{i,2}, ..., d_{i,M}}. We use y_i to denote the binary veracity label of the claim c_i, which can be either y_i = 1 for true or y_i = 0 for false. The tuple of a claim and people's replies, i.e., {c_i, D_i}, forms a data instance to predict the claim veracity y_i. For the sake of clarity, in the following we will omit the subscript i when describing a single instance: {c, D, y}.
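For concreteness, a data instance can be represented as a simple record; this is only an illustrative sketch, and the field names are ours rather than the paper's:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Instance:
    """One data instance {c, D, y}; field names are illustrative."""
    claim: str           # the claim text c
    replies: List[str]   # the user replies D, sorted by posting time
    label: int           # the veracity label y: 1 for true, 0 for false
```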
4 BAYESIAN DEEP LEARNING
In this section, we present our proposed Bayesian deep learning model, which effectively integrates the claim and people's replies. We will first introduce how to encode claim content with deep learning and generate a latent distribution that is interpreted as a prior belief of claim veracity. We then describe the temporally ordered approach to encode people's replies, which captures semantic variation along the time line. Finally, we correct the prior belief with the aid of people's replies, the result of which is interpreted as the posterior belief of claim veracity. Figure 2 describes the proposed model.
4.1 Encoding a Claim
As content is a strong indicator of claim veracity [42], we apply deep learning to extract linguistic features from the claim c. To avoid the ambiguity of claims and obtain salient credibility information, we generate a latent distribution based on the extracted linguistic features. The output of this claim encoder is the prior belief of the veracity of the claim.
Let each claim c be a sequence of discrete words or tokens, i.e., c = [w_1, w_2, ..., w_l], where w_l ∈ R^d is a d-dimensional word embedding vector. Based on the sequence of word embeddings, textual features are extracted via a Bidirectional Long Short-Term Memory (BiLSTM) neural network [13]. The BiLSTM captures long and short semantic dependencies both from previous time steps and from future time steps via forward and backward states.
[Figure 2 sketches the architecture: in the claim encoder, the word embeddings of the claim c ("Police have clarified ... Parliament Hill") pass through a BiLSTM to h_c, then through an MLP with linear layers l1 and l2 to the latent distribution N(µ, diag(σ²)) from which z is sampled; in the replies encoder, each reply d_1, ..., d_M ("Surprising ... ?", ..., "wow") passes through the BiLSTM, and the reply summaries pass through an LSTM to h_D; an output MLP combines z and h_D into y.]
Figure 2: Framework of the Bayesian deep learning model. The framework consists of two parts, the claim encoder (§ 4.1) and the replies encoder (§ 4.2), the concatenation of which determines the posterior belief of claim veracity. Blocks and nodes represent computation modules and variables. Grey nodes are observed variables while blank nodes are latent variables (similarly with Figure 3). Note that blocks of the same color denote the same module.
The BiLSTM takes as input c, converts the sequence of word embeddings into a dense representation, and outputs the concatenation of two hidden states capturing past and future information:

h_c = BiLSTM(c),   (1)

where h_c denotes the concatenated hidden states.
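To make the encoder concrete, the following is a minimal PyTorch sketch of Eq. 1. The hidden dimension (30) follows § 6.4; the embedding dimension and all names are illustrative assumptions, not the authors' released code:

```python
import torch
import torch.nn as nn

class ClaimEncoder(nn.Module):
    """Sketch of Eq. 1: h_c = BiLSTM(c)."""
    def __init__(self, emb_dim=300, hidden_dim=30):
        super().__init__()
        self.bilstm = nn.LSTM(emb_dim, hidden_dim,
                              batch_first=True, bidirectional=True)

    def forward(self, claim_emb):
        # claim_emb: (batch, seq_len, emb_dim) pre-trained word embeddings
        _, (h_n, _) = self.bilstm(claim_emb)
        # concatenate the final forward and backward hidden states
        return torch.cat([h_n[0], h_n[1]], dim=-1)  # (batch, 2 * hidden_dim)
```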
To avoid the ambiguity of claims, instead of applying a deterministic non-linear transformation, we generate a latent distribution, from which we sample a latent stochastic variable z. To embed linguistic information into the latent variable, we set the latent variable to be conditional on h_c:

z ∼ p_θ(z | h_c),   (2)

where p is a latent distribution and θ denotes the non-linear transformation of h_c that generates the parameters of p. This non-linear transformation is essential to capture higher-level representations of h_c; we implement it via a Multi-Layer Perceptron (MLP).
We assume that the latent variable z is continuous and follows a multivariate Gaussian distribution. The variable z is parameterized as follows:

p_θ(z | h_c) = N(z | µ_θ, diag(σ_θ²)),   (3)

where µ_θ and diag(σ_θ²) are the mean and the covariance matrix of the multivariate Gaussian distribution. Since the variable z is conditional on the claim hidden states h_c, we derive these two parameters of the Gaussian distribution from h_c through a deep neural network:

π_θ = f_θ(h_c),   (4)

µ_θ = l1(π_θ),   ln(σ_θ) = l2(π_θ),   (5)

where f_θ denotes an MLP, and l1 and l2 denote two Linear Transformations (LTs). Since an LT can generate negative values, we exponentiate the result of l2 to produce σ_θ.
In order to make µ_θ and σ_θ differentiable and to backpropagate the loss through the latent distribution (p), the following reparameterization trick is used:

z = ϵ · σ_θ + µ_θ,   where ϵ ∼ N(0, I),   (6)

where 0 is a vector of zeros and I is the identity matrix. By making use of the latent variable (z), our model is able to capture complex noisy patterns in the data.
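Eqs. 3–6 can be sketched as follows; the latent dimension (10) follows § 6.4, while the MLP width and module names are illustrative assumptions:

```python
import torch
import torch.nn as nn

class PriorNetwork(nn.Module):
    """Sketch of Eqs. 3-6: p_theta(z | h_c) = N(mu_theta, diag(sigma_theta^2))."""
    def __init__(self, in_dim=60, hidden_dim=30, z_dim=10):
        super().__init__()
        self.f_theta = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU())
        self.l1 = nn.Linear(hidden_dim, z_dim)  # produces the mean
        self.l2 = nn.Linear(hidden_dim, z_dim)  # produces ln(sigma)

    def forward(self, h_c):
        pi = self.f_theta(h_c)                    # Eq. 4
        mu, log_sigma = self.l1(pi), self.l2(pi)  # Eq. 5
        sigma = torch.exp(log_sigma)              # exponentiate so sigma > 0
        eps = torch.randn_like(sigma)             # epsilon ~ N(0, I)
        z = eps * sigma + mu                      # Eq. 6, reparameterization
        return z, mu, log_sigma
```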
4.2 Encoding People’s Replies
We now present the people's replies encoder, which obtains auxiliary information. This auxiliary information is claim-specific and is used to generate the posterior belief by correcting the prior belief of the claim veracity.

Replies on social media platforms are listed along the time line, as shown in Figure 1, where the earliest reply appears at the top of the list. The truth about an event can gradually become manifest as more evidence emerges; thus we assume that the latest replies tend to be more reliable and more important than the earlier replies. Based on this assumption, we design a two-layer recurrent neural network to encode replies: the first layer applies a BiLSTM to summarize the semantic information of each reply, and the second layer applies an LSTM to capture the temporal semantic variation of the replies.
Given a claim c commented on by a list of replies D = {d_1, ..., d_m, ..., d_M}, these replies are ranked based on their temporal order. The content of a reply d_m consists of a sequence of words d = [w_1, ..., w_k]. To project the claim and the replies into the same semantic space, we use the same pre-trained word embeddings for both claims and replies. Hence, w_k ∈ R^d is a d-dimensional vector, like the word embedding vectors used to encode the claim. For the sake of semantic coherence, we also employ the same BiLSTM to encode both the claim and its replies. Taking the reply d_m ∈ D as an example, the concatenation of the hidden states from the forward and backward directions is denoted as:

h_{d_m} = BiLSTM(d_m),   (7)

where h_{d_m} is the summary of the reply d_m.
In order to capture the semantic information of all replies, we sequentially input the concatenated hidden states of each reply into an LSTM. We use an LSTM rather than a BiLSTM because the former gives high weight to recent input, which matches our assumption about the relative importance of the latest replies. Specifically, the LSTM takes the hidden states of each reply as input in a sequential way:

h_D = LSTM(h_d^M),   (8)

where h_d^M = {h_{d_1}, ..., h_{d_m}, ..., h_{d_M}}.
The last hidden state h_D contains both the linguistic information of each reply and the semantic changes between replies along the time line. This h_D is then used to generate the posterior belief by correcting the prior belief of the claim veracity.
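A sketch of Eqs. 7 and 8 follows. For readability, the reply BiLSTM is a separate instance here, whereas in the model it shares its weights with the claim encoder; the names, the embedding dimension, and the single-claim batching are illustrative assumptions:

```python
import torch
import torch.nn as nn

class RepliesEncoder(nn.Module):
    """Sketch of Eqs. 7-8: a BiLSTM per reply, then an LSTM over reply summaries."""
    def __init__(self, emb_dim=300, hidden_dim=30):
        super().__init__()
        self.reply_bilstm = nn.LSTM(emb_dim, hidden_dim,
                                    batch_first=True, bidirectional=True)
        self.temporal_lstm = nn.LSTM(2 * hidden_dim, hidden_dim, batch_first=True)

    def forward(self, replies_emb):
        # replies_emb: list of M tensors, each (1, reply_len, emb_dim),
        # already sorted by posting time (earliest first)
        summaries = []
        for reply in replies_emb:
            _, (h_n, _) = self.reply_bilstm(reply)                 # Eq. 7
            summaries.append(torch.cat([h_n[0], h_n[1]], dim=-1))
        seq = torch.stack(summaries, dim=1)     # (1, M, 2 * hidden_dim)
        _, (h_n, _) = self.temporal_lstm(seq)   # Eq. 8
        return h_n[-1]                          # last hidden state h_D
```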
4.3 Veracity Modeling
In § 4.1, we developed a prior belief of the claim veracity. In this section we show how to correct this prior belief by including the claim's replies.

The posterior belief is generated by combining the claim and reply information via an MLP. The strong non-linearity of MLPs makes them suitable for finding complex relationships between the claim and its replies. Specifically, the MLP input is the latent claim variable z concatenated with the hidden state of the replies h_D:

y = MLP(z, h_D).   (9)

This is the final prediction of our Bayesian deep learning model for misinformation detection.
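A minimal sketch of Eq. 9 follows; the two-layer depth and width of the MLP are illustrative assumptions, as the paper does not specify them:

```python
import torch
import torch.nn as nn

class VeracityHead(nn.Module):
    """Sketch of Eq. 9: y = MLP(z, h_D)."""
    def __init__(self, z_dim=10, h_dim=30, hidden_dim=30):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(z_dim + h_dim, hidden_dim),
                                 nn.ReLU(),
                                 nn.Linear(hidden_dim, 1))

    def forward(self, z, h_D):
        logit = self.mlp(torch.cat([z, h_D], dim=-1))
        return torch.sigmoid(logit)  # probability that the claim is true
```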
5 OPTIMIZATION
The stochastic variables of our model are non-linear and non-conjugate [41]. Hence, the posterior distribution cannot be derived analytically. To approximate the posterior distribution, we construct an inference model parameterized by ϕ to approximate the intractable true posterior p_θ(z | h_c); we then derive an objective function to measure how well p_θ(z | h_c) is approximated; finally, we exploit the Stochastic Gradient Variational Bayes (SGVB) method [24, 35] to learn the inference model parameters ϕ together
with the generative model parameters θ. Figure 3 shows the graphical representation of the generative model and the inference model.

[Figure 3 depicts a directed graphical model, over a plate of N claims, with observed nodes y, h_c and h_D, latent node z, generative parameters θ and variational parameters ϕ.]
Figure 3: The directed graphical model. Grey nodes represent observed variables while blank nodes represent latent variables. Solid lines denote the generative model p_θ(z | h_c) p(y | z, h_D); dashed lines denote the variational approximation q_ϕ(z | y, h_c, h_D) to the intractable posterior p_θ(z | h_c). The variational parameters are learned together with the generative model parameters.
5.1 Inference Model
Following the neural variational inference approach [24], we construct an inference model (as in Figure 3) parameterized by ϕ to compute an approximate posterior distribution, called the variational distribution. Given the observed variables, we define a variational distribution q_ϕ(z | y, h_c, h_D) to approximate the true posterior distribution p_θ(z | h_c). As in the Variational Auto-Encoder (VAE) [24], and similarly to Eq. 3 for p_θ(z | h_c), the variational distribution is chosen to be a multivariate Gaussian distribution:

q_ϕ(z | y, h_c, h_D) = N(z | µ_ϕ, diag(σ_ϕ²)),   (10)

where µ_ϕ and diag(σ_ϕ²) are the mean and the covariance matrix of the multivariate Gaussian distribution. We use a deep neural network to derive these two parameters from the observed variables:

π_ϕ = f_ϕ(y, h_c, h_D),   (11)

µ_ϕ = l3(π_ϕ),   ln(σ_ϕ) = l4(π_ϕ),   (12)

where f_ϕ denotes an MLP, and l3 and l4 denote two LTs. Note that in the inference model we compute µ and ln σ from y, h_c and h_D, and not only from h_c as in the generative model.
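The inference network mirrors the prior network but conditions on all observed variables; here is a sketch of Eqs. 10–12 under the same illustrative assumptions, treating the label y as a one-dimensional float input:

```python
import torch
import torch.nn as nn

class InferenceNetwork(nn.Module):
    """Sketch of Eqs. 10-12: q_phi(z | y, h_c, h_D), the variational posterior."""
    def __init__(self, h_c_dim=60, h_D_dim=30, hidden_dim=30, z_dim=10):
        super().__init__()
        in_dim = 1 + h_c_dim + h_D_dim  # the label y plus both hidden states
        self.f_phi = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU())
        self.l3 = nn.Linear(hidden_dim, z_dim)  # produces mu_phi
        self.l4 = nn.Linear(hidden_dim, z_dim)  # produces ln(sigma_phi)

    def forward(self, y, h_c, h_D):
        pi = self.f_phi(torch.cat([y, h_c, h_D], dim=-1))  # Eq. 11
        return self.l3(pi), self.l4(pi)                    # Eq. 12
```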
5.2 Objective Function
In the following we derive the objective function of our Bayesian deep learning model following the variational principle. To maximize the log-likelihood ln p(y | h_c, h_D), we derive an Evidence Lower BOund (ELBO) objective function, which ensures a correct approximation of the true posterior. To simplify the notation in the derivation of the objective function, we make the following substitutions: p_θ(y) = p_θ(y | z, h_D), p_θ(z) = p_θ(z | h_c), and q_ϕ(z) = q_ϕ(z | y, h_c, h_D).
Algorithm 1: Optimization of the proposed model.
input: claim hidden states h_c, replies hidden states h_D, veracity labels y, the number of claims N, and the learning rate η.
1   begin
2       θ, ϕ ← initialize parameters;
3       repeat
4           randomly draw a minibatch of B claims;
5           for i = 1, ..., B do
6               z̃ ← randomly draw S samples from N(z | µ_ϕ, diag(σ_ϕ²));
7               L(θ, ϕ | y_i, h_{c_i}, h_{D_i}) ← compute the loss for one claim according to Eq. 14;
8           L(θ, ϕ | y, h_c, h_D) ← compute the loss for the full dataset according to Eq. 15;
9           θ ← θ − η · ∇_θ L(θ, ϕ | y, h_c, h_D);
10          ϕ ← ϕ − η · ∇_ϕ L(θ, ϕ | y, h_c, h_D);
11      until θ, ϕ converge;
12      return θ, ϕ;
The objective function is derived as follows:

ln p(y | h_c, h_D) = ln ∫ p_θ(y) p_θ(z) dz
                  ≥ ∫ q_ϕ(z) ln( (p_θ(z) / q_ϕ(z)) p_θ(y) ) dz
                  = E_{q_ϕ(z)}[ln p_θ(y)] − D_KL[q_ϕ(z) ∥ p_θ(z)]
                  = L(θ, ϕ | y, h_c, h_D),   (13)
where L(θ, ϕ | y, h_c, h_D) is the ELBO objective function. The second line of the derivation follows from Jensen's inequality [16]. Since the ELBO objective function is a lower bound of the log-likelihood ln p(y | h_c, h_D), its maximization also maximizes the log-likelihood.
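Since both q_ϕ(z) and p_θ(z) are diagonal Gaussians, the KL term of Eq. 13 has a closed form; a sketch, operating on the µ and ln σ outputs of the two illustrative networks above:

```python
import torch

def gaussian_kl(mu_q, log_sigma_q, mu_p, log_sigma_p):
    """Closed-form KL( N(mu_q, sigma_q^2) || N(mu_p, sigma_p^2) )
    for diagonal Gaussians, summed over the latent dimensions."""
    var_q = torch.exp(2.0 * log_sigma_q)
    var_p = torch.exp(2.0 * log_sigma_p)
    kl = (log_sigma_p - log_sigma_q
          + (var_q + (mu_q - mu_p) ** 2) / (2.0 * var_p) - 0.5)
    return kl.sum(dim=-1)
```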
5.3 Gradient Estimation
Large-scale inference requires minibatch optimization. Thus, we derive a minibatch-based SGVB estimator to differentiate and optimize the ELBO objective function L(θ, ϕ | y, h_c, h_D) with respect to both the inference parameters (ϕ) and the generative parameters (θ).

We compute the expectation part of the ELBO objective function through Monte Carlo estimation. Let the minibatch size be B and, for each claim c_i with i ∈ [1, B], let S samples z̃ ∼ q_ϕ be drawn from the variational posterior distribution. Given a subset of claims, we can construct an estimator of the ELBO objective function for the full dataset based on minibatches as follows:
full dataset based on mini-batches as follows:
L(θ,ϕ|yi,hci,hDi)=1
S
S
Õ
s=1hlogpθ(yi|˜
z(s)
,hDi)i
−DK L [qϕ(z))∥pθ(z)],(14)
==L(θ,ϕ|y,hc,hD) ≈ N
B
B
Õ
i=1
L(θ,ϕ|yi,hci,hDi),(15)
where L(θ, ϕ | y_i, h_{c_i}, h_{D_i}) denotes the estimate based on the i-th claim and N is the total number of claims. Algorithm 1 shows the minibatch gradient descent optimization process for both the generative (θ) and inference (ϕ) parameters. Note that the gradient steps in Algorithm 1 can easily be replaced with a more powerful optimizer such as the Adam algorithm [23].
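Putting Eqs. 14 and 15 together, one optimization step of Algorithm 1 might look as follows. This is a sketch, not the released implementation: prior_net, inf_net, head, and gaussian_kl are the illustrative modules defined above, each claim is processed with batch dimension 1, and Adam is substituted for plain gradient descent as the text suggests:

```python
import torch
import torch.nn.functional as F

def training_step(batch, prior_net, inf_net, head, optimizer, N, S=20):
    """One minibatch step of Algorithm 1: minimize the negative ELBO (Eqs. 14-15).
    batch: list of (y, h_c, h_D) tuples; y is a (1, 1) float tensor in {0., 1.}."""
    optimizer.zero_grad()
    loss = 0.0
    for y, h_c, h_D in batch:
        mu_q, log_sig_q = inf_net(y, h_c, h_D)  # variational posterior q_phi
        _, mu_p, log_sig_p = prior_net(h_c)     # prior p_theta(z | h_c)
        # Monte Carlo estimate of E_q[ ln p_theta(y | z, h_D) ] with S samples
        log_lik = 0.0
        for _ in range(S):
            z = mu_q + torch.exp(log_sig_q) * torch.randn_like(mu_q)
            log_lik = log_lik - F.binary_cross_entropy(head(z, h_D), y) / S
        elbo = log_lik - gaussian_kl(mu_q, log_sig_q, mu_p, log_sig_p).sum()
        loss = loss - elbo                      # maximizing ELBO = minimizing -ELBO
    (N / len(batch) * loss).backward()          # rescaling of Eq. 15
    optimizer.step()
```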
Although both q_ϕ(z | y, h_c, h_D) and p_θ(z | h_c) are modeled as parameterized Gaussian distributions, the former is an approximation of the latter that functions only during learning. The latter, instead, is the learned distribution from which samples are generated in order to classify claim veracity.
5.4 Prediction
After training, we compute the posterior distribution p_θ(z | h_c) through the generative network. The actual prediction of a claim's veracity is given by taking the expectation over S samples:

y = (1/S) Σ_{s=1}^{S} MLP(z^{(s)}, h_D),   (16)

where z^{(s)} denotes a sample drawn from the learned distribution p_θ(z | h_c).
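A sketch of Eq. 16, again with the illustrative modules above:

```python
import torch

@torch.no_grad()
def predict(h_c, h_D, prior_net, head, S=20):
    """Sketch of Eq. 16: average the MLP output over S samples of z."""
    probs = []
    for _ in range(S):
        z, _, _ = prior_net(h_c)    # sample z ~ p_theta(z | h_c)
        probs.append(head(z, h_D))
    return torch.stack(probs).mean(dim=0)  # expected veracity probability
```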
6 EXPERIMENT SETUP
In this section we start by introducing six research questions. We then present the methodology used to answer them. The software used to run the experiments in this paper is available on the website of the first author.
6.1 Research Questions
We seek to answer the following six research questions, which will guide the remainder of the paper:
RQ1 Does our model outperform the state-of-the-art misinformation detection baselines?
RQ2 Does the incorporation of the latent distribution outperform a deterministic counterpart?
RQ3 Does the auxiliary information from people's replies produce a more accurate posterior belief of claim veracity?
RQ4 Is the temporal order better than a random one when encoding replies?
RQ5 Is it beneficial to incorporate a latent variable to encode replies?
RQ6 How does the dimension of the latent variable z affect the model's performance?
6.2 Datasets
In order to compare the performance of our proposed model against the baselines, we experimented with two real-world benchmark datasets, the RumourEval [9] and the Pheme [48] datasets. Both datasets contain Twitter conversation threads about news (like the example shown in Figure 1). A conversation thread consists of a tweet making a true or false claim, and branches of people's replies expressing their opinions about it. A summary of the dataset statistics is available in Table 1.
Table 1: Statistics of the datasets.

Subset      Veracity  RumourEval           Pheme
                      #Claims  #Replies    #Claims  #Replies
Training    True           83     1,949        861    24,438
            False          70     1,504        625    17,676
            Total         153     3,453      1,468    42,114
Validation  True           10       101         95     1,154
            False          12       141        115     1,611
            Total          22       242        210     2,765
Testing     True            9       412        198     3,077
            False          12       437        219     3,265
            Total          21       849        417     6,342
The RumourEval dataset was developed for the SemEval-2017 Task 8 competition. This dataset consists of 325 source tweets and 5,568 user reply tweets. The veracity of each tweet can be true (45%), false (23%) or unverified (32%). Since we aim only to distinguish true and false claims, we filter out the unverified tweets. We divide the filtered dataset into a training subset, a validation subset and a testing subset. The training subset contains 153 claims with 3,453 replies, the validation subset contains 22 claims with 242 replies, and the test subset contains 21 claims with 849 replies.

The Pheme dataset was constructed to help understand how users treat online rumours before and after the news is found to be true or false. As with the RumourEval dataset, we divide the Pheme dataset into a training subset, a validation subset and a testing subset. Specifically, 70% of the claims are randomly selected as training instances, 10% as validation instances and the rest as testing instances. Users' replies are divided according to the claims.
6.3 Evaluation Measures
The misinformation detection task is a binary classification task. Such tasks are commonly evaluated with the following evaluation measures: Accuracy, F1, Precision, and Recall.
Accuracy is a common evaluation measure for classification tasks. However, it is less reliable when datasets suffer from class imbalance. The evaluation measures Precision, Recall and F1 complement Accuracy because they do not suffer from this problem.
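For reference, the four measures can be computed from predicted veracity probabilities as follows; the 0.5 decision threshold is an illustrative assumption:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

def evaluate(y_true, y_prob, threshold=0.5):
    """Compute Accuracy, Precision, Recall, and F1 for binary veracity labels."""
    y_pred = [int(p >= threshold) for p in y_prob]
    return {"accuracy":  accuracy_score(y_true, y_pred),
            "precision": precision_score(y_true, y_pred),
            "recall":    recall_score(y_true, y_pred),
            "f1":        f1_score(y_true, y_pred)}
```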
6.4 Hyperparameters Setting
The activation function of the three LSTMs is tanh. The activation function of the MLPs is ReLU.
The hyperparameters tuned on the validation subset are:
• the dimension of the hidden layer of all three LSTMs is 30;
• the dimension of the latent variables is 10;
• the minibatch size is 32;
• the number of samples used in the Monte Carlo estimates is 20.
State-of-the-art techniques have been employed to optimize the objective function: Dropout [38] is applied to improve neural network training, L2-norm regularization is imposed on the weights of the neural networks, the Adam optimizer [23] is exploited for fast convergence, and a stepwise exponential learning rate decay is adopted to smooth convergence.
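In PyTorch terms, this optimization setup might be expressed as below; the learning rate, weight decay, and decay factor are illustrative assumptions, as the paper does not report their values:

```python
import torch

params = (list(prior_net.parameters()) + list(inf_net.parameters())
          + list(head.parameters()))  # the illustrative modules from § 4 and § 5
optimizer = torch.optim.Adam(params, lr=1e-3, weight_decay=1e-4)  # L2 via weight decay
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.95)  # stepwise decay
```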
6.5 Baselines
We test our Bayesian deep learning model against six state-of-the-art models. In order to have a fair comparison, only those models using the claim content and users' replies have been selected.

Support Vector Machine (SVM). This model evaluates the performance of manually extracted features. The features extracted from the claim content include: a bag-of-words representation, the presence of URLs, the presence of hashtags, and the proportion of supporting and denying responses [9]. These features are then input to a linear Support Vector Machine classifier. This classifier achieved the highest misinformation detection performance in the SemEval-2017 Task 8 competition (http://alt.qcri.org/semeval2017/task8/);

Convolutional Neural Networks (CNN). This model evaluates the performance of CNNs on the veracity detection task. Apart from sequential approaches such as the BiLSTM, the convolutional model is another powerful neural architecture for natural language understanding [7, 8, 10, 22, 45]. The CNN takes as input pre-trained word embeddings generated with Word2Vec [30] trained on the Google News dataset. To capture features similar to n-grams, we apply different convolutional window sizes. A max pooling layer is applied to compress the output information of the convolutional layers [7];

Tensor Embeddings (TE). This model leverages tensor decomposition to derive concise claim embeddings, which are used to create a claim-by-claim graph on which labels propagate [14];

Evidence-Aware Deep Learning (DeClarE). This model retrieves evidence from replies using claims as queries [31]. Then both the claims and the retrieved replies are input into a deep neural network with an attention mechanism. Claim veracity is then computed by aggregating over the predictions generated by every claim-retrieved reply pair;

Multitask Learning (Multitask). This model leverages the relationship between two tasks of the veracity detection pipeline [25], the stance detection and veracity prediction tasks, and is trained on both jointly. We apply the hard parameter sharing mechanism, where different tasks share the same hidden LSTM layers. Task-specific layers take the shared hidden information and generate per-task predictions;

Tree-structured RNN (TRNN). This model learns discriminative features from the replies' content by following their non-sequential propagation structure. Among the two proposed structures, we select the top-down structure for tweet representation learning because it is marginally better than the bottom-up structure [29].
7 RESULTS AND DISCUSSION
This section answers the research questions proposed in § 6.
7.1 Performance Comparison (RQ1)
Table 2 summarizes the classification performance of the baselines and our Bayesian deep learning framework on the RumourEval and Pheme datasets.
Table 2: Performance comparison of the proposed Bayesian deep learning framework against the baselines.

Dataset     Measure        SVM    CNN    TE     DeClarE  Multitask  TRNN   Ours
RumourEval  Accuracy (%)   71.42  61.90  66.67  66.67    66.67      76.19  80.95
            Precision (%)  66.67  54.54  60.00  58.33    57.14      70.00  77.78
            Recall (%)     66.67  66.67  66.67  77.78    88.89      77.78  77.78
            F1 (%)         66.67  59.88  63.15  66.67    69.57      73.68  77.78
Pheme       Accuracy (%)   72.18  59.23  65.22  67.87    74.94      78.65  80.33
            Precision (%)  78.80  56.14  63.05  64.68    68.77      77.11  78.29
            Recall (%)     75.75  64.64  64.64  71.21    87.87      78.28  79.29
            F1 (%)         72.10  60.09  63.83  67.89    77.15      77.69  78.78
We can observe that: (1) In terms of Accuracy and F1, most deep learning-based models, such as ours, TRNN and Multitask, outperform the feature engineering-based model, i.e., the SVM. This demonstrates that deep neural networks indeed help to learn better hidden representations of claims and replies. (2) Methods exploring relationships between a claim and its replies, such as ours, TRNN and Multitask, achieve better performance than claim content-based methods like TE and CNN. This demonstrates the significance of utilizing people's replies in the misinformation detection task. (3) Our model achieves state-of-the-art performance on both measures and both datasets, demonstrating its effectiveness in the misinformation detection task. This holds even though our system is not the best for precision or recall alone; precision and recall alone do not offer a clear picture of the performance of a model, since one measure can be increased at the expense of the other and vice versa.
Specifically, our model achieves the highest accuracy (80.95%), precision (77.78%) and F1 (77.78%) on the RumourEval test subset, and the highest accuracy (80.33%) and F1 (78.78%) on the Pheme test subset. TRNN is the strongest baseline, achieving the second highest accuracy and F1 on both the RumourEval (76.19% and 73.68%) and Pheme (78.65% and 77.69%) test subsets.
7.2 Ablation of the Latent Distribution (RQ2)
In this subsection, we evaluate the impact of using a latent distribution in the claim encoder on the misinformation detection task. To evaluate the impact of the latent distribution p, we ablate p from our model and compare the resulting classification performance against the full model. Specifically, the ablation is done by taking the output of the BiLSTM hidden states, i.e., h_c, and giving it as input to the output MLP. The rest of the model remains unchanged. Since no latent distribution is involved, the ablated model is optimized with conventional Softmax loss minimization.

In Figures 4(a) and 4(b) we show the classification performance of the ablated model against the full model on the RumourEval and Pheme test subsets. We observe that the full model outperforms the ablated one by at least 7.77% on every evaluation measure. This demonstrates the better representation quality achieved by the use of the latent distribution.
7.3 Ablation of People’s Replies (RQ3)
We now evaluate the contribution of people's replies to the misinformation detection task. In order to examine this contribution we compare our full model with and without replies. Specifically, we ablate the input coming from the replies to the final MLP, which is then used only to perform a non-linear transformation of the latent variable z.

In Figures 5(a) and 5(b) we show the classification performance of the ablated model against the full model on the RumourEval and the Pheme test subsets. Here, we observe that the auxiliary information extracted from people's replies has a large impact on the final performance of our model. In fact, every evaluation measure increases by at least 10.11%.
7.4 Random vs. Temporal Ordered Replies (RQ4)
The proposed model ranks people's replies based on their temporal order. In this subsection, we analyze the contribution of ranking the replies according to their temporal order. We compare this against a random order. Specifically, we randomize h_d^M before it is input to the LSTM.

In Figures 6(a) and 6(b) we show the performance comparison of these two orders. We observe that the temporally ordered replies achieve better performance than the randomly ordered ones. Besides, the randomly ordered model is still worse than TRNN, yet better than Multitask. This is probably because TRNN takes the temporal structure of replies into the model while Multitask fails to involve temporal information.
7.5 A Latent Distribution for Replies (RQ5)
Considering the improved performance brought by the latent distribution for claims, in this subsection we answer whether it would be beneficial to also incorporate a latent distribution for replies. In order to answer this research question, we expand our model by adding a new latent distribution to the reply encoder. Similarly to what is done for the claim encoder, the new latent distribution is designed as a multidimensional Gaussian distribution with mean and covariance matrix derived from the LSTM output h_D (as in Eqs. 3, 4 and 5). A new latent variable is sampled as in Eq. 6 and input to the MLP to predict the veracity of the claim being examined.

In Figures 7(a) and 7(b) we show the model performance comparison. We observe that the new latent distribution does not have an effect on the performance of the model for any of the evaluation measures and dataset test subsets. Based on this analysis, we conclude that the incorporation of the additional latent distribution for replies does not provide any additional improvement in performance.
[Figure 4 shows bar charts of Accuracy/Precision/Recall/F1. (a) RumourEval: Ablated 71.42/63.63/70.01/66.67 vs. Full 80.95/77.78/77.78/77.78. (b) Pheme: Ablated 68.11/65.41/69.69/67.47 vs. Full 80.33/78.29/79.29/78.78.]
Figure 4: The impact of the latent distribution p on the model performance. In both figures we show the performance change on all the evaluation measures of the model with (Full) or without (Ablated) p. Figure (a) shows it for the RumourEval test subset and Figure (b) shows it for the Pheme test subset.
[Figure 5 shows bar charts of Accuracy/Precision/Recall/F1. (a) RumourEval: Ablated 67.67/61.01/67.67/64.23 vs. Full 80.95/77.78/77.78/77.78. (b) Pheme: Ablated 64.26/61.84/64.64/63.21 vs. Full 80.33/78.29/79.29/78.78.]
Figure 5: The impact of people's replies on the model performance. In both figures we show the performance change on all the evaluation measures of the model with (Full) or without (Ablated) replies information. Figure (a) shows it for the RumourEval test subset and Figure (b) shows it for the Pheme test subset.
[Figure 6 shows bar charts of Accuracy/Precision/Recall/F1. (a) RumourEval: Random 76.19/75.01/66.67/70.59 vs. Temporal 80.95/77.78/77.78/77.78. (b) Pheme: Random 78.41/76.73/78.28/77.49 vs. Temporal 80.33/78.29/79.29/78.78.]
Figure 6: The effect of the temporal order of the reply encoder on model performance. In both figures we show the performance change on all the evaluation measures of the model with randomly and temporally ordered people's replies. Figure (a) shows it for the RumourEval test subset and Figure (b) shows it for the Pheme test subset.
[Figure 7 shows bar charts of Accuracy/Precision/Recall/F1. (a) RumourEval: Expanded 76.19/70.01/77.78/73.69 vs. Full 80.95/77.78/77.78/77.78. (b) Pheme: Expanded 77.93/79.49/75.48/77.34 vs. Full 80.33/78.29/79.29/78.78.]
Figure 7: The effect of an additional latent distribution for people's replies on the model performance. In both figures we show the performance change on all the evaluation measures of the model with (Expanded) and without (Full) an additional latent variable for the people's replies. Figure (a) shows it for the RumourEval test subset and Figure (b) shows it for the Pheme test subset.
[Figure 8 shows line plots of Accuracy and F1 (y-axis, 60 to 85) against the dimension of z (x-axis, 0 to 25) for (a) RumourEval and (b) Pheme.]
Figure 8: The effect of the latent variable dimension on model performance. In both figures we show how the accuracy and F1 scores change when varying the dimension of z. Figure (a) shows it for the RumourEval test subset and Figure (b) shows it for the Pheme test subset.
7.6 Sensitivity Analysis (RQ6)
In this subsection we evaluate the effect of the dimension of the latent variable z. To do this, after setting a dimension for z, we optimize the rest of the hyperparameters on the validation subset.

In Figures 8(a) and 8(b) we show the effect of the dimension of z on the performance on both datasets. We observe that the results are similar for both evaluation measures, accuracy and F1. Varying the dimension from 1 to 5 brings a larger performance improvement than varying it from 5 to 25. When the dimension is 15 the model obtains the highest accuracy, 81.22%, on the RumourEval test subset, while when the dimension is 10 the model obtains the highest F1, 78.78%, on the RumourEval test subset and the highest accuracy, 80.33%, and F1, 78.78%, on the Pheme test subset. These results also show that an increase in model capacity may not necessarily lead to an improvement in performance. The reason may lie in the limited size of the datasets, which might cause overfitting when the model is too complex.
8 CONCLUSIONS
In this paper, we study the problem of misinformation detection on social media platforms. One major problem faced by existing machine learning methods is their inability to represent uncertainty due to incomplete or finite available information. We address the problem by proposing a Bayesian deep learning model. When encoding claim content, we incorporate a latent distribution accounting for the uncertainty and randomness caused by noisy patterns in the finite dataset. This latent distribution provides a prior belief of claim veracity. We also encode auxiliary information from people's replies in temporal order through an LSTM. Such auxiliary information is then used to update the prior belief, generating a posterior belief. In order to optimize the Bayesian model, we derive a minibatch-based gradient estimation algorithm. Systematic experimentation has demonstrated the superiority of our approach against the state-of-the-art approaches in the misinformation detection task.

Despite encouraging experimental results, online misinformation detection is still a challenging problem with many open questions. In this paper, auxiliary information comes from people's replies alone; we argue that the proposed model can be enriched by utilizing other auxiliary information, such as source credibility. Also, reply stances are a strong veracity indicator for a claim, since false claims are usually controversial and accompanied by opposing stances. We leave for future work the combination of features extracted from credibility analysis and reply stances.
ACKNOWLEDGMENTS
This project was funded by the EPSRC Fellowship titled "Task Based
Information Retrieval", grant reference number EP/P024289/1. We
acknowledge the support of NVIDIA Corporation with the donation
of the Titan Xp GPU used for this research.
REFERENCES
[1]
Sadia Afroz, Michael Brennan, and Rachel Greenstadt. 2012. Detecting Hoaxes,
Frauds, and Deception in Writing Style Online. In Proceedings of the 2012 IEEE
Symposium on Security and Privacy (SP ’12). IEEE Computer Society, Washington,
DC, USA, 461–475. https://doi.org/10.1109/SP.2012.34
[2]
David M Blei, Andrew Y Ng, and Michael I Jordan. 2003. Latent dirichlet allocation.
Journal of machine Learning research 3, Jan (2003), 993–1022.
[3]
Dimitrios Bountouridis, Mónica Marrero, Nava Tintarev, and Claudia Hauff.
2018. Explaining Credibility in News Articles using Cross-Referencing. In SIGIR
workshop on ExplainAble Recommendation and Search (EARS).
[4]
Carlos Castillo, Mohammed El-Haddad, Jürgen Pfeffer, and Matt Stempeck. 2014.
Characterizing the life cycle of online news stories using social media reactions.
In Proceedings of the 17th ACM conference on Computer supported cooperative
work & social computing. ACM, 211–223.
[5]
Carlos Castillo, Marcelo Mendoza, and Barbara Poblete. 2011. Information credi-
bility on twitter. In Proceedings of the 20th international conference on World wide
web. ACM, 675–684.
[6]
Yimin Chen, Niall J Conroy, and Victoria L Rubin. 2015. Misleading online
content: Recognizing clickbait as false news. In Proceedings of the 2015 ACM on
Workshop on Multimodal Deception Detection. ACM, 15–19.
[7]
Yi-Chin Chen, Zhao-Yang Liu, and Hung-Yu Kao. 2017. IKM at SemEval-2017 Task
8: Convolutional neural networks for stance detection and rumor verification. In
Proceedings of the 11th International Workshop on Semantic Evaluation. 465–469.
[8]
Yann N. Dauphin, Angela Fan, Michael Auli, and David Grangier. 2017. Lan-
guage Modeling with Gated Convolutional Networks. In Proceedings of the 34th
International Conference on Machine Learning. 933–941.
[9]
Leon Derczynski, Kalina Bontcheva, Maria Liakata, Rob Procter, Geraldine Wong
Sak Hoi, and Arkaitz Zubiaga. 2017. SemEval-2017 Task 8: RumourEval: De-
termining rumour veracity and support for rumours. In Proceedings of the 11th
International Workshop on Semantic Evaluation (SemEval-2017). Association for
Computational Linguistics, 69–76. https://doi.org/10.18653/v1/S17-2006
[10]
Cicero dos Santos and Maira Gatti. 2014. Deep convolutional neural networks
for sentiment analysis of short texts. In Proceedings of COLING 2014, the 25th
International Conference on Computational Linguistics: Technical Papers. 69–78.
[11]
Johannes Fürnkranz. 1998. A study using n-gram features for text categorization.
Austrian Research Institute for Artificial Intelligence 3, 1998 (1998), 1–10.
[12]
Maria Glenski, Tim Weninger, and Svitlana Volkova. 2018. Identifying and
Understanding User Reactions to Deceptive and Trusted Social News Sources.
In Proceedings of the 56th Annual Meeting of the Association for Computational
Linguistics. Association for Computational Linguistics, 176–181.
[13]
Alex Graves, Santiago Fernández, and Jürgen Schmidhuber. 2005. Bidirectional LSTM networks for improved phoneme classification and recognition. In International Conference on Artificial Neural Networks. Springer, 799–804.
[14]
Gisel Bastidas Guacho, Sara Abdali, Neil Shah, and Evangelos E. Papalexakis.
2018. Semi-supervised Content-Based Detection of Misinformation via Tensor
Embeddings. In IEEE/ACM 2018 International Conference on Advances in Social
Networks Analysis and Mining. 322–325. https://doi.org/10.1109/ASONAM.2018.
8508241
[15]
Aditi Gupta, Hemank Lamba, Ponnurangam Kumaraguru, and Anupam Joshi.
2013. Faking sandy: characterizing and identifying fake images on twitter during
hurricane sandy. In Proceedings of the 22nd international conference on World
Wide Web. ACM, 729–736.
[16]
Frank Hansen and Gert K Pedersen. 2003. Jensen’s operator inequality. Bulletin
of the London Mathematical Society 35, 4 (2003), 553–564.
[17]
Del Harvey and Yoel Roth. 2018. An Update On Our Elections Integrity Work. https://blog.twitter.com/official/en_us/topics/company/2018/an-update-on-our-elections-integrity-work.html
[18]
Carl I Hovland and Walter Weiss. 1951. The influence of source credibility on communication effectiveness. Public opinion quarterly 15, 4 (1951), 635–650.
[19]
Zhiwei Jin, Juan Cao, Yongdong Zhang, and Jiebo Luo. 2016. News verification by exploiting conflicting social viewpoints in microblogs. In Thirtieth AAAI Conference on Artificial Intelligence.
[20]
Zhiwei Jin, Juan Cao, Yongdong Zhang, Jianshe Zhou, and Qi Tian. 2017. Novel
visual and statistical image features for microblogs news verication. IEEE
transactions on multimedia 19, 3 (2017), 598–608.
[21]
Michal Kakol, Radoslaw Nielek, and Adam Wierzbicki. 2017. Understanding
and predicting Web content credibility using the Content Credibility Corpus.
Information Processing & Management 53, 5 (2017), 1043–1061.
[22]
Yoon Kim. 2014. Convolutional Neural Networks for Sentence Classification.
In Proceedings of the 2014 Conference on Empirical Methods in Natural Language
Processing. 1746–1751.
[23]
Diederik P. Kingma and Jimmy Ba. 2014. Adam: A Method for Stochastic Opti-
mization. CoRR abs/1412.6980 (2014).
[24]
Diederik P. Kingma and Max Welling. 2013. Auto-Encoding Variational Bayes.
CoRR abs/1312.6114 (2013). arXiv:1312.6114
[25]
Elena Kochkina, Maria Liakata, and Arkaitz Zubiaga. 2018. All-in-one: Multi-task
Learning for Rumour Verification. In Proceedings of the 27th International Con-
ference on Computational Linguistics. Association for Computational Linguistics,
3402–3413. http://aclweb.org/anthology/C18-1288
[26] Sejeong Kwon, Meeyoung Cha, Kyomin Jung, Wei Chen, et al. 2013. Prominent
features of rumor propagation in online social media. In International Conference
on Data Mining. IEEE.
[27]
Tessa Lyons. 2018. Increasing Our Efforts to Fight False News | Facebook Newsroom. https://newsroom.fb.com/news/2018/06/increasing-our-efforts-to-fight-false-news/
[28]
Jing Ma, Wei Gao, Zhongyu Wei, Yueming Lu, and Kam-Fai Wong. 2015. Detect
rumors using time series of social context information on microblogging websites.
In Proceedings of the 24th ACM International on Conference on Information and
Knowledge Management. ACM, 1751–1754.
[29]
Jing Ma, Wei Gao, and Kam-Fai Wong. 2018. Rumor Detection on Twitter with
Tree-structured Recursive Neural Networks. In Proceedings of the 56th Annual
Meeting of the Association for Computational Linguistics (Volume 1: Long Papers).
Association for Computational Linguistics, 1980–1989.
[30]
Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013.
Distributed representations of words and phrases and their compositionality. In
Advances in neural information processing systems. 3111–3119.
[31]
Kashyap Popat, Subhabrata Mukherjee, Andrew Yates, and Gerhard Weikum.
2018. DeClarE: Debunking Fake News and False Claims using Evidence-Aware
Deep Learning. In Proceedings of the 2018 Conference on Empirical Methods in
Natural Language Processing. 22–32.
[32]
Chanthika Pornpitakpan. 2004. The persuasiveness of source credibility: A critical
review of five decades' evidence. Journal of applied social psychology 34, 2 (2004),
243–281.
[33]
Martin Potthast, Johannes Kiesel, Kevin Reinartz, Janek Bevendorff, and Benno
Stein. 2018. A Stylometric Inquiry into Hyperpartisan and Fake News. In Proceed-
ings of the 56th Annual Meeting of the Association for Computational Linguistics
(Volume 1: Long Papers). Association for Computational Linguistics, 231–240.
[34]
Hannah Rashkin, Eunsol Choi, Jin Yea Jang, Svitlana Volkova, and Yejin Choi.
2017. Truth of varying shades: Analyzing language in fake news and political
fact-checking. In Proceedings of the 2017 Conference on Empirical Methods in
Natural Language Processing. 2931–2937.
[35]
Danilo Jimenez Rezende, Shakir Mohamed, and Daan Wierstra. 2014. Stochastic
Backpropagation and Approximate Inference in Deep Generative Models. In
Proceedings of the 31st International Conference on Machine Learning (Proceedings
of Machine Learning Research), Eric P. Xing and Tony Jebara (Eds.), Vol. 32. PMLR,
Bejing, China, 1278–1286.
[36]
Victoria Rubin, Niall Conroy, Yimin Chen, and Sarah Cornwell. 2016. Fake news
or truth? using satirical cues to detect potentially misleading news. In Proceedings
of the Second Workshop on Computational Approaches to Deception Detection. 7–17.
[37]
Natali Ruchansky, Sungyong Seo, and Yan Liu. 2017. Csi: A hybrid deep model for
fake news detection. In Proceedings of the 2017 ACM on Conference on Information
and Knowledge Management. ACM, 797–806.
[38]
Nitish Srivastava. 2013. Improving neural networks with dropout. University of
Toronto 182 (2013), 566.
[39]
Robert Thomson, Naoya Ito, Hinako Suda, Fangyu Lin, Yafei Liu, Ryo Hayasaka,
Ryuzo Isochi, and Zian Wang. 2012. Trusting tweets: The Fukushima disaster and
information source credibility on Twitter. In Proceedings of the 9th International
ISCRAM Conference. Vancouver: Simon Fraser University, 1–10.
[40]
Shawn Tseng and BJ Fogg. 1999. Credibility and computing technology. Commun.
ACM 42, 5 (1999), 39–44.
[41]
Chong Wang and David M. Blei. 2013. Variational Inference in Nonconjugate
Models. J. Mach. Learn. Res. 14, 1 (April 2013), 1005–1031.
[42]
William Yang Wang. 2017. “Liar, Liar Pants on Fire”: A New Benchmark Dataset
for Fake News Detection. In Proceedings of the 55th Annual Meeting of the As-
sociation for Computational Linguistics (Volume 2: Short Papers). Association for
Computational Linguistics, 422–426. https://doi.org/10.18653/v1/P17-2067
[43]
Aleksander Wawer, Radoslaw Nielek, and Adam Wierzbicki. 2014. Predicting web-
page credibility using linguistic features. In Proceedings of the 23rd international
conference on world wide web. ACM, 1135–1140.
[44]
Fan Yang, Yang Liu, Xiaohui Yu, and Min Yang. 2012. Automatic detection of
rumor on Sina Weibo. In Proceedings of the ACM SIGKDD Workshop on Mining
Data Semantics. ACM, 13.
[45]
Yang Yang, Lei Zheng, Jiawei Zhang, Qingcai Cui, Zhoujun Li, and Philip S Yu.
2018. TI-CNN: Convolutional Neural Networks for Fake News Detection. arXiv
preprint arXiv:1806.00749 (2018).
[46]
Qiang Zhang, Shangsong Liang, Aldo Lipani, Zhaochun Ren, and Emine Yil-
maz. 2019. From Stances’ Imbalance to Their Hierarchical Representation and
Detection. In Companion Proceedings of the The Web Conference 2019. ACM Press.
[47]
Qiang Zhang, Emine Yilmaz, and Shangsong Liang. 2018. Ranking-based Method
for News Stance Detection. In Companion Proceedings of the The Web Conference
2018. ACM Press.
[48]
Arkaitz Zubiaga, Maria Liakata, Rob Procter, Geraldine Wong Sak Hoi, and Peter
Tolmie. 2016. Analysing how people orient to and spread rumours in social media
by looking at conversational threads. PloS one 11, 3 (2016), e0150989.