Current State of the Art to Detect Fake News in Social Media: Global
Trendings and Next Challenges
Alvaro Figueira 1, Nuno Guimaraes 1 and Luis Torgo 2
1 CRACS / INESCTEC and University of Porto, Rua do Campo Alegre, Porto, Portugal
2 Faculty of Computer Science, Dalhousie University, Halifax, Canada
Keywords: Fake News, Detection Systems, Survey, Next Challenges.
Abstract: Nowadays, false news can be created and disseminated easily through the many social media platforms, resulting in a widespread real-world impact. Modeling and characterizing how false information proliferates on social platforms and why it succeeds in deceiving readers are critical to developing efficient algorithms and tools for their early detection. A recent surge of research in this area has aimed to address the key issues using methods based on machine learning, deep learning, feature engineering, graph mining, image and video analysis, together with newly created data sets and web services to identify deceiving content. The majority of the research has been targeting fake reviews, biased messages, and against-facts information (false news and hoaxes). In this work, we present a survey on the state of the art concerning types of fake news and the solutions that are being proposed. We focus our survey on content analysis, network propagation, fact-checking, fake news analysis and emerging detection systems. We also discuss the rationale behind successfully deceiving readers. Finally, we highlight important challenges that these solutions bring.
1 INTRODUCTION
The large increase of social media users in the past few years has led to an overwhelming quantity of information available on a daily (or even hourly) basis. In addition, the easy accessibility of these platforms, whether by computer, tablet or mobile phone, allows the consumption of information at the distance of a click. Therefore, traditional and independent news media rush to adopt social media to reach a broader audience and gain new clients/consumers.
The ease of creating and disseminating content in social networks like Twitter and Facebook has contributed to the emergence of malicious users, in particular users that infect the network with the propagation of misinformation or rumours. These actions, combined with the fact that 67% of adults consume some type of news on social media (20% on a frequent basis) (Gottfried and Shearer, 2017), have already caused real-world consequences (Snopes, 2016).
However, unreliable content, or ”fake news” as it is now referred to, is not a recent problem. Although the term gained popularity in the 2016 US presidential election, throughout the years newspapers and televisions have shared false content resulting in severe consequences for the real world. For example, in 1924 a forged document known as The Zinoviev Letter was published in a well-known British newspaper four days before the general elections. The goal was to destabilize the elections in favour of the conservative party with a directive from Moscow to British communists referring to an Anglo-Soviet treaty and inspiring ”agitation-propaganda” in the armed forces (Norton-Taylor, 1999). Another example happened after the ”Hillsborough accident”, where 96 people died, crushed due to overcrowding and lack of security. Reports from an illustrious newspaper claimed that, as people were dying, some fellow drunk supporters stole from them and beat police officers that were trying to help. Later, such claims were proven false (Conn, 2016).
The verified impact of fake news on society throughout the years and the influence that social networks have today have forced high-reputation companies, such as Google and Facebook, to start working on methods to mitigate the problem (Hern, 2017; Hern, 2018). The scientific community has also been increasing its activity on the topic. In fact, if we search Google Scholar 1 for ”fake news”, we find
a significantly high number of results: an increase of 7K publications when compared with the number obtained in the previous year.
1 https://scholar.google.com
Nevertheless, the problem of fake news is still a reality, since the solution is anything but trivial. Moreover, research on the detection of such content, in the context of social networks, is still recent. Therefore, in this work we attempt to summarize the different and most promising branches of the problem as well as the preliminary solutions proposed in the current literature. In addition, we present a perspective on the next steps in this research, with a focus on the need to evaluate the current systems/proposals in a real-world environment.
2 LITERATURE REVIEW
There are several approaches to tackle the problem of unreliable content on social media. Some authors opt for analyzing the patterns of propagation, others for creating supervised systems to classify unreliable content, and others for focusing on the characteristics of the accounts that share this type of content. In addition, some works focus on developing techniques for fact-checking claims or on specific case studies.
2.1 Account Analysis
Regarding the analysis of social media accounts, the current state of the art has been focusing on trying to identify bot or spammer accounts.
Castillo et al. (Castillo et al., 2011) target the detection of credibility in Twitter events. The authors created a dataset of tweets regarding specific trending topics. Then, using a crowdsourcing approach, they annotated the dataset regarding the credibility of each tweet. Finally, they used four different sets of features (Message, User, Topic, and Propagation) in a Decision Tree model that achieved an accuracy of 86% on a balanced dataset. A more recent work (Erşahin et al., 2017) used an Entropy Minimization-Discretization technique on numerical features to assess fake accounts on Twitter.
Benevenuto et al. (Benevenuto et al., 2010) developed a model to detect spammers by building a manually annotated dataset of 1K records of spam and non-spam accounts. Then, they extracted attributes regarding content and user behaviour. The system was capable of correctly detecting 70% of the spam accounts and 96% of the non-spam ones.
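To make the shape of these supervised pipelines concrete, the following minimal sketch (in Python, using scikit-learn) trains a decision tree on a handful of message- and user-level features. The feature names, toy data and labels are illustrative assumptions, not the actual features or data used by Castillo et al. or Benevenuto et al., who rely on thousands of annotated examples and much richer feature sets.

# Minimal sketch of a feature-based credibility/spam classifier (hypothetical features
# and toy data; not the original pipeline of the works surveyed above).
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Each row: [tweet_length, num_urls, num_hashtags, followers, account_age_days, num_retweets]
X = [
    [120, 1, 2,   50,   30,  3],
    [ 80, 0, 0, 5000, 2000, 40],
    [140, 2, 5,   10,    5,  1],
    [ 60, 0, 1, 1200,  800, 15],
    [130, 3, 4,   20,   10,  0],
    [ 90, 0, 0, 3000, 1500, 25],
]
y = [0, 1, 0, 1, 0, 1]  # 0 = not credible / spam-like, 1 = credible (manual labels)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=0)
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))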
A similar problem is the detection of bot accounts. Chu et al. (Chu et al., 2012) distinguished accounts into three different groups: humans, bots and cyborgs. Using a human-labelled dataset of 6K users, they built a system with four distinct areas of analysis (entropy measures, spam detection, account properties, and decision making). The performance of the system was evaluated using accuracy, which reached 96% for the ”Human” class. Another similar work (Dickerson et al., 2014) introduced a methodology to differentiate accounts into two classes: humans and bots.
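Among the components used by Chu et al. are entropy measures of posting behaviour. The sketch below illustrates the general idea with a plain Shannon entropy over binned inter-posting intervals; the timestamps, binning and the reading that low entropy suggests automation are illustrative assumptions, and the exact entropy-based measure in that work may differ.

# Minimal sketch: entropy of inter-posting intervals as a crude automation signal.
import math
from collections import Counter

def interval_entropy(timestamps, bin_seconds=60):
    intervals = [b - a for a, b in zip(timestamps, timestamps[1:])]
    bins = [int(i // bin_seconds) for i in intervals]
    counts = Counter(bins)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

human_like = [0, 400, 3000, 3600, 9100, 20000, 20500]  # irregular posting times (seconds)
bot_like = [0, 600, 1200, 1800, 2400, 3000, 3600]      # perfectly regular posting times
print("human-like entropy:", round(interval_entropy(human_like), 2))
print("bot-like entropy:  ", round(interval_entropy(bot_like), 2))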
Gilani et al. (Gilani et al., 2017) presented a solution for a similar goal: to distinguish automated accounts from human ones. However, they introduced the notion that ”automated” is not necessarily bad. Using a dataset containing a large quantity of user accounts, the authors divided and categorized each entry into 4 different groups regarding the popularity (followers) of the account. The evaluation was conducted using the F1-measure, with the results obtained falling between 77% and 91%.
2.2 Content Analysis
A work by Antoniadis et al. (Antoniadis et al., 2015) tried to identify misinformation on Twitter. The authors annotated a large dataset of tweets and developed a model using features from the tweet text, the users, and the social feedback received (number of retweets, number of favourites, number of replies). Finally, they assessed the capability of the model to detect misinformation in real time, i.e. in an a priori way (when the social feedback is not yet available). The performance in the real-time evaluation decays by only 3% when compared with the model that uses all available features.
An approach also using social feedback was presented by Tacchini et al. (Tacchini et al., 2017). The authors claim that by analyzing the users who liked a small set of posts containing false and true information, they can obtain a model with an accuracy near 80%.
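A minimal sketch of this idea, assuming (as an illustration, not as the authors' exact setup) that each post is represented simply by the set of users who liked it and classified with logistic regression:

# Minimal sketch of classifying posts from the users who liked them (toy data; a
# simplified variant of the approach of Tacchini et al.).
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

likes = [                       # post -> users who liked it (toy data)
    {"u1": 1, "u2": 1, "u3": 1},
    {"u1": 1, "u4": 1},
    {"u5": 1, "u6": 1},
    {"u6": 1, "u7": 1, "u5": 1},
]
labels = [1, 1, 0, 0]           # 1 = hoax, 0 = non-hoax

vec = DictVectorizer()
X = vec.fit_transform(likes)    # sparse post-by-user matrix
clf = LogisticRegression().fit(X, labels)

new_post = vec.transform([{"u2": 1, "u4": 1}])
print("hoax probability:", clf.predict_proba(new_post)[0, 1])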
Pérez-Rosas et al. (Pérez-Rosas et al., 2017) created a crowd-sourced fake news dataset in addition to fake news available online. The dataset was built based on real news. In other words, crowd-source workers were provided with a real news story and were asked to write a similar, but false, one. Furthermore, they were asked to simulate journalistic writing. The best model obtained a 78% accuracy on the crowd-sourced dataset and only 5% less on a dataset of fake news obtained from the web.
Another example is the work of Potthast (Potthast et al., 2017), which analyses the writing style of hyperpartisan (extremely biased) news. The authors adopt a meta-learning approach (”unmasking”) from the authorship verification problem. The results obtained show models capable of reaching 78% F1-measure in the task of classifying hyperpartisan and mainstream news, and 81% in distinguishing satire from hyperpartisan and mainstream news. However, we must note that using only style-based features does not seem to be enough to distinguish fake news, since the authors' best result was 46%.
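As a rough illustration of such content-based classification, the sketch below trains a linear classifier on TF-IDF features of toy sentences. The examples, labels and model choice are illustrative assumptions and far simpler than the linguistic and stylometric feature sets used in the works above.

# Minimal sketch of a text-based fake/legitimate classifier (toy sentences and labels).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

texts = [
    "shocking miracle cure doctors don't want you to know about",
    "you won't believe what this celebrity did, share before it's deleted",
    "the parliament approved the budget for the next fiscal year",
    "the city council announced a new public transport schedule",
]
labels = ["fake", "legitimate", "legitimate", "legitimate"][:2] + ["legitimate", "legitimate"]
labels = ["fake", "fake", "legitimate", "legitimate"]  # toy labels for the four texts

vec = TfidfVectorizer(ngram_range=(1, 2))
clf = LinearSVC().fit(vec.fit_transform(texts), labels)
print(clf.predict(vec.transform(["miracle cure that doctors hide from you"])))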
2.3 Network Propagation
In Shao (Shao et al., 2016) the authors present a method for extracting posts that contained links to fake news and fact-checking web pages. Then, they analyzed the popularity and activity patterns of the users that published these types of posts. The authors concluded that the users that propagate fake news are much more active on social media than the users that refute the claims (by spreading fact-checking links). The authors' findings also suggest that there is a small set of accounts that generates large quantities of fake news posts.
Another work by Tambuscio et al. (Tambuscio et al., 2015) describes the relations between fake news believers and fact-checkers. The study modifies and resorts to a model commonly used in the analysis of disease spreading, where the misinformation is analyzed as a virus. Nodes in the network can be believers of fake news, fact-checkers or susceptible (neutral) users. Susceptible nodes can be infected by fake news believers, although they can ”recover” when confronted with fact-checking nodes. By testing their approach on 3 different networks, the authors concluded that fact-checking can actually cancel a hoax even for users that believe, with a high probability, in the message.
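The flavour of such epidemic-style models can be illustrated with a small simulation in which susceptible nodes become believers through infected neighbours and believers occasionally turn into fact-checkers. The graph, probabilities and update rules below are illustrative simplifications, not the exact model of Tambuscio et al.

# Minimal simulation sketch of a hoax-epidemic model with Susceptible (S),
# Believer (B) and Fact-checker (F) nodes (illustrative parameters).
import random
import networkx as nx

random.seed(0)
G = nx.erdos_renyi_graph(n=200, p=0.05, seed=0)
state = {n: "S" for n in G}
state[0] = "B"                                    # seed one believer

beta, gamma = 0.20, 0.05                          # infection and verification probabilities
for _ in range(50):                               # run the stochastic update for 50 steps
    for n in list(G):
        if state[n] == "S":
            believers = sum(state[m] == "B" for m in G[n])
            if believers and random.random() < 1 - (1 - beta) ** believers:
                state[n] = "B"                    # susceptible node starts believing
        elif state[n] == "B" and random.random() < gamma:
            state[n] = "F"                        # believer confronted with fact-checking

counts = {s: list(state.values()).count(s) for s in "SBF"}
print(counts)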
A similar approach is proposed in (Litou et al., 2016), where a Dynamic Linear Model is developed to limit the propagation of misinformation in a timely manner. The model differs from other works since it relies on the user's susceptibility changing over time and on how this affects their dissemination of information. The model categorizes users into 3 groups: infected, protected and inactive, and the effectiveness of the approach is validated on a real-world dataset.
2.4 Fact-Checking
Another way to tackle the problem of false informa-
tion is through fact-checking. Due to the enormous
quantity of information spread through social net-
works, the necessity to automatize this task has be-
come crucial. Automated fact-checking aims to ve-
rify claims automatically through consultations and
extraction of data from different sources. Then, ba-
sed on the strength and stance of reputable sources
regarding the claim, a classification is assigned (Co-
hen et al., 2011). This methodology, despite being in
development is very promising.
2.4.1 Stance Detection
In earlier research, stance detection has been defined as the task of determining whether a given fragment of text agrees, disagrees or is unrelated to a specific target topic. However, in the context of fake news detection, stance detection has been adopted as a primary step towards detecting the veracity of a news piece. Simply put, to determine the veracity of a news article, one can look at what well-reputed news organizations are writing about that topic. Therefore, stance detection can be applied to understand whether a news article written by a source of unknown reputation agrees or disagrees with the majority of the media outlets. A conceptually similar task to stance detection is textual entailment (Pfohl et al., 2016; Sholar et al., 2017).
The Fake News Challenge 2 promotes the identification of fake news through the use of stance detection. More specifically, given a headline and a body of text (not necessarily from different articles), the task consists in identifying whether the body of text agrees with, disagrees with, discusses or is unrelated to the headline. Several approaches were presented using the dataset provided. The authors in (Mrowca and Wang, 2017) present several approaches using a conditioned bidirectional LSTM (Long Short-Term Memory) and the baseline model (a Gradient Boosted Classifier provided by the authors of the challenge) with additional variations of features. As features, Bag of Words and GloVe vectors were used. In addition, global features like the binary co-occurrence of words from the headline and the text, polarity words and word grams were used. The best result was achieved using the bidirectional LSTM with the inclusion of the global features mentioned. The improvement over the baseline was 9.7%. Other works with similar approaches were proposed (Pfohl et al., 2016; Sholar et al., 2017); however, the results do not vary significantly.
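A minimal sketch of a feature-based stance classifier over (headline, body) pairs, loosely in the spirit of the challenge baseline; the word-overlap features, toy pairs and labels are illustrative assumptions rather than the official baseline implementation:

# Minimal sketch of stance classification from simple headline/body overlap features.
from sklearn.ensemble import GradientBoostingClassifier

def overlap_features(headline, body):
    h, b = set(headline.lower().split()), set(body.lower().split())
    return [len(h & b) / max(len(h), 1), len(h & b) / max(len(b), 1)]

pairs = [
    ("vaccine causes outbreak", "officials deny that the vaccine causes any outbreak"),
    ("team wins championship", "the team won the championship after a close final"),
    ("new tax law passes", "local weather will be sunny throughout the week"),
    ("mayor resigns today", "the mayor announced his resignation this morning"),
]
stances = ["disagree", "agree", "unrelated", "agree"]  # toy stance labels

X = [overlap_features(h, b) for h, b in pairs]
clf = GradientBoostingClassifier(random_state=0).fit(X, stances)
print(clf.predict([overlap_features("team wins championship",
                                    "the team lost the final match")]))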
Stance detection is an important step towards the problem of fake news detection. The Fake News Challenge seems to be a good starting point to test possible approaches to the problem. Furthermore, the addition of source reputation regarding topics (e.g. politics) can provide useful insight to detect the veracity of a news piece.
2 http://www.fakenewschallenge.org/
2.4.2 Fact-checking as a Network Problem
The authors in (Ciampaglia et al., 2015) tackle fact-checking as a network problem. By using the Wikipedia infoboxes to extract facts in a structured way, the authors proposed an automatic fact-checking system which relies on the path length and on the specificity of the terms of the claim in the Wikipedia Knowledge Graph. The evaluation is conducted on statements (both true and false) from the entertainment, history and geography domains (for example, ”x was married to y”, ”d directed f” and ”c is the capital of r”) and on an independent corpus of novel statements annotated by human raters. The results of the first evaluation showed that true statements obtain higher truth values than false ones. In the second evaluation, the values from human annotators and the ones predicted by the system are correlated.
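The intuition of path-based fact-checking can be sketched on a toy knowledge graph: a claim linking two entities receives a higher truth score when the connecting path avoids very generic (high-degree) nodes. The graph, weighting and score below are illustrative and not the exact semantic proximity metric of Ciampaglia et al.

# Minimal sketch of path-based fact-checking on a tiny, hand-built knowledge graph.
import math
import networkx as nx

KG = nx.Graph()
KG.add_edges_from([
    ("Lisbon", "Portugal"), ("Porto", "Portugal"),
    ("Portugal", "Europe"), ("France", "Europe"), ("Paris", "France"),
])

def truth_score(kg, subject, obj):
    if kg.has_edge(subject, obj):
        return 1.0                                 # directly stated fact
    try:
        path = nx.shortest_path(kg, subject, obj)
    except nx.NetworkXNoPath:
        return 0.0
    # penalize paths that pass through generic (high-degree) intermediate nodes
    generality = sum(math.log(kg.degree(v)) for v in path[1:-1])
    return 1.0 / (1.0 + generality)

print(truth_score(KG, "Lisbon", "Portugal"))       # supported claim
print(truth_score(KG, "Lisbon", "Paris"))          # weakly supported claim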
Another work by the same authors (Shiralkar et al., 2017) uses an unsupervised approach to the problem. The Knowledge Stream methodology adapts the knowledge network to a flow network, since multiple paths may provide more context than a single path, and reusing edges while limiting the paths in which they can participate may reduce the path search space. This technique, when evaluated on multiple datasets, achieves results similar to the state of the art. However, in various cases it provides additional evidence to support the fact-checking of claims.
2.5 Fake News Analysis
Another major area of study is the analysis of large quantities of fake news spread through social networks. Vosoughi et al. (Vosoughi et al., 2018) presented a study of the differences between the propagation of true and false news. The work focused on the retweet propagation of false, true, and mixed news stories over a period of 11 years. There were several findings. First, peaks of false news stories occurred at the end of 2013, 2015 and 2016. Then, through the analysis of retweets of false news stories, the authors concluded that falsehood reaches a significantly larger audience than the truth. In addition, tweets containing false news stories are spread by users with fewer followers and friends, who are less active than users who spread true news stories. Another work (Vargo et al., 2017) studied the agenda-setting relationships between online news media, fake news, and fact-checkers; in other words, whether each type of content is influenced by the agenda of the others. The authors found out that certain issues were transferred to news media due to fake news (more frequently in fake stories about international relations). Furthermore, fake news also predicted the issue agenda of partisan media (more on the liberal side than on the conservative one). Other relevant findings are the reactive approach of fake news media to traditional and emerging media and the autonomy of fact-checking websites regarding online media agendas.
2.6 Case Studies
Some works focus on analyzing the dissemination of false information regarding a particular event. One of those is related to the Boston Marathon in 2013, where two homemade bombs were detonated near the finish of the race (CNN, 2013). For example, in (Gupta et al., 2013) the authors performed an analysis of 7.9 million tweets regarding the bombing. The main conclusions were that 20% of the tweets were true facts whereas 29% were false information (the remaining were opinions or comments), that it was possible to predict the virality of fake content based on the attributes of the users that disseminate it, and that accounts created with the sole purpose of disseminating fake content often opt for names similar to official accounts or names that explore people's sympathy (by using words like ”pray” or ”victim”). Another work focused its analysis on the main rumours spread on Twitter after the bombings occurred (Starbird et al., 2014).
A different event tackled was the US Presidential Election in 2016. For example, the authors in (Allcot and Gentzkow, 2017) combined online surveys with information extracted from fact-checking websites to perceive the impact of fake news on social media and how it influenced the elections. The findings suggest that pro-Trump fake news articles were shared three times more than pro-Clinton ones, and that the average American adult saw at least one fake news story in the month around the election. Another work (Bovet and Makse, 2018) studied the influence of fake news and well-known news outlets on Twitter during the election. The authors collected approximately 171 million tweets in the 5 months prior to the elections and showed that bots diffusing unreliable news are more active than the ones spreading other types of news (similar to what was found in (Allcot and Gentzkow, 2017)). In addition, the network diffusing fake and extremely biased news is denser than the network diffusing center and left-leaning news. Other works regarding this event are presented in (Kollanyi et al., 2016; Shao et al., 2017).
Other works address similar events such as Hurricane Sandy (Antoniadis et al., 2015), the Fukushima Disaster (Thomson et al., 2012) and the Mumbai Blasts in 2011 (Gupta, 2011).
2.7 Fake News Detection Systems
The majority of the implementations to detect fake news come in the form of a browser add-on. For example, the bs-detector (The Self Agency, 2016) flags content in social media in different categories such as clickbait, bias, conspiracy theory and junk science. To make this evaluation, the add-on uses OpenSources 3, which is a curated list of dubious websites. A more advanced example is the Fake News Detector (Chaves, 2018). This add-on uses machine learning techniques on a ground-truth dataset combined with the ”wisdom of the crowd” to be constantly learning and improving the detection of fake news. An interesting system that also took the shape of an add-on was the one developed by four college students during a hackathon at Princeton University (Anant Goel, 2017). Their methodology combined two different approaches: the first makes a real-time analysis of the content in the user's feed; the other notifies users when they are posting or sharing doubtful content. The system is capable of analyzing keywords, recognizing images and verifying sources to accurately detect fake content online. We can say with confidence that new systems are being created at a rate of more than a dozen a year. Most of them use the add-on approach, but many are not yet usable by ordinary people, as they are still proof-of-concept prototypes.
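The curated-list approach used by add-ons such as bs-detector can be sketched in a few lines: extract the domain of a shared link and look it up in a list of flagged sources. The domains and categories below are hypothetical placeholders; real add-ons rely on lists such as OpenSources.

# Minimal sketch of curated-list flagging of shared links (hypothetical domain list).
from urllib.parse import urlparse

FLAGGED_DOMAINS = {
    "example-clickbait.com": "clickbait",
    "example-conspiracy.net": "conspiracy theory",
    "example-junksci.org": "junk science",
}

def flag_link(url):
    domain = urlparse(url).netloc.lower().removeprefix("www.")
    return FLAGGED_DOMAINS.get(domain, "not flagged")

print(flag_link("https://www.example-conspiracy.net/moon-landing-was-faked"))
print(flag_link("https://www.example-news.com/elections-coverage"))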
3 DISCUSSION
Fake news is nothing new. It has been shown that, even before the term became trending, the concept has been present on different occasions. We might say that fake news is only a threatening problem these days because of the mass distribution and dissemination capabilities that current digital social networks have. Due to these problems, and particularly to the problems that consequently emerge from them for society, the scientific community started tackling the problem, taking the approach of first addressing its different sub-problems.
The detection of bot/spam accounts, the machine learning and deep learning approaches to the detection of fake content, and even the network analysis used to understand how this type of content can be identified are diffuse and, in general, still quite difficult for the general public to understand.
Regarding bot/spam detection, we do believe that, even though such accounts play an important role in the diffusion of fake news and misinformation, they do not represent all the accounts that spread this type of content.
3 http://www.opensources.co/
In some cases, the spreaders of misinformation are cyborg accounts (humanly operated but also including some automatic procedures), as the authors in (Chu et al., 2012) refer. Another case, which has been less discussed in the current literature, is the human-operated accounts that spread misinformation. Common examples are users who are highly influenced by extremely biased news and who spread that information intentionally to their followers. One could argue that this is the effect of the propagation of unreliable content through the network. However, the probability of having misinformation in our feed through the propagation from close nodes in our network is higher than from the original node that spread the content. Therefore, the evaluation of these accounts can be of major importance when implementing a solid system for everyday use.
Another important aspect in adapting the current theory to a system that informs users about which news items are credible and which are misinformation is the effect that such a system may have on more skeptical or biased users. In fact, the ”hostile media phenomenon” can affect the use of these systems if they are not equipped with the capability of justifying the credibility of the content. The hostile media phenomenon states that users who already have an opinion on a given subject can interpret the same content (regarding that subject) in different ways. The concept was first studied in (Abdulla et al., 2002) with news regarding the Beirut massacre. Consequently, just like news media, such systems can be criticized, when classifying a piece of news as fake, by users who are in favor of the content under analysis. This leads us to a problem of the current detection approaches in the literature. For example, deep learning approaches and some machine learning algorithms are black-box systems that, given an input (in this case, a social media post), output a score or a label (in this case a credibility score or a fake/true label). Therefore, explaining to a common user why the algorithm predicted such a label/score can be a hard task. Furthermore, without some type of justification, such systems can be discredited. To tackle this problem in a real-world environment, the major focus after developing an accurate system must be on making it capable of explaining how it reached the result.
3.1 Future Guidelines to Tackle
Unreliable Content
We do believe that the analysis of the effect of detection systems on the perception and beliefs of users towards fake news and all sorts of misinformation should be the next important step to be studied by the scientific community. Accordingly, we suggest some guidelines to approach the problem.
Recently, we have observed an increasing distrust in the press by the common citizen. Several reasons can be pointed out, such as the president of the United States calling mainstream news media such as CNN or NBC 4 fake news, or even the mainstream media itself retracting a large number of stories and failing to highlight the importance of certain entities or events such as Donald Trump, Jeremy Corbyn or Brexit.
Therefore, a misinformation detection system that only scores the credibility of a social media post, justifying it by the absence/presence of similar information on traditional news media outlets, may not be enough to convince the majority of users that consume this type of information, and to change their beliefs and make them acknowledge the veracity or falsehood of the content. In addition, if a selection of mainstream news media sources is used to perform such a comparison, then it must be balanced (at least with some normalization of its intensity/frequency) with respect to possible bias. Moreover, it is necessary to add features to the system that allow more information to be provided alongside the credibility score. Such examples could include the reputation of the original source (that created the post), and the analysis and presentation of the social feedback of the users that voice it, again in a weighted manner.
The credibility of the original source can be analyzed and explained through the history of previously published posts that included misinformation and were debunked by fact-checking entities such as Snopes 5 or PolitiFact 6. In addition, information should be provided by the system on the possibility of an original source being a bot or a cyborg. This can be done by analyzing the posting frequency and the nearby accounts (followers and friends). Such an analysis can have a greater impact on changing user beliefs than a simple score or label.
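A minimal sketch of how such a source report could be assembled from a debunking history and posting frequency; the scoring rule and the threshold below are our illustrative assumptions, not a method taken from the surveyed literature.

# Minimal sketch of a source report built from debunked-post history and posting rate.
def source_report(total_posts, debunked_posts, posts_per_day):
    debunked_ratio = debunked_posts / total_posts if total_posts else 0.0
    credibility = 1.0 - debunked_ratio
    likely_automated = posts_per_day > 100        # crude, illustrative threshold
    return {"credibility": round(credibility, 2), "likely_automated": likely_automated}

print(source_report(total_posts=250, debunked_posts=40, posts_per_day=180))
print(source_report(total_posts=500, debunked_posts=2, posts_per_day=12))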
Regarding the social feedback, one can look at the propagation of the content through shares/retweets, comments/replies and favorites/likes. However, an important factor must come into play, which is the ”echo chamber” effect. This refers to the problem of users with the same interests being aggregated together in social circles, with opposing ideas being rejected and disapproved of by the majority. On Facebook, for example, when looking at the comments/replies to a post inside one of these echo chambers, the majority of comments are in agreement with the post. This factor, combined with the number of likes, shares, etc., may lead the user to the false impression that the information is true. Moreover, Facebook by default increases the ranking of comments based on the number of replies and likes that the comments got, as well as on whether the comments come from the user's friends. This scenario is extremely propitious to influencing the opinion of a user, therefore contributing to the formation and expansion of these ”echo chambers”.
4 https://twitter.com/realDonaldTrump/status/1006891643985854464
5 https://www.snopes.com/
6 http://www.politifact.com/
Therefore, in a real-world fake news detection system, the information presented as social feedback should be ranked and shown to the user based on the ”bias” of the social media accounts that are engaging with the post being analyzed. In addition, social feedback from users who are more neutral or equally active on both sides of the political spectrum should be prioritized, since they could present a more careful and unbiased view of the subject or even act as fact-checkers of more suspicious claims. It is our hypothesis that the way social feedback is presented may influence users' beliefs regarding the credibility score (even without affecting the score directly).
For the sake of clarity, let us consider the domain of politics, where there are two main groups: conservatives (c) and liberals (l). Let us also consider a social media post which is false and favours the liberal side. A user that likes, spreads, replies to, and is connected to liberal content accounts should have a lower rank on posts that favour his political views. In the same way, an opposing user who displays the same behaviour regarding conservative content must also have a smaller rank with respect to content that opposes their political views. However, users who are capable of engaging with content from both political views in an equal and neutral position should have a higher rank. A simple function f that can be used to score the ranking of a user u may be given by:
f_u = 1 / (1 + |p_{u,l} - p_{u,c}|)
where p_{u,l} and p_{u,c} refer to intermediate scores of user u's engagement with liberal (l) and conservative (c) content, respectively. These scores can be computed using the types of posts that users propagate, like or reply to/comment on. In addition (expanding what is done in (Tacchini et al., 2017)), the agreement between the post and the user's comments/replies can also determine the tendency of a user. This can be analyzed by resorting, for example, to sentiment analysis tools.
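A minimal sketch of this ranking function follows; estimating p_{u,l} and p_{u,c} simply as the shares of a user's interactions with each side is an illustrative assumption, not the only way these engagement scores could be computed.

# Minimal sketch of the user ranking function f_u discussed above.
def engagement_shares(liberal_interactions, conservative_interactions):
    total = liberal_interactions + conservative_interactions
    if total == 0:
        return 0.0, 0.0
    return liberal_interactions / total, conservative_interactions / total

def rank_score(liberal_interactions, conservative_interactions):
    p_l, p_c = engagement_shares(liberal_interactions, conservative_interactions)
    return 1.0 / (1.0 + abs(p_l - p_c))

print(rank_score(40, 38))   # balanced user -> score close to 1
print(rank_score(80, 2))    # strongly partisan user -> score close to 0.5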
Analyzing the source account of the content and ranking the presentation of social feedback (prioritizing neutral but active users) might increase skeptical users' trust in the fake news detection system.
4 CONCLUSION
Fake news is nowadays a major concern. With more and more users consuming news from their social networks, such as Facebook and Twitter, and with an ever-increasing amount of content available, the ability to question content instead of instinctively sharing or liking it is becoming rare. The search for instant gratification by posting/sharing content that will attract social feedback from peers has reached a point where the veracity of information is left in the background.
Industry and the scientific community are trying to fix the problem, either by taking measures directly within the platforms that are spreading this type of content (Facebook, Twitter and Google, for example), by developing analyses and systems capable of detecting fake content using machine and deep learning approaches, or even by developing software that helps social network users distinguish what is fake from what is real.
However, observing the various approaches taken so far, mainly by the scientific community but also some rumours about actions taken by Facebook and Google, we might say that mitigating or removing fake news comes with a cost (Figueira and Oliveira, 2017): there is the danger of having someone establishing the limits of reality, if not reality itself.
The trend to design and develop systems that are based on open source resources, frameworks or APIs which facilitate entity recognition, sentiment analysis, emotion recognition, bias recognition or relevance identification (to name just a few), and which may be freely available or available at a small price, gives an escalating power to those service providers. That power consists in their internal, independent control over the choice of their machine learning algorithms and their pre-trained data and, ultimately, in control over the intelligence that is built into the service provided by their systems.
Therefore, the saying ”the key to one problem usually leads to another problem” is again true. However, we do not have many choices at the moment. It is too important an endeavor to create systems that hamper or stop the proliferation of fake news and give back to the people not only real information, but also a sentiment of trust in what they are reading. Meanwhile, we need to be prepared for the next challenge, which will be over the definition of what is real, what is important, or even, what is real.
ACKNOWLEDGEMENTS
Project ”TEC4Growth - Pervasive Intelligence, Enhancers and Proofs of Concept with Industrial Impact/NORTE-01-0145-FEDER-000020” is financed by the North Portugal Regional Operational Programme (NORTE 2020), under the PORTUGAL 2020 Partnership Agreement, and through the European Regional Development Fund (ERDF).
REFERENCES
Abdulla, R. A., Garrison, B., Salwen, M., Driscoll, P., Ca-
sey, D., Gables, C., and Division, S. (2002). The
credibility of newspapers, television news, and online
news.
Allcot, H. and Gentzkow, M. (2017). Social Media and Fake
News in the 2016 Election.
Anant Goel, Nabanita De, Q. C. M. C. (2017). Fib - lets stop
living a lie. https://devpost.com/software/fib. Acces-
sed: 2018-06-18.
Antoniadis, S., Litou, I., and Kalogeraki, V. (2015). A Mo-
del for Identifying Misinformation in Online Social
Networks. 9415:473–482.
Benevenuto, F., Magno, G., Rodrigues, T., and Almeida, V.
(2010). Detecting spammers on twitter. Collabora-
tion, electronic messaging, anti-abuse and spam con-
ference (CEAS), 6:12.
Bovet, A. and Makse, H. A. (2018). Influence of fake news
in Twitter during the 2016 US presidential election.
pages 1–23.
Castillo, C., Mendoza, M., and Poblete, B. (2011). Infor-
mation Credibility on Twitter.
Chaves, R. (2018). Fake news detector. https://
fakenewsdetector.org/en. Accessed: 2018-06-18.
Chu, Z., Gianvecchio, S., Wang, H., and Jajodia, S. (2012).
Detecting automation of Twitter accounts: Are you a
human, bot, or cyborg? IEEE Transactions on Depen-
dable and Secure Computing, 9(6):811–824.
Ciampaglia, G. L., Shiralkar, P., Rocha, L. M., Bollen, J.,
Menczer, F., and Flammini, A. (2015). Computational
fact checking from knowledge networks. PLoS ONE,
10(6):1–13.
CNN (2013). What we know about the boston bombing
and its aftermath. https://edition.cnn.com/2013/04/18/
us/boston-marathon-things-we-know. Acessed: 2018-
06-12.
Cohen, S., Li, C., Yang, J., and Yu, C. (2011). Computatio-
nal Journalism: a call to arms to database researchers.
Proceedings of the 5th Biennial Conference on Inno-
vative Data Systems Research (CIDR 2011) Asilomar,
California, USA., (January):148–151.
Conn, D. (2016). How the sun’s truth’ about hills-
borough unravelled. https://www.theguardian.com/
football/2016/apr/26/how-the-suns-truth-about-
hillsborough-unravelled. Acessed: 2018-06-07.
Dickerson, J. P., Kagan, V., and Subrahmanian, V. S.(2014).
Using sentiment to detect bots on Twitter: Are humans
more opinionated than bots? ASONAM 2014 - Pro-
ceedings of the 2014 IEEE/ACM International Confe-
rence on Advances in Social Networks Analysis and
Mining, (Asonam):620–627.
Erşahin, B., Aktaş, Ö., Kılınç, D., and Akyol, C. (2017). Twitter fake account detection. 2nd International Conference on Computer Science and Engineering, UBMK 2017, pages 388–392.
Figueira, Á. and Oliveira, L. (2017). The current state of fake news: challenges and opportunities. Procedia Computer Science, 121(December):817–825.
Gilani, Z., Kochmar, E., and Crowcroft, J. (2017).
Classification of Twitter Accounts into Automated
Agents and Human Users. Proceedings of the 2017
IEEE/ACM International Conference on Advances in
Social Networks Analysis and Mining 2017 - ASO-
NAM ’17, pages 489–496.
Gottfried, B. Y. J. and Shearer, E. (2017). News Use Across
Social Media Platforms 2017. Pew Research Center,
Sept 2017(News Use Across Social Media Platforms
2017):17.
Gupta, A. (2011). Twitter Explodes with Activity in Mum-
bai Blasts! A Lifeline or an Unmonitored Daemon
in the Lurking? Precog.Iiitd.Edu.in, (September
2017):1–17.
Gupta, A., Lamba, H., and Kumaraguru, P. (2013). $1.00
per RT #BostonMarathon #PrayForBoston: Analy-
zing fake content on twitter. eCrime Researchers Sum-
mit, eCrime.
Hern, A. (2017). Google acts against fake news on
search engine. https://www.theguardian.com/
technology/2017/apr/25/google-launches-major-
offensive-against-fake-news. Accessed: 2018-04-13.
Hern, A. (2018). New facebook controls aim to re-
gulate political ads and fight fake news. https://
www.theguardian.com/technology/2018/apr/06/
facebook-launches-controls-regulate-ads-publishers.
Accessed: 2018-04-13.
Kollanyi, B., Howard, P. N., and Woolley, S. C. (2016).
Bots and Automation over Twitter during the First
U.S. Election. Data Memo, (4):1–5.
Litou, I., Kalogeraki, V., Katakis, I., and Gunopulos, D.
(2016). Real-time and cost-effective limitation of mis-
information propagation. Proceedings - IEEE Inter-
national Conference on Mobile Data Management,
2016-July:158–163.
Mrowca, D. and Wang, E. (2017). Stance Detection for
Fake News Identification. pages 1–12.
Norton-Taylor, R. (1999). Zinoviev letter was dirty trick
by mi6. https://www.theguardian.com/politics/1999/
feb/04/uk.politicalnews6. Acessed: 2018-06-07.
Pérez-Rosas, V., Kleinberg, B., Lefevre, A., and Mihalcea, R. (2017). Automatic Detection of Fake News.
Pfohl, S., Triebe, O., and Legros, F. (2016). Stance De-
tection for the Fake News Challenge with Attention
and Conditional Encoding. pages 1–14.
Potthast, M., Kiesel, J., Reinartz, K., Bevendorff, J., and
Stein, B. (2017). A Stylometric Inquiry into Hyper-
partisan and Fake News.
Shao, C., Ciampaglia, G. L., Flammini, A., and Menczer,
F. (2016). Hoaxy: A Platform for Tracking Online
Misinformation. pages 745–750.
Shao, C., Ciampaglia, G. L., Varol, O., Yang, K., Flam-
mini, A., and Menczer, F. (2017). The spread of low-
credibility content by social bots.
Shiralkar, P., Flammini, A., Menczer, F., and Ciampaglia,
G. L. (2017). Finding Streams in Knowledge Graphs
to Support Fact Checking.
Sholar, J. M., Chopra, S., and Jain, S. (2017). Towards
Automatic Identification of Fake News : Headline-
Article Stance Detection with LSTM Attention Mo-
dels. 1:1–15.
Snopes (2016). Fact-check: Comet ping pong pizzeria
home to child abuse ring led by hillary clinton. https://
www.snopes.com/fact-check/pizzagate-conspiracy/.
Accessed: 2018-04-13.
Starbird, K., Maddock, J., Orand, M., Achterman, P., and
Mason, R. M. (2014). Rumors, False Flags, and Di-
gital Vigilantes: Misinformation on Twitter after the
2013 Boston Marathon Bombing. iConference 2014
Proceedings.
Tacchini, E., Ballarin, G., Della Vedova, M. L., Moret, S.,
and de Alfaro, L. (2017). Some Like it Hoax: Auto-
mated Fake News Detection in Social Networks. pa-
ges 1–12.
Tambuscio, M., Ruffo, G., Flammini, A., and Menczer, F.
(2015). Fact-checking Effect on Viral Hoaxes: A Mo-
del of Misinformation Spread in Social Networks. pa-
ges 977–982.
The Self Agency, L. (2016). B.s. detector - a browser exten-
sion that alerts users to unreliable news sources. http://
bsdetector.tech/. Accessed: 2018-06-18.
Thomson, R., Ito, N., Suda, H., Lin, F., Liu, Y., Hayasaka,
R., Isochi, R., and Wang, Z. (2012). Trusting Tweets :
The Fukushima Disaster and Information Source Cre-
dibility on Twitter. Iscram, (April):1–10.
Vargo, C. J., Guo, L., and Amazeen, M. A. (2017). The
agenda-setting power of fake news: A big data analy-
sis of the online media landscape from 2014 to 2016.
New Media & Society, page 146144481771208.
Vosoughi, S., Roy, D., and Aral, S. (2018). The spread of
true and false news online. Science, 359(6380):1146–
1151.