A Survey of Fake News:
Fundamental Theories, Detection Methods, and Opportunities
XINYI ZHOU, Syracuse University, USA
REZA ZAFARANI, Syracuse University, USA
The explosive growth of fake news and its erosion of democracy, justice, and public trust has increased the demand for fake news detection and intervention. This survey reviews and evaluates methods that can detect fake news from four perspectives: (1) the false knowledge it carries, (2) its writing style, (3) its propagation patterns, and (4) the credibility of its source. The survey also highlights some potential research tasks based on the review. In particular, we identify and detail related fundamental theories across various disciplines to encourage interdisciplinary research on fake news. We hope this survey can facilitate collaborative efforts among experts in computer and information sciences, social sciences, political science, and journalism to research fake news, where such efforts can lead to fake news detection that is not only efficient but, more importantly, explainable.
CCS Concepts: • Human-centered computing → Collaborative and social computing theory, concepts and paradigms; Empirical studies in collaborative and social computing; • Computing methodologies → Natural language processing; Machine learning; • Security and privacy → Social aspects of security and privacy; • Applied computing → Sociology; Computer forensics.
Additional Key Words and Phrases: Fake news, news verification, disinformation, misinformation, fact-checking, knowledge graph, deception detection, information credibility, social media
ACM Reference Format:
Xinyi Zhou and Reza Zafarani. 2020. A Survey of Fake News: Fundamental Theories, Detection Methods, and Opportunities. ACM
Comput. Surv. 1, 1, Article 1 (January 2020), 37 pages.
1 INTRODUCTION
Fake news is now viewed as one of the greatest threats to democracy, journalism, and freedom of expression. It has weakened public trust in governments, and its potential impact on the contentious "Brexit" referendum and the equally divisive 2016 U.S. presidential election – which it might have affected [Pogue 2017] – is yet to be realized [Allcott and Gentzkow 2017; Zafarani et al. 2019; Zhou et al. 2019b]. The reach of fake news was best highlighted during the critical months of the 2016 U.S. presidential election campaign. During that period, the top twenty most frequently discussed fake election stories generated 8,711,000 shares, reactions, and comments on Facebook – ironically, more than the 7,367,000 generated by the top twenty most-discussed election stories posted by 19 major news websites [Silverman 2016]. Research has shown that, compared to the truth, fake news on Twitter is typically retweeted by many more users and spreads far more rapidly, especially for political news [Vosoughi et al. 2018]. Our economies are not immune to the spread of fake news either, with fake news being connected to stock market fluctuations and large trades. For example, fake news
claiming that Barack Obama, the 44th President of the United States, was injured in an explosion wiped out $130 billion
in stock value [Rapoza 2017]. These events and losses have motivated fake news research and sparked the discussion
around fake news, as observed by skyrocketing usage of terms such as “post-truth” – selected as the international word
of the year by Oxford Dictionaries in 2016 [Wang 2016].
While fake news is not a new phenomenon [Tandoc Jr et al. 2018], questions such as why it has emerged as a global topic of interest and why it is attracting increasingly more public attention are particularly relevant at this time. The leading cause is that fake news can be created and published online faster and more cheaply than through traditional news media such as newspapers and television [Shu et al. 2017]. The rise of social media and its popularity also plays an essential role in this surge of interest [Olteanu et al. 2019; Zafarani et al. 2014]. As of August 2018, around two thirds (68%) of Americans get their news from social media.¹ With the existence of an echo chamber effect on social media, biased information is often amplified and reinforced [Jamieson and Cappella 2008]. As an ideal platform to accelerate fake news dissemination, social media breaks the physical distance barrier among individuals, provides rich platforms to share, forward, vote, and review, and encourages users to participate in and discuss online news. This surge of activity around online news can lead to grave repercussions and substantial potential political and economic benefits. Such generous benefits encourage malicious entities to create, publish, and spread fake news.
Take the dozens of "well-known" teenagers in the Macedonian town of Veles as an example of users who created fake news for millions on social media and became wealthy through penny-per-click advertising during the U.S. presidential election. As reported by NBC, each individual "has earned at least $60,000 in the past six months – far outstripping their parents' income and transforming his prospects in a town where the average annual wage is $4,800" [Smith and Banic 2016]. The tendency of individuals to overestimate the benefits associated with disseminating fake news rather than its costs, as the valence effect indicates [Jones and McGillis 1976], further attracts individuals to engage in fake news activities. Clearly, when governments, parties, and business tycoons stand behind fake news generation, seeking its tempting power and profits, there is a greater motivation and capability to make fake news more persuasive and indistinguishable from truth to the public. But how can fake news gain public trust?
Social and psychological factors play an important role in fake news gaining public trust and further facilitate the spread of fake news. For instance, humans have been proven to be irrational and vulnerable when differentiating between truth and falsehood while overloaded with deceptive information. Studies in social psychology and communications have demonstrated that the human ability to detect deception is only slightly better than chance: typical accuracy rates are in the 55%-58% range, with a mean accuracy of 54% over 1,000 participants in over 100 experiments [Rubin 2010]. The situation is more critical for fake news than for other types of information: because one expects authenticity and objectivity from news, it is relatively easier for fake news to gain public trust. In addition, individuals tend to trust fake news after repeated exposures (validity effect [Boehm 1994]), or if it confirms their preexisting beliefs (confirmation bias [Nickerson 1998]) or attitudes (selective exposure [Freedman and Sears 1965; Metzger et al. 2015]), or if it pleases them (desirability bias [Fisher 1993]). Peer pressure can also at times "control" our perception and behavior (e.g., the bandwagon effect [Leibenstein 1950]).
The many perspectives on what fake news is, what characteristics and nature fake news or those who disseminate it share, and how fake news can be detected motivate the need for a comprehensive introduction and in-depth analysis, which this survey aims to develop. In addition, this survey aims to attract researchers within the general areas of data mining, machine learning, graph mining, Natural Language Processing (NLP), and Information Retrieval (IR). More importantly, we hope to boost collaborative efforts among experts in computer and information sciences, political science, journalism, social sciences, psychology, and economics to study fake news, where such efforts can lead to fake news detection that is not only efficient but, more importantly, explainable [Zafarani et al. 2019; Zhou et al. 2019b].
¹ https://www.journalism.org/2018/09/10/news-use-across-social-media-platforms-2018/
Table 1. A Comparison between Concepts related to Fake News

Concept          Authenticity        Intention    News?
Deceptive news   Non-factual         Mislead      Yes
False news       Non-factual         Undefined    Yes
Satire news      Non-unified²        Entertain    Yes
Disinformation   Non-factual         Mislead      Undefined
Misinformation   Non-factual         Undefined    Undefined
Cherry-picking   Commonly factual    Mislead      Undefined
Clickbait        Undefined           Mislead      Undefined
Rumor            Undefined           Undefined    Undefined
To achieve these goals, we first discuss the ways to define fake news (see Section 1.1) and summarize related fundamental theories across disciplines (e.g., in social sciences and economics) that can help study fake news (see Section 1.2). Before further specification, we present an overview of this survey in Section 1.3.
1.1 What is Fake News?
There has been no universal definition for fake news, even in journalism. A clear and accurate definition helps lay a solid foundation for fake news analysis and for evaluating related studies. Here we (I) distinguish between several concepts that frequently co-occur or overlap with fake news, (II) present a broad and a narrow definition for the term fake news, justifying each definition, and (III) further discuss the potential research problems raised by such definitions.
I. Related Concepts. Existing studies often connect fake news to terms and concepts such as deceptive news [Allcott and Gentzkow 2017; Lazer et al. 2018; Shu et al. 2017], false news [Vosoughi et al. 2018], satire news [Rubin et al. 2015; Tandoc Jr et al. 2018; Wardle 2017], disinformation [Kshetri and Voas 2017; Wardle 2017], misinformation [Kucharski 2016; Wardle 2017], cherry-picking [Asudeh et al. 2020], clickbait [Chen et al. 2015], and rumor [Zubiaga et al. 2018]. Based on how these terms and concepts are defined, we can distinguish one from the others based on three characteristics: (i) authenticity (whether it contains any non-factual statement), (ii) intention (whether it aims to mislead or entertain the public), and (iii) whether the information is news. Table 1 summarizes these related concepts based on these characteristics. For example, disinformation is false information [news or not-news] with a malicious intention to mislead the public.
II. Defining Fake News. The challenges of fake news research start from defining fake news. To date, no universal definition has been provided for fake news, which has been looked upon as "a news article that is intentionally and verifiably false" [Allcott and Gentzkow 2017; Shu et al. 2017] (deceptive news); as "a news article or message published and propagated through media, carrying false information regardless of the means and motives behind it", which overlaps with false news, disinformation [Kshetri and Voas 2017], misinformation [Kucharski 2016], and satire news [Rubin et al. 2015]; or even as the stories that a person does not like (considered improper) [Golbeck et al. 2018]. Furthermore, what news is has become harder to define, as it can range from an account of a recent, interesting, and significant event to a dramatic account of something novel or deviant; in particular, "the digitization of news has challenged traditional definitions of news. Online platforms provide space for non-journalists to reach a mass audience." [Tandoc Jr et al. 2018]
² For example, Golbeck et al. regard satire news as "factually incorrect" [Golbeck et al. 2018], while Tandoc Jr et al. state that "where parodies differ from satires is their use of non-factual information to inject humor" [Tandoc Jr et al. 2018].
Under these circumstances, we first broadly define fake news as:

Definition 1 (Broad definition of fake news). Fake news is false news,

where news broadly includes articles, claims, statements, speeches, posts, and other types of information related to public figures and organizations. It can be created by journalists and non-journalists. Such a definition of news raises some social concerns, e.g., that the term "fake news" should be "about more than news" and "about the entire information ecosystem" [Wardle 2017]. The broad definition aims to impose minimum constraints in accord with the current resources: it emphasizes information authenticity, purposefully adopts a broad definition for the term news [Vosoughi et al. 2018], and weakens the requirement on information intention due to the difficulty of obtaining the ground truth (true intention). This definition supports most existing fake-news-related studies and datasets, as provided by the existing fact-checking websites (Section 2.1 provides a detailed introduction). Current fake news datasets often provide ground truth for the authenticity of claims, statements, speeches, or posts related to public figures and organizations, while limited information is provided on intentions.
We also provide a narrower definition of fake news, which satisfies the overall requirements for fake news, as follows.

Definition 2 (Narrow definition of fake news). Fake news is intentionally false news published by a news outlet.

This narrow definition supports recent advancements in fake news studies [Allcott and Gentzkow 2017; Shu et al. 2017]. It addresses the public's perception of fake news, especially following the 2016 U.S. presidential election. Note that deceptive news is more harmful and less distinguishable than incautiously false news, as the former pretends to be truth so as to better mislead the public. The narrow definition emphasizes both news authenticity and intention; it also ensures the posted information is news by investigating whether its publisher is a news outlet (e.g., CNN or the New York Times). News outlets often publish news in the form of articles with fixed components: a title, author(s), body text, and image(s) and/or video(s) that include the claims made by, or about, public figures and organizations.
Both definitions require the authenticity of fake news to be false (i.e., non-factual). As the goal is to provide a scientific definition for fake news, news falsity should be derived by comparison with objective facts and not with individual viewpoints (preferences). Hence, it is improper to consider fake news to be articles that do not agree with individuals' or groups' interests or viewpoints, which is sometimes how the term fake news is used by the general public or in politics [Golbeck et al. 2018]. Such falsity can be assigned to the whole or part of the news content, or even to true news when subsequent events have rendered the original truth outdated (e.g., "Britain has control over fifty-six colonial countries"). In this general framework, a more comprehensive strategy for automatic fake news detection is needed, as the aforementioned fake news types emphasize various aspects of detection (see Section 6 for a discussion).
III. Discussion. We have differentiated between fake news and related terms based on three properties (authenticity, intention, and whether it is news). We have also defined fake news as (1) deceptive news, narrowly, and as (2) false news, broadly. Questions are thus left open, such as how to [manually or automatically] identify the authenticity and intention of given information (news). Many domain experts and platforms have investigated ways to analyze news authenticity manually; however, how one can automatically assess news authenticity in an effective and explainable manner is still an open issue. We will detail both manual and automatic assessment of news authenticity (also known as fact-checking) in Section 2. To assess information (news) intention, analyzing the news (i) writing style and (ii) propagation patterns can be useful. First, information (news) created to intentionally mislead or deceive the public
(e.g., deceptive news) should look or sound "more persuasive" compared to news without such intentions (e.g., satire news). Second, malicious users should play a part in the propagation of deceptive information (news) to enhance its social influence. Both news writing styles and propagation characteristics will be discussed, along with current methods, in Sections 3-5. For intention analysis, some level of manual news annotation is often necessary. The accuracy of such annotations dramatically impacts automatic intention analysis within a machine learning framework. When the intentions behind non-factual information are determined, intervention strategies can be more appropriate and effective. For example, punitive measures should be taken against non-factual information and those who intentionally create it.
1.2 Fundamental Theories
Fundamental human cognition and behavior theories developed across various disciplines, such as social sciences and economics, provide invaluable insights for fake news analysis. These theories can introduce new opportunities for qualitative and quantitative studies of big fake news data [Zhou et al. 2019a]. They can also facilitate building well-justified and explainable models for fake news detection and intervention, which, to date, have been rarely available [Miller et al. 2017]. We have conducted a comprehensive literature survey across various disciplines and have identified well-known theories that can potentially be used to study fake news. These theories are provided in Table 2 along with short descriptions; they relate to either (I) the news itself or (II) its spreaders.
I. News-related theories. News-related theories reveal the possible characteristics of fake news content compared to true news content. For instance, theories have implied that fake news potentially differs from the truth in terms of, e.g., writing style and quality (Undeutsch hypothesis) [Undeutsch 1967], quantity such as word counts (information manipulation theory) [McCornack et al. 2014], and sentiments expressed (four-factor theory) [Zuckerman et al. 1981]. It should be noted that these theories, developed in forensic psychology, target deceptive statements or testimonies (i.e., disinformation) rather than fake news, though these are similar concepts (see Section 1.1 for details). Thus, one research opportunity is to verify whether these attributes (e.g., information sentiment polarity) are statistically distinguishable among disinformation, fake news, and the truth, in particular using big fake news data. On the other hand, the (discriminative) attributes identified can be used to automatically detect fake news from its writing style; a typical study using supervised learning can be seen in [Zhou et al. 2019a], and we will provide further details in Section 3.
II. User-related theories. User-related theories investigate the characteristics of users involved in fake news activities, e.g., posting, forwarding, liking, and commenting. Fake news, unlike information such as fake reviews [Jindal and Liu 2008], can "attract" both malicious and normal users [Shao et al. 2018]. Malicious users (e.g., some social bots [Ferrara et al. 2016]) spread fake news often intentionally and driven by benefits [Hovland et al. 1957; Kahneman and Tversky 2013]. Some normal users (which we denote as vulnerable normal users) can frequently and unintentionally spread fake news without recognizing the falsehood. Such vulnerability psychologically stems from (i) social impacts and (ii) self-impact, and the corresponding theories are categorized and detailed in Table 2. Specifically, as indicated by the bandwagon effect [Leibenstein 1950], normative influence theory [Deutsch and Gerard 1955], social identity theory [Ashforth and Mael 1989], and availability cascade [Kuran and Sunstein 1999], to be liked and/or accepted by the community, normal users are encouraged to engage in fake news activities when many other users have done so (i.e., peer pressure). One's trust in fake news, and one's unintentional spreading of it, can also be promoted by greater exposure to fake news (i.e., the validity effect) [Boehm 1994], which often takes place due to the echo chamber effect on social media [Jamieson and Cappella 2008]. Such trust in fake news can be built when the fake news confirms one's preexisting attitudes, beliefs, or hypotheses (i.e., confirmation bias [Nickerson 1998], selective exposure [Freedman and Sears 1965], and desirability bias [Fisher 1993]).
Table 2. Fundamental Theories in Social Sciences (Including Psychology and Philosophy) and Economics

News-related Theories:
• Undeutsch hypothesis [Undeutsch 1967]: A statement based on a factual experience differs in content style and quality from that of fantasy.
• Reality monitoring [Johnson and Raye 1981]: Actual events are characterized by higher levels of sensory-perceptual information.
• Four-factor theory [Zuckerman et al. 1981]: Lies are expressed differently from truth in terms of arousal, behavior control, emotion, and thinking.
• Information manipulation theory [McCornack et al. 2014]: Extreme information quantity often exists in deception.

User-related Theories (User's Engagements and Roles in Fake News Activities):
Social impacts:
• Conservatism bias [Basu 1997]: The tendency to revise one's belief insufficiently when presented with new evidence.
• Semmelweis reflex [Bálint and Bálint 2009]: Individuals tend to reject new evidence because it contradicts established norms and beliefs.
• Echo chamber effect [Jamieson and Cappella 2008]: Beliefs are amplified or reinforced by communication and repetition within a closed system.
• Attentional bias [MacLeod et al. 1986]: An individual's perception is affected by his or her recurring thoughts at the time.
• Validity effect [Boehm 1994]: Individuals tend to believe information is correct after repeated exposures.
• Bandwagon effect [Leibenstein 1950]: Individuals do something primarily because others are doing it.
• Normative influence theory [Deutsch and Gerard 1955]: The influence of others leads us to conform in order to be liked and accepted by them.
• Social identity theory [Ashforth and Mael 1989]: An individual's self-concept derives from perceived membership in a relevant social group.
• Availability cascade [Kuran and Sunstein 1999]: Individuals tend to adopt insights expressed by others when such insights are gaining more popularity within their social circles.
Self-impact:
• Confirmation bias [Nickerson 1998]: Individuals tend to trust information that confirms their preexisting beliefs or hypotheses.
• Selective exposure [Freedman and Sears 1965]: Individuals prefer information that confirms their preexisting attitudes.
• Desirability bias [Fisher 1993]: Individuals are inclined to accept information that pleases them.
• Illusion of asymmetric insight [Pronin et al. 2001]: Individuals perceive their knowledge to surpass that of others.
• Naïve realism [Ward et al. 1997]: The senses provide us with direct awareness of objects as they really are.
• Overconfidence effect [Dunning et al. 1990]: A person's subjective confidence in his or her judgments is reliably greater than the objective accuracy of those judgments.
Benefits:
• Prospect theory [Kahneman and Tversky 2013]: People make decisions based on the value of losses and gains rather than the outcome.
• Contrast effect [Hovland et al. 1957]: The enhancement or diminishment of cognition due to successive or simultaneous exposure to a stimulus of lesser or greater value in the same dimension.
• Valence effect [Frijda 1986]: People tend to overestimate the likelihood of good things happening rather than bad things.
These preexisting beliefs are often perceived to surpass those of others [Dunning et al. 1990; Pronin et al. 2001; Ward et al. 1997] and tend to be insufficiently revised when new refuting evidence is presented [Bálint and Bálint 2009; Basu 1997]. In such settings, strategies for intervening in fake news from a user perspective (more discussion on fake news intervention is in Section 6) should be cautiously designed for users with different levels of credibility or intentions, even though they might all engage in the same fake news activity. For instance, it is reasonable to intervene in the spread of fake news by penalizing (e.g., removing) malicious users, but not normal accounts. Instead, education and personalized recommendations of true news articles and refuted fake ones can be helpful for normal users [Vo and Lee 2018]. Such recommendations should not only cater to the topics that users want to read but should also capture the topics that users are most gullible to. In Section 5, we will provide the path for utilizing these theories, i.e., quantifying social impact and self-impact, to enhance fake news research by identifying user intent and evaluating user credibility.
[Figure 1 depicts the fake news life cycle across three stages – Stage 1: creation (by a creator), Stage 2: publication (by a publisher/source website, as a news article with a headline, body text, and body image), and Stage 3: propagation (by users/spreaders, with propagation paths and feedback) – and connects these stages to the four detection perspectives: knowledge-based, style-based, propagation-based, and source-based.]
Fig. 1. Fake News Life Cycle and Connections to the Four Fake News Detection Perspectives Presented in this Survey
Meanwhile, we should point out that clearly understanding the potential roles that the fundamental theories listed in Table 2 can play in fake news research requires further in-depth investigation of an interdisciplinary nature.
1.3 An Overview of this Survey
We have defined fake news (Section 1.1) and presented relevant fundamental theories from various disciplines (Section 1.2). The rest of this survey is organized as follows. We detail the detection of fake news from four perspectives (see Fig. 1 for an overview): (I) knowledge-based methods (Section 2), which detect fake news by verifying whether the knowledge within the news content (text) is consistent with facts (true knowledge); (II) style-based methods (Section 3), which are concerned with how fake news is written (e.g., whether it is written with extreme emotions); (III) propagation-based methods (Section 4), which detect fake news based on how it spreads online; and (IV) source-based methods (Section 5), which detect fake news by investigating the credibility of news sources at various stages (when the news is created, published online, and spread on social media). In Section 6, we discuss open issues in current fake news studies and in fake news detection. We highlight six potential research tasks, hoping to facilitate the development of fake news research. We conclude in Section 7.
Comparison to Related Surveys. Our survey differs from related surveys in three respects. First, we discuss the ways fake news is defined in current fake news research and its harmfulness to the public. We detail how fake news is related to terms such as deceptive news, false news, satire news, disinformation, misinformation, cherry-picking, clickbait, and rumor. Compared to related surveys and forums that often provide a specific definition for fake news, this survey highlights the challenges of defining fake news and introduces both a narrow and a broad definition for it.
Second, though recent studies have highlighted the importance of multidisciplinary fake news research [Lazer et al. 2018], we provide a path towards it by conducting an extensive literature survey across various disciplines and identifying a comprehensive list of well-known theories. We demonstrate how these theories relate to fake news and its spreaders, and illustrate technical methods utilizing these theories in both fake news detection and intervention.
Third, for fake news detection, current surveys have mostly limited their scope to reviewing research from a certain perspective (or within a certain research area, e.g., NLP [Oshikawa et al. 2018] and data mining [Shu et al. 2017]). These surveys generally classify fake news detection models by the types of [deep] machine learning methods used [Oshikawa et al. 2018] or by whether they utilize social context information [Shu et al. 2017]. In our survey, we categorize automatic
fake news detection methods from four perspectives: knowledge (Section 2), style (Section 3), propagation (Section 4), and source (Section 5). Reviewing and organizing fake news detection studies in this way allows analyzing both the news content (mainly Sections 2-3) and the medium (often, social media) on which the news spreads (Sections 4-5), where fake news detection can be defined as a probabilistic/regression problem linked to, e.g., entity resolution and link prediction tasks (Section 2), or a classification problem that relies on, e.g., feature engineering and [text and graph] embedding techniques (Sections 3-5). In our survey of fake news detection, patterns of fake news in terms of its content (text and images, see Figs. 7-8) or how it propagates (see Fig. 10 and Fig. 15) are revealed, algorithms and model architectures are presented (e.g., Figs. 6, 11, and 12-14), and the performance of various fake news detection methods is compared (e.g., Table 6). We point out that our survey focuses more on how to construct a fake news dataset, i.e., ground-truth data, and the possible sources for obtaining such ground truth (e.g., Section 2.1), rather than detailing existing datasets, which have been covered in past surveys [Oshikawa et al. 2018; Shu et al. 2017]. Nevertheless, we acknowledge the contributions to automatic fake news detection made by these existing datasets (e.g., CREDBANK [Mitra and Gilbert 2015], LIAR [Wang 2017], FakeNewsNet [Shu et al. 2018], FakevsSatire [Golbeck et al. 2018], NELA-GT-2018 [Nørregaard et al. 2019], FEVER [Thorne et al. 2018], PHEME [Kochkina et al. 2018], and Emergent [Ferreira and Vlachos 2016]) and by systems that can be used for building datasets (e.g., ClaimBuster [Hassan et al. 2017b], XFake [Yang et al. 2019], Hoaxy [Shao et al. 2016], MediaRank [Ye and Skiena 2019], Botometer [Davis et al. 2016], and RumorLens [Resnick et al. 2014]).
2 KNOWLEDGE-BASED FAKE NEWS DETECTION
When detecting fake news from a knowledge-based perspective, one often uses a process known as fact-checking. Fact-checking, initially developed in journalism, aims to assess news authenticity by comparing the knowledge extracted from to-be-verified news content (e.g., its claims or statements) with known facts. In this section, we discuss traditional fact-checking (also known as manual fact-checking) and how it can be incorporated into automatic means of detecting fake news (i.e., automatic fact-checking).
2.1 Manual Fact-checking
Broadly speaking, manual fact-checking can be divided into (I) expert-based and (II) crowd-sourced fact-checking.
I. Expert-based Manual Fact-checking. Expert-based fact-checking relies on domain experts as fact-checkers to verify the given news contents. It is often conducted by a small group of highly credible fact-checkers, is easy to manage, and leads to highly accurate results, but it is costly and scales poorly with the increasing volume of to-be-checked news contents.
▶ Expert-based Fact-checking Websites. Recently, many websites have emerged to help expert-based fact-checking better serve the public. We list and provide details on well-known websites in Table 3. Some websites provide further information; for instance, PolitiFact provides "the PolitiFact scorecard", which presents statistics on the authenticity distribution of all the statements related to a specific topic (see an example on Donald Trump, the 45th President of the United States, in Fig. 2(a)).
Website URLs: PolitiFact: http://www.politifact.com/; FactCheck: https://www.factcheck.org/; The Washington Post Fact Checker: https://www.washingtonpost.com/news/fact-checker; Snopes: https://www.snopes.com/; TruthOrFiction: https://www.truthorfiction.com/; FullFact: https://fullfact.org/; HoaxSlayer: http://hoax-slayer.com/; GossipCop: https://www.gossipcop.com
Table 3. A Comparison among Expert-based Fact-checking Websites

PolitiFact – Topics covered: American politics. Content analyzed: statements. Assessment labels: True; Mostly true; Half true; Mostly false; False; Pants on fire.
The Washington Post Fact Checker – Topics covered: American politics. Content analyzed: statements and claims. Assessment labels: One Pinocchio; Two Pinocchios; Three Pinocchios; Four Pinocchios; The Geppetto checkmark; An upside-down Pinocchio; Verdict pending.
FactCheck – Topics covered: American politics. Content analyzed: TV ads, debates, speeches, interviews, and news. Assessment labels: True; No evidence; False.
Snopes – Topics covered: politics and other social and topical issues. Content analyzed: news articles and videos. Assessment labels: True; Mostly true; Mixture; Mostly false; False; Unproven; Outdated; Miscaptioned; Correct attribution; Misattributed; Scam; Legend.
TruthOrFiction – Topics covered: politics, religion, nature, aviation, food, medical, etc. Content analyzed: email rumors. Assessment labels: Truth; Fiction; etc.
FullFact – Topics covered: economy, health, education, crime, immigration, law. Content analyzed: articles. Assessment labels: ambiguous (no clear labels).
HoaxSlayer – Topics covered: ambiguous. Content analyzed: articles and messages. Assessment labels: hoaxes, scams, malware, bogus warnings, fake news, misleading, true, humour, spam, etc.
GossipCop – Topics covered: Hollywood and celebrities. Content analyzed: articles. Assessment labels: a 0-10 scale, where 0 indicates completely fake news and 10 indicates completely true news.
This information can provide ground truth on the credibility of a topic [Zhang et al. 2018] and can help identify check-worthy topics (see Section 6 for details) that require further scrutiny and verification. Another example is HoaxSlayer, which differs from most fact-checking websites that focus on information authenticity, as it further classifies articles and messages into, e.g., hoaxes, spams, and fake news. Though the website does not provide clear definitions for these categories, its information can potentially be exploited as ground truth for comparative studies of fake news. In addition to the list provided here, a comprehensive list of fact-checking websites is maintained by the Reporters Lab at Duke University (https://reporterslab.org/fact-checking/), where over two hundred fact-checking websites across countries and languages are listed. Generally, these expert-based fact-checking websites can provide ground truth for the detection of fake news, in particular under the broad definition (Definition 1). Among these websites, PolitiFact and GossipCop have supported the development of fake news datasets that are publicly available (e.g., LIAR [Wang 2017] and FakeNewsNet [Shu et al. 2018]). The detailed expert-based analysis that these websites provide for checked contents (e.g., what is false and why it is false) carries invaluable insights for various aspects of fake news analysis, e.g., for identifying check-worthy content [Hassan et al. 2017a] and explainable fake news detection [Shu et al. 2019a] (see Section 6 for more discussions); however, to date, such insights have not been well utilized.
II. Crowd-sourced Manual Fact-checking. Crowd-sourced fact-checking relies on a large population of regular individuals acting as fact-checkers (i.e., the collective intelligence). Such a large population of fact-checkers can be gathered through common crowd-sourcing marketplaces such as Amazon Mechanical Turk, based on which CREDBANK [Mitra and Gilbert 2015], a publicly available large-scale fake news dataset, has been constructed. Compared to expert-based fact-checking, crowd-sourced fact-checking is relatively difficult to manage, is less credible and accurate due to the political bias of fact-checkers and their conflicting annotations, and has better (though still insufficient) scalability. Hence, in crowd-sourced fact-checking, one often needs to (1) filter out non-credible users and (2) resolve conflicting fact-checking results; both requirements become more critical as the number of fact-checkers grows.
Fig. 2. Illustrations of Manual Fact-checking Websites: (a) (expert-based) PolitiFact – the PolitiFact scorecard; (b) (crowd-sourced) Fiskkit – the tag distribution.
Nevertheless, crowd-sourcing platforms often allow fact-checkers to provide more detailed feedback (e.g., their sentiments or stances), which can be further explored in fake news studies.
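As a toy illustration of requirement (2) above, the sketch below (hypothetical annotator data and credibility scores; a simple credibility-weighted vote rather than a method from the cited studies) filters low-credibility annotators and aggregates their conflicting labels for one claim:

```python
# Crowd annotations for one claim: (annotator_id, label) with 1 = true, 0 = false.
annotations = [("u1", 1), ("u2", 0), ("u3", 1), ("u4", 1), ("u5", 0)]

# Hypothetical per-annotator credibility scores in [0, 1]; annotators below
# the threshold are filtered out (requirement (1) in the text).
credibility = {"u1": 0.9, "u2": 0.4, "u3": 0.8, "u4": 0.7, "u5": 0.2}
MIN_CRED = 0.5

kept = [(u, y) for u, y in annotations if credibility[u] >= MIN_CRED]
score = sum(credibility[u] * y for u, y in kept) / sum(credibility[u] for u, _ in kept)
label = 1 if score >= 0.5 else 0  # credibility-weighted majority vote
print(score, label)
```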
▶ Crowd-sourced Fact-checking Websites. Unlike expert-based fact-checking websites, crowd-sourced fact-checking websites are still in early development. An example is Fiskkit, where users can upload articles, provide ratings for sentences within articles, and choose tags that best describe the articles. The given sources of articles help (i) distinguish the types of content (e.g., news vs. non-news) and (ii) determine its credibility (Section 5 provides the details). The tags, categorized into multiple dimensions, allow one to study the patterns across fake and non-fake news articles (see Fig. 2(b) for an example). While crowd-sourced fact-checking websites are not many, we believe more crowd-sourced platforms or tools will arise as major Web and social media websites (e.g., Google, Facebook, Twitter, and Sina Weibo) start to realize their importance in identifying fake news.
2.2 Automatic Fact-checking
Manual fact-checking does not scale with the volume of newly created information, especially on social media. To address scalability, automatic fact-checking techniques have been developed, relying heavily on Information Retrieval (IR), Natural Language Processing (NLP), and Machine Learning (ML) techniques, as well as on network/graph theory [Cohen et al. 2011]. To review these techniques, we first present a unified, standard representation of knowledge that can be automatically processed by machines and has been widely adopted in related studies [Nickel et al. 2016]:
Definition 3 (Knowledge). A set of (Subject, Predicate, Object) (SPO) triples extracted from the given information that well represent the given information.

For instance, the knowledge within the sentence "Donald Trump is the president of the U.S." can be represented as (DonaldTrump, Profession, President).
Website URLs: Fiskkit: http://fiskkit.com/; Google: https://blog.google/topics/journalism-news/labeling-fact-check-articles-google-news/; Facebook: https://newsroom.fb.com/news/2016/12/news-feed-fyi-addressing-hoaxes-and-fake-news/; Twitter: https://blog.twitter.com/2010/trust-and-safety; Sina Weibo: http://service.account.weibo.com/ (sign-in required)
[Figure 3 depicts the automatic news fact-checking process in two stages. Stage 1 (fact extraction): raw "facts" are extracted from the open Web and cleaned up (addressing redundancy, invalidity, conflicts, unreliability, and incompleteness) to build a knowledge base that serves as the reference. Stage 2 (fact-checking): knowledge extracted from a to-be-verified news article (the input) is compared against the knowledge base to output an authenticity assessment.]
Fig. 3. Automatic News Fact-checking Process
Based on this representation of knowledge, we present the following widely accepted definitions for the key terms that often appear in the automatic fact-checking literature [Ciampaglia et al. 2015; Dong et al. 2014; Shi and Weninger 2016]:

Definition 4 (Fact). A fact is knowledge (an SPO triple) verified as truth.

Definition 5 (Knowledge Base). A Knowledge Base (KB) is a set of facts.

Definition 6 (Knowledge Graph). A Knowledge Graph (KG) is a graph structure representing the SPO triples in a knowledge base, where entities (i.e., subjects or objects in SPO triples) are represented as nodes and relationships (i.e., predicates in SPO triples) are represented as edges.
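To make Definitions 3-6 concrete, the following minimal sketch (in Python, with hypothetical triples; one of many possible representations, using the networkx library) stores a handful of SPO facts as a small knowledge graph, with subjects/objects as nodes and predicates as edge labels:

```python
import networkx as nx

# Hypothetical facts (SPO triples) forming a tiny knowledge base.
facts = [
    ("DonaldTrump", "profession", "President"),
    ("DonaldTrump", "bornIn", "NewYorkCity"),
    ("NewYorkCity", "locatedIn", "UnitedStates"),
]

# A knowledge graph: entities are nodes, predicates are labeled edges.
# MultiDiGraph allows several differently labeled edges between two entities.
kg = nx.MultiDiGraph()
for s, p, o in facts:
    kg.add_edge(s, o, predicate=p)

def triple_exists(kg, s, p, o):
    """Check whether the SPO triple is an existing fact in the KG."""
    return kg.has_edge(s, o) and any(
        data.get("predicate") == p for data in kg[s][o].values()
    )

print(triple_exists(kg, "DonaldTrump", "profession", "President"))  # True
print(triple_exists(kg, "DonaldTrump", "bornIn", "LosAngeles"))     # False (or unknown)
```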
As a systematic approach for automatic fact-checking of news has never been presented before, here we prioritize
organizing the related research to clearly present the automatic news fact-checking process over presenting each related
study in detail. The automatic fact-checking process is shown in Fig. 3. It can be divided into two stages: fact extraction
(a.k.a. knowledge-base construction, see Section 2.2.1) and fact-checking (a.k.a. knowledge comparison, see Section 2.2.2).
2.2.1 Fact Extraction. To collect facts and construct a KB (KG), knowledge is first extracted from the open Web as raw "facts" that need further processing. Such a process is often referred to as knowledge extraction or relation extraction [Nickel et al. 2016]. Knowledge extraction can be classified into single-source or open-source knowledge extraction. Single-source knowledge extraction, which relies on one comparatively reliable source (e.g., Wikipedia) to extract knowledge, is relatively efficient but often leads to incomplete knowledge (see related studies in [Auer et al. 2007; Bollacker et al. 2008; Suchanek et al. 2007]). Open-source knowledge extraction aims to fuse knowledge from distinct sources; hence, it is less efficient than single-source knowledge extraction, but it leads to more complete knowledge (see related studies in [Carlson et al. 2010; Dong et al. 2014; Nakashole et al. 2012; Niu et al. 2012]). More recent studies in relation extraction can be seen in [Di et al. 2019; Lin et al. 2019; Yu et al. 2019]. Finally, to form a KB (KG) from these extracted raw "facts", they need to be further cleaned up and completed by addressing the following issues:
• Redundancy. For example, (DonaldJohnTrump, profession, President) is redundant when (DonaldTrump, profession, President) is present, as DonaldTrump and DonaldJohnTrump match the same entity. The task of reducing redundancy is often referred to as entity resolution [Getoor and Machanavajjhala 2012], a.k.a. deduplication or record linkage [Nickel et al. 2016] (see related studies in, e.g., [Altowim et al. 2014; Bhattacharya and Getoor 2007; Christen 2008; Steorts et al. 2016; Whang and Garcia-Molina 2012]);
• Invalidity. The correctness of some facts depends on a specific time interval; for example, (Britain, joinIn, EuropeanUnion) has become outdated and should be updated. One way to address this issue is to allow facts to have beginning and ending dates [Bollacker et al. 2008]; alternatively, one can reify current facts by adding extra assertions to them [Hoffart et al. 2013];
• Conflicts. For example, (DonaldTrump, bornIn, NewYorkCity) and (DonaldTrump, bornIn, LosAngeles) are a pair with conflicting knowledge. Conflicts can be resolved by Multi-Criteria Decision-Making (MCDM) methods [Deng 2015; Kang and Deng 2019; Pasi et al. 2019; Viviani and Pasi 2016];
• Unreliability (low credibility). For example, the knowledge extracted from The Onion (https://www.theonion.com/), a satire news organization, is unreliable knowledge. Unreliable knowledge can be reduced by filtering out the low-credibility website(s) from which the knowledge is extracted. The credibility of website(s) can be obtained from resources such as NewsGuard (http://www.newsguardtech.com/) (expert-based), or from systems such as MediaRank [Ye and Skiena 2019]. Section 5 provides more details; and
• Incompleteness. Raw facts extracted from online resources, particularly using a single source, are far from complete. Hence, reliably inferring new facts based on existing facts, a.k.a. KG completion, is necessary to improve the KG being built. KG completion performs link prediction between entities, where the methods can be classified into three groups based on their assumptions: (1) latent feature models, which assume the existence of KB triples is conditionally independent given latent features and parameters (e.g., RESCAL [Nickel et al. 2012], NTN [Socher et al. 2013], DistMult [Yang et al. 2014], TransE [Bordes et al. 2013], TransH [Wang et al. 2014], TransR [Lin et al. 2015], ComplEx [Trouillon et al. 2016], SimplE [Kazemi and Poole 2018], and ConMask [Shi and Weninger 2018]); (2) graph feature models, which assume the existence of triples is conditionally independent given observed graph features and parameters (e.g., the Path Ranking Algorithm (PRA) [Lao and Cohen 2010]); and (3) Markov Random Field (MRF) models, which assume existing triples have local interactions [Nickel et al. 2016]. (A minimal TransE-style scoring sketch follows this list.)
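As a concrete illustration of the latent feature models mentioned above, the following is a minimal, self-contained sketch of TransE-style scoring [Bordes et al. 2013] – not any specific published implementation – where a candidate triple (s, p, o) is scored by how close the subject embedding plus the predicate embedding is to the object embedding. The toy embeddings here are random and purely illustrative; in practice they are learned from observed triples.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8  # embedding dimensionality (toy value)

# Hypothetical entity and relation embeddings; in practice these are
# learned by minimizing a margin-based ranking loss over observed triples.
entities = {name: rng.normal(size=dim) for name in
            ["DonaldTrump", "President", "NewYorkCity", "LosAngeles"]}
relations = {name: rng.normal(size=dim) for name in ["profession", "bornIn"]}

def transe_score(s, p, o):
    """TransE plausibility: smaller ||e_s + e_p - e_o|| means more plausible."""
    return -np.linalg.norm(entities[s] + relations[p] - entities[o])

# Rank two candidate object entities for an unseen triple.
for obj in ["NewYorkCity", "LosAngeles"]:
    print(obj, transe_score("DonaldTrump", "bornIn", obj))
```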
Note that instead of building a KB (KG) from scratch, one can rely on existing large-scale ones, e.g., YAGO [Hoffart et al. 2013; Suchanek et al. 2007], Freebase [Bollacker et al. 2008], NELL [Carlson et al. 2010], PATTY [Nakashole et al. 2012], DBpedia [Auer et al. 2007], Elementary/DeepDive [Niu et al. 2012], and Knowledge Vault [Dong et al. 2014].
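For example, one way to consult such an existing KB is to query DBpedia's public SPARQL endpoint. The sketch below uses the SPARQLWrapper package; the endpoint URL, resource, and predicate URIs are assumptions about DBpedia's vocabulary, and the availability of the public endpoint is not guaranteed.

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Hypothetical check of one triple against DBpedia (endpoint and URIs assumed).
sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setQuery("""
    ASK WHERE {
        <http://dbpedia.org/resource/Donald_Trump>
        <http://dbpedia.org/ontology/birthPlace>
        <http://dbpedia.org/resource/New_York_City> .
    }
""")
sparql.setReturnFormat(JSON)
result = sparql.query().convert()
print(result["boolean"])  # True if the triple exists in DBpedia
```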
2.2.2 Fact-checking. To assess the authenticity of news articles, we need to compare the knowledge extracted from to-be-verified news content (i.e., SPO triples) with the facts (i.e., true knowledge). KBs (KGs) are suitable resources for providing ground truth for news fact-checking, i.e., we can reasonably assume that the existing triples in a KB (KG) represent facts. However, for non-existing triples, their authenticity relies on the assumptions made – we list three common assumptions below – and we may need further inference:
• Closed-world assumption: non-existing triples indicate false knowledge;
• Open-world assumption: non-existing triples indicate unknown knowledge that can be either true or false; and
• Local closed-world assumption [Dong et al. 2014]: the authenticity of non-existing triples can be determined by the following rule. Let $T(s,p)$ denote the set of existing triples in the KB (KG) for subject $s$ and predicate $p$. For any $(s,p,o) \notin T(s,p)$: if $|T(s,p)| > 0$, the triple is considered false; if $|T(s,p)| = 0$, its authenticity is unknown.
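A small sketch of how the local closed-world assumption can be operationalized (continuing the hypothetical networkx knowledge graph `kg` introduced after Definition 6; the function name is ours) is:

```python
def local_closed_world(kg, s, p, o):
    """Return 'true', 'false', or 'unknown' for a triple under the
    local closed-world assumption: if the KG holds any triple (s, p, *),
    a missing (s, p, o) is false; if it holds none, it is unknown."""
    existing = [
        (u, d["predicate"], v)
        for u, v, d in kg.out_edges(s, data=True) if d.get("predicate") == p
    ] if s in kg else []
    if (s, p, o) in existing:
        return "true"
    return "false" if existing else "unknown"

print(local_closed_world(kg, "DonaldTrump", "bornIn", "LosAngeles"))    # 'false'
print(local_closed_world(kg, "DonaldTrump", "spouse", "MelaniaTrump"))  # 'unknown'
```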
Generally, the fact-checking strategy for a (Subject, Predicate, Object) triple is to evaluate the possibility that an edge labeled Predicate exists from the node representing Subject to the node representing Object in a KG. Specifically,

Step 1: Entity locating. The Subject (similarly, Object) is first matched with a node in the KG that represents the same entity as the Subject (similarly, Object), where entity resolution techniques (e.g., [Altowim et al. 2014; Bhattacharya and Getoor 2007; Trivedi et al. 2018]) can be used to identify proper matchings.

Step 2: Relation verification. The triple (Subject, Predicate, Object) is considered truth if an edge labeled Predicate from the node representing Subject to the node representing Object exists in the KG. Otherwise, its authenticity is (1) false, based on the aforementioned closed-world assumption, or (2) determined after knowledge inference.
Step 3: Knowledge inference. When the triple (Subject, Predicate, Object) does not exist in the KG, the probability for an edge labeled Predicate to exist from the node representing Subject to the node representing Object in the KG can be computed, e.g., using link prediction methods such as semantic proximity [Ciampaglia et al. 2015], discriminative predicate paths [Shi and Weninger 2016], or LinkNBed [Trivedi et al. 2018].
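The following sketch (again continuing the hypothetical networkx knowledge graph `kg` from earlier in this section) illustrates how these three steps can be chained: exact entity matching falls back to a simple string-similarity match (Step 1), an existing edge confirms the triple (Step 2), and a naive neighborhood-overlap score stands in for a real link prediction model (Step 3). The matching heuristic and the scoring function are placeholders, not the methods from the cited papers.

```python
from difflib import SequenceMatcher

def locate_entity(kg, name, threshold=0.8):
    """Step 1: match a mention to a KG node (exact, else string similarity)."""
    if name in kg:
        return name
    best = max(kg.nodes, key=lambda n: SequenceMatcher(None, name, n).ratio())
    return best if SequenceMatcher(None, name, best).ratio() >= threshold else None

def check_triple(kg, s, p, o):
    """Steps 2-3: verify an edge, else fall back to a toy inference score."""
    s_node, o_node = locate_entity(kg, s), locate_entity(kg, o)
    if s_node is None or o_node is None:
        return 0.0  # unknown entities: treat as unsupported
    if kg.has_edge(s_node, o_node) and any(
        d.get("predicate") == p for d in kg[s_node][o_node].values()
    ):
        return 1.0  # Step 2: the fact exists in the KG
    # Step 3 (toy): shared-successor overlap as a crude plausibility proxy.
    common = set(kg.successors(s_node)) & set(kg.successors(o_node))
    return len(common) / (1 + len(set(kg.successors(s_node))))

print(check_triple(kg, "Donald Trump", "profession", "President"))
```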
Finally, we conclude this section by formally defining the problem of automatic news fact-checking in Problem 1, and by discussing potential research avenues in automatic news fact-checking in Section 2.2.3.
Problem 1 (Fact-checking). Assume a to-be-verified news article is represented as a set of knowledge statements, i.e., SPO triples $(s_i, p_i, o_i)$, $i = 1, 2, \ldots, n$. Let $G_{KB}$ refer to a knowledge graph containing a set of facts (i.e., true knowledge) denoted by $(s_{t_j}, p_{t_j}, o_{t_j})$, $j = 1, 2, \ldots, m$. News fact-checking is to identify a function $\mathcal{F}$ that assigns an authenticity value $A_i \in [0, 1]$ to each corresponding $(s_i, p_i, o_i)$ by comparing it with every $(s_{t_j}, p_{t_j}, o_{t_j})$ in the knowledge graph, where $A_i = 1$ ($A_i = 0$) indicates the triple is true (false). The final authenticity index $A \in [0, 1]$ of the to-be-verified news article is obtained by aggregating all $A_i$'s. To summarize,
$$\mathcal{F}: (s_i, p_i, o_i) \xrightarrow{G_{KB}} A_i, \qquad A = \mathcal{I}(A_1, A_2, \cdots, A_n), \tag{1}$$
where $\mathcal{I}$ is an aggregation function of choice (e.g., weighted or arithmetic average). The to-be-verified news article is true if $A = 1$, and is [completely] false if $A = 0$. Specifically, function $\mathcal{F}$ can be formulated as
$$\mathcal{F}((s_i, p_i, o_i), G_{KB}) = P(\text{edge labeled } p_i \text{ linking } s'_i \text{ to } o'_i \text{ in } G_{KB}), \tag{2}$$
where $P(\cdot)$ denotes the probability, and $s'_i$ and $o'_i$ are the entities matched to $s_i$ and $o_i$ in $G_{KB}$, respectively:
$$s'_i = \arg\min_{s_{t_j}} |D(s_i, s_{t_j})| < \theta, \qquad o'_i = \arg\min_{o_{t_j}} |D(o_i, o_{t_j})| < \theta. \tag{3}$$
In Eq. (3), $D(a, b)$ measures the distance between entities $a$ and $b$. Such distance can be computed, e.g., directly by Jaccard distance, or by cosine distance after entity embedding. When the distance between $a$ and $b$ is zero (i.e., $D(a, b) = 0$), or the distance is less than a certain threshold $\theta$ (i.e., $|D(a, b)| < \theta$), one can regard $a$ as the same entity as $b$.
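As a rough illustration of Eqs. (1)-(3), the sketch below (hypothetical helper names; the word-level Jaccard distance is just one of the distance choices mentioned above) matches each subject/object to its nearest KG entity under a threshold and averages per-triple authenticity scores into the article-level index $A$:

```python
def jaccard_distance(a, b):
    """Word-level Jaccard distance between two entity mentions."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return 1.0 - len(sa & sb) / len(sa | sb)

def match_entity(mention, kg_entities, theta=0.5):
    """Eq. (3): nearest KG entity whose distance to the mention is below theta."""
    best = min(kg_entities, key=lambda e: jaccard_distance(mention, e))
    return best if jaccard_distance(mention, best) < theta else None

def article_authenticity(triples, score_fn, aggregate=lambda xs: sum(xs) / len(xs)):
    """Eq. (1): aggregate per-triple authenticity values A_i into A."""
    return aggregate([score_fn(s, p, o) for s, p, o in triples])

# Example usage, reusing check_triple from the earlier sketch as the scoring function:
# A = article_authenticity(extracted_triples, lambda s, p, o: check_triple(kg, s, p, o))
```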
2.2.3 Discussion. We have detailed fact extraction (i.e., KB/KG construction) and fact-checking (i.e., knowledge comparison), the two main components of automatic news fact-checking. Several open issues and potential research tasks remain. First, when collecting facts to construct a KB (KG), one concern is the source(s) from which facts are extracted. In addition to traditional sources such as Wikipedia, other sources, e.g., fact-checking websites that contain expert analysis and justifications for checked news content, might help provide high-quality domain knowledge. However, such sources have rarely been considered in current research. Second, we highlight the value of research on dynamic KBs (KGs) for news fact-checking, which can automatically remove invalid knowledge and introduce new facts. Such properties are especially important due to news timeliness – news articles are often not about "common knowledge" but about recent events. Third, it has been verified that fake news spreads faster than true news [Vosoughi et al. 2018], which attaches great importance to fast news fact-checking for achieving early detection of fake news (see Section 6 for a summary and discussion). Current research in building KBs (KGs) has focused on constructing KBs (KGs) with as many facts as possible. However, fast news fact-checking requires not only identifying the parts of the to-be-verified news that are check-worthy (see Section 6 for a discussion on identifying check-worthy content), but also a KB (KG) that stores only as many "valuable" facts as possible (i.e., a KB (KG) simplification process).
3 STYLE-BASED FAKE NEWS DETECTION
Similar to knowledge-based fake news detection (Section 2), style-based fake news detection also focuses on analyzing the news content. However, knowledge-based methods mainly evaluate the authenticity of the given news, while style-based methods assess news intention, i.e., whether there is an intention to mislead the public. The intuition and assumption behind style-based methods is that malicious entities prefer to write fake news in a "special" style to encourage others to read it and to convince them to trust it. Before discussing how such "special" content styles can be automatically identified, we first define fake news style in a way that facilitates the use of machine learning:

Definition 7 (Fake News Style). A set of quantifiable characteristics (e.g., machine learning features) that can well represent fake news content and differentiate it from true news content.
Based on this definition of fake news style, style-based fake news detection is often formulated as a binary (or, at times, multi-label) classification problem:

Problem 2 (Style-based Fake News Detection). Assume a to-be-verified news article $N$ can be represented as a set of $k$ content features denoted by feature vector $\mathbf{f} \in \mathbb{R}^k$. The task of verifying the news article based on its content style is to identify a function $\mathcal{S}$, such that
$$\mathcal{S}: \mathbf{f} \xrightarrow{TD} \hat{y}, \tag{4}$$
where $\hat{y} \in \{0~\text{(true)}, 1~\text{(fake)}\}$ is the predicted news label and $TD = \{(\mathbf{f}_l, y_l) \mid \mathbf{f}_l \in \mathbb{R}^k,\ y_l \in \{0, 1\},\ l = 1 \ldots n\}$ is the training dataset. The training dataset helps estimate the parameters within $\mathcal{S}$ and consists of a set of $n$ news articles represented by the same set of features ($\mathbf{f}_l$) with known news labels ($y_l$).
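To ground Problem 2, the following minimal sketch (using scikit-learn; the feature mapping is stubbed with TF-IDF over raw text purely for illustration, not the full feature set of Section 3.1, and the two training articles are invented) trains a binary style classifier $\mathcal{S}$ on a labeled training set $TD$ and predicts $\hat{y}$ for a to-be-verified article:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical training data: news texts with labels (0 = true, 1 = fake).
train_texts = ["the senate passed the bill after a lengthy debate",
               "SHOCKING!!! you won't believe what the senator did"]
train_labels = [0, 1]

# S: a feature mapping (here TF-IDF) followed by a classifier.
style_classifier = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                                 LogisticRegression())
style_classifier.fit(train_texts, train_labels)

# Predicted label y_hat for a to-be-verified article.
print(style_classifier.predict(["unbelievable secret cure doctors hate"]))
```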
Hence, the performance of style-based fake news detection methods relies on (I) how well the style of news content (text and images) can be captured and represented (see Section 3.1); and (II) how well the classifier (model) performs based on different news content representations (see Section 3.2). In addition, we summarize (III) some verified patterns of fake news content style in Section 3.3, and provide (IV) our discussion of style-based methods in Section 3.4.
3.1 Style Representation
As provided in Definition 7, content style is commonly represented by a set of quantifiable characteristics, often machine learning features. Generally, these features can be grouped into textual features (see Section 3.1.1) and visual features (see Section 3.1.2), representing news text and images, respectively.
3.1.1 News Text. Broadly speaking, textual features can be grouped into (I) general features and (II) latent features.
▶ General textual features. General textual features are often used to detect fake news within a traditional machine learning framework (detailed in Section 3.2.1). These features describe content style at (at least) four language levels: (i) lexicon, (ii) syntax, (iii) discourse, and (iv) semantic [Conroy et al. 2015]. The main task at the lexicon level is to assess the frequency statistics of lexicons, which can be basically conducted using a Bag-Of-Words (BOW) model [Zhou et al. 2019a]. At the syntax level, shallow syntax tasks are performed by Part-Of-Speech (POS) taggers to assess POS (e.g., noun and verb) frequencies [Feng et al. 2012; Zhou et al. 2019a]. Deep syntax tasks are performed using Probabilistic Context-Free Grammar (PCFG) parse trees (see Fig. 4 for an example), which enable assessing the frequencies of rewrite rules (i.e., productions) [Feng et al. 2012; Pérez-Rosas et al. 2017; Zhou et al. 2019a]. Four different encodings of rewrite rules can be considered for a PCFG parse tree [Feng et al. 2012]:
Fig. 4. PCFG parse tree for the sentence "The CIA confirmed Russian interference in the presidential election" in a fake news article (directly from [Zhou et al. 2019a]). The lexicalized rewrite rules of this sentence are: S → NP PP, NP → DT NNP VBN JJ NN, PP → IN NP, NP → DT JJ NN, DT → "the", NNP → "CIA", VBN → "confirmed", JJ → "Russian", NN → "interference", IN → "in", JJ → "presidential", and NN → "election".

Fig. 5. Rhetorical structure for the partial content "Huffington Post is really running with this story from The Washington Post about the CIA confirming Russian interference in the presidential election. They're saying if 100% true, the courts can PUT HILLARY IN THE WHITE HOUSE!" in a fake news article (directly from [Zhou et al. 2019a]). Here, one elaboration, one attribution, and one condition rhetorical relationship exist.
• $r$: unlexicalized rewrite rules, i.e., all rewrite rules except those with leaf nodes, such as IN → "in";
• $r^*$: lexicalized rewrite rules, i.e., all rewrite rules;
• $\hat{r}$: unlexicalized rewrite rules with grandparent nodes, e.g., PP^S → IN NP; and
• $\hat{r}^*$: lexicalized rewrite rules with grandparent nodes, e.g., IN^PP → "in".
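As a small illustration of how such rewrite-rule features can be counted in practice, the sketch below uses NLTK's Tree utilities on a hand-written parse (the bracketed parse string is illustrative, not the output of any particular parser) to read off productions and tally their frequencies:

```python
from collections import Counter
from nltk import Tree

# An illustrative (hand-written) constituency parse of a short sentence.
parse = Tree.fromstring(
    "(S (NP (DT the) (NNP CIA)) (VP (VBD confirmed) "
    "(NP (JJ Russian) (NN interference))))"
)

# Lexicalized rewrite rules (r*): every production, including lexical ones.
lexicalized = Counter(str(p) for p in parse.productions())

# Unlexicalized rewrite rules (r): drop productions that rewrite to words.
unlexicalized = Counter(str(p) for p in parse.productions() if p.is_nonlexical())

print(lexicalized.most_common(3))
print(unlexicalized.most_common(3))
```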
At the discourse level, Rhetorical Structure Theory (RST) and rhetorical parsing tools can be used to capture the frequencies of rhetorical relations among sentences as features [Karimi and Tang 2019; Zhou et al. 2019a] (see Fig. 5 for an illustration). Finally, at the semantic level, such frequencies can be assigned to lexicons or phrases that fall into each psycho-linguistic category (e.g., those defined in Linguistic Inquiry and Word Count (LIWC) [Pérez-Rosas et al. 2017]), or that fall into each self-defined psycho-linguistic attribute. These attributes can be derived from experience, or be inspired by related deception theories (see the news-related theories in Table 2, or [Zhou et al. 2019a] as a typical interdisciplinary fake news study). Based on our investigation, such attributes and their corresponding computational features can be grouped along ten dimensions: quantity [McCornack et al. 2014], complexity, uncertainty, subjectivity [Undeutsch 1967], non-immediacy, sentiment [Zuckerman et al. 1981], diversity [Undeutsch 1967], informality [Undeutsch 1967], specificity [Johnson and Raye 1981], and readability (see Table 4). These were initially developed for identifying deception in computer-mediated communications [Fuller et al. 2009; Zhou et al. 2004b] and testimonies [Afroz et al. 2012], and have recently been used in fake news detection [Bond et al. 2017; Pérez-Rosas et al. 2017; Potthast et al. 2017; Zhou et al. 2019a].
It should be noted that "frequency" can be defined and computed in three ways. Assume a corpus C contains p news articles, C = {A_1, A_2, ..., A_p}, and a total of q words, W = {w_1, w_2, ..., w_q} (POS tags, rewrite rules, rhetorical relationships, etc.). Let x_j^i denote the number of times w_j appears in A_i. Then the "frequency" of w_j for news A_i can be:
• the absolute frequency f_a, i.e., f_a = x_j^i;
• the standardized frequency f_s, which removes the impact of content length, i.e., f_s = x_j^i / Σ_j x_j^i [Zhou et al. 2019a]; or
• the relative frequency f_r, obtained using Term Frequency-Inverse Document Frequency (TF-IDF), which further compares such frequency with that in other news articles in the corpus, i.e., f_r = (x_j^i / Σ_j x_j^i) · ln(p / Σ_i 1(x_j^i > 0)) [Pérez-Rosas et al. 2017].
In general, TF-IDF can be applied at various language levels, as can n-gram models, which enable capturing the sequence of words (POS tags, rewrite rules, etc.) [Feng et al. 2012; Pérez-Rosas et al. 2017].
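For illustration, the following sketch computes the three frequency variants with scikit-learn on a two-sentence toy corpus; note that scikit-learn's default TF-IDF uses a smoothed variant of the IDF term rather than exactly the formula above.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

corpus = ["The CIA confirmed Russian interference in the election",
          "Officials denied any interference in the election"]

counts = CountVectorizer().fit_transform(corpus).toarray()      # absolute frequency f_a
standardized = counts / counts.sum(axis=1, keepdims=True)       # standardized frequency f_s
tfidf = TfidfVectorizer().fit_transform(corpus).toarray()       # relative frequency f_r (TF-IDF)

# n-grams can be captured the same way, e.g., unigrams + bigrams:
bigram_counts = CountVectorizer(ngram_range=(1, 2)).fit_transform(corpus)
```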
▶ Latent textual features. Latent textual features are often used for news text embedding. Such an embedding can be conducted at the word level [Mikolov et al. 2013; Pennington et al. 2014], sentence level [Arora et al. 2016; Le and Mikolov 2014], or document level [Le and Mikolov 2014]; the results are vectors representing a news article and can be
Table 4. Semantic-level Features in News Content. Rows list features grouped by attribute type; check marks indicate which of the following studies (table columns, from left to right) use each feature: [Zhou et al. 2004b], [Fuller et al. 2009], [Afroz et al. 2012], [Siering et al. 2016], [Zhang et al. 2016], [Bond et al. 2017], [Potthast et al. 2017], [Pérez-Rosas et al. 2017], and [Zhou et al. 2019a].
Quantity
# Characters ✓ ✓
# Words ✓ ✓ ✓ ✓ ✓ ✓
# Noun phrases ✓
# Sentences ✓ ✓ ✓ ✓ ✓
# Paragraphs ✓ ✓
Complexity
Average # characters per word ✓ ✓ ✓ ✓ ✓
Average # words per sentence ✓ ✓ ✓ ✓ ✓ ✓
Average # clauses per sentence ✓ ✓
Average # punctuations per sentence ✓ ✓ ✓ ✓
Uncertainty
#/% Modal verbs (e.g., “shall”) ✓ ✓ ✓ ✓
#/% Certainty terms (e.g., “never” and “always”) ✓ ✓ ✓ ✓ ✓ ✓
#/% Generalizing terms (e.g., “generally” and “all”) ✓
#/% Tentative terms (e.g., “probably”) ✓ ✓ ✓ ✓
#/% Numbers and quantifiers ✓
#/% Question marks ✓ ✓
Subjectivity
#/% Biased lexicons (e.g., “attack”) ✓
#/% Subjective verbs (e.g., “feel” and “believe”) ✓ ✓
#/% Report verbs (e.g., “announce”) ✓
#/% Factive verbs (e.g., “observe”) ✓
Non-immediacy
#/% Passive voice ✓ ✓
#/% Self reference: 1st person singular pronouns ✓✓✓✓✓
#/% Group reference: 1st person plural pronouns ✓✓✓✓✓
#/% Other reference: 2nd and 3rd person pronouns ✓✓✓✓✓
#/% Quotations ✓ ✓
Sentiment
#/% Positive words ✓✓✓✓✓✓ ✓
#/% Negative words ✓✓✓✓✓✓ ✓
#/% Anxiety/angry/sadness words ✓ ✓
#/% Exclamation marks ✓ ✓
Content sentiment polarity ✓
Diversity
Lexical diversity: #/% unique words or terms ✓ ✓ ✓ ✓ ✓ ✓
Content word diversity: #/% unique content words ✓ ✓ ✓ ✓
Redundancy: #/% unique function words ✓ ✓ ✓ ✓ ✓
#/% Unique nouns/verbs/adjectives/adverbs ✓
Informality #/% Typos (misspelled words) ✓ ✓ ✓
#/% Swear words/netspeak/assent/nonfluencies/fillers ✓
Specificity
Temporal/spatial ratio ✓ ✓ ✓
Sensory ratio ✓ ✓ ✓ ✓ ✓
Causation terms ✓ ✓ ✓
Exclusive terms ✓
Readability (e.g., Flesch-Kincaid and Gunning-Fog index) ✓ ✓ ✓ ✓
The studies labeled with gray background color investigate news articles.
Manuscript submitted to ACM
A Survey of Fake News:
Fundamental Theories, Detection Methods, and Opportunities 17
Table 5. Visual Features in News Content (defined in [Jin et al. 2017])
Feature Description
Visual Clarity Score Distribution difference between the image set of a news article and that in the corpus
Visual Coherence Score Average similarities between pairs of images in a news article
Visual Similarity Distribution Histogram Distribution histogram of the similarity matrix of images in a news article
Visual Diversity Score Weighted average of dissimilarities between image pairs in a news article
Visual Clustering Score The number of image clusters in a news article
directly used as the input to classifiers (e.g., SVMs) when predicting fake news within a traditional machine learning framework [Zhou et al. 2019a] (detailed in Section 3.2.1). Such embeddings (often at the word level) can be further incorporated into neural network architectures (e.g., Convolutional Neural Networks (CNNs) [He et al. 2016; Huang et al. 2017; Krizhevsky et al. 2012; LeCun et al. 1989; Simonyan and Zisserman 2014; Szegedy et al. 2015], Recurrent Neural Networks (RNNs) [Cho et al. 2014; Hochreiter and Schmidhuber 1997; Schuster and Paliwal 1997], and the Transformer [Devlin et al. 2018; Vaswani et al. 2017]) to predict fake news within a deep learning framework (see details in Section 3.2.2), where CNNs represent news text from a local to a global view, and RNNs and the Transformer capture the sequences within news text. Theoretically, such latent representations can also be obtained by matrix or tensor factorization; current style-based fake news detection studies have rarely considered them, while a few studies focusing on the propagation aspects of fake news have, which we will review later in Section 4.
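As a toy illustration of document-level embeddings feeding a traditional classifier (not the experimental setup of the cited works), one could use gensim's Doc2Vec followed by a linear SVM; the texts and labels below are placeholders.

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from sklearn.svm import LinearSVC

texts = [["cia", "confirmed", "russian", "interference"],
         ["officials", "denied", "interference", "claims"]]
labels = [1, 0]  # toy labels: 1 = fake, 0 = true

docs = [TaggedDocument(words=t, tags=[i]) for i, t in enumerate(texts)]
d2v = Doc2Vec(docs, vector_size=50, min_count=1, epochs=40)
X = [d2v.dv[i] for i in range(len(texts))]   # latent article vectors
clf = LinearSVC().fit(X, labels)             # style-based classifier on latent features
```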
3.1.2 News Images. Currently, not many studies exist on detecting fake news by exploring news images [Shu et al. 2017]. News images can be represented by hand-crafted (i.e., non-latent) features, e.g., the visual features defined in [Jin et al. 2017] (see Table 5). On the other hand, to be further processed by neural networks such as VGG-16/19 [Simonyan and Zisserman 2014] to obtain a latent representation, each image is often embedded as a pixel matrix or tensor of size width × height × #channel(s), where channels can carry the gray value (#channels = 1) or RGB data (#channels = 3).
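A minimal sketch of obtaining such a latent image representation with a pretrained VGG-19 (PyTorch/torchvision assumed, with a version supporting the weights API); the image path is a placeholder.

```python
import torch
from PIL import Image
from torchvision import models, transforms

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),              # yields a 3 x 224 x 224 tensor (RGB channels)
])
vgg = models.vgg19(weights=models.VGG19_Weights.DEFAULT).eval()

img = preprocess(Image.open("news_image.jpg").convert("RGB")).unsqueeze(0)  # placeholder path
with torch.no_grad():
    latent = vgg.features(img)          # latent visual feature maps of the news image
```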
3.2 Style Classification
We detail style-based fake news detection models that rely on traditional Machine Learning (ML) (in Section 3.2.1) and
on Deep Learning (DL) [LeCun et al. 2015] (in Section 3.2.2), along with their performance results.
3.2.1 Traditional Machine Learning-based Models. Within a traditional ML framework, news content is represented by a set of manually selected [latent and non-latent] features, which can be extracted from news images, or from text at various language levels (lexicon, syntax, semantic, and discourse) [Feng et al. 2012; Pérez-Rosas et al. 2017; Zhou et al. 2019a]. Machine learning models that can detect news type (e.g., true or fake) based on this representation can be supervised, semi-supervised, or unsupervised, where supervised methods (classifiers) have been mainly used for style-based fake news detection; e.g., style-based methods have relied on SVMs [Feng et al. 2012; Pérez-Rosas et al. 2017], Random Forests (RF) [Zhou et al. 2019a], and XGBoost [Chen and Guestrin 2016; Zhou et al. 2019a].
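A hedged sketch of such a pipeline, with randomly generated placeholder vectors standing in for hand-crafted stylistic features (e.g., those in Table 4):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X = np.random.rand(200, 40)          # placeholder: 200 articles x 40 stylistic features
y = np.random.randint(0, 2, 200)     # placeholder labels: 1 = fake, 0 = true

clf = RandomForestClassifier(n_estimators=100, random_state=0)
print(cross_val_score(clf, X, y, cv=5, scoring="f1").mean())   # cross-validated F1-score
```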
Within the same experimental setup as that in [Zhou et al. 2019a], the performance of [latent and non-latent] features at various levels is presented and compared in Table 6. Results indicate that when predicting fake news using a traditional ML framework, (1) non-latent features often outperform latent ones; (2) combining features across levels can outperform using single-level features; and (3) [standardized] frequencies of lexicons and rewrite rules better represent fake news content style and perform better (while being more time-consuming to compute) than other feature groups.
It should be noted that, as classifiers perform best in the machine learning settings they were initially designed for, it is unjustified to single out algorithms that perform best for fake news detection in general (related discussions can be found in [Fernández-Delgado et al. 2014; Kotsiantis et al. 2007]).
Table 6. Feature Performance (accuracy Acc. and F1-score) in Fake News Detection using Traditional Machine Learning (Random Forest and XGBoost classifiers) [Zhou et al. 2019a]. Results show that (1) non-latent features can outperform latent ones; (2) combining features across levels can outperform using single-level features; and (3) the [standardized] frequencies of lexicons and rewrite rules better represent fake news content style and perform better (while more time-consuming to compute) than other feature groups.
Feature Group | PolitiFact data [Shu et al. 2018]: XGBoost Acc./F1, RF Acc./F1 | BuzzFeed data [Shu et al. 2018]: XGBoost Acc./F1, RF Acc./F1
Non-latent Features:
  Lexicon: BOWs (f_s) | 0.856/0.858, 0.837/0.836 | 0.823/0.823, 0.815/0.815
  Lexicon: Unigram+bigram (f_r) | 0.755/0.756, 0.754/0.755 | 0.721/0.711, 0.735/0.723
  Syntax: POS tags (f_s) | 0.755/0.755, 0.776/0.776 | 0.745/0.745, 0.732/0.732
  Syntax: Rewrite rules (r*, f_s) | 0.877/0.877, 0.836/0.836 | 0.778/0.778, 0.845/0.845
  Syntax: Rewrite rules (r*, f_r) | 0.749/0.753, 0.743/0.748 | 0.735/0.738, 0.732/0.735
  Semantic: LIWC | 0.645/0.649, 0.645/0.647 | 0.655/0.655, 0.663/0.659
  Semantic: Theory-driven [Zhou et al. 2019a] | 0.745/0.748, 0.737/0.737 | 0.722/0.750, 0.789/0.789
  Discourse: Rhetorical relationships | 0.621/0.621, 0.633/0.633 | 0.658/0.658, 0.665/0.665
  Combination [Zhou et al. 2019a] | 0.865/0.865, 0.845/0.845 | 0.855/0.856, 0.854/0.854
Latent Features:
  Word2Vec [Mikolov et al. 2013] | 0.688/0.671, 0.663/0.667 | 0.703/0.714, 0.722/0.718
  Doc2Vec [Le and Mikolov 2014] | 0.698/0.684, 0.712/0.698 | 0.615/0.610, 0.620/0.615
Fig. 6. Multimodal Fake News Detection Models: (a) EANN (directly from [Wang et al. 2018]); (b) SAFE (directly from [Zhou et al. 2020]), which performs multi-modal feature extraction from news text and images (via an image2sentence component), captures cross-modal similarity, and outputs a fake news prediction.
3.2.2 Deep Learning-based Models. Within a deep learning framework, news content (text and/or images) is often first embedded at the word level [Mikolov et al. 2013] (for text), or as a pixel matrix or tensor (for images). Then, such an embedding is processed by a well-trained neural network (e.g., CNNs [He et al. 2016; Huang et al. 2017; Krizhevsky et al. 2012; LeCun et al. 1989; Szegedy et al. 2015] such as VGG-16/19 [Simonyan and Zisserman 2014] and Text-CNN [Kim 2014]; RNNs such as LSTMs [Hochreiter and Schmidhuber 1997], GRUs [Cho et al. 2014], and BRNNs [Schuster and Paliwal 1997]; and the Transformer [Devlin et al. 2018; Vaswani et al. 2017]) to extract latent textual and/or visual features of news content. Ultimately, the given news content is classified as true or fake news, often by concatenating all these features and feeding them to a well-trained classifier such as a softmax.
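The following is a minimal Text-CNN-style sketch (PyTorch assumed) of this general procedure, i.e., word embedding, convolution, pooling, and a softmax classifier; the vocabulary size, filter sizes, and inputs are illustrative and not taken from any cited model.

```python
import torch
import torch.nn as nn

class TextCNN(nn.Module):
    def __init__(self, vocab_size=20000, emb_dim=100, n_filters=64, n_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.convs = nn.ModuleList(
            [nn.Conv1d(emb_dim, n_filters, k) for k in (3, 4, 5)])   # filter widths 3/4/5
        self.fc = nn.Linear(3 * n_filters, n_classes)

    def forward(self, token_ids):                      # (batch, seq_len)
        x = self.embed(token_ids).transpose(1, 2)      # (batch, emb_dim, seq_len)
        pooled = [conv(x).relu().max(dim=2).values for conv in self.convs]
        logits = self.fc(torch.cat(pooled, dim=1))
        return logits.softmax(dim=-1)                  # class probabilities (true vs. fake)

probs = TextCNN()(torch.randint(0, 20000, (8, 120)))   # 8 articles, 120 tokens each
```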
This general procedure can be improved to, e.g., facilitate explainable fake news detection and enhance feature representativeness; more discussions on explainable fake news detection are provided in Section 6. An example is the Event Adversarial Neural Network (EANN) [Wang et al. 2018], which can enhance feature representativeness by extracting features that are invariant under different world events to represent news content (text and images). Fig. 6(a) presents the architecture of the EANN model, which has three components: (1) a multi-modal feature extractor, which extracts both textual and visual features from a given news article using neural networks; (2) an event discriminator, which further captures event-invariant features of the given news by playing a min-max game; and (3) a fake news detector for news classification (true or fake). In EANN, the event discriminator ensures that the extracted features are representative. Another example is SAFE, a multimodal fake news detection method (see its architecture in Fig. 6(b)).
Fig. 7. Fake News Textual Patterns [Zhou et al. 2019a] (PolitiFact, data is from FakeNewsNet [Shu et al. 2018]): Compared to true news text, fake news text has higher (i) informality (% swear words), (ii) diversity (% unique verbs), and (iii) subjectivity (% report verbs), and is (iv) more emotional (% emotional words).
Fig. 8. Fake News Visual Patterns (Twitter+Weibo, directly from [Jin et al. 2017]): Compared to true news images, fake news images often have higher clarity and coherence, but lower diversity and clustering scores (see Table 5 for a description of these features).
SAFE explores the relationship between the textual and visual features in a news article to detect fake news based on the falsity that can exist in the news article's multimodal and/or relational information [Zhou et al. 2020]. Specifically, SAFE assumes that a "gap" often exists between the textual and visual information of fake news, based on the observations that (i) to attract public attention, some fake news writers prefer to use attractive but irrelevant images, and (ii) when a fake news article tells a fake story, it is difficult to find both pertinent and non-manipulated images to match such fake content.
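As a loose illustration of this intuition (not the actual SAFE architecture), the text-image "gap" could be scored as one minus the cosine similarity of precomputed textual and visual embeddings projected to a shared dimension:

```python
import numpy as np

def text_image_gap(text_vec: np.ndarray, image_vec: np.ndarray) -> float:
    # Larger gap -> text and image are less related -> a weak signal of possible fakeness.
    cos = np.dot(text_vec, image_vec) / (np.linalg.norm(text_vec) * np.linalg.norm(image_vec))
    return 1.0 - cos

gap = text_image_gap(np.random.rand(128), np.random.rand(128))   # placeholder embeddings
```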
3.3 Patterns of Fake News Content Style
Some patterns of fake news content (text and image) style are distinguishable from those of true news. Recent studies have revealed such patterns [Jin et al. 2017; Zhou et al. 2019a]. Particularly,
• Fake news text, compared to true news text as shown in Fig. 7, has higher (i) informality (% swear words), (ii) diversity (% unique verbs), and (iii) subjectivity (% report verbs), and is (iv) more emotional (% emotional words); and
• Fake news images, compared to true news images as illustrated in Fig. 8, often have higher clarity and coherence, but lower diversity and clustering scores (see Table 5 for a description of these features).
3.4 Discussion
We have detailed how to (1) represent and (2) classify news content style, the two main components of style-based fake news detection, along with some [textual and visual] patterns within fake news content that can help distinguish it from true news content. It should be pointed out that the patterns in Fig. 7 are limited to political news articles and those in Fig. 8 target a mix of English (Twitter) and Chinese (Weibo) news articles from vague domains. Hence, a more comprehensive analysis of fake news content style across different domains, languages, time periods, etc. is highly encouraged, as fake news content style may vary across domains and languages, evolve over time, and ultimately impact prediction performance [Pérez-Rosas et al. 2017] (more discussions on this topic are presented in Section 6). Furthermore, the writing style can be manipulated. Style-based (or knowledge-based) fake news detection relies heavily on news content, which enables the models to identify fake news before it has been propagated on social media (i.e., to achieve fake news early detection, which we discuss further in Section 6). However, such heavy dependence "helps" malicious entities bypass style-based models by changing their writing style. In other words, style-based fake news detection can sometimes become a cat-and-mouse game; any success at detection, in turn, will inspire future countermeasures by fake news writers. To resist such attacks, one can further incorporate social context information into news analysis to enhance model robustness, which we discuss next in Sections 4 and 5.
4 PROPAGATION-BASED FAKE NEWS DETECTION
When detecting fake news from a propagation-based perspective, one can investigate and utilize the information related to the dissemination of fake news, e.g., how users spread it. Similar to style-based fake news detection, propagation-based fake news detection is often formulated as a binary (or multi-label) classification problem, however, with a different input. Broadly speaking, the input to a propagation-based method can be either (I) a news cascade, a direct representation of news propagation, or (II) a self-defined graph, an indirect representation capturing additional information on news propagation. Hence, propagation-based fake news detection boils down to classifying (I) news cascades or (II) self-defined graphs. We review fake news detection using news cascades in Section 4.1 and fake news detection using self-defined propagation graphs in Section 4.2, and provide our discussion in Section 4.3.
4.1 Fake News Detection using News Cascades
We first define a news cascade in Definition 8, a formal representation of news dissemination that has been adopted by many studies (e.g., [Castillo et al. 2011; Ma et al. 2018; Vosoughi et al. 2018; Wu et al. 2015]).
Definition 8 (News Cascade). A news cascade is a tree or tree-like structure that directly captures the propagation of a certain news article on a social network (Fig. 9 provides examples). The root node of a news cascade represents the user who first shared the news article (i.e., the initiator); other nodes in the cascade represent users that have subsequently spread the article by forwarding it after it was posted by their parent nodes, to which they are connected via edges. A news cascade can be represented in terms of the number of steps (i.e., hops) that the news has traveled (i.e., a hop-based news cascade) or the times at which it was posted (i.e., a time-based news cascade).
A hop-based news cascade, often a standard tree, allows natural measures such as:
- Depth: the maximum number of steps (hops) that the news has traveled within a cascade;
- Breadth (at hop k): the number of users that have spread the news k steps (hops) after it was initially posted within a cascade; and
- Size: the total number of users in a cascade.
A time-based news cascade, often a tree-like structure, allows natural measures such as:
- Lifetime: the longest interval during which the news has been propagated;
- Real-time heat (at time t): the number of users posting/forwarding the news at time t; and
- Overall heat: the total number of users that have forwarded/posted the news.
Fig. 9. Illustrations of News Cascades
Table 7. News Cascade Features
Feature Type Feature Description H T P
Cascade Size Overall number of nodes in a cascade [Castillo et al. 2011;Vosoughi et al. 2018]✓ ✓ ✓
Cascade Breadth Maximum (or average) breadth of a news cascade [Vosoughi et al. 2018] ✓ ✓
Cascade Depth Depth of a news cascade [Castillo et al. 2011;Vosoughi et al. 2018]✓ ✓
Structural Virality Average distance among all pairs of nodes in a cascade [Vosoughi et al. 2018]✓ ✓
Node Degree Degree of the root node of a news cascade [Castillo et al. 2011]✓
Maximum (or average) degree of non-root nodes in a news cascade [Castillo et al. 2011]✓
Spread Speed Time taken for a cascade to reach a certain depth (or size) [Vosoughi et al. 2018]✓ ✓
Time interval between the root node and its child nodes [Wu et al. 2015]✓
Cascade Similarity Similarity scores between a cascade and other cascades in the corpus [Wu et al. 2015]✓
H: Hop-based news cascades; T: Time-based news cascades; P: Pattern-driven features.
Note that a specific news article can lead to multiple simultaneous cascades due to multiple initiating users. Furthermore, within a news cascade, nodes (users) are often represented with a series of attributes and additional information, e.g., whether they support or oppose the fake news, their profile information, previous posts, and their comments.
Based on Definition 8, classifying a news article (true or fake) using its cascade(s) boils down to classifying its cascade(s) as true or fake. To perform this classification, some proposed methods rely on (I) traditional machine learning, while others utilize (II) [deep] neural networks.
▶ Traditional Machine Learning (ML) Models. Within a traditional ML framework, to classify a news cascade that has been represented as a set of features, one often relies on supervised learning methods such as SVMs [Castillo et al. 2011; Kwon et al. 2013; Wu et al. 2015], decision trees [Castillo et al. 2011; Kwon et al. 2013], decision rules [Castillo et al. 2011], naïve Bayes [Castillo et al. 2011], and random forests [Kwon et al. 2013]. Table 7 summarizes the news cascade features used in current research.
As shown in Table 7, cascade features can be inspired by fake news propagation patterns observed in empirical studies. For example, a recent study has investigated the differences in the diffusion patterns of all verified true and fake news stories on Twitter from 2006 to 2017 [Vosoughi et al. 2018]. The study reveals that fake news spreads faster, farther, and more widely, and is more popular, with a higher structural virality score, compared to true news. Specifically, the cascade depth, maximum breadth (and mean breadth at every depth), size, and structural virality [Goel et al. 2015] (i.e., the average distance among all pairs of nodes in a cascade) of fake news are generally greater than those of true news. Also, the time it takes for fake news cascades to reach any depth (and size) is less than that for true news cascades (see Fig. 10).
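A small sketch (networkx assumed) of computing several hop-based cascade features from Table 7, i.e., depth, maximum breadth, size, and structural virality, on a toy cascade:

```python
import networkx as nx

cascade = nx.DiGraph([(0, 1), (0, 2), (1, 3), (3, 4)])   # root user 0 forwards to users 1, 2, ...
root = 0

hops = nx.shortest_path_length(cascade, source=root)      # hop distance of each node from the root
depth = max(hops.values())                                 # cascade depth
size = cascade.number_of_nodes()                           # cascade size
breadth = max(list(hops.values()).count(d) for d in range(1, depth + 1))   # maximum breadth
structural_virality = nx.average_shortest_path_length(cascade.to_undirected())

print(depth, size, breadth, structural_virality)
```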
The cascade features in Table 7 can be viewed from another perspective, where they either capture the local structure of a cascade, e.g., its depth and width [Castillo et al. 2011; Vosoughi et al. 2018], or allow comparing the overall structure of a cascade with that of other cascades by computing similarities [Wu et al. 2015]. A common strategy to compute such graph similarities is to use graph kernels [Vishwanathan et al. 2010]. For example, Wu et al. develop a random walk graph kernel to compute the similarity between two hop-based news cascades that contain additional information, e.g., the user roles (opinion leader or normal user) as well as approval, sentiment, and doubt scores for user posts (see the structure of such a cascade in Fig. 11(a)).
▶ Deep Learning (DL) Models. Within a DL framework, learning the representation of news cascades often relies on neural networks, where a softmax function often acts as the classifier. For example, Ma et al. develop Recursive Neural Networks (RvNNs), a tree-structured neural network, based on news cascades [Bian et al. 2020; Ma et al. 2018]. A top-down RvNN model with Gated Recurrent Units (GRUs) is shown in Fig. 11(b). Specifically, for each node j with a
Fig. 10. Fake News Cascade-based Propagation Patterns (Twitter data, directly from [Vosoughi et al. 2018]). (A-D): The CCDF (Complementary Cumulative Distribution Function) distributions of cascade depth, size, max-breadth, and structural virality of fake news are always above those of true news; (E-F): the average time taken for fake news cascades to reach a certain depth and a certain number of unique users is less than that for true news cascades; and (G-H): for fake news cascades, the average number of unique users and the breadth at a certain depth are always greater than those of true news cascades.
Fig. 11. Examples of Cascade-based Fake News Detection Models: (a) a traditional machine learning model [Wu et al. 2015], in which P denotes an opinion leader, N a normal user, and (A, S, D) the (approval, sentiment, doubt) scores attached to each post; (b) a deep learning model [Ma et al. 2018], in which node representations are max-pooled and passed to a softmax classifier.
post on a certain news report represented as a TF-IDF vector x_j, its hidden state h_j is recursively determined by x_j and the hidden state of its parent node P(j), denoted as h_P(j). Formally, h_j is derived using a standard GRU formulation:

r_j = σ(W_r x_j V + U_r h_P(j)),
z_j = σ(W_z x_j V + U_z h_P(j)),
h_j = z_j ⊙ tanh(W_h x_j V + U_h (h_P(j) ⊙ r_j)) + (1 − z_j) ⊙ h_P(j),        (5)

where r_j is a reset gate vector, z_j is an update gate vector, W_∗, U_∗, and V denote parameter matrices, σ(·) is the sigmoid function, tanh(·) is the hyperbolic tangent, and ⊙ denotes the entry-wise product. In this way, a representation (hidden state) is learned for each leaf node in the cascade. These representations are used as inputs to a max-pooling layer, which computes the final representation h for the news cascade; the max-pooling layer takes the maximum value of each dimension of the hidden state vectors over all the leaf nodes. Finally, the label of the news cascade is predicted as

ỹ = Softmax(Q h + b),        (6)
Fig. 12. Homogeneous Networks: (a) a news spreader network [Zhou and Zafarani 2019], whose nodes are news spreaders (susceptible, normal, or influential users) connected by following relations; (b) a stance network [Jin et al. 2016], whose nodes are topic-view posts connected by supporting (+) or opposing (−) viewpoints.

Fig. 13. Heterogeneous Networks: (a) a network of publishers, news articles, and users connected by publishing, spreading, and social relations, where publishers carry left, neutral, or right bias [Shu et al. 2019c]; (b) a network of news authors, news articles, and subjects connected by creating and belonging-to relations, modeled with Gated Diffusive Units (GDUs) [Zhang et al. 2018].

Fig. 14. Hierarchical Networks: (a) a hierarchy of news, posts, reposts, and comments connected by belonging-to relations [Shu et al. 2019b]; (b) a hierarchy of news, sub-events, and posts connected by belonging-to and similarity relations [Jin et al. 2014].
where Q and b are parameters. The model parameters can be trained (estimated) by minimizing some cost function, e.g., the squared error [Ma et al. 2018] or cross-entropy [Zhang et al. 2018], using some optimization algorithm, e.g., Stochastic Gradient Descent (SGD) [Wang et al. 2018], Adam [Kingma and Ba 2014], or the Alternating Direction Method of Multipliers (ADMM) [Boyd et al. 2011; Shu et al. 2019c].
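An illustrative PyTorch sketch of the top-down GRU update in Eq. (5) and the max-pooling prediction in Eq. (6); note that PyTorch's built-in GRUCell uses a slightly different gating convention than Eq. (5), and all dimensions here are placeholders.

```python
import torch
import torch.nn as nn

class TopDownRvNNCell(nn.Module):
    def __init__(self, tfidf_dim=5000, emb_dim=100, hid_dim=100, n_classes=2):
        super().__init__()
        self.V = nn.Linear(tfidf_dim, emb_dim, bias=False)   # projects the TF-IDF vector x_j
        self.gru = nn.GRUCell(emb_dim, hid_dim)               # gated update, cf. Eq. (5)
        self.out = nn.Linear(hid_dim, n_classes)              # Q and b in Eq. (6)

    def node_state(self, x_j, h_parent):
        return self.gru(self.V(x_j), h_parent)                # h_j from x_j and h_P(j)

    def predict(self, leaf_states):                            # list of leaf hidden states h_j
        h = torch.stack(leaf_states).max(dim=0).values         # max-pooling over leaf nodes
        return self.out(h).softmax(dim=-1)                     # cascade label probabilities

cell = TopDownRvNNCell()
h_root = cell.node_state(torch.rand(1, 5000), torch.zeros(1, 100))   # root node update
```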
4.2 Fake News Detection using Self-defined Propagation Graphs
When detecting fake news using self-defined propagation graphs (networks), one constructs flexible networks to indirectly capture fake news propagation. These networks can be (1) homogeneous, (2) heterogeneous, or (3) hierarchical.
▶ Homogeneous Network. Homogeneous networks contain a single type of node and a single type of edge. One example is the news spreader network (see Fig. 12(a)), a subgraph of the social network of users, where each network corresponds to one news article; each node in the network is a user spreading the news, and an edge between two nodes indicates the following relationship between two news spreaders [Zhou and Zafarani 2019]. Classifying a news article (fake or true) using its spreader network is equivalent to classifying the network. Recently, Zhou et al. analyzed such networks at the level of the node, ego, triad, community, and the overall network [Zhou and Zafarani 2019], and revealed four patterns within fake news spreader networks: (1) More-Spreader Pattern, i.e., more users spread fake news than true news; (2) Farther-Distance Pattern, i.e., fake news spreads farther than true news; (3) Stronger-Engagement
Pattern, i.e., spreaders engage more strongly with fake news than with true news; and (4) Denser-Network Pattern, i.e., fake news spreaders form denser networks compared to true news spreaders (see Fig. 15).

Fig. 15. Fake News Network-based Propagation Patterns [Zhou and Zafarani 2019] (PolitiFact+BuzzFeed, data are from FakeNewsNet [Shu et al. 2018]): (a) Farther-Distance (network diameter), (b) More-Spreaders (# news spreaders), (c) Stronger-Engagements (# engagements), and (d) Denser-Networks (ego density). Compared to the truth, fake news can (a) spread farther and (b) attract more spreaders, where these spreaders are often (c) more strongly engaged with fake news and (d) more densely connected within fake news spreader networks.

Another example of a
homogeneous network is a stance network [Jin et al. 2016], where nodes are news-related posts by users and edges represent supporting (+) or opposing (−) relations among pairs of posts, e.g., the similarity between each pair of posts, which can be calculated using a distance measure such as the Jensen-Shannon divergence [Jin et al. 2016] or the Jaccard distance [Jin et al. 2014]. The stance network is shown in Fig. 12(b). Fake news detection using a stance network boils down to evaluating the credibility of news-related posts (i.e., lower credibility = fake news), which can be further cast
as a graph optimization problem. Specifically, let A ∈ R^{n×n} denote the adjacency matrix of the aforementioned stance network with n posts and c = (c_1, ..., c_n) denote the vector of post credibility scores. Assuming that supporting posts have similar credibility scores, the cost function in [Zhou et al. 2004a] can be used and the problem can be defined as

arg min_c  μ ||c − c_0||^2 + (1 − μ) Σ_{i,j=1}^{n} A_ij (c_i / √D_ii − c_j / √D_jj)^2,        (7)

where c_0 refers to the true credibility scores of the training posts, D_ii = Σ_k A_ik, and μ ∈ [0, 1] is a weight parameter.
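For illustration, Eq. (7) can be minimized numerically on a toy stance network; the adjacency matrix, training scores, and μ below are placeholders rather than values from the cited work.

```python
import numpy as np
from scipy.optimize import minimize

A = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)   # toy stance adjacency matrix
c0 = np.array([1.0, 1.0, 0.2])                                   # known credibility of training posts
mu = 0.5
D = A.sum(axis=1)                                                 # node degrees D_ii

def objective(c):
    fit = mu * np.sum((c - c0) ** 2)
    smooth = (1 - mu) * sum(
        A[i, j] * (c[i] / np.sqrt(D[i]) - c[j] / np.sqrt(D[j])) ** 2
        for i in range(len(c)) for j in range(len(c)) if A[i, j] > 0)
    return fit + smooth

c_star = minimize(objective, x0=c0.copy()).x    # estimated post credibility scores
```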
▶ Heterogeneous Network. Heterogeneous networks have multiple types of nodes or edges. An early instance is a (users)-(tweets)-(news events) network, using which Gupta et al. [Gupta et al. 2012] evaluate news credibility by designing an algorithm similar to PageRank [Page et al. 1999]. Their algorithm determines credibility relationships among these three entities under the assumption that users with low credibility tend to post tweets that are often related to fake news. Another example is the network capturing relationships among news publishers, news articles, and news spreaders (users) shown in Fig. 13(a). Using this network, Shu et al. [Shu et al. 2019c] classify news articles (as fake or true) using a linear classifier, where
(I) news articles are represented using latent features, derived using Nonnegative Matrix Factorization (NMF):

min_{D,V}  ||X − D V^⊤||_F^2   s.t.  D, V ≥ 0,        (8)

where X ∈ R_+^{m×t} is the given article-word matrix for m news articles. X is factorized into D ∈ R_+^{m×d} (i.e., the article-latent feature matrix) and a weight matrix V ∈ R_+^{t×d};
(II) assuming that the political bias of each publisher is known, i.e., left (−1), least-biased (0), or right (+1), let b ∈ {−1, 0, 1}^l denote the political biases of l publishers. Assuming that the political biases of publishers can be
represented by the latent features of their published articles, the publisher-article relationship is derived by

min_q  ||P̄ D q − b||_2^2,        (9)

where P̄ ∈ R^{l×m} is the normalized publisher-article relation (adjacency) matrix and q ∈ R^d is a weight vector; and
(III) assuming that non-credible (credible) users spread fake (true) news, a similar optimization formulation can be utilized to derive the relationship between spreaders (users) and news articles (an NMF sketch for step (I) follows this list).
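A toy sketch of step (I) using scikit-learn's NMF implementation; the matrix sizes and parameters are illustrative, and the full framework combines steps (I) to (III), which this snippet does not attempt.

```python
import numpy as np
from sklearn.decomposition import NMF

X = np.random.rand(20, 300)                     # placeholder article-word matrix: 20 articles x 300 words
nmf = NMF(n_components=10, init="nndsvda", max_iter=500)
D_feat = nmf.fit_transform(X)                   # article-latent feature matrix D (20 x 10)
V = nmf.components_.T                           # weight matrix V (300 x 10)
```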
Finally, Zhang et al. develop a framework for a heterogeneous network of users (news authors), news articles, and news subjects to detect fake news (see Fig. 13(b)). They introduce Gated Diffusive Units (GDUs), a neural network component that can jointly learn the representations of users, news articles, and news subjects [Zhang et al. 2018].
▶ Hierarchical Network. In hierarchical networks, various types of nodes and edges form set-subset relationships (i.e., a hierarchy). One example is the news-tweet-retweet-reply network (see Fig. 14(a)), an extension of the news cascade defined in Definition 8 [Shu et al. 2019b]. Hence, the same features listed for news cascades in Table 7 (e.g., cascade depth and size) can be used to predict fake news using this hierarchical network within a traditional ML framework. Another example of a hierarchical network is shown in Fig. 14(b), which includes relationships across (i.e., hierarchical relationships) and within (i.e., homogeneous relationships) news events, sub-events, and posts. In such networks, news verification can be transformed into a graph optimization problem [Jin et al. 2014], extending the optimization in Eq. (7).
4.3 Discussion
We have discussed the current solutions for detecting fake news by investigating how news spreads online. By involving dissemination information (i.e., social context) in fake news detection, propagation-based methods are more robust against writing-style manipulation by malicious entities. However, propagation-based fake news detection is inefficient for fake news early detection (see Section 6 for more details), as it is difficult for propagation-based models to detect fake news before it has been disseminated, or to perform well when limited news dissemination information is available.
Furthermore, mining news propagation and news writing style allows one to assess news intention. As discussed, the intuition is that (1) news created with a malicious intent, that is, to mislead and deceive the public, aims to be "more persuasive" compared to news not having such aims, and (2) malicious users often play a part in the propagation of fake news to enhance its social influence [Leibenstein 1950]. However, to evaluate whether news intentions are properly assessed, one relies on the ground truth (news labels) in training datasets, often annotated by domain experts. This ground-truth dependency particularly exists when predicting fake news by (semi-)supervised learning within graph optimization [Shu et al. 2019c], traditional statistical learning [Zhou and Zafarani 2019], or deep neural networks [Zhang et al. 2018]. Most current fake news datasets have not provided a clear-cut declaration on whether the annotations within the datasets consider news intention, or how the annotators have manually evaluated it, which motivates the construction of fake news datasets or repositories that provide information on news intention.
Finally, research has shown that political fake news spreads faster, farther, and more widely, and is more popular, with a higher structural virality score, than fake news in other domains such as business, terrorism, science, and entertainment [Vosoughi et al. 2018]. Discovering more dissemination patterns of fake news is hence highly encouraged, by comparing it with true news, or with fake news from different domains and languages. Such patterns can deepen the public understanding of fake news and enhance the explainability of fake news detection (we discuss explainable fake news detection in Section 6).