Learning to Detect Incongruence in News Headline and Body Text via a Graph Neural Network

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License (https://creativecommons.org/licenses/by-nc-nd/4.0/).
Digital Object Identifier 10.1109/ACCESS.2021.3062029
Learning to Detect Incongruence
in News Headline and Body Text
via a Graph Neural Network
SEUNGHYUN YOON (MEMBER, IEEE)1, KUNWOO PARK2, MINWOO LEE3, TAEGYUN KIM4,5,
MEEYOUNG CHA (MEMBER, IEEE)5,4, AND KYOMIN JUNG (MEMBER, IEEE)3
1Adobe Research, San Jose, CA, USA
2School of AI Convergence, Soongsil University, Seoul, South Korea
3Department of Electrical and Computer Engineering, Seoul National University, Seoul, South Korea
4School of Computing, KAIST, Daejeon, South Korea
5Data Science Group, Institute for Basic Science, Daejeon, South Korea
Corresponding author: Kyomin Jung (e-mail: kjung@snu.ac.kr).
The first two authors contributed equally to this work. T. Kim and M. Cha are supported by the Institute for Basic Science in South Korea
(IBS-R029-C2) and the Basic Science Research Program through the National Research Foundation of Korea
(NRF-2017R1E1A1A01076400). M. Lee and K. Jung are supported by Samsung Electronics Co., Ltd. (IO201208-07852-01). K. Jung works at the Department of Electrical and Computer Engineering, ASRI, Seoul National University, Seoul, Korea.
ABSTRACT This paper tackles the problem of detecting incongruities between headlines and body text,
where a news headline is irrelevant or even in opposition to the information in its body. Our model,
called the graph-based hierarchical dual encoder (GHDE), utilizes a graph neural network to efficiently
learn the content similarity between news headlines and long body paragraphs. This paper also releases
a million-item-scale dataset of incongruity labels that can be used for training. The experimental results
show that the proposed graph-based neural network model outperforms previous state-of-the-art models
by a substantial margin (5.3%) on the area under the receiver operating characteristic (AUROC) curve.
Real-world experiments on recent news articles confirm that the trained model successfully detects headline
incongruities. We discuss the implications of these findings for combating infodemics and news fatigue.
INDEX TERMS Graph neural network, headline incongruity, online misinformation
I. INTRODUCTION
The volume of news content generated every day is surging [1]. In contrast to newspapers, which publish limited
content each day, publishing articles online incurs little cost.
Furthermore, some of these news articles (e.g., weather and
financial reports) are written by automated algorithms [2],
which further reduce the cost of news generation. To draw
traffic to news articles among the plethora of competitors,
some news media attempt to capture readers’ attention by us-
ing news headlines unrelated to the main content. Such mis-
matches can be extremely harmful in an online environment,
where readers usually skim headlines without consuming the
content of the news articles [3]. Thus, misleading headlines
potentially contribute to incorrect perceptions of events and
inhibit their dissemination [4].
This study aims to tackle the headline incongruity prob-
lem [5], which involves determining whether the news head-
lines are unrelated to or distinct from the main parts of the
full body text.

FIGURE 1. An example of an incongruent headline problem and its graph representation between the headline and body paragraphs. The red edge in the graph describes the incongruence between paragraph 3 and the other text.

Figure 1 illustrates an example in which, based solely on the headline, a reader might expect to learn specific information related to the novel coronavirus; however, the
body text contains an advertisement for a dietary supplement.
The challenge is that many individual users will not notice the
incongruity by simply reading the news headlines because
the body text is revealed only after a click. Content incon-
gruity is a growing problem that negatively impacts the news
reading experience.
Researchers have proposed several practical approaches
using deep learning to address the detection problem as a
binary classification (i.e., incongruent or not) by determin-
ing the ground truth based on manual annotation. A recent
method learns the characteristics of news headlines and body
text jointly via a neural network [6]. However, there are two
critical challenges in these approaches. First, the existing
models focus on learning the relationship between a short
headline and lengthy body text that can reach thousands of
words, posing challenges to efficient neural network-based
learning due to the excessive lengths of news articles. Sec-
ond, the lack of a large-scale dataset makes it difficult to train
deep learning models, which have numerous parameters, to
detect headline incongruities.
This paper presents a new method to tackle the headline in-
congruity problem: a graph-based hierarchical dual encoder
(GHDE) that captures the textual relationship between a news
headline and its body text of arbitrary size. It leverages the
hierarchical nature of news articles by embedding the text
content of the headline and body paragraphs as nodes. This
approach is used to construct a graph in which headline nodes
lie on one side and body paragraph nodes lie on the other.
Then, we connect undirected edges between these nodes.
The GHDE learns to compute edge weights between the
headline and paragraph nodes and assigns a higher edge
weight to the more relevant edges. Then, GHDE updates
each node representation by aggregating information from its
neighboring nodes. The iterative update process propagates
the relevant information in paragraph nodes to the headline
node, which is essential in determining content incongruities.
This work also presents a dataset generation method and
makes a million-item-scale dataset available for future re-
search. This dataset is currently the largest English dataset
compiled for the headline incongruity problem. From the
corpus of 7,127,692 English news articles published by 57
media outlets, our method iteratively matches two news stories on a similar topic and then combines their body paragraphs
to create a synthetic news article with varying levels of
incongruity.
The extensive experiments show that GHDE outperforms
existing incongruity-detection models by a substantial mar-
gin (5.3%) on the AUROC metric (an improvement from
0.879 to 0.926). A study on real-world articles was con-
ducted where crowdsourced workers were asked to annotate
incongruous labels from recent news posts; then, GHDE
was used to evaluate the incongruity. The results of this
experiment demonstrate that the proposed method can be
applied to incongruity detection in news articles in the wild.
In fact, GHDE can successfully detect incongruence between
headlines and body text even for unseen topics, such as health
supplements for COVID-19, as demonstrated in Figure 1.
The remainder of this paper is organized as follows. Sec-
tion II provides a brief review of the literature on head-
line incongruity detection and using graph neural networks
with text. We propose an efficient automatic data genera-
tion method in Section III and introduce our newly cre-
ated million-item-scale dataset for research in this field.
In Section IV, we start by describing the baseline models
considered in this paper, including the previous state-of-the-
art neural network-based model and a recent BERT-based
model. Next, we introduce the proposed model in detail.
The experimental setup for model evaluation, a discussion
of the result achieved by the various approaches, and em-
pirical studies in the wild are presented in Section V. We
conclude by discussing the implications of this study in
the context of fighting against infodemics and news fatigue
online. Finally, Section VI concludes the paper through a
discussion on the limitations of this study and possible
directions for future research on the news incongruence
detection problem. The code and the data are available at
https://github.com/minwhoo/detecting-incongruity-gnn.
II. RELATED WORKS
A. MACHINE LEARNING FOR HEADLINE INCONGRUITY
Incongruity between a news headline and its body content is a
common type of misinformation on the Internet [7]. In digital
environments, people are less likely to read full news stories;
they tend to only peruse the news headlines. Such news
reading habits aggravate the harm caused by misleading
(incongruent) headlines [4]. Several machine learning tech-
niques have been proposed to tackle this challenge. The main
challenge is that this field of study still lacks large-scale
realistic datasets; consequently, many of the existing studies
relied on relatively small datasets of manually annotated data.
In terms of data complexity, the best-known model utilizes an
attention-based hierarchical dual encoder to process the long
body paragraphs common in news articles efficiently [8].
Headline incongruity detection is also related to the stance
detection problem, which aims at identifying the stance of
specified claims against a reference text. The similarity of
stance detection to our task is that both require a model to
investigate relationships between a short claim and a long
article. The Fake News Challenge was held to promote
the development of methods for stance detection, and many
of the teams utilized deep learning models (e.g., [9]). The
winning model was an XGBoost [10] model based on hand-
designed features. An unsupervised learning technique was
introduced to detect the stance of users in social media [11].
Most recently, a study proposed a method that detects head-
line incongruity via a semantic matching framework between
the original and synthetically generated headlines [12].
B. GRAPH NEURAL NETWORKS FOR TEXT
A graph neural network (GNN) utilizes graph-like struc-
tural information to either explicitly or implicitly represent
data [13]. Several methods can embed network information.
FIGURE 2. An illustration of the generation process of news articles with incongruent headlines (H: headline, P: paragraph). Step (1) selects a target article and a sampled article; step (2) generates an article with one of four incongruity types (A-D).
For example, Veličković et al. employed an attention mechanism to aggregate node vectors by learning the importance of
each edge [14]. Palm et al. adopted a recurrent node updating
approach to capture changes in information across time [15].
A GNN embeds relational information in textual data.
Thanks to its unique architecture, such information is prop-
agated into neighboring nodes during the training process.
Hence, GNN models can perform reasoning regarding their
nodes and edges, which is challenging for more standard
architectures such as recurrent neural network (RNN) and
convolutional neural network (CNN) models. For example,
GNNs have excelled at question-answering [16, 17, 18],
relation extraction [19], and knowledge base completion [20]
tasks.
C. FAKE NEWS DETECTION AND GNNS
The problem of fake news detection has been actively studied
over the past several years. To estimate the truthfulness of
claims and further enable the detection of fake news, re-
searchers have relied on resources available on fact-checking
websites such as politifact.com [21]. The Liar dataset is
one representative example of such a resource; it comprises
12.8K political statements with veracity labels on a 6-point
scale [22]. That study also showed that a CNN can achieve
a reasonable performance using only text. A recent study
suggested an approach that learns and constructs discourse-
level structures from articles to detect false claims [23]. The
active prevention of fake news was addressed by detecting
early-stage diffusion [24], a blockchain proof-of-authority
protocol [25], and by developing misinformation reputing
measures [26].
Most recently, a handful of studies have proposed using
GNNs to detect fake news. The researchers implemented a
model named FakeDetector that learns the representations of
news articles, creators and subjects simultaneously through a
gated graph neural network [27]. A follow-up study proposed
a hierarchical attention mechanism that learns the importance
of each node and a schema for fake news detection [28]. Yet
another study proposed aggregating information by consid-
ering content characteristics, sharing behaviors, and social
connections through a graph neural network [29].
Based on recent GNN developments, this study presents
TABLE 1. Data characteristics of the Similar dataset. H and B indicate headline and body text, respectively.

                        Train        Dev       Test
Number of articles      1,347,097    9,493     9,435
Avg. word count (H)     11.73        12.83     12.98
Min. word count (H)     3            3         3
Max. word count (H)     56           35        35
Avg. word count (B)     765.25       793.79    715.52
Min. word count (B)     20           63        29
Max. word count (B)     27,597       7,173     11,597
a GNN-based model to address the headline incongruity
problem. We will introduce how to define the nodes and how
to learn edge weights for this task.
III. DATA GENERATION
There are two main challenges in detecting incongruities
between headlines and body text: (i) the lack of large training
datasets and (ii) the length of news stories. This section
focuses on the first challenge by presenting a rule-based
approach to generate news articles with incongruity. We will
address the second challenge in the next section.
While previous studies manually annotated the ground
truth [6, 30], it is almost impossible to apply a manual method
to datasets consisting of millions of news articles. Therefore,
we propose an alternative approach that instead generates
news articles with incongruous headlines automatically. This
process starts with an extensive collection of real news
stories. For each news article in a randomly chosen set of
“target” news stories that we wish to manipulate, we replace
the body text of each target article with paragraphs from
a different news article, again chosen from the remaining
news corpus (which we call a “sampled” article). Here, the
assumption is that the seed target article’s headline and body
text are consistent with regard to the news content.
The seed corpus of real news stories comes from Real-
News [31], which consists of 32,797,763 English news arti-
cles published over multiple years. Following the guidelines
from the Media Bias/Fact Check [32], we consider only
7,127,692 of these news articles written by listed trustworthy
media outlets as the seed corpus, because untrustworthy news
sources may already share incongruent headings. Based on
1,000 of the news articles sampled from the corpus, we
manually confirmed that trustworthy media are unlikely to
publish incongruent headlines.
Figure 2 illustrates the process through which the dataset
of incongruent labels (i.e., “positive” labels) is built. The
figure shows one selected ‘target’ and one ‘sampled’ news
story. We generate two types of datasets: one where the
‘sampled’ news stories are randomly chosen (i.e., the Random dataset) and one in which ‘sampled’ news stories are chosen to be topically similar to the target (i.e., the Similar dataset). Then, paragraphs from the ‘sampled’ news stories are mixed into the
‘target’ news story. The number of swapped paragraphs is
randomly determined and ranges from 1 to the number of
sampled paragraphs, which causes the incongruity difficulty
to vary.
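A minimal Python sketch of this swapping step is shown below; the Article container, the label convention, and the choice of swap positions are illustrative assumptions rather than the released generation pipeline.

```python
import random
from dataclasses import dataclass, field
from typing import List

@dataclass
class Article:
    headline: str
    paragraphs: List[str]
    # 1 marks a paragraph taken from the sampled article (incongruent), 0 otherwise
    paragraph_labels: List[int] = field(default_factory=list)

def make_incongruent(target: Article, sampled: Article, rng: random.Random) -> Article:
    """Replace a random subset of the target's paragraphs with paragraphs
    from the sampled article, keeping the article length unchanged."""
    n_swap = rng.randint(1, min(len(target.paragraphs), len(sampled.paragraphs)))
    swap_positions = rng.sample(range(len(target.paragraphs)), n_swap)
    donor_paragraphs = rng.sample(sampled.paragraphs, n_swap)

    new_paragraphs = list(target.paragraphs)
    labels = [0] * len(target.paragraphs)
    for pos, donor in zip(swap_positions, donor_paragraphs):
        new_paragraphs[pos] = donor  # hijacked paragraph from the sampled article
        labels[pos] = 1              # paragraph-level incongruity label y_i

    return Article(target.headline, new_paragraphs, labels)
```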
After this process is completed, an equal number of arti-
cles are sampled from the remaining news pool to include
congruent data (i.e., those with a “negative” label). The final
dataset consists of 1,366,025 news articles with a balanced
distribution between incongruity labels, mixing types (i.e.,
Types in the figure) and the number of swapped paragraphs.
Headline similarity is measured by the Euclidean distance
of the fastText embeddings pre-trained on the WikiNews
corpus [33]. To avoid selecting sampled news stories that are
not incongruent with the target article (e.g., stories reporting
the same event), we filter out news stories published in
a similar period. We apply a maximum threshold for the
similarity measure to control the incongruity difficulty of
the generated dataset. We use an efficient implementation of
the similarity search [34], which takes approximately 3
hours on a server equipped with a 32-core Intel Xeon CPU to
find similar articles for more than 2 million target articles.
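As a rough sketch of this matching step, assuming pre-computed headline embeddings (float32) and the faiss library for nearest-neighbor search; the helper names and the exact-L2 index are illustrative choices, not the released implementation.

```python
import numpy as np
import faiss  # efficient similarity search library

def build_headline_index(headline_vectors: np.ndarray) -> faiss.IndexFlatL2:
    """headline_vectors: (num_articles, dim) float32 matrix of headline embeddings."""
    index = faiss.IndexFlatL2(headline_vectors.shape[1])  # exact (squared) L2 search
    index.add(headline_vectors)
    return index

def find_similar_articles(index, query_vectors: np.ndarray, k: int = 10,
                          max_distance: float = 1.0):
    """Return candidate 'sampled' article ids whose headlines are close to the
    query headlines, filtered by a maximum distance threshold that controls
    the incongruity difficulty."""
    distances, ids = index.search(query_vectors, k)
    candidates = []
    for dist_row, id_row in zip(distances, ids):
        # keep neighbors below the threshold; publication-date filtering would go here
        candidates.append([i for d, i in zip(dist_row, id_row) if d <= max_distance])
    return candidates
```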
The data generation methods from the existing work in-
sert sampled paragraphs into a target article [8], leading
to longer news stories, depicted as Type A and Type B in
Figure 2. However, such a change to the article length can
be mistakenly learned by the detector as a trivial feature
for the detection task. Therefore, it is crucial to maintain
a length distribution of news stories similar to that of the
original distribution. The existing generation methods also
do not consider textual similarities between the target and
sample articles, resulting in trivial topic differences. Because
ordinary news articles cover a single topic, this inconsistency
could induce the machine learning models to focus on body
text patterns rather than on understanding the relationship
between headline and body text. Compared to the previous
approaches (Types A and B), our approach (Types C and D)
can generate news articles with headline incongruities while
preventing the detection models from learning the artifacts
produced by data generation.
The dataset includes labels specifying whether each para-
graph originates from the sampled article; we exploit these
paragraph labels to dynamically represent news articles as a
graph structure. More details of the final dataset are described
in Table 1. The dataset constructed using random sampling
TABLE 2. Data characteristics of the Random dataset. H and B indicate headline and body text, respectively.

                        Train        Dev       Test
Number of articles      1,360,095    9,478     9,395
Avg. word count (H)     11.04        12.59     12.70
Min. word count (H)     3            3         3
Max. word count (H)     57           35        35
Avg. word count (B)     760.61       794.92    709.94
Min. word count (B)     20           49        22
Max. word count (B)     27,362       5,918     11,964
exhibits a similar data distribution in terms of word counts,
as shown in Table 2. When splitting the dataset into training,
development, and test sets, we ensured that they do not have
an overlapping period with one another to prevent the models
from unintentionally focusing on topical patterns.
IV. METHODS
Our objective is to detect whether a news headline is incongruent with any subset of its body text. Formally, we consider the detection task as one in which each news article is provided as a tuple (H, P), where H is the headline and P is the set of paragraphs comprising the body text. Each paragraph p_i ∈ P is a sequence of words that may consist of one or more sentences. Our goal is to determine a binary incongruity label y. Paragraph-level incongruity labels Y_P = {y_1, ..., y_{|P|}} are available as additional supervision during training.
We first review the learning approaches that have been
proposed to detect headline incongruity. We then present
a new graph-based neural network model that embeds the
relationship information between a headline and its corre-
sponding body text.
A. BASELINE APPROACHES
We discuss four prominent baseline approaches.
1) XGBoost
XGBoost, which implements gradient boosted decision trees,
is a well-recognized and fast algorithm for classification
tasks [10]. We adopted XGBoost as a representative baseline
because it was used in the winning model for the stance de-
tection challenge in news headlines [35]. Here, given a news
headline and the text body content, the task was to assign
the news headline’s stance label to one of the following:
agree, disagree, discuss, or unrelated. Using an incongruity
label instead, we implemented the winning model from this
challenge by extracting a feature set consisting of TF-IDF
vectors based on word occurrences. A singular value decomposition of these vectors provides features indicating the similarity between a headline and its corresponding body text. We call
this model XGB.
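A hedged sketch of this kind of feature pipeline, assuming scikit-learn and the xgboost package; the exact feature set and hyperparameters of the winning challenge model differ in their details, so the values below are placeholders.

```python
import numpy as np
import xgboost as xgb
from scipy.sparse import vstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

def build_features(headlines, bodies, n_components=50):
    """TF-IDF vectors for headlines and bodies, compressed by SVD,
    plus a headline-body cosine-similarity feature."""
    tfidf = TfidfVectorizer(max_features=5000, stop_words="english")
    tfidf.fit(headlines + bodies)                       # shared vocabulary
    h_tfidf, b_tfidf = tfidf.transform(headlines), tfidf.transform(bodies)

    svd = TruncatedSVD(n_components=n_components)
    svd.fit(vstack([h_tfidf, b_tfidf]))
    h_svd, b_svd = svd.transform(h_tfidf), svd.transform(b_tfidf)

    cos = np.sum(h_svd * b_svd, axis=1) / (
        np.linalg.norm(h_svd, axis=1) * np.linalg.norm(b_svd, axis=1) + 1e-8)
    return np.hstack([h_svd, b_svd, cos[:, None]])

def train_xgb(features, labels):
    """Gradient-boosted trees over the headline-body features (the XGB baseline)."""
    model = xgb.XGBClassifier(n_estimators=300, max_depth=6)  # placeholder settings
    model.fit(features, labels)
    return model
```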
FIGURE 3. An overview of the GHDE (graph-based hierarchical dual encoder) model. The first, hierarchical node-encoding step computes the initial hidden representations for the news headline and each paragraph of an article. In the second, edge-learning step, the model computes an edge weight between each paragraph and the headline. The computed edge weights are used to update the hidden representations via the GNN during the feature-propagation step. The final step computes paragraph-level incongruity scores from the updated hidden representations.
2) Attentive hierarchical dual encoder
Among the available approaches for the headline incongruity
problem is the attentive hierarchical dual encoder (AHDE),
which has a two-level hierarchy of recurrent neural net-
works [8]. This model utilizes paragraph structure to ad-
dress the arbitrarily long sizes of news article. It returns
uHand {u1,· · · ,u|P|}, which correspond to a headline
and the paragraphs of the associated body text. An attention
mechanism is applied to the headline’s hidden states and the
paragraphs to learn the importance of each paragraph and
detect incongruity in its relationship with the headline. This
model is the current state-of-the-art. The vector h_B, which is the context vector for the entire body text, is calculated as follows:

    s_i = v^\top \tanh(W_u^B u_i + W_u^H u_H),
    a_i = \exp(s_i) / \sum_j \exp(s_j),
    h_B = \sum_i a_i u_i,                                  (1)

where i is the paragraph index. The output probability of the headline and body being incongruent is computed by

    \hat{y} = \sigma(h_H^\top W h_B + b),                  (2)

where W and b are trainable weights, and h_H is u_H.
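For illustration, the attention and scoring functions of Equations (1) and (2) might be sketched in PyTorch as follows; the module is a simplified reimplementation under assumed tensor shapes, not the authors' released code.

```python
import torch
import torch.nn as nn

class AttentiveMatcher(nn.Module):
    """Eq. (1)-(2): attention over paragraph vectors u_i conditioned on the
    headline vector u_H, followed by a bilinear incongruity score."""
    def __init__(self, dim):
        super().__init__()
        self.w_b = nn.Linear(dim, dim, bias=False)   # W_u^B
        self.w_h = nn.Linear(dim, dim, bias=False)   # W_u^H
        self.v = nn.Linear(dim, 1, bias=False)       # v
        self.bilinear = nn.Bilinear(dim, dim, 1)     # W, b in Eq. (2)

    def forward(self, u_h, u_paragraphs):
        # u_h: (batch, dim); u_paragraphs: (batch, num_para, dim)
        scores = self.v(torch.tanh(self.w_b(u_paragraphs) +
                                   self.w_h(u_h).unsqueeze(1)))      # s_i
        attn = torch.softmax(scores, dim=1)                          # a_i
        h_b = (attn * u_paragraphs).sum(dim=1)                       # h_B
        return torch.sigmoid(self.bilinear(u_h, h_b)).squeeze(-1)    # predicted score
```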
3) BERT-based dual encoder
BERT is a transformer network that was pretrained for a
masked language model and with a next-sentence prediction
objective [36]. The pretrained network provides a fixed-
dimensional representation for each input token by jointly
conditioning the left and right contexts from the previous
layers. We input a headline and its corresponding body text
and retrieve h_H and h_B by mean-pooling the hidden vectors of the last layer, respectively.¹ The output probability is
calculated by Equation (2). Using the BERT-based model
¹We take the average of the hidden vectors that correspond to valid tokens (i.e., excluding special tokens such as [CLS], [SEP], and [PAD]). We use mean-pooling instead of the hidden vector of the first special token [CLS] based on the comparison experiments in [37].
as a backbone, we train the BDE model while freezing the
weights of the pretrained BERT network due to the lack of
computational resources. We call this model BDE. As an-
other baseline, we also measure the next-sentence prediction
performance of BERT.
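A minimal sketch of this mean-pooling step, assuming a recent version of the Hugging Face transformers package; for brevity the sketch averages over all non-padding tokens, whereas the footnote above also excludes [CLS] and [SEP].

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")
for p in bert.parameters():          # freeze BERT weights, as in the BDE baseline
    p.requires_grad = False

def mean_pool_encode(texts):
    """Average the last-layer hidden vectors over non-padding tokens."""
    enc = tokenizer(texts, padding=True, truncation=True, max_length=512,
                    return_tensors="pt")
    with torch.no_grad():
        hidden = bert(**enc).last_hidden_state          # (batch, seq_len, 768)
    mask = enc["attention_mask"].unsqueeze(-1).float()  # 1 for real tokens
    return (hidden * mask).sum(1) / mask.sum(1)         # mean over valid tokens
```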
B. PROPOSED APPROACH
The existing approaches compute a similarity score between
the headline and body text, and many of these methods suffer
from performance degradation due to the increased content
complexity that occurs when an article is too long. AHDE,
the state-of-the-art model, utilizes a hierarchical structure
to cope with long news stories and abstract content at the
paragraph level. We also exploit this hierarchical structure
by considering the headline and paragraphs as analysis units.
We further utilize graph-based learning to better detect in-
congruities by learning the importance of each paragraph in
an end-to-end manner. The proposed model is a graph-based
hierarchical dual encoder (GHDE) that computes the head-
line incongruity probability of a news article in four steps, as
illustrated in Figure 3. It first computes a node representation
of each headline and paragraph using a hierarchical RNN
structure. The headline node and each paragraph node are
paired to compute a matching score, which is considered as
an edge weight for those nodes. After the graph is completed
using the previous steps, the graph neural network propagates
information between nodes to examine the article’s incon-
gruity. The final step fuses the updated information from each
node and outputs the incongruity predictions. We describe
this model in more detail in the following section.
1) The hierarchical node encoding step
The GHDE constructs an undirected graph G = (V, E) for each news article that represents its innate structure, which is then used to train a graph neural network. V is the set of nodes comprising the headline and each paragraph of the news content. An edge in E is formed between the headline and each paragraph, resulting in a total of |E| = |P| edges.
A hierarchical dual encoder layer learns the initial node
representation using a two-level hierarchy. To encode a head-
line into a fixed-size vector, a gated recurrent unit (GRU)-
based RNN takes the word sequence as input. The final hidden state of the RNN, h_head, corresponds to the headline's representation. For the body text, a GRU-based RNN reads the word sequence of each paragraph and takes the last hidden state of the RNN as the representation of each paragraph: {h_1, ..., h_{|P|}}. A GRU-based bidirectional RNN then takes the paragraph representations from this first level and produces context-aware paragraph representations {\tilde{h}_1, ..., \tilde{h}_{|P|}}.
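A possible PyTorch sketch of this two-level encoder is shown below; the hidden sizes follow the experimental setup reported later, while the batching scheme and padding handling are simplifying assumptions.

```python
import torch
import torch.nn as nn

class HierarchicalNodeEncoder(nn.Module):
    """Word-level GRUs encode the headline and each paragraph; a paragraph-level
    bidirectional GRU yields context-aware paragraph representations."""
    def __init__(self, vocab_size, emb_dim=300, word_hidden=200, para_hidden=100):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.headline_rnn = nn.GRU(emb_dim, word_hidden, batch_first=True)
        self.paragraph_rnn = nn.GRU(emb_dim, word_hidden, batch_first=True)
        self.context_rnn = nn.GRU(word_hidden, para_hidden, batch_first=True,
                                  bidirectional=True)

    def forward(self, headline_ids, paragraph_ids):
        # headline_ids: (batch, h_len); paragraph_ids: (batch, num_para, p_len)
        _, h_head = self.headline_rnn(self.embedding(headline_ids))
        h_head = h_head.squeeze(0)                                 # (batch, word_hidden)

        b, n, l = paragraph_ids.shape
        para_emb = self.embedding(paragraph_ids.view(b * n, l))
        _, h_para = self.paragraph_rnn(para_emb)
        h_para = h_para.squeeze(0).view(b, n, -1)                  # {h_1, ..., h_|P|}

        h_tilde, _ = self.context_rnn(h_para)                      # context-aware paragraphs
        return h_head, h_tilde                                     # (batch, n, 2 * para_hidden)
```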
2) The edge learning step
The next step is to learn the edge weights of the input graph G
to prevent detrimental smoothing of the node representation
between the congruent and incongruent paragraphs during
GNN propagation. A bilinear operation with a sigmoid nonlinearity σ computes an edge weight e_i between the news headline and the i-th paragraph:

    e_i = \sigma(h_{head}^\top W_E \tilde{h}_i + b_E),                 (3)

where W_E and b_E are trainable weights. The sigmoid function bounds the edge weight to a value between zero and one; these weights play a masking role when the features are aggregated in the GNN. We add supervision to the edge weights by using the paragraph congruity value 1 - y_i as the label in a cross-entropy loss, where y_i indicates whether a paragraph originates from another article. This edge-level supervision enables the GHDE to assign high weights to congruent paragraphs and low weights to incongruent paragraphs; thus, congruent paragraphs propagate more information to the headline node than incongruent paragraphs do during the propagation step.
The following loss helps learn weights such that the edges of congruent paragraphs are retained, while the edges of incongruent paragraphs are masked:

    L_{edge} = -\sum_i \big[ (1 - y_i) \log(e_i) + y_i \log(1 - e_i) \big].    (4)
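The edge-weight computation and its supervision could be sketched as follows; the use of an nn.Bilinear module and the batch-first shapes are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EdgeLearner(nn.Module):
    """Eq. (3)-(4): bilinear edge weights between the headline node and each
    paragraph node, supervised with the paragraph congruity labels 1 - y_i."""
    def __init__(self, head_dim=200, para_dim=200):
        super().__init__()
        self.bilinear = nn.Bilinear(head_dim, para_dim, 1)   # W_E, b_E

    def forward(self, h_head, h_paragraphs, para_labels=None):
        # h_head: (batch, head_dim); h_paragraphs: (batch, num_para, para_dim)
        num_para = h_paragraphs.size(1)
        head_expanded = h_head.unsqueeze(1).expand(-1, num_para, -1).contiguous()
        edge_weights = torch.sigmoid(
            self.bilinear(head_expanded, h_paragraphs)).squeeze(-1)   # e_i in (0, 1)

        edge_loss = None
        if para_labels is not None:   # y_i = 1 for hijacked paragraphs (float tensor)
            edge_loss = F.binary_cross_entropy(edge_weights, 1.0 - para_labels.float())
        return edge_weights, edge_loss
```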
3) The feature propagation step
The third step is to propagate the node features into the neigh-
boring nodes through the pre-defined graph structure and the
trainable edge weights from the GNN framework. GHDE
employs an edge-weighted variant of the graph convolutional
network (GCN) aggregation function from [13]:
    z_i^{(k)} = \sum_{j \in N(i) \cup \{i\}} \frac{e_{ij}}{\sqrt{\tilde{d}_i \tilde{d}_j}} h_j^{(k)},                (5)

where z_i^{(k)} is the information propagated to the i-th node from its set of neighbor nodes N(i), e_{ij} is the edge weight, and \tilde{d}_i is the degree of the i-th node in the augmented graph with self-loops. The edge weights for the self-loops, e_{ii}, are set to 1. After feature aggregation, a non-linear transformation is applied to the resulting outputs as follows:

    h_i^{(k+1)} = \mathrm{ReLU}(W_G^{(k)} z_i^{(k)} + b_G^{(k)}),                 (6)

where W_G^{(k)} and b_G^{(k)} are trainable weights. The graph propagation layer is iterated K times with residual connections.
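A dense, single-graph sketch of one propagation layer is given below; a practical implementation would use a sparse GNN library, and the argument layout is an assumption for illustration.

```python
import torch

def propagate(node_feats, edge_weights, degrees, weight, bias):
    """One edge-weighted GCN step (Eq. (5)-(6)) on the star-shaped article graph.

    node_feats:   (num_nodes, dim); node 0 is the headline, nodes 1..|P| are paragraphs
    edge_weights: (num_para,) learned weights e_i for headline-paragraph edges
    degrees:      (num_nodes,) float node degrees in the self-loop-augmented graph
    weight, bias: trainable parameters W_G^(k) as (dim, out_dim) and b_G^(k) as (out_dim,)
    """
    num_nodes = node_feats.size(0)
    adj = torch.zeros(num_nodes, num_nodes)
    adj[0, 1:] = edge_weights             # headline <-> paragraph edges
    adj[1:, 0] = edge_weights
    adj += torch.eye(num_nodes)           # self-loops with weight 1

    norm = torch.rsqrt(degrees).unsqueeze(1) * torch.rsqrt(degrees).unsqueeze(0)
    z = (adj * norm) @ node_feats         # Eq. (5): weighted, degree-normalized aggregation
    return torch.relu(z @ weight + bias)  # Eq. (6): non-linear transformation
```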
4) The incongruity prediction step
The final step predicts the incongruity scores of news articles;
this is equivalent to the graph classification task in GNN.
To fuse the global-level graph representation with the local-
level node representation, GHDE adapts a fusion block as
proposed in [38]. It concatenates the node embedding outputs
from every GNN layer and passes the embedding through a
fully-connected (FC) layer. It then concatenates each node
embedding with the max-pooled and sum-pooled representa-
tions of the node embeddings in G.
The output node embeddings of the fusion layer are passed through two FC layers to compute the news headline representation v_{head} and the paragraph representations {v_1, ..., v_{|P|}}. At this point, GHDE can determine an incongruity label for each paragraph in a news article based on a bilinear operation:

    \hat{y}_i = \sigma(v_{head}^\top W_B v_i + b_B),                 (7)

where σ is the sigmoid activation function and W_B and b_B are learned model parameters. The paragraph-level incongruity scores {\hat{y}_1, ..., \hat{y}_{|P|}} are merged to determine the article-level incongruity score \hat{y} by taking the maximum of the paragraph-level scores:

    \hat{y} = \max\{\hat{y}_1, ..., \hat{y}_{|P|}\}.                 (8)

GHDE is trained in an end-to-end manner to minimize the following loss:

    L_{article} = CE(\hat{y}, y),
    L_{edge} = \sum_{i=1}^{|P|} CE(e_i, 1 - y_i),
    L = L_{article} + \lambda L_{edge},                              (9)

where y is the incongruity label of the input news article, CE is the cross-entropy loss, and L_{article} and L_{edge} are the losses for the article incongruity prediction and the edge weights, respectively. λ is a hyperparameter that adjusts the tradeoff between the two terms.
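The prediction head and the combined objective might be sketched as follows for a single article; the nn.Bilinear module passed in stands for W_B and b_B in Eq. (7), and the unbatched shapes and float-tensor labels are simplifying assumptions.

```python
import torch
import torch.nn.functional as F

def article_loss(v_head, v_paragraphs, edge_weights, para_labels, article_label,
                 bilinear, lam=0.1):
    """Eq. (7)-(9): paragraph-level scores via a bilinear layer, article score by max,
    and the combined article / edge cross-entropy loss.

    v_head: (dim,); v_paragraphs: (num_para, dim); edge_weights, para_labels: (num_para,)
    article_label: 0-dim float tensor; bilinear: nn.Bilinear(dim, dim, 1)
    """
    num_para = v_paragraphs.size(0)
    head = v_head.unsqueeze(0).expand(num_para, -1).contiguous()
    para_scores = torch.sigmoid(bilinear(head, v_paragraphs)).squeeze(-1)   # Eq. (7)

    article_score = para_scores.max()                                       # Eq. (8)
    l_article = F.binary_cross_entropy(article_score, article_label)
    l_edge = F.binary_cross_entropy(edge_weights, 1.0 - para_labels)
    return l_article + lam * l_edge, article_score                          # Eq. (9)
```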
V. PERFORMANCE EVALUATION
A. DETECTION ON THE GENERATED DATASET
We conducted classification experiments to compare the
newly proposed GHDE model with baseline methods in
terms of accuracy and the area under the receiver operating
characteristic (AUROC) curve. We report the average value
of all the results after running the experiments five times with
distinct seeds.
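Both metrics can be computed with scikit-learn; a minimal sketch, assuming article-level scores and binary incongruity labels:

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

def evaluate(y_true, y_score, threshold=0.5):
    """y_true: binary incongruity labels; y_score: predicted article-level scores."""
    y_pred = (np.asarray(y_score) >= threshold).astype(int)
    return {"accuracy": accuracy_score(y_true, y_pred),
            "auroc": roc_auc_score(y_true, y_score)}
```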
For AHDE, we employ two single-layer GRUs with 200
hidden units for the word-level RNN and another two single-
layer bidirectional GRUs with 100 hidden units for the
paragraph-level RNN. For regularization, we apply dropout
at ratios of 0.7 and 0.9 for the word-level RNN and the paragraph-level RNN, respectively. We used the Adam optimizer with gradient norm clipping at a threshold of 1 [39].
We used BERTbase for BDE, which includes 12 transformer
layers and 12 attention heads and outputs hidden vectors
with 768 dimensions. The model is trained using the AdamW
optimizer with the learning rate set to 0.001.
For GHDE, we utilize a single-layer GRU with 200 hid-
den units to encode a headline and each paragraph of the
corresponding body text, and use a single-layer bidirectional
GRU with 100 hidden units for the paragraph-level RNN. The
number of GNN layers, K, is set to 3 with 200 hidden units
in each layer. The hidden unit dimensions of the FC layers
applied after feature propagation on the graphs are 200, 200,
and 100, respectively. The model is trained using the Adam
optimizer with a batch size of 120, and gradient clipping is
applied with a threshold of 1.0. We decay the learning rate
every three epochs starting from an initial learning rate of
0.001 at a decay rate of 0.1. The tradeoff hyperparameter λ
for edge loss is set to 0.1.
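A minimal PyTorch sketch of this optimization setup; the model is assumed to return the combined loss of Eq. (9) for a batch.

```python
import torch
import torch.nn as nn

def make_optimizer(model: nn.Module, lr=0.001, step_size=3, gamma=0.1):
    """Adam with the step decay schedule described above (decay by 0.1 every 3 epochs)."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=step_size, gamma=gamma)
    return optimizer, scheduler

def training_step(model, batch, optimizer, max_norm=1.0):
    """One update with gradient-norm clipping at 1.0, as in the setup above."""
    optimizer.zero_grad()
    loss = model(batch)              # assumed to return the combined loss of Eq. (9)
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
    optimizer.step()
    return loss.item()
```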
For all the models that include an embedding layer, we
initialize the layer using the pre-trained GloVe embedding
matrix [40]. The vocabulary size of the embedding matrix
is determined by the number of words that occur at least
eight times in the training dataset. All the hyperparameters
are optimized on the development set based on more than
twenty trials. The dataset and the implementation details
for the empirical results will be available via a public web
repository.2For the experiments, we use a computer equipped
with an Intel(R) Core(TM) i7-6850K CPU (3.60 GHz) and
a GeForce GTX 1080 Ti GPU. The software environments
are Python 3.6 and PyTorch 1.2.0. The total number of
trainable parameters in GHDE is 1,214,702. A single GHDE
training run takes approximately 3 hours and 18 minutes. The
resulting accuracy and AUROC scores on the validation set
were 0.8561 and 0.9326 on the Similar dataset and 0.9560
and 0.9860 on the Random dataset.
Table 3 displays the model performances when applied
to detect headline incongruities on two datasets: Similar
(where the target and sampled articles have similar topics)
and Random (where the target and sampled articles are ran-
dom matches with no constraint on the topic of the sampled
article compared to the topic of the target article). Other
than the method for selecting the sampled news articles,
the generation processes for these two datasets are identical.
We measure the performance on each different test set;
consequently a high performance value does not imply the
superiority of a sampling method for detection.
From these results, we make two observations. First,
GHDE achieves the highest accuracies on the Similar and
Random datasets, 0.852 and 0.959, respectively. The next
best algorithm is AHDE, which reaches 0.799 and 0.922,
respectively. Both models embed the hierarchical structures
of news articles; however, our graph-based neural network
2The pointer to the repository will be placed here after the review process.
TABLE 3. Experimental results of headline incongruity prediction on two datasets: Similar and Random. The top scores for each comparison set are highlighted in bold text.

          Similar Dataset        Random Dataset
Model     Accuracy   AUROC       Accuracy   AUROC
GHDE      0.852      0.928       0.959      0.989
AHDE      0.799      0.879       0.922      0.971
XGB       0.700      0.776       0.687      0.756
BDE       0.654      0.712       0.720      0.799
BERT      0.510      0.487       0.512      0.561
TABLE 4. Ablation results of the GHDE model with varying levels of supervision and different graph structures.

                                           Accuracy   AUROC
Supervision
  Article-level                            0.838      0.916
  Paragraph-level                          0.832      0.923
  Paragraph-level + Edge loss              0.846      0.927
Graph structure
  + Inter-paragraph edges                  0.832      0.921
  + Fully-connected edges                  0.827      0.917
Article-level supervision + Edge loss      0.852      0.928
further exploits the news headlines and the unique structures
of body paragraphs. The coherence values between a head-
line and a body paragraph and across body paragraphs are
learned as edge weights of the graph-like structure. Second,
all four models exhibit better performances on the Random
dataset than on the Similar dataset. This suggests that identi-
fying incongruent articles generated by random sampling is
easier, yet it does not answer the question of which type of
data better represents incongruent news articles in the real
world.
B. ABLATION EXPERIMENTS
To test the individual components of the GHDE model, we
conducted an ablation study by examining the performances
of models after removing each model component. Table 4
shows the ablation results.
In terms of supervision, article-level supervision and edge loss indicate L_{article} and L_{edge} in Eq. (9), respectively. Paragraph-level supervision indicates the cross-entropy loss between the i-th paragraph incongruity label y_i and the paragraph-level incongruity prediction \hat{y}_i, averaged over all the paragraphs in an article. Training the model with article-level supervision alone outperforms the state of the art (i.e., AHDE) by an accuracy margin of 0.038 and an AUROC margin of 0.032. Paragraph-level supervision further improves the AUROC value but reduces the accuracy. The model that combines article-level supervision with edge loss achieves the best performance, which suggests that dynamic edge updating is a crucial aspect of detecting headline incongruity through a graph neural network.
We also investigated the benefit of the graph structure
itself by training GHDE variants with augmented graphs in which inter-paragraph edges connect each pair of consecutive paragraphs, and fully-connected edges connect all possible pairs of paragraphs. Here, we utilize paragraph-level supervision for a fair comparison. The results show that the additional connections between paragraphs are redundant; adding the inter-paragraph edges resulted in a negligible performance difference, while adding the fully-connected edges decreased the accuracy by 0.005 and the AUROC by 0.006. We suspect that
the additional edges may cause detrimental smoothing effects
between the features corresponding to congruent nodes and
those corresponding to nodes of incongruent paragraphs dur-
ing the feature propagation step, making each node feature
less discriminative.
C. DETECTION ON REAL NEWS ARTICLES
To test whether the trained model can identify incongruous
headlines in the wild, we conducted experiments on Amazon
Mechanical Turk using real articles where the body text was
not manipulated by any generation method. Through iterative
rounds, we asked Turkers to label the following kinds of incongruity based on definitions from the literature [5, 7, 8]:
(1) when the headline only partially supports the claims of
the main article, or (2) when the headline does not represent
the body text.
Table 5 shows the instructions given to the annotators in
the crowdsourcing task. We asked the crowd workers to read
the provided news articles carefully and mark their decisions
as to whether each article has an incongruent headline. We
provided three types of incongruent headline examples and
one congruent headline example to help the workers decide.
We further asked the workers to classify the incongruent type
of each news article. The task included an optional question
so that the workers could detail the reasoning leading to their
decisions.
The evaluation experiment involves newly gathered news
stories that were not used during the training phase. We gath-
ered 63,271 English news articles from news media outlets
known for their biased political orientations and active use of
clickbait [30, 32]: FoxNews, BuzzFeed, and The Huffington
Post. In addition to 500 randomly sampled articles, assuming that an article's prior probability of being incongruent is low, we included the top 40 articles in terms of prediction
scores from each of the five models. We assigned at least ten
Turkers to each article to annotate incongruity based on the
three criteria above. We aggregated the responses by majority
voting and assigned an incongruity classification when an
article received 7 or higher out of 10 votes.
Figure 6 presents the model performances evaluated against the annotated labels. To account for bias in the real-world experiments, we report the unweighted accuracy (UA), i.e., the average of the per-class accuracies. GHDE trained on the Similar dataset achieves the best performance, with an accuracy of 0.760 and an AUROC of 0.784. As was apparent from
Table 3, the Similar dataset is more challenging than the
FIGURE 4. Example of an incongruent headline (Type 1: Partial representation).
FIGURE 5. Example of an incongruent headline (Type 2: Incorrect representation).
Random dataset. However, the real-world evaluation results
suggest that the proposed data generation method represents
headline incongruity better, enabling a model to be trained
that can effectively capture headline incongruity problems in
the real world. The models trained on the Random dataset
result in poor performance in the wild: most of the models
achieved UA scores similar to or lower than 0.6. This result
[Overview ]
In this task, you are supposed to read random news articles.
Please read the article carefully and help us determine whether the headline is incongruent with the main body text.
We consider a headline incongruent when it makes claims that are unrelated to or distinct from the story.
[ Instructions ]
1. Read each news article carefully; the article consists of a headline along with its corresponding body text.
2. Give us your thoughts regarding whether each headline is incongruent with its body text.
What is an incongruent headline?
“A headline that does not accurately represent the story in the body text.”
Type 1: A headline makes claims that only partially represent the story in the body text
Type 2: A headline that is distinct from the main story in the body text
(Type 1: Partial representation)
The following article introduces multiple stories in the body text. However, the headline describes only a partial story;
it fails to cover the multiple stories embodied in the body text. We consider this type of ‘briefing’ article incongruent
because of the mismatch between headline and body text.
(Type 2: Incorrect representation)
In the following article, the headline promises to provide specific benefits of wearing masks to fight against COVID-19,
yet the body text does not explain the benefits of wearing masks; it includes only a direction to wear a mask and
introduces an advertisement.
TABLE 5. Instructions given to the annotators on Amazon Mechanical Turk. We further provided the annotators with specific examples, as shown in Figures 4 and 5.
FIGURE 6. Human evaluation results measured on real-world articles: (a) unweighted accuracy (UA), (b) AUROC.
is likely due to the training on the Random dataset, which
may induce the models to learn trivial features of topical
differences.
Compared to the performances measured on the synthetic
test set (see Table 3), the performance values on the real-
world evaluation are slightly lower. This reduction calls for
future studies to develop a more robust detection model and
a more realistic data generation method for the headline
incongruity problem.
VI. DISCUSSION
News headlines are known to play a crucial role in news se-
lection in online media [41]. According to the Pew Research
Center, most U.S. adults (62%) are unlikely to click and read
a full news story; instead, they prefer to consume news in aggregated forms via news headline blurbs [3]. Twitter also
announced that they plan to introduce a new design to urge
users to click a link before retweeting it because many users
do not read the main content [42]. Consequently, when the
short headline text does not accurately represent the main
content, it can mislead readers and adversely affect the entire news reading experience [43]. Therefore, detecting incongruence between headlines and article bodies is a timely and important task for minimizing the negative consequences of potential misinformation.
This paper demonstrated the use of a graph neural network
to solve the headline incongruity problem. We found that
the hierarchical nature of news posts (i.e., composed of a
single headline and multiple paragraphs that are semantically
closely related) lends itself well to a graph structure. Therefore, content incongruity can be learned from the low edge weights between hijacked paragraphs and the headline, and between hijacked paragraphs and the other paragraphs. The real-world case
study confirms that the model is topic-independent (i.e., it
can be applied to previously unseen topics such as breaking
news).
The solid performances achieved in the crowdsourced
experiments suggest that the data generation method con-
tributes to training models that can detect misleading news
headlines. Nonetheless, through manual annotation, we observed a few false-positive cases in which the model identified a coherent article as containing an incongruent headline.
For example, one article that covered multiple issues in the
main text belonged to this case. According to the definition
of headline incongruity, the model correctly predicted the
label, but such a “briefing” article does not mislead readers
by presenting incorrect information.
Our findings highlight the need for future studies to im-
prove the data generation method and build a training dataset
that better represents headline incongruity in the wild. This
study did not edit the headline to make it incongruent with the
main content, which is a challenging task even for humans.
In newsrooms, editors are typically responsible for crafting
the headlines of news articles written by reporters. One could address this challenging task by developing a generative neural model that produces an incongruent headline from a pair of a congruent headline and body text [44]. The line of research on controlled
generation and text style transfer could be used to generate
synthetic datasets for headline incongruity [45, 46].
A methodology to discover the relational information
among sentences and paragraphs can help in understanding
long news articles. A graph neural network is a reliable
choice for such cases because the graph embedding propa-
gates information between nodes (given that a node repre-
sents a sentence or a paragraph). Many successful studies
have adopted graph-based models to tackle NLP tasks such
as question answering, document understanding, and other
text-related tasks [16, 47, 48]. These studies propose differ-
ent graph topologies for the learning task, where the graph
topology determines the path that allows information to flow
between nodes.
In addition to graph-based neural networks, the headline
incongruity problem could benefit from other approaches.
One such approach would be to use pretrained models such
as BERT, which was compared in this study in rudimentary
form due to its computational load. Future works could fine-
tune the pretrained weights and improve on the transformer
layer. For example, in GHDE, we utilize the hierarchical dual
encoder (HDE) block to embed the node information corre-
sponding to the headline and paragraphs, but it is possible
to use transformer layers instead of an RNN-based block.
Another direction might be to directly encode the entire
text without adopting a hierarchical model architecture. However, BERT and its variants are limited to a maximum input length of 512 tokens. Using recent technologies such as Transformer-XL [49]
and Longformer [50], it would be possible to overcome this
limitation and explore different ways of computing node
representations.
Furthermore, the proposed GHDE can be applied to
other applications that require content understanding, such
as document summarization, detecting reasoning sentences
for question-answering systems, and possibly understanding
multimodal (i.e., text, image) contents.
VII. CONCLUSION
In this paper, we studied the detection of news articles that
feature headline and body text incongruity, which is an im-
portant type of misinformation. Inspired by the hierarchical
nature of news articles, we propose a graph-based hier-
archical dual encoder (GHDE) that facilitates information
flows between headlines and paragraphs to aid in incongruity
detection. The evaluation experiments suggest that the pro-
posed approach successfully identifies such misinformation
with high accuracy. We hope this study contributes to the
construction of more credible online environments for news
consumption.
REFERENCES
[1] The Atlantic, “How Many Stories Do Newspapers Publish Per Day?” https://bit.ly/3clqqai, 2016, [Online; accessed 21-Sep-2020].
[2] T. Montal and Z. Reich, “I, robot. you, journalist. who
is the author? authorship, bylines and full disclosure in
automated journalism,” Digital journalism, vol. 5, no. 7,
pp. 829–849, 2017.
[3] K. E. Matsa and E. Shearer, “News use across social media platforms,” Pew Research Center, 2018.
[4] U. K. Ecker, S. Lewandowsky, E. P. Chang, and R. Pil-
lai, “The effects of subtle misinformation in news head-
lines,” Journal of experimental psychology: applied,
vol. 20, no. 4, p. 323, 2014.
[5] S. Chesney, M. Liakata, M. Poesio, and M. Purver, “In-
congruent headlines: Yet another way to mislead your
readers,” in Proceedings of the 2017 EMNLP Work-
shop: Natural Language Processing meets Journalism,
2017, pp. 56–61.
[6] W. Wei and X. Wan, “Learning to identify ambiguous
and misleading news headlines, in Proceedings of the
26th International Joint Conference on Artificial Intel-
ligence, 2017, pp. 4172–4178.
[7] L. Lagerwerf, C. Timmerman, and A. Bosschaert, “In-
congruity in news headings: Readers’ choices and re-
sulting cognitions,” Journalism Practice, vol. 10, no. 6,
pp. 782–804, 2016.
[8] S. Yoon, K. Park, J. Shin, H. Lim, S. Won, M. Cha, and
K. Jung, “Detecting Incongruity between News Head-
line and Body Text via a Deep Hierarchical Encoder,
in Proceedings of the AAAI Conference on Artificial
Intelligence, vol. 33, 2019, pp. 791–800.
[9] B. Riedel, I. Augenstein, G. Spithourakis, and
S. Riedel, “A simple but tough-to-beat baseline for
the Fake News Challenge stance detection task. CoRR
abs/1707.03264,” 2017.
[10] T. Chen and C. Guestrin, “Xgboost: A scalable tree
boosting system,” in Proceedings of the 22nd acm
sigkdd international conference on knowledge discov-
ery and data mining. ACM, 2016, pp. 785–794.
[11] K. Darwish, P. Stefanov, M. Aupetit, and P. Nakov,
“Unsupervised user stance detection on twitter, in Pro-
ceedings of the International AAAI Conference on Web
and Social Media, vol. 14, 2020, pp. 141–152.
[12] R. Mishra, P. Yadav, R. Calizzano, and M. Leippold,
“Musem: Detecting incongruent news headlines using
mutual attentive semantic matching, arXiv preprint
arXiv:2010.03617, 2020.
[13] T. N. Kipf and M. Welling, “Semi-supervised classifica-
tion with graph convolutional networks, arXiv preprint
arXiv:1609.02907, 2016.
[14] P. Veličković, G. Cucurull, A. Casanova, A. Romero,
P. Lio, and Y. Bengio, “Graph attention networks,” in
International Conference on Learning Representations
(ICLR), 2018.
[15] R. Palm, U. Paquet, and O. Winther, “Recurrent rela-
tional networks, in Advances in Neural Information
Processing Systems, 2018, pp. 3368–3378.
[16] N. De Cao, W. Aziz, and I. Titov, “Question answering
by reasoning across documents with graph convolu-
tional networks, in Proceedings of the 2019 Confer-
ence of the North American Chapter of the Association
for Computational Linguistics: Human Language Tech-
nologies, Volume 1 (Long and Short Papers), 2019, pp.
2306–2317.
[17] L. Song, Z. Wang, M. Yu, Y. Zhang, R. Flo-
rian, and D. Gildea, “Exploring graph-structured pas-
sage representation for multi-hop reading compre-
hension with graph neural networks, arXiv preprint
arXiv:1809.02040, 2018.
[18] Y. Xiao, Y. Qu, L. Qiu, H. Zhou, L. Li, W. Zhang, and
Y. Yu, “Dynamically fused graph network for multi-hop
reasoning,” arXiv preprint arXiv:1905.06933, 2019.
[19] Y. Zhang, P. Qi, and C. D. Manning, “Graph convo-
lution over pruned dependency trees improves relation
extraction,” in Proceedings of the 2018 Conference on
Empirical Methods in Natural Language Processing,
2018, pp. 2205–2215.
[20] M. Schlichtkrull, T. N. Kipf, P. Bloem, R. Van
Den Berg, I. Titov, and M. Welling, “Modeling re-
lational data with graph convolutional networks,” in
European Semantic Web Conference. Springer, 2018,
pp. 593–607.
[21] A. Vlachos and S. Riedel, “Fact checking: Task defi-
nition and dataset construction,” in Proceedings of the
ACL 2014 Workshop on Language Technologies and
Computational Social Science, 2014, pp. 18–22.
[22] W. Y. Wang, ““Liar, Liar Pants on Fire”: A New Bench-
mark Dataset for Fake News Detection,” in Proceedings
of the 55th Annual Meeting of the Association for Com-
putational Linguistics (Volume 2: Short Papers), 2017,
pp. 422–426.
[23] H. Karimi and J. Tang, “Learning Hierarchical
Discourse-level Structure for Fake News Detection,” in
Proceedings of the 2019 Conference of the North Amer-
ican Chapter of the Association for Computational
Linguistics: Human Language Technologies, Volume 1
(Long and Short Papers), 2019, pp. 3432–3442.
[24] Y. Liu and Y.-F. B. Wu, “FNED: A deep network for fake news early detection on social media,” ACM Transac-
tions on Information Systems (TOIS), vol. 38, no. 3, pp.
1–33, 2020.
[25] Q. Chen, G. Srivastava, R. M. Parizi, M. Aloqaily, and
I. Al Ridhawi, “An incentive-aware blockchain-based solution for internet of fake media things,” Information
Processing & Management, vol. 57, no. 6, p. 102370,
2020.
[26] G. Shrivastava, P. Kumar, R. P. Ojha, P. K. Srivastava,
S. Mohan, and G. Srivastava, “Defensive modeling of
fake news through online social networks,” IEEE Trans-
actions on Computational Social Systems, vol. 7, no. 5,
pp. 1159–1167, 2020.
[27] J. Zhang, B. Dong, and S. Y. Philip, “Fakedetector:
Effective fake news detection with deep diffusive neural
network,” in 2020 IEEE 36th International Conference
on Data Engineering (ICDE). IEEE, 2020, pp. 1826–
1829.
[28] Y. Ren and J. Zhang, “HGAT: Hierarchical graph attention network for fake news detection,” arXiv preprint
arXiv:2002.04397, 2020.
[29] S. Chandra, P. Mishra, H. Yannakoudakis, and
E. Shutova, “Graph-based modeling of online com-
munities for fake news detection,” arXiv preprint
arXiv:2008.06274, 2020.
[30] A. Chakraborty, B. Paranjape, S. Kakarla, and N. Gan-
guly, “Stop clickbait: Detecting and preventing click-
baits in online news media,” in 2016 IEEE/ACM Inter-
national Conference on Advances in Social Networks
Analysis and Mining (ASONAM). IEEE, 2016, pp. 9–
16.
[31] R. Zellers, A. Holtzman, H. Rashkin, Y. Bisk,
A. Farhadi, F. Roesner, and Y. Choi, “Defending against
neural fake news,” in Advances in Neural Information
Processing Systems, 2019, pp. 9051–9062.
[32] Media Bias/Fact Check, “Media Bias/Fact Check,” https://mediabiasfactcheck.com, 2015, [Online; accessed 21-Sep-2020].
[33] T. Mikolov, E. Grave, P. Bojanowski, C. Puhrsch, and
A. Joulin, “Advances in pre-training distributed word
representations,” in Proceedings of the International
Conference on Language Resources and Evaluation
(LREC 2018), 2018.
[34] J. Johnson, M. Douze, and H. Jégou, “Billion-
scale similarity search with GPUs,” arXiv preprint
arXiv:1702.08734, 2017.
[35] Cisco Talos, “Fake News Challenge - Team SOLAT
IN THE SWEN,” https://github.com/Cisco-Talos/fnc-1,
2017, [Online; accessed 21-September-2020].
[36] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova,
“BERT: Pre-training of Deep Bidirectional Transform-
ers for Language Understanding,” in Proceedings of the
2019 Conference of the North American Chapter of
the Association for Computational Linguistics: Human
Language Technologies, Volume 1 (Long and Short
Papers), 2019, pp. 4171–4186.
[37] N. Reimers and I. Gurevych, “Sentence-BERT: Sen-
tence Embeddings using Siamese BERT-Networks,”
in Proceedings of the 2019 Conference on Empirical
Methods in Natural Language Processing and the 9th
International Joint Conference on Natural Language
Processing (EMNLP-IJCNLP), 2019, pp. 3973–3983.
[38] Y. Wang, Y. Sun, Z. Liu, S. E. Sarma, M. M. Bronstein,
and J. M. Solomon, “Dynamic graph cnn for learning on
point clouds,” ACM Transactions on Graphics (TOG),
vol. 38, no. 5, p. 146, 2019.
[39] D. P. Kingma and J. Ba, “Adam: A method for stochas-
tic optimization,” arXiv preprint arXiv:1412.6980,
2014.
[40] J. Pennington, R. Socher, and C. Manning, “Glove:
Global vectors for word representation,” in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014, pp. 1532–1543.
[41] M. Y. Almoqbel, D. Y. Wohn, R. A. Hayes, and M. Cha,
“Understanding Facebook news post comment reading
and reacting behavior through political extremism and
cultural orientation,” Computers in Human Behavior,
vol. 100, pp. 118–126, 2019.
[42] The Verge, “Twitter is bringing its ‘read before you
retweet’ prompt to all users,” https://bit.ly/2EEhD6Y,
2020, [Online; accessed 26-Sep-2020].
[43] J. Reis, F. Benevenuto, P. V. de Melo, R. Prates,
H. Kwak, and J. An, “Breaking the news: First im-
pressions matter on online news,” in Proceedings of the
ICWSM, 2015.
[44] K. Lopyrev, “Generating news headlines with recurrent
neural networks,” arXiv preprint arXiv:1512.01712,
2015.
[45] J. Ficler and Y. Goldberg, “Controlling Linguistic Style
Aspects in Neural Language Generation,” in Proceed-
ings of the Workshop on Stylistic Variation, 2017, pp.
94–104.
[46] T. Shen, T. Lei, R. Barzilay, and T. Jaakkola, “Style
transfer from non-parallel text by cross-alignment,” in Advances in Neural Information Processing Systems,
2017, pp. 6830–6841.
[47] G. Nikolentzos, A. J.-P. Tixier, and M. Vazirgiannis,
“Message passing attention networks for document un-
derstanding,” in Proceedings of the AAAI Conference
on Artificial Intelligence, vol. 34, 2020.
[48] H. Linmei, T. Yang, C. Shi, H. Ji, and X. Li, “Hetero-
geneous graph attention networks for semi-supervised
short text classification,” in Proceedings of the 2019
Conference on Empirical Methods in Natural Language
Processing and the 9th International Joint Conference
on Natural Language Processing (EMNLP-IJCNLP),
2019, pp. 4823–4832.
[49] Z. Dai, Z. Yang, Y. Yang, J. G. Carbonell, Q. Le, and
R. Salakhutdinov, “Transformer-XL: Attentive language models beyond a fixed-length context,” in Proceedings
of the 57th Annual Meeting of the Association for
Computational Linguistics, 2019, pp. 2978–2988.
[50] I. Beltagy, M. E. Peters, and A. Cohan, “Longformer:
The long-document transformer,” arXiv preprint
arXiv:2004.05150, 2020.
SEUNGHYUN YOON received a Ph.D. degree in
Electrical and Computer Engineering from Seoul
National University in 2020. He is currently a re-
search scientist at Adobe Research, San Jose, CA, USA.
His research interests include machine learning
and natural language processing (NLP), focusing
on question-answering systems and learning lan-
guage representations for NLP tasks.
KUNWOO PARK received a Ph.D. degree in Web
Science from the School of Computing, KAIST,
South Korea, in 2018. He was a postdoctoral re-
searcher at KAIST, Qatar Computing Research
Institute, and UCLA. He is currently an assistant
professor in the School of AI Convergence at Soongsil
University, South Korea. His recent interests focus
on detecting misinformation and social bias from
online media through NLP and multi-modal meth-
ods.
MINWOO LEE received a B.S. degree in Inte-
grated Technology from Yonsei University, South
Korea, in 2018. He is currently pursuing a Ph.D.
degree in Electrical and Computer Engineering
at Seoul National University, South Korea. His
current research interests include natural language
processing (NLP) and graph neural networks.
TAEGYUN KIM is an undergraduate student at the
School of Computing, KAIST, South Korea. His
research interests lie in applying natural language
processing techniques to social science problems.
MEEYOUNG CHA received a Ph.D. degree from
the Department of Computer Science at KAIST
in Daejeon, South Korea, in 2008. From 2008
to 2010, she was a postdoctoral researcher at
the Max Planck Institute for Software Systems in
Germany. She has been a faculty member at the
School of Computing and the Graduate School of
Culture Technology at KAIST since 2010. She is
currently jointly affiliated as a Chief Investigator
at the Institute for Basic Science in South Korea.
Her research interests include data science, information science, and com-
putational social science with an emphasis on modeling socially relevant
information propagation processes.
KYOMIN JUNG received a B.S. degree in Math-
ematics from Seoul National University, Seoul,
Korea, in 2003 and a Ph.D. degree in Mathematics
from the Massachusetts Institute of Technology,
Cambridge, MA, USA, in 2009. From 2009 to
2013, he was an assistant professor in the depart-
ment of Computer Science at KAIST. Since 2016, he has been with the department of Electrical and Computer Engineering at Seoul National University (SNU), first as an assistant professor and then as an associate professor. He is also an adjunct professor in the department of Math-
ematical Sciences, SNU. His research interests include natural language
processing, deep learning and its applications, data analysis, and web services.