Into the Battlefield: Quantifying and Modeling Intra-community
Conflicts in Online Discussion
Subhabrata Dutta, Dipankar Das
Jadavpur University
Kolkata, India
{subha0009,dipankar.dipnil2005}@gmail.com
Gunkirat Kaur, Shreyans Mongia, Arpan
Mukherjee, Tanmoy Chakraborty
IIIT-Delhi, India
{gunkirat15032,shreyans15178,arpan17007,tanmoy}@iiitd.ac.in
ABSTRACT
Over the last decade, online forums have become primary news
sources for readers around the globe, and social media platforms
are the space where these news forums find most of their audience
and engagement. Our particular focus in this paper is to study
conflict dynamics over online news articles in Reddit, one of the
most popular online discussion platforms. We choose to study how
conflicts develop around news inside a discussion community, the
r/news subreddit. Mining the characteristics of these engagements
often provides useful insights into the behavioral dynamics of
large-scale human interactions. Such insights are useful for many
reasons: for news houses to improve their publishing strategies and
potential audience, for data analytics to gain better insight
into media engagement, and for social media platforms to
avoid unnecessary and perilous conflicts.
In this work, we present a novel quantification of conflict in
online discussion. Unlike previous studies on conflict dynamics,
which model conflict as a binary phenomenon, our measure is
continuous-valued, which we validate with manually annotated
ratings. We address a two-way prediction task. Firstly, we predict
the probable degree of conflict a news article will face from its
audience. We employ multiple machine learning frameworks for this
task using various features extracted from news articles. Secondly,
given a pair of users and their interaction history, we predict if
their future engagement will result in a conflict. We fuse textual
and network-based features together using a support vector machine,
which achieves an AUC of 0.89. Moreover, we implement a
graph convolutional model which exploits engagement histories of
users to predict whether a pair of users who never met each other
before will have a conflicting interaction, with an AUC of 0.69.
We perform our studies on a massive discussion dataset crawled
from the Reddit news community, containing over 41k news articles
and 5.5 million comments. Apart from the prediction tasks, our
studies offer interesting insights on the conflict dynamics: how
users form clusters based on conflicting engagements, how the
temporal nature of conflict differs across online news forums,
how different language-based features contribute to inducing
conflict, etc. In short, our study paves the way towards new methods
of exploration and modeling of conflict dynamics inside online
discussion communities.
ACM Reference Format:
Subhabrata Dutta, Dipankar Das, Gunkirat Kaur, Shreyans Mongia,
Arpan Mukherjee, and Tanmoy Chakraborty. 2019. Into the Battlefield:
Quantifying and Modeling Intra-community Conflicts in Online Discussion.
In The 28th ACM International Conference on Information and Knowledge
Management (CIKM ’19), November 3–7, 2019, Beijing, China. ACM, New York,
NY, USA, 10 pages. https://doi.org/10.1145/3357384.3358037
ACM ISBN 978-1-4503-6976-3/19/11...$15.00
1 INTRODUCTION
Mining knowledge from social media has gained tremendous attention
among the research community in recent years. Endeavours
started with entity recognition, opinion mining, object detection,
etc.; current advances are pushing the boundaries towards more complex
analysis such as influence detection, malicious activity identification,
multi-modal and heterogeneous data mining, etc. Usage of social media
platforms is now so all-encompassing that these analyses yield
rich insight into individual and community interaction in general.
Online discussion forums are a particular type of social media and
networks, which, due to their ever growing usage and popularity,
need no introduction today. Reddit¹ is one such example, where
people across the globe engage in discussion related to innumerable
sets of topics.
With more and more people coming together in this virtual
world, differences of opinion and conflict are inevitable. Conflict
may arise from several premises: partial knowledge, socio-political
understandings, clashes of cultural and moral positions, and
many more. It can be raised and developed from purely virtual
individual interactions as well as real-world happenings. Although any
difference of opinion can be identified as a conflict, its actual aspects
are versatile. It may manifest itself within a vast spectrum, from
constructive debates with well-formed argumentation to degenerated,
unhealthy cyber-bullying and abuse. Thus a better understanding
of the complex dynamics of conflict over online discussion platforms
may provide more useful insights to the data analytics and
social computing community as well as help moderators of online
platforms to identify and eliminate abusive conflicts and make the
web a better place.
Versatility in the manifestation of conflict is also the primary
challenge of modeling conflict dynamics. Let us take the following three
comments taken from Reddit:
Comment 1: I’m talking specifically about the 2010 Afghan War
Diary, when Wikileaks was too lazy to scrub the names of about
100 Afghan civilian informants, thus revealing their identities
to Taliban death squads. You actually sound a lot like Assange,
who when asked why he didn’t bother scrubbing the names said
“Well, they’re informants, so, if they get killed, they’ve got it
coming to them. They deserve it.”
¹ www.reddit.com/
[Figure 1 (diagram): states 0, 1, 2 and 3, with transitions labeled
"Non-conflict between pair", "Conflict between pair", "Non-conflict
with others" and "Conflict with others".]
Figure 1: Hypothetical state-transition model of conflict for a
pair of users; state 0 signifies the start of engagement between
a hypothetical user pair.
Comment 2: Assange didn’t put them in danger. Participating
in an illegal war and murdering innocent people in a country
that never attacked the US put them in danger. Being actual
Nazis put them in danger. Anyone who does that deserves to
have a light shone on what they are doing, so that they hopefully
stop. So sorry your conscience is worked up over maybes,
instead of the hard reality of all the ACTUAL MURDERING that
the US committed.
Comment 3: You seem a bit daft. If you were in Vichy France,
informing for the Nazis, do you think you would have an expectation
of privacy? When people like you start owning up to who
the real monsters are, then the world can change.
Both comments 2 and 3 are posted in reply to comment 1, and both
of them hold an opposing view. But how do we decide which one
is more conflicting? Comment 3 is more subjectively aggressive
towards the user posting comment 1. However, if we look at the
content, comment 2 presents an opposing opinion in a more profound
sense. Previous studies [25] on conflict either treated it as a
binary phenomenon or identified controversy scores over topics
and not between two text segments [15]. Sophisticated NLP tools
may come in handy in this context; however, one major downside is
their lack of scalability in handling large-scale online data. In this
work, we focus more on objective, argumentative conflict, rather
than subjective, aggressive conflict. Simply put, we define comment
2 to hold a more conflicting opinion than comment 3.
Online discussion platforms, viewed through the lens of engagement
conflict, become a more complex dynamical process when the
system interacts frequently with external sources. In this work, the
external source is online news. Reddit has a specific community,
r/news, dedicated to discussing news articles from various online
news sources. Users post their views regarding news reports and
engage in discussion. Here a two-way conflict comes into
play: users holding opinions opposed to a report, and users
holding opinions opposed to each other. These two conflicts
are even related; previous studies showed that certain news reports
tend to blow up conflict of opinion between readers, mostly due to
the topic of the news, language usage, political bias, etc. [8, 28].
The state transition model in Figure 1 can be hypothesized as an
abstract model of inter-user conflict dynamics. A transition from 0
to 1 or 2 signifies interaction between a hypothetical user pair. Any
state transition from 1 or 2 can be of two types: either the users
interact with each other (solid lines) or they interact with the rest of
the users (solid + dashed lines). State 1 then corresponds to users
having only non-conflicting engagement with each other, while
state 2 denotes only conflicting engagements. State 3, reached in
either way, identifies user groups who have preferential conflict.

Notation          Denotation
$T$               Corpus-wide keyword set
$T_D$             Keywords present in document $D$
$TS_D$            Target-sentiment vector of document $D$
$TS_u$            Target-sentiment vector averaged over comments from user $u$
$N_i^k$           No. of comments from user $u_i$ containing term $T[k]$
$cf(D_1, D_2)$    Conflict score between documents $D_1$, $D_2$
$nc(N)$           Total conflict towards news article $N$
$G'(t)$           Dynamic user engagement network
Table 1: Important notations used throughout the paper.
This abstract model can be actuated with a dynamic user-user
interaction graph, with edges between users signifying previous
interactions. We can further weigh these edges according to the
degree of conflict that arose in previous interactions. The problem of
predicting future conflict between any two users then translates
into a signed link prediction task.
Our contributions in this work are as follows:
(1) We define a simple yet powerful and scalable measurement
of conflict between a pair of documents, focused on objective
expression of opinion. We use target-dependent sentiment
scoring to compute a continuous-valued score between text
documents. We use this metric to quantify conflict between
news reports and their audience as well as between user pairs
interacting over discussion comments, on a large dataset
of news articles and corresponding discussions from Reddit
r/news. We manually annotate randomly selected news
report-comment pairs and comment-comment pairs to test
our conflict metric. We achieve 0.96 and 0.79 root mean squared
error over the [0−10] interval of conflict rating. For ranking
comments according to the conflict they express towards
particular news reports and comments, our method achieves
mean average precision of 0.77 and 0.83, respectively.
(2) Using the conflict measurement, we attempt to predict the
degree of conflict a news article will experience from the users
reading and discussing it. Our prediction is solely based on
the content of the article. We extract several textual features
from the articles and employ multiple machine learning
algorithms. We achieve a symmetric mean absolute percentage
error of 0.077 with the Support Vector Regression model.
(3) We attempt to predict whether a future interaction between
any two given users will be conflicting or not, given their
previous history of comments and engagement. We implement a
Support Vector Machine based framework with selected textual
and network-based features for this task, which achieved
0.89 AUC. We perform a fusion of textual features extracted
from users’ comment history and their interaction features
over the engagement network using graph convolution over
dynamic user-user engagement, which correctly predicts
conflict type between users (who have no previous history
of interaction) with 0.69 AUC.
(4) We conduct several experiments using the conflict metric
to reveal intriguing patterns of conflict dynamics of news
reporting over the r/news community. We explore how conflict
towards news articles from different online news sources
varies over time, and how different news sources trigger
inter-user conflict to different degrees.
(5) We explore how inter-user conflict patterns emerge over
time in discussion threads as well as in the interaction network.
We identify different community formations through conflict,
which closely follow the abstract conflict model we described
in Figure 1.
2 RELATED WORKS
In this section, we describe previous studies which we deem to be
closely related to our work.
Conflict in community interaction, which is the prime theme
of our study, is a well-studied problem in social network theory,
psychology and sociology [30, 39]. Different models and valuable
insights have emerged from these studies, such as how people
tend to adapt towards certain acquaintances after initial conflict,
fission in small group networks post conflict, emotional effects of
conflict on individuals, etc. However, studies on its online
counterparts are much more recent. Most of the studies on controversy
and polarization over social media are based on Twitter [7, 16].
Garimella et al. [15] proposed a graph-based approach to identify
controversial topics on Twitter and measures to quantify the
controversy of a topic. They used 20 different hashtags to classify
topics of conversation. Partitioning retweet, follow and reply graphs,
they compute the controversy related to each topic. Their work
suggested the inefficiency of content-based measurements of
controversy, mainly attributed to the short spans of text in tweets
and high noise. Guerra et al. [18] proposed a similar approach to
measure polarization over social media; their data is also mostly
based on Twitter. However, one must keep in mind that the nature of
conflict for microblogs is substantially different from that of
discussion forums, primarily due to the size of the text. Kumar et
al. [25] focused on Reddit to identify roles of conflict in community
interactions. They performed their study on 36,000 Reddit communities
(subreddits), identifying the relation between inter-community
mobilization and conflict. Their study also includes patterns of how
people ‘gang up’ on the verge of conflicting engagements. They
predicted mobilizations between communities based on conflicts using
user-level, community-level and text-level features. They achieved
0.67, 0.72 and 0.76 AUC using Random Forest, LSTM and an ensemble of
both, respectively. Our work can be thought of as another side of
their story: while they focused on conflict as an inter-community
phenomenon, we attempt to address its dynamics at a microscopic
level, inside a single community.
Stance detection and opinion mining is closely related to
conflict identification and measurement. Most of the previous works
in stance detection are based on stance classification of rumors
in Twitter [27, 29, 42]. Rosenthal and McKeown [33] proposed an
agreement-disagreement identification framework for discussions
in Create Debate and Wikipedia Talk pages. They defined various
lexical and semantic features from discussion comments and
achieved an average accuracy of 77% on the Create Debate corpus.
Zhang et al. [40] used discourse act classification on Reddit
discussions to characterize agreement-disagreement over discussion
threads. Dutta et al. [11] employed an attention-based hierarchical
LSTM model for further improvement of discourse act classification
on the same dataset.
News popularity prediction, though it does not handle conflict
explicitly, is related to this work as it deals with engagement
dynamics of online news. Previous studies can be classified into two
main approaches: popularity of news in social media platforms
[31, 32, 38], and popularity of news on the web in general [12, 22].
The second approach deals with the prediction problem unaware
of inter-user network information, and thereby excludes the explicit
interaction of users with themselves and with the news sources.
Popularity prediction models focus only on the degree of engagement
a news item gets, without regard to the types of engagement,
which is our focus in this work.
Link prediction on social networks, as we already stated,
is closely related to our formulated problem of predicting future
conflict between users. There is rich literature focusing on this task
[1, 17, 26, 37]. Bliss et al. [5] used an evolutionary algorithm for
link prediction in dynamic networks. One important advancement in
recent times for learning graph-based data is Graph Convolution
Networks [9, 23]. Zhang and Chen [41] applied convolution on
enclosing subgraphs for link prediction. Berg et al. [3] also defined
recommendation as a link prediction problem and used a graph
auto-encoder with deep stacking of graph convolutional layers.
3 DATA
We crawled discussion threads containing at least one news link in
the posts or comments from the r/news subreddit, from 2016-09-01
to 2019-01-16. Out of 43,343 discussion threads crawled, we
discarded threads containing fewer than 10 comments. The remaining
17,351 threads containing a total of 5,502,258 comments were used
for the experiments. We also crawled news articles mentioned in
the threads, resulting in a total of 41,430 news articles from 5,175
different news sources.²
To evaluate our conflict measurement strategy, we employed
three expert annotators³ to identify conflict between two given
texts (articles/comments). We asked them to rate an interaction
with a higher conflict score than another if they found more
elaborate opposition in the first one. We provided the annotators with
multiple examples annotated by us (one of these examples is
presented in Section 1). We asked them to annotate the conflict on a
[0−10] scale such that non-conflicting and highly conflicting texts
receive 0 and 10, respectively. For any interaction where only
negativity has been expressed (sarcasm, popular slang without
mentioning to what or whom it is addressed), we asked the annotators
to give a rating of 1. We compute final ratings as the average of the
ratings received. A total of 3,734 randomly selected news-comment
pairs and 6,725 comment-comment pairs were annotated. The
inter-annotator agreement based on Fleiss’ κ [13] is 0.79.
² We have made the dataset containing the news articles public.
³ They were experts on social media and their age ranged between 25-40 years.
4 CONFLICT QUANTIFICATION
Given two text segments, we measure conflict between them as
how much opposite sentiment they exhibit. Here, we use
target-dependent sentiment measurement (TD-sentiment), as
sentence-level sentiment may not be a good indicator of stance
towards a motion. Let us take the following two sentences:
(1) Applauds for the writer to rightly explain why immigration is
not a real problem.
(2) This is an extremely good analysis of why immigration should
be stopped.
Both of these sentences have positive sentence-level sentiment,
though they carry conflicting opinions towards immigration.
TD-sentiment for the term ‘immigration’ is neutral for sentence 1 and
negative for sentence 2. From this, we can conclude that these two
sentences are potential indicators of conflict.
As defined in our problem statement, we compute conflict between
a news article and platform users as well as between pairs
of users. Firstly, we compute a set of keywords from our dataset
(comments + news articles). We tag the sentences using the spaCy⁴
part-of-speech tagger and collect nouns only, after removing
stopwords and lemmatization. To handle co-references of persons, we
substitute the nominal pronouns ‘he’ and ‘she’ by the last named entity
found with the ‘Person’ tag. We include all the named entities in our
keyword set, and the top 60% of the rest, ranked in order of tf-idf values.
This results in a final corpus-wide term set $T$.
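The selection rule above (keep every named entity, plus the top 60% of the remaining nouns by tf-idf) can be sketched as follows. This is a minimal illustration, assuming the nouns and named entities have already been extracted by a tagger; the function name `select_keywords`, the input layout, and the choice of scoring each term by its maximum tf-idf over documents are assumptions, not details given in the paper.

```python
import math
from collections import Counter


def select_keywords(doc_nouns, named_entities, keep_fraction=0.6):
    """Hypothetical helper: build a corpus-wide keyword set.

    doc_nouns: list of documents, each a list of lemmatized nouns.
    named_entities: set of terms that are always kept.
    keep_fraction: share of the non-entity nouns to keep (0.6 in the text).
    """
    n_docs = len(doc_nouns)
    # Document frequency of every noun.
    doc_freq = Counter()
    for nouns in doc_nouns:
        doc_freq.update(set(nouns))

    # Assumption: a term is scored by its highest tf-idf in any document.
    best_score = {}
    for nouns in doc_nouns:
        if not nouns:
            continue
        for term, count in Counter(nouns).items():
            score = (count / len(nouns)) * math.log(n_docs / doc_freq[term])
            best_score[term] = max(best_score.get(term, 0.0), score)

    # Rank the non-entity terms (ties broken alphabetically) and keep the top share.
    others = sorted(
        (t for t in best_score if t not in named_entities),
        key=lambda t: (-best_score[t], t),
    )
    n_keep = math.ceil(keep_fraction * len(others))
    return set(named_entities) | set(others[:n_keep])
```

A term that appears in every document gets a tf-idf of zero under this scoring and therefore falls to the bottom of the ranking.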
Next, we compute the TD-sentiment of news articles and comments
using the Multi-Task Target Dependent Sentiment Classifier (MTTDSC),
a state-of-the-art deep learning framework recently proposed by Gupta
et al. [20]. MTTDSC is informed by the feature representation
learned for the related auxiliary task of passage-level sentiment
classification. For the auxiliary task and the main task, it uses
separate gated recurrent units (GRUs), and sends the respective states
to a fully connected layer trained for the respective task. The model is
trained and evaluated using multiple manually annotated datasets
[10, 34, 36].
Let a document $D$ (a single comment or a news article) be a
sequence of sentences $[s_1, s_2, \cdots, s_n]$ and $T_D \subset T$ be the
keyword set present in $D$ (where $T$ is the corpus-wide term set
defined earlier). For any $t \in T$ occurring in $s_i$, MTTDSC computes a
three-class probability (positive, negative and neutral) vector $v_t^i$.
Then, for all the occurrences of $t$ in $D$, we compute the aggregate
sentiment of $D$ towards $t$ as
$S_{D,t} = \arg\max\left(\frac{1}{n}\sum_i v_t^i\right)$, $S_{D,t} \in \{1, 2, 3\}$,
where negative, neutral and positive sentiments are represented by 1, 2
and 3 respectively. Following this, we construct a vector $TS_D$ of
size $|T|$ such that,

$$TS_D[i] = \begin{cases} S_{D,T[i]} & \text{if } T[i] \in T_D \\ 0 & \text{otherwise} \end{cases} \tag{1}$$

$TS_D$ now represents the aggregate sentiments of document $D$ towards
all the terms present in it. For any two documents $D_1$ and $D_2$, we
then compute the conflict factor ($cf$) between them using
their aggregate TD-sentiment vectors $TS_{D_1}$ and $TS_{D_2}$ as follows:

$$cf(D_1, D_2) = \sum_{i=0}^{|T|} \min(TS_{D_1}[i], TS_{D_2}[i], 1) \, |TS_{D_1}[i] - TS_{D_2}[i]| \tag{2}$$

⁴ https://spacy.io/usage/linguistic-features
The component $\min(TS_{D_1}[i], TS_{D_2}[i], 1)$ returns 0 when either of
the $i$th terms of $TS_{D_1}$ and $TS_{D_2}$ is 0, i.e., the term is not
common, and 1 otherwise. This prevents terms that are absent from
either of the texts from contributing to the conflict computation. The
value of the component $|TS_{D_1}[i] - TS_{D_2}[i]|$ can be 0 (when both
texts have the same sentiment towards the term), 1 (when one of the
texts holds neutral sentiment and the other one positive or negative)
or 2 (when the texts hold opposite sentiments).
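The aggregation step and Eq. 2 can be illustrated with a short sketch. It assumes the per-sentence class probabilities are already available from a target-dependent sentiment classifier and that they are ordered (negative, neutral, positive); the function names and that ordering are assumptions made for the example.

```python
def aggregate_sentiment(prob_vectors):
    """Average the per-occurrence probability vectors of one term and
    return 1 (negative), 2 (neutral) or 3 (positive).

    Assumes each vector is ordered (negative, neutral, positive).
    """
    n = len(prob_vectors)
    mean = [sum(v[c] for v in prob_vectors) / n for c in range(3)]
    return 1 + mean.index(max(mean))


def conflict_factor(ts_d1, ts_d2):
    """Eq. 2: conflict between two target-sentiment vectors.

    Each vector has one entry per corpus keyword:
    0 = term absent, 1 = negative, 2 = neutral, 3 = positive.
    """
    if len(ts_d1) != len(ts_d2):
        raise ValueError("sentiment vectors must have the same length")
    total = 0
    for a, b in zip(ts_d1, ts_d2):
        shared = min(a, b, 1)          # 0 if the term is missing from either text
        total += shared * abs(a - b)   # 0, 1 or 2 for shared terms
    return total
```

For instance, with vectors `[3, 0, 2, 1]` and `[1, 3, 2, 2]`, the first term contributes 2 (opposite sentiments), the second is ignored (absent from the first text), the third contributes 0, and the fourth contributes 1, giving a conflict factor of 3.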
5 NEWS-USER CONFLICT PREDICTION
Given a news article $N$ and the set of all comments $C$ related to $N$,
we define the News Conflict Score as,

$$nc(N) = \frac{1}{|C|} \sum_{c \in C} cf(N, c) \tag{3}$$

This is a normalized score referring to the degree to which users oppose
the views presented in the news article. We then extract the following
features from news texts to predict this score given a news article:
(1) TD-sentiment vector, the entity-wise sentiment expressed in
the news, as we compute $TS_D$ in Eq. 1.
(2) Count of positive, negative and neutral words, tagged
using SenticNet [6].
(3) Cumulative entropy of terms, given by,
$$p = \frac{1}{|T|} \sum_{t \in T} tf_t \left(\log |T| - \log(tf_t)\right)$$
where $T$ is the set of all unique tokens in the corpus, and $tf_t$
is the frequency of term $t$ in the news text.
(4) Fraction of controversy and bias words, measured using
the lexicon sets General Inquirer⁵ and Biased Language⁶; we
use the fractions of these lexicons present in the article as
controversy and bias features.
(5) Latent semantic features using ConceptNet Numberbatch
pretrained word vectors⁷ [35]; we compute the TF-IDF weighted
average of the vectors of the words present in an article to
represent the latent semantics of the article.
(6) LIX readability [4], computed as:
$$r = \frac{|w|}{|s|} + 100 \times \frac{|cw|}{|w|},$$
where $w$ and $s$ are the sets of words and sentences respectively,
and $cw$ is the set of words with more than six characters.
A higher value of $r$ indicates that the article is harder for
users to read.
(7) Gunning Fog [19], computed by $0.4 \times (ASL + PCW)$, where
$ASL$ is the average sentence length, and $PCW$ is the percentage
of complex words. A higher value of this index indicates
that the article is harder for users to read.
(8) Subjectivity, calculated using TextBlob⁸. Its values lie in
the range [0, 1].
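The closed-form features in this list (cumulative entropy, LIX and Gunning Fog) can be computed directly from token counts. The sketch below is illustrative only: the regular expressions used for word and sentence splitting are assumptions, and `gunning_fog` takes the average sentence length and percentage of complex words as inputs because the text does not specify how complex words are identified.

```python
import math
import re
from collections import Counter


def cumulative_entropy(tokens, vocab_size):
    """Feature (3): (1/|T|) * sum over terms of tf * (log|T| - log tf).

    tokens: tokens of one news text; vocab_size: |T|, the corpus vocabulary size.
    """
    term_freq = Counter(tokens)
    total = sum(f * (math.log(vocab_size) - math.log(f)) for f in term_freq.values())
    return total / vocab_size


def lix(text):
    """Feature (6): words per sentence plus percentage of words longer than six characters."""
    # Assumption: sentences end with '.', '!' or '?'; words are runs of letters.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    long_words = [w for w in words if len(w) > 6]
    return len(words) / len(sentences) + 100 * len(long_words) / len(words)


def gunning_fog(avg_sentence_length, pct_complex_words):
    """Feature (7): 0.4 * (ASL + PCW)."""
    return 0.4 * (avg_sentence_length + pct_complex_words)
```

As a small worked case, the text "The cat sat. Immigration policy changed yesterday." has 7 words, 2 sentences and 3 words longer than six characters, so its LIX value is 7/2 + 100 × 3/7.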
⁵ http://www.wjh.harvard.edu/~inquirer/homecat.htm
⁶ http://www.cs.cornell.edu/~cristian/Biased_language.html
⁷ https://github.com/commonsense/conceptnet-numberbatch
⁸ https://textblob.readthedocs.io/en/dev/
To predict the conflict score $nc(N)$, we use three regression models:
Lasso, Random Forest Regressor, and Support Vector Regressor.
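The regression target itself, the News Conflict Score of Eq. 3, is the mean conflict factor between an article and its comments. A minimal sketch, assuming the target-sentiment vectors have already been computed; the helper names are illustrative, and returning 0.0 for an article with no comments is an assumption the text does not cover.

```python
def conflict_factor(ts_d1, ts_d2):
    """Eq. 2 on two target-sentiment vectors (0 = absent, 1/2/3 = neg/neutral/pos)."""
    return sum(min(a, b, 1) * abs(a - b) for a, b in zip(ts_d1, ts_d2))


def news_conflict_score(ts_news, ts_comments):
    """Eq. 3: average conflict factor between a news article and its comments."""
    if not ts_comments:
        return 0.0  # assumption: no comments means no measurable conflict
    return sum(conflict_factor(ts_news, c) for c in ts_comments) / len(ts_comments)
```

The resulting scores would then serve as the targets for whichever regressor is fitted on the article features.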
6 INTER-USER CONFLICT PREDICTION
As already stated, we define inter-user conflict prediction as a
binary classification task to decide whether two users will engage
in a conflict given their previous engagement history. We represent
engagement history as a weighted undirected graph $G = \{V, E, W\}$,
where every node $v_i \in V$ represents a user $u_i$, and every edge
$e_{ij} \in E$ connects two nodes $v_i, v_j$ if and only if $u_i$ and $u_j$
have engaged with each other earlier (i.e., either of them has
commented in reply to at least one comment/post by the other).
Every edge $e_{ij}$ is accompanied by a weight $w_{ij} \in W$ equal to the
average conflict between $u_i$ and $u_j$, which is computed as follows:

$$w_{ij} = \frac{1}{N_{ij}} \sum_{k=0}^{N_{ij}} cf(D_k^i, D_k^j) \tag{4}$$

where $D_k^i$ and $D_k^j$ represent the comments posted by $u_i$ and $u_j$,
respectively, at their $k$th interaction, and $N_{ij}$ is the total number
of such interactions that have already occurred. $cf(D_k^i, D_k^j)$ is
computed following Eq. 2.
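Building the weighted engagement graph amounts to averaging the conflict factor over all past interactions of each user pair. The sketch below assumes the interactions arrive as `(user_i, user_j, conflict_score)` triples with the score already computed via Eq. 2; the function name and the dictionary-of-frozensets representation are illustrative choices.

```python
from collections import defaultdict


def build_engagement_graph(interactions):
    """Return {frozenset({u, v}): average conflict} for every interacting pair (Eq. 4).

    interactions: iterable of (user_i, user_j, conflict_score) triples.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for u, v, score in interactions:
        if u == v:
            continue  # ignore self-replies; the graph has no self-loops
        edge = frozenset((u, v))  # undirected: (u, v) and (v, u) share one edge
        totals[edge] += score
        counts[edge] += 1
    return {edge: totals[edge] / counts[edge] for edge in totals}
```

Using a `frozenset` as the key makes the direction of a reply irrelevant, which matches the undirected graph described above.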
To predict conflict between user pairs, we propose four different
frameworks: one using graph convolution and three using Support
Vector Machine (SVM) with different feature combinations.
6.1 Graph convolution on engagement network
As typical user-user engagement networks of online discussion
platforms are huge in size, we need to implement graph convolution
over a subgraph. To predict the engagement type between a pair of
users corresponding to vertices $v_i$ and $v_j$, we compute an enclosing
subgraph $G_{sub} = \{V_{sub}, E_{sub}\}$ containing $v_i, v_j$ from $G$ such that
$\forall v_k \in V_{sub}$, $dis(v_i, v_k), dis(v_j, v_k) \le dis_{max}$, where
$dis(v_i, v_k)$ is the length of the shortest path between $v_i$ and $v_k$,
and $dis_{max}$ is a threshold distance (see Section 7 for more details).
All the edges in $E_{sub}$ share the same weight as in $G$.
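One way to extract such an enclosing subgraph is to run a bounded breadth-first search from each of the two vertices and keep the nodes reachable from both within the threshold. The sketch below assumes shortest paths are counted in hops (edge weights ignored) and that the graph is stored as a nested dictionary; both are assumptions, as are the function names.

```python
from collections import deque


def bfs_distances(adj, source, max_dist):
    """Hop distances from `source` to every node at most `max_dist` hops away."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == max_dist:
            continue  # do not expand beyond the threshold
        for neighbor in adj.get(node, {}):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def enclosing_subgraph(adj, v_i, v_j, max_dist):
    """Nodes within `max_dist` hops of both v_i and v_j, plus the edges among them.

    adj: {node: {neighbor: weight}} for an undirected weighted graph.
    Returns (nodes, edges) with edges as {frozenset({u, v}): weight}.
    """
    near_i = bfs_distances(adj, v_i, max_dist)
    near_j = bfs_distances(adj, v_j, max_dist)
    nodes = (set(near_i) & set(near_j)) | {v_i, v_j}  # the pair itself is always kept
    edges = {}
    for u in nodes:
        for v, weight in adj.get(u, {}).items():
            if v in nodes:
                edges[frozenset((u, v))] = weight  # weights are copied unchanged
    return nodes, edges
```

On a path graph a-b-c-d-e with a threshold of 2, the enclosing subgraph of (a, c) contains a, b and c: nodes d and e are within two hops of c but not of a.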
We compute the adjacency matrix $A$ from $G_{sub}$ as follows:

$$A[i][j] = A[j][i] = \begin{cases} w_{ij} & \text{if } e_{ij} \in E_{sub} \\ 0 & \text{otherwise} \end{cases} \tag{5}$$

We represent every node $v_i$ with a $d$-dimensional feature vector
$x_i \in \mathbb{R}^d$, which represents the previous commenting history of
user $u_i$. We compute $x_i$ as the average over all the feature vectors
corresponding to previous comments from $u_i$, using the same feature
selection method as in Section 5 with one additional feature: a binary
vector representing the news sources the user is engaged with. This
leaves us with a tensor representation of user vertex features
$X = \{x_1, x_2, \ldots, x_{|V|}\}$.
The adjacency matrix $A$ and the vertex feature tensor $X$ now
represent the network history and comment history of all the users
at an instance, respectively. First, we learn a lower-dimensional
feature representation $X'$ from $X$ as follows:

$$X' = \sigma_r(K_f^\top X + B_f) \tag{6}$$

where $K_f$ and $B_f$ are kernel and bias matrices learned during
training, and $\sigma_r(x) = \max(x, 0)$. We fuse these two histories
together using graph convolution. We compute a degree-normalized
adjacency matrix $\hat{A} = D^{-\frac{1}{2}} A D^{-\frac{1}{2}}$, where $D$ is the degree
matrix of $A$. This multiplication normalizes the effect of neighboring
vertices so that higher-degree vertices do not get over-weighted. Now,
our convolution at the $m$th depth is computed as,

$$H^{m+1} = \sigma_r(\hat{A} \cdot H^m \cdot K_g^m) \tag{7}$$

where $K_g^m$ is the graph convolution kernel to be learned during
training, and $H^m$ and $H^{m+1}$ are the input and the output of the
$m$th convolution, respectively. Since we use three consecutive
convolution layers, the final feature representation is $H^3$.

[Figure 2 (diagram): user features and the adjacency matrix of the
engagement graph pass through three graph convolution layers; the
i-th and j-th feature vectors of the output are used to classify the
user pair as conflict or no conflict. Edges of the engagement graph
are marked as conflict or non-conflict engagement.]
Figure 2: Inter-user conflict prediction using graph convolution.
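The degree normalization and one graph convolution step can be illustrated on small matrices. The sketch below uses plain Python lists to stay dependency-free (a real model would use a tensor library), takes the degree of a vertex to be the sum of its edge weights, and adds no self-loops; these choices and all function names are assumptions for the example.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def normalize_adjacency(adj):
    """Compute D^(-1/2) A D^(-1/2), with D the diagonal matrix of row sums of A."""
    degrees = [sum(row) for row in adj]
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in degrees]
    n = len(adj)
    return [[inv_sqrt[i] * adj[i][j] * inv_sqrt[j] for j in range(n)] for i in range(n)]


def relu(matrix):
    """Element-wise max(x, 0)."""
    return [[max(x, 0.0) for x in row] for row in matrix]


def graph_conv_layer(a_hat, h, kernel):
    """One convolution step: relu(A_hat . H . K)."""
    return relu(matmul(matmul(a_hat, h), kernel))
```

Stacking three such layers, each with its own kernel, would produce the final node representation described above; in practice the kernels are learned by backpropagation rather than fixed.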
For predicting whether there will be a conflicting engagement
between users $u_i, u_j$, we select the $i$th and the $j$th feature vectors
of $H^3$ and compute a score $y \in (0, 1)$ as follows:

$$E = [H^3[i], H^3[j]] \tag{8}$$
$$y = \sigma_s(K_c^\top \cdot E + B_c) \tag{9}$$

where $[\cdot, \cdot]$ stands for the concatenation operator, $K_c$ and $B_c$ are
the kernel and bias for the classification layer respectively, and
$\sigma_s(x) = (1 + e^{-x})^{-1}$. The complete architecture of the model is
illustrated in Figure 2. This model is trained to minimize the
cross-entropy loss between true and predicted labels.
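The classification head and its loss can be written out in a few lines. This sketch assumes the pair representation concatenates the feature vectors of the two users (row $i$ and row $j$ of the final node representation) and that the kernel is a flat weight vector; names and shapes are illustrative.

```python
import math


def sigmoid(x):
    """Logistic function (1 + e^(-x))^(-1)."""
    return 1.0 / (1.0 + math.exp(-x))


def pair_conflict_score(h, i, j, kernel, bias):
    """Concatenate the feature vectors of users i and j and score the pair in (0, 1).

    h: final node representations, one list per user.
    kernel: weight vector of length 2 * len(h[0]); bias: scalar.
    """
    pair_features = h[i] + h[j]  # list concatenation of the two node vectors
    logit = sum(k * x for k, x in zip(kernel, pair_features)) + bias
    return sigmoid(logit)


def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Cross-entropy loss for one example; predictions are clipped for stability."""
    p = min(max(y_pred, eps), 1.0 - eps)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))
```

A score above a chosen threshold (0.5 is a common default, though the text does not state one) would be read as a predicted conflicting engagement.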
6.2 SVM-based frameworks
Graph convolution automatically learns a feature representation for
the interaction between user pairs from node features and the
connectivity of the nodes. For SVM, we need to manually identify
interaction features. We extract the following textual and
network-based features for each user pair $u_i, u_j$:
(1) Count of relevant common tokens from the previous
comments of the users; we take the sum of tf-idf values of
common unigrams and bigrams in the comment history of
both the users.
(2) Conflict vector $CV_{ij}$ between the pair, computed using the
TD-sentiment vector $TS_D$ following Eq. 1; given the previous
$N_i^k$ comments of user $u_i$, $\{C_0, C_1, \ldots, C_{N_i^k}\}$, in which the
term $T[k]$ appears, we compute $TS_{u_i}$, the target-sentiment
vector of $u_i$ averaged over the history, as
$$TS_{u_i}[k] = \frac{1}{N_i^k} \sum_{l=0}^{N_i^k} TS_{C_l}[k] \tag{10}$$
We compute $CV_{ij}$ as the element-wise absolute difference
between $TS_{u_i}$ and $TS_{u_j}$.
(3) Common news sources, $CN_{ij}$, taken as a vector of length
equal to the number of news sources; for news source $k$,
$CN_{ij}[k]$ indicates the number of articles from this news
source where $u_i$ and $u_j$ are both engaged.
(4) Common discussions, indicating the count of discussions
where both $u_i$ and $u_j$ are engaged.
(5) Previous mutual engagement, the total number of previous
interactions between $u_i$ and $u_j$.
(6) Previous conflict, the average of mutual conflicts between
$u_i$ and $u_j$ over their previous engagements.
(7) Neighbor interactions, the count of conflicting and
non-conflicting engagements for each user with its neighbor
nodes.
We use three SVMs with a Gaussian kernel: the first SVM uses all the
features mentioned above (SVM-all), the second one (SVM-text)
uses only text-based features (features 1 and 2), and the third one
(SVM-net) uses only network-based features (features 3-5). SVM-net,
which has been used for negative link prediction by Wang et
al. [37], serves as our external baseline.
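Among the hand-crafted features, the conflict vector of Eq. 10 is the least obvious to compute, so a small sketch may help. It assumes each comment is already represented by its target-sentiment vector (0 marking an absent term) and follows the text literally in taking the absolute difference even when one user never mentioned a term; the function names are illustrative.

```python
def user_sentiment_vector(comment_vectors, vocab_size):
    """Eq. 10: per-term sentiment averaged over the comments where the term appears.

    comment_vectors: target-sentiment vectors of one user's previous comments.
    Terms the user never mentioned keep the value 0.
    """
    sums = [0.0] * vocab_size
    counts = [0] * vocab_size
    for ts in comment_vectors:
        for k, value in enumerate(ts):
            if value != 0:  # the term T[k] appears in this comment
                sums[k] += value
                counts[k] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


def conflict_vector(ts_ui, ts_uj):
    """Element-wise absolute difference between two users' averaged sentiment vectors."""
    return [abs(a - b) for a, b in zip(ts_ui, ts_uj)]
```

The resulting vector, together with the scalar and count features, could then be concatenated into the input of an RBF-kernel SVM.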
7 EXPERIMENTAL RESULTS
For the news-user conflict prediction task, the total size of our
feature vector is 8,136. On a total set of 41,430 news articles, we
used an 80:20 train-test split, keeping the fractions of different
news sources the same over train and test data.⁹
For the user-user conflict prediction task, the number of features
representing user nodes in the graph convolution model is 8,236. To
construct enclosing subgraphs from the user-user engagement network,
we set the value of $dis_{max}$ (defined in Section 6.1) to 100. This
results in adjacency matrices with an upper bound of 5000 nodes.¹⁰
We perform this prediction on 25 instances of the dynamic user
engagement network, taking a total of 1,637 different subgraphs
from these instances. For any user pair on these subgraphs, if there
is a conflicting engagement between them over an interval of the next
24 hours, we label them as positive, otherwise negative. We take
213,998 different user pairs altogether, randomly sampling equal
numbers of positive and negative labels to avoid bias. Here again,
we split the samples into 80:20 train-test splits, with 15% of the
train data used as the development set to tune the parameters. We
use Nadam (Adam with Nesterov momentum) optimization to train
the model, with a batch size of 256.
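The sampling protocol for the user-pair task (equal numbers of positive and negative pairs, an 80:20 train-test split, and 15% of the training portion held out for development) can be sketched as below. The function name, the fixed seed and the use of simple random shuffling are assumptions; the paper does not describe how the split was drawn.

```python
import random


def balanced_split(pairs, labels, seed=0, test_frac=0.2, dev_frac=0.15):
    """Down-sample to equal class sizes, then split into train, dev and test lists.

    pairs: user-pair identifiers; labels: 1 (conflict) or 0 (no conflict).
    Returns three lists of (pair, label) tuples.
    """
    rng = random.Random(seed)
    positives = [p for p, y in zip(pairs, labels) if y == 1]
    negatives = [p for p, y in zip(pairs, labels) if y == 0]
    n = min(len(positives), len(negatives))  # size of the smaller class

    sample = [(p, 1) for p in rng.sample(positives, n)]
    sample += [(p, 0) for p in rng.sample(negatives, n)]
    rng.shuffle(sample)

    n_test = round(test_frac * len(sample))
    test, train_full = sample[:n_test], sample[n_test:]
    n_dev = round(dev_frac * len(train_full))  # dev set is carved out of the train part
    dev, train = train_full[:n_dev], train_full[n_dev:]
    return train, dev, test
```

Note that this sketch balances the classes overall but does not stratify each individual split, so the class ratio inside train, dev and test may deviate slightly from 50:50.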
⁹ We used the scikit-learn framework (https://scikit-learn.org/stable/) to implement
all the regression models mentioned.
¹⁰ We implement this model using the Keras (https://keras.io/) and TensorFlow
(https://www.tensorflow.org/) frameworks.
Conflict type               RMSE   MAP    MRR
News-comment conflict       0.96   0.77   0.86
Comment-comment conflict    0.79   0.83   0.91
Table 2: Evaluation of conflict measurement on manually
annotated conflict ratings.
[Figure 3 (plot): x-axis: number of words in comment (0 to 300);
y-axis: error (y_true − y_pred).]
Figure 3: Error in conflict score vs. size of comments in
words.
7.1 Evaluation of conflict quantification
We test our conflict measurement on the manually annotated news-comment and comment-comment pairs (Section 3). To deal with different ranges, we normalize the $cf$ values to the [0, 10] interval and measure the Root Mean Squared Error (RMSE). We also consider ranking comments according to their conflicting tendency towards a particular news article and a particular comment. We compute the Mean Average Precision (MAP) of the ranking and the Mean Reciprocal Rank (MRR) for the top ranking position, based on the ground-truth annotation mentioned in Section 3.
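The three measures named above can be sketched as below. The min-max rescaling and the binary notion of relevance used for the ranking metrics are assumptions, since the text does not spell out either choice; the helper names are illustrative.

```python
import math


def min_max_scale(values, low=0.0, high=10.0):
    """Linearly rescale values to [low, high] (assumed normalization scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [low for _ in values]
    return [low + (v - lo) * (high - low) / (hi - lo) for v in values]


def rmse(y_true, y_pred):
    """Root mean squared error between two equally long sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def average_precision(ranked_relevance):
    """Average precision for one ranked list of 0/1 relevance flags."""
    hits, total = 0, 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def reciprocal_rank(ranked_relevance):
    """1 / rank of the first relevant item, or 0 if there is none."""
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            return 1.0 / rank
    return 0.0


def mean_over_queries(metric, rankings):
    """MAP or MRR: the chosen per-ranking metric averaged over all rankings."""
    return sum(metric(r) for r in rankings) / len(rankings)


# Hypothetical rankings: one list of relevance flags per news article or comment.
rankings = [[1, 0, 1], [0, 1, 0]]
map_score = mean_over_queries(average_precision, rankings)
mrr_score = mean_over_queries(reciprocal_rank, rankings)
```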
As observed in Table 2, measuring inter-comment conflict is a rather easier task than measuring news-comment conflict. The feedback obtained from the annotators reveals that, as most news articles are written in an objective style with less explicit opinion, it is hard to judge whether a comment holds an opinion opposite to the news.
As there is no previous work on quantifying conflict between two text documents in online discussions, we implement the agreement-disagreement detection models proposed by Rosenthal and McKeown [33] (Baseline-I) and Dutta et al. [11] (Baseline-II). Baseline-I performs a three-class classification: agreement, disagreement and none. We identify disagreement as conflict and the rest of the classes as non-conflict. We also define the probability of the disagreement class (predicted by Baseline-I) for an interaction as a unit-norm conflict score. Similarly, Baseline-II performs a ten-class classification of discourse acts, from which we identify the classes disagreement and negative reaction together as conflict, and the rest of the classes as non-conflict. The sum of the probabilities of these two classes is defined as the unit-norm conflict score predicted by Baseline-II.
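As a small illustration of how the baseline outputs are turned into conflict scores, the sketch below sums the probabilities of the classes treated as conflict. Taking the binary decision from the most probable class is an assumption, and the label "other" in the toy distributions is a placeholder rather than an actual class of either baseline.

```python
BASELINE_I_CONFLICT = {"disagreement"}
BASELINE_II_CONFLICT = {"disagreement", "negative reaction"}


def baseline_conflict_score(class_probs, conflict_classes):
    """Unit-norm conflict score: total probability mass of the conflict classes."""
    return sum(p for label, p in class_probs.items() if label in conflict_classes)


def is_conflict(class_probs, conflict_classes):
    """Binary decision (assumed): the most probable class is a conflict class."""
    predicted = max(class_probs, key=class_probs.get)
    return predicted in conflict_classes


# Hypothetical classifier outputs for one interaction.
probs_i = {"agreement": 0.2, "disagreement": 0.7, "none": 0.1}
probs_ii = {"disagreement": 0.3, "negative reaction": 0.25, "other": 0.45}

score_i = baseline_conflict_score(probs_i, BASELINE_I_CONFLICT)
score_ii = baseline_conflict_score(probs_ii, BASELINE_II_CONFLICT)
```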
We compare our strategy of conflict score prediction with the baselines through a three-way evaluation strategy:
Quantifying and Modeling Intra-community Conflicts in Online Discussion. CIKM ’19, November 3–7, 2019, Beijing, China
(1) We define a binary classification of interactions into conflict and non-conflict, evaluated using ROC-AUC;
(2) We define a regression of the degree of conflict, where we scale the outputs of each model to the interval [0, 10] and evaluate using RMSE;
(3) We define a ranking problem of the interactions according to their degree of conflict, and evaluate using MAP.

Metric   Our method (conflict factor)   Baseline-I   Baseline-II
AUC      0.79                           0.79         0.62
MAP      0.83                           0.61         0.55
RMSE     0.79                           1.67         2.09
Table 3: Comparison of conflict score with baselines.
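For the binary view in item (1), ROC-AUC can be computed directly from continuous conflict scores without fixing a threshold. The sketch below uses the pairwise (Mann-Whitney) formulation; the function name and the toy inputs are illustrative assumptions.

```python
def roc_auc(labels, scores):
    """Probability that a random conflict pair scores higher than a random non-conflict pair."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("ROC-AUC needs at least one example of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical annotations (1 = conflict) and computed conflict scores.
auc = roc_auc([1, 0, 1, 0], [0.9, 0.1, 0.4, 0.6])
```

This quadratic version is meant for clarity; a rank-based computation gives the same value more efficiently on large samples.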
As both the baselines perform their corresponding tasks (stance classification and discourse act classification) on discussion data, we perform this comparison only for comment-comment conflict prediction.
Table 3 shows that our proposed strategy outperforms both the baselines on the ranking and regression tasks. This is expected, as both baseline models are classification frameworks. For the binary classification of conflicting and non-conflicting interactions, our strategy ties with Baseline-I.
Figure 3 plots the variation of error in conflict score with the change in comment length. For news-comment pairs, we take only the comment length, while for comment-comment pairs we take the average length of the two comments. To see whether the error in our score has any bias towards underestimation or overestimation, we take the difference $(y_{true} - y_{pred})$, where $y_{true}$ and $y_{pred}$ are the manually annotated score and the computed $cf$, respectively. As we can see in Figure 3, our computed score underestimates conflict when comments are short, and overestimates as the size grows (more negative errors for sizes below approximately 60 words; more positive errors afterwards). Also, the absolute error decreases with increasing comment size.
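One simple way to inspect such a length-dependent bias is to average the signed difference $y_{true} - y_{pred}$ within buckets of comment length. The sketch below does only that; the bucket width of 50 words and the toy records are arbitrary assumptions.

```python
from collections import defaultdict


def mean_signed_error_by_length(records, bin_width=50):
    """records: iterable of (length_in_words, y_true, y_pred).

    Returns {bucket_start: mean of (y_true - y_pred)} with buckets in ascending order.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for length, y_true, y_pred in records:
        bucket = (length // bin_width) * bin_width
        sums[bucket] += y_true - y_pred
        counts[bucket] += 1
    return {bucket: sums[bucket] / counts[bucket] for bucket in sorted(sums)}


# Hypothetical annotated and computed scores on the [0, 10] scale.
records = [(10, 5.0, 4.0), (30, 6.0, 5.5), (120, 3.0, 4.0)]
bias = mean_signed_error_by_length(records)
```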
Such an error pattern can be explained from the definition of the conflict measurement itself. We use the sum of the absolute differences of sentiment towards specific targets common to the two documents as the conflict score, which increases with the number of common targets present. As the length of the comments increases, the common word set also grows, and small differences add up to large conflict scores. For short comments, the number of common targets is also small, and the score tends to reflect less conflict than is actually present. For shorter comments, another problem is the use of semantically similar words occurring as targets in either comment of a given pair. For example, the sentences ‘We do not support Democrats’ and ‘We support Hilary’ are actually conflicting, as the targets Hilary and Democrats are semantically similar. But because they share no common words, such pairs will be identified as non-conflicting.
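The behaviour described above follows directly from a score built only on shared targets. The sketch below is a simplified stand-in for the measure summarized in this paragraph (the full definition in Eq. 2 may include further terms); the sentiment values are made-up numbers in [-1, 1].

```python
def conflict_score(sentiment_a, sentiment_b):
    """Sum of absolute sentiment differences over targets present in both documents."""
    common_targets = sentiment_a.keys() & sentiment_b.keys()
    return sum(abs(sentiment_a[t] - sentiment_b[t]) for t in common_targets)


# Hypothetical target-level sentiments for the two example sentences.
first = {"democrats": -0.8}   # 'We do not support Democrats'
second = {"hilary": 0.7}      # 'We support Hilary'
no_overlap = conflict_score(first, second)

# Two longer hypothetical comments that share targets.
long_a = {"tax": 0.5, "senate": -0.5, "budget": 0.25}
long_b = {"tax": -0.5, "senate": -0.25}
with_overlap = conflict_score(long_a, long_b)
```

With no shared target the score is zero regardless of how opposed the sentiments are, while every additional shared target can only add to the sum.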
However, as our dataset suggests, the fraction of comments having more than 50 words is 0.79, and the ratio between the number of words and the number of targets is 17.678. This is particular to online discussion forums, where users tend to engage in an elaborate manner, and it therefore reduces the error margin of our conflict score. Our model achieves 0.96 and 0.79 RMSE for news-comment and comment-comment conflict, respectively, over the interval [0, 10], which might be considered sufficiently accurate for conflict modeling.

Model           MSE     RMSE    sMAPE
Random Forest   6.194   2.489   0.099
SVR             4.041   2.010   0.077
Lasso           3.179   1.783   0.080
Table 4: Performance of different regression algorithms for news-user conflict prediction.

[Figure 4 plot: horizontal bars of feature importance (x-axis, 0 to 12) for the features Gunning Fog, Entropy, LIX, Latent semantic, Subjectivity, Controversy and bias lexicon, Positive polarity words, Negative polarity words and TD-sentiment.]
Figure 4: Importance of different features for news-user conflict prediction.
7.2 Evaluation of news-user conflict prediction
In our dataset, the news conflict scores (computed using Eq. 3) of the news articles vary from 0 to 138.15. In Table 4, we present the MSE (Mean Squared Error), RMSE and sMAPE (Symmetric Mean Absolute Percentage Error) for predicting news conflict scores using different regression algorithms. In terms of MSE and RMSE, Lasso regression performs the best, while SVR is the best-performing one when evaluated using sMAPE.
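For reference, the three error measures can be written as below. Several variants of sMAPE are in use and the text does not say which one was applied, so the bounded form here (each term divided by the sum of absolute values) is an assumption.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error, i.e. the square root of the MSE."""
    return math.sqrt(mse(y_true, y_pred))


def smape(y_true, y_pred):
    """Assumed sMAPE variant: mean of |y - yhat| / (|y| + |yhat|), bounded in [0, 1]."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        denominator = abs(t) + abs(p)
        if denominator > 0:
            total += abs(t - p) / denominator  # a 0/0 term contributes nothing
    return total / len(y_true)


# Hypothetical conflict scores and predictions.
y_true, y_pred = [2.0, 4.0], [1.0, 4.0]
errors = (mse(y_true, y_pred), rmse(y_true, y_pred), smape(y_true, y_pred))
```

Because RMSE is the square root of MSE, the two columns of Table 4 can be cross-checked: for example, `math.sqrt(6.194)` rounds to 2.489, matching the Random Forest row.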
We check which features are given more importance by our best-performing regression algorithms. As we can see in Figure 4, term-dependent sentiments are the most useful for predicting how likely a news article is to get negative feedback. In fact, this feature achieves far more importance than its nearest competitors, which again are polarity-oriented features. Interestingly, the count of negative polarity words has higher importance than the count of positive polarity words. The high importance of polarity-related features may signify that news reports expressing polarized bias tend to attract more conflicting remarks. Readability indices (Gunning-Fog and LIX), albeit of low importance, play some role in the prediction task. In fact, Gunning-Fog is substantially more useful than LIX.
7.3 Evaluation of inter-user conflict prediction
We evaluate all four models for two cases: (i) the whole of the test data, where a pair of users may or may not have a previous interaction history, and (ii) user pairs who have no interaction history before the prediction instance. We present the evaluation results in Table 5.
For the whole test data, the SVM model with all the features performs the best. It is readily apparent that network-based features are of greater importance than text-based features for this task. However, when there is no previous interaction history between two users, graph convolution beats all the other models by a substantial margin. In fact, when there is no previous engagement history between users, the only feature available to the SVM model is the neighbour interactions, which means SVM-all and SVM-net actually become the same model, and SVM-text becomes a model with all-zero features and all-zero output.

Evaluation   SVM-all   SVM-text   SVM-net   GCN
Acc.         0.89      0.64       0.85      0.87
AUC          0.89      0.62       0.84      0.86
Acc. (new)   0.67      0.43       0.67      0.72
AUC (new)    0.65      0.43       0.65      0.69
Table 5: Evaluation of all the models for user-user conflict prediction. Accuracy is abbreviated as Acc. Acc. (new) and AUC (new) signify evaluation results for user pairs with no previous interactions.

[Figure 5 plot: maximum, average and minimum conflict values (y-axis, 0 to 140) for The Guardian, Reuters, New York Times, BBC, Fox News, USA Today, NBC News and Independent.co.uk.]
Figure 5: Distribution of maximum, minimum and average conflict scores for different news sources. This plot is for only top 7 news sources (ranked by number of articles).
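The two evaluation cases of this section amount to scoring the same predictions on the full test set and on the subset of pairs without history. A minimal sketch, with a hypothetical record layout `(has_history, y_true, y_pred)`:

```python
def accuracy(pairs):
    """pairs: non-empty list of (y_true, y_pred) with 0/1 labels."""
    return sum(1 for y_true, y_pred in pairs if y_true == y_pred) / len(pairs)


def evaluate_by_history(examples):
    """examples: list of (has_history, y_true, y_pred); both cases must be non-empty."""
    all_pairs = [(t, p) for _, t, p in examples]
    new_pairs = [(t, p) for has_history, t, p in examples if not has_history]
    return {"Acc.": accuracy(all_pairs), "Acc. (new)": accuracy(new_pairs)}


# Hypothetical predictions for four user pairs.
examples = [(True, 1, 1), (True, 0, 0), (False, 1, 0), (False, 0, 0)]
results = evaluate_by_history(examples)
```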
8 CONFLICT DYNAMICS
We look into the dynamics of conflict in the r/news community using the conflict measurements that we propose in Eq. 2 (for inter-user conflict) and Eq. 3 (for the aggregate conflict that a news article receives from the users).
8.1 Patterns of conflict for different news sources
Different news sources tend to face different degrees of conflict from the users. In Figure 5, we plot the maximum, minimum and average news conflict for different news sources in our dataset. Although the average conflict for different sources is in a comparable range, maximum values vary greatly. News sources such as Fox News, USA Today or NBC News maintain a sustained negative response, whereas New York Times or Reuters provoke sharp outrage at some point.
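The per-source summary plotted in Figure 5 is a plain group-by over article-level conflict scores. A minimal sketch, with made-up scores and a hypothetical function name:

```python
from collections import defaultdict


def source_conflict_summary(articles):
    """articles: iterable of (news_source, conflict_score) pairs."""
    by_source = defaultdict(list)
    for source, score in articles:
        by_source[source].append(score)
    return {
        source: {"max": max(scores), "min": min(scores), "avg": sum(scores) / len(scores)}
        for source, scores in by_source.items()
    }


# Hypothetical article-level conflict scores.
summary = source_conflict_summary([("BBC", 10.0), ("BBC", 30.0), ("Reuters", 5.0)])
```

A source with one extreme article and otherwise low scores shows a high maximum but a moderate average, which is the pattern discussed for New York Times and Reuters.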
We find that this outrage is signified by an article published in NYTimes on Dec 1, 2017, titled Michael Flynn Pleads Guilty to Lying to the F.B.I. and Will Cooperate With Russia Inquiry.11 Figure 6 also indicates the sharp peak for New York Times corresponding to this article. The Guardian, Fox News and NBC News have similar peaks (red-circled) at nearby time instances, all corresponding to articles related to the same event. One can draw an intuitive correlation between the posting time of the article in the forum and the rise in conflict. It is important to note that, as the time of posting, we use the time when the news appeared on Reddit, not the time of its appearance on the web.
11 https://www.nytimes.com/2017/12/01/us/politics/michael-flynn-guilty-russia-investigation.html

[Figure 6 plots: six panels of conflict score (y-axis) over time (x-axis) for NBC News, New York Times, BBC, The Guardian, FOX News and Reuters.]
Figure 6: Temporal variation of news-user conflict for various news sources; conflict score and time are represented on the y- and x-axis, respectively. All the plots cover the time frame from Nov 17 to Dec 28, 2017. Red-circled peaks denote a rise in conflict due to articles corresponding to a particular event.
8.2 Engagement dynamics and inter-user conflict
To explore how conflict affects user engagement over r/news, we construct a temporal graph $G'(t) = \{V'(t), E'(t)\}$, where $v_i(t_i) \in V'(t)$ corresponds to user $u_i$ who is engaged in a discussion at time $t_i$ for the first time. For every pair of users $(u_i, u_j)$ engaging with each other (either of them commenting in reply to the other) at time $t_{ij}$, there is an edge $e_{ij}(t_{ij}) \in E'(t)$. For better visualization, we classify edges as conflicting (blue) and non-conflicting (green), and plot only a subgraph using 5000 vertices. We use the Fruchterman-Reingold layout algorithm [14] in Gephi [2] to plot the graph and DyCoNet [21] to identify communities. In Figure 7, we present snapshots of the evolving graph. Snapshots are taken approximately 24 hours apart, presenting a 4-day-long abstraction of this engagement subgraph.
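The temporal graph defined above can be represented with a first-engagement time per user and a time-stamped edge list, from which a snapshot at any time is obtained by filtering. The class below is an illustrative sketch only; it does not reproduce the Gephi or DyCoNet tooling, and all names are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class TemporalGraph:
    first_seen: dict = field(default_factory=dict)  # user -> time of first engagement
    edges: list = field(default_factory=list)       # (time, user_i, user_j, is_conflict)

    def add_engagement(self, time, user_i, user_j, is_conflict):
        """Record a reply between two users and update their first-engagement times."""
        for user in (user_i, user_j):
            self.first_seen[user] = min(time, self.first_seen.get(user, time))
        self.edges.append((time, user_i, user_j, is_conflict))

    def snapshot(self, time):
        """Vertices and edges that exist at the given time."""
        nodes = {user for user, t0 in self.first_seen.items() if t0 <= time}
        edges = [edge for edge in self.edges if edge[0] <= time]
        return nodes, edges


# Hypothetical engagements with integer timestamps.
graph = TemporalGraph()
graph.add_engagement(10, "u1", "u2", True)
graph.add_engagement(20, "u2", "u3", False)
nodes_at_15, edges_at_15 = graph.snapshot(15)
```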
We can observe the formation of separate user clusters in terms of engagement. It is interesting to see that there are some clusters where users are predominantly engaged with each other in a conflicting manner (blue regions) and some in a non-conflicting manner (green regions). We also identify three different types of engagement patterns in user clusters:
• Type-I clusters tend to be formed with non-conflicting engagement between users. Users in these clusters do not seem to engage in a conflicting manner with users in other clusters either.
• Type-II clusters are formed with users having mutual conflict. They tend to have conflicting interactions with other clusters as well.
• Type-III clusters show an organization-like behavior. These users maintain almost non-conflicting engagement with each other, but are aggressive towards other clusters (mostly green regions inside the cluster and blue ones outwards in Figure 7).
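The three cluster types above are identified qualitatively. Purely as an illustration, they could be operationalized by comparing the fraction of conflicting edges inside a cluster with the fraction on edges leaving it; the 0.5 threshold and the function names below are assumptions, not part of the analysis in this paper.

```python
def conflict_fraction(edges):
    """Share of edges marked as conflicting; 0.0 for an empty edge list."""
    if not edges:
        return 0.0
    return sum(1 for edge in edges if edge[2]) / len(edges)


def cluster_type(cluster, edges, threshold=0.5):
    """cluster: set of users; edges: list of (user_i, user_j, is_conflict)."""
    intra = [e for e in edges if e[0] in cluster and e[1] in cluster]
    inter = [e for e in edges if (e[0] in cluster) != (e[1] in cluster)]
    conflict_inside = conflict_fraction(intra) > threshold
    conflict_outside = conflict_fraction(inter) > threshold
    if not conflict_inside and not conflict_outside:
        return "Type-I"
    if conflict_inside and conflict_outside:
        return "Type-II"
    if not conflict_inside and conflict_outside:
        return "Type-III"
    return "unclassified"  # conflict inside but not outside is not one of the three types


# Hypothetical cluster with calm internal edges and hostile outgoing edges.
cluster = {"a", "b", "c"}
edges = [("a", "b", False), ("b", "c", False), ("a", "x", True), ("c", "y", True)]
kind = cluster_type(cluster, edges)
```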
[Figure 7 plot: graph snapshots with Type-I, Type-II and Type-III clusters marked.]
Figure 7: Snapshots of cluster formation in the user-user engagement graph (left to right); blue and green edges correspond to controversial and non-controversial engagements, respectively.
Type-III clusters tend to grow the most compared to type-I and type-II clusters. Different type-III clusters have the most inter-cluster conflicts, even more than type-II clusters. Type-I clusters show the lowest growth rate among the three types, signifying that these users are less prone to go out of their ‘comfort zone’.
These cluster types are, of course, not completely rigid. Although there is no sign of conversion between type-I and type-II, both of them can slowly convert into type-III. It is intriguing to observe two different patterns in the formation of type-III clusters: (i) Some of them emerge as type-III from the beginning. Users having no previous engagement form non-conflicting connections with each other. This may signify a probable community interaction among them beyond the discussion platform, such as organized campaigners, a small group of people using multiple fake user accounts, aka sockpuppets [24], people who are acquainted with each other in real life and share similar opinions, etc. (ii) Some of them start as type-I or type-II and slowly get converted into type-III, which possibly signifies the evolution of engagement via predominant platform interaction. Users in type-II clusters start changing opinion towards each other with long-term interaction and get converted into type-III. Similarly, type-I users tend to start interacting with opposite opinions and convert themselves into type-III. We observe that 33% of the type-III clusters at the end of the time frame are the ones converted from type-II, whereas 48% are from type-I. The rest of them started growing as type-III clusters.
[Figure 8 plot: x-axis, depth of thread; y-axis, normalized inter-comment conflict.]
Figure 8: Variation of inter-comment conflict with depth of comments in the discussion tree.
The formation and evolution of these clusters closely follow the abstract model of user engagement in Figure 1. A repeated transition from state 1 along the self loop results in a type-I cluster, whereas the same happening in state 2 results in a type-II cluster. If all the user pairs in state 1 start conflicting with each other, it leads to a transition to state 2, which implies that a type-I cluster is transformed into type-II. This is possible only hypothetically; we did not find any such evidence in our dataset. Likewise, a transition from state 1 or 2 to state 3 signifies preferential conflict, resembling type-III clusters.
In Figure 8, we plot the variation of inter-comment conflict with the depth of the comments in the discussion thread tree. We normalize conflict scores to the (0, 2) interval. For comment pairs at depths $i$ and $i+1$, we plot their conflict at depth $i$. As is evident from the plot, a discussion thread is most prone to conflict at depth levels 3 and 4. For interactions at greater depth, the variance goes up substantially, but the average inter-comment conflict score drops steadily.
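The depth-wise statistics discussed above can be obtained by grouping each parent-reply pair under the depth of the parent comment. The sketch below assumes the scores are already normalized and reports the mean and population variance per depth; the record layout is hypothetical.

```python
from collections import defaultdict
from statistics import mean, pvariance


def conflict_by_depth(pairs):
    """pairs: iterable of (parent_depth, conflict) for a comment at depth i and its reply at i+1.

    Returns {depth: (mean conflict, population variance)} in ascending depth order.
    """
    by_depth = defaultdict(list)
    for depth, conflict in pairs:
        by_depth[depth].append(conflict)
    return {d: (mean(v), pvariance(v)) for d, v in sorted(by_depth.items())}


# Hypothetical normalized conflict scores.
stats = conflict_by_depth([(1, 0.5), (1, 1.5), (2, 1.0)])
```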
Table 6 shows example statistics of the news sources whose discussions lead to user clusters. We report this for three different instances $G'(t_1)$, $G'(t_2)$ and $G'(t_3)$ at times $t_1$, $t_2$ and $t_3$, respectively. We take the discussions initiated within the past 24 hours for each instance of the network and map the users in each of the three largest clusters to those discussions. As each discussion is related to a news source, this finally maps news sources to clusters. As we can see in Table 6, there are several common news sources present in the first and second instances, whereas almost no common sources are found in the third instance.
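The mapping from clusters to news sources can be sketched as a count over the recent discussions in which the cluster's users took part. Whether the reported percentages count participations (as here) or distinct users is not stated in the text, so that choice, like all names in the sketch, is an assumption.

```python
from collections import Counter


def source_shares(cluster_users, user_discussions, discussion_source, top_k=4):
    """Percentage contribution of each news source to one cluster, largest first."""
    counts = Counter()
    for user in cluster_users:
        for discussion in user_discussions.get(user, ()):
            counts[discussion_source[discussion]] += 1
    total = sum(counts.values())
    if total == 0:
        return []
    return [(source, 100.0 * n / total) for source, n in counts.most_common(top_k)]


# Hypothetical participation data for discussions started in the past 24 hours.
user_discussions = {"u1": ["d1", "d2"], "u2": ["d1"]}
discussion_source = {"d1": "BBC", "d2": "Reuters"}
shares = source_shares(["u1", "u2"], user_discussions, discussion_source)
```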
9 CONCLUSION
In this paper, we studied conflict dynamics over online discussions inside the Reddit r/news community. We proposed a novel, continuous-valued quantification of inter-document conflict. Using this measurement, we attempted to predict how much negative response a news article is going to face from its audience in online discussion platforms, solely based on its textual features. We proposed an SVM-based model and a graph convolutional model to predict future conflict between pairs of users. Extensive evaluation showed that network-based features are more important in conflict link prediction than textual content-based features.
Our analyses provide novel insights into the conflict dynamics over large-scale online discussion. We show how different news sources get different reactions from their audience and how this varies temporally. We identified three distinct types of user clusters developed in the Reddit r/news community, based on the attitude towards other users and engagement patterns. We also provided a hypothetical state-transition model of user engagement, which is closely followed by actual interaction patterns.
Cluster index (ranked by size)   Instance 1                     Instance 2                     Instance 3
1                                Baltimore News (40.02%)        Comic Book (76.48%)            New York Times (49.03%)
1                                Wichita Eagle (25.92%)         Wichita Eagle (11.89%)         Fox News (25.08%)
1                                National Geographic (20.00%)   Fox News (1.91%)               abc13 (25.88%)
1                                Fox News (13.33%)              Detroit News (0.54%)           -
2                                Baltimore News (100%)          Wichita Eagle (50.57%)         BBC (78.52%)
2                                -                              Baltimore News (47.12%)        Independent (18.61%)
2                                -                              National Geographic (2.29%)    New York Times (2.87%)
3                                abc13 (44.44%)                 Baltimore News (100%)          Guardian (42.98%)
3                                Fox News (27.78%)              -                              Independent (41.32%)
3                                Baltimore News (22.24%)        -                              Detroit News (15.70%)
Table 6: Percentage of different news sources in user clusters of the user-user engagement network. We show the statistics of the three largest clusters at three different instances of the network. Up to the top four news sources (according to %-contribution) are shown.
ACKNOWLEDGEMENT
The project was partially supported by Ramanujan Fellowship
(SERB, India), Early Career Research Award (ECR/2017/001691),
the Infosys Centre for AI, IIITD, and State Government Fellowship,
Jadavpur University.
REFERENCES
[1] Luca Maria Aiello, Alain Barrat, Rossano Schifanella, Ciro Cattuto, Benjamin Markines, and Filippo Menczer. 2012. Friendship prediction and homophily in social media. ACM Transactions on the Web (TWEB) 6, 2 (2012), 9–29.
[2] Mathieu Bastian, Sebastien Heymann, and Mathieu Jacomy. 2009. Gephi: an open source software for exploring and manipulating networks. In ICWSM. 11–20.
[3] Rianne van den Berg, Thomas N Kipf, and Max Welling. 2017. Graph convolutional matrix completion. arXiv preprint arXiv:1706.02263 (2017).
[4] Carl-Hugo Björnsson. 1983. Readability of newspapers in 11 languages. Reading Research Quarterly (1983), 480–497.
[5] Catherine A Bliss, Morgan R Frank, Christopher M Danforth, and Peter Sheridan Dodds. 2014. An evolutionary algorithm approach to link prediction in dynamic social networks. Journal of Computational Science 5, 5 (2014), 750–764.
[6] Erik Cambria, Soujanya Poria, Devamanyu Hazarika, and Kenneth Kwok. 2018. SenticNet 5: Discovering conceptual primitives for sentiment analysis by means of context embeddings. In AAAI. 1–10.
[7] Michael D Conover, Jacob Ratkiewicz, Matthew Francisco, Bruno Gonçalves, Filippo Menczer, and Alessandro Flammini. 2011. Political polarization on twitter. In ICWSM. 1–10.
[8] Peter A Cramer. 2011. Controversy as news discourse. Vol. 19. Springer Science & Business Media.
[9] Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. 2016. Convolutional neural networks on graphs with fast localized spectral filtering. In NIPS. 3844–3852.
[10] Li Dong, Furu Wei, Chuanqi Tan, Duyu Tang, Ming Zhou, and Ke Xu. 2014. Adaptive recursive neural network for target-dependent twitter sentiment classification. In ACL, Vol. 2. 49–54.
[11] Subhabrata Dutta, Tanmoy Chakraborty, and Dipankar Das. 2019. How did the discussion go: Discourse act classification in social media conversations. In Linking and Mining Heterogeneous and Multi-view Data. Springer, 137–160.
[12] Kelwin Fernandes, Pedro Vinagre, and Paulo Cortez. 2015. A proactive intelligent decision support system for predicting the popularity of online news. In Portuguese Conference on Artificial Intelligence. Springer, 535–546.
[13] Joseph L Fleiss. 1971. Measuring nominal scale agreement among many raters. Psychological bulletin 76, 5 (1971), 378.
[14] Thomas MJ Fruchterman and Edward M Reingold. 1991. Graph drawing by force-directed placement. Software: Practice and experience 21, 11 (1991), 1129–1164.
[15] Kiran Garimella, Gianmarco De Francisci Morales, Aristides Gionis, and Michael Mathioudakis. 2018. Quantifying controversy on social media. ACM Transactions on Social Computing 1, 1 (2018), 3.
[16] Venkata Rama Kiran Garimella and Ingmar Weber. 2017. A long-term analysis of polarization on Twitter. In ICWSM. 1–10.
[17] Eric Gilbert and Karrie Karahalios. 2009. Predicting tie strength with social media. In SIGCHI. ACM, 211–220.
[18] Pedro Calais Guerra, Wagner Meira Jr, Claire Cardie, and Robert Kleinberg. 2013. A measure of polarization on social media networks based on community boundaries. In ICWSM. 1–10.
[19] Robert Gunning. 1969. The fog index after twenty years. Journal of Business Communication 6, 2 (1969), 3–13.
[20] Divam Gupta, Kushagra Singh, Soumen Chakrabarti, and Tanmoy Chakraborty. 2019. Multi-task Learning for Target-dependent Sentiment Classification. arXiv preprint arXiv:1902.02930 (2019).
[21] Julie Kauffman, Aristotelis Kittas, Laura Bennett, and Sophia Tsoka. 2014. DyCoNet: a Gephi plugin for community detection in dynamic complex networks. PloS one 9, 7 (2014), e101357.
[22] Yaser Keneshloo, Shuguang Wang, Eui-Hong Han, and Naren Ramakrishnan. 2016. Predicting the popularity of news articles. In SDM. SIAM, 441–449.
[23] Thomas N Kipf and Max Welling. 2016. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907 (2016).
[24] Srijan Kumar, Justin Cheng, Jure Leskovec, and V.S. Subrahmanian. 2017. An Army of Me: Sockpuppets in Online Discussion Communities. In WWW. 857–866.
[25] Srijan Kumar, William L Hamilton, Jure Leskovec, and Dan Jurafsky. 2018. Community interaction and conflict on the web. In WWW. International World Wide Web Conferences Steering Committee, 933–943.
[26] David Liben-Nowell and Jon Kleinberg. 2007. The link-prediction problem for social networks. Journal of the American society for information science and technology 58, 7 (2007), 1019–1031.
[27] Michal Lukasik, PK Srijith, Duy Vu, Kalina Bontcheva, Arkaitz Zubiaga, and Trevor Cohn. 2016. Hawkes processes for continuous time sequence classification: an application to rumour stance classification in twitter. In ACL, Vol. 2. 393–398.
[28] Marian Meyers. 1994. Defining Homosexuality: News Coverage of the ‘Repeal the Ban’ Controversy. Discourse & Society 5, 3 (1994), 321–344.
[29] Saif Mohammad, Svetlana Kiritchenko, Parinaz Sobhani, Xiaodan Zhu, and Colin Cherry. 2016. Semeval-2016 task 6: Detecting stance in tweets. In SemEval. 31–41.
[30] Reed E Nelson. 1989. The strength of strong ties: Social networks and intergroup conflict in organizations. Academy of Management Journal 32, 2 (1989), 377–401.
[31] Alicja Piotrkowicz, Vania Dimitrova, Jahna Otterbacher, and Katja Markert. 2017. Headlines matter: Using headlines to predict the popularity of news articles on twitter and facebook. In ICWSM. 1–10.
[32] Georgios Rizos, Symeon Papadopoulos, and Yiannis Kompatsiaris. 2016. Predicting news popularity by mining online discussions. In WWW. International World Wide Web Conferences Steering Committee, 737–742.
[33] Sara Rosenthal and Kathy McKeown. 2015. I couldn’t agree more: The role of conversational structure in agreement and disagreement detection in online discussions. In Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue. 168–177.
[34] Niek J Sanders. 2011. Sanders-twitter sentiment corpus. Sanders Analytics LLC 242 (2011).
[35] Robert Speer and Joshua Chin. 2016. An ensemble method to produce high-quality word embeddings. arXiv preprint arXiv:1604.01692 (2016).
[36] Bo Wang, Maria Liakata, Arkaitz Zubiaga, and Rob Procter. 2017. Tdparse: Multi-target-specific sentiment recognition on twitter. In EMNLP. 483–493.
[37] Peng Wang, BaoWen Xu, YuRong Wu, and XiaoYu Zhou. 2015. Link prediction in social networks: the state-of-the-art. Science China Information Sciences 58, 1 (2015), 1–38.
[38] Bo Wu and Haiying Shen. 2015. Analyzing and predicting news popularity on Twitter. International Journal of Information Management 35, 6 (2015), 702–711.
[39] Wayne W Zachary. 1977. An information flow model for conflict and fission in small groups. Journal of anthropological research 33, 4 (1977), 452–473.
[40] Amy Zhang, Bryan Culbertson, and Praveen Paritosh. 2017. Characterizing online discussion using coarse discourse sequences. (2017).
[41] Muhan Zhang and Yixin Chen. 2018. Link prediction based on graph neural networks. In NIPS. 5165–5175.
[42] Arkaitz Zubiaga, Elena Kochkina, Maria Liakata, Rob Procter, Michal Lukasik, Kalina Bontcheva, Trevor Cohn, and Isabelle Augenstein. 2018. Discourse-aware rumour stance classification in social media using sequential classifiers. Information Processing & Management 54, 2 (2018), 273–290.