Into the Balefield: antifying and Modeling Intra-community
Conflicts in Online Discussion
Subhabrata Dutta, Dipankar Das
Jadavpur University
Kolkata, India
Gunkirat Kaur, Shreyans Mongia, Arpan
Mukherjee, Tanmoy Chakraborty
IIIT-Delhi, India
Over the last decade, online forums have become primary news sources for readers around the globe, and social media platforms are the space where these news forums find most of their audience and engagement. Our particular focus in this paper is to study conflict dynamics over online news articles in Reddit, one of the most popular online discussion platforms. We choose to study how conflicts develop around news inside a discussion community, the r/news subreddit. Mining the characteristics of these engagements often provides useful insights into the behavioral dynamics of large-scale human interactions. Such insights are useful for many reasons – for news houses to improve their publishing strategies and reach potential audiences, for data analysts to gain better introspection into media engagement, and for social media platforms to avoid unnecessary and perilous conflicts.
In this work, we present a novel quantification of conflict in online discussion. Unlike previous studies on conflict dynamics, which model conflict as a binary phenomenon, our measure is continuous-valued, which we validate with manually annotated ratings. We address a two-way prediction task. Firstly, we predict the probable degree of conflict a news article will face from its audience. We employ multiple machine learning frameworks for this task using various features extracted from news articles. Secondly, given a pair of users and their interaction history, we predict if their future engagement will result in a conflict. We fuse textual and network-based features together using a support vector machine, which achieves an AUC of 0.89. Moreover, we implement a graph convolutional model which exploits the engagement histories of users to predict whether a pair of users who never met each other before will have a conflicting interaction, with an AUC of 0.69. We perform our studies on a massive discussion dataset crawled from the Reddit news community, containing over 41k news articles and 5.5 million comments. Apart from the prediction tasks, our studies offer interesting insights into conflict dynamics – how users form clusters based on conflicting engagements, how the temporal nature of conflict differs across online news forums, how different language-based features contribute to inducing conflict, etc. In short, our study paves the way towards new methods of exploration and modeling of conflict dynamics inside online discussion communities.
ACM Reference Format:
Subhabrata Dutta, Dipankar Das, Gunkirat Kaur, Shreyans Mongia, Arpan Mukherjee, and Tanmoy Chakraborty. 2019. Into the Battlefield: Quantifying and Modeling Intra-community Conflicts in Online Discussion. In The 28th ACM International Conference on Information and Knowledge Management (CIKM '19), November 3–7, 2019, Beijing, China. ACM, New York, NY, USA, 10 pages.
2019. ACM ISBN 978-1-4503-6976-3/19/11 ... $15.00
Mining knowledge from social media has gained tremendous attention in the research community in recent years. Endeavours started with entity recognition, opinion mining, object detection, etc.; recent advances are pushing towards more complex analyses such as influence detection, malicious activity identification, multi-modal and heterogeneous data mining, etc. Usage of social media platforms is now so all-encompassing that these analyses yield rich insight into individual and community interaction in general. Online discussion forums are a particular type of social media and networks which, due to their ever-growing usage and popularity, need no introduction today. Reddit is one such example, where people across the globe engage in discussions related to an innumerable set of topics.
With more and more people coming together in this virtual world, differences of opinion and conflict are an inevitability. Conflict may arise from several premises – partial knowledge, socio-political understandings, clashes of cultural and moral positions, and many more. It can be raised and developed from purely virtual individual interactions as well as real-world happenings. Although any difference of opinion can be identified as a conflict, its actual aspects are versatile. It may manifest itself within a vast spectrum, from constructive debates with well-formed argumentation to degenerated, unhealthy cyber-bullying and abuse. Thus a better introspection into the complex dynamics of conflict over online discussion platforms may provide more useful insights to the data analytics and social computing community, as well as help moderators of online platforms to identify and eliminate abusive conflicts and make the web a better place.
Versatility in the manifestation of conflict is also the primary challenge of modeling conflict dynamics. Let us take the following three comments taken from Reddit:
Comment 1: I'm talking specifically about the 2010 Afghan War Diary, when Wikileaks was too lazy to scrub the names of about 100 Afghan civilian informants, thus revealing their identities to Taliban death squads. You actually sound a lot like Assange, who when asked why he didn't bother scrubbing the names said "Well, they're informants, so, if they get killed, they've got it coming to them. They deserve it."
Figure 1: Hypothetical state-transition model of conflict for a pair of users; state 0 signifies the start of engagement between a hypothetical user pair.
Comment 2: Assange didn't put them in danger. Participating in an illegal war and murdering innocent people in a country that never attacked the US put them in danger. Being actual Nazis put them in danger. Anyone who does that deserves to have a light shone on what they are doing, so that they hopefully stop. So sorry your conscience is worked up over maybes, instead of the hard reality of all the ACTUAL MURDERING that the US committed.
Comment 3: You seem a bit daft. If you were in Vichy France, informing for the Nazis, do you think you would have an expectation of privacy? When people like you start owning up to who the real monsters are, then the world can change.
Both comments 2 and 3 are posted in reply to comment 1, and both of them hold an opposite view. But how do we decide which one is more conflicting? Comment 3 is more subjectively aggressive towards the user posting comment 1. However, if we look at the content, comment 2 presents an opposite opinion in a more profound sense. Previous studies [ ] on conflicts either treated conflict as a binary phenomenon, or identified controversy scores over topics and not between two text segments [ ]. Sophisticated NLP tools may come in handy in this context; however, one major downside is their lack of scalability in handling large-scale online data. In this work, we focus more on objective, argumentative conflict rather than subjective, aggressive conflict. Simply put, we define comment 2 as holding a more conflicting opinion with respect to comment 1.
Viewed through the lens of engagement conflict, an online discussion platform becomes an even more complex dynamical process when the system interacts frequently with external sources. In this work, the external source is online news. Reddit has a specific community, r/news, dedicated to discussing news articles from various online news sources. Users post their views regarding news reports and engage in discussion. Here a two-way conflict comes into play – users holding opposite opinions against a report, and users holding opposite opinions towards each other. These two conflicts are even related; previous studies showed that certain news reports tend to blow up conflicts of opinion between readers, mostly due to the topic of the news, language usage, political bias, etc. [8, 28].
The state transition model in Figure 1 can be hypothesized as an
abstract model of inter-user conict dynamics. A transition from 0
Notation       Denotation
T              Corpus-wide keyword set
T_D            Keywords present in document D
TS_D           Target-sentiment vector of document D
TS_u           Target-sentiment vector averaged over comments from user u
               No. of comments from user u_i containing term T[k]
cf(D1, D2)     Conflict score between documents D1, D2
nc(N)          Total conflict towards news article N
G(t)           Dynamic user engagement network
Table 1: Important notations used throughout the paper.
to 1 or 2 signies interaction between a hypothetical user pair. Any
state transition from 1 or 2 can be of two types: either the users
interact with each other (solid lines) or they interact with rest of
the users (solid + dashed lines). Then state 1 corresponds to users
having only non-conicting engagement with each other, while
state 2 denotes only conicting engagements. State 3, in either way,
identies user groups who have preferential conict.
This abstract model can be actuated with a dynamic user-user
interaction graph, with edges between users signifying previous
interactions. We can further weigh these edges according to the
degree of conict arose in previous interactions. The problem of
predicting future conict between any two users then translates
into a signed link prediction task.
Our contributions in this work are as follows:
- We define a simple yet powerful and scalable measurement of conflict between pairs of documents, focused on objective expression of opinion. We use target-dependent sentiment scoring to compute a continuous-valued score between text documents. We use this metric to quantify conflict between news reports and their audience as well as between user pairs interacting over discussion comments, on a large dataset of news articles and corresponding discussions from Reddit r/news. We manually annotate randomly selected news report-comment pairs and comment-comment pairs to test our conflict metric. We achieve 0.96 and 0.79 mean squared error over the [0, 10] interval of conflict rating. For ranking comments according to the conflict they express towards particular news reports and comments, our method achieves a mean average precision of 0.77 and 0.83, respectively.
- Using the conflict measurement, we attempt to predict the degree of conflict a news article will experience from the users reading and discussing it. Our prediction is solely based on the content of the article. We extract several textual features from the articles and employ multiple machine learning algorithms. We achieve a symmetric mean average percentage error of 0.077 with a Support Vector Regression model.
- We attempt to predict whether a future interaction between any two given users will be conflicting or not, given their previous history of comments and engagement. We implement a Support Vector Machine based framework with selected textual and network-based features for this task, which achieves 0.89 AUC. We perform a fusion of textual features extracted from users' comment history and their interaction features over the engagement network using graph convolution over dynamic user-user engagement, which correctly predicts the conflict type between users (who have no previous history of interaction) with 0.69 AUC.
- We conduct several experiments using the conflict metric to reveal intriguing patterns of conflict dynamics of news reporting over the r/news community. We explore how conflict towards news articles from different online news sources varies over time, and how different news sources trigger inter-user conflict at different degrees.
- We explore how inter-user conflict patterns emerge over time in discussion threads as well as in the interaction network. We identify different community formations through conflict, which closely follow the abstract conflict model we described in Figure 1.
In this section, we describe previous studies which we deem to be closely related to our work.
Conflict in community interaction, which is the prime theme of our study, is a well-studied problem in social network theory, psychology and sociology [ ]. Different models and valuable introspection have emerged from these studies, such as how people tend to adapt towards certain acquaintances after initial conflict, fission in small group networks post conflict, emotional effects of conflict on individuals, etc. However, studies on its online counterparts are much more recent. Most of the studies on controversy and polarization over social media are based on Twitter [ ]. Garimella et al. [ ] proposed a graph-based approach to identify controversial topics on Twitter and measures to quantify the controversy of a topic. They used 20 different hashtags to classify topics of conversation. Partitioning retweet, follow and reply graphs, they compute the controversy related to each topic. Their work suggested the inefficiency of content-based measurements of controversy, mainly attributed to the short span of texts in tweets and high noise. Guerra et al. [ ] proposed a similar approach to measure polarization over social media; their data is also mostly based on Twitter. However, one must keep in mind that the nature of conflict in microblogs is substantially different from that of discussion forums, primarily due to the size of the text. Kumar et al. [ ] focused on Reddit to identify the roles of conflict in community interactions. They performed their study on 36,000 Reddit communities (subreddits), identifying the relation between inter-community mobilization and conflict. Their study also includes patterns of how people 'gang up' on the verge of conflicting engagements. They predicted mobilizations between communities based on conflicts using user-level, community-level and text-level features. They achieved 0.67, 0.72 and 0.76 AUC using Random Forest, LSTM and an ensemble of both, respectively. Our work can be thought of as another side of their story – while they focused on conflict as an inter-community phenomenon, we attempt to address its dynamics at a microscopic level, inside a single community.
Stance detection and opinion mining is closely related to conflict identification and measurement. Most of the previous works in stance detection are based on stance classification of rumors on Twitter [ ]. Rosenthal and McKeown [ ] proposed an agreement-disagreement identification framework for discussions on Create Debate and Wikipedia Talk pages. They defined various lexical and semantic features from discussion comments and achieved an average accuracy of 77% on the Create Debate corpus. Zhang et al. [ ] used discourse act classification on Reddit discussions to characterize agreement-disagreement over discussion threads. Dutta et al. [ ] employed an attention-based hierarchical LSTM model for further improvement of discourse act classification on the same dataset.
News popularity prediction, though it does not handle conflict explicitly, is related to this work as it deals with the engagement dynamics of online news. Previous studies can be classified into two main approaches – popularity of news on social media platforms [ ], and popularity of news on the web in general [ ]. The second approach deals with the prediction problem unaware of inter-user network information, thereby excluding the explicit interaction of users with each other and with the news sources. Popularity prediction models focus only on the degree of engagement a news article gets, without concern for the types of engagement, which is our focus in this work.
Link prediction on social networks, as we already stated, is closely related to our formulated problem of predicting future conflict between users. There is rich literature focusing on this task [ ]. Bliss et al. [ ] used an evolutionary algorithm for link prediction in dynamic networks. One important advancement in recent times for learning graph-based data is Graph Convolutional Networks [ ]. Zhang and Chen [ ] applied convolution on enclosing subgraphs for link prediction. Berg et al. [ ] also defined recommendation as a link prediction problem and used a graph auto-encoder built by deep stacking of graph convolutional layers.
We crawled discussion threads containing at least one news link in the posts or comments from the r/news subreddit, from 2016-09-01 to 2019-01-16. Out of the 43,343 discussion threads crawled, we discarded threads containing fewer than 10 comments. The remaining 17,351 threads, containing a total of 5,502,258 comments, were used for the experiments. We also crawled the news articles mentioned in the threads, resulting in a total of 41,430 news articles from 5,175 different news sources.2
To evaluate our conict measurement strategy, we employed
three expert annotators
to identify conict between two given
texts (articles/comments). We asked them to rate an interaction
with higher conict score than another if they found more elabo-
rate opposition in the rst one. We provided the annotators with
multiple examples annotated by us (one of these examples is pre-
sented in Section 1). We asked them to annotate the conict in
scale such that non-conicting and highly conicting texts
will receive 0and 10, respectively. For any interaction where only
negativity has been expressed (sarcasm, popular slang without men-
tioning to what or whom it is addressed), we asked the annotators
to rate as 1. We compute nal ratings as the average of the rat-
ings received. A total of randomly selected 3
734 news-comment
pairs and 6
725 comment-comment pairs were annotated. The inter-
annotator agreement based on Fleiss’ κ[13] is 0.79.
2We have made the dataset containing the news articles public.
3They were experts on social media and their age ranged between 25-40 years.
Given two text segments, we measure the conflict between them as how much opposite sentiment they exhibit. Here, we use target-dependent sentiment measurement (TD-sentiment), as sentence-level sentiment may not be a good indicator of stance towards a motion. Let us take the following two sentences:
Applauds for the writer to rightly explain why immigration is not a real problem.
This is an extremely good analysis of why immigration should be stopped.
Both of these sentences have positive sentence-level sentiment, though they carry conflicting opinions towards immigration. The TD-sentiment for the term 'immigration' is neutral for sentence 1 and negative for sentence 2. From this, we can conclude that these two sentences are potential indicators of conflict.
As dened in our problem statement, we compute conict be-
tween news article and platform users as well as between pair
of users. Firstly, we compute a set of keywords from our dataset
(comments + news articles). We tag the sentences using Spacy
parts-of-speech tagger and collect nouns only, after removing stop-
words and lemmatization. To handle co-references of persons, we
substitute nominal pronouns ‘he’ and ‘she’ by the last named-entity
found with ‘Person’-tag. We include all the named entities in our
keyword set, and top 60% of the rest, ranked in order of tf-idf values.
This results in a nal corpus-wide term set T.
Next, we compute the TD-sentiment of news articles and comments using the Multi-Task Target Dependent Sentiment Classifier (MTTDSC), a state-of-the-art deep learning framework recently proposed by Gupta et al. [ ]. MTTDSC is informed by feature representations learned for the related auxiliary task of passage-level sentiment classification. For the auxiliary task and the main task, it uses separate gated recurrent units (GRUs), and sends the respective states to a fully connected layer trained for the respective task. The model is trained and evaluated using multiple manually annotated datasets [10, 34, 36].
Let a document D (a single comment or a news article) be a sequence of sentences [s1, s2, ..., sn], and let T_D ⊆ T be the keyword set present in D (T is the corpus-wide term set defined earlier). For any term t ∈ T_D occurring in D, MTTDSC computes a three-class probability vector (positive, negative and neutral). Then, over all the occurrences of t, we compute the aggregate sentiment of t in D as S_{D,t} ∈ {1, 2, 3}, where negative, neutral and positive sentiments are represented by 1, 2 and 3, respectively. Following this, we construct a vector TS_D of size |T| such that

TS_D[i] = S_{D,T[i]} if T[i] ∈ T_D, and 0 otherwise.    (1)
TS_D now represents the aggregate sentiments of document D towards all the terms present in it. For any two documents D1 and D2, we then compute the conflict factor (cf) between them using their aggregate TD-sentiment vectors TS_D1 and TS_D2 as follows:

cf(D1, D2) = Σ_i min(TS_D1[i], TS_D2[i], 1) · |TS_D1[i] − TS_D2[i]|    (2)

The component min(TS_D1[i], TS_D2[i], 1) returns 0 when either of the terms of TS_D1, TS_D2 is 0, i.e., the term is not common, and 1 otherwise. This excludes terms which are not present in both texts from contributing to the conflict computation. The value of the component |TS_D1[i] − TS_D2[i]| can be 0 (when both texts have the same sentiment towards the term), 1 (when one of the texts holds neutral sentiment and the other one positive or negative) or 2 (when the texts hold opposite sentiments).
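Eqs. 1–2 can be sketched in a few lines. The toy term set and sentiment assignments below are illustrative assumptions, not the paper's actual vocabulary; sentiment codes follow the text (1 = negative, 2 = neutral, 3 = positive, 0 = term absent).

```python
T = ["immigration", "writer", "analysis"]  # toy corpus-wide term set

def ts_vector(doc_sentiments):
    """Eq. 1: doc_sentiments maps a term to its aggregate sentiment code."""
    return [doc_sentiments.get(t, 0) for t in T]

def conflict_factor(ts1, ts2):
    """Eq. 2: terms shared by both documents contribute |s1 - s2|."""
    return sum(
        min(a, b, 1) * abs(a - b)   # min(...) gates non-common terms to 0
        for a, b in zip(ts1, ts2)
    )

d1 = ts_vector({"immigration": 2, "writer": 3})    # neutral, positive
d2 = ts_vector({"immigration": 1, "analysis": 3})  # negative, positive
print(conflict_factor(d1, d2))  # 1: only 'immigration' is shared, |2 - 1| = 1
```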
Given a news article N and the set C_N of all comments related to N, we define the News Conflict Score as

nc(N) = (1/|C_N|) Σ_{c ∈ C_N} cf(N, c)    (3)

This is a normalized score referring to what degree users oppose the views presented in the news article. We then extract the following features from news texts to predict this score given a news article:
(1) TD-sentiment vector: entity-wise sentiment expressed in the news, computed as TS_D in Eq. 1.
(2) Count of positive, negative and neutral words, tagged using SenticNet [6].
(3) Cumulative entropy of terms, given by Σ_t tf_t (log |T| − log(tf_t)), where T is the set of all unique tokens in the corpus and tf_t is the frequency of term t in the news text.
(4) Fraction of controversy and bias words, measured using the lexicon sets General Inquirer and Biased Language [ ]; we use the fractions of these lexicons present in the article as controversy and bias features.
(5) Latent semantic features, using ConceptNet Numberbatch pretrained word vectors [ ]; we compute the TF-IDF weighted average of the vectors of the words present in an article to represent the latent semantics of the article.
(6) LIX readability [ ], computed as |W|/|S| + 100 × |cw|/|W|, where W and S are the sets of words and sentences respectively, and cw is the set of words with more than six characters. A higher value of LIX indicates that the article is harder for users to read.
(7) Gunning Fog [ ], computed as 0.4 × (ASL + PCW), where ASL is the average sentence length and PCW is the percentage of complex words. A higher value of this index indicates that the article is harder for users to read.
(8) Subjectivity, calculated using TextBlob; its values lie in the range [0, 1].
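The two readability features can be sketched as follows. The word/sentence splitting here is naive whitespace/period tokenization, an assumption for illustration; "complex words" for Gunning Fog is approximated as words with three or more vowel groups (a common syllable-count proxy), which the paper does not specify.

```python
import re

def lix(text):
    """LIX: words per sentence + 100 * fraction of words longer than 6 chars."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = text.split()
    long_words = [w for w in words if len(w.strip(".,!?")) > 6]
    return len(words) / len(sentences) + 100.0 * len(long_words) / len(words)

def gunning_fog(text):
    """Gunning Fog: 0.4 * (avg sentence length + percent complex words)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = [w.strip(".,!?") for w in text.split()]
    complex_words = [w for w in words
                     if len(re.findall(r"[aeiouy]+", w.lower())) >= 3]
    asl = len(words) / len(sentences)
    pcw = 100.0 * len(complex_words) / len(words)
    return 0.4 * (asl + pcw)

sample = "The cat sat. The dog ran."
print(round(lix(sample), 2))          # 3.0
print(round(gunning_fog(sample), 2))  # 1.2
```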
To predict the conict score
, we use three regression models:
Random Forest Regressor
, and
Support Vector Regres-
As already stated, we dene the inter-user conict prediction as a
binary classication task to decide whether two users will engage
in a conict given their previous engagement history. We represent
engagement history as a weighted undirected graph
where every node
represents a user
, and every edge
ei j E
connects two nodes
if and only if
been engaged with each other earlier (i.e., either of them have
commented in reply to at least one comment/post put by other).
Every edge
ei j
is accompanied by a weight
wi j W
equal to the
average conict between
, and
, which is computed as follows:
wi j =1
Ni j
c f (Di
represent the comments posted by
respectively at their
interaction, and
Ni j
is the total number
of such interactions already occurred.
c f (Di
is computed fol-
lowing Eq. 2.
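The weighted engagement graph of Eq. 4 can be sketched as follows; interactions are assumed to arrive as precomputed (user_a, user_b, cf_score) triples, with cf from Eq. 2.

```python
from collections import defaultdict

def build_engagement_graph(interactions):
    """Edge weight = average conflict factor over a pair's past interactions."""
    history = defaultdict(list)
    for ua, ub, cf_score in interactions:
        history[frozenset((ua, ub))].append(cf_score)  # undirected edge
    return {pair: sum(v) / len(v) for pair, v in history.items()}

W = build_engagement_graph([("u1", "u2", 2), ("u2", "u1", 0), ("u1", "u3", 1)])
print(W[frozenset(("u1", "u2"))])  # 1.0: mean of cf scores 2 and 0
```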
To predict conict between user pairs, we propose four dierent
frameworks: one using graph convolution and three using Support
Vector Machine (SVM) with dierent feature combinations.
6.1 Graph convolution on engagement network
As typical user-user engagement networks of online discussion platforms are huge in size, we need to implement graph convolution over a subgraph. To predict the engagement type between a pair of users corresponding to vertices v_i and v_j, we compute an enclosing subgraph G_sub = {V_sub, E_sub} such that v_k ∈ V_sub if dis(v_i, v_k), dis(v_j, v_k) ≤ dis_max, where dis(v_i, v_k) is the length of the shortest path between v_i and v_k, and dis_max is a threshold distance (see Section 7 for more details). All the edges in E_sub share the same weight as in G.

We compute the adjacency matrix A from G_sub as follows:

A[i][j] = A[j][i] = w_ij if e_ij ∈ E_sub, and 0 otherwise.    (5)
We represent every node v_i with a d-dimensional feature vector x_i, which represents the previous commenting history of user u_i. We compute x_i as the average over all the feature vectors corresponding to previous comments from u_i, using the same feature selection method as in Section 5 with one additional feature – a binary vector representing the news sources the user has engaged with. This leaves us with a tensor representation of user vertex features X = {x1, x2, ..., x|V|}.
The adjacency matrix A and the vertex feature tensor X represent the network history and comment history of all the users at an instance, respectively. First, we learn a lower-dimensional feature representation X′ from X as

X′ = σ(XW0 + b0)    (6)

where W0 and b0 are kernel and bias matrices to be learned while training, and σ is a nonlinear activation. We fuse these two histories together using graph convolution. We compute a degree-normalized adjacency matrix Â = D⁻¹A, where D is the degree matrix of G_sub. This multiplication normalizes the effect of neighboring vertices so that higher-degree vertices do not get over-weighted. Now, our convolution at the m-th depth is computed as

H^m = σ(Â H^{m−1} W^m)    (7)

where W^m is the graph convolution kernel to be learned while training, and H^{m−1} and H^m are the input and the output of the convolution, respectively. Since we use three consecutive convolution layers, the final feature representation is H^3.

Figure 2: Inter-user conflict prediction using graph convolution.

For predicting whether there will be a conflicting engagement between users u_i and u_j, we select the i-th and the j-th feature vectors of H^3 and compute a score y ∈ (0, 1) as follows:

E = [H^3[i], H^3[j]]    (8)
y = σ(EW_c + b_c)

where [·, ·] stands for the concatenation operator, W_c and b_c are the kernel and bias for the classification layer respectively, and σ is the sigmoid function. The complete architecture of the model is illustrated in Figure 2. This model is trained to minimize the cross-entropy loss between true and predicted labels.
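A forward pass of this convolution can be sketched in NumPy under our reading of the text: Â = D⁻¹A, H^m = ReLU(Â H^{m−1} W^m) stacked three times, then a sigmoid over the concatenated i-th and j-th node features. Dimensions, initialization and the 0.1 scaling are illustrative assumptions, not the authors' values.

```python
import numpy as np

rng = np.random.default_rng(0)

def gcn_forward(A, X, weights):
    """Three stacked degree-normalized graph convolution layers."""
    deg = A.sum(axis=1, keepdims=True)
    deg[deg == 0] = 1.0                      # guard isolated nodes
    A_hat = A / deg                          # D^{-1} A
    H = X
    for W in weights:
        H = np.maximum(A_hat @ H @ W, 0.0)   # ReLU(A_hat H W)
    return H

def predict_conflict(H, i, j, Wc, bc):
    E = np.concatenate([H[i], H[j]])         # Eq. 8: concatenation
    return 1.0 / (1.0 + np.exp(-(E @ Wc + bc)))  # sigmoid score in (0, 1)

n, d, h = 5, 8, 4
A = rng.random((n, n)); A = (A + A.T) / 2    # symmetric toy adjacency
X = rng.random((n, d))
weights = [rng.random((d, h)) * 0.1,
           rng.random((h, h)) * 0.1,
           rng.random((h, h)) * 0.1]
score = predict_conflict(gcn_forward(A, X, weights), 0, 1,
                         rng.random(2 * h) * 0.1, 0.0)
print(0.0 < score < 1.0)  # True
```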
6.2 SVM-based frameworks
Graph convolution automatically learns a feature representation for the interaction between user pairs from node features and the connectivity of the nodes. For SVM, we need to manually identify interaction features. We extract the following textual and network-based features for each user pair u_i, u_j:
(1) Count of relevant common tokens from the previous comments of the users; we take the sum of tf-idf values of common unigrams and bigrams in the comment histories of both users.
(2) Conflict vector CV_ij between the pair, computed using TD-sentiment vectors following Eq. 1; given the previous comments {C0, C1, ..., C_{N_k}} of user u_k in which a term appears, we compute TS_{u_k}, the target sentiment vector of u_k averaged over the history. We compute CV_ij as the element-wise absolute difference between TS_{u_i} and TS_{u_j}.
(3) Common news sources CN_ij, taken as a vector of length equal to the number of news sources; for news source k, CN_ij[k] indicates the number of articles from this news source where both u_i and u_j are engaged.
(4) Common discussions, indicating the count of discussions where both u_i and u_j are engaged.
(5) Previous mutual engagement, the total number of previous interactions between u_i and u_j.
(6) Previous conflict, the average of mutual conflicts between u_i and u_j over their previous engagements.
(7) Neighbor interactions, the counts of conflicting and non-conflicting engagements of each user with its neighbors.
We use three SVMs with a Gaussian kernel – the first SVM uses all the features mentioned above (SVM-all), the second one (SVM-text) uses only text-based features (features 1 and 2), and the third one (SVM-net) uses only network-based features (features 3–5). SVM-net, which has been used for negative link prediction by Wang et al. [37], serves as our external baseline.
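The SVM setup can be sketched with scikit-learn's RBF ("Gaussian") kernel classifier over hand-built pair features; the three-dimensional toy features and synthetic labels below are stand-ins for the pair features (1)–(7), not the authors' data.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
n = 200
# toy pair features: [common tokens, previous conflict, mutual engagements]
X = rng.random((n, 3))
# toy labels: pairs with high previous conflict tend to clash again
y = (X[:, 1] + 0.1 * rng.standard_normal(n) > 0.5).astype(int)

clf = SVC(kernel="rbf").fit(X[:150], y[:150])  # Gaussian-kernel SVM
acc = clf.score(X[150:], y[150:])
print(acc > 0.6)
```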
For the news-user conict prediction task, total size of our feature
vector is 8
136. On a total set of 41
430 news articles, we used 80 : 20
train-test split keeping the fractions of dierent news sources same
over train and test data.9
For the user-user conict prediction task, the number of features
representing user nodes in the graph convolution model is 8
236. To
construct enclosing subgraphs from user-user engagement network,
we set the value of
(dened in Section 6.1) to be 100. This
results in adjacency matrices with an upper bound of 5000 nodes.
We perform this prediction on 25 instances of the dynamic user
engagement network, taking a total of 1
637 dierent subgraphs
from these instances. For any user pair on these subgraphs, if there
is a conicting engagement between them over an interval of next
24 hours, we label them as positive, otherwise negative. We take
998 dierent user pairs altogether, randomly sampling equal
numbers of positive and negative labels to avoid bias. Here again,
we split the samples into 80 : 20 train-test splits, with 15% of the
train data used as the development set to tune the parameters. We
use Nadam (Adam with Nesterov momentum) optimization to train
the model, with a batch size of 256.
We used the scikit-learn framework to implement all the regression models mentioned. We implemented the graph convolution model using the Keras and TensorFlow frameworks.
Conict type RMSE MAP MRR
News-comment conict 0.96 0.77 0.86
Comment-comment conict 0.79 0.83 0.91
Table 2: Evaluation of conict measurement on manually
annotated conict ratings.
Figure 3: Error in conflict score (y_true − y_pred) vs. number of words in the comment.
7.1 Evaluation of conict quantication
We test our conict measurement on the manually annotated news-
comment and comment-comment pairs (Section 3). To deal with
dierent ranges, we normalize the
c f
values to the
val and measure Root Mean Squared Error (MSE). We also con-
sider ranking comments accordingly to their conicting tendency
towards a particular news article and a particular comment. We
compute the Mean Average Precision (MAP) of the ranking and
Mean Reciprocal Rank (MRR) for top ranking position based on the
ground-truth annotation mentioned in Section 3.
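The three evaluation metrics can be sketched in plain Python. These are standard textbook definitions, not the paper's exact implementation:

```python
def rmse(y_true, y_pred):
    """Root Mean Squared Error between true and predicted scores."""
    n = len(y_true)
    return (sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n) ** 0.5

def average_precision(relevant, ranked):
    """AP of a ranked list against a set of relevant items;
    MAP averages this over queries."""
    hits, score = 0, 0.0
    for i, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            score += hits / i
    return score / max(len(relevant), 1)

def reciprocal_rank(relevant, ranked):
    """1/rank of the first relevant item; MRR averages this over queries."""
    for i, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / i
    return 0.0
```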
As observed in Table 2, measuring inter-comment conflict is
a rather easier task compared to news-comment conflict. The
feedback obtained from the annotators reveals that, as most news
articles are written in an objective style with little explicit opinion,
it is hard to judge whether a comment holds an opinion opposite
to the news.
As there is no previous work on quantifying conflict between
two text documents in online discussions, we implement the
agreement-disagreement detection models proposed by Rosenthal
and McKeown (Baseline-I) and Dutta et al. (Baseline-II).
Baseline-I performs a three-class classification: agreement, disagree-
ment and none. We identify disagreement as conflict and the rest of the
classes as non-conflict. We also define the probability of the dis-
agreement class (predicted by Baseline-I) for an interaction as a unit-
norm conflict score. Similarly, Baseline-II performs a ten-class
classification of discourse acts, from which we identify the classes
disagreement and negative reaction together as conflict, and the rest of
the classes as non-conflict. The sum of the probabilities of these two
classes is defined as the unit-norm conflict score predicted
by Baseline-II.
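The mapping from the baselines' class probabilities to unit-norm conflict scores can be sketched as follows; the dictionary keys are illustrative class names, not the baselines' actual label strings:

```python
def conflict_score_baseline1(class_probs):
    """Baseline-I: the probability of the 'disagreement' class is used
    directly as a unit-norm conflict score. `class_probs` maps the three
    class names ('agreement', 'disagreement', 'none') to probabilities."""
    return class_probs["disagreement"]

def conflict_score_baseline2(class_probs):
    """Baseline-II: the sum of the probabilities of the 'disagreement'
    and 'negative reaction' discourse acts, out of ten classes, is the
    unit-norm conflict score."""
    return class_probs["disagreement"] + class_probs["negative reaction"]
```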
We compare our strategy of conflict score prediction with the
baselines through a three-way evaluation strategy:
Quantifying and Modeling Intra-community Conflicts in Online Discussion. CIKM '19, November 3-7, 2019, Beijing, China
Metric   Our method (conflict factor)   Baseline-I   Baseline-II
AUC      0.79                           0.79         0.62
MAP      0.83                           0.61         0.55
RMSE     0.79                           1.67         2.09
Table 3: Comparison of conflict score with baselines.
We dene a binary classication of interactions into conict
and non-conict, evaluated using ROC-AUC;
We dene a regression of the degree of conict, where we
scale the outputs of each model to the interval
evaluate using RMSE;
We dene a ranking problem of the interactions according
to their degree of conict, and evaluate using MAP.
As both the baselines perform their corresponding tasks (stance
classification and discourse act classification) on discussion data, we
perform this comparison only for the comment-comment conflict
prediction task.
Table 3 shows that our proposed strategy outperforms both
baselines on the ranking and regression tasks. This is expected,
as both baseline models are actually classification frameworks.
For the binary classification of conflicting and non-conflicting in-
teractions, our strategy ties with Baseline-I.
Figure 3 plots the variation of the error in conflict score with the
change in comment length. For news-comment pairs, we only
take the comment length, while for comment-comment pairs we
take the average length of both comments. To see whether
the error in our score has any bias towards underestimation or
overestimation, we take the difference (y_true - y_pred), where
y_true and y_pred are the manually annotated score and the computed
cf score, respectively.
As we can see in Figure 3, our computed score underestimates con-
flict when comments are short, and overestimates it as the size grows
(more negative errors for sizes approximately below 60 words;
more positive errors afterwards). Also, the absolute error rate decreases
with increasing comment size.
Such an error pattern can be explained from the definition of the
conflict measurement itself. We use as the conflict score the sum of the
absolute differences of sentiment towards specific targets common
to both documents, which increases with the number of common targets
present. As the length of the comments increases, the common word
set also grows, and small differences add up to large conflict
scores. For short comments, the number of common targets is
also small, and the score tends to reflect less conflict than is actually present.
For short comments, another problem is the use of semantically
similar words occurring as targets in either comment of a given
pair. For example, the sentences 'We do not support Democrats' and
'We support Hillary' are actually conflicting, as the targets Hillary
and Democrats are semantically similar. But due to the absence of common
words, this pair will be identified as non-conflicting.
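A minimal sketch of this conflict measure, assuming target-dependent sentiment scores have already been extracted by an upstream model (the maps from target term to sentiment score are hypothetical inputs):

```python
def conflict_score(sentiments_a, sentiments_b):
    """Conflict between two documents: the sum, over targets common to
    both, of the absolute difference of their target-dependent sentiment
    scores. Each argument maps a target term to a sentiment score
    (assumed in [-1, 1]); target extraction and sentiment scoring are
    taken as given here."""
    common = set(sentiments_a) & set(sentiments_b)
    return sum(abs(sentiments_a[t] - sentiments_b[t]) for t in common)
```

The measure's failure mode discussed above falls out directly: a pair whose targets are semantically related but lexically distinct shares no common key, so its score collapses to zero.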
However, as our dataset suggests, the fraction of comments having
more than 50 words is 0.79, and the ratio between the number
of words and targets is 17.678. This is particular to online dis-
cussion forums, where users tend to engage in an elaborate
manner, which reduces the error margin of our conflict score.
Our model achieves 0.96 and 0.79 RMSE for news-comment and
comment-comment conflict, respectively, over the interval,
Model           MSE     RMSE    sMAPE
Random Forest   6.194   2.489   0.099
SVR             4.041   2.010   0.077
Lasso           3.179   1.783   0.080
Table 4: Performance of different regression algorithms for
news-user conflict prediction.
Figure 4: Importance of different features for news-user con-
flict prediction.
which might be considered significantly accurate for conflict measurement.
7.2 Evaluation of news-user conflict prediction
In our dataset, the news conflict scores (computed using Eq. 3) of
the news articles vary from 0 to 138.15. In Table 4, we present the
MSE (Mean Squared Error), RMSE and sMAPE (Symmetric Mean
Absolute Percentage Error) for predicting news conflict scores using
different regression algorithms. In terms of MSE and RMSE, Lasso
regression performs the best, while SVR is the best performing one
when evaluated using sMAPE.
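For reference, sMAPE can be defined as below. Several conventions exist in the literature and the paper does not spell out its variant, so this is one common form, scaled to [0, 1]:

```python
def smape(y_true, y_pred):
    """Symmetric Mean Absolute Percentage Error, scaled to [0, 1]:
    mean of |y - yhat| / ((|y| + |yhat|) / 2), divided by 2.
    (One common convention; the paper's exact variant is an assumption.)"""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        denom = (abs(t) + abs(p)) / 2.0
        # define the term as 0 when both true and predicted are 0
        total += 0.0 if denom == 0 else abs(t - p) / denom
    return total / len(y_true) / 2.0
```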
We check which features are given more importance by
our best performing regression algorithms. As we can see in Figure
4, term-dependent sentiments are the most useful ones to predict
how likely a news article is to get negative feedback. In fact,
this feature achieves far more importance than its next
competitors, which again are polarity-oriented features. Interest-
ingly, the count of negative polarity words has higher importance
than the count of positive polarity words. The high importance of
polarity-related features may signify that news reports expressing
polarized bias tend to get more conflicting remarks. Readability
indices (Gunning-Fog and LIX), albeit low, play some role in the
prediction task. In fact, Gunning-Fog is substantially more useful
than LIX.
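Feature-importance rankings like those in Figure 4 can be obtained generically: for a linear model such as Lasso, the absolute coefficient (on standardized inputs) is a common importance proxy, while tree ensembles expose an analogous importance vector. A sketch, with hypothetical feature names:

```python
def rank_features(names, coefficients):
    """Rank features by importance. For a linear model the absolute
    value of each coefficient (on standardized inputs) serves as the
    importance; for tree ensembles, substitute the model's
    per-feature importance vector."""
    ranked = sorted(zip(names, coefficients),
                    key=lambda nc: abs(nc[1]), reverse=True)
    return [name for name, _ in ranked]
```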
7.3 Evaluation of inter-user conflict prediction
We evaluate all four models for two cases: (i) the whole of the test data,
where a pair of users may or may not have a previous interaction
history, and (ii) user pairs who have no interaction history before
the prediction instance. We present the evaluation results in Table 5.
For the whole test data, the SVM model with all the features performs
the best. Clearly, network-based features are of
greater importance than text-based features for this task.
However, when there is no previous interaction history between
two users, graph convolution beats all the models by a substantial
Evaluation SVM-all SVM-text SVM-net GCN
Acc. 0.89 0.64 0.85 0.87
AUC 0.89 0.62 0.84 0.86
Acc. (new) 0.67 0.43 0.67 0.72
AUC (new) 0.65 0.43 0.65 0.69
Table 5: Evaluation of all the models for user-user conflict
prediction. Accuracy is abbreviated as Acc. Acc. (new) and
AUC (new) signify evaluation results for user pairs with no
previous interactions.
Figure 5: Distribution of maximum, minimum and average
conflict scores for different news sources. This plot shows
only the top 7 news sources (ranked by number of articles).
margin. In fact, when there is no previous engagement history
between users, the only feature available to the SVM model is the
neighbour interactions, which means SVM-all and SVM-net become
the same model, and SVM-text becomes a model with all-zero
features and hence all-zero output.
We introspect into the dynamics of conflict in the r/news community
using the conflict measurements that we propose in Eq. 2 (for inter-
user conflict) and Eq. 3 (for the aggregate conflict that a news article
receives from the users).
8.1 Patterns of conict for dierent news
Dierent news sources tend to face dierent degree of conict from
the users. In Figure 5, we plot maximum, minimum and average
news conict for dierent news sources in our dataset. Although
the average conict for dierent sources is in a comparable range,
maximum values vary greatly. News sources such as Fox News,
USA Today or NBC News maintain a sustained negative response,
whereas New York Times or Reuters provoke sharp outrage at the
some point.
We nd that this outrage is signied by an article published in
NYTimes on Dec 1, 2017, titled Michael Flynn Pleads Guilty to Lying
to the F.B.I. and Will Cooperate With Russia Inquiry
. Figure 6 also
indicates the sharp peak for New York Times corresponding to this
article. The Guardian, Fox news and NBC News have similar peaks
(red-circled) at nearby time instance, all corresponding to articles
michael-ynn- guilty-russia- investigation.html
Figure 6: Temporal variation of news-user conflict for var-
ious news sources (NBC News, New York Times, The Guardian,
FOX News, Reuters); conflict score and time are represented
on the y- and x-axis respectively. All the plots cover the time frame
Nov 17 - Dec 28, 2017. Red-circled peaks denote a
rise in conflict due to articles corresponding to a particular event.
related to the same event. One can draw an intuitive correlation
between the posting time of the article in the forum and the rise
in conflict. Note that by posting time we mean the time when
the news appeared on Reddit, not the time of its appearance on
the web.
8.2 Engagement dynamics and inter-user conflict
To explore how conflict affects user engagement over r/news, we
construct a temporal graph G(t) = (V(t), E(t)), where a vertex
v_i(t_i) in V(t) corresponds to a user
who engaged in a discussion for the first time at time t_i.
For every pair of users engaging with
each other (either of them commenting in reply to the other) at time
t_ij, there is an edge e_ij(t_ij) in E(t). For better visualization,
we classify edges as conflicting (blue) and non-conflicting (green),
and plot only a subgraph using 5,000 vertices. We use the Fruchterman-
Reingold layout algorithm on Gephi to plot the graph
and DyCoNet to identify communities. In Figure 7, we present
snapshots of the evolving graph. Each snapshot is taken at a time
difference of approximately 24 hours, presenting a 4-day long ab-
straction through this engagement subgraph.
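The construction of the temporal engagement graph can be sketched as follows, again assuming a hypothetical flat log of reply events rather than the paper's actual data structures:

```python
def build_engagement_graph(engagements, until):
    """Build the temporal engagement graph described above from a flat
    log of (user_a, user_b, time, is_conflict) reply events, up to time
    `until`. Nodes are users stamped with their first engagement time;
    edges are undirected user pairs, marked conflicting if any
    engagement between the two users so far was a conflict."""
    first_seen, edges = {}, {}
    for u, v, t, is_conflict in sorted(engagements, key=lambda e: e[2]):
        if t > until:
            break  # events are time-ordered; later ones are out of frame
        for node in (u, v):
            first_seen.setdefault(node, t)  # record first appearance
        pair = (min(u, v), max(u, v))
        edges[pair] = edges.get(pair, False) or is_conflict
    return first_seen, edges
```

Taking snapshots at 24-hour increments of `until` reproduces the kind of evolving sequence shown in Figure 7.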
We can observe the formation of separate user clusters in terms
of engagement. It is interesting to see that there are some clus-
ters where users are predominantly engaged with each other in a
conflicting manner (blue regions) and some in a non-conflicting
manner (green regions). We also identify three different types of
engagement patterns in user clusters:
- Type-I clusters tend to be formed with non-conflicting en-
gagement between users. Users in these clusters do not seem
to engage in a conflicting manner with users in other
clusters either.
- Type-II clusters are formed with users having mutual con-
flict. They tend to have conflicting interactions with other
clusters as well.
- Type-III clusters show an organization-like behavior. These
users maintain almost non-conflicting engagement with each
other, but are aggressive towards other clusters (mostly green
regions inside the cluster and blue ones outwards in Figure 7).
Figure 7: Snapshots of cluster formation in the user-user engagement graph (left to right),
with Type-I, Type-II and Type-III clusters marked; blue and green edges correspond to
controversial and non-controversial engagements respectively.
Type-III clusters tend to grow the most compared to type-I and
type-II clusters. Different type-III clusters have the most inter-cluster
conflicts, even greater than those of type-II clusters. Type-I clusters
show the least growth rate among the three types, signifying that these
users are less prone to leave their 'comfort zone'.
These cluster types are of course not completely rigid. Although
there is no sign of conversion between type-I and II, both of them
can slowly convert into type-III. It is intriguing to observe two
different patterns in the formation of type-III clusters: (i) Some
of them emerge as type-III from the beginning. Users having no
previous engagement form non-conflicting connections with each
other. This may signify a probable community interaction among
them beyond the discussion platform, such as organized campaign-
ers, small groups of people using multiple fake user accounts aka
sockpuppets, or people who are accustomed to each other in real life
and share similar opinions. (ii) Some of them start as type-I
or II and slowly convert into type-III, which possibly signifies
the evolution of engagement via predominant platform interaction.
Users in type-II clusters start changing opinions towards each other
through long-term interaction and convert into type-III. Simi-
larly, type-I users tend to start interacting with opposite opinions
and convert themselves into type-III. We observe that 33% of the
type-III clusters at the end of the time frame are ones converted
from type-II, whereas 48% are from type-I. The rest started
growing as type-III clusters.
Figure 8: Variation of normalized inter-comment conflict with the depth of
comments in the discussion tree.
The formation and evolution of these clusters closely follow the
abstract model of user engagement in Figure 1. A repeated transition
from state 1 along the self-loop results in a type-I cluster, whereas
the same happening on state 2 results in a type-II cluster. If all
the user pairs from state 1 start conflicting with each other, it will
lead to a transition to state 2, which implies that a type-I cluster is
transformed into type-II. This is possible only hypothetically;
we did not find any such evidence in our dataset. Likewise, a
transition from state 1 or 2 to state 3 signifies preferential conflict,
resembling type-III clusters.
In Figure 8, we plot the variation of inter-comment conflict with
the depth of the comments in the discussion thread tree. We normalize
conflict scores to a common interval. For a comment pair at consecutive
depths, we plot their conflict at the depth of the parent comment.
As is evident from the plot,
a discussion thread is most prone to conflict at depth levels 3 and 4.
For interactions at greater depth, the variance goes up substantially, but
the average inter-comment conflict score drops steadily.
Table 6 shows example statistics of which news sources' discussions
lead to which user clusters. We report this
for three different instances of the network, taken at three different
times. We take the discussions initiated within the past 24
hours for each instance of the network and map the users in each of
the three largest clusters to those discussions. As each discussion is
related to a news source, this finally maps news sources to clusters.
As we can see in Table 6, there are several common news sources
present in the first and second instances, whereas almost no common
source is found in the third instance.
In this paper, we studied conflict dynamics in online discussions
inside the Reddit r/news community. We proposed a novel, continuous-
valued quantification of inter-document conflict. Using this mea-
surement, we attempted to predict how much negative response
a news article will face from its audience on online discussion
platforms, solely based on its textual features. We proposed an
SVM-based model and a graph convolutional model to predict fu-
ture conflict between pairs of users. Extensive evaluation showed
that network-based features are more important for conflict link
prediction than textual content-based features.
Our analyses provide novel insights into the conflict dynamics
of large-scale online discussion. We showed how different news
sources get different reactions from their audiences and how this
varies temporally. We identified three distinct types of user clus-
ters developed in the Reddit r/news community, based on their attitude
towards other users and engagement patterns. We also provided a
hypothetical state-transition model of user engagement, which
actual interaction patterns closely follow.
Cluster 1:
  Instance 1: Baltimore News (40.02%), Wichita Eagle (25.92%), National Geographic (20.00%), Fox News (13.33%)
  Instance 2: Comic Book (76.48%), Wichita Eagle (11.89%), Fox News (1.91%), Detroit News (0.54%)
  Instance 3: New York Times (49.03%), Fox News (25.08%), abc13 (25.88%)
Cluster 2:
  Instance 1: Baltimore News (100%)
  Instance 2: Wichita Eagle (50.57%), Baltimore News (47.12%), National Geographic (2.29%)
  Instance 3: BBC (78.52%), Independent (18.61%), New York Times (2.87%)
Cluster 3:
  Instance 1: abc13 (44.44%), Fox News (27.78%), Baltimore News (22.24%)
  Instance 2: Baltimore News (100%)
  Instance 3: Guardian (42.98%), Independent (41.32%), Detroit News (15.70%)
Table 6: Percentage of different news sources in user clusters of the user-user engagement network. We show the statistics of the three
largest clusters at three different instances of the network. Up to the top four news sources (by %-contribution) are shown.
The project was partially supported by Ramanujan Fellowship
(SERB, India), Early Career Research Award (ECR/2017/001691),
the Infosys Centre for AI, IIITD, and State Government Fellowship,
Jadavpur University.
Luca Maria Aiello, Alain Barrat, Rossano Schifanella, Ciro Cattuto, Benjamin
Markines, and Filippo Menczer. 2012. Friendship prediction and homophily in
social media. ACM Transactions on the Web (TWEB) 6, 2 (2012), 9–29.
Mathieu Bastian, Sebastien Heymann, and Mathieu Jacomy. 2009. Gephi: an open
source software for exploring and manipulating networks. In ICWSM. 11–20.
Rianne van den Berg, Thomas N Kipf, and Max Welling. 2017. Graph convolutional matrix completion. arXiv preprint arXiv:1706.02263 (2017).
Carl-Hugo Björnsson. 1983. Readability of newspapers in 11 languages. Reading
Research Quarterly (1983), 480–497.
Catherine A Bliss, Morgan R Frank, Christopher M Danforth, and Peter Sheridan
Dodds. 2014. An evolutionary algorithm approach to link prediction in dynamic
social networks. Journal of Computational Science 5, 5 (2014), 750–764.
Erik Cambria, Soujanya Poria, Devamanyu Hazarika, and Kenneth Kwok. 2018.
SenticNet 5: Discovering conceptual primitives for sentiment analysis by means
of context embeddings. In AAAI. 1–10.
Michael D Conover, Jacob Ratkiewicz, Matthew Francisco, Bruno Gonçalves,
Filippo Menczer, and Alessandro Flammini. 2011. Political polarization on twitter.
In ICWSM. 1–10.
Peter A Cramer. 2011. Controversy as news discourse. Vol. 19. Springer Science &
Business Media.
Michaël Deerrard, Xavier Bresson, and Pierre Vandergheynst. 2016. Convolu-
tional neural networks on graphs with fast localized spectral ltering. In NIPS.
Li Dong, Furu Wei, Chuanqi Tan, Duyu Tang, Ming Zhou, and Ke Xu. 2014.
Adaptive recursive neural network for target-dependent twitter sentiment classification. In ACL, Vol. 2. 49–54.
Subhabrata Dutta, Tanmoy Chakraborty, and Dipankar Das. 2019. How did the discussion go: Discourse act classification in social media conversations. In Linking and Mining Heterogeneous and Multi-view Data. Springer, 137–160.
Kelwin Fernandes, Pedro Vinagre, and Paulo Cortez. 2015. A proactive intelligent decision support system for predicting the popularity of online news. In Portuguese Conference on Artificial Intelligence. Springer, 535–546.
Joseph L Fleiss. 1971. Measuring nominal scale agreement among many raters.
Psychological bulletin 76, 5 (1971), 378.
Thomas MJ Fruchterman and Edward M Reingold. 1991. Graph drawing by force-
directed placement. Software: Practice and experience 21, 11 (1991), 1129–1164.
Kiran Garimella, Gianmarco De Francisci Morales, Aristides Gionis, and Michael
Mathioudakis. 2018. Quantifying controversy on social media. ACM Transactions
on Social Computing 1, 1 (2018), 3.
Venkata Rama Kiran Garimella and Ingmar Weber. 2017. A long-term analysis of
polarization on Twitter. In ICWSM. 1–10.
Eric Gilbert and Karrie Karahalios. 2009. Predicting tie strength with social media.
In SIGCHI. ACM, 211–220.
Pedro Calais Guerra, Wagner Meira Jr, Claire Cardie, and Robert Kleinberg.
2013. A measure of polarization on social media networks based on community
boundaries. In ICWSM. 1–10.
Robert Gunning. 1969. The fog index after twenty years. Journal of Business
Communication 6, 2 (1969), 3–13.
Divam Gupta, Kushagra Singh, Soumen Chakrabarti, and Tanmoy Chakraborty.
2019. Multi-task Learning for Target-dependent Sentiment Classication. arXiv
preprint arXiv:1902.02930 (2019).
Julie Kauman, Aristotelis Kittas, Laura Bennett, and Sophia Tsoka. 2014. Dy-
CoNet: a Gephi plugin for community detection in dynamic complex networks.
PloS one 9, 7 (2014), e101357.
Yaser Keneshloo, Shuguang Wang, Eui-Hong Han, and Naren Ramakrishnan.
2016. Predicting the popularity of news articles. In SDM. SIAM, 441–449.
Thomas N Kipf and Max Welling. 2016. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907 (2016).
Srijan Kumar, Justin Cheng, Jure Leskovec, and V.S. Subrahmanian. 2017. An
Army of Me: Sockpuppets in Online Discussion Communities. In WWW. 857–
Srijan Kumar, William L Hamilton, Jure Leskovec, and Dan Jurafsky. 2018. Community interaction and conflict on the web. In WWW. International World Wide Web Conferences Steering Committee, 933–943.
David Liben-Nowell and Jon Kleinberg. 2007. The link-prediction problem for
social networks. Journal of the American society for information science and
technology 58, 7 (2007), 1019–1031.
Michal Lukasik, PK Srijith, Duy Vu, Kalina Bontcheva, Arkaitz Zubiaga, and
Trevor Cohn. 2016. Hawkes processes for continuous time sequence classification: an application to rumour stance classification in twitter. In ACL, Vol. 2. 393–398.
Marian Meyers. 1994. Defining Homosexuality: News Coverage of the 'Repeal the Ban' Controversy. Discourse & Society 5, 3 (1994), 321–344.
Saif Mohammad, Svetlana Kiritchenko, Parinaz Sobhani, Xiaodan Zhu, and Colin
Cherry. 2016. Semeval-2016 task 6: Detecting stance in tweets. In SemEval. 31–41.
Reed E Nelson. 1989. The strength of strong ties: Social networks and intergroup conflict in organizations. Academy of Management Journal 32, 2 (1989), 377–401.
Alicja Piotrkowicz, Vania Dimitrova, Jahna Otterbacher, and Katja Markert. 2017.
Headlines matter: Using headlines to predict the popularity of news articles on
twitter and facebook. In ICWSM. 1–10.
Georgios Rizos, Symeon Papadopoulos, and Yiannis Kompatsiaris. 2016. Pre-
dicting news popularity by mining online discussions. In WWW. International
World Wide Web Conferences Steering Committee, 737–742.
Sara Rosenthal and Kathy McKeown. 2015. I couldn’t agree more: The role of
conversational structure in agreement and disagreement detection in online
discussions. In Proceedings of the 16th Annual Meeting of the Special Interest Group
on Discourse and Dialogue. 168–177.
Niek J Sanders. 2011. Sanders-twitter sentiment corpus. Sanders Analytics LLC
242 (2011).
Robert Speer and Joshua Chin. 2016. An ensemble method to produce high-quality
word embeddings. arXiv preprint arXiv:1604.01692 (2016).
Bo Wang, Maria Liakata, Arkaitz Zubiaga, and Rob Procter. 2017. TDParse: Multi-target-specific sentiment recognition on twitter. In EMNLP. 483–493.
Peng Wang, BaoWen Xu, YuRong Wu, and XiaoYu Zhou. 2015. Link prediction
in social networks: the state-of-the-art. Science China Information Sciences 58, 1
(2015), 1–38.
Bo Wu and Haiying Shen. 2015. Analyzing and predicting news popularity on
Twitter. International Journal of Information Management 35, 6 (2015), 702–711.
Wayne W Zachary. 1977. An information flow model for conflict and fission in small groups. Journal of anthropological research 33, 4 (1977), 452–473.
Amy Zhang, Bryan Culbertson, and Praveen Paritosh. 2017. Characterizing
online discussion using coarse discourse sequences. (2017).
Muhan Zhang and Yixin Chen. 2018. Link prediction based on graph neural
networks. In NIPS. 5165–5175.
Arkaitz Zubiaga, Elena Kochkina, Maria Liakata, Rob Procter, Michal Lukasik,
Kalina Bontcheva, Trevor Cohn, and Isabelle Augenstein. 2018. Discourse-aware rumour stance classification in social media using sequential classifiers. Information Processing & Management 54, 2 (2018), 273–290.
ResearchGate has not been able to resolve any citations for this publication.
Full-text available
Over the last two decades, social media has emerged as almost an alternate world where people communicate with each other and express opinions about almost anything. This makes platforms like Facebook, Reddit, Twitter, Myspace, etc., a rich bank of heterogeneous data, primarily expressed via text but reflecting all textual and non-textual data that human interaction can produce. We propose a novel attention-based hierarchical LSTM model to classify discourse act sequences in social media conversations, aimed at mining data from online discussion using textual meanings beyond sentence level. The very uniqueness of the task is the complete categorization of possible pragmatic roles in informal textual discussions, contrary to extraction of question–answers, stance detection, or sarcasm identification which are very much role specific tasks. Early attempt was made on a Reddit discussion dataset. We train our model on the same data, and present test results on two different datasets, one from Reddit and one from Facebook. Our proposed model outperformed the previous one in terms of domain independence; without using platform-dependent structural features, our hierarchical LSTM with word relevance attention mechanism achieved F1-scores of 71% and 66%, respectively, to predict discourse roles of comments in Reddit and Facebook discussions. Efficiency of recurrent and convolutional architectures in order to learn discursive representation on the same task has been presented and analyzed, with different word and comment embedding schemes. Our attention mechanism enables us to inquire into relevance ordering of text segments according to their roles in discourse. We present a human annotator experiment to unveil important observations about modeling and data annotation. Equipped with our text-based discourse identification model, we inquire into how heterogeneous non-textual features like location, time, leaning of information, etc. 
play their roles in characterizing online discussions on Facebook.
Full-text available
Traditional methods for link prediction can be categorized into three main types: graph structure feature-based, latent feature-based, and explicit feature-based. Graph structure feature methods leverage some handcrafted node proximity scores, e.g., common neighbors, to estimate the likelihood of links. Latent feature methods rely on factorizing networks' matrix representations to learn an embedding for each node. Explicit feature methods train a machine learning model on two nodes' explicit attributes. Each of the three types of methods has its unique merits. In this paper, we propose SEAL (learning from Subgraphs, Embeddings, and Attributes for Link prediction), a new framework for link prediction which combines the power of all the three types into a single graph neural network (GNN). GNN is a new type of neural network which directly accepts graphs as input and outputs their labels. In SEAL, the input to the GNN is a local subgraph around each target link. We prove theoretically that our local subgraphs also reserve a great deal of high-order graph structure features related to link existence. Another key feature is that our GNN can naturally incorporate latent features and explicit features. It is achieved by concatenating node embeddings (latent features) and node attributes (explicit features) in the node information matrix for each subgraph, thus combining the three types of features to enhance GNN learning. Through extensive experiments, SEAL shows unprecedentedly strong performance against a wide range of baseline methods, including various link prediction heuristics and network embedding methods.
Conference Paper
Full-text available
Social media like Facebook or Twitter have become an entry point to news for many readers. In that scenario, the headline is the most prominent – and often the only visible – part of the news article. We propose a novel task of using only headlines to predict the popularity of news articles. The prediction model is evaluated on headlines from two major broadsheet news outlets – The Guardian and New York Times. We significantly improve over several baselines, noting differences in the model performance between Facebook and Twitter.
Full-text available
Social media has played an important role in shaping political discourse over the last decade. At the same time, it is often perceived to have increased political polarization, thanks to the scale of discussions and their public nature. In this paper, we try to answer the question of whether political polarization in the US on Twitter has increased over the last eight years. We analyze a large longitudinal Twitter dataset of 679,000 users and look at signs of polarization in their (i) network - how people follow political and media accounts, (ii) tweeting behavior - whether they retweet content from both sides, and (iii) content - how partisan the hashtags they use are. Our analysis shows that online polarization has indeed increased over the past eight years and that, depending on the measure, the relative change is 10%-20%. Our study is one of very few with such a long-term perspective, encompassing two US presidential elections and two mid-term elections, providing a rare longitudinal analysis.
Conference Paper
Full-text available
The paper presents a framework for the prediction of several news story popularity indicators, such as comment count, number of users, vote score and a measure of controversiality. The framework employs a feature engineering approach, focusing on features from two sources of social interactions inherent in online discussions: the comment tree and the user graph. We show that the proposed graph-based features capture the complexities of both these social interaction graphs and lead to improvements on the prediction of all popularity indicators in three online news post datasets and to significant improvement on the task of identifying controversial stories. Specifically, we noted a 5% relative improvement in mean square error for controversiality prediction on a news-focused Reddit dataset compared to a method employing only rudimentary comment tree features that were used by past studies.
In this work, we present a novel method for classifying comments in online discussions into a set of coarse discourse acts towards the goal of better understanding discussions at scale. To facilitate this study, we devise a categorization of coarse discourse acts designed to encompass general online discussion and allow for easy annotation by crowd workers. We collect and release a corpus of over 9,000 threads comprising over 100,000 comments manually annotated via paid crowdsourcing with discourse acts and randomly sampled from the site Reddit. Using our corpus, we demonstrate how the analysis of discourse acts can characterize different types of discussions, including discourse sequences such as Q&A pairs and chains of disagreement, as well as different communities. Finally, we conduct experiments to predict discourse acts using our corpus, finding that structured prediction models such as conditional random fields can achieve an F1 score of 75%. We also demonstrate how the broadening of discourse acts from simply question and answer to a richer set of categories can improve the recall performance of Q&A extraction.
With the recent development of deep learning, research in AI has gained new vigor and prominence. While machine learning has succeeded in revitalizing many research fields, such as computer vision, speech recognition, and medical diagnosis, we are yet to witness impressive progress in natural language understanding. One of the reasons behind this unmet expectation is that, while a bottom-up approach is feasible for pattern recognition, reasoning and understanding often require a top-down approach. In this work, we couple sub-symbolic and symbolic AI to automatically discover conceptual primitives from text and link them to commonsense concepts and named entities in a new three-level knowledge representation for sentiment analysis. In particular, we employ recurrent neural networks to infer primitives by lexical substitution and use them for grounding common and commonsense knowledge by means of multi-dimensional scaling.
Rumour stance classification, defined as classifying the stance of specific social media posts into one of supporting, denying, querying or commenting on an earlier post, is becoming of increasing interest to researchers. While most previous work has focused on using individual tweets as classifier inputs, here we report on the performance of sequential classifiers that exploit the discourse features inherent in social media interactions or 'conversational threads'. Testing the effectiveness of four sequential classifiers -- Hawkes Processes, Linear-Chain Conditional Random Fields (Linear CRF), Tree-Structured Conditional Random Fields (Tree CRF) and Long Short-Term Memory networks (LSTM) -- on eight datasets associated with breaking news stories, and looking at different types of local and contextual features, our work sheds new light on the development of accurate stance classifiers. We show that sequential classifiers that exploit discourse properties of social media conversations outperform non-sequential classifiers, even when using only local features. Furthermore, we show that an LSTM using a reduced set of features can outperform the other sequential classifiers; this performance is consistent across datasets and across types of stances. To conclude, our work also analyses the different features under study, identifying those that best help characterise and distinguish between stances, such as supporting tweets being more likely to be accompanied by evidence than denying tweets. We also set forth a number of directions for future research.
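The local and contextual features referred to above can be sketched with a small, hypothetical extractor. The feature names and heuristics here are illustrative (e.g., treating an embedded URL as an evidence signal, echoing the finding that supporting tweets are more often accompanied by evidence), not the cited study's actual feature set.

```python
def stance_features(thread):
    """Per-post local features plus a simple contextual feature
    (relative position) for a conversational thread of post texts."""
    negation_words = {"not", "no", "never", "false"}
    feats = []
    for i, post in enumerate(thread):
        text = post.lower()
        tokens = [t.strip(".,!?") for t in text.split()]
        feats.append({
            "has_evidence_url": "http" in text,          # crude evidence proxy
            "has_question_mark": "?" in text,            # querying signal
            "has_negation": any(t in negation_words for t in tokens),  # denying signal
            "position": i / max(len(thread) - 1, 1),     # contextual: 0.0 = source post
        })
    return feats
```

A sequential classifier (Linear CRF, Tree CRF, or LSTM) would then consume such per-post feature vectors in thread order, so that each stance prediction can condition on the posts around it.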
In online discussion communities, users can interact and share information and opinions on a wide variety of topics. However, some users may create multiple identities, or sockpuppets, and engage in undesired behavior by deceiving others or manipulating discussions. In this work, we study sockpuppetry across nine discussion communities, and show that sockpuppets differ from ordinary users in terms of their posting behavior, linguistic traits, as well as social network structure. Sockpuppets tend to start fewer discussions, write shorter posts, use more personal pronouns such as "I", and have more clustered ego-networks. Further, pairs of sockpuppets controlled by the same individual are more likely to interact on the same discussion at the same time than pairs of ordinary users. Our analysis suggests a taxonomy of deceptive behavior in discussion communities. Pairs of sockpuppets can vary in their deceptiveness, i.e., whether they pretend to be different users, or their supportiveness, i.e., if they support arguments of other sockpuppets controlled by the same user. We apply these findings to a series of prediction tasks, notably, to identify whether a pair of accounts belongs to the same underlying user or not. Altogether, this work presents a data-driven view of deception in online discussion communities and paves the way towards the automatic detection of sockpuppets.
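Two of the behavioral signals described above lend themselves to simple sketches: the elevated first-person pronoun rate of sockpuppets, and the tendency of sockpuppet pairs to post in the same discussions. The functions below are illustrative approximations, not the cited study's features; the Jaccard-style co-posting rate in particular is an assumed simplification of their pairwise interaction measure.

```python
def first_person_rate(posts):
    """Fraction of tokens across an account's posts that are
    first-person singular pronouns."""
    pronouns = {"i", "me", "my", "mine", "myself"}
    tokens = [t.strip(".,!?").lower() for post in posts for t in post.split()]
    return sum(t in pronouns for t in tokens) / max(len(tokens), 1)

def co_posting_rate(discussions_a, discussions_b):
    """Jaccard overlap of the discussions two accounts posted in;
    sockpuppet pairs are expected to score higher than ordinary pairs."""
    a, b = set(discussions_a), set(discussions_b)
    return len(a & b) / max(len(a | b), 1)
```

In a prediction setting, such per-account and pairwise scores would feed a classifier deciding whether two accounts belong to the same underlying user.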