Fake News Detection on Twitter Using
Propagation Structures
Marion Meyers, Gerhard Weiss, and Gerasimos Spanakis
Department of Data Science and Knowledge Engineering, Maastricht University,
Maastricht, The Netherlands
marion.meyers@hotmail.com,
{gerhard.weiss,jerry.spanakis}@maastrichtuniversity.nl
Abstract. The growth of social media has revolutionized the way people access information. Although platforms like Facebook and Twitter allow for quicker, wider and less restricted access to information, they also constitute a breeding ground for the dissemination of fake news. Most of the existing literature on fake news detection on social media proposes user-based or content-based approaches. However, recent research revealed that real and fake news also propagate significantly differently on Twitter. Nonetheless, only a few articles so far have explored the use of propagation features in their detection. Additionally, most of them have based their analysis on a narrow tweet retrieval methodology that only considers tweets to be propagating a news piece if they explicitly contain a URL link to an online news article. By basing our analysis on a broader tweet retrieval methodology that also allows tweets without a URL link to be considered as propagating a news piece, we contribute to filling this research gap and further confirm the potential of using propagation features to detect fake news on Twitter. We first show that real news are significantly bigger in size, are spread by users with more followers and fewer followings, and are actively spread on Twitter for a longer period of time than fake news. Second, we achieve an 87% accuracy using a Random Forest Classifier solely trained on propagation features. Lastly, we design a Geometric Deep Learning approach to the problem by building a graph neural network that learns directly on the propagation graphs and achieves an accuracy of 73.3%.
Keywords: Fake news · Twitter · Propagation
1 Introduction
The way people access information and news has radically shifted since the rise of social networks. From being platforms centered around creating and maintaining social connections, applications such as Facebook and Twitter have become news providers for many of their users [3]. Twitter, with its 326 million monthly active users, has become more than just a social platform: it has reinvented how citizens interact with each other and access information about the world [11,19]. As those platforms constitute a place where any opinion can be expressed and shared, they are also highly exposed to the dissemination of fake information. While traditional media sources such as newspapers and television have a one-to-many structure, information on social media is shared in a many-to-many fashion, making the monitoring of the information being diffused a much more complicated task.

© Springer Nature Switzerland AG 2020
M. van Duijn et al. (Eds.): MISDOOM 2020, LNCS 12259, pp. 138–158, 2020.
https://doi.org/10.1007/978-3-030-61841-4_10
The term fake news has been the subject of much controversy in the past years. Many definitions exist but none is universally accepted. It often encompasses notions such as manipulation, disinformation (information purposefully misleading), misinformation (information that is verifiably fake) and rumors [14]. In order to remain consistent throughout this article, the terms fake news, fake information and fake fact will be used interchangeably and their definitions will be restricted to claims that are verifiably false. Similarly, real news, real information and real fact will refer to claims that are verifiably true.
Fake news are referred to by many institutions and governments as one of the most dangerous threats to our current society [12], for example because of their influence on election results [6,9,10,15,16,23]. As the power and dangers of fake news are increasingly acknowledged, many groups are taking action against their diffusion, but a systematic way to detect them on social media is still lacking. Most approaches to fake news detection make use of user-based and content-based features. However, a recent study showed that fake and real news have significantly different propagation patterns [28]. This suggests that propagation features could be successfully used as a basis for classification. Additionally, compared to content-based features, propagation characteristics present the key advantage of being language independent. However, only a few studies so far have leveraged these features for the fake news detection task. Moreover, they have only done so on URL-restricted data sets, defined throughout this research as data sets created by a tweet retrieval methodology where a tweet is only considered to be propagating a news piece if it explicitly contains a URL link to an online news article. In contrast, we define a non-URL-restricted data set as one created by a tweet retrieval methodology that also allows tweets without a URL link to be considered as propagating a news piece.
Building on the apparent potential of propagation features to detect fake news on Twitter, and considering the narrow definition of news used in most of the research so far, this paper contributes to filling this research gap by answering the following research question: given a news graph G, defined here as a set of tweets and retweets that have been associated to a specific news item using a non-URL-restricted retrieval methodology, how significant are propagation features at classifying G as a real or a fake piece of information?
This paper answers this question in two ways. On the one hand, it does so by further investigating the significant differences in the propagation of real and fake information on a non-URL-restricted Twitter data set. On the other, it evaluates the performance of two different types of classifiers that solely leverage propagation information: a Random Forest Classifier trained on features manually extracted from the propagation graphs, and a Geometric Deep Learning approach directly applied on the full graph representations. Our code is available via GitHub1.
2 Related Work
Approaches to fake news detection typically make use of three types of information: user-based, news-based and propagation-based [26].
First, user-based approaches have shown promising classification results. Indeed, features extracted from user profiles, such as their number of followers and followings, their account age and their activity rate, have been shown to differ between real and fake information [2,22]. Additionally, user-based approaches to fake news detection have been further supported by the evidence that fake accounts play a great role in the dissemination of fake information on social media [6,20,23,24]. Hence, detecting fake accounts on social media is a valuable proxy for attempting to detect fake news [4].
Second, some approaches discriminate between real and fake information on social media based on the content of the message being spread. This entails the topic being discussed in the post but also the type of words used, the sentiment portrayed and 'non-linguistic' information such as the number of question marks or exclamation points employed. For example, [2] showed that tweets displaying a stronger sentiment, or containing many question marks or smiling emoticons, were more likely to be related to non-credible news.
Third, propagation-based approaches classify real and fake information based on their respective diffusion patterns on social media. They are built on a theoretical framework of news diffusion on social media to which a considerable amount of research has been dedicated [21,27,30,31]. Propagation models generally represent tweets (or users) as nodes of a graph, and social connections (follower, following) or influence paths (retweet, mention, comment, etc.) as edges. Throughout this article, those graphs will interchangeably be referred to as propagation graphs, propagation models, propagation structures or propagation networks. While user-based and content-based approaches have been the main focus in the existing literature, considerably less research has been dedicated to applying propagation features to the fake news detection task. However, some articles have shown that fake and real news present significantly different propagation patterns on Twitter. [28] discovered that real news take about six times as long as fake news to reach 1500 users, consistently reach fewer users in total and are less retweeted. Additionally, [13] showed that fake news have a more fluctuating temporal diffusion. A few attempts to make use of propagation features to detect fake news on Twitter have since been developed.
[2] combined different types of features (message-based, user-based, topic-based and propagation-based) and demonstrated that network features such as the number of tweets in the graph and the average node degree played a key role in their classifier's performance. Furthermore, [13] showed that the temporal features extracted from the propagation graphs allowed their classifier to achieve better results than the baseline performance. Together, those articles suggest that propagation structures are promising features for classifying real and fake information on Twitter.
1 https://github.com/MarionMeyers/fake_news_detection_propagation.
However, a more novel approach to graph classification that aims to optimize the use of propagation features has recently been applied to the problem. [5] makes use of recent advances in Geometric Deep Learning to classify news directly on their Twitter propagation graphs and achieves state-of-the-art classification results (92.7% AUC ROC). The field of 'Geometric Deep Learning' refers to methods that adapt deep learning approaches to higher-dimensional data such as graphs and manifolds. Indeed, most machine learning approaches only work on Euclidean data, i.e. flat lists of feature vectors. When applied to graphs, this means reducing and discarding parts of the information through the manual choice of the features to extract. Geometric Deep Learning approaches counter this limitation by designing neural networks able to learn directly from the graph representation of the input: Graph Neural Networks. This entails the creation of layers able to cope with a varying input size, since the training graphs have a different number of nodes and edges: Graph Convolutional Layers [29]. The success of this approach once again supports the relevance of using propagation features to classify real and fake news [17].
Lastly, both [28] and [5] gather news on Twitter by collecting URL links relating to news articles from fact-checking websites such as Snopes.com or Politifact.com2,3. Those websites collect news items and score them on a veracity scale based on extensive investigation by independent journalists. Next, they either gather all tweets containing these URL links together with their corresponding retweets [5], or gather all reply tweets containing those URL links together with the original tweet and its associated retweets [28]. Both approaches lead to the creation of a data set where each array of tweets relating to a certain news item is labelled real or fake depending on the veracity of the article it is sharing. As previously defined, both approaches follow a URL-restricted tweet retrieval methodology.
3 Dataset
3.1 Dataset Collection
In our research we make use of the FakeNewsNet data set, created in response to a clear lack of existing fake news data sets [25]. Their approach to data collection is to gather news articles from fact-checking organizations (Politifact and GossipCop) together with their truth labels assigned by independent journalists. From those labelled news articles, the headline is extracted and separated into a set of keywords. Those keywords are then concatenated into a query for the Twitter API. For each news article, labelled real or fake, different kinds of information are then accessed:
2 www.politifact.com.
3 www.snopes.com.
– news content: the body of the article, images, publish date
– tweets: the list of tweets containing the article headline keywords
– retweets: the list of retweets of all tweets previously retrieved
– user information: the profile information (user id, creation date, 200 most recent published tweets, list of followers and friends) of all users that have posted a tweet or retweet related to the news article.
Not only does this data set provide us with the necessary information to create the propagation graphs detailed in the following section, but it also uses a non-URL-restricted tweet retrieval methodology. Indeed, instead of collecting tweets that explicitly contain the URL link to the news piece, it gathers all tweets that contain the keywords associated with the article's headline.
The downloaded data set contains 347 fake news graphs and 310 real ones, for a total of 518,684 tweets and 686,245 retweets.
Due to retrieval rate limitations imposed by Twitter, some parts of the data set require a very long time to collect and were hence not included in this research. This includes both followers and followings information. Additionally, this limitation also led us to restrict the data set to the Politifact website only.
3.2 Propagation Graphs Creation
The propagation graphs, derived from the set of tweets and retweets corresponding to a labelled piece of information, are defined as follows:
Let V be the set of nodes of the graph. A node can be of two types:
1. A tweet node: the node stores the tweet and its associated user. A tweet belongs to a news graph if it contains the keywords extracted from the headline of the news article.
2. A retweet node: the node stores the retweet and its associated user. All retweets of a tweet node are present in the graph.
Let E be the set of edges of the graph. Edges are drawn between a tweet and its retweets. Each edge carries a time weight that corresponds to the time difference between the tweet and retweet publish times.
Then G = (V, E) is the news graph. G is thus a composition of disconnected sub-graphs, where each sub-graph comprises a tweet and its associated retweets. It is important to note that Twitter is designed in such a way that a retweet of a retweet points back to the original tweet. Hence, the depth of the graph is never more than 1.
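As a concrete illustration, the depth-1 star structure described above can be sketched in a few lines of Python. The input schema (dicts keyed by tweet/retweet id, with a publish time in seconds and a source_tweet field on retweets) is our own assumption for this sketch, not the paper's actual data format.

```python
def build_news_graph(tweets, retweets):
    """Build the news graph G = (V, E) as an adjacency map.

    Each edge carries the tweet-to-retweet publish-time difference as a
    weight, so the result is a forest of depth-1 stars: every retweet
    points back to its original tweet, never to another retweet.
    """
    graph = {tid: [] for tid in tweets}        # one star per tweet node
    for rid, rt in retweets.items():
        src = rt["source_tweet"]               # retweets point to the original tweet
        if src in graph:
            weight = rt["time"] - tweets[src]["time"]
            graph[src].append((rid, weight))   # edge weight = time difference
    return graph

# Toy example: one tweet retweeted twice (times in seconds)
tweets = {"t1": {"time": 0}}
retweets = {
    "r1": {"source_tweet": "t1", "time": 60},
    "r2": {"source_tweet": "t1", "time": 3600},
}
g = build_news_graph(tweets, retweets)
```

Because every retweet maps back to the original tweet, no traversal is needed: the adjacency map already is the full propagation structure.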
Fig. 1. Example of a propagation graph
4 Our Approach
Our research consists of two main steps:
1. Manually extract features from the propagation graphs in order to further investigate the possible significant differences between how real and fake information propagate on Twitter.
2. Build two classifiers trained on the propagation graphs: (1) a classifier trained on the manually extracted features; (2) a Geometric Deep Learning approach trained on the propagation graphs themselves.
4.1 Manual Extraction of Propagation Features
Table 1 presents all features extracted from the propagation graphs. Once they are extracted from all graphs, we perform a t-test on the means of the features in the real news and fake news graphs with a 0.05 significance level. Additionally, we perform an outlier analysis for several features in order to gain a better understanding of our data. Lastly, we look more in depth at the propagation of the tweets and retweets over time and analyze the temporal characteristics of their spread.
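The per-feature mean comparison described above can be sketched as follows. In practice one would call a library routine such as scipy.stats.ttest_ind; the minimal two-sample t statistic below (Welch's unequal-variance form, a common choice when group variances differ) only makes the computation explicit, and the sample values are purely illustrative.

```python
import math

def welch_t(a, b):
    """Welch's two-sample t statistic for groups with unequal variances."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)   # sample variances
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    return (ma - mb) / math.sqrt(va / len(a) + vb / len(b))

# Illustrative only: one feature's values for a few real vs. fake graphs
real_lifetimes = [1200.0, 1400.0, 1500.0, 1300.0]
fake_lifetimes = [300.0, 350.0, 280.0, 330.0]
t = welch_t(real_lifetimes, fake_lifetimes)  # large positive t: real mean is higher
```

The p-value is then obtained from the t distribution and compared against the 0.05 significance level.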
Table 1. Features extracted from each news graph.

User/Social Context Features
– Avg number of followers: for each user that has posted a tweet or a retweet in the graph, their number of followers is retrieved. Those counts are then averaged over all users involved in the news graph.
– Avg number of following: for each user that has posted a tweet or a retweet in the graph, their number of followings (friends) is retrieved. Those counts are then averaged over all users involved in the news graph.

Network Features
– Retweet percentage: number of retweets / (number of tweets + number of retweets).
– Average time diff: the average time between a tweet and a corresponding retweet. Since each edge of the graph carries a time weight, it is computed as the average of all edge weights of the graph.
– Number of tweets.
– Number of retweets.
– Time first–last (news lifetime): the time difference between the first and last recorded publish dates of tweets (or retweets) in the graph.
– Average favorite count: for each node, its number of favourites is retrieved. Those counts are then averaged over all nodes in the graph.
– AvgRetCount: for each tweet, its number of retweets is retrieved. Those counts are then averaged over all tweets in the graph.
– UsersTouched10h: starting from the first post recorded in the graph, all posts published in the first 10 h of the diffusion are retrieved. From those posts, the number of unique users involved in the spread is calculated.
– PercPosts1hour: (number of tweets and retweets in the first hour) / (total number of tweets and retweets in the graph).
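To make the feature definitions concrete, here is a hypothetical sketch computing three of the Table 1 features directly from post timestamps. The flat-list input schema (publish times in seconds) is an assumption made for illustration; the paper extracts these values from the full graphs.

```python
def extract_features(tweet_times, retweet_times):
    """Compute three Table 1 features from post publish times (seconds)."""
    all_times = sorted(tweet_times + retweet_times)
    n_t, n_r = len(tweet_times), len(retweet_times)
    first = all_times[0]
    return {
        # retweet percentage: retweets / (tweets + retweets)
        "retweetPerc": n_r / (n_t + n_r),
        # news lifetime: time between first and last recorded post
        "newsLifetime": all_times[-1] - first,
        # percPosts1hour: share of posts in the first hour of the spread
        "percPosts1hour": sum(t - first <= 3600 for t in all_times) / len(all_times),
    }

# Tweets at 0 s, 600 s and 7200 s; retweets at 60 s and 120 s
feats = extract_features([0, 600, 7200], [60, 120])
```

The remaining features (follower averages, favorite counts, etc.) are simple means over node attributes and follow the same pattern.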
4.2 Classification Approaches
Approach to the Classification on Manually Extracted Features. Our approach to the creation and analysis of a classifier trained on features manually extracted from the graphs can be separated into two steps: (1) compare and select the best type of classifier for the problem; (2) analyze the importance of the different features in the classification.
Compare and Select the Best Type of Classifier
Different classifiers were trained using 10-fold cross validation. Namely, the algorithms tried are: Random Forest, Decision Tree, Linear Discriminant Analysis, Bayes Neural Network, Logistic Regression, K-Nearest Neighbors, Quadratic Discriminant Analysis and Support Vector Machine. As the data set is slightly unbalanced, it is important to evaluate whether this significantly impacts the classification performance. Hence, the performance of all classifiers is recorded not only on the full data set but also on five different under-sampled balanced versions of the data set. Their results are then compared, and the algorithm yielding the highest accuracy is chosen for further analysis.
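The under-sampled balanced versions mentioned above can be produced as in the following sketch; the helper name undersample and the flat-list representation are our own, illustrative choices, not the paper's code.

```python
import random

def undersample(X, y, seed=0):
    """Balance a binary data set by randomly dropping majority-class rows."""
    rng = random.Random(seed)                 # seeded for reproducibility
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    majority, minority = (pos, neg) if len(pos) > len(neg) else (neg, pos)
    # keep all minority rows plus an equally sized random majority sample
    keep = sorted(minority + rng.sample(majority, len(minority)))
    return [X[i] for i in keep], [y[i] for i in keep]

# 347 fake (label 1) vs 310 real (label 0) graphs, as in the data set
X = [[i] for i in range(657)]
y = [1] * 347 + [0] * 310
Xb, yb = undersample(X, y)   # 310 of each class
```

Running this with five different seeds yields the five balanced versions on which classifier accuracy is compared against the full data set.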
Analyze the Importance of Different Features
To evaluate the importance given to each feature by the classifier, we record its performance over all possible subsets of features. Given that there are 11 features in total, the power set contains 2048 unique subsets (including the empty set). For each set size, we then record which feature (or combination of features) leads to the highest performance score. We do this for the best accuracy, best F1 score and best AUC ROC. This approach not only allows us to understand which set size typically reaches the highest performance, but also which features play key roles in the classification.
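The exhaustive subset search can be sketched with itertools.combinations. The scorer below is a trivial stand-in for the classifier's cross-validated accuracy; any callable taking a feature tuple would slot in.

```python
from itertools import combinations

FEATURES = [
    "followerAvg", "followingAvg", "retweetPerc", "avgTimeDiff", "numTweets",
    "numRetweets", "avgFav", "avgRetCount", "newsLifetime",
    "usersTouched10hours", "percPosts1hour",
]

def best_subset_per_size(score):
    """For each subset size k, keep the highest-scoring feature subset."""
    best, total = {}, 0
    for k in range(len(FEATURES) + 1):          # sizes 0..11, empty set included
        for subset in combinations(FEATURES, k):
            total += 1
            if k not in best or score(subset) > score(best[k]):
                best[k] = subset
    return best, total                          # total == 2**11 == 2048 subsets

# Stand-in scorer: in the paper this would be the Random Forest's CV accuracy
best, n_subsets = best_subset_per_size(len)
```

In practice each call to score retrains the classifier, so the 2048 evaluations are the dominant cost of this analysis.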
Fig. 2. GDL network architecture.
Geometric Deep Learning Approach. While most existing graph neural networks have been developed for the node classification task, the problem tackled here is that of graph classification. However, [1] and [8] have adapted current successes from the node to the graph classification task. They make use of the graph convolutional layer described in [18], as this layer was shown to be applicable to social network and molecule graph classification. In our research, this layer is combined with a specific pooling layer developed in [8], the top-k pooling layer, which reduces the size of the graph at each iteration by choosing the top k best nodes and dropping the remaining ones. The choice of nodes to drop or keep is based on their inner features.
The neural network architecture used in this research is described in more detail in Fig. 2.
The data fed into the network has to be specifically structured for the task. Indeed, not only are the graph connections themselves used for learning, but relevant features can also be encoded in both the nodes and the edges. Hence, nodes are characterized by the following information:
– Number of followers of the user
– Number of friends (followings) of the user
– Number of favorites of the tweet/retweet
– Number of retweets of the tweet (0 if the node is a retweet)
– Node type (either a tweet or a retweet)
Edges are characterized by the time difference between the tweet and its associated retweet. It is to be noted that all features inserted in the nodes are also available to the classifier trained on manually extracted features, in order for the future performance comparison to be applicable. To build our architecture, we used the recently released PyTorch Geometric library, which already implements the different layers we utilize [7]. The network is trained using 10-fold cross validation on a balanced version of the data set (obtained by under-sampling).
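The actual model uses the TopKPooling layer from PyTorch Geometric. The dependency-free sketch below illustrates only the underlying idea (score each node by a projection of its features, keep the ceil(ratio·N) best); the fixed projection vector stands in for the learned one, and the feature values are invented for illustration.

```python
import math

def topk_pool(node_features, weights, ratio=0.5):
    """Sketch of top-k pooling: keep the ceil(ratio * N) highest-scoring
    nodes, where a node's score is a projection of its feature vector.

    This mirrors the idea of the TopKPooling layer used in the paper's
    model, not its exact implementation (no gating, no gradient flow).
    """
    scores = [sum(w * f for w, f in zip(weights, feats)) for feats in node_features]
    k = max(1, math.ceil(ratio * len(node_features)))
    # indices of the k best nodes; the rest are dropped from the graph
    keep = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(keep)

# Nodes encode [followers, friends, favorites, retweet_count, node_type]
nodes = [
    [100, 50, 3, 2, 0],    # tweet
    [10, 200, 0, 0, 1],    # retweet
    [5000, 20, 40, 9, 0],  # tweet by a popular account
    [3, 8, 0, 0, 1],       # retweet
]
w = [1.0, 0.0, 1.0, 1.0, 0.0]  # illustrative fixed projection; learned in practice
kept = topk_pool(nodes, w)     # keeps the 2 highest-scoring nodes
```

In the full model this pooling step alternates with graph convolutions, so the graph shrinks at each iteration until a fixed-size readout can be classified.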
5 Experimental Results
5.1 Manually Extracted Propagation Features Analysis
After extracting the propagation features detailed in Table 1, their distributions for both real and fake news are analyzed. Table 2 presents the means and standard deviations of all features, as well as the results of the Student t-tests performed. Using a 0.05 significance level, the outcome of the analysis shows that 8 out of the 11 features are significantly different. Furthermore, the boxplot distributions of all 8 significant features are displayed in Appendix A.
By combining the t-test results with the significant features' distributions presented in Appendix A, different conclusions can be drawn about the data set and the differences in propagation between real and fake information on Twitter.
Real News Are 'Bigger' Than Fake News. Real news have an average of 1212 tweets and 1796 retweets, while fake news have on average 411 tweets and 372 retweets. From the statistical analysis displayed in Table 2, it is observed that the means of both features are significantly different. By further analyzing
Table 2 . Features Summary.
mean real mean fake std real std fake tpValue signif
followerAvg 34607.0280 8835.2657 73084.3660 14107.1257 6.1079 0.0000 Y
followingAvg 3386.4674 4535.2654 3998.6284 3201.5401 4.0336 0.0001 Y
retweetPerc 0.4132 0.3730 0.2262 0.2214 2.2969 0.0219 Y
avgTi meDi(in
seconds)
372966.1338 320420.5629 1956157.2011 1451788.9476 0.3872 0.6988 N
numTwe ets 1212.3710 411.6686 2824.1935 1600.1888 4.4005 0.0000 Y
numRe tweets 1796.6161 372.6052 4927.2753 1969.8602 4.7600 0.0000 Y
avgFav .1861 1.3384 5.9917 4.5917 2.0175 0.0441 Y
avgRe tCount 3.1288 3.1925 8.6245 15.7255 0.0653 0.9480 N
news lifetime (in
seconds)
115662880.1871 27737159.8963 97964001.7070 45342932.5034 14.4778 0.0000 Y
usersTouched10hours 71.6710 57.7666 192.7123 150.3321 1.0225 0.3070 N
percPosts1hour 0.1528 0.0720 0.2521 0.1269 5.0940 0.0000 Y
the 4 highest outliers in the number of tweets (2 real and 2 fake), a limitation of the data collection protocol used in this research was discovered. Indeed, they all have an extremely large number of tweets because the list of keywords used to extract the relevant Twitter information is very broad and leads to the retrieval of many posts that do not correspond to the original news. For example, a query that led to the retrieval of 24,338 tweets is 'One in Four – Congressman Joe Pitts'. Initially referring to an article written by Congressman Joe Pitts on addiction rates in Pennsylvania, the broad query led to the retrieval of many unrelated tweets such as "In Chinese universities, students sleep four to a dorm room. I would not have survived it. One was difficult enough...".
Real News Stay Longer 'in the Loop'. The news lifetime was shown to be significantly different for real and fake graphs (see Table 2). Real news stay on average 4.16 times longer on Twitter than fake ones (1338 vs 321 days). Looking at the boxplots in Appendix A, it is interesting to note that while the lifetime of fake news presents a certain number of outliers, the real news lifetime is more spread out but doesn't show any outliers. A deeper look at the fake news outliers shows once again that very broad queries lead to the retrieval of many more tweets than intended. For example, the query 'Sid Miller', initially referring to a fake image of the politician spread on Twitter in 2016, encompassed a tweet dating from 2011 that used the same keywords, thereby yielding an abnormally large lifetime for the fake news. We also note the possibility of recurrent fake news that lead to an abnormally long lifetime. This is the case for a fake news item that emerged both in 2012 and 2017 involving Barack Obama's face being printed on one-dollar bills.
Two hypotheses can then be formulated to try to explain why real news show a longer lifetime on Twitter. First, real news could present queries that are more likely to be used at different points in time, hence augmenting their probability of showing a larger news lifetime average. In comparison, fake news would show a more novel and rare set of keywords that are less likely to be re-used in other news items. Second, the lifetime of fake news could be shorter due to the fact that once they are proven to be misleading, their spread is more likely to be halted.
Users Spreading Real News Tend to Have More Followers but to Follow Fewer Accounts. On average, users involved in the propagation of real information have 34,607 followers while fake news propagators only have 8,835. The statistical results in Table 2 confirm that those means are significantly different. A quick look at the real news outliers in follower counts shows that they tend to be shared by trustworthy accounts such as the NY Times (43,254,008 followers) or the Huffington Post (11,477,200 followers). On the contrary, real news propagators follow on average fewer accounts than fake news propagators do. While accounts linked to spreading real news follow on average 3,386.47 other accounts, fake news propagators follow on average 4,535.27 accounts. Once again, this difference has been shown to be statistically significant.
Fig. 3. Average percentage of posts over time: (a) 0–80,000 hours; (b) 0–300 hours.
Temporal Spread Analysis. Figure 3 presents the average percentage of tweets and retweets posted over time for fake and real news. Firstly, we see in the first graph of Fig. 3 that fake news reach 100% of their posts earlier than real news (30,000 vs 70,000 h). This corresponds to our previous finding that the lifetime of real news is larger than that of fake news.
Secondly, the shapes of the two curves are very different. The fake news curve shows a strong increase at its beginning before rising more moderately and remaining relatively stable from about 15,000 h on. The real news curve also shows a steep increase at the beginning but quickly evolves into a more moderate increase over time, only reaching 100% at about 70,000 h. In order to better visualize and compare the early increases of the two curves, the second graph of Fig. 3 presents the same curves over a shorter time span. We observe that although real news already reach 30% of their posts in the first hour of the spread, fake news quickly overtake them and reach 70% of their posts after 300 h. By then, the real news have only reached 40% of their posts.
This analysis allows us to visually represent our previous finding that fake news have a shorter lifetime than real news. Indeed, we see that while real news show a slower increase over time and thereby a larger lifetime, fake news reach the end of their spread faster and hence have a shorter lifetime. It is also important to note that our previous findings about news size are likely to have impacted the results of this temporal analysis. Indeed, as real news are significantly bigger in size, they are likely to take a longer time to spread.
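The curves of Fig. 3 are cumulative percentages of posts over time and can be recomputed from raw timestamps as in this sketch; the flat-list input schema and the sample times are assumptions for illustration.

```python
def cumulative_percentage(post_times, checkpoints):
    """Percentage of all posts published up to each checkpoint, where
    checkpoints are given in hours since the first recorded post."""
    start = min(post_times)
    hours = sorted((t - start) / 3600 for t in post_times)  # seconds -> hours
    total = len(hours)
    return [100 * sum(h <= c for h in hours) / total for c in checkpoints]

# Posts at 0 min, 30 min, 45 min, 10 h and 200 h (times in seconds)
times = [0, 1800, 2700, 36000, 720000]
curve = cumulative_percentage(times, [1, 24, 300])  # [60.0, 80.0, 100.0]
```

Averaging such per-graph curves separately over the real and fake graphs yields the two curves compared in Fig. 3.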
5.2 Classification Results
Classifier on Manually Extracted Features
Compare and Select the Best Type of Classifier. Appendix B presents the scores of all classifiers attempted. Firstly, we observe only a small difference in the accuracy of the classifiers when applied to the full data set or to the balanced versions. Looking at the Random Forest Classifier, its accuracy on the balanced data sets oscillates between 83.5% and 86.5%, and it obtains an accuracy of 85% on the full data set. We therefore conclude that the slightly unbalanced character of the data set does not have a concrete influence on the classification performance. The Random Forest Classifier ranked highest in all scores and was hence selected as the classification algorithm for the rest of the analysis.
Fig. 4. Set of features reaching the highest accuracy per subset size
Analyze The Importance Of Dierent Features Firstly, we observe in Fig. 4that
the random forest classifier reaches its highest accuracy on the full data set
when using a set of 8 features. While it reaches 85% accuracy using the 11
features, the performance goes up to 87% when using the following set of fea-
tures: followingAvg,followerAvg,avgFav,usersTouched10hours,news lifetime,
numTweets,retweetPercentage,numRetweets.
Secondly, Table 3 presents a summary of the number of occurrences of each feature in all the best subsets presented in Fig. 4. We observe that the news lifetime is present in all of them, followed by the average number of followers, present in 10 out of 11 subsets. We also see in Fig. 4 that these two features combined already accurately classify 81.44% of all graphs. This leads us to conclude that they are both of major importance in the classification. Additionally, both the following average and the average number of favourites seem to be important, as they are present in 9 and 8 of the best subsets respectively.
Thirdly, we note without surprise that the 3 features that were proven to be non-significant (avgTimeDiff, usersTouched10hours and avgRetCount) don't contribute much to the classification performance.
Lastly, we observe that the number of tweets and retweets are only present in 4 of the best subsets. Although both features were shown to be significant, their medians were very similar between the real and fake sets of graphs, which might explain why the random forest classifier did not give them a strong importance.
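For concreteness, the propagation features named above can be sketched in pure Python from a flat list of posts in one news graph. The record fields (`time`, `is_retweet`, `followers`, ...) are illustrative, not the paper's actual schema, and `users_touched_10_hours` is simplified here to count posts rather than distinct users:

```python
from datetime import datetime, timedelta

# Hypothetical posts belonging to a single news graph.
tweets = [
    {"time": datetime(2019, 1, 1, 9, 0),  "is_retweet": False, "followers": 1200, "followings": 300, "favourites": 15},
    {"time": datetime(2019, 1, 1, 9, 40), "is_retweet": True,  "followers": 90,   "followings": 500, "favourites": 2},
    {"time": datetime(2019, 1, 2, 11, 5), "is_retweet": True,  "followers": 40,   "followings": 800, "favourites": 0},
]

num_tweets = len(tweets)
num_retweets = sum(t["is_retweet"] for t in tweets)
retweet_percentage = num_retweets / num_tweets
follower_avg = sum(t["followers"] for t in tweets) / num_tweets
following_avg = sum(t["followings"] for t in tweets) / num_tweets
avg_fav = sum(t["favourites"] for t in tweets) / num_tweets

times = [t["time"] for t in tweets]
news_lifetime = max(times) - min(times)  # time between first and last post
cutoff = min(times) + timedelta(hours=10)
users_touched_10_hours = sum(1 for t in tweets if t["time"] <= cutoff)

print(num_tweets, num_retweets, news_lifetime, users_touched_10_hours)
```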
5.3 Geometric Deep Learning
Before training the algorithm, the pre-processing step of normalizing the features is performed. Then, the neural network is trained using a 10-fold cross validation method. A mini-batch size of 1 and a learning rate of 0.001 were found to yield the best results. When trained for 400 epochs, the neural network
Table 3. Number of occurrences of each feature in all best subsets

Feature               Number of Occurrences
newsLifetime          11
followerAvg           10
followingAvg           9
avgFav                 8
retweetPerc            5
numTweets              4
numRetweets            4
avgRetCount            4
usersTouched10hours    4
percPosts1hour         3
avgTimeDiff            1
achieved the results displayed in Fig. 5. On average over the 10 folds, the accuracy recorded on the last epoch is 73.29%, with a standard deviation of 0.0746, which demonstrates the robustness of the model.
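The two pre-processing steps mentioned above, feature normalization and the 10-fold split, can be sketched in a few lines. Min-max scaling is an assumption here, since the paper does not specify which normalization was used:

```python
import random

def min_max_normalize(column):
    # Scale one feature column to [0, 1]; constant columns map to 0.0.
    lo, hi = min(column), max(column)
    return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in column]

def k_fold_indices(n_samples, k=10, seed=42):
    # Shuffle sample indices, then slice them into k near-equal folds;
    # each fold serves once as the validation set during training.
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

folds = k_fold_indices(100, k=10)
scaled = min_max_normalize([3.0, 7.0, 11.0])
print(scaled)  # [0.0, 0.5, 1.0]
```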
Our Geometric Deep Learning approach has only been tried on one neural network architecture, yet it already yields the satisfactory results presented above, which leads us to conclude that a GDL-based detection of fake news is a promising approach. However, a systematic comparison of GDL models is needed in order to optimize the model for this specific task, instead of reusing a model proven successful in other classification tasks.
           Mean    Standard dev.
Accuracy   0.7329  0.0746
Precision  0.6846  0.1102
Recall     0.8755  0.1081
F1 score   0.7606  0.0821
Fig. 5. Geometric Deep Learning approach scores over 400 epochs
6 Discussion
The experiments performed in this paper led us to gain insights on how fake and
real news propagate on Twitter. It is then interesting to compare our findings
with those achieved by previous research. Firstly, [28] has found that fake news
propagate wider, faster and deeper than real news. More specifically, they discovered that real news take about 6 times as long as fake news to reach 1500 users, consistently reach fewer users in total and are less retweeted. Our conclusions somewhat contradict their findings, since we have observed that real news present more tweets and retweets. However, both the average retweet count and the number of users touched in the first 10 h are not significant in our results, which prevents us from fully arguing against their finding. It is also important to note that while our results were obtained on entire news graphs composed of non-connected sub-graphs, their conclusions are drawn from individual retweet cascades. This methodological contrast might contribute to the evident disagreement between our results. Secondly, both [2] and [28] support our finding that real news are spread by users with more followers than those spreading fake information. However, our result about the number of followings is opposite to theirs. While both their analyses show that real news propagators follow more people, our research shows that fake news propagators actually have more followings. Lastly, to the best of our knowledge, no previous work seems to make use of 'lifetime' as a classification feature, thereby preventing us from making any comparison.
The last section of the experiments entailed the application and evaluation of a Geometric Deep Learning approach to the problem, which achieved an accuracy of 73.3%. The only other application of Geometric Deep Learning to fake news detection achieved an AUC ROC of 92.7% on URL-wise classification, but their network had the advantage of containing social connections and influence paths [5].
Before summarizing the final conclusions of our research paper, it is neces-
sary to underline its major limitations. First of all, although using a non URL-
restricted news definition distinguishes our research from most of the existing
literature on fake news classification, it brings up the issue of using a definition
that is very broad. As explained in Sect. 5, using the keywords from the articles
headlines leads in some cases to the retrieval of many tweets that are unrelated
to the original news piece. This also causes some graphs to cover periods of time
that seem unrealistic. This limitation is hard to circumvent when dealing with
fake news detection research. On the one hand, our choice of data is restricted by the very limited availability of labelled Twitter news data sets. On the other
hand, none of these data sets agree on a precise methodology to retrieve tweets
that correspond to a news piece. Although the majority has been following the
URL-restricted approach defined earlier, this methodology also has major limita-
tions. Second of all, all news analysed come from a single source of information,
Politifact, that mainly includes American political news. This hence prevents us
from generalizing our findings to other news topics.
7 Conclusion
This paper demonstrated the potential of using propagation features to discrim-
inate real from fake news on Twitter by analyzing a non URL-restricted data
set. More specifically, it firstly discovers the following significant differences in the propagation of real and fake news: real news graphs are bigger in size, are spread by users with more followers and fewer followings, and stay longer on Twitter than fake news. Secondly, it achieves an 87% detection accuracy using a Random Forest Classifier solely trained on propagation features, hence further confirming the discriminative potential of these features. Lastly, by developing a graph neural network trained directly on the 3D representation of the propagation graphs, it achieves an accuracy of 73.3%. Overall, the significant differences discovered, as well as the good performances achieved by the 2 algorithms trained on propagation information, lead us to conclude that propagation features are a relevant and important asset to the fake news detection task on Twitter.
Further research should firstly be dedicated to the evaluation of our classification approaches on the early detection of fake news, instead of at the end of their diffusion. Secondly, further efforts should go into refining our data set in order to counter the negative impact of our broad definition of news on the reliability of our results. To do so, a time limit on the retrieval of the tweets could be set, or the analysis could be performed on the tweet cascades (the set of one tweet and its corresponding retweets) instead of on the entire news graph.
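Grouping a flat list of posts into such cascades is straightforward; a minimal sketch, where the `retweet_of` field linking a retweet to its source tweet is illustrative rather than the paper's actual schema:

```python
from collections import defaultdict

# Hypothetical flat list of posts; a retweet points at its source tweet.
posts = [
    {"id": "t1", "retweet_of": None},
    {"id": "t2", "retweet_of": "t1"},
    {"id": "t3", "retweet_of": "t1"},
    {"id": "t4", "retweet_of": None},
]

# A cascade = one original tweet plus all of its retweets.
cascades = defaultdict(list)
for p in posts:
    root = p["retweet_of"] or p["id"]
    cascades[root].append(p["id"])

print(dict(cascades))  # {'t1': ['t1', 't2', 't3'], 't4': ['t4']}
```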
Thirdly, it would be interesting to apply our approach to other news topics than political news, in order to evaluate whether the same conclusions on the propagation patterns can be drawn. Lastly, the GDL experiments were only performed on one type of convolutional and pooling layer, while many more have been shown to be successful in various applications. Further research should hence be dedicated to trying different versions of this neural network and hopefully improving the classification performance by finding the optimal combination of convolutional and pooling layers.
Appendix
Appendix A: Significant Features Distribution
Fig. 6. Number of Tweets Distribution.
Fig. 7. Number of Retweets Distribution.
Fig. 8. News Lifetime (time between first and last post) Distribution.
Fig. 9. Average Number of Followers Distribution.
Fig. 10. Average Number of Followings Distribution.
Fig. 11. Number of Users Touched Within the First 10 h Distribution.
Fig. 12. Percentage of Posts In The First Hour Distribution.
Appendix B: Classifiers Scores Comparison
Fig. 13. Classifier Scores: under-sampled balanced data set 1
Fig. 14. Classifier Scores: under-sampled balanced data set 2
Fig. 15. Classifier Scores: under-sampled balanced data set 3
Fig. 16. Classifier Scores: under-sampled balanced data set 4
Fig. 17. Classifier Scores: under-sampled balanced data set 5
Fig. 18. Classifier Scores: full data set
Appendix C: Feature Importance Analysis
Fig. 19. Best Subsets Analysis Full Data Set
References
1. Cangea, C., Veličković, P., Jovanović, N., Kipf, T., Liò, P.: Towards sparse hierarchical graph classifiers. arXiv preprint arXiv:1811.01287 (2018)
2. Castillo, C., Mendoza, M., Poblete, B.: Information credibility on twitter. In: Pro-
ceedings of the 20th International Conference on World Wide Web, pp. 675–684.
ACM (2011)
3. Center, P.R.: News use across social media platforms 2018 (2018). https://www.journalism.org/2018/09/10/news-use-across-social-media-platforms-2018/. Accessed 03 June 2019
4. Davis, C.A., Varol, O., Ferrara, E., Flammini, A., Menczer, F.: Botornot: a sys-
tem to evaluate social bots. In: Proceedings of the 25th International Conference
Companion on World Wide Web, pp. 273–274. International World Wide Web
Conferences Steering Committee (2016)
5. Federico, M., Fabrizio, F., Davide, E., Damon, M.: Fake news detection on social
media using geometric deep learning. arXiv preprint arXiv:1902.06673 (2019)
6. Ferrara, E.: Disinformation and social bot operations in the run up to the 2017
french presidential election (2017)
7. Fey, M., Lenssen, J.E.: Fast graph representation learning with PyTorch Geometric.
In: ICLR Workshop on Representation Learning on Graphs and Manifolds (2019)
8. Gao, H., Ji, S.: Graph u-net (2019). https://openreview.net/forum?id=HJePRoAct7
9. Gorodnichenko, Y., Pham, T., Talavera, O.: Social media, sentiment and public opinions: evidence from #brexit and #uselection. Technical report, National Bureau of Economic Research (2018)
10. Guardian, T.: Bolsonaro business backers accused of illegal WhatsApp fake news campaign (2018). https://www.theguardian.com/world/2018/oct/18/brazil-jair-bolsonaro-whatsapp-fake-news-campaign. Accessed 03 Aug 2019
11. Iqbal, M.: Twitter revenue and usage statistics (2018). http://www.businessofapps.com/data/twitter-statistics/. Accessed 03 June 2019
12. Kalsnes, B.: Fake news, May 2019. https://oxfordre.com/communication/view/10.1093/acrefore/9780190228613.001.0001/acrefore-9780190228613-e-809
13. Kwon, S., Cha, M., Jung, K., Chen, W., Wang, Y.: Prominent features of rumor
propagation in online social media. In: 2013 IEEE 13th International Conference
on Data Mining, pp. 1103–1108. IEEE (2013)
14. Lazer, D.M., et al.: The science of fake news. Science 359(6380), 1094–1096 (2018)
15. Leonhardt, D., Thompson, S.A.: Trump's lies (2017). https://www.nytimes.com/interactive/2017/06/23/opinion/trumps-lies.html. Archived from the original on 23 June 2017
16. Marwick, A., Lewis, R.: Media Manipulation and Disinformation Online. Data &
Society Research Institute, New York (2017)
17. Monti, F., Boscaini, D., Masci, J., Rodola, E., Svoboda, J., Bronstein, M.M.: Geo-
metric deep learning on graphs and manifolds using mixture model CNNS. In:
Proceedings of the IEEE Conference on Computer Vision and Pattern Recogni-
tion, pp. 5115–5124 (2017)
18. Morris, C., et al.: Weisfeiler and Leman go neural: higher-order graph neural networks. arXiv preprint arXiv:1810.02244 (2018)
19. Nielsen, R.K.: News media, search engines and social networking sites as varieties
of online gatekeepers. In: Rethinking Journalism Again, pp. 93–108. Routledge
(2016)
20. Review, M.T.: First evidence that social bots play a major role in spreading fake news (2017). https://www.technologyreview.com/s/608561/first-evidence-that-social-bots-play-a-major-role-in-spreading-fake-news/. Accessed 03 June 2019
21. Sadikov, E., Martinez, M.M.M.: Information propagation on twitter. CS322 project
report (2009)
22. Shao, C., Ciampaglia, G.L., Flammini, A., Menczer, F.: Hoaxy: a platform for
tracking online misinformation. In: Proceedings of the 25th International Confer-
ence Companion on World Wide Web, pp. 745–750. International World Wide Web
Conferences Steering Committee (2016)
23. Shao, C., Ciampaglia, G.L., Varol, O., Flammini, A., Menczer, F.: The spread of
fake news by social bots. arXiv preprint arXiv:1707.07592 pp. 96–104 (2017)
24. Shao, C., Ciampaglia, G.L., Varol, O., Yang, K.C., Flammini, A., Menczer, F.: The
spread of low-credibility content by social bots. Nat. Commun. 9(1), 4787 (2018)
25. Shu, K., Mahudeswaran, D., Wang, S., Lee, D., Liu, H.: Fakenewsnet: A data
repository with news content, social context and dynamic information for studying
fake news on social media. arXiv preprint arXiv:1809.01286 (2018)
26. Shu, K., Sliva, A., Wang, S., Tang, J., Liu, H.: Fake news detection on social media: a data mining perspective. ACM SIGKDD Explorations Newsletter 19(1), 22–36 (2017)
27. Tambuscio, M., Ruffo, G., Flammini, A., Menczer, F.: Fact-checking effect on viral hoaxes: a model of misinformation spread in social networks. In: Proceedings of the 24th International Conference on World Wide Web, pp. 977–982. ACM (2015)
28. Vosoughi, S., Roy, D., Aral, S.: The spread of true and false news online. Science
359(6380), 1146–1151 (2018)
29. Wu, Z., Pan, S., Chen, F., Long, G., Zhang, C., Yu, P.S.: A comprehensive survey
on graph neural networks. arXiv preprint arXiv:1901.00596 (2019)
30. Xiong, F., Liu, Y.: Opinion formation on social media: an empirical approach. Chaos: An Interdisciplinary Journal of Nonlinear Science 24(1), 013130 (2014)
31. Xiong, F., Liu, Y., Zhang, Z.J., Zhu, J., Zhang, Y.: An information diffusion model based on retweeting mechanism for online social media. Phys. Lett. A 376(30–31), 2103–2108 (2012)