The spreading of misinformation online
Michela Del Vicario (a), Alessandro Bessi (b), Fabiana Zollo (a), Fabio Petroni (c), Antonio Scala (a,d), Guido Caldarelli (a,d), H. Eugene Stanley (e), and Walter Quattrociocchi (a,1)

(a) Laboratory of Computational Social Science, Networks Department, IMT Alti Studi Lucca, 55100 Lucca, Italy; (b) IUSS Institute for Advanced Study, 27100 Pavia, Italy; (c) Sapienza University, 00185 Rome, Italy; (d) ISC-CNR Uos Sapienza, 00185 Rome, Italy; and (e) Boston University, Boston, MA 02115
Edited by Matjaz Perc, University of Maribor, Maribor, Slovenia, and accepted by the Editorial Board December 4, 2015 (received for review September 1, 2015)
The wide availability of user-provided content in online social media facilitates the aggregation of people around common interests, worldviews, and narratives. However, the World Wide Web (WWW) also allows for the rapid dissemination of unsubstantiated rumors and conspiracy theories that often elicit rapid, large, but naive social responses such as the recent case of Jade Helm 15––where a simple military exercise turned out to be perceived as the beginning of a new civil war in the United States. In this work, we address the determinants governing misinformation spreading through a thorough quantitative analysis. In particular, we focus on how Facebook users consume information related to two distinct narratives: scientific and conspiracy news. We find that, although consumers of scientific and conspiracy stories present similar consumption patterns with respect to content, cascade dynamics differ. Selective exposure to content is the primary driver of content diffusion and generates the formation of homogeneous clusters, i.e., echo chambers. Indeed, homogeneity appears to be the primary driver for the diffusion of contents and each echo chamber has its own cascade dynamics. Finally, we introduce a data-driven percolation model mimicking rumor spreading and we show that homogeneity and polarization are the main determinants for predicting cascades' size.
misinformation | virality | Facebook | rumor spreading | cascades
The massive diffusion of sociotechnical systems and microblogging platforms on the World Wide Web (WWW) creates a direct path from producers to consumers of content, i.e., allows disintermediation, and changes the way users become informed, debate, and form their opinions (1–5). This disintermediated environment can foster confusion about causation, and thus encourage speculation, rumors, and mistrust (6). In 2011 a blogger claimed that global warming was a fraud designed to diminish liberty and weaken democracy (7). Misinformation about the Ebola epidemic has caused confusion among healthcare workers (8). Jade Helm 15, a simple military exercise, was perceived on the Internet as the beginning of a new civil war in the United States (9).
Recent works (10–12) have shown that increasing the exposure of users to unsubstantiated rumors increases their tendency to be credulous.

According to ref. 13, belief formation and revision are influenced by the way communities attempt to make sense of events or facts. Such a phenomenon is particularly evident on the WWW, where users, embedded in homogeneous clusters (14–16), process information through a shared system of meaning (10, 11, 17, 18) and trigger collective framing of narratives that are often biased toward self-confirmation.
In this work, through a thorough quantitative analysis of a massive dataset, we study the determinants behind misinformation diffusion. In particular, we analyze the cascade dynamics of Facebook users when the content is related to two very distinct narratives: conspiracy theories and scientific information. On the one hand, conspiracy theories simplify causation, reduce the complexity of reality, and are formulated in a way that is able to tolerate a certain level of uncertainty (19–21). On the other hand, scientific information disseminates scientific advances and exhibits the process of scientific thinking. Notice that we do not focus on the quality of the information but rather on the possibility of verification. Indeed, the main difference between the two is content verifiability. The generators of scientific information and their data, methods, and outcomes are readily identifiable and available. The origins of conspiracy theories are often unknown and their content is strongly disengaged from mainstream society and sharply divergent from recommended practices (22), e.g., the belief that vaccines cause autism.
Massive digital misinformation is becoming pervasive in online social media to the extent that it has been listed by the World Economic Forum (WEF) as one of the main threats to our society (23). To counteract this trend, algorithmic-driven solutions have been proposed (24–29); e.g., Google (30) is developing a trustworthiness score to rank the results of queries. Similarly, Facebook has proposed a community-driven approach where users can flag false content to correct the newsfeed algorithm. This issue is controversial, however, because it raises fears that the free circulation of content may be threatened and that the proposed algorithms may not be accurate or effective (10, 11, 31). Often conspiracists will denounce attempts to debunk false information as acts of misinformation.
Whether a claim (either substantiated or not) is accepted by an individual is strongly influenced by social norms and by the claim's coherence with the individual's belief system––i.e., confirmation bias (32, 33). Many mechanisms animate the flow of false information that generates false beliefs in an individual, which, once adopted, are rarely corrected (34–37).
Significance

The wide availability of user-provided content in online social media facilitates the aggregation of people around common interests, worldviews, and narratives. However, the World Wide Web is a fruitful environment for the massive diffusion of unverified rumors. In this work, using a massive quantitative analysis of Facebook, we show that information related to distinct narratives––conspiracy theories and scientific news––generates homogeneous and polarized communities (i.e., echo chambers) having similar information consumption patterns. Then, we derive a data-driven percolation model of rumor spreading that demonstrates that homogeneity and polarization are the main determinants for predicting cascades' size.

Author contributions: M.D.V., A.B., F.Z., A.S., G.C., H.E.S., and W.Q. designed research; M.D.V., A.B., F.Z., H.E.S., and W.Q. performed research; M.D.V., A.B., F.Z., F.P., and W.Q. contributed new reagents/analytic tools; M.D.V., A.B., F.Z., A.S., G.C., H.E.S., and W.Q. analyzed data; and M.D.V., A.B., F.Z., A.S., G.C., H.E.S., and W.Q. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission. M.P. is a guest editor invited by the Editorial Board.

Freely available online through the PNAS open access option.

(1) To whom correspondence should be addressed. Email: walterquattrociocchi@gmail.com.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1517441113/-/DCSupplemental.

www.pnas.org/cgi/doi/10.1073/pnas.1517441113 PNAS Early Edition | 1 of 6

In this work we provide important insights toward the understanding of cascade dynamics in online social media and in particular about misinformation spreading.

We show that content-selective exposure is the primary driver of content diffusion and generates the formation of homogeneous clusters, i.e., echo chambers (10, 11, 38, 39). Indeed, our analysis
reveals that two well-formed and highly segregated communities
exist around conspiracy and scientific topics. We also find that
although consumers of scientific information and conspiracy
theories exhibit similar consumption patterns with respect to content, the cascade patterns of the two differ. Homogeneity appears
to be the preferential driver for the diffusion of content, yet each
echo chamber has its own cascade dynamics. To account for these
features we provide an accurate data-driven percolation model of
rumor spreading showing that homogeneity and polarization are
the main determinants for predicting cascade size.
The paper is structured as follows. First we provide the preliminary definitions and details concerning data collection. We
then provide a comparative analysis and characterize the statistical
signatures of cascades of the different kinds of content. Finally,
we introduce a data-driven model that replicates the analyzed
cascade dynamics.
Methods
Ethics Statement. Approval and informed consent were not needed because
the data collection process has been carried out using the Facebook Graph
application program interface (API) (40), which is publicly available. For the
analysis (according to the specification settings of the API) we only used
publicly available data (thus users with privacy restrictions are not included in
the dataset). The pages from which we download data are public Facebook
entities and can be accessed by anyone. User content contributing to these pages is also public unless the user's privacy settings specify otherwise, and in that case it is not available to us.
Data Collection. Debate about social issues continues to expand across the Web, and unprecedented social phenomena such as the massive recruitment of people around common interests, ideas, and political visions are emerging. Using the approach described in ref. 10, we define the space of our investigation with the support of diverse Facebook groups that are active in the debunking of misinformation.
The resulting dataset is composed of 67 public pages: 32 about conspiracy theories and 35 about science news. A second set, composed of two troll pages, is used as a benchmark to fit our data-driven model.
The first category (conspiracy theories) includes the pages that disseminate
alternative, controversial information, often lacking supporting evidence
and frequently advancing conspiracy theories. The second category (science
news) includes the pages that disseminate scientific information. The third
category (trolls) includes those pages that intentionally disseminate sarcastic
false information on the Web with the aim of mocking the collective
credulity online.
For the three sets of pages we download all of the posts (and their respective user interactions) across a 5-y time span (2010–2014). We perform the data collection process by using the Facebook Graph API (40), which is publicly available and accessible through any personal Facebook user account. The exact breakdown of the data is presented in SI Appendix, section 1.
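The collection loop itself is simple cursor following. A minimal sketch (not the authors' code; the response shape follows the Graph API's documented `data`/`paging.next` convention, and the stand-in "client" below is a plain dictionary rather than a real HTTP call):

```python
# Hypothetical sketch of cursor-based paging in the Graph API style.

def collect_pages(fetch, first_url):
    """Follow `paging.next` links until exhausted; `fetch` maps URL -> JSON dict."""
    posts, url = [], first_url
    while url:
        page = fetch(url)
        posts.extend(page.get("data", []))
        url = page.get("paging", {}).get("next")
    return posts

# Offline stand-in for an HTTP client, mimicking two pages of results
fake_api = {
    "/page/posts": {"data": [{"id": "1"}, {"id": "2"}],
                    "paging": {"next": "/page/posts?after=2"}},
    "/page/posts?after=2": {"data": [{"id": "3"}]},
}
print([p["id"] for p in collect_pages(fake_api.get, "/page/posts")])  # ['1', '2', '3']
```

In a real run, `fetch` would issue an authenticated HTTP GET and parse the JSON response; the paging logic is unchanged.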
Preliminaries and Definitions. A tree is an undirected simple graph that is
connected and has no simple cycles. An oriented tree is a directed acyclic
graph whose underlying undirected graph is a tree. A sharing tree, in the
context of our research, is an oriented tree made up of the successive sharing
of a news item through the Facebook system. The root of the sharing tree is
the node that performs the first share. We define the size of the sharing tree
as the number of nodes (and hence the number of news sharers) in the tree
and the height of the sharing tree as the maximum path length from the root.
We define the user polarization σ = 2ϱ − 1, where 0 ≤ ϱ ≤ 1 is the fraction of "likes" a user puts on conspiracy-related content; hence −1 ≤ σ ≤ 1. From user polarization, we define the edge homogeneity, for any edge e_ij between nodes i and j, as

σ_ij = σ_i σ_j,

with −1 ≤ σ_ij ≤ 1. Edge homogeneity reflects the similarity level between the polarization of the two sharing nodes. A link in the sharing tree is homogeneous if its edge homogeneity is positive. We then define a sharing path to be any path from the root to one of the leaves of the sharing tree. A homogeneous path is a sharing path for which the edge homogeneity of each edge is positive, i.e., a sharing path composed only of homogeneous links.

Fig. 1. PDF of lifetime computed on science news and conspiracy theories, where the lifetime is here computed as the temporal distance (in hours) between the first and last share of a post. Both categories show a similar behavior.

Fig. 2. Lifetime as a function of the cascade size for conspiracy news (Left) and science news (Right). Science news quickly reaches a higher diffusion; a longer lifetime does not correspond to a higher level of interest. Conspiracy rumors are assimilated more slowly and show a positive relation between lifetime and size.
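The polarization and homogeneity definitions above translate directly into code. A minimal sketch (illustrative, not the authors' implementation; the like counts are hypothetical):

```python
# Sketch: user polarization and edge homogeneity from like counts,
# following the definitions in the text.

def polarization(likes_conspiracy, likes_science):
    """sigma = 2*rho - 1, where rho is the fraction of likes on conspiracy content."""
    total = likes_conspiracy + likes_science
    rho = likes_conspiracy / total
    return 2 * rho - 1

def edge_homogeneity(sigma_i, sigma_j):
    """sigma_ij = sigma_i * sigma_j; positive means a homogeneous link."""
    return sigma_i * sigma_j

# A user with 9 conspiracy likes and 1 science like: sigma = 0.8
s_i = polarization(9, 1)
# A user with only science likes: sigma = -1.0
s_j = polarization(0, 5)
print(edge_homogeneity(s_i, s_j))  # negative -> nonhomogeneous link
```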
Results and Discussion
Anatomy of Cascades. We begin our analysis by characterizing the
statistical signature of cascades as they relate to information
type. We analyze the three types (science news, conspiracy rumors, and trolling) and find that size and maximum degree are power-law distributed for all three categories. The maximum cascade size values are 952 for science news, 2,422 for conspiracy news, and 3,945 for trolling, and the estimated exponents γ for the power-law distributions are 2.21 for science news, 2.47 for conspiracy, and 2.44 for trolling posts. Tree height values range from 1 to 5, with a maximum height of 5 for science news and conspiracy theories and a maximum height of 4 for trolling. The resulting network is very dense. Notice that such a feature weakens the role of hubs in rumor-spreading dynamics. For further information see SI Appendix, section 2.1.
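For readers who want to reproduce this kind of estimate, the exponent of a power-law tail can be fitted by maximum likelihood. The sketch below uses a Hill/Clauset-style continuous estimator and checks it on synthetic data; it is an illustration, not necessarily the estimator used in the paper:

```python
import math
import random

def powerlaw_exponent(samples, xmin=1.0):
    """Continuous maximum-likelihood estimate of gamma for P(x) ~ x^(-gamma),
    restricted to the tail x >= xmin (Hill/Clauset-style estimator)."""
    tail = [x for x in samples if x >= xmin]
    return 1.0 + len(tail) / sum(math.log(x / xmin) for x in tail)

# Synthetic check: inverse-transform sampling from P(x) ~ x^(-2.5), x >= 1
random.seed(0)
gamma_true = 2.5
xs = [(1 - random.random()) ** (-1 / (gamma_true - 1)) for _ in range(100_000)]
print(powerlaw_exponent(xs))  # close to 2.5
```

For discrete data such as cascade sizes, a discrete MLE (or the `powerlaw` package) would be the more careful choice.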
Fig. 1 shows the probability density function (PDF) of the cascade lifetime (using hours as time units) for science and conspiracy news. We compute the lifetime as the length of time between the first user and the last user sharing a post. In both categories we find a first peak at 1–2 h and a second at 20 h, indicating that the temporal sharing patterns are similar irrespective of the difference in topic. We also find that a significant percentage of the information diffuses rapidly (24.42% of the science news and 20.76% of the conspiracy rumors diffuse in less than 2 h, and 39.45% of science news and 40.78% of conspiracy theories in less than 5 h). Only 26.82% of the diffusion of science news and 17.79% of conspiracy news lasts more than 1 d.
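The lifetime measure used here reduces to a one-liner over share timestamps. An illustrative sketch (the timestamps are hypothetical):

```python
# Sketch: cascade lifetime as the temporal distance, in hours,
# between the first and last share of a post.
from datetime import datetime

def lifetime_hours(share_times):
    """Time between the earliest and latest share, in hours."""
    return (max(share_times) - min(share_times)).total_seconds() / 3600

shares = [datetime(2014, 5, 1, 9, 0),
          datetime(2014, 5, 1, 9, 40),
          datetime(2014, 5, 1, 13, 30)]
print(lifetime_hours(shares))  # 4.5
```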
In Fig. 2 we show the lifetime as a function of the cascade size.
For science news we have a peak in the lifetime corresponding to
a cascade size value of 200, and higher cascade size values
correspond to high lifetime variability. For conspiracy-related
content the lifetime increases with cascade size.
These results suggest that news assimilation differs according
to the categories. Science news is usually assimilated, i.e., it reaches
a higher level of diffusion quickly, and a longer lifetime does not
correspond to a higher level of interest. Conversely, conspiracy
rumors are assimilated more slowly and show a positive relation
between lifetime and size. For both science and conspiracy news, we
compute the size as a function of the lifetime and confirm that
differentiation in the sharing patterns is content-driven, and that for
conspiracy there is a positive relation between size and lifetime (see
SI Appendix, section 2.1 for further details).
Homogeneous Clusters. We next examine the social determinants that drive sharing patterns, focusing on the role of homogeneity in friendship networks.

Fig. 3 shows the PDF of the mean-edge homogeneity, computed for all cascades of science news and conspiracy theories. It shows that the majority of links between consecutively sharing users is homogeneous. In particular, the average edge homogeneity value of the entire sharing cascade is always greater than or equal to zero, indicating that either the information transmission occurs inside homogeneous clusters in which all links are homogeneous, or it occurs inside mixed neighborhoods in which the balance between homogeneous and nonhomogeneous links favors the former. Moreover, the probability of a mean-edge homogeneity close to zero is quite small: contents tend to circulate only inside the echo chamber.
Hence, to further characterize the role of homogeneity in shaping sharing cascades, we compute the cascade size as a function of mean-edge homogeneity for both science and conspiracy news (Fig. 4). For science news, higher levels of mean-edge homogeneity in the interval (0.5, 0.8) correspond to larger cascades, whereas for conspiracy theories lower levels of mean-edge homogeneity (around 0.25) correspond to larger cascades. Notice that, although viral patterns related to distinct contents differ, homogeneity is clearly the driver of information diffusion. In other words, different contents generate different echo chambers, characterized by a high level of internal homogeneity. The PDF of the edge homogeneity, computed for science and conspiracy news as well as for the two taken together––both in the unconditional case and in the conditional case (according to whether the user that made the first share in the couple has a positive or negative polarization)––confirms the roughly null probability of a negative edge homogeneity (SI Appendix, section 2.1).
We record the complementary cumulative distribution function (CCDF) of the number of all sharing paths* on each tree, compared with the CCDF of the number of homogeneous paths, for science news, conspiracy news, and the two together. A Kolmogorov–Smirnov test and Q–Q plots confirm that for all three pairs of distributions considered there is no significant statistical difference (see SI Appendix, section 2.2 for more details). We thus confirm the pervasiveness of homogeneous paths.
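As an offline illustration of such a comparison (not the paper's pipeline), the two-sample Kolmogorov–Smirnov statistic is simply the maximum distance between two empirical CDFs:

```python
# Sketch: two-sample Kolmogorov-Smirnov statistic for comparing the
# distribution of total sharing paths with that of homogeneous paths.
# The per-tree path counts below are hypothetical.

def ks_statistic(a, b):
    """Max distance between the empirical CDFs of samples a and b."""
    a, b = sorted(a), sorted(b)
    def ecdf(sample, x):
        # fraction of sample values <= x
        return sum(v <= x for v in sample) / len(sample)
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in a + b)

total_paths = [3, 5, 8, 13, 21, 34, 55]
homogeneous_paths = [3, 5, 8, 12, 21, 34, 55]
print(ks_statistic(total_paths, homogeneous_paths))  # 1/7, the two samples barely differ
```

In practice one would use `scipy.stats.ks_2samp`, which also returns the p value for the significance statement quoted above.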
Indeed, cascades' lifetimes for science and conspiracy news exhibit a probability peak in the first 2 h, and in the following hours they rapidly decrease. Despite the similar consumption patterns, cascade lifetime expressed as a function of cascade size differs greatly between the content sets. However, homogeneity remains the main driver of cascade propagation. The distributions of the number of total and homogeneous sharing paths are very similar for both content categories. Viral patterns related to contents belonging to different narratives differ, but homogeneity is the primary driver of content diffusion.
Fig. 3. PDF of edge homogeneity for science (orange) and conspiracy (blue) news. Homogeneous paths are dominant on the whole cascades for both scientific and conspiracy news.

*Recall that a sharing path is here defined as any path from the root to one of the leaves of the sharing tree. A homogeneous path is a sharing path for which the edge homogeneity of each edge is positive.
The Model. Our findings show that users mostly tend to select and share content according to a specific narrative and to ignore the rest. This suggests that the determinant for the formation of echo chambers is confirmation bias. To model this mechanism we now introduce a percolation model of rumor spreading that accounts for homogeneity and polarization. We consider n users connected by a small-world network (41) with rewiring probability r. Every node has an opinion ω_i, i ∈ {1, ..., n}, uniformly distributed in [0, 1], and is exposed to m news items with a content fitness ϑ_j, j ∈ {1, ..., m}, uniformly distributed in [0, 1]. At each step the news items are diffused and initially shared by a group of first sharers. After the first step, the news recursively passes to the neighborhoods of the previous-step sharers, e.g., those of the first sharers during the second step. If a friend of a previous-step sharer has an opinion close to the fitness of the news, then she shares the news again. When

|ω_i − ϑ_j| ≤ δ,

user i shares news j; δ is the sharing threshold.
Because δ by itself cannot capture the homogeneous clusters observed in the data, we model the connectivity pattern as a signed network (4, 42), considering different fractions of homogeneous links and hence restricting the diffusion of news to homogeneous links only. We define ϕHL as the fraction of homogeneous links in the network, M as the total number of links, and n_h as the number of homogeneous links; thus, we have

ϕHL = n_h / M, with 0 ≤ n_h ≤ M.

Notice that 0 ≤ ϕHL ≤ 1 and that 1 − ϕHL, the fraction of nonhomogeneous links, is complementary to ϕHL. In particular, we can reduce the parameter space to ϕHL ∈ [0.5, 1], as we would restrict our attention to either one of the two complementary clusters.
The model can be seen as a branching process in which the sharing threshold δ and the neighborhood dimension z are the key parameters. More formally, let the fitness θ_j of the jth news item and the opinion ω_i of the ith user be uniformly independent identically distributed (i.i.d.) in [0, 1]. Then the probability p that a user i shares a post j is

p = min(1, θ + δ) − max(0, θ − δ) ≈ 2δ,

because θ and ω are uniformly i.i.d. In general, if ω and θ have distributions f(ω) and f(θ), then p will depend on θ:

p_θ = f(θ) ∫ from max(0, θ − δ) to min(1, θ + δ) of f(ω) dω.

If we are on a tree of degree z (or on a sparse lattice of degree z + 1), the average number of sharers (the branching ratio) is

μ = z p ≈ 2δz,

with a critical cascade size S = (1 − μ)^(−1). If we assume that the distribution of the number m of first sharers is f(m), then the average cascade size is

S = Σ_m f(m) m (1 − μ)^(−1) = ⟨m⟩_f / (1 − μ) ≈ ⟨m⟩_f / (1 − 2δz),

where ⟨·⟩_f = Σ_m (·) f(m) denotes the average with respect to f. In the simulations we fixed the neighborhood dimension z = 8, because the branching ratio μ depends on the product of z and δ and, without loss of generality, we can consider the variation of just one of them.
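The approximation p ≈ 2δ and the resulting branching ratio can be checked numerically. A quick Monte Carlo sketch, using z = 8 and a δ value from the range explored later in the text:

```python
import random
random.seed(1)

delta, z = 0.015, 8   # sharing threshold and neighborhood dimension (values from the text)

# Monte Carlo estimate of p = P(|omega - theta| <= delta) for omega, theta
# i.i.d. uniform on [0, 1]; the derivation above gives p ~ 2*delta.
trials = 200_000
hits = sum(abs(random.random() - random.random()) <= delta for _ in range(trials))
p = hits / trials

mu = z * p            # branching ratio, ~ 2*delta*z = 0.24
S = 1 / (1 - mu)      # critical cascade size for a single first sharer
print(p, mu, S)
```

The exact value is p = 2δ − δ², slightly below 2δ because the sharing window is clipped near ω = 0 and ω = 1.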
If we allow a probability q that a neighbor of a user has a different polarization, then the branching ratio becomes μ = z(1 − q)p. If a lattice has a degree distribution d(k) (k = z + 1), we can then assume a usual percolation process that provides a critical branching ratio linear in ⟨k²⟩_d/⟨k⟩_d (μ ∝ (1 − q) p ⟨z²⟩/⟨z⟩).
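The full dynamics can be sketched in a few dozen lines. The following is illustrative only: the parameters are far smaller than the paper's n = 5,000 and m = 1,000, δ is enlarged so that cascades are visible at this scale, and the substrate is a simplified Watts–Strogatz-style rewired ring:

```python
import random
random.seed(42)

n, k, r = 200, 8, 0.01       # nodes, neighbors per node, rewiring probability
delta, phi_HL = 0.1, 0.56    # sharing threshold; fraction of homogeneous links

# Small-world substrate: ring lattice with random rewiring
neighbors = {i: set() for i in range(n)}
for i in range(n):
    for d in range(1, k // 2 + 1):
        j = random.randrange(n) if random.random() < r else (i + d) % n
        if j != i:
            neighbors[i].add(j)
            neighbors[j].add(i)

opinion = [random.random() for _ in range(n)]

# Signed links: each undirected edge is homogeneous with probability phi_HL;
# news items travel only across homogeneous links
homogeneous = {}
for i in neighbors:
    for j in neighbors[i]:
        if (j, i) in homogeneous:
            homogeneous[(i, j)] = homogeneous[(j, i)]
        else:
            homogeneous[(i, j)] = random.random() < phi_HL

def cascade_size(theta, first_sharers):
    """Recursive spread: a neighbor reshares if the link is homogeneous
    and its opinion is within delta of the news fitness theta."""
    shared = set(first_sharers)
    frontier = list(first_sharers)
    while frontier:
        nxt = []
        for i in frontier:
            for j in neighbors[i]:
                if j not in shared and homogeneous[(i, j)] \
                        and abs(opinion[j] - theta) <= delta:
                    shared.add(j)
                    nxt.append(j)
        frontier = nxt
    return len(shared)

sizes = [cascade_size(random.random(), [random.randrange(n)]) for _ in range(500)]
print(min(sizes), max(sizes), sum(sizes) / len(sizes))
```

Recording each step of the spread as a parent–child edge, rather than only the final size, would also yield the sharing trees analyzed in the previous sections.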
Simulation Results. We explore the model parameter space using n = 5,000 nodes and m = 1,000 news items, with the number of first sharers distributed as (i) inverse Gaussian, (ii) lognormal, (iii) Poisson, and (iv) uniform, as well as according to the real-data distribution (from the science and conspiracy news sample). In Table 1 we show a summary of relevant statistics (min value, first quantile, median, mean, third quantile, and max value) comparing the real-data first-sharers distribution with the fitted distributions.

Along with the first-sharers distribution, we vary the sharing threshold δ in the interval [0.01, 0.05] and the fraction of homogeneous links ϕHL in the interval [0.5, 1]. To avoid biases induced by statistical fluctuations in the stochastic process, each point of the parameter space is averaged over 100 iterations. ϕHL ≈ 0.5 provides a good estimate of real-data values. In particular, consistently with the division in two echo chambers (science and conspiracy), the network is divided into two clusters in which news items remain confined and are transmitted solely within each community's echo chamber (see SI Appendix, section 3.2 for the details of the simulation results).
In addition to the science and conspiracy content sharing
trees, we downloaded a set of 1,072 sharing trees of intentionally
false information from troll pages. Frequently troll information,
e.g., parodies of conspiracy theories such as chem-trails containing
the active principle of Viagra, is picked up and shared by habitual
conspiracy theory consumers. We computed the mean and SD of
size and height of all trolling sharing trees, and reproduced the data
using our model.
Fig. 4. Cascade size as a function of edge homogeneity for science (orange) and conspiracy (dashed blue) news.

For details on the parameters of the fitted distributions used, see SI Appendix, section 3.2. Note that the real-data values for the mean (and SD) of size and height on the troll posts are, respectively, 23.54 (122.32) and 1.78 (0.73).

We used fixed parameters from the trolling messages
sample (the number of nodes in the system and the number of news items) and varied the fraction of homogeneous links ϕHL, the rewiring probability r, and the sharing threshold δ. See SI Appendix, section 3.2 for the distribution of first sharers used and for additional simulation results of the fit on trolling messages.
We simulated the model dynamics with the best combination of parameters obtained from the simulations and the number of first sharers distributed as an inverse Gaussian. Fig. 5 shows the CCDF of cascades' size and the cumulative distribution function (CDF) of their height. A summary of relevant statistics (min value, first quantile, median, mean, third quantile, and max value) comparing the real-data size and height distributions with the fitted ones is reported in SI Appendix, section 3.2.

We find that the inverse Gaussian is the distribution that best fits the data both for science and conspiracy news, and for troll messages. For this reason, we performed one more simulation using the inverse Gaussian as the distribution of the number of first sharers, with 1,072 news items, 16,889 users, and the best parameter combination obtained in the simulations.§ The CCDF of size and the CDF of height for the above parameter combination, as well as the basic statistics considered, fit real data well.
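Reproducing such a simulation requires inverse Gaussian draws for the number of first sharers. A standard way to generate them is the Michael–Schucany–Haas transformation; the sketch below uses IG(18.73, 9.63), the best-fit parameters reported in Fig. 5 (in an actual simulation the draws would be rounded to positive integers):

```python
import math
import random
random.seed(7)

def sample_inverse_gaussian(mu, lam):
    """One IG(mu, lam) draw via the Michael-Schucany-Haas transformation."""
    nu = random.gauss(0.0, 1.0) ** 2
    x = mu + mu * mu * nu / (2 * lam) \
        - mu / (2 * lam) * math.sqrt(4 * mu * lam * nu + (mu * nu) ** 2)
    # Accept x, or switch to mu^2/x, with the appropriate probability
    return x if random.random() <= mu / (mu + x) else mu * mu / x

draws = [sample_inverse_gaussian(18.73, 9.63) for _ in range(50_000)]
print(sum(draws) / len(draws))  # sample mean, close to mu = 18.73
```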
Conclusions

Digital misinformation has become so pervasive in online social media that it has been listed by the WEF as one of the main threats to human society. Whether a news item, either substantiated or not, is accepted as true by a user may be strongly affected by social norms or by how much it coheres with the user's system of beliefs (32, 33). Many mechanisms cause false information to gain acceptance, which in turn generates false beliefs that, once adopted by an individual, are highly resistant to correction (34–37). In this work, using extensive quantitative analysis and data-driven modeling, we provide important insights toward understanding the mechanism behind rumor spreading. Our findings show that users mostly tend to select and share content related to a specific narrative and to ignore the rest. In particular, we show that social homogeneity is the primary driver of content diffusion, and one frequent result is the formation of homogeneous, polarized clusters. Most of the time the information is taken from a friend having the same profile (polarization)––i.e., belonging to the same echo chamber.
We also find that although consumers of science news and conspiracy theories show similar consumption patterns with respect to content, their cascades differ.

Our analysis shows that for science and conspiracy news a cascade's lifetime has a probability peak in the first 2 h, followed by a rapid decrease. Although the consumption patterns are similar, cascade lifetime as a function of size differs greatly.

These results suggest that news assimilation differs according to the categories. Science news is usually assimilated quickly, i.e., it rapidly reaches a high level of diffusion, and a longer lifetime does not correspond to a higher level of interest. Conversely, conspiracy rumors are assimilated more slowly and show a positive relation between lifetime and size.
The PDF of the mean-edge homogeneity indicates that homogeneity is present in the linking step of sharing cascades. The distributions of the number of total sharing paths and homogeneous sharing paths are similar in both content categories. Viral patterns related to distinct contents are different, but homogeneity drives content diffusion. To mimic these dynamics, we introduce a simple data-driven percolation model on signed networks, i.e., networks composed of signed edges accounting for nodes' preferences toward specific contents. Our model reproduces the observed dynamics with high accuracy.
Users tend to aggregate in communities of interest, which causes reinforcement and fosters confirmation bias, segregation, and polarization. This comes at the expense of the quality of the information and leads to the proliferation of biased narratives fomented by unsubstantiated rumors, mistrust, and paranoia. In this setting, algorithmic solutions do not seem to be the best option for breaking such a symmetry. The next envisioned steps of our research are to study efficient communication strategies accounting for the social and cognitive determinants behind massive digital misinformation.
Fig. 5. CCDF of size (Left) and CDF of height (Right) for the best parameter combination that fits real-data values, (ϕHL, r, δ) = (0.56, 0.01, 0.015), and first sharers distributed as IG(18.73, 9.63).
Table 1. Summary of relevant statistics comparing synthetic data with the real ones

Values           Data     IG       Lognormal   Poisson
Min              1        0.36     0.10        20
First quantile   5        4.16     3.16        35
Median           10       10.45    6.99        39
Mean             39.34    39.28    13.04       39.24
Third quantile   27       31.59    14.85       43
Max              3,033    1,814    486.10      66

The inverse Gaussian (IG) shows the best fit for the distribution of first sharers with respect to all of the considered statistics.
§The best parameter combination is ϕHL = 0.56, r = 0.01, δ = 0.015. In this case we have a mean size equal to 23.42 (33.43) and a mean height of 1.28 (0.88), which is indeed a good approximation; see SI Appendix, section 3.2.
ACKNOWLEDGMENTS. Special thanks go to Delia Mocanu, "Protesi di Protesi di Complotto," "Che vuol dire reale," "La menzogna diventa verita e passa alla storia," "Simply Humans," "Semplicemente me," Salvatore Previti, Elio Gabalo, Sandro Forgione, Francesco Pertini, and "The rooster on the trash" for their valuable suggestions and discussions. Funding for this work was provided by the EU FET Projects MULTIPLEX (317532) and SIMPOL (610704), the FET Project DOLFINS (640772), SoBigData (654024), and CoeGSS (676547).
6 of 6 | www.pnas.org/cgi/doi/10.1073/pnas.1517441113 | Del Vicario et al.