Scientific Reports | (2020) 10:16598 | www.nature.com/scientificreports
The COVID-19 social media infodemic
Matteo Cinelli1,2, Walter Quattrociocchi1,2,3*, Alessandro Galeazzi4, Carlo Michele Valensise5,
Emanuele Brugnoli1, Ana Lucia Schmidt2, Paola Zola6, Fabiana Zollo1,2,7 & Antonio Scala1,3
We address the diffusion of information about COVID-19 with a massive data analysis on Twitter,
Instagram, YouTube, Reddit and Gab. We analyze engagement and interest in the COVID-19 topic
and provide a differential assessment on the evolution of the discourse on a global scale for each
platform and their users. We fit information spreading with epidemic models characterizing the basic
reproduction number $R_0$ for each social media platform. Moreover, we identify information spreading
from questionable sources, finding different volumes of misinformation in each platform. However,
information from both reliable and questionable sources does not present different spreading patterns.
Finally, we provide platform-dependent numerical estimates of rumors’ amplification.
The World Health Organization (WHO) defined the SARS-CoV-2 virus outbreak as a severe global threat [1]. As
foreseen in 2017 by the global risk report of the World Economic Forum, global risks are interconnected. In
particular, the case of the COVID-19 epidemic (the infectious disease caused by the most recently discovered
human coronavirus) is showing the critical role of information diffusion in a disintermediated news cycle [2].
The term infodemic [3,4] has been coined to outline the perils of misinformation phenomena during the
management of disease outbreaks [5–7], since it could even speed up the epidemic process by influencing and
fragmenting social response [8]. As an example, CNN anticipated a rumor about the possible lock-down
of Lombardy (a region in northern Italy) to prevent pandemics [9], publishing the news hours before the official
communication from the Italian Prime Minister. As a result, people overcrowded trains and airports to escape
from Lombardy toward the southern regions before the lock-down was put in place, disrupting the government
initiative aimed at containing the epidemic and potentially increasing contagion. Thus, an important research
challenge is to determine how people seek or avoid information and how those decisions affect their behavior [10],
particularly when the news cycle, dominated by the disintermediated diffusion of information, alters the way
information is consumed and reported on.
The case of the COVID-19 epidemic shows the critical impact of this new information environment.
Information spreading can strongly influence people’s behavior and alter the effectiveness of the
countermeasures deployed by governments. In this respect, models to forecast virus spreading are starting to
account for the behavioral response of the population with respect to public health interventions and the
communication dynamics behind content consumption [8,11,12].
Social media platforms such as YouTube and Twitter provide direct access to an unprecedented amount of
content and may amplify rumors and questionable information. Taking into account users’ preferences and
attitudes, algorithms mediate and facilitate content promotion and thus information spreading [13]. This shift
from the traditional news paradigm profoundly impacts the construction of social perceptions [14] and the
framing of narratives; it influences policy-making, political communication, as well as the evolution of public
debate [15,16], especially when issues are controversial [17]. Users online tend to acquire information adhering
to their worldviews [18,19], to ignore dissenting information [20,21] and to form polarized groups around shared
narratives [22,23]. Furthermore, when polarization is high, misinformation might easily proliferate [24,25]. Some
studies pointed out that fake news and inaccurate information may spread faster and wider than fact-based
news [26]. However, this might be a platform-specific effect. The definition of “Fake News” may indeed be
inadequate since political debate often resorts to labelling opposite news as unreliable or fake [27]. The effect
of the social media environment on the perception of polarizing topics is also being studied in the case of
COVID-19. The issues related to the current infodemic are indeed being tackled by the scientific literature from
multiple perspectives including the dynamics of hate speech and conspiracy theories [28,29], the effect of bots
and automated accounts [30], and the threats of misinformation in terms of diffusion and opinion
formation [31,32].
Content courtesy of Springer Nature, terms of use apply. Rights reserved
In this work we provide an in-depth analysis of the social dynamics in a time window where narratives and
moods in social media related to COVID-19 have emerged and spread. While most of the studies on
misinformation diffusion focus on a single platform [17,26,33], the dynamics behind information consumption
might be particular to the environment in which they spread. Consequently, in this paper we perform a
comparative analysis on five social media platforms (Twitter, Instagram, YouTube, Reddit and Gab) during the
COVID-19 outbreak. The dataset includes more than 8 million comments and posts over a time span of 45 days.
We analyze user engagement and interest about the COVID-19 topic, providing an assessment of the discourse
evolution over time on a global scale for each platform. Furthermore, we model the spread of information with
epidemic models, characterizing for each platform its basic reproduction number ($R_0$), i.e. the average number
of secondary cases (users that start posting about COVID-19) an “infectious” individual (an individual already
posting on COVID-19) will create. In epidemiology, $R_0 = 1$ is a threshold parameter. When $R_0 < 1$ the disease
will die out in a finite period of time, while the disease will spread for $R_0 > 1$. In social media, $R_0 > 1$
will indicate the possibility of an infodemic.
Finally, coherently with the classification provided by the fact-checking organization Media Bias/Fact Check [34],
which classifies news sources based on the truthfulness and bias of the information published, we split news
outlets into two groups. These groups are either associated with the diffusion of (mostly) reliable or (mostly)
questionable contents and we characterize the spreading of information regarding COVID-19 relying on this
classification. We find that users in mainstream platforms are less susceptible to the diffusion of information
from questionable sources and that information deriving from news outlets marked either as reliable or
questionable does not present significant differences in the way it spreads.
Our findings suggest that the interaction patterns of each social media combined with the peculiarity of the
audience of each platform play a pivotal role in information and misinformation spreading. We conclude the
paper by measuring rumors’ amplification parameters for COVID-19 on each social media platform.
Results
We analyze mainstream platforms such as Twitter, Instagram and YouTube as well as less regulated social media
platforms such as Gab and Reddit. Gab is a crowdfunded social media platform whose structure and features are
Twitter-inspired. It exerts very little control over posted content; in the political spectrum, its user base is
considered to be far-right. Reddit is an American social news aggregation, web content rating, and discussion
website based on collective filtering of information.
We perform a comparative analysis of information spreading dynamics around the same topic in different
environments having different interaction settings and audiences. We collect all pieces of content related to
COVID-19 from the 1st of January to the 14th of February. Data have been collected by filtering contents
according to a selected sample of Google Trends’ COVID-19 related queries such as: coronavirus,
coronavirusoutbreak, imnotavirus, ncov, ncov-19, pandemic, wuhan. The resulting dataset is composed of
1,342,103 posts and 7,465,721 comments produced by 3,734,815 users. For more details regarding the data
collection refer to Methods.
Interaction patterns. First, we analyze the interactions (i.e., the engagement) that users have with
COVID-19 topics on each platform. The upper panel of Fig. 1 shows users’ engagement around the COVID-19 topic.
Despite the differences among platforms, we observe that they all display a rather similar distribution of
users’ activity, characterized by a long tail. This entails that users behave similarly as far as the dynamics
of reactions and content consumption are concerned. Indeed, users’ interactions with the COVID-19 content
present attention patterns similar to any other topic [35]. The highest volume of interactions in terms of
posting and commenting can be observed on mainstream platforms such as YouTube and Twitter.
Then, to provide an overview of the debate concerning the disease outbreak, we extract and analyze the topics
related to the COVID-19 content by means of Natural Language Processing techniques. We build word embeddings
for the text corpus of each platform, i.e. a word vector representation in which words sharing common
contexts are in close proximity. Moreover, by running clustering procedures on these vector representations, we
separate groups of words and topics that are perceived as more relevant for the COVID-19 debate. For further
details refer to Methods. The results (Fig. 1, middle panel) show that topics are quite similar across
social media platforms. Debates range from comparisons to other viruses, requests for God’s blessing, up to
racism, while the largest volume of interaction is related to the lock-down of flights.
Finally, to characterize user engagement with COVID-19 on the five platforms, we compute the cumulative
number of new posts each day (Fig. 1, lower panel). For all platforms, we find a change of behavior around the
20th of January, that is the day that the World Health Organization (WHO) issued its first situation report on
COVID-19 [36]. The largest increase in the number of posts is on the 21st of January for Gab, the 24th of January
for Reddit, the 30th of January for Twitter, the 31st of January for YouTube and the 5th of February for
Instagram. Thus, social media platforms seem to have specific timings for content consumption; such patterns
may depend upon the differences in terms of audience and interaction mechanisms (both social and algorithmic)
among platforms.
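As a minimal illustration of the quantities discussed above, the following Python sketch builds a cumulative
post curve from a list of posting dates and locates the day with the largest increase. It is not the code used
for this study: the function names and the toy dates are hypothetical, and "largest increase" is assumed here
to mean the largest daily count of new posts.

```python
from collections import Counter
from datetime import date, timedelta
from itertools import accumulate


def daily_and_cumulative(post_dates, start, end):
    """Return (days, daily counts, cumulative counts) for every day in [start, end]."""
    counts = Counter(post_dates)
    n_days = (end - start).days + 1
    days = [start + timedelta(days=i) for i in range(n_days)]
    daily = [counts.get(day, 0) for day in days]
    cumulative = list(accumulate(daily))
    return days, daily, cumulative


def day_of_largest_increase(days, daily):
    """Day on which the cumulative curve grows the most (assumed: largest daily count)."""
    best = max(range(len(days)), key=lambda i: daily[i])
    return days[best]


# Hypothetical toy data: posting dates of seven posts on one platform.
toy_dates = [
    date(2020, 1, 19), date(2020, 1, 20), date(2020, 1, 21),
    date(2020, 1, 21), date(2020, 1, 21), date(2020, 1, 22),
    date(2020, 1, 22),
]
days, daily, cumulative = daily_and_cumulative(
    toy_dates, date(2020, 1, 19), date(2020, 1, 22)
)
peak_day = day_of_largest_increase(days, daily)
```

Applied per platform, the same two functions would give one cumulative curve and one peak day for each of the
five datasets.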
Information spreading. Efforts to simulate the spreading of information on social media by reproducing
real data have mostly applied variants of standard epidemic models [37–40]. Coherently, we analyze the observed
monotonically increasing trend in the way new users interact with information related to COVID-19 by using
epidemic models. Unlike previous works, we do not only focus on models that imply specific growth mechanisms,
but also on phenomenological models that emphasize the reproducibility of empirical data [41].
Most of the epidemiological models focus on the basic reproduction number $R_0$, representing the expected
number of new infectors directly generated by an infected individual for a given time period [42]. An epidemic
occurs if $R_0 > 1$, i.e., if an exponential growth in the number of infections is expected at least in the
initial phase. In our case, we try to model the growth in the number of people publishing a post on a subject as
an infective process, where people can start publishing after being exposed to the topic. While in real
epidemics $R_0 > 1$ highlights the possibility of a pandemic, in our approach $R_0 > 1$ indicates the emergence
of an infodemic. We model the dynamics both with the phenomenological model of [43] (from now on referred to as
the EXP model) and with the standard SIR (Susceptible, Infected, Recovered) compartmental model [44]. Further
details on the modeling approach can be found in Methods.
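To make the role of the threshold $R_0 = 1$ concrete, the sketch below integrates the textbook SIR equations
with a simple forward Euler scheme. It is only an illustration under stated assumptions, not the fitting
procedure of this paper: the parameters `beta` (contact rate) and `gamma` (recovery rate), the relation
$R_0 = \beta/\gamma$ of the standard SIR model, and all numerical values are chosen for the example.

```python
def simulate_sir(beta, gamma, n_users, i0, n_days, dt=0.1):
    """Integrate the standard SIR equations with forward Euler steps.

    dS/dt = -beta * S * I / N
    dI/dt =  beta * S * I / N - gamma * I
    dR/dt =  gamma * I
    Returns the value of I at day 0, 1, ..., n_days.
    """
    s, i, r = float(n_users - i0), float(i0), 0.0
    steps_per_day = round(1 / dt)
    infected = [i]
    for _ in range(n_days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / n_users * dt
            new_recoveries = gamma * i * dt
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        infected.append(i)
    return infected


def basic_reproduction_number(beta, gamma):
    """R_0 of the standard SIR model."""
    return beta / gamma


# Hypothetical parameters: one supercritical and one subcritical scenario.
supercritical = simulate_sir(beta=0.6, gamma=0.2, n_users=10_000, i0=10, n_days=45)
subcritical = simulate_sir(beta=0.1, gamma=0.2, n_users=10_000, i0=10, n_days=45)
```

In the supercritical run ($R_0 = 3$) the number of "infected" users first grows before declining, while in the
subcritical run ($R_0 = 0.5$) it shrinks from the first day.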
As shown in Fig. 2, each platform has its own basic reproduction number $R_0$. As expected, all the values of
$R_0$ are supercritical, even considering confidence intervals (Table 1), signaling the possibility of an
infodemic. This observation may facilitate the prediction task of information spreading during critical events.
Indeed, according to this result we can consider information spreading patterns on each social media platform
to predict social response when implementing crisis management plans.
While $R_0$ is a good proxy for the engagement rate and a good predictor for epidemic-like information
spreading, social contagion phenomena might be in general more complex [45–47]. For instance, in the case of
Instagram, we observe an abrupt jump in the number of new users that cannot be explained with continuous models
like the standard epidemic ones; accordingly, the SIR model estimates a value of $R_0 \sim 10^2$ that is way
beyond what has been observed in any real-world epidemic.
Figure1. Upper panel: activity (likes, comments, reposts, etc.) distribution for each social media. Middle panel:
most discussed topics about COVID-19 on each social media. Lower panel: cumulative number of content
(posts, tweets, videos, etc.) produced from the 1st of January to the 14th of February. Due to the Twitter API
limitations in gathering past data, the rst data point for Twitter is dated January 27th.
Questionable VS reliable information sources. We conclude our analysis by comparing the diffusion
of information from questionable and reliable sources on each platform. We tag links as reliable or
questionable according to the data reported by the independent fact-checking organization Media Bias/Fact
Check (MBFC) [34]. In order to clarify the limits of an approach that is based on labelling news outlets rather
than single articles, as for instance performed in [33,48], we report the definitions used in this paper for
questionable and reliable information sources. In accordance with the criteria established by MBFC, by
questionable information source we mean a news outlet systematically showing one or more of the following
characteristics: extreme bias, consistent promotion of propaganda/conspiracies, poor or no sourcing to credible
information, information not supported by evidence or unverifiable, a complete lack of transparency and/or fake
news. By reliable information sources we mean news outlets that do not show any of the aforementioned
characteristics. Such outlets can nevertheless produce contents potentially displaying a bias towards
liberal/conservative opinion, but this does not compromise the overall reliability of the source.
Figure3 shows, for each platform, the plots of the cumulative number of posts and reactions related to reliable
sources versus the cumulative number of posts and interactions referring to questionable sources. By interactions
we mean the overall reactions, e.g. likes or other form or endorsement and comments, that can be performed
with respect to a post on a social platform. Surprisingly, all the posts show a strong linear correlation, i.e., the
number of posts/reactions relying on questionable and reliable sources grows with the same pace inside the same
social media platform. We observe the same phenomenon also for the engagement with reliable and questionable
sources. Hence, the growth dynamics of posts/interactions related to questionable news outlets is just a re-scaled
version of the growth dynamics of posts/reactions related to reliable news outlets; however, the re-scaling factor
ρ
(i.e., the fraction of questionable over reliable) is strongly dependent on the platform.
In particular, we observe that in mainstream social media the number of posts produced by questionable
sources represents a small fraction of posts produced by reliable ones; the same thing happens in Reddit. Among
less regulated social media, a peculiar effect is observed in Gab: while the volume of posts from questionable
sources is just about 70% of the volume of posts from reliable ones, the volume of reactions for the former
is about 3 times bigger than the volume for the latter. Such results hint at the possibility that different
platforms react differently to information produced by reliable and questionable news outlets.
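The re-scaling picture can be illustrated with a short Python sketch that estimates $\rho$ as the slope of a
least-squares line through the origin, relating the cumulative questionable counts to the cumulative reliable
ones. This is a sketch under assumptions: the choice of a regression without intercept, the function names and
the toy series are ours for illustration and are not taken from the regression described in Methods.

```python
def rescaling_factor(reliable_cum, questionable_cum):
    """Least-squares slope rho of questionable = rho * reliable (no intercept, assumed)."""
    numerator = sum(x * y for x, y in zip(reliable_cum, questionable_cum))
    denominator = sum(x * x for x in reliable_cum)
    return numerator / denominator


def pearson(xs, ys):
    """Pearson correlation coefficient, used to check how linear the relation is."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical cumulative counts on one platform, sampled on six days.
reliable = [10, 40, 120, 300, 650, 1000]
questionable = [7, 29, 83, 212, 454, 701]

rho = rescaling_factor(reliable, questionable)
r = pearson(reliable, questionable)
```

With these toy numbers the slope is close to 0.7 and the correlation is close to 1, i.e. the questionable curve
is approximately the reliable curve multiplied by a constant.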
To further investigate this issue, we define the amplification factor $E$ as the average number of reactions to a
post; hence, $E$ is a measure that quantifies the extent to which a post is amplified in a social media
platform. We observe that the amplification factors $E_U$ (for unreliable posts, i.e., posts produced by
questionable outlets) and $E_R$ (for reliable posts, i.e., posts produced by reliable outlets) vary from social
media platform to social media platform, assuming the largest values in YouTube and the lowest in Gab. To
measure the permeability of a platform to posts from questionable/reliable news outlets, we then define the
coefficient of relative amplification $\alpha = E_U / E_R$. It is a measure of whether a social media platform
amplifies questionable ($\alpha > 1$) or reliable ($\alpha < 1$) posts. Results are shown in Table 2. Among
mainstream social media, we notice that Twitter is the most neutral ($\alpha \sim 1$, i.e. $E_U \sim E_R$), while
YouTube amplifies questionable sources less ($\alpha \sim 4/10$). Among less popular social media, Reddit reduces
the impact of questionable sources ($\alpha \sim 1/2$), while Gab strongly amplifies them ($\alpha \sim 4$).
Therefore, we conclude that the main drivers of information spreading are related to specific peculiarities of
each platform and depend upon the group dynamics of individuals engaged with the topic.
Figure 2. Growth of the number of authors versus time. Time is expressed in number of days since 1st January
2020 (day 1). Shaded areas represent [5%, 95%] estimates of the models obtained via bootstrapping least square
estimates of the EXP model (upper panels) and of the SIR model (lower panels). For details on the SIR and the
EXP model, see SI.
Table 1. [5%, 95%] confidence interval of $R_0$ as estimated from bootstrapping the least square fit parameters
of the EXP and of the SIR model. Notice that, due to the steepness of the growth of the number of new authors
in Instagram, $R_0$ assumes unrealistic values $\sim 10^2$ for the SIR model.
                       Gab           Reddit        YouTube       Instagram                  Twitter
$R_0^{\mathrm{EXP}}$   [1.42, 1.52]  [1.44, 1.51]  [1.56, 1.70]  [2.02, 2.64]               [1.65, 2.06]
$R_0^{\mathrm{SIR}}$   [2.2, 2.5]    [2.4, 2.8]    [3.2, 3.5]    [1.1 × 10^2, 1.6 × 10^2]   [4.0, 5.1]
Figure3. Upper panels: plot of the cumulative number of posts referring to questionable sources versus
the cumulative number of posts referring to reliable sources. Lower panel: plot of the cumulative number
of engagements relatives to questionable sources versus the cumulative number of engagements relative
to reliable sources. Notice that a linear behavior indicates that the time evolution of questionable posts/
engagements is just a re-scaled version of the time evolution of reliable posts/engagements. Each plot indicates
the regression coecients
ρ
, representing the ratio among the volumes of questionable and reliable posts (
ρpost
)
and engagements (
ρeng
). In more popular social media, the number of questionable posts represents a small
fraction of the reliable ones; same thing happens in Reddit. Among less popular social media, a peculiar eect
is observed in Gab: while the volume of questionable posts is just the
70%
of the volume of reliable ones, the
volume of engagements for questionable posts is
3
times bigger than the volume for reliable ones. Further
details concerning the regression coecients are reported in Methods.
Table 2. The average engagement of a post is the number of reactions expected for a post and is a measure of
how much a post is amplified in each social media platform. The average engagements $E_U$ (for unreliable posts)
and $E_R$ (for reliable posts) vary from platform to platform, and are the largest in YouTube and the lowest in
Gab. The coefficient of relative amplification $\alpha = E_U / E_R$ measures whether a social media platform
amplifies more unreliable ($\alpha > 1$) or reliable ($\alpha < 1$) posts. Among more popular social media
platforms, we notice that Twitter is the most neutral ($\alpha \sim 1$, i.e. $E_U \sim E_R$), while YouTube
amplifies unreliable sources less ($\alpha \sim 4/10$). Among less popular social media platforms, Reddit reduces
the impact of unreliable sources ($\alpha \sim 1/2$) while Gab strongly amplifies them ($\alpha \sim 4$).
          $E_U$        $E_R$        $\alpha$
Gab       5.6          1.4          3.9
Reddit    22.7         40.1         0.55
Twitter   15.1         15.6         0.97
YouTube   1.4 × 10^4   3.9 × 10^4   0.35
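The definition $\alpha = E_U / E_R$ can be checked directly against the rounded entries of Table 2. The short
Python sketch below does so; the function names are hypothetical, and because the inputs are the rounded values
printed in the table, the recomputed ratios may differ from the reported $\alpha$ in the last digit (for
instance, 5.6 / 1.4 gives 4.0 rather than 3.9).

```python
def amplification(total_reactions, n_posts):
    """Average number of reactions per post (the amplification factor E)."""
    return total_reactions / n_posts


def relative_amplification(e_unreliable, e_reliable):
    """Coefficient of relative amplification alpha = E_U / E_R."""
    return e_unreliable / e_reliable


# Rounded (E_U, E_R) pairs as printed in Table 2.
table2 = {
    "Gab": (5.6, 1.4),
    "Reddit": (22.7, 40.1),
    "Twitter": (15.1, 15.6),
    "YouTube": (1.4e4, 3.9e4),
}

alpha = {
    platform: relative_amplification(e_u, e_r)
    for platform, (e_u, e_r) in table2.items()
}

# Platforms that amplify questionable sources more than reliable ones.
amplifying = [platform for platform, value in alpha.items() if value > 1]
```

Only Gab ends up in `amplifying`, in line with the reading of the table given in the caption.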
Conclusions
In this work we perform a comparative analysis of users’ activity on five different social media platforms during
the COVID-19 health emergency. Such a timeframe is a good benchmark for studying content consumption
dynamics around critical events at a time when the accuracy of information is threatened. We assess user
engagement and interest about the COVID-19 topic and characterize the evolution of the discourse over time.
Furthermore, we model the spread of information using epidemic models and provide basic growth parameters
for each social media platform. We then analyze the diffusion of questionable information for all channels,
finding that Gab is the environment most susceptible to misinformation dissemination. However, information
deriving from sources marked either as reliable or questionable does not present significant differences in
its spreading patterns. Our analysis suggests that information spreading is driven by the interaction paradigm
imposed by the specific social media platform and/or by the specific interaction patterns of groups of users
engaged with the topic. We conclude the paper by computing rumors’ amplification parameters for social media
platforms.
We believe that the understanding of social dynamics between content consumption and social media platforms
is an important research subject, since it may help to design more efficient epidemic models accounting
for social behavior and to design more effective and tailored communication strategies in times of crisis.
Methods
Data collection. Table 3 reports the data breakdown of the five social media platforms. Different data
collection processes have been performed depending on the platform. In all cases we guided the data collection by
a set of selected keywords based on Google Trends’ COVID-19 related queries such as: coronavirus, pandemic,
coronaoutbreak, china, wuhan, nCoV, IamNotAVirus, coronavirus_update, coronavirus_transmission,
coronavirusnews, coronavirusoutbreak.
The Reddit dataset was downloaded from the Pushshift.io archive, exploiting the related API. In order to filter
contents linked to COVID-19, we used our set of keywords.
In Gab, although no official guides are available, there is an API service that, given a certain keyword, returns
a list of users, hashtags and groups related to it. We queried all the keywords we selected based on Google
Trends and we downloaded all hashtags linked to them. We then manually browsed the results and selected a
set of hashtags based on their meaning. For each hashtag in our list, we downloaded all the posts and comments
linked to it.
For YouTube, we collected videos by using the YouTube Data API by searching for videos that matched our
keywords. Then an in-depth search was done by crawling the network of videos by searching for more related
videos as established by the YouTube algorithm. From the gathered set, we filtered the videos that matched
coronavirus, nCov, corona virus, corona-virus, corvid, covid or SARS-CoV in the title or description. We then
collected all the comments received by those videos.
For Twitter, we collected tweets related to the topic coronavirus by using both the search and stream endpoints
of the Twitter API. The data derived from the stream API represent only 1% of the total volume of tweets, further
filtered by the selected keywords. The data derived from the search API represent a random sample of the tweets
containing the selected keywords up to a maximum rate limit of 18000 tweets every 10 minutes.
Since no official API is available for Instagram data, we built our own process to collect public contents
related to our keywords. We manually took notes of posts and comments and populated the Instagram dataset.
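The keyword-based filtering common to all platforms can be sketched in a few lines of Python. The snippet is
only illustrative: the matching rule (case-insensitive, whole words, optional leading `#`) and the function name
`matches_keywords` are assumptions made for the example, not a description of the actual collection scripts.

```python
import re

# Keyword list as given in the Data collection paragraph.
KEYWORDS = [
    "coronavirus", "pandemic", "coronaoutbreak", "china", "wuhan", "nCoV",
    "IamNotAVirus", "coronavirus_update", "coronavirus_transmission",
    "coronavirusnews", "coronavirusoutbreak",
]

# Assumed rule: case-insensitive whole-word match, with an optional '#' prefix.
# Longer keywords are tried first so that they are preferred over their prefixes.
_alternatives = "|".join(
    re.escape(keyword) for keyword in sorted(KEYWORDS, key=len, reverse=True)
)
KEYWORD_RE = re.compile(r"(?<!\w)#?(?:" + _alternatives + r")(?!\w)", re.IGNORECASE)


def matches_keywords(text):
    """True if the text contains at least one of the selected keywords."""
    return KEYWORD_RE.search(text) is not None


# Hypothetical examples.
kept = matches_keywords("New #Coronavirus cases reported in Wuhan")
dropped = matches_keywords("A walk in chinatown on a sunny day")
```

Whole-word matching keeps "china" from matching "chinatown"; a substring rule would be more permissive and
would retain more content.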
Table 3. Data breakdown of the number of posts, comments and users for all platforms.
            Posts       Comments    Users       Period
Gab         6,252       4,364       2,629       01/01–14/02
Reddit      10,084      300,751     89,456      01/01–14/02
YouTube     111,709     7,051,595   3,199,525   01/01–14/02
Instagram   26,576      109,011     52,339      01/01–14/02
Twitter     1,187,482   –           390,866     27/01–14/02
Total       1,342,103   7,465,721   3,734,815
Matching ability. We consider all the posts in our dataset that contain at least one URL linking to a website
outside the related social media platform (e.g., tweets pointing outside Twitter). We separate URLs into two main
categories obtained using the classification provided by MediaBias/FactCheck (MBFC). MBFC provides a
classification determined by ranking bias in four different categories, one of them being Factual/Sourcing. In
that category, each news outlet is associated with a label that refers to its reliability as expressed in three
labels, namely Conspiracy-Pseudoscience, Pro-Science or Questionable. Notably, the Questionable set also includes
a wide range of political bias, from Extreme Left to Extreme Right.
Using such a classification, we assign to each of these outlets a binary label that partially stems from the
labelling provided by MBFC. We divide the news outlets into Questionable and Reliable. All the outlets already
classified as Questionable or belonging to the category Conspiracy-Pseudoscience are labelled as Questionable,
the rest are labelled as Reliable. Thus, by questionable information source we mean a news outlet systematically
showing one or more of the following characteristics: extreme bias, consistent promotion of
propaganda/conspiracies, poor or no sourcing to credible information, information not supported by evidence or
unverifiable, a complete lack of transparency and/or fake news. By reliable information sources we mean news
outlets that do not show any of the aforementioned characteristics. Such outlets can nevertheless produce
contents potentially displaying a bias towards liberal/conservative opinion, but this does not compromise the
overall reliability of the source.
Considering all the 2637 news outlets that we retrieve from the list provided by MBFC we end up with 800
outlets classified as Questionable and 1837 outlets classified as Reliable. Using such a classification we
quantify our overall ability to match and label domains of posts containing URLs, as reported in Table 4. The low
matching ability does not refer to the ability of identifying known domains but to the ability of finding the
news outlets that belong to the list provided by MBFC. Indeed, in all the social networks we find a tendency
towards linking to other social media platforms, as shown in Table 5.
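The matching step can be sketched as a lookup of each URL's domain in two sets of outlets. The Python snippet
below is a simplified illustration: the two domain sets are hypothetical placeholders rather than the MBFC list,
the unit of analysis is the single URL rather than the post, and the questionable and reliable shares are
computed over matched URLs only, which is how the rows of Table 4 sum to one.

```python
from urllib.parse import urlparse

# Hypothetical placeholder domains standing in for the MBFC-derived lists.
QUESTIONABLE = {"questionable-example.com"}
RELIABLE = {"reliable-example.org"}


def domain_of(url):
    """Lower-cased host name of a URL, without a leading 'www.'."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def label_url(url):
    """'questionable', 'reliable', or None when the domain is not in either list."""
    domain = domain_of(url)
    if domain in QUESTIONABLE:
        return "questionable"
    if domain in RELIABLE:
        return "reliable"
    return None


def matching_summary(urls):
    """Fraction of matched URLs, and questionable/reliable shares among the matched."""
    labels = [label_url(url) for url in urls]
    matched = [label for label in labels if label is not None]
    n_matched = len(matched)
    return {
        "matched": n_matched / len(urls) if urls else 0.0,
        "questionable": matched.count("questionable") / n_matched if n_matched else 0.0,
        "reliable": matched.count("reliable") / n_matched if n_matched else 0.0,
    }


# Hypothetical URLs extracted from four posts.
summary = matching_summary([
    "https://www.reliable-example.org/a",
    "http://questionable-example.com/b?x=1",
    "https://twitter.com/user/status/1",
    "https://reliable-example.org/c",
])
```

The link to another social media platform stays unmatched here, which mirrors why the matched fraction can be
low even when every domain is recognizable.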
Text analysis. To provide an overview of the debate concerning the virus outbreak on the various platforms,
we extract and analyze all topics related to COVID-19 by applying Natural Language Processing techniques to
the written content of each social media platform. We first build word embeddings for the text corpus of each
platform; then, to assess the topics around which the perception of the COVID-19 debate is concentrated, we
cluster words by running the Partitioning Around Medoids (PAM) algorithm on their vector representations.
Word embeddings, i.e., distributed representations of words learned by neural networks, represent words as
vectors in $\mathbb{R}^n$, bringing similar words closer to each other. They perform significantly better than
the well-known Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA) at preserving linear
regularities among words and in computational efficiency on large data sets [49]. In this paper we use the
Skip-gram model [50] to construct the word embedding of each social media corpus. More formally, given a content
represented by the sequence of words $w_1, w_2, \ldots, w_T$, we use stochastic gradient descent with the gradient
computed through the backpropagation rule [51] to maximize the average log probability

$$\frac{1}{T}\sum_{t=1}^{T}\sum_{j=-k}^{k}\log p(w_{t+j}\mid w_t), \tag{1}$$

where $k$ is the size of the training window. Therefore, during training the vector representations of closely
related words are pushed to be close to each other.
In the Skip-gram model, every word $w$ is associated with its input and output vectors, $u_w$ and $v_w$,
respectively. The probability of correctly predicting the word $w_i$ given the word $w_j$ is defined as

$$p(w_i\mid w_j)=\frac{\exp\left(u_{w_i}^{T}v_{w_j}\right)}{\sum_{l=1}^{V}\exp\left(u_{l}^{T}v_{w_j}\right)}, \tag{2}$$

where $V$ is the number of words in the corpus vocabulary. Two major parameters affect the training quality: the
dimensionality of word vectors, and the size of the surrounding words window. We choose 200 as vector
dimension, a typical value for training on large datasets, and 6 words for the window.
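Equations (1) and (2) can be evaluated directly on a toy vocabulary, which may help to see what the training
objective measures. The Python sketch below is illustrative only: it computes the two quantities for fixed
vectors and does not perform any training. The toy vectors are hypothetical, and skipping the term $j = 0$ and
positions outside the sequence is an assumption that follows common practice rather than the equation as printed.

```python
import math


def skipgram_probability(i, j, u, v):
    """p(w_i | w_j) as in Eq. (2): softmax over the scores u_l . v_j for all words l."""
    scores = [sum(a * b for a, b in zip(u_l, v[j])) for u_l in u]
    shift = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(score - shift) for score in scores]
    return exps[i] / sum(exps)


def average_log_probability(words, k, u, v):
    """Objective of Eq. (1) for a sequence of vocabulary indices and window size k."""
    total = 0.0
    n_words = len(words)
    for t in range(n_words):
        for j in range(-k, k + 1):
            # Assumption: skip the centre word and positions outside the sequence.
            if j == 0 or not 0 <= t + j < n_words:
                continue
            total += math.log(skipgram_probability(words[t + j], words[t], u, v))
    return total / n_words


# Hypothetical 2-dimensional vectors for a vocabulary of three words.
u_toy = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
v_toy = [[0.5, 0.0], [0.0, 0.5], [0.5, 0.5]]

p_example = skipgram_probability(2, 0, u_toy, v_toy)
objective = average_log_probability([0, 2, 1], k=1, u=u_toy, v=v_toy)
```

Training would adjust `u_toy` and `v_toy` to increase `objective`; with all vectors set to zero every word is
equally likely and each probability equals $1/V$.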
Table 4. Number of posts containing a URL, matching ability and classification for each of the five platforms.
                         Gab     Reddit   YouTube   Instagram   Twitter
Posts containing a URL   3,778   10,084   351,786   1,328       356,448
Matched                  0.47    0.55     0.035     0.09        0.27
Questionable             0.38    0.045    0.064     0.05        0.10
Reliable                 0.62    0.955    0.936     0.95        0.90
Table 5. Fraction of URLs pointing to social media. Table should be read as entries in each row link to entries
in each column. For example, Gab links to Reddit 0.003.
            Gab     Reddit   YouTube   Instagram   Twitter   Facebook
Gab         0.003   0.002    0.001     0.002       0.138     0
Reddit      0.043   0.006    0.009     0.001       0         0
YouTube     0       0        0.292     0           0.088     0.081
Instagram   0       0        0.003     0           0.001     0.001
Twitter     0.059   0.001    0.257     0.003       0         0
Before applying the tool, we reduced the contents to those written in English, as detected with cld3. Then we
cleaned the corpora by removing HTML code, URLs and email addresses, user mentions, hashtags, stop-words,
and all special characters, including digits. Finally, we dropped words composed of fewer than three characters,
words occurring fewer than five times in the whole corpus, and contents with fewer than three words.
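The cleaning steps listed above can be sketched as follows. This is an illustrative approximation and not the authors' pipeline. The regular expressions, the tiny `STOP_WORDS` set and the function names are assumptions, and the language-detection step with cld3 is left out.

```python
import re

# Hypothetical, deliberately tiny stop-word list; a real run would use a full one.
STOP_WORDS = {"the", "and", "for", "are", "was", "with", "this", "that"}

def clean_text(text):
    """Return the lowercase tokens of one content after the cleaning steps."""
    text = re.sub(r"<[^>]+>", " ", text)                # HTML tags
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # URLs
    text = re.sub(r"\S+@\S+", " ", text)                # email addresses
    text = re.sub(r"[@#]\w+", " ", text)                # user mentions and hashtags
    text = re.sub(r"[^A-Za-z\s]", " ", text)            # special characters and digits
    tokens = text.lower().split()
    # Drop stop-words and words shorter than three characters.
    return [t for t in tokens if len(t) >= 3 and t not in STOP_WORDS]

def clean_corpus(contents, min_count=5, min_words=3):
    """Clean every content, then drop rare words and too-short contents."""
    tokenized = [clean_text(c) for c in contents]
    counts = {}
    for tokens in tokenized:
        for t in tokens:
            counts[t] = counts.get(t, 0) + 1
    kept = [[t for t in tokens if counts[t] >= min_count] for tokens in tokenized]
    return [tokens for tokens in kept if len(tokens) >= min_words]
```

The defaults `min_count=5` and `min_words=3` mirror the thresholds stated in the text. The order in which the filters are applied is a choice of this sketch.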
To analyze the topics related to COVID-19, we cluster words with Partitioning Around Medoids (PAM), using as
proximity metric the cosine distance matrix of the words in their vector representations. To select the number of
clusters, k, we calculate the average silhouette width for each value of k. Moreover, to evaluate cluster stability,
we calculate the average pairwise Jaccard similarity between clusters based on 90% sub-samples of the data. Lastly,
we produce word clouds to identify the topic of each cluster. To provide a view of the debate around the virus
outbreak, we define the distribution over topics $\Theta_c$ for a given content c as the distribution of its words among
the word clusters. Thus, to quantify the relevance of each topic within a corpus, we restrict to contents c with
$\max \Theta_c > 0.5$ and consider them uniquely identified as a single topic each. Table 6 shows the results of the text
cleaning and topic analysis for all the data.
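The last step, assigning a content to a single topic when more than half of its words fall in one cluster, can be sketched as below. The mapping `word_cluster` (word to cluster label) stands in for the output of the PAM clustering. It, the function names and the decision to ignore words that belong to no cluster are assumptions of this example.

```python
from collections import Counter

def topic_distribution(tokens, word_cluster):
    """Share of a content's clustered words that falls in each word cluster."""
    clusters = [word_cluster[t] for t in tokens if t in word_cluster]
    if not clusters:
        return {}
    counts = Counter(clusters)
    return {c: n / len(clusters) for c, n in counts.items()}

def dominant_topic(tokens, word_cluster, threshold=0.5):
    """Return the single topic of a content, or None if the largest share is not above threshold."""
    dist = topic_distribution(tokens, word_cluster)
    if not dist:
        return None
    topic, share = max(dist.items(), key=lambda item: item[1])
    return topic if share > threshold else None

# Hypothetical clustering of four words into three topics.
word_cluster = {"virus": 0, "outbreak": 0, "mask": 1, "market": 2}
```

With this toy mapping, a content with the words "virus", "outbreak" and "mask" has two thirds of its words in cluster 0 and is assigned to it, while a content with only "virus" and "mask" is split evenly and is left unassigned.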
Epidemiological models. Several mathematical models can be used to analyse the potential mechanisms that
underlie epidemiological data. Generally, we can distinguish between phenomenological models, which emphasize
the reproducibility of empirical data without insight into the mechanisms of growth, and more insightful
mechanistic models, which try to incorporate such mechanisms [41].
To t our cumulative curves, we rst use the adjusted exponential model of43 since it naturally provides an
estimate of the basic reproduction number
R0
. is phenomenological model (from now on indicated as EXP) has
been successfully employed in data-scarce settings and shown to be on-par with more traditional compartmental
models for multiple emerging diseases like Zika, Ebola, and Middle East Respiratory Syndrome43.
e model is dened by the following single equation:
Here, I is incidence, t is the number of days,
R0
is the basic reproduction number and d is a damping factor
accounting for the reduction in transmissibility over time. In our case, we interpret I as the number
Cauth
of
authors that have published a post on the subject.
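As a small illustration of the EXP model, the sketch below evaluates the incidence curve of Eq. (3), $I = [R_0/(1+d)^t]^t$, for given parameters. The function name and the parameter values are chosen for the example only and are not estimates from the paper.

```python
def exp_incidence(t, r0, d):
    """Incidence of the EXP model, Eq. (3): I(t) = (R0 / (1 + d)**t)**t."""
    return (r0 / (1.0 + d) ** t) ** t

# Illustrative parameters: without damping the curve is a pure exponential R0**t,
# while any d > 0 eventually bends it downwards.
undamped = [exp_incidence(t, 2.0, 0.0) for t in range(1, 6)]
damped = [exp_incidence(t, 2.0, 0.05) for t in range(1, 31)]
```

With d = 0 the model reduces to $R_0^t$. A positive d multiplies the curve by $(1+d)^{-t^2}$, which is what lets a single equation describe both the initial growth and the later decline.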
As a mechanistic model, we employ the classical SIR model [44]. In such a model, a susceptible population can
be infected with a rate $\beta$ by coming into contact with infected individuals; however, infected individuals can
recover with a rate $\gamma$. The model is described by the set of differential equations (4), where S is the number of
susceptible, I is the number of infected, R is the number of recovered individuals and N = S + I + R is the total
population size. In our case, we interpret the number $I+R$ as the number $C_{auth}$ of authors that have published a
post on the subject.
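The SIR dynamics of Eq. (4) can be integrated numerically in a few lines. The sketch below uses a forward Euler scheme with a fixed step, which is an assumption made for brevity: the paper does not state which integrator was used, and the parameter values are illustrative.

```python
def simulate_sir(beta, gamma, n, i0, days, dt=0.1):
    """Integrate the SIR equations with forward Euler; return daily (S, I, R) tuples."""
    s, i, r = float(n - i0), float(i0), 0.0
    steps_per_day = round(1 / dt)
    trajectory = [(s, i, r)]
    for day in range(days):
        for step in range(steps_per_day):
            new_infections = beta * s * i / n * dt  # flow from S to I
            new_recoveries = gamma * i * dt         # flow from I to R
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        trajectory.append((s, i, r))
    return trajectory

def cumulative_authors(trajectory):
    """I + R at each day, the quantity the text identifies with C_auth."""
    return [i + r for _, i, r in trajectory]

# Illustrative run: R0 = beta / gamma = 2 in a population of 1000 users.
trajectory = simulate_sir(beta=0.5, gamma=0.25, n=1000, i0=1, days=60)
curve = cumulative_authors(trajectory)
```

Because I + R equals N minus S and S can only decrease, `curve` is non-decreasing, which is the shape expected of a cumulative count of authors.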
In the SIR model, the basic reproduction number $R_0 = \beta/\gamma$ corresponds to the ratio between the rate of
infection by contact $\beta$ and the rate of recovery $\gamma$. Notice that for the SIR model, vaccination strategies
correspond to bringing the system into a situation where $S < N/R_0$; in this way, the number of infected will decrease.
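The threshold $S < N/R_0$ follows in one step from the equation for the infected in the SIR model. The short derivation below is added for clarity, uses only the symbols already introduced, and takes N to be the total population size S + I + R.

```latex
\partial_t I \;=\; \beta \frac{S\,I}{N} - \gamma I
\;=\; \gamma I \left( R_0 \frac{S}{N} - 1 \right) \;<\; 0
\quad\Longleftrightarrow\quad
S \;<\; \frac{N}{R_0} \qquad (I > 0).
```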
To estimate the basic reproduction numbers $R_0^{EXP}$ and $R_0^{SIR}$ for the EXP and the SIR model, we use least-squares
estimates of the models' parameters [42]. The range of parameters is estimated via bootstrapping [41,52].
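One possible way to combine a least-squares fit with a bootstrap range is sketched below for the EXP model. It relies on two assumptions that are not taken from the paper: the fit is done on the log-transformed model, $\ln I = t \ln R_0 - t^2 \ln(1+d)$, which is linear in the two unknowns, and the bootstrap resamples (t, I) pairs with replacement and reports a percentile range. The time values are assumed to be positive.

```python
import math
import random

def fit_exp_model(ts, incidences):
    """Least-squares fit of ln I = t*ln(R0) - t^2*ln(1+d); returns (R0, d)."""
    ys = [math.log(v) for v in incidences]
    s11 = sum(t ** 2 for t in ts)
    s12 = sum(t ** 3 for t in ts)
    s22 = sum(t ** 4 for t in ts)
    b1 = sum(t * y for t, y in zip(ts, ys))
    b2 = sum(t ** 2 * y for t, y in zip(ts, ys))
    det = s11 * s22 - s12 ** 2
    a = (b1 * s22 - b2 * s12) / det  # a = ln(R0)
    b = (s11 * b2 - s12 * b1) / det  # b = -ln(1 + d)
    return math.exp(a), math.exp(-b) - 1.0

def bootstrap_r0_range(ts, incidences, n_resamples=200, seed=0):
    """2.5th to 97.5th percentile range of R0 over resampled (t, I) pairs."""
    rng = random.Random(seed)
    pairs = list(zip(ts, incidences))
    estimates = []
    for _ in range(n_resamples):
        sample = [rng.choice(pairs) for _ in pairs]
        if len({t for t, _ in sample}) < 2:
            continue  # a degenerate resample cannot identify two parameters
        r0, _ = fit_exp_model(*zip(*sample))
        estimates.append(r0)
    estimates.sort()
    low = estimates[int(0.025 * (len(estimates) - 1))]
    high = estimates[int(0.975 * (len(estimates) - 1))]
    return low, high
```

On noise-free data generated from the model itself, the fit recovers the generating parameters and the bootstrap range collapses onto the point estimate. On real counts the width of the range gives a sense of how well $R_0$ is constrained.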
Linear regression coefficients. Table 7 reports the regression coefficient $\rho$, the intercept and the $R^2$ values
for the linear fit of Fig. 3. High values of $R^2$ confirm the linear relationship between reliable and questionable
sources in information diffusion.
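The three quantities reported for each linear fit (intercept, coefficient $\rho$ and $R^2$) can be obtained with ordinary least squares, as in the minimal sketch below. The function name is hypothetical, and the assignment of which series plays the role of x and which of y is left to the caller, since it is not specified in this section.

```python
def linear_fit(xs, ys):
    """Ordinary least squares y = intercept + rho * x; returns (intercept, rho, r2)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    rho = sxy / sxx
    intercept = mean_y - rho * mean_x
    ss_res = sum((y - intercept - rho * x) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return intercept, rho, 1.0 - ss_res / ss_tot

# Illustrative data lying exactly on the line y = 1 + 2x.
intercept, rho, r2 = linear_fit([0, 1, 2, 3], [1, 3, 5, 7])
```

An $R^2$ close to 1 means that the straight line accounts for almost all of the variation in y.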
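$$I = \left[\frac{R_0}{(1+d)^{t}}\right]^{t} \tag{3}$$

$$\partial_t S = -\beta S \cdot I/N, \qquad \partial_t I = \beta S \cdot I/N - \gamma I, \qquad \partial_t R = \gamma I \tag{4}$$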
Table 6. Results of text cleaning and analysis for all the corpora.

| | Cleaned contents | Vocabulary size | Topics | Contents with $\max \Theta_c > 0.5$ |
|---|---|---|---|---|
| Instagram | 21,189 posts | 15,324 | 17 | 4,467 |
| Twitter | 638,214 posts | 22,587 | 21 | 369,131 |
| Gab | 5,853 posts | 3,024 | 19 | 2,986 |
| Reddit | 10,084 posts | 1,968 | 34 | 6,686 |
| YouTube | 815,563 comments | 35,381 | 30 | 679,261 |
Data availability
The datasets generated during and/or analysed during the current study are available from the corresponding
author on reasonable request.
Received: 11 April 2020; Accepted: 15 September 2020
References
1. Organization, W.H. Naming the coronavirus disease (COVID-19) and the virus that causes it. https ://www.who.int/emerg encie
s/disea ses/novel -coron aviru s-2019/techn ical-guida nce/namin g-the-coron aviru s-disea se-(covid -2019)-and-the-virus -that-cause
s-it (2020 (accessed April 9, 2020)).
2. Quattrociocchi, W. Part 2-social and political challenges: 2.1 western democracy in crisis? In Global Risk Report World Economic
Forum (2017).
3. WHO Situation Report 13. https ://www.who.int/docs/defau lt-sourc e/c o ron aviru se/situa tion-repor ts/20200 202-sitre p-13-ncov-v3.
pdf?sfvrs n=195f4 010_6. Accessed: 2010-09-30.
4. Zarocostas, J. How to ght an infodemic. Lancet 395, 676 (2020).
5. Organization, W.H. Director-general’s remarks at the media brieng on 2019 novel Coronavirus on 8 February 2020. https ://www.
who.int/dg/speec hes/detai l/direc tor-g ener al-s-r em ar ks-at-the-media -brie ng-o n-2019-n o v el - co r o n a viru s---8-fe b ru ary-2020 (2020
(accessed April 9, 2020)).
6. Mendoza, M., Poblete, B. & Castillo, C. Twitter under crisis: Can we trust what we RT?. Proceedings of the rst workshop on social
media analytics 71–79 (2010).
7. Starbird, K., Maddock, J., Orand, M., Achterman, P. & Mason, R.M. Rumors, false ags, and digital vigilantes: Misinformation on
twitter aer the 2013 boston marathon bombing. IConference 2014 Proceedings (2014).
8. Kim, L., Fast, S. M. & Markuzon, N. Incorporating media data into a model of infectious disease transmission. PLoS ONE 14, 1
(2019).
9. John, T. & BenWedeman, C. Italy prohibits travel and cancels all public events in its northern region to contain Coronavirus. https
://editi on.cnn.com/2020/03/08/europ e/italy -coron aviru s-lockd own-europ e-intl/index .html (2020 (accessed April 9, 2020)).
10. Sharot, T. & Sunstein, C. R. How people decide what they want to know. Nat. Hum. Behav. 2020, 1–6 (2020).
11. Shaman, J., Karspeck, A., Yang, W., Tamerius, J. & Lipsitch, M. Real-time inuenza forecasts during the 2012–2013 season. Nat.
Commun. 4, 1–10 (2013).
12. Viboud, C. & Vespignani, A. e future of inuenza forecasts. Proc. Natl. Acad. Sci. 116, 2802–2804 (2019).
13. Kulshrestha, J. etal. Quantifying search bias: Investigating sources of bias for political searches in social media. In Proceedings of
the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing, 417–432 (2017).
14. Schmidt, A. L. et al. Anatomy of news consumption on Facebook. Proc. Natl. Acad. Sci. 114, 3035–3039 (2017).
15. Starnini, M., Frasca, M. & Baronchelli, A. Emergence of metapopulations and echo chambers in mobile agents. Sci. Rep. 6, 31834
(2016).
16. Schmidt, A. L., Zollo, F., Scala, A., Betsch, C. & Quattrociocchi, W. Polarization of the vaccination debate on Facebook. Vaccine
36, 3606–3612 (2018).
17. Del Vicario, M. et al. e spreading of misinformation online. Proc. Natl. Acad. Sci. 113, 554–559 (2016).
18. Bessi, A. et al. Science vs. conspiracy: collective narratives in the age of misinformation. PLoS ONE 10, e0118093 (2015).
19. Cinelli, M. et al. Selective exposure shapes the Facebook news diet. PLoS ONE 15, e0229129 (2020).
20. Zollo, F. et al. Debunking in a world of tribes. PLoS ONE 12, 1 (2017).
21. Baronchelli, A. e emergence of consensus: a primer. R. Soc. Open Sci. 5, 172189 (2018).
22. Del Vicario, M. et al. Echo chambers: emotional contagion and group polarization on Facebook. Sci. Rep. 6, 37825 (2016).
23. Bail, C. A. et al. Exposure to opposing views on social media can increase political polarization. Proc. Natl. Acad. Sci. 115,
9216–9221 (2018).
24. Vicario, M. D., Quattrociocchi, W., Scala, A. & Zollo, F. Polarization and fake news: early warning of potential misinformation
targets. ACM Trans. Web (TWEB) 13, 1–22 (2019).
25. Wardle, C. & Derakhshan, H. Information disorder: Toward an interdisciplinary framework for research and policy making.
Council of Europe report 27 (2017).
26. Vosoughi, S., Roy, D. & Aral, S. e spread of true and false news online. Science 359, 1146–1151 (2018).
27. Ruths, D. e misinformation machine. Science 363, 348–348 (2019).
28. Schild, L. etal. “ go eat a bat, chang!”: An early look on the emergence of sinophobic behavior on web communities in the face of
covid-19. arXiv preprintarXiv :2004.04046 (2020).
29. Velásquez, N. etal. Hate multiverse spreads malicious COVID-19 content online beyond individual platform control. Preprint arXiv
:2004.00673 (2020).
30. Ferrara, E. What types of COVID-19 conspiracies are populated by twitter bots? First Monday (2020).
31. Alam, F. etal. Fighting the COVID-19 infodemic: modeling the perspective of journalists, fact-checkers, social media platforms, policy
makers, and the society. Preprint arXiv :2005.00033 (2020).
Table 7. Coefficients and $R^2$ of the linear regressions displayed in Fig. 3.

| Dataset | Type | Intercept | Coefficient ($\rho$) | $R^2$ |
|---|---|---|---|---|
| Gab | Posts | −22.321 | 0.695 | 0.996 |
| Reddit | Posts | −4.111 | 0.047 | 0.997 |
| YouTube | Posts | 4.529 | 0.073 | 0.998 |
| Twitter | Posts | −151.44 | 0.110 | 0.998 |
| Gab | Reactions | 74.577 | 2.721 | 0.981 |
| Reddit | Reactions | −70.677 | 0.026 | 0.990 |
| YouTube | Reactions | −8854.33 | 0.025 | 0.986 |
| Twitter | Reactions | −2136.978 | 0.107 | 0.987 |
32. Shahi, G.K., Dirkson, A. & Majchrzak, T.A. An exploratory study of COVID-19 misinformation on twitter. Preprint arXiv:2005.05710 (2020).
33. Bovet, A. & Makse, H. A. Influence of fake news in twitter during the 2016 us presidential election. Nat. Commun. 10, 1–14 (2019).
34. (MBFC), M. B.C. Media bias/fact check, the most comprehensive media bias check resource. https://mediabiasfactcheck.com/ (2020, accessed April 9, 2020).
35. Romero, D.M., Meeder, B. & Kleinberg, J. Differences in the mechanics of information diffusion across topics: idioms, political hashtags, and complex contagion on twitter. In Proceedings of the 20th international conference on World wide web, 695–704 (2011).
36. Organization, W.H. Novel Coronavirus (2019-NCOV) situation report-1 21 January 2020. https://www.who.int/docs/default-source/coronaviruse/situation-reports/20200121-sitrep-1-2019-ncov.pdf?sfvrsn=20a99c10_4 (2020, accessed April 9, 2020).
37. Pellis, L. et al. Eight challenges for network epidemic models. Epidemics 10, 58–62 (2015).
38. Liu, Y. et al. Characterizing super-spreading in microblog: an epidemic-based information propagation model. Physica A 463, 202–218 (2016).
39. Skaza, J. & Blais, B. Modeling the infectiousness of twitter hashtags. Physica A 465, 289–296 (2017).
40. Davis, J. T., Perra, N., Zhang, Q., Moreno, Y. & Vespignani, A. Phase transitions in information spreading on structured populations. Nat. Phys. 2020, 1–7 (2020).
41. Chowell, G. Fitting dynamic models to epidemic outbreaks with quantified uncertainty: a primer for parameter uncertainty, identifiability, and forecasts. Infect. Dis. Model. 2, 379–398 (2017).
42. Ma, J. Estimating epidemic exponential growth rate and basic reproduction number. Infectious Disease Modelling (2020).
43. Fisman, D. N., Hauck, T. S., Tuite, A. R. & Greer, A. L. An idea for short term outbreak projection: nearcasting using the basic reproduction number. PLoS ONE 8, 1 (2013).
44. Bailey, N.T. et al. The mathematical theory of infectious diseases and its applications (Charles Griffin & Company Ltd, 5a Crendon Street, High Wycombe, Bucks HP13 6LE, 1975).
45. Centola, D. The spread of behavior in an online social network experiment. Science 329, 1194–1197 (2010).
46. Del Vicario, M., Scala, A., Caldarelli, G., Stanley, H. E. & Quattrociocchi, W. Modeling confirmation bias and polarization. Sci. Rep. 7, 40391 (2017).
47. Baumann, F., Lorenz-Spreen, P., Sokolov, I. M. & Starnini, M. Modeling echo chambers and polarization dynamics in social networks. Phys. Rev. Lett. 124, 048301 (2020).
48. Grinberg, N., Joseph, K., Friedland, L., Swire-Thompson, B. & Lazer, D. Fake news on Twitter during the 2016 us presidential election. Science 363, 374–378 (2019).
49. Mikolov, T., Yih, W.-T. & Zweig, G. Linguistic regularities in continuous space word representations. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 746–751 (Association for Computational Linguistics, Atlanta, Georgia, 2013).
50. Mikolov, T., Sutskever, I., Chen, K., Corrado, G. & Dean, J. Distributed representations of words and phrases and their compositionality. In Proceedings of the 26th International Conference on Neural Information Processing Systems, Volume 2, NIPS’13, 3111–3119 (Curran Associates Inc., Red Hook, NY, USA, 2013).
51. Rumelhart, D. E., Hinton, G. E. & Williams, R. J. Learning representations by back-propagating errors. Nature 323, 533–536. https://doi.org/10.1038/323533a0 (1986).
52. Efron, B. & Tibshirani, R. J. An Introduction to the Bootstrap (CRC Press, London, 1994).
Author contributions
M.C., A.G., C.M.V., A.L.S., P.Z. collected and prepared the data. All authors conceived the experiments. M.C.,
A.G., C.M.V., A.L.S., E.B., and A.S. conducted the experiments. All authors analysed the results, wrote and
reviewed the manuscript.
Competing interests
The authors declare no competing interests.
Additional information
Correspondence and requests for materials should be addressed to W.Q.
Reprints and permissions information is available at www.nature.com/reprints.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and
institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International
License, which permits use, sharing, adaptation, distribution and reproduction in any medium or
format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the
Creative Commons licence, and indicate if changes were made. The images or other third party material in this
article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the
material. If material is not included in the article’s Creative Commons licence and your intended use is not
permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from
the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
© The Author(s) 2020
... The term infodemic has been coined to outline the perils of misinformation phenomena during the management of virus outbreak, since it could even speed up the epidemic process by influencing and fragmenting social response (Matteo et al, 2020). The World Health Organization is warning that an "infodemic" has developed alongside the coronavirus pandemic. ...
... Journal of applied Information Science and Technology 14 (1)2021 The spreading of information can strongly influence people behaviour and alter the effectiveness of the counter measure deployed by the governments (Matteo, et al, 2020). According to Falkenrath et al (2020), for students of the history of epidemics, the effects of a pandemic are unsurprising in general even as they may be startling in their speed, proximity, and intensity. ...
... In the wake of the outbreak, and in a glaring form, social media platforms saw a spread in misinformation and outright falsehood ranging from preventive to curative measures. Consequently, platforms such as YouTube and Facebook as a result of the direct access users have may have contributed to the propagation of false rumours and inaccurate information [3]. The spread of fake news and conspiracies on social media platforms is becoming a trend and more people have realised the impact it has on the health of people. ...
... All manners of fake news thrive on various platforms. Cinelli et al. [3] conducted a study to understand the diffusion of information on COVID-19 across social media platforms -Twitter, YouTube, Instagram, Reddit and Gab. The study found that interaction paradigm or specific interaction patterns or functions provided by specific platform is the driving force behind the spread of information. ...
Article
Full-text available
Social media has in recent times become part and parcel of human daily living. People easily get addicted to staying glued to their digital devices-mobile phones, laptop computers, tablet computers and many more which are all gateways to the Internet in general and social media in particular. This addiction comes with a price as it puts many users face-to-face with all forms of personalities, information and situations. COVID-19 took the world by surprise when its outbreak was reported in December 2019 following a report of a mysterious illness in Wuhan, China. Infodemiology is not new and scholars have made attempts to understand the pattern of information flow on platforms such as the Internet and social media. However, the word infodemic recently got into the mix. It was used to describe the spread of falsehood on COVID-19 and social media is recognised as the main diffusing point. This study is a discourse on the influence social media conspiracies and fake news has on the fight against Coronavirus. Through existing literature, that is library research, the study sought to identify conspiracy theories that trended during the Covid-19 pandemic whilst also exploring the implications of such conspiracies on the fight against the pandemic vis á vis its impact on Nigeria's public health. It is anchored on Uses and Gratifications and Conspiracy Theories. The paper, among other findings revealed that Coronavirus-related conspiracies trended in the middle of the outbreak and asserted that social media owing to its ubiquitous nature weighs so much power, and therefore had a mix of negative and positive impacts on the fight against COVID-19 in Nigeria. It concludes that unverified reports were shared on social media to the detriment of Nigeria's public health.
Article
Background: This study aims to identify the preferred sources for acquiring knowledge about COVID-19 and to evaluate basic knowledge on critical scientific literature appraisal in students from medical schools located in Spanish speaking countries in Latin America. Methods: We designed an online survey of 15 closed-ended questions related to demographics, preferred resources for COVID-19 training, and items to assess critical appraisal skills. A snowball method was used for sampling. We conducted a descriptive analysis and Chi-squared tests to compare the proportion of correct identification of the concept of a preprint and a predatory journal when considering a) self-perceived level of knowledge, b) public vs private school, c) inclusion of a scientific literature appraisal subject in the curriculum, and d) progress in medical school. Results: Our sample included 770 valid responses, out of which most of the participants included were from Mexico (n=283, 36.8%) and Ecuador (n=229, 29.7%). Participants preferred using evidence-based clinical resources (EBCRs) to learn more about COVID-19 (n=182, 23.6%). The preferred study design was case report/series (n=218, 28.1%). We found that only 265 participants correctly identified the concept of a preprint (34.4%), while 243 students (31.6%) correctly identified the characteristics of a predatory journal. We found no significant differences in the proportion of correct answers regardless of the self-perceived level of knowledge, progress in medical school, or scientific literature critical appraisal classes. Conclusion: This study is novel in its approach of identifying sources of knowledge used by Latin American medical students and provides insights into the need to reinforce training in critical appraisal of scientific literature during medical school.
Article
This study presents a data analytics framework that aims to analyze topics and sentiments associated with COVID-19 vaccine misinformation in social media. A total of 40,359 tweets related to COVID-19 vaccination were collected between January 2021 and March 2021. Misinformation was detected using multiple predictive machine learning models. Latent Dirichlet allocation (LDA) topic model was used to identify dominant topics in COVID-19 vaccine misinformation. Sentiment orientation of misinformation was analyzed using a lexicon-based approach. An independent-samples t-test was performed to compare the number of replies, retweets, and likes of misinformation with different sentiment orientations. Based on the data sample, the results show that COVID-19 vaccine misinformation included 21 major topics. Across all misinformation topics, the average number of replies, retweets, and likes of tweets with negative sentiment was 2.26, 2.68, and 3.29 times higher, respectively, than those with positive sentiment.
Article
Full-text available
This paper analyzes government websites and social media accounts during COVID-19. In a crisis, the government must provide real-time information to store and share information using websites and social media. This paper compares government websites and social media during covid-19. This paper uses a qualitative analysis method with Nvivo 12 Plus and Similar-web as analytical tools to assist in capturing data and mining data from government-owned websites and social media accounts. This study explains that the use and availability of websites and social media by local governments is the government's response to ensure access and public information services can run well during a crisis. The performance of government websites is influenced by the intensity of relationships on social networks such as social media. The higher the engagement of government websites with social media, the higher the level of information dissemination in the community. The findings of this study indicate that the performance of government websites greatly influences public trust. The limitation of this research lies in the research method, which only takes data for a certain period. This research still requires further development by using an observation or interview approach.
Online social media provides rich and varied information reflecting the significant concerns of the public during the coronavirus pandemic. Analyzing what the public is concerned with from social media information can support policy-makers to maintain the stability of the social economy and life of the society. In this article, we focus on the detection of the network public opinions during the coronavirus pandemic. We propose a novel Relational Topic Model for Short texts (RTMS) to draw opinion topics from social media data. RTMS exploits the feature of texts in online social media and the opinion propagation patterns among individuals. Moreover, a dynamic version of RTMS (DRTMS) is proposed to capture the evolution of public opinions. Our experiment is conducted on a real-world dataset which includes 67,592 comments from 14,992 users. The results demonstrate that, compared with the benchmark methods, the proposed RTMS and DRTMS models can detect meaningful public opinions by leveraging the feature of social media data. It can also effectively capture the evolution of public concerns during different phases of the coronavirus pandemic.
Article
The dramatic expansion of modern linguistic research and enhanced accuracy of linguistic analysis have become a reality due to the ability of artificial neural networks not only to learn and adapt, but also carry out automate linguistic analysis, select, modify and compare texts of various types and genres. The purpose of this article and the journal issue as a whole is to present modern areas of research in computational linguistics and linguistic complexology, as well as to define a solid rationale for the new interdisciplinary field, i.e. discourse complexology. The review of trends in computational linguistics focuses on the following aspects of research: applied problems and methods, computational linguistic resources, contribution of theoretical linguistics to computational linguistics, and the use of deep learning neural networks. The special issue also addresses the problem of objective and relative text complexity and its assessment. We focus on the two main approaches to linguistic complexity assessment: “parametric approach” and machine learning. The findings of the studies published in this special issue indicate a major contribution of computational linguistics to discourse complexology, including new algorithms developed to solve discourse complexology problems. The issue outlines the research areas of linguistic complexology and provides a framework to guide its further development including a design of a complexity matrix for texts of various types and genres, refining the list of complexity predictors, validating new complexity criteria, and expanding databases for natural language.
Article
Quarantine and social distance restrictions have been enforced worldwide to reduce the spread of coronavirus disease 2019 (COVID‐19). The effects of these measures on mental health are recognised, but remaining unclear, is whether these effects are a consequence of the virus itself or policies that are enforced to prevent it. The present study investigated the impact of lockdown restrictions on anxiety and depression at two different times in 2020. Data were collected from 118 participants from all regions of Brazil. After easing quarantine restrictions in the second half of 2020, two natural groups were formed. One group included participants who voluntarily remained at home (n = 73). The other group consisted of those who decided to leave home (n = 45). A linear mixed model was used to determine the effects of group and time and their interaction. The McNemar test was used to determine intragroup differences in perceptions and concerns about COVID‐19. Logistic regression identified predictors of high and stable depression and anxiety. None of the factors or their interactions was significant. Indicators of depression and anxiety remained stable over time, regardless of whether the participants left home or remained at home. Significantly, a strong and stable agreement with quarantine was found. The participants agreed that COVID‐19 was a threat to public health. Political orientation was a predictor of high and stable levels of depression but not anxiety. Participants who self‐identified as liberal politically were at a greater risk of developing depression. The results suggest that the lockdown policy did not contribute to disruptions of mental health, which instead was a consequence of the pandemic and virus itself. We also found wide and strong support amongst the participants for lockdown and mandatory stay‐at‐home policies.
Article
Objective: This study examined socio-demographic characteristics and COVID-19 experiences as concurrent predictors of perceived familial and friend social support, social media use, and socio-emotional motives for electronic communication during the COVID-19 pandemic among college students. Participants: Participants were 619 emerging adults (18-29-year-olds) currently enrolled at, or recently graduated from, a U.S.-based college or university (Mean age = 21.8, SD = 2.2; 64% female; 60% Non-Hispanic White). Methods: Online surveys were administered between May and June 2020. A path analysis model was conducted to examine the concurrent associations between socio-demographic factors, COVID-19-related experiences, social media/electronic engagement, and perceived social support. Results: Findings indicated significant differences in perceived social support, social media use, and socio-emotional motives for electronic communication as a function of gender, race, sexual orientation, first-generation status, and relationship status. Conclusions: Our findings highlight the role of both individual and situational differences in interpersonal functioning and demonstrate how college students differently engaged with social media for socio-emotional purposes during the COVID-19 pandemic.
Article
During the COVID-19 pandemic, social media has become a home ground for misinformation. To tackle this infodemic, scientific oversight, as well as a better understanding by practitioners in crisis management, is needed. We have conducted an exploratory study into the propagation, authors and content of misinformation on Twitter around the topic of COVID-19 in order to gain early insights. We have collected all tweets mentioned in the verdicts of fact-checked claims related to COVID-19 by over 92 professional fact-checking organisations between January and mid-July 2020 and share this corpus with the community. This resulted in 1500 tweets relating to 1274 false and 226 partially false claims, respectively. Exploratory analysis of author accounts revealed that verified Twitter handles (including organisations and celebrities) are also involved in either creating (new tweets) or spreading (retweets) the misinformation. Additionally, we found that false claims propagate faster than partially false claims. Compared to a background corpus of COVID-19 tweets, tweets with misinformation are more often concerned with discrediting other information on social media. Authors use less tentative language and appear to be more driven by concerns of potential harm to others. Our results enable us to suggest gaps in the current scientific coverage of the topic as well as propose actions for authorities and social media users to counter misinformation.
Article
The social brain hypothesis approximates the total number of social relationships we are able to maintain at 150. Similar cognitive constraints emerge in several aspects of our daily life, from our mobility to the way we communicate, and might even affect the way we consume information online. Indeed, despite the unprecedented amount of information we can access online, our attention span still remains limited. Furthermore, recent studies have shown that online users are more likely to ignore dissenting information, choosing instead to interact with information adhering to their own point of view. In this paper, we quantitatively analyse users’ attention economy in news consumption on social media by analysing 14 million users interacting with 583 news outlets (pages) on Facebook over a time span of six years. In particular, we explore how users distribute their activity across news pages and topics. On the one hand, we find that, independently of their activity, users show a tendency to follow a very limited number of pages. On the other hand, users tend to interact with almost all the topics presented by their favoured pages. Finally, we introduce a taxonomy accounting for users’ behaviour to distinguish between patterns of selective exposure and interest. Our findings suggest that segregation of users in echo chambers might be an emerging effect of users’ activity on social media and that selective exposure—i.e. the tendency of users to consume information adhering to their preferred narratives—could be a major driver in their consumption patterns.
Article
Mathematical models of social contagion that incorporate networks of human interactions have become increasingly popular; however, very few approaches have tackled the challenges of including complex and realistic properties of socio-technical systems. Here, we define a framework to characterize the dynamics of the Maki–Thompson rumour spreading model in structured populations, and analytically find a previously uncharacterized dynamical phase transition that separates the local and global contagion regimes. We validate our threshold prediction through extensive Monte Carlo simulations. Furthermore, we apply this framework in two real-world systems, the European commuting and transportation network and the Digital Bibliography and Library Project collaboration network. Our findings highlight the importance of the underlying population structure in understanding social contagion phenomena and have the potential to define new intervention strategies aimed at hindering or facilitating the diffusion of information in socio-technical systems. The mathematical modelling of how information spreads in social networks has latterly gained fresh urgency. A study of realistic structured populations now identifies the threshold at which the propagation of rumours becomes contagious, thereby inducing a phase transition.
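For readers unfamiliar with the Maki–Thompson model named above, the following is a minimal Monte Carlo sketch of its classical form. It assumes a single well-mixed population rather than the structured populations studied in the abstract, so it does not reproduce the phase transition reported there; the function name `simulate_maki_thompson` and its parameters are hypothetical. Each individual is an ignorant, a spreader or a stifler: a spreader who contacts an ignorant converts them into a spreader, and a spreader who contacts anyone else becomes a stifler.

```python
import random

IGNORANT, SPREADER, STIFLER = 0, 1, 2


def simulate_maki_thompson(n, seed=0):
    """Simulate the classical Maki-Thompson rumour model.

    Assumption: a well-mixed population of n individuals (no network
    structure). Returns the final list of individual states.
    """
    rng = random.Random(seed)
    state = [IGNORANT] * n
    state[0] = SPREADER          # a single initial spreader
    spreaders = [0]              # indices of the currently active spreaders

    while spreaders:
        # Pick a random active spreader i to initiate a contact.
        idx = rng.randrange(len(spreaders))
        i = spreaders[idx]
        # Pick a contact j uniformly among the other n - 1 individuals.
        j = rng.randrange(n - 1)
        if j >= i:
            j += 1
        if state[j] == IGNORANT:
            # The rumour is passed on: the ignorant becomes a spreader.
            state[j] = SPREADER
            spreaders.append(j)
        else:
            # The contact already knows the rumour: the initiator stops.
            state[i] = STIFLER
            spreaders[idx] = spreaders[-1]
            spreaders.pop()
    return state


def final_ignorant_fraction(n, seed=0):
    """Fraction of the population that never hears the rumour."""
    return simulate_maki_thompson(n, seed).count(IGNORANT) / n
```

In this well-mixed setting the rumour always dies out with part of the population never reached; calling `final_ignorant_fraction(20000)` gives an estimate of that share for one random run.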
Book
While the historical impact of rumours and fabricated content has been well documented, efforts to better understand today’s challenge of information pollution on a global scale are only just beginning. Concern about the implications of dis-information campaigns designed specifically to sow mistrust and confusion and to sharpen existing sociocultural divisions using nationalistic, ethnic, racial and religious tensions is growing. The Council of Europe report on “Information Disorder: Toward an interdisciplinary framework for research and policy making” is an attempt to comprehensively examine information disorder and to outline ways to address it.
Article
Immense amounts of information are now accessible to people, including information that bears on their past, present and future. An important research challenge is to determine how people decide to seek or avoid information. Here we propose a framework of information-seeking that aims to integrate the diverse motives that drive information-seeking and its avoidance. Our framework rests on the idea that information can alter people’s action, affect and cognition in both positive and negative ways. The suggestion is that people assess these influences and integrate them into a calculation of the value of information that leads to information-seeking or avoidance. The theory offers a framework for characterizing and quantifying individual differences in information-seeking, which we hypothesize may also be diagnostic of mental health. We consider biases that can lead to both insufficient and excessive information-seeking. We also discuss how the framework can help government agencies to assess the welfare effects of mandatory information disclosure. Sharot and Sunstein propose a framework of information-seeking, whereby individuals decide to seek or avoid information based on combined estimates of the potential impact of information on their action, affect and cognition.
Article
The initial exponential growth rate of an epidemic is an important measure of the severity of the epidemic, and is also closely related to the basic reproduction number. Estimating the growth rate from the epidemic curve can be a challenge, because it decays with time. For fast epidemics, the estimation is subject to over-fitting due to the limited number of data points available, which also limits our choice of models for the epidemic curve. We discuss the estimation of the growth rate using the maximum likelihood method and simple models.
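As an illustration of the kind of estimate described above, the sketch below fits an exponential growth rate by maximum likelihood. It is not the method of the cited article: it assumes counts that are Poisson distributed with mean c0·exp(r·t), a purely exponential model that is only reasonable for the early phase of the curve. The function names are hypothetical, and the conversion to the basic reproduction number assumes a simple SIR model with recovery rate gamma, for which R0 = 1 + r/gamma.

```python
import numpy as np


def fit_exponential_growth(counts, times=None, n_iter=50, tol=1e-10):
    """Maximum likelihood fit of counts ~ Poisson(c0 * exp(r * t)).

    Assumption: pure exponential growth over the fitted window.
    Returns the pair (r, c0).
    """
    y = np.asarray(counts, dtype=float)
    if times is None:
        t = np.arange(len(y), dtype=float)
    else:
        t = np.asarray(times, dtype=float)
    X = np.column_stack([np.ones_like(t), t])   # columns: intercept, time

    # Starting point: ordinary least squares on the log counts.
    beta = np.linalg.lstsq(X, np.log(y + 0.5), rcond=None)[0]

    # Newton-Raphson on the Poisson log-likelihood (log link).
    for _ in range(n_iter):
        mu = np.exp(X @ beta)                   # fitted means
        grad = X.T @ (y - mu)                   # score vector
        hess = X.T @ (X * mu[:, None])          # Fisher information
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta[1], np.exp(beta[0])


def r0_from_growth_rate(r, gamma):
    """Basic reproduction number under an assumed SIR model: R0 = 1 + r / gamma."""
    return 1.0 + r / gamma
```

A typical call would be `r, c0 = fit_exponential_growth(daily_counts)` followed by `r0_from_growth_rate(r, gamma)`, where `daily_counts` and `gamma` are supplied by the user. With only a handful of early data points the estimate of `r` is sensitive to which window is chosen, which is the over-fitting concern raised in the abstract.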
Article
With people moving out of physical public spaces due to containment measures to tackle the novel coronavirus (COVID-19) pandemic, online platforms become even more prominent tools to understand social discussion. Studying social media can be informative to assess how we are collectively coping with this unprecedented global crisis. However, social media platforms are also populated by bots, automated accounts that can amplify certain topics of discussion at the expense of others. In this paper, we study 43.3M English tweets about COVID-19 and provide early evidence of the use of bots to promote political conspiracies in the United States, in stark contrast with humans who focus on public health concerns.
Article
Echo chambers and opinion polarization recently quantified in several sociopolitical contexts and across different social media raise concerns on their potential impact on the spread of misinformation and on the openness of debates. Despite increasing efforts, the dynamics leading to the emergence of these phenomena remain unclear. We propose a model that introduces the dynamics of radicalization as a reinforcing mechanism driving the evolution to extreme opinions from moderate initial conditions. Inspired by empirical findings on social interaction dynamics, we consider agents characterized by heterogeneous activities and homophily. We show that the transition between a global consensus and emerging radicalized states is mostly governed by social influence and by the controversialness of the topic discussed. Compared with empirical data of polarized debates on Twitter, the model qualitatively reproduces the observed relation between users’ engagement and opinions, as well as opinion segregation in the interaction network. Our findings shed light on the mechanisms that may lie at the core of the emergence of echo chambers and polarization in social media.