Dierences between Health Related News Articles from Reliable
and Unreliable Media
Sameer Dhoju, Md Main Uddin Rony, Naeemul Hassan
Department of Computer Science and Engineering, The University of Mississippi
ABSTRACT
In this study, we examine a collection of health-related news articles published by reliable and unreliable media outlets. Our analysis shows that there are structural, topical, and semantic differences in the way reliable and unreliable media outlets conduct health journalism. We argue that the findings from this study will be useful for combating the health disinformation problem.
ACM Reference Format:
Sameer Dhoju, Md Main Uddin Rony, Naeemul Hassan. 2019. Differences between Health Related News Articles from Reliable and Unreliable Media. In Proceedings of Computation+Journalism Symposium (C+J’19). ACM, New York, NY, USA, Article 4, 5 pages. https://doi.org/10.475/1234
1 INTRODUCTION
Of the 20 most-shared articles on Facebook in 2016 with the word “cancer” in the headline, more than half were discredited by doctors and health authorities [6]. The spread of health-related hoaxes is not new. However, the advent of the Internet, social networking sites (SNS), and click-through-rate (CTR)-based pay policies has made it possible to create hoaxes/“fake news” that are published at a larger scale and reach a broader audience with higher speed than ever [14]. Misleading or erroneous health news can be dangerous, as it can lead to critical situations. [12] reported a measles outbreak in Europe due to a lower immunization rate, which experts believed was the result of anti-vaccination campaigns driven by false news about the MMR vaccine. Moreover, misinformation can spoil the credibility of health-care providers and create a lack of trust in medicines, foods, and vaccines. Recently, researchers have started to address the fake news problem in general [19, 27]. However, health disinformation is a relatively unexplored area. According to a report from the Pew Research Center [7], 72% of adult internet users search online for information about a range of health issues. So, it is important to ensure that the health information available online is accurate and of good quality. There are some authoritative and reliable entities, such as the National Institutes of Health (NIH)¹ or Health On the Net², which provide high-quality health information. Also, there are some fact-checking
1https://www.nih.gov/
2https://www.hon.ch/en/
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
C+J’19, February 2019, Miami, Florida USA
© 2019 Association for Computing Machinery.
ACM ISBN 123-4567-24-567/08/06…$15.00
https://doi.org/10.475/1234
sites, such as Snopes.com³ and Quackwatch.org⁴, that regularly debunk health- and medical-related misinformation. Nonetheless, these sites are incapable of busting the deluge of health disinformation continuously produced by unreliable health information outlets (e.g., RealFarmacy.com, Health Nut News). Moreover, bots in social networks significantly promote unsubstantiated health-related claims [8]. Researchers have tried developing automated health hoax detection techniques but have had limited success for several reasons, such as small training data sizes and a lack of awareness among users [10, 11, 18, 30].
The objective of this paper is to identify discriminating features that can potentially separate reliable health news from unreliable health news by leveraging a large-scale dataset. We examine how reliable and unreliable media outlets conduct health journalism. First, we prepare a large dataset of health-related news articles which were produced and published by a set of reliable media outlets and unreliable media outlets. Then, using a systematic content analysis, we identify the features which separate a reliable outlet sourced health article from an unreliable sourced one. These features incorporate the structural, topical, and semantic differences in health articles from these outlets. For instance, our structural analysis finds that unreliable media outlets use clickbait headlines in their health-related news significantly more than reliable outlets do. Our semantic analysis shows that, on average, a health news article from reliable media contains more quotations than one from unreliable media. We argue that these features can be critical in understanding health misinformation and designing systems to combat such disinformation. In the future, our goal is to develop a machine learning model using these features to distinguish unreliable media sourced health news from reliable articles.
2 RELATED WORK
There has been extensive work on how scientific medical research outcomes should be disseminated to the general public by following health journalism protocols [3, 4, 16, 26, 28]. For instance, [20] suggests that it is necessary to integrate journalism studies, strategic communication concepts, and health professional knowledge to successfully disseminate professional findings. Some researchers particularly focused on the spread of health misinformation in social media. For example, [10] analyzes Zika⁵-related misinformation on Twitter. In particular, it shows that tracking health misinformation in social media is not trivial and requires some expert supervision. It used crowdsourcing to annotate a collection of tweets and used the annotated data to build a rumor classification model. One limitation of this work is that the dataset used is too small (6 rumors) to support
3https://www.snopes.com/
4http://www.quackwatch.org/
5https://en.wikipedia.org/wiki/Zika_virus
arXiv:1811.01852v1 [cs.SI] 5 Nov 2018
a general conclusion. Moreover, unlike our work, it did not consider features of the actual news articles. [11] examines individuals on social media who post questionable health-related information, in particular promoting cancer treatments which have been shown to be ineffective. It develops a feature-based supervised classification model to automatically identify users who are comparatively more susceptible to health misinformation. There are other works which focus on automatically identifying health misinformation. For example, [17] developed a classifier to detect misinformative posts in health forums. One limitation of this work is that the training data was labeled by only two individuals. Researchers have also worked on building tools that can help a user easily consume health information. [18] developed the “VAC Medi+board”, an interactive visualization platform integrating Twitter data and news coverage from a reliable source called MediSys⁶. It covers public debate related to vaccines and helps users easily browse health information on a certain vaccine-related topic.
Our study differs significantly from this existing research. Instead of depending on a small sample of health hoaxes like some of the existing works, we take a different approach and focus on the source outlets. This gives us the benefit of investigating with a larger dataset. We investigate the journalistic practice of reliable and unreliable health outlets, an area which, to the best of our knowledge, has not been studied.
3 DATA PREPARATION
To investigate how reliable and unreliable media outlets portray health information, we need a reasonably sized collection of health-related news articles from both sides. Unfortunately, no available dataset is of adequate size. For this reason, we prepare a dataset of about 30,000 health-related news articles disseminated by reliable or unreliable outlets within the years 2015–2018. Below, we describe the preparation process in detail.
3.1 Media Outlet Selection
The rst challenge is to identify reliable and unreliable outlets. The
matter of reliability is subjective. We decided to consider the outlets
which have been cross-checked as reliable or unreliable by credible
sources.
3.1.1 Reliable Media. We identied 29 reliable media outlets from
three sources–
i)
11 of them are certied by the Health On the
Net [
22
], a non-prot organization that promotes transparent and
reliable health information online. It is ocially related with the
World Health Organization (WHO) [
31
].
ii)
8from U.S. govern-
ment’s health-related centers and institutions (e.g., CDC, NIH,
NCBI), and
iii)
10 from the most circulated broadcast [
25
] main-
stream media outlets (e.g., CNN, NBC). Note, the mainstream out-
lets generally have a separate section for health information (e.g.,
https://www
.
cnn
.
com/health). As our goal is to collect health-
related news, we restricted ourselves to their health portals only.
3.1.2 Unreliable Media. Dr. Melissa Zimdars, a communication and media expert, prepared a list of false, misleading, clickbaity, and satirical media outlets [33, 34]. Similar lists are also maintained by
6http://medisys.newsbrief.eu
Reliable
everydayhealth, WebMD, statnews, AmericanHeart, BBCLifestyleHealth, CBSHealth, FoxNewsHealth, WellNYT, latimesscience, tampabaytimeshealth, philly.comhealth, AmericanHeart, AmericanCancerSociety, HHS, CNNHealth, cancer.gov, FDA, mplus.gov, NHLBI, kidshealthparents, ahrq.gov, healthadvocateinc, HealthCentral, eMedicineHealth, C4YWH, BabyCenter, MayoClinic, MedicineNet, healthline
Unreliable
liveahealth, healthexpertgroup, healthysolo, organichealthcorner, justhealthylifestyle1, REALfarmacyCOM, thetruthaboutcancer, BookforHealthyLife, viralstories.bm, justhealthyway, thereadersfile, pinoyhomeremedies, onlygenuinehealth, greatremediesgreathealth, HealthRanger, thefoodbabe, AgeofAutism, HealthNutNews, consciouslifenews, HealthImpactNews
Table 1: List of Facebook page ids of the reliable and unreliable outlets. Some of them are unavailable now.
Wikipedia [32] and informationisbeautiful.net [13]. We identified 6 media outlets which primarily spread health-related misinformation and are present in these lists. Another source for identifying unreliable outlets is Snopes.com, a popular hoax-debunking website that fact-checks news from different domains, including health. We followed the health and medical hoaxes debunked by Snopes.com and identified 14 media outlets which sourced those hoaxes. In total, we identified 20 unreliable outlets. Table 1 lists the Facebook page ids of all the reliable and unreliable outlets that have been used in this study.
3.2 Data Collection
The next challenge is to gather the news articles published by the selected outlets. We identified the official Facebook pages of each of the 49 media outlets and collected all the link-posts⁷ shared by the outlets between January 1, 2015 and April 2, 2018⁸ using the Facebook Graph API. For each post, we gathered the corresponding news article link, the status message, and the posting date.
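The field extraction above can be sketched as follows. This is a minimal illustration, not the authors' code; the sample payload is a hypothetical stand-in for the shape of a Graph API page-feed response.

```python
# Hypothetical Graph API feed payload (illustrative shape, not real data)
sample_feed = {
    "data": [
        {"type": "link", "link": "https://example.com/story-1",
         "message": "New study on sleep", "created_time": "2016-03-01T12:00:00+0000"},
        {"type": "photo", "created_time": "2016-03-02T09:30:00+0000"},
        {"type": "link", "link": "https://example.com/story-2",
         "message": "Flu season tips", "created_time": "2016-03-03T08:15:00+0000"},
    ]
}

def extract_link_posts(feed):
    """Keep only link-type posts and the three fields retained in the study:
    article URL, status message, and posting date."""
    return [
        {"url": p["link"], "status": p.get("message", ""), "posted": p["created_time"]}
        for p in feed["data"] if p.get("type") == "link"
    ]

posts = extract_link_posts(sample_feed)
print(len(posts))  # 2 (the photo post is dropped)
```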
3.2.1 News Article Scraping. We used a Python package named Newspaper3k⁹ to gather the news-article data. Given a news article link, this package provides the headline, body, author name (if present), and publish date of the article. It also provides the visual elements (images, videos) used in an article. In total, we collected data for 29,047 articles from reliable outlets and 15,017 from unreliable outlets.
3.2.2 Filtering non-Health News Articles. Even though we restricted ourselves to health-related outlets, we observed that the outlets also published or shared non-health (e.g., sports, entertainment, weather) news. We removed these non-health articles from our dataset and only kept health, food & drink, or fitness & beauty related articles. Specifically, for each news article, we used the document categorization service provided by the Google Cloud Natural Language API¹⁰ to determine its topic. If an article does not belong to one of the three above-mentioned topics, it is filtered out. This step reduced the dataset size to 27,589: 18,436 from reliable outlets and 9,153 from unreliable outlets. We used only this health-related dataset in all the experiments of this paper. Figure 1 shows the health-related news percentage distribution for reliable outlets and unreliable outlets using box-plots. For each of the 29 reliable outlets, we measure the percentage of health news and then use these 29 percentage values
7Facebook allows posting statuses, pictures, videos, events, links, etc. We collected the link-type posts only.
8After that, Facebook limited access to pages as a result of the Cambridge Analytica incident.
9https://newspaper.readthedocs.io/en/latest/
10https://cloud.google.com/natural-language/
Dierences between Health Related News Articles from Reliable and Unreliable Media C+J’19, February 2019, Miami, Florida USA
to draw the box-plot for the reliable outlets; likewise for unreliable.
We observe that the reliable outlets (median 72%) publish news on
health topics comparatively less than unreliable outlets (median
85%).
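The per-outlet statistic behind Figure 1 is a simple ratio. The sketch below (with made-up counts, not the paper's data) shows the computation; each resulting percentage is one data point in the corresponding group's box-plot.

```python
# counts: outlet -> (health articles kept, total articles scraped)
# Illustrative numbers only; the paper computes this per outlet after topic filtering.
counts = {
    "outlet_a": (180, 250),
    "outlet_b": (90, 100),
    "outlet_c": (60, 120),
}

# One health-news percentage per outlet; these values feed the group's box-plot.
percentages = {o: 100.0 * h / t for o, (h, t) in counts.items()}
print(sorted(percentages.values()))  # [50.0, 72.0, 90.0]
```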
Figure 1: Comparison between reliable and unreliable out-
lets with respect to presence of health-related news contents
4 ANALYSIS
Using this dataset, we conduct content analysis to examine structural, topical, and semantic differences in health news from reliable and unreliable outlets.
4.1 Structural Dierence
4.1.1 Headline. The headline is a key element of a news article. According to a study by the American Press Institute and the Associated Press [15], only 4 out of 10 Americans read beyond the headline. So, it is important to understand how reliable and unreliable outlets construct the headlines of their health-related news. According to [1], a long headline results in a significantly higher click-through rate (CTR) than a short headline does. We observe that the average headline length of an article from reliable outlets and an article from unreliable outlets is 8.56 words and 12.13 words, respectively. So, on average, an unreliable outlet’s headline has a higher chance of receiving more clicks or attention than a reliable outlet’s headline. To further investigate this, we examine
the clickbaitiness of the headlines. The term clickbait refers to a form of web content (headline, image, thumbnail, etc.) that employs writing formulas, linguistic techniques, and suspense-creating visual elements to trick readers into clicking links, but does not deliver on its promises [9]. Chen et al. [2] reported that clickbait usage is a common pattern in false news articles. We investigate to what extent the reliable and unreliable outlets use clickbait headlines in their health articles. For each article headline, we test whether it is clickbait or not using two supervised clickbait detection models: a sub-word embedding based deep learning model [24] and a feature engineering based Multinomial Naive Bayes model [21]. Agreement between these models was measured as 0.44 using Cohen’s κ. We mark a headline as clickbait if both models labeled it as clickbait. We observe that 27.29% (5,031 out of 18,436) of the headlines from reliable outlets are clickbait. In unreliable outlets, the percentage is significantly higher: 40.03% (3,664 out of 9,153). So, it is evident that the unreliable outlets use more clickbait than reliable outlets in their health journalism.
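The agreement measure and the conservative "both models must agree" rule can be sketched as below. The label sequences are hypothetical; only the κ formula and the AND-ensemble mirror the procedure described above.

```python
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's kappa for two equal-length label sequences."""
    assert len(a) == len(b)
    n = len(a)
    po = sum(x == y for x, y in zip(a, b)) / n               # observed agreement
    ca, cb = Counter(a), Counter(b)
    pe = sum(ca[k] * cb[k] for k in set(a) | set(b)) / n**2  # chance agreement
    return (po - pe) / (1 - pe)

# Hypothetical predictions from the two detectors (1 = clickbait)
deep = [1, 1, 0, 1, 0, 0, 1, 0]
nb   = [1, 0, 0, 1, 0, 1, 1, 0]

kappa = cohens_kappa(deep, nb)
# Mark a headline as clickbait only if BOTH models say so
ensemble = [d and n for d, n in zip(deep, nb)]
print(round(kappa, 2), sum(ensemble))  # 0.5 3
```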
Figure 2: Distribution of clickbait patterns
We further investigate the linguistic patterns used in the clickbait headlines. In particular, we analyze the presence of some common patterns which are generally employed in clickbait according to [1, 23]. The patterns are:
Presence of demonstrative adjectives (e.g., this, these, that)
Presence of numbers (e.g., 10, ten)
Presence of modal words (e.g., must, should, could, can)
Presence of question or WH words (e.g., what, who, how)
Presence of superlative words (e.g., best, worst, never)
Figure 2 shows the distribution of these patterns among the clickbait headlines of reliable and unreliable outlets. Note that one headline may contain more than one pattern. For example, the headline “Are these the worst 9 diseases in the world?” contains four of the above patterns. This is why the percentages do not sum to 100%. We see that unreliable outlets use demonstrative adjectives and numbers significantly more than the reliable outlets.
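A pattern check of this kind can be implemented with simple regular expressions. The word lists below are illustrative, not the exact lexicons used in the paper; the question pattern is treated as "starts with a WH word or ends with a question mark".

```python
import re

# Regex sketches of the five headline patterns (illustrative word lists)
PATTERNS = {
    "demonstrative": re.compile(r"\b(this|these|that|those)\b", re.I),
    "number": re.compile(r"\b(\d+|one|two|three|four|five|six|seven|eight|nine|ten)\b", re.I),
    "modal": re.compile(r"\b(must|should|could|can)\b", re.I),
    "question": re.compile(r"^(what|who|whom|whose|which|when|where|why|how)\b|\?\s*$", re.I),
    "superlative": re.compile(r"\b(best|worst|never|greatest|most)\b", re.I),
}

def headline_patterns(headline):
    """Return the set of pattern names present in a headline."""
    return {name for name, rx in PATTERNS.items() if rx.search(headline)}

found = headline_patterns("Are these the worst 9 diseases in the world?")
print(sorted(found))  # ['demonstrative', 'number', 'question', 'superlative']
```

On the example headline from the text, this yields the four patterns noted above (demonstrative "these", number "9", a question, and superlative "worst").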
Figure 3: Distribution of (Shared Date - Published Date) gaps
in days
4.1.2 Time-span Between Publishing and Sharing. We investigate the time difference between an article’s publish date and its share date (on Facebook). Figure 3 shows density plots of Facebook Share Date – Article Publish Date for reliable and unreliable outlets. We observe that both outlet categories share their articles on Facebook within a short period after publishing. However, unreliable outlets seem to have a considerable time gap compared to reliable outlets. This could be because of re-sharing an article after a long period. To verify that, we checked how often an article is re-shared on Facebook. We find that on average a reliable article is shared 1.057 times whereas an unreliable article is shared 1.222 times.
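The gap plotted in Figure 3 is a straightforward date difference; a minimal sketch (with made-up timestamps) is:

```python
from datetime import datetime

def gap_in_days(publish_iso, share_iso):
    """Days between an article's publish date (from the scraper) and the
    Facebook share date (from the post)."""
    publish = datetime.fromisoformat(publish_iso)
    share = datetime.fromisoformat(share_iso)
    return (share - publish).days

# Illustrative timestamps, not real data
print(gap_in_days("2016-03-01T08:00:00+00:00", "2016-03-04T10:30:00+00:00"))  # 3
```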
(a) Image (b) Quotation (c) Link
Figure 4: Distribution of average number of image/quotation/link per article from reliable and unreliable outlets.
(a) RT1 (b) RT2 (c) RT3
(d) UT1 (e) UT2 (f) UT3
Figure 5: Topic modeling (k=3) of articles from reliable outlets (top, denoted as RT) and from unreliable outlets (bottom, denoted as UT).
4.1.3 Use of visual media. We examined how often the outlets use images in their articles. Our analysis finds that on average an article from reliable outlets uses 13.83 images and an article from unreliable outlets uses 14.22 images. Figure 4a shows density plots of the average number of images per article for both outlet categories. We observe that a good portion of unreliable outlet sourced articles uses a high number of images (more than 20).
4.2 Topical Dierence
All the articles which we examined are health-related. However,
the health domain is considerably broad and it covers many topics.
We hypothesize that there are dierences between the health topics
which are discussed in reliable outlets and in unreliable outlets. To
test that, we conduct an unsupervised and a supervised analysis.
4.2.1 Topic Modeling. We use the Latent Dirichlet Allocation (LDA) algorithm to model the topics in the news articles. The number of topics, k, was set to 3. Figure 5 shows three topics for each of the outlet categories. Each topic is modeled by the top-10 important words in that topic. The font size of each word is proportional to its importance. Figures 5a and 5d indicate that “cancer” is a common topic in reliable and unreliable outlets. However, the words study, said, percent, research, and their font sizes in Figure 5a indicate that the topic “cancer” is associated with research studies, facts, and references in reliable outlets. On the contrary, unreliable outlets have the words vaccine, autism, and risk in Figure 5d, which suggests discussion regarding how vaccines put people at risk of autism and cancer, an unsubstantiated claim generally propagated by unreliable media¹¹,¹². Figures 5e and 5f suggest discussions about weight loss, skin, and hair care products (e.g., essential oil, lemon). Topics in Figures 5b and 5c mostly discuss flu, virus, skin infection, exercise, diabetes, and so on.
(a) Reliable (b) Unreliable
Figure 6: Top-10 topics in reliable and unreliable outlets.
4.2.2 Topic Categorization. In addition to topic modeling, we categorically analyze the articles’ topics using the Google Cloud Natural Language API¹³. Figure 6 shows the top-10 topics in the reliable and unreliable outlets. In the case of reliable outlets, the distribution is significantly dominated by health condition. On the other hand, in the case of unreliable outlets, the percentages of nutrition and food are noticeable. Only 4 of the 10 categories are common to the two outlet groups. Unreliable topics include weight loss, hair care, and face & body care. This finding supports our claim from the topic modeling analysis.
4.3 Semantic Dierence
We analyze what eorts the outlets make to make a logical and
meaningful health news. Specically, we consider to what extent
the outlets use quotations and hyperlinks. Use of quotation and
hyperlinks in a news article is associated with credibility [
5
,
29
].
Presence of quotation and hyperlinks indicates that an article is
11https://www.webmd.com/brain/autism/do-vaccines-cause-autism
12https://www.skepticalraptor.com/skepticalraptorblog.php/polio-vaccine-causes-cancer-myth/
13https://cloud.google.com/natural-language/
Dierences between Health Related News Articles from Reliable and Unreliable Media C+J’19, February 2019, Miami, Florida USA
logically constructed and supported with credible factual informa-
tion.
4.3.1 Quotation. We use the Stanford QuoteAnnotator¹⁴ to identify the quotations in a news article. Figure 4b shows density plots of the number of quotations per article for reliable and unreliable outlets. We observe that unreliable outlets use fewer quotations compared to reliable outlets. We find that the average number of quotations per article is 1 and 3 in unreliable and reliable outlets, respectively. This suggests that reliable outlet sourced articles are more credible than unreliable ones.
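The paper uses the Stanford QuoteAnnotator (a CoreNLP component) for this step. As a lightweight stand-in for illustration only, direct quotations can be counted with a regex over straight and curly quote pairs:

```python
import re

# Matches "straight-quoted" and \u201ccurly-quoted\u201d spans (simplified stand-in
# for the Stanford QuoteAnnotator, which also attributes quotes to speakers)
QUOTE_RE = re.compile(r'"[^"]+"|\u201c[^\u201d]+\u201d')

def count_quotations(text):
    return len(QUOTE_RE.findall(text))

article = ('"This is a breakthrough," the researcher said. '
           'Officials added, \u201cmore trials are needed.\u201d')
print(count_quotations(article))  # 2
```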
4.3.2 Hyperlink. We examine the use of hyperlinks in the articles. On average, a reliable outlet sourced article contains 8.4 hyperlinks and an unreliable outlet sourced article contains 6.8 hyperlinks. Figure 4c shows density plots of the number of links per article for reliable and unreliable outlets. The peaks indicate that most of the articles from reliable outlets have close to 8 (median) hyperlinks. On the other hand, most of the unreliable outlet articles have fewer than 2 hyperlinks. This analysis again suggests that reliable sourced articles are more credible than unreliable outlet articles.
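Hyperlink counting can be done with the standard-library HTML parser; the paper does not specify its method, so this is one possible sketch:

```python
from html.parser import HTMLParser

class LinkCounter(HTMLParser):
    """Count <a> tags that carry an href attribute in an article's HTML body."""
    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a" and any(k == "href" for k, _ in attrs):
            self.count += 1

# Illustrative article snippet, not real data
html = ('<p>See the <a href="https://example.org/study">study</a> and '
        '<a href="https://example.org/data">data</a>.</p>')
counter = LinkCounter()
counter.feed(html)
print(counter.count)  # 2
```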
5 CONCLUSION AND FUTURE WORK
In this paper, we closely looked at structural, topical, and semantic differences between articles from reliable and unreliable outlets. Our findings reconfirm some existing claims, such as that unreliable outlets use clickbait headlines to catch the attention of users. In addition, this study finds new patterns that can potentially help identify health disinformation. For example, we find that fewer quotations and hyperlinks are associated with unreliable outlets. However, there are some limitations to this study. For instance, we did not consider videos, cited experts, user comments, and other information. In the future, we want to overcome these limitations and leverage the findings of this study to combat health disinformation.
REFERENCES
[1] Chris Breaux. (accessed September 28, 2018). "You’ll Never Guess How Chartbeat’s Data Scientists Came Up With the Single Greatest Headline". http://blog.chartbeat.com/2015/11/20/youll-never-guess-how-chartbeats-data-scientists-came-up-with-the-single-greatest-headline/
[2] Yimin Chen, Niall J Conroy, and Victoria L Rubin. 2015. Misleading online content: Recognizing clickbait as false news. In Proceedings of the 2015 ACM on Workshop on Multimodal Deception Detection. ACM, 15–19.
[3] Nicole K Dalmer. 2017. Questioning reliability assessments of health information on social media. Journal of the Medical Library Association: JMLA 105, 1 (2017), 61.
[4] Irja Marije de Jong, Frank Kupper, Marlous Arentshorst, and Jacqueline Broerse. 2016. Responsible reporting: neuroimaging news in the age of responsible research and innovation. Science and Engineering Ethics 22, 4 (2016), 1107–1130.
[5] Juliette De Maeyer. 2012. The journalistic hyperlink: Prescriptive discourses about linking in online news. Journalism Practice 6, 5-6 (2012), 692–701.
[6] Katie Forster. (accessed October 30, 2018). Revealed: How dangerous fake health news conquered Facebook. https://www.independent.co.uk/life-style/health-and-families/health-news/fake-news-health-facebook-cruel-damaging-social-media-mike-adams-natural-health-ranger-conspiracy-a7498201.html
[7] Susannah Fox. (accessed October 30, 2018). The social life of health information. http://www.pewresearch.org/fact-tank/2014/01/15/the-social-life-of-health-information/
[8] Gaby Galvin. (accessed October 30, 2018). How Bots Could Hack Your Health. https://www.usnews.com/news/healthiest-communities/articles/2018-07-24/how-social-media-bots-could-compromise-public-health
14https://stanfordnlp.github.io/CoreNLP/quote.html
[9] Bryan Gardiner. (accessed September 28, 2018). "You’ll Be Outraged at How Easy It Was to Get You to Click on This Headline". https://www.wired.com/2015/12/psychology-of-clickbait/
[10] Amira Ghenai and Yelena Mejova. 2017. Catching Zika Fever: Application of Crowdsourcing and Machine Learning for Tracking Health Misinformation on Twitter. In Healthcare Informatics (ICHI), 2017 IEEE International Conference on. IEEE, 518–518.
[11] Amira Ghenai and Yelena Mejova. 2018. Fake Cures: User-centric Modeling of Health Misinformation in Social Media. In 2018 ACM Conference on Computer-Supported Cooperative Work and Social Computing (CSCW). ACM.
[12] Muiris Houston. (accessed October 31, 2018). Measles back with a vengeance due to fake health news. https://www.irishtimes.com/opinion/measles-back-with-a-vengeance-due-to-fake-health-news-1.3401960
[13] informationisbeautiful.net. 2016. Unreliable/Fake News Sites & Sources. https://docs.google.com/spreadsheets/d/1xDDmbr54qzzG8wUrRdxQlC1dixJSIYqQUaXVZBqsJs. (2016).
[14] Mathew Ingram. (accessed October 30, 2018). The internet didn’t invent viral content or clickbait journalism — there’s just more of it now, and it happens faster. https://gigaom.com/2014/04/01/the-internet-didnt-invent-viral-content-or-clickbait-journalism-theres-just-more-of-it-now-and-it-happens-faster/
[15] American Press Institute and the Associated Press-NORC Center for Public Affairs Research. (accessed September 28, 2018). The Personal News Cycle: How Americans choose to get their news. https://www.americanpressinstitute.org/publications/reports/survey-research/how-americans-get-news/
[16] Marjorie Kagawa-Singer and Shaheen Kassim-Lakha. 2003. A strategy to reduce cross-cultural miscommunication and increase the likelihood of improving health outcomes. Academic Medicine 78, 6 (2003), 577–587.
[17] Alexander Kinsora, Kate Barron, Qiaozhu Mei, and VG Vinod Vydiswaran. 2017. Creating a Labeled Dataset for Medical Misinformation in Health Forums. In Healthcare Informatics (ICHI), 2017 IEEE International Conference on. IEEE, 456–461.
[18] Patty Kostkova, Vino Mano, Heidi J Larson, and William S Schulz. 2016. VAC Medi+board: Analysing vaccine rumours in news and social media. In Proceedings of the 6th International Conference on Digital Health Conference. ACM, 163–164.
[19] David MJ Lazer, Matthew A Baum, Yochai Benkler, Adam J Berinsky, Kelly M Greenhill, Filippo Menczer, Miriam J Metzger, Brendan Nyhan, Gordon Pennycook, David Rothschild, et al. 2018. The science of fake news. Science 359, 6380 (2018), 1094–1096.
[20] Felisbela Lopes, Teresa Ruão, Zara Pinto Coelho, and Sandra Marinho. 2009. Journalists and health care professionals: what can we do about it?. In 2009 Annual Conference of the International Association for Media and Communication Research (IAMCR), “Human Rights and Communication”. 1–15.
[21] Saurabh Mathur. (accessed September 24, 2018). Clickbait Detector. https://github.com/saurabhmathur96/clickbait-detector
[22] HEALTH ON THE NET. (accessed September 24, 2018). https://www.hon.ch/en/
[23] Matthew Opatrny. (accessed September 28, 2018). "9 Headline Tips to Help You Connect with Your Target Audience". https://www.outbrain.com/blog/9-headline-tips-to-help-marketers-and-publishers-connect-with-their-target-audiences/
[24] Md Main Uddin Rony, Naeemul Hassan, and Mohammad Yousuf. 2017. Diving Deep into Clickbaits: Who Use Them to What Extents in Which Topics with What Effects?. In Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2017. ACM, 232–239.
[25] Michael Schneider. (accessed September 24, 2018). Most-Watched Television Networks: Ranking 2016’s Winners and Losers. https://www.indiewire.com/2016/12/cnn-fox-news-msnbc-nbc-ratings-2016-winners-losers-1201762864/
[26] Gary Schwitzer. 2008. How do US journalists cover treatments, tests, products, and procedures? An evaluation of 500 stories. PLoS Medicine 5, 5 (2008), e95.
[27] Kai Shu, Amy Sliva, Suhang Wang, Jiliang Tang, and Huan Liu. 2017. Fake news detection on social media: A data mining perspective. ACM SIGKDD Explorations Newsletter 19, 1 (2017), 22–36.
[28] Miriam Shuchman and Michael S Wilkes. 1997. Medical scientists and health news reporting: a case of miscommunication. Annals of Internal Medicine 126, 12 (1997), 976–982.
[29] S Shyam Sundar. 1998. Effect of source attribution on perception of online news stories. Journalism & Mass Communication Quarterly 75, 1 (1998), 55–68.
[30] Emily K Vraga and Leticia Bode. 2017. Using Expert Sources to Correct Health Misinformation in Social Media. Science Communication 39, 5 (2017), 621–645.
[31] World Health Organization (WHO). (accessed September 24, 2018). http://www.who.int/
[32] Wikipedia. (accessed September 24, 2018). List of fake news websites. https://bit.ly/2moBDvA
[33] Wikipedia. (accessed September 24, 2018). Wikipedia:Zimdars’ fake news list. https://bit.ly/2ziHafj
[34] Melissa Zimdars. 2016. My ‘fake news list’ went viral. But made-up stories are only part of the problem. https://www.washingtonpost.com/posteverything/wp/2016/11/18/my-fake-news-list-went-viral-but-made-up-stories-are-only-part-of-the-problem. (2016).
Article
Full-text available
This narrative review examines assessments of the reliability of online health information retrieved through social media to ascertain whether health information accessed or disseminated through social media should be evaluated differently than other online health information. Several medical, library and information science, and interdisciplinary databases were searched using terms relating to social media, reliability, and health information. While social media’s increasing role in health information consumption is recognized, studies are dominated by investigations of traditional (i.e., non-social media) sites. To more richly assess constructions of reliability when using social media for health information, future research must focus on health consumers’ unique contexts, virtual relationships, and degrees of trust within their social networks.
Conference Paper
Full-text available
Digital health has revolutionised healthcare [1-2], with implications for understanding public reactions to health emergencies and interventions [3], while real-Time analysis provides a new opportunity for rapidly detecting changes in public confidence in vaccines [4, 5, 6]. Medi+board implements tools for infectious disease surveillance and outbreak management [7], and the novel aim of the VAC medi+board is to design an interactive visualisation framework integrating heterogeneous real-Time data streams with social network data, to meet information needs as articulated by the LSHTM Vaccine Confidence Project (VCP) investigators.
Article
Full-text available
Besides offering opportunities in both clinical and non-clinical domains, the application of novel neuroimaging technologies raises pressing dilemmas. 'Responsible Research and Innovation' (RRI) aims to stimulate research and innovation activities that take ethical and social considerations into account from the outset. We previously identified that Dutch neuroscientists interpret "responsible innovation" as educating the public on neuroimaging technologies via the popular press. Their aim is to mitigate (neuro)hype, an aim shared with the wider emerging RRI community. Here, we present results of a media-analysis undertaken to establish whether the body of articles in the Dutch popular press presents balanced conversations on neuroimaging research to the public. We found that reporting was mostly positive and framed in terms of (healthcare) progress. There was rarely a balance between technology opportunities and limitations, and even fewer articles addressed societal or ethical aspects of neuroimaging research. Furthermore, neuroimaging metaphors seem to favour oversimplification. Current reporting is therefore more likely to enable hype than to mitigate it. How can neuroscientists, given their self-ascribed social responsibility, address this conundrum? We make a case for a collective and shared responsibility among neuroscientists, journalists and other stakeholders, including funders, committed to responsible reporting on neuroimaging research.
Article
Full-text available
Are quoted sources in online news as psychologically meaningful us those in printed and broadcast news? A within-subjects experiment was designed to answer this question. On a web site, forty-eight subjects read three online news stories with quotes and three stories without source attribution. They rated stories with quotes significantly higher in credibility and quality than identical stories without quotes. However, quotes did not seem to affect their ratings of liking for - and representativeness (newsworthiness) of- online news.