Content uploaded by Abeed Sarker
Author content
All content in this area was uploaded by Abeed Sarker on Jun 02, 2021
Content may be subject to copyright.
All content in this area was uploaded by Abeed Sarker on Apr 07, 2021
Syndromic surveillance for COVID19 from Reddit using multi-platform
lexicons
Abimbola Leslie, MPH,1 Sahithi Lakamana, MS,2 Abeed Sarker, PhD2
1Laney Graduate School, Emory University, Atlanta, GA 30322
2Department of Biomedical Informatics, School of Medicine, Emory University, Atlanta, GA 30322
Introduction
COVID19 presents challenges that warrant the
exploration of additional sources of data for
syndromic surveillance.
Social media, which has become a particularly
important channel for communication due to social
distancing guidelines, has been shown to contain
large amounts of COVID19-related chatter, which
can be captured and analyzed in real time.
In this study, we utilized a COVID19 symptom
lexicon to discover and compare automatically
detected COVID19 symptoms reported publicly on
Twitter and Reddit.
Methods
Results
Reddit data: 8,435 posts; 893 COVID19-positive users.
Symptom distributions between Twitter and Reddit
had a significant and high correlation (r² = 0.95;
Figure).
The Twitter lexicon alone, without combining with Reddit,
obtained an F1-score of 0.71.
Notable differences: fatigue and pain-related
symptoms. Reddit users appear to consistently
report higher numbers of symptoms compared to
Twitter users (p=0.0076; two-tailed paired t-test).
No significant difference in the mean number of
symptoms reported per person (4.94 vs. 5.55;
p=0.0503).
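The F1-score reported above combines precision and recall into a single number. The sketch below shows the arithmetic on sets of detected versus manually annotated symptoms. The function name `precision_recall_f1` and the example symptom sets are hypothetical, and the poster does not state the unit of evaluation (per post, per mention, or per user), so this is only an illustration of the metric, not the authors' evaluation script.

```python
def precision_recall_f1(predicted: set, gold: set) -> tuple:
    """Return (precision, recall, F1) for two sets of symptom labels."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: symptoms found by a lexicon vs. a manual annotation.
detected = {"fever", "cough", "fatigue"}
annotated = {"fever", "cough", "anosmia", "headache"}
precision, recall, f1 = precision_recall_f1(detected, annotated)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```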
Summary and Conclusions
The symptom lexicon developed is largely portable
across networks.
A multi-network syndromic surveillance approach
over social media data has the potential to
complement existing syndromic surveillance.
Data Collection
• Manual annotation of Reddit data with the
flair “tested positive”
Matching of lexicon entries
• Levenshtein ratio used to match similarities
with lexicon entries
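Fuzzy matching against a lexicon with a Levenshtein ratio can be sketched as follows. This is a minimal illustration under stated assumptions: the ratio is normalized as 1 minus the edit distance divided by the longer string length (one common normalization; the poster does not say which was used), and the threshold of 0.85, the sliding-window strategy, the function names, and the small lexicon are all hypothetical.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Edit distance where insertion, deletion and substitution each cost 1."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def levenshtein_ratio(a: str, b: str) -> float:
    """Similarity in [0, 1]; assumed normalization: 1 - distance / max length."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein_distance(a, b) / longest


def match_symptoms(text: str, lexicon: dict, threshold: float = 0.85) -> set:
    """Slide a window over the text and collect symptoms whose lexicon
    expression is similar enough to the window (hypothetical strategy)."""
    tokens = text.lower().split()
    found = set()
    for expression, symptom in lexicon.items():
        width = len(expression.split())
        for start in range(len(tokens) - width + 1):
            window = " ".join(tokens[start:start + width])
            if levenshtein_ratio(window, expression) >= threshold:
                found.add(symptom)
                break
    return found


# Hypothetical lexicon: expression -> standardized symptom name.
LEXICON = {
    "fatigue": "fatigue",
    "sore throat": "sore throat",
    "loss of smell": "anosmia",
    "body aches": "body pain",
}

post = "tested positive last week, bad fatique and a sore troat"
print(match_symptoms(post, LEXICON))
```

A ratio threshold below 1.0 lets misspellings such as "fatique" still match a lexicon entry, at the cost of occasional false positives for short expressions.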
Statistical Analysis
• Statistical comparison of the Reddit symptom
distribution with the Twitter symptom
distribution
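The comparison of the two symptom distributions can be illustrated with a short sketch that computes a Pearson correlation and a paired t statistic over per-symptom frequencies. The frequencies below are made up for illustration and are not the poster's data; the function names are hypothetical. Only the test statistic and degrees of freedom are computed here; obtaining a two-tailed p-value requires the t distribution, for example via `scipy.stats.ttest_rel`.

```python
import math


def pearson_r(x: list, y: list) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / math.sqrt(spread_x * spread_y)


def paired_t_statistic(x: list, y: list) -> tuple:
    """Return (t, degrees of freedom) for a paired t-test on x and y."""
    differences = [a - b for a, b in zip(x, y)]
    n = len(differences)
    mean_diff = sum(differences) / n
    variance = sum((d - mean_diff) ** 2 for d in differences) / (n - 1)
    return mean_diff / math.sqrt(variance / n), n - 1


# Hypothetical data: percentage of users reporting each symptom category.
reddit_pct = [40.0, 35.0, 30.0, 20.0, 10.0]
twitter_pct = [36.0, 33.0, 25.0, 18.0, 9.0]

r = pearson_r(reddit_pct, twitter_pct)
t, dof = paired_t_statistic(reddit_pct, twitter_pct)
print(f"r={r:.3f} r^2={r * r:.3f} t={t:.3f} dof={dof}")
```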