Towards Automating Location-Specific Opioid Toxicosurveillance from Twitter via Data
Science Methods
Abeed Sarkera, Graciela Gonzalez-Hernandeza, Jeanmarie Perroneb
a Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, U.S.A.
b Department of Emergency Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, U.S.A.
Abstract
Social media may serve as an important platform for near real-time monitoring of population-level opioid abuse. Our objectives for this study were to (i) manually characterize a sample of opioid-mentioning Twitter posts, (ii) compare the rates of abuse/misuse-related posts between prescription and illicit opioids, and (iii) implement and evaluate the performance of supervised machine learning algorithms for characterizing opioid-related chatter, which could potentially automate social media based monitoring in the future. We annotated a total of 9006 tweets into four categories, trained several machine learning algorithms, and compared their performances. A deep convolutional neural network marginally outperformed support vector machines and random forests, with an accuracy of 70.4%. Lack of context in tweets and data imbalance caused many tweets to be misclassified into the majority class. The automatic classification experiments produced promising results, although there is room for improvement.
Keywords:
Social media, Opioids, Surveillance
Introduction
The problem of opioid (prescription and illicit) addiction and overdose is having lethal consequences across the United States [1]. The 2015 National Survey on Drug Use and Health (NSDUH) estimated that 11.5 million adults misused/abused prescription opioids, and among adults with prescription opioid use, 12.5% reported misuse [2]. The number of opioid overdose deaths continues to rise alarmingly, with 174 people dying from drug overdoses daily [3], and the current rate of opioid prescriptions is three times higher than in the 1990s. Between 2014 and 2015, opioid-related death rates increased by 15.6%, continuing a trend from 1999, and this increase was driven by illicit opioids other than methadone [4]. Despite the significant acceleration of the crisis in recent years, surveillance measures are slow, and deriving estimates from surveys such as the NSDUH is belated. There is almost a two-year lag between the occurrence of overdose-related deaths and the time by which the statistics are publicized.† Such a lag in the data collection and synthesis process makes it impossible to determine the trajectory of the epidemic or to identify geographic areas that are more greatly impacted by the crisis at a specific point in time. Whether it is illicit or prescription opioids, the vast number of people affected means that a comprehensive public health approach is needed to curb the crisis, going beyond simply changing patterns of prescribing [5]. Kolodny and Frieden [1] recommended ten steps that the federal government should take to reverse the opioid epidemic and, as their first point, outlined the need for real-time assessment of the numbers, patterns, and trends of opioid misuse/addiction.
† Available at: https://www.drugabuse.gov/related-topics/trends-statistics/overdose-death-rates. Accessed: October 22, 2018.
In this paper, we explore the possibility of using social media, namely Twitter, as a resource for performing real-time surveillance of opioid abuse, including both prescription and illicit opioids. Past studies have shown that users post information related to drug abuse on social media [6]–[8]. However, there is a lack of analysis of the differences between abuse-related chatter and other types of chatter, such as consumption, although it is well known that not all drug-related chatter represents abuse [9]. There is also a lack of understanding regarding the differences between the chatter associated with prescription and illicit opioids (e.g., what proportions of illicit versus prescription opioid-mentioning chatter represent abuse?). Unsupervised methods that rely primarily on the volume of data do not take into account the large amount of noise present in social media data (e.g., [10]). There are currently no prototype end-to-end, automated pipelines that can enable real-time surveillance of opioid abuse/misuse via social media. In this paper, we take the first steps in addressing these gaps. We present (i) data collection strategies for Twitter, including the use of automatically generated misspellings and geolocation metadata, (ii) an analysis of the contents of tweets mentioning prescription and illicit opioids, and (iii) a comparison of several supervised classification approaches. Our experiments show that opioid chatter on Twitter can vary significantly between prescription and illicit opioids, with some illicit opioid keywords being too ambiguous to be useful for data collection. We also show that, using annotated data, we can train supervised learning algorithms to automatically characterize tweets. We suggest that such a supervised classification system, paired with geolocation metadata from Twitter, can be used to perform localized surveillance of opioid abuse/misuse. We present our pilot methods using the state of Pennsylvania as an example.
Methods
Data Collection
We collected data from Twitter using the names of prescription and illicit opioids (e.g., Percocet® and heroin) as keywords, including street names (e.g., china white, tar, skag, percs) and common misspellings (e.g., percoset, heorin). We used the list of drug slang terms recently released by the Drug Enforcement Administration (DEA) of the United States to create an initial list of possible slang terms for different prescription and illicit opioids [11]. We manually reviewed the terms and removed those we judged to be too ambiguous. For example, some of the slang terms associated with heroin, as per the document, are 'basketball', 'coffee', 'lemonade' and 'whiskey'. Through manual searches on the Twitter web interface, we could not find any instances where these terms were used to refer to opioids. We therefore removed them to reduce the retrieval of noise. This strategy left us with a total of 56 unique names of opioids. Since drug names are often misspelled on social media, we automatically generated misspellings for these keywords using a misspelling generation system [12]. Table 1 presents some sample opioid-related keywords and their automatically generated misspellings. After collecting an initial set, we analyzed samples of retrieved tweets for each keyword. We discovered that, despite our initial filtering of keywords, approximately 85% of the tweets were retrieved by four keywords: tar (~6.5%), dope (~54%), smack (~20.5%) and skunk (~4%). In these tweets, the keywords were almost invariably unrelated to opioids and represented something else. We therefore removed these keywords from our final data collection. In this manner, we collected tweets posted between 2012 and 2015, including only those geolocated within Pennsylvania and excluding retweets.
Table 1 – Sample opioid-related keywords and their automatically generated, frequently occurring misspellings

Keyword | Generated misspellings
Tramadol | trammadol, tramadal, tramdol, tramadols, tramado, tramedol, tramadoll, tramadole, tramidol, tamadol, tranadol, tramodol, tremadol
Heroin | herione, herroine, heroins, heroine, heroin, heorin, herion
Methadone | methadones, methadose, methodone, mehtadone, metadone, methadon, methdone
Oxycontin | oxicontin, oxcotin, oycotin, oxycotins, oycontin, oxycontins, oxycoton, oxicotin, ocycotin, oxycodin, oxycottin, oxycotine, ocycontin
Codeine | codiene, coedine, codine, codene, codein
Dilaudid | delaudid, dialudid, dilaudad, diluadid, diaudid, dilaudin, dilauded, dilauid, dillaudid
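The misspelling generator used in this study is the unsupervised, customizable system described in [12]. As a rough illustration of the underlying idea only (not the actual system), the following sketch produces single-edit typo variants of a drug name; the function name and logic are hypothetical.

```python
# Illustrative sketch only: the study used the unsupervised misspelling
# generator of reference [12]. This toy version produces common typo
# patterns (deletion, doubling, adjacent transposition) for a keyword.

def generate_misspellings(word):
    """Return a set of plausible single-edit misspellings of `word`."""
    word = word.lower()
    variants = set()
    for i in range(len(word)):
        # character deletion, e.g. "heroin" -> "herin"
        variants.add(word[:i] + word[i + 1:])
        # character doubling, e.g. "tramadol" -> "tramaddol"
        variants.add(word[:i] + word[i] + word[i:])
        # adjacent transposition, e.g. "heroin" -> "heorin"
        if i < len(word) - 1:
            variants.add(word[:i] + word[i + 1] + word[i] + word[i + 2:])
    variants.discard(word)
    return variants

if __name__ == "__main__":
    for v in sorted(generate_misspellings("heroin")):
        print(v)
```

Variants produced this way would still need to be filtered by their observed frequency in real chatter, which is what the system in [12] automates.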
Annotation
Our intent was to compare the distributions of tweets for prescription and illicit opioids and to attempt to train supervised learning algorithms, both of which require manual annotation/labeling of a sample of tweets. Based on manual inspection of the collected data, we decided to manually code the tweets into four broad categories: self-reported abuse, information sharing, non-English, and unrelated. Details about these categories are as follows.
Abuse-related (A)
Tweets indicating abuse or possible abuse by the poster, or by someone the poster knows or is communicating with. This category also includes admissions of past abuse. For illicit opioids, any consumption is considered abuse. For prescription opioids, consumption is considered abuse only when there is evidence that the user is taking the drug without a prescription, through a non-standard route (e.g., injecting, snorting), or in combination with other substances in order to experience certain sensations.
Information Sharing/Seeking/Providing (I)
Tweets in which the poster is asking for or providing information about an opioid. This category also includes expressions of medical use (e.g., mentions of having a prescription or taking painkillers after surgery) and the sharing of news articles or other media containing information about opioids. General statements about a drug are also placed in this class.
Non-English (N)
Tweets that are not written in English belong to this category.
Unrelated (U)
This category includes tweets that are not about an opioid but about something else. It also includes tweets that make metaphorical comparisons (e.g., I am addicted to X like heroin). Some examples of tweets belonging to this category are: user-handle mentions (@codeine_CXXX), heroine in the sense of a hero, and cooking (brown sugar). The category also covers tweets about movies or song lyrics that mention opioids but carry no information value. Table 2 presents examples of tweets belonging to these four categories.
Table 2 – Sample tweets and their categories; opioid keywords shown in bold

Tweet | Category
@username naa, i just popped a few percs at 2, i drink, sip lean. Wbu? | A
Sooooo heroine addicts robbed the house 3 houses away from me...makes me feel safe | I
Ok I thought that it was just a really funny oxy clean commercial but turns out it was just the Spanish channel | U
Te quieroo muchito mi hermana negra | N
We iteratively annotated a set of 100 tweets and discussed the disagreements between pairs of annotators. Disagreements on this initial set were resolved via discussion, and the same process was executed twice, until an acceptable level of agreement was reached. In the final set, disagreements on overlapping tweets were resolved by a third annotator (the first author of this article).
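Pairwise agreement of the kind reported later in the Results (Cohen's kappa [17]) can be computed directly with scikit-learn; a minimal sketch, with hypothetical label arrays standing in for two annotators' category assignments:

```python
# Sketch: pairwise inter-annotator agreement on overlapping tweets.
from sklearn.metrics import cohen_kappa_score

# Hypothetical A/I/N/U assignments by two annotators for the same tweets.
annotator_1 = ["A", "I", "U", "U", "N", "A", "I", "U"]
annotator_2 = ["A", "I", "U", "I", "N", "A", "U", "U"]

kappa = cohen_kappa_score(annotator_1, annotator_2)
print(f"Cohen's kappa: {kappa:.2f}")
```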
Analysis and Supervised Learning
Prescription versus Illicit Opioids
Using the annotated dataset, we compared the volumes of prescription and illicit opioid mentions in the sample to better understand which of these two broad classes of opioids was more frequently discussed on Twitter. Since the sample for annotation was drawn randomly, we assumed that the distributions of prescription and illicit opioid mentions represented their natural distributions in publicly available Twitter chatter. We also assessed the differences in the distributions of the four tweet categories for these two types of opioids by comparing their proportions. The results of these comparisons are presented in the Results section.
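A minimal sketch of this proportion comparison, assuming the annotated tweets are loaded into a pandas DataFrame; the `opioid_type` and `category` column names and the toy rows are hypothetical:

```python
# Sketch: per-type category proportions for prescription vs. illicit tweets.
import pandas as pd

tweets = pd.DataFrame({
    "opioid_type": ["illicit", "illicit", "prescription", "illicit",
                    "prescription", "illicit"],
    "category":    ["U", "A", "I", "U", "A", "N"],
})

# Cross-tabulate categories by opioid type; each row sums to 1.0.
proportions = pd.crosstab(tweets["opioid_type"], tweets["category"],
                          normalize="index")
print(proportions.round(3))
```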
Supervised Machine Learning
To train and evaluate several machine learning algorithms, we first split the annotated data into training (~80%) and test (~20%) sets. We used the training set for analysis, algorithm training and feature analyses, and held out the test set for evaluation. Our intent was primarily to assess the utility of the annotated corpus for supervised machine learning, with the assumption that if supervised classification produced adequate performance, the classifiers could be employed in the future for real-time monitoring. We trained and optimized three classification algorithms over the dataset, namely support vector machines (SVMs), random forests (RFs), and a deep convolutional neural network (d-CNN), and compared their performances against a naïve Bayes (NB) baseline classifier. SVMs and RFs have been shown in the past to perform well for text classification tasks, particularly because of their suitability for handling large feature spaces. Meanwhile, CNN-based classifiers have become popular in the recent past, and they work particularly well in the presence of large annotated datasets. For the SVM, RF and NB classifiers, we performed basic feature engineering based on our findings from past work on automatic prescription medication abuse detection from social media [13]. As features, we used preprocessed n-grams (n = 1–3), word clusters (generalized representations of words), and the presence or absence of abuse-indicating terms. We used 10-fold cross-validation over the training set to find optimal parameter values for the RF and SVM classifiers: for the SVMs, we optimized the kernel and the cost parameter; for the RF classifier, we optimized the number of trees.
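A condensed sketch of this setup in scikit-learn [15], shown here for the SVM only; the toy tweets, labels and parameter grid are illustrative, and word n-grams stand in for the fuller feature set (n-grams, word clusters, abuse-indicating terms) described above:

```python
# Sketch: n-gram features + SVM with cross-validated parameter search.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Toy stand-ins for the annotated training tweets and their A/I/N/U labels.
texts = ["popped a few percs and sipped lean", "tramadol helps after surgery",
         "te quiero muchito", "funny oxy clean commercial"] * 5
labels = ["A", "I", "N", "U"] * 5

pipeline = Pipeline([
    ("ngrams", CountVectorizer(ngram_range=(1, 3), lowercase=True)),
    ("svm", SVC()),
])

# Kernel and cost parameter tuned via cross-validation; the study used
# 10-fold CV over the full ~7,200-tweet training set (cv reduced here
# only so the toy data above can be stratified).
grid = GridSearchCV(pipeline,
                    {"svm__kernel": ["linear", "rbf"],
                     "svm__C": [0.1, 1, 10]},
                    cv=5)
grid.fit(texts, labels)
print(grid.best_params_)
```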
For the d-CNN classifier, we used dense word vectors (word embeddings) as input, obtaining pre-trained word embeddings from our past work [14]. We used a three-layer convolutional neural network, and to optimize the various hyperparameters, we split the training set further into two sets, using the larger set for training and the smaller set for validation. For the NB, SVM and RF classifiers, we used the implementations provided by the Python scikit-learn library [15], and for the d-CNN classifier, we used the TensorFlow library [16]. Figure 1 summarizes our entire processing workflow for this study, from spelling variant generation through to supervised classification of tweets.
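A hedged sketch of a three-layer convolutional text classifier using the tf.keras API of TensorFlow [16]; the filter counts, filter widths and other hyperparameters are illustrative rather than those of the study, and `embedding_matrix` is a random placeholder for the pre-trained embeddings of [14]:

```python
# Sketch: three stacked Conv1D layers over word embeddings, 4-way softmax.
import numpy as np
import tensorflow as tf

VOCAB_SIZE, EMBED_DIM, NUM_CLASSES = 20000, 200, 4
# Placeholder for the pre-trained word embeddings of reference [14].
embedding_matrix = np.random.rand(VOCAB_SIZE, EMBED_DIM)

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(
        VOCAB_SIZE, EMBED_DIM,
        embeddings_initializer=tf.keras.initializers.Constant(embedding_matrix)),
    tf.keras.layers.Conv1D(128, 3, activation="relu"),   # three stacked
    tf.keras.layers.Conv1D(128, 4, activation="relu"),   # convolutional
    tf.keras.layers.Conv1D(128, 5, activation="relu"),   # layers
    tf.keras.layers.GlobalMaxPooling1D(),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),  # A/I/N/U
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.build(input_shape=(None, 50))  # tweets padded/truncated to 50 tokens
model.summary()
```

Training would then call model.fit on padded token-ID sequences from the training split, with the held-out validation split used for hyperparameter selection.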
Results
A total of 9006 tweets mentioning prescription and/or illicit opioids were annotated by four annotators. Over 550 overlapping tweets, the average inter-annotator agreement was 0.75 (Cohen's kappa [17]). The final dataset consisted of 1748 abuse tweets, 2001 information tweets, 4830 unrelated tweets, and 427 non-English tweets. The majority of the tweets mentioned illicit opioids: 7038 illicit versus 2257 prescription.‡ Figure 2 shows the distributions of illicit and prescription opioid mentioning tweets in our annotated set, illustrating that although the relative volume of illicit opioid tweets is much higher, a significantly larger proportion of these tweets is unrelated to opioids. The significantly higher number of unrelated tweets among illicit opioid mentioning posts suggests that such tweets carry more noise and may be more difficult to mine knowledge from despite their large volume.
‡ Note that the sum of these two numbers is greater than the total number of tweets annotated (9006) because some tweets mention both prescription and illicit opioids.
Table 3 presents the performance of the three classifiers and the NB baseline over the test set. In total, we used 7204 tweets for training and 1802 tweets for evaluation. For the d-CNN classifier, the training set was further split into 6304 tweets for training and 900 for validation. In terms of overall accuracy, macro-averaged recall and macro-averaged precision, the d-CNN classifier marginally outperforms the two traditional benchmark classification approaches (SVM and RF) despite the relatively small amount of annotated data used. All three classifiers perform significantly better than the NB baseline. The performance of the d-CNN classifier is encouraging because such deep neural network based classifiers have more room for improvement than their traditional counterparts as more data is annotated.
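The 95% confidence intervals reported in Table 3 appear consistent with a simple normal-approximation binomial interval on accuracy; for example, for the d-CNN row (70.4% accuracy on the 1802-tweet test set):

```python
# Normal-approximation 95% CI for a classification accuracy.
import math

p, n = 0.704, 1802              # d-CNN accuracy and test-set size
half_width = 1.96 * math.sqrt(p * (1 - p) / n)
print(f"{100 * (p - half_width):.1f}-{100 * (p + half_width):.1f}%")
# prints approximately 68.3-72.5%, close to the 68.2-72.5 reported
```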
Figure 1 – The Twitter data processing workflow for this study
Discussion
Our experiments produced very promising results and showed that machine learning based approaches may in fact provide a mechanism for monitoring opioid abuse in near real time for targeted geographic locations (e.g., at the state level). By combining geolocation information and manually annotated data, we were able to automatically characterize opioid-mentioning chatter from Pennsylvania with moderate accuracy. Table 4 shows sample tweets with their automatic classifications, locations by county, and timestamps.
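A minimal sketch of the state-level geolocation filter, assuming tweets in the Twitter REST API v1.1 JSON format, where geotagged tweets carry a `place` object with a human-readable `full_name` (e.g., "Philadelphia, PA"); the filtering heuristic shown is an assumption, not the study's exact procedure:

```python
# Sketch: keep geolocated tweets whose Twitter place resolves to Pennsylvania.
def is_from_pennsylvania(tweet):
    place = tweet.get("place") or {}
    full_name = place.get("full_name", "")
    # e.g. "Philadelphia, PA" (city-level place) or "Pennsylvania, USA"
    return full_name.endswith(", PA") or full_name.startswith("Pennsylvania")

tweet = {"text": "...", "place": {"full_name": "Philadelphia, PA"}}
print(is_from_pennsylvania(tweet))  # True
```

County-level resolution, as in Table 4, would additionally require mapping place names or coordinates to county boundaries.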
Our manual categorization efforts revealed the difficulty of annotating tweets with high inter-annotator agreement. Creating a specific annotation guideline and holding several iterations of discussions over small sets of overlapping annotations helped improve agreement, although in many cases, due to the lack of context in the tweets, the assigned category depended on the subjective assessment of the annotator. This suggests that thorough annotation guidelines and such an iterative approach to annotation are very important for achieving acceptable agreement levels in complex annotation tasks such as this.
We found that illicit opioid mentioning tweets were particularly noisy, with references to song lyrics or movie quotes, which led to a large proportion of them being labeled as unrelated. The high proportions of unrelated tweets for both types of opioids, and particularly for illicit opioids, illustrate the importance of a supervised classification system for automatic surveillance. Keyword-based surveillance methods, which rely on the volume of data collected using specific keywords, are evidently not suitable for opioid toxicosurveillance, since most of the data retrieved by the keywords will be unrelated noise. The amount of noise may increase or decrease based on events publicized by media outlets. In addition, as our initial analysis of the retrieved data showed, if ambiguous keywords are used, the vast majority of tweets collected via those keywords (e.g., dabs) can be noise, and this noise may mask the real abuse-related signals. Thus, when designing surveillance strategies for similar tasks via social media, care must be taken to identify noisy keywords that may invalidate the surveillance process by bringing in too much noise.
The automatic classification experiments produced acceptable performances, suggesting that automated, real-time opioid toxicosurveillance may be possible. In the future, we will explore additional classification strategies to further improve performance. A brief error analysis revealed that the lack of context in tweets often caused our learning algorithms to misclassify tweets into the majority class (U). More analyses are required to better understand the characteristics of the misclassified tweets.
In the future, we will also apply supervised classifiers trained on our annotated data to automatically characterize unlabeled posts collected over a longer time period, to better understand how opioid abuse related tweets are distributed over time and across more fine-grained geolocations. Such an analysis may reveal specific time periods that are associated with higher rates of abuse. We will also explore how the opioid abuse rates reported on Twitter correlate, if at all, with real-world data regarding the opioid crisis, such as geolocation-centric opioid overdose death rates.
Conclusions
Our study suggests that Twitter is a promising platform for performing real-time surveillance of opioid abuse/misuse. Although we have only used geolocation data to identify the origins of tweets at the state level, it may be possible to narrow down further to the county or city level, particularly as the volume of data grows over time. Our manual categorization of the data and our analyses show that keyword-based data collection from Twitter results in the retrieval of significant amounts of noise. Therefore, studies attempting to use streaming Twitter data for surveillance must be wary of the amount of noise retrieved per keyword and only use keywords that are unambiguous. The same protocol should be followed for research involving data from other social networks. Our annotation also showed that even when using keywords with high signal-to-noise ratios, the number of unrelated tweets is significantly higher for illicit opioids than for prescription opioids. Thus, the total volume of opioid-related chatter may not be indicative of the real abuse or misuse of opioids, but may be driven by other factors such as news articles or the release of movies/songs. To overcome this problem, we employed a supervised classification approach to automatically categorize the tweets, and we found that a deep convolutional neural network produced the best performance, with an overall accuracy of 70.4%. In the future, we will try to improve on this classification performance by employing more advanced strategies, and we will use the output of the classifiers to perform downstream geolocation-centric analyses.
Table 3 – Classifier accuracies over the test set

Classifier | Recall | Precision | Accuracy (%) | 95% CI
Naïve Bayes | 0.61 | 0.58 | 53.9 | 51.6-56.3
Random Forest | 0.66 | 0.70 | 70.1 | 67.9-72.2
Support Vector Machines | 0.68 | 0.70 | 69.9 | 67.8-72.1
Deep Convolutional Neural Network | 0.70 | 0.71 | 70.4 | 68.2-72.5
Figure 2 – Distributions of tweets belonging to each category for illicit and prescription opioid mentioning tweets. The charts show that about 75% of the tweets in the sample mention illicit opioids, and that illicit opioid mentioning tweets have much higher proportions of unrelated information (including non-English tweets), while prescription opioid mentioning tweets have higher proportions of misuse/abuse and information oriented tweets.
Acknowledgements
Research reported in this publication was supported in part by
the National Institute on Drug Abuse of the National Institutes
of Health under Award Number R01DA046619. The content is
solely the responsibility of the authors and does not necessarily
represent the official views of the National Institutes of Health.
The data collection and annotation efforts were partly funded
by a grant from the Pennsylvania Department of Health. The
Titan Xp used for this research was donated by the NVIDIA
Corporation. The authors would like to thank Karen O’Connor,
Alexis Upshur and Annika DeRoos for performing the
annotations. This study was approved by the institutional
review board at the University of Pennsylvania.
References
[1] A. Kolodny and T.R. Frieden, Ten steps the federal
government should take now to reverse the opioid
addiction epidemic, JAMA 318 (2017), 1537-1538.
[2] B. Han, W.M. Compton, C. Blanco, E. Crane, J. Lee, and
C.M. Jones, Prescription opioid use, misuse, and use
disorders in U.S. adults: 2015 National Survey on Drug
Use and Health, Ann Intern Med 167 (2017), 293-301.
[3] H. Jalal, J.M. Buchanich, M.S. Roberts, L.C. Balmert, K.
Zhang, and D.S. Burke, Changing dynamics of the drug
overdose epidemic in the United States from 1979 through
2016, Science 361 (2018), 1184.
[4] R.A.R. Rudd, P. Seth, F. David, and L. Scholl, Increases in
drug and opioid-involved overdose deaths — United
States, 2010–2015, Morb Mortal Wkly Rep 65 (2016),
1445-1452.
[5] A. Schuchat, D. Houry, and G.P. Guy, New data on opioid use and prescribing in the United States, JAMA 318 (2017), 425-426.
[6] L. Shutler, L.S. Nelson, I. Portelli, C. Blachford, and J.
Perrone, Drug use in the Twittersphere: a qualitative
contextual analysis of tweets about prescription drugs, J
Addict Dis 34 (2015), 303-310.
[7] M. Chary, N. Genes, C. Giraud-Carrier, C. Hanson, L.S. Nelson, and A.F. Manini, Epidemiology from tweets: estimating misuse of prescription opioids in the USA from social media, J Med Toxicol 13 (2017), 278-286.
[8] D. Cameron et al., PREDOSE: a semantic web platform
for drug abuse epidemiology using social media, J Biomed
Inform 46 (2013), 985-997.
[9] A. Sarker et al., Social media mining for toxicovigilance:
automatic monitoring of prescription medication abuse
from Twitter, Drug Saf 39 (2016), 231-240.
[10] R.L. Graves, C. Tufts, Z.F. Meisel, D. Polsky, L. Ungar,
and R.M. Merchant, Opioid discussion in the
Twittersphere, Subst Use Misuse 53 (2018), 2132-2139.
[11] DEA Houston Division, Slang Terms and Code Words: A
Reference for Law Enforcement Personnel, US Drug
Enforcement Administration, Washington, DC, 2018.
[12] A. Sarker and G. Gonzalez-Hernandez, An unsupervised
and customizable misspelling generator for mining noisy
health-related text sources, J Biomed Inform 88 (2018), 98-
107.
[13] A. Sarker et al., Social media mining for toxicovigilance:
automatic monitoring of prescription medication abuse
from Twitter, Drug Saf 39 (2016), 231-240.
[14] A. Sarker and G. Gonzalez, A corpus for mining drug-
related knowledge from Twitter chatter: language models
and their utilities, Data in Brief 10 (2017), 122-131.
[15] F. Pedregosa et al., Scikit-learn: machine learning in
Python, J Mach Learn Res 12 (2011), 2825-2830.
[16] M. Abadi et al., TensorFlow: Large-Scale Machine
Learning on Heterogeneous Distributed Systems, Google
Research, Mountain View, CA, 2016.
[17] J. Cohen, A coefficient of agreement for nominal scales,
Educ Psychol Meas 20 (1960), 37-46.
Address for correspondence
Abeed Sarker, Ph.D.
Mailing Address: Level 4, 423 Guardian Drive, Division of
Informatics, Department of Biostatistics, Epidemiology and
Informatics, Perelman School of Medicine, University of
Pennsylvania, Philadelphia, PA 19104, U.S.A.
Email: abeed@pennmedicine.upenn.edu
Phone: +1-215-746-1700
Table 4 – Sample tweets and their real-time classifications, with geolocation information (county level) and timestamps§

Tweet | Class | County | Timestamp
Enjoying this healthy breakfast recommendation frm @username. Oatmeal w/raisins/walnuts/brown sugar frm @username | Unrelated | Philadelphia | 12:37:11 XX-XX-2015
@username i shouldnt have done all that heroin this morning | Abuse | Allegheny | 13:32:34 XX-XX-2015
I know everyone is socialized different and wired uniquely. I still want to smack a ******* for not staying in their lane. | Unrelated | Philadelphia | 13:54:21 XX-XX-2015
its on the news.. kensington oxys on the loose | Information | Philadelphia | 15:01:55 XX-XX-2015

§ The tweets and their metadata have been modified to protect the anonymity of the actual users.