Article

Abstract

Background: Opioid misuse (OM) is a major health problem in the United States and can lead to addiction and fatal overdose. We sought to employ natural language processing (NLP) and machine learning to categorize Twitter chatter by the motive for OM. Materials and Methods: We collected data from Twitter using opioid-related keywords and manually annotated 6,988 tweets into three classes: No-OM, Pain-related-OM, and Recreational-OM. The No-OM class represents tweets indicating no use or misuse, while the Pain-related-OM and Recreational-OM classes represent misuse for pain or for recreation/addiction, respectively. We trained and evaluated multi-class classifiers and performed term-level k-means clustering to assess whether some terms were closely associated with each of the three classes. Results: On a held-out test set of 1,677 tweets, a transformer-based classifier (XLNet) achieved the best performance, with F1-scores of 0.71 for the Pain-related-OM class and 0.79 for the Recreational-OM class. Macro- and micro-averaged F1-scores over all classes were 0.82 and 0.92, respectively. Content analysis using clustering revealed distinct clusters of terms associated with each class. Discussion: While some past studies have attempted to automatically detect opioid misuse, none have further characterized the motive for misuse. Our multi-class classification approach using XLNet showed promising performance, including in detecting the subtle differences between pain-related and recreation-related misuse. The distinct clustering of class-specific keywords may support targeted data collection, helping to overcome the under-representation of minority classes. Conclusion: Machine learning can help identify pain-related and recreational OM content on Twitter, potentially enabling the study of the characteristics of individuals exhibiting such behavior.
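The gap between the macro- and micro-averaged scores can be made concrete with a small calculation. The Python sketch below computes per-class, macro-averaged, and micro-averaged F1 from gold and predicted labels. It reuses the abstract's class names, but the ten label pairs are invented for illustration (they are not the study's data), and the function names are hypothetical. Macro-averaging weights every class equally, whereas micro-averaging pools counts across classes, so a large, well-classified No-OM class pulls the micro score up.

```python
def class_counts(gold, pred, label):
    """Return (tp, fp, fn) for one class, treating it one-vs-rest."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    return tp, fp, fn


def f1(tp, fp, fn):
    # F1 = 2TP / (2TP + FP + FN), the harmonic mean of precision and recall.
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_micro_f1(gold, pred, labels):
    counts = {label: class_counts(gold, pred, label) for label in labels}
    per_class = {label: f1(*c) for label, c in counts.items()}
    macro = sum(per_class.values()) / len(labels)
    # Micro-averaging pools TP, FP and FN over all classes before computing F1.
    tp, fp, fn = (sum(c[i] for c in counts.values()) for i in range(3))
    micro = f1(tp, fp, fn)
    return per_class, macro, micro


# Hypothetical toy labels (not the study's data).
labels = ["No-OM", "Pain-related-OM", "Recreational-OM"]
gold = ["No-OM"] * 6 + ["Pain-related-OM"] * 2 + ["Recreational-OM"] * 2
pred = ["No-OM"] * 5 + ["Recreational-OM", "Pain-related-OM", "No-OM",
                        "Recreational-OM", "Recreational-OM"]

per_class, macro, micro = macro_micro_f1(gold, pred, labels)
print(per_class)
print(f"macro-F1 = {macro:.3f}, micro-F1 = {micro:.3f}")
```

On this toy input, macro-F1 is about 0.767 and micro-F1 is 0.800. For single-label multi-class data that includes every class, micro-F1 equals plain accuracy.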

Article
Importance Automatic curation of consumer-generated, opioid-related social media big data may enable real-time monitoring of the opioid epidemic in the United States. Objective To develop and validate an automatic text-processing pipeline for geospatial and temporal analysis of opioid-mentioning social media chatter. Design, Setting, and Participants This cross-sectional, population-based study was conducted from December 1, 2017, to August 31, 2019, and used more than 3 years of publicly available social media posts on Twitter, dated from January 1, 2012, to October 31, 2015, that were geolocated in Pennsylvania. Opioid-mentioning tweets were extracted using prescription and illicit opioid names, including street names and misspellings. Social media posts (tweets; n = 9006) were manually categorized into 4 classes, and several machine learning algorithms were trained and evaluated. Temporal and geospatial patterns were analyzed by applying the best-performing classifier to unlabeled data. Main Outcomes and Measures Pearson and Spearman correlations were calculated between county- and substate-level abuse-indicating tweet rates and both opioid overdose death rates from the Centers for Disease Control and Prevention WONDER database and 4 metrics from the National Survey on Drug Use and Health (NSDUH) for 3 years. Classifier performance was measured through microaveraged F1 scores (harmonic mean of precision and recall) or accuracies, with 95% CIs. Results A total of 9006 social media posts were annotated, of which 1748 (19.4%) were related to abuse, 2001 (22.2%) were related to information, 4830 (53.6%) were unrelated, and 427 (4.7%) were not in the English language. Yearly rates of abuse-indicating social media posts showed statistically significant correlation with county-level opioid-related overdose death rates (n = 75) for 3 years (Pearson r = 0.451, P < .001; Spearman r = 0.331, P = .004).
Abuse-indicating tweet rates showed consistent correlations with 4 NSDUH metrics (n = 13) associated with nonmedical prescription opioid use (Pearson r = 0.683, P = .01; Spearman r = 0.346, P = .25), illicit drug use (Pearson r = 0.850, P < .001; Spearman r = 0.341, P = .25), illicit drug dependence (Pearson r = 0.937, P < .001; Spearman r = 0.495, P = .09), and illicit drug dependence or abuse (Pearson r = 0.935, P < .001; Spearman r = 0.401, P = .17) over the same 3-year period, although the tests lacked power to demonstrate statistical significance. A classification approach involving an ensemble of classifiers produced the best performance in accuracy or microaveraged F1 score (0.726; 95% CI, 0.708-0.743). Conclusions and Relevance The correlations obtained in this study suggest that a social media–based approach reliant on supervised machine learning may be suitable for geolocation-centric monitoring of the US opioid epidemic in near real time.
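Pearson's r measures linear association, whereas Spearman's ρ is Pearson's r computed on ranks, which is why the two coefficients can diverge as much as they do in the results above. The following minimal standard-library sketch uses hypothetical county rates, not the study's data. A single outlying county lowers Pearson's r while leaving Spearman's ρ at 1, because the relationship stays perfectly monotonic.

```python
import math


def pearson(x, y):
    """Pearson's r: covariance divided by the product of standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson's r applied to the ranks of x and y."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-county rates (not the study's data); the last county is an outlier.
tweet_rate = [1.0, 2.0, 3.0, 4.0, 5.0]
death_rate = [2.0, 4.0, 6.0, 8.0, 100.0]
print(round(pearson(tweet_rate, death_rate), 3))   # linear fit dampened by the outlier
print(round(spearman(tweet_rate, death_rate), 3))  # monotonic relationship
```

In practice, `scipy.stats.pearsonr` and `scipy.stats.spearmanr` compute the same coefficients and also return p-values, which this sketch omits.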
Article
Objective: Prescription medication (PM) misuse and abuse is a major health problem globally, and a number of recent studies have focused on exploring social media as a resource for monitoring nonmedical PM use. Our objectives are to present a methodological review of social media-based PM abuse or misuse monitoring studies and to propose a potentially generalizable, data-centric processing pipeline for the curation of data from this resource. Materials and methods: We identified studies involving social media, PMs, and misuse or abuse (inclusion criteria) from Medline, Embase, Scopus, Web of Science, and Google Scholar. We categorized studies based on multiple characteristics, including but not limited to data size; social media source(s); medications studied; and primary objectives, methods, and findings. Results: A total of 39 studies met our inclusion criteria, with 31 (∼79.5%) published since 2015. Twitter has been the most popular resource, with Reddit and Instagram gaining popularity recently. Early studies focused mostly on manual, qualitative analyses, with a growing trend toward the use of data-centric methods involving natural language processing and machine learning. Discussion: There is a paucity of standardized, data-centric frameworks for curating social media data for task-specific analyses and near real-time surveillance of nonmedical PM use. Many existing studies do not quantify inter-annotator agreement for manual annotation tasks or take into account the presence of noise in the data. Conclusion: The development of reproducible and standardized data-centric frameworks that build on current state-of-the-art methods in data and text mining may enable effective utilization of social media data for understanding and monitoring nonmedical PM use.
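Quantifying human agreement on an annotation task usually means reporting a chance-corrected statistic. For two annotators, a common choice is Cohen's kappa, κ = (p_o − p_e) / (1 − p_e), where p_o is observed agreement and p_e is the agreement expected from each annotator's label frequencies. The sketch below is a minimal illustration with invented labels. The review does not prescribe this particular statistic, and the function name is hypothetical.

```python
from collections import Counter


def cohens_kappa(a, b):
    """Cohen's kappa for two annotators who labelled the same items."""
    n = len(a)
    p_o = sum(1 for x, y in zip(a, b) if x == y) / n            # observed agreement
    count_a, count_b = Counter(a), Counter(b)
    labels = set(count_a) | set(count_b)
    p_e = sum(count_a[k] * count_b[k] for k in labels) / n ** 2  # chance agreement
    # Kappa is undefined when p_e == 1; returning 1.0 there is an arbitrary convention.
    return 1.0 if p_e == 1 else (p_o - p_e) / (1 - p_e)


# Hypothetical annotations of ten posts by two annotators.
annotator_1 = ["misuse", "misuse", "other", "other", "other",
               "other", "misuse", "other", "other", "other"]
annotator_2 = ["misuse", "other", "other", "other", "other",
               "other", "misuse", "other", "misuse", "other"]
print(round(cohens_kappa(annotator_1, annotator_2), 3))
```

Here the annotators agree on 80% of posts, but because "other" dominates, chance alone would produce 58% agreement, so kappa is only about 0.52.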
Article
Background: Data collection and extraction from noisy text sources such as social media typically rely on keyword-based searching/listening. However, health-related terms are often misspelled in such noisy text sources due to their complex morphology, resulting in the exclusion of relevant data from studies. In this paper, we present a customizable data-centric system that automatically generates common misspellings for complex health-related terms, which can improve the data collection process from noisy text sources. Materials and methods: The spelling variant generator relies on a dense vector model learned from large, unlabeled text, which is used to find terms semantically close to the original/seed keyword, followed by the filtering of terms that are lexically dissimilar beyond a given threshold. The process is executed recursively, converging when no new terms similar (lexically and semantically) to the seed keyword are found. The weighting of intra-word character sequence similarities allows further problem-specific customization of the system. Results: On a dataset prepared for this study, our system outperforms the current state-of-the-art medication name variant generator, with a best F1-score of 0.69 and F4-score of 0.78. Extrinsic evaluation of the system on a set of cancer-related terms showed an increase of over 67% in retrieval rate from Twitter posts when the generated variants are included. Discussion: Our proposed spelling variant generator has several advantages over existing spelling variant generators: (i) it is capable of filtering out lexically similar but semantically dissimilar terms, (ii) the number of variants generated is low, as many low-frequency and ambiguous misspellings are filtered out, and (iii) the system is fully automatic, customizable and easily executable. While the base system is fully unsupervised, we show how supervision may be employed to adjust weights for task-specific customizations.
Conclusion: The performance and relative simplicity of our proposed approach make it a much-needed spelling variant generation resource for health-related text mining from noisy sources. The source code for the system has been made publicly available for research.
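The recursive "semantic neighbours, then lexical filter" loop described above can be sketched as a breadth-first expansion. This is an illustration under stated assumptions, not the authors' released code. A hand-written `neighbors` dict stands in for nearest-neighbour queries against a trained dense vector model, and `difflib.SequenceMatcher.ratio()` stands in for the paper's weighted character-sequence similarity. All names, terms, and the 0.7 threshold are hypothetical.

```python
from collections import deque
from difflib import SequenceMatcher


def lexical_similarity(a, b):
    # Stand-in for the paper's weighted intra-word similarity (assumption).
    return SequenceMatcher(None, a, b).ratio()


def generate_variants(seed, neighbors, threshold=0.7):
    """Breadth-first expansion over semantic neighbours, keeping lexically close terms.

    `neighbors` maps a term to its nearest terms in an embedding space; here it is
    a hypothetical hand-written dict rather than a trained vector model.
    """
    accepted = {seed}
    queue = deque([seed])
    while queue:  # converges once no new acceptable terms are found
        term = queue.popleft()
        for candidate in neighbors.get(term, []):
            if candidate in accepted:
                continue
            # Compare against the seed (not the parent) so the expansion cannot drift.
            if lexical_similarity(seed, candidate) >= threshold:
                accepted.add(candidate)
                queue.append(candidate)
    accepted.discard(seed)
    return sorted(accepted)


neighbors = {
    "oxycodone": ["oxycodon", "percocet"],
    "oxycodon": ["oxycodine", "xanax"],
    "oxycodine": ["oxycodone", "oxycodeine"],
}
print(generate_variants("oxycodone", neighbors))
```

In this toy run, "percocet" and "xanax" are semantic neighbours but fail the lexical filter. "oxycodeine" is only reachable through a second hop, which is what the recursion buys.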
Article
Introduction Prescription medication overdose is the fastest growing drug-related problem in the USA. The growing nature of this problem necessitates the implementation of improved monitoring strategies for investigating the prevalence and patterns of abuse of specific medications. Objectives Our primary aims were to assess the possibility of utilizing social media as a resource for automatic monitoring of prescription medication abuse and to devise an automatic classification technique that can identify potentially abuse-indicating user posts. Methods We collected Twitter user posts (tweets) associated with three commonly abused medications (Adderall®, oxycodone, and quetiapine). We manually annotated 6400 tweets mentioning these three medications and a control medication (metformin) that is not subject to abuse due to its mechanism of action. We performed quantitative and qualitative analyses of the annotated data to determine whether posts on Twitter contain signals of prescription medication abuse. Finally, we designed an automatic supervised classification technique to distinguish posts containing signals of medication abuse from those that do not, and assessed the utility of Twitter in investigating patterns of abuse over time. Results Our analyses show that clear signals of medication abuse can be drawn from Twitter posts and that the percentage of tweets containing abuse signals is significantly higher for the three case medications (Adderall®: 23%, quetiapine: 5.0%, oxycodone: 12%) than for the control medication (metformin: 0.3%). Our automatic classification approach achieves 82% accuracy overall (medication abuse class recall: 0.51, precision: 0.41, F measure: 0.46). To illustrate the utility of automatic classification, we show how the classification data can be used to analyze abuse patterns over time.
Conclusion Our study indicates that social media can be a crucial resource for obtaining abuse-related information for medications, and that automatic approaches involving supervised classification and natural language processing hold promise for essential future monitoring and intervention tasks.
Article
The recently introduced continuous Skip-gram model is an efficient method for learning high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships. In this paper we present several extensions that improve both the quality of the vectors and the training speed. By subsampling of the frequent words we obtain significant speedup and also learn more regular word representations. We also describe a simple alternative to the hierarchical softmax called negative sampling. An inherent limitation of word representations is their indifference to word order and their inability to represent idiomatic phrases. For example, the meanings of "Canada" and "Air" cannot be easily combined to obtain "Air Canada". Motivated by this example, we present a simple method for finding phrases in text, and show that learning good vector representations for millions of phrases is possible.
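Two of the extensions mentioned above reduce to short formulas in the full paper, as we understand its formulation. First, a bigram is scored as score(a, b) = (count(a b) − δ) / (count(a) × count(b)), and high-scoring bigrams such as "air canada" are merged into single tokens. Second, each occurrence of a frequent word is discarded with probability 1 − sqrt(t / f(w)), where f(w) is the word's relative frequency. The standard-library sketch below implements both on a tiny invented corpus. The δ and t values are illustrative defaults, and clamping the discard probability at 0 for rare words is our choice.

```python
import math
from collections import Counter


def phrase_scores(tokens, delta=1.0):
    """score(a, b) = (count(a b) - delta) / (count(a) * count(b)).

    delta discounts rare bigrams; bigrams scoring above a chosen threshold
    would be merged into single tokens such as "air_canada".
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return {
        pair: (count - delta) / (unigrams[pair[0]] * unigrams[pair[1]])
        for pair, count in bigrams.items()
    }


def discard_probability(freq, t=1e-5):
    """Subsampling: drop an occurrence with probability 1 - sqrt(t / f(w)).

    freq is the word's relative corpus frequency; the formula goes negative
    for words rarer than t, so it is clamped at 0 here.
    """
    return max(0.0, 1.0 - math.sqrt(t / freq))


# Tiny hypothetical corpus.
tokens = "air canada flight air canada delay the air was cold".split()
scores = phrase_scores(tokens)
print(max(scores, key=scores.get))          # ('air', 'canada')
print(round(discard_probability(1e-3), 3))  # a very frequent word is usually dropped
```

The δ term is what keeps one-off bigrams from being promoted to phrases: every bigram seen only once in this corpus scores exactly zero.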
Article
Food crises call for responses that people and organisations would not normally make if one or more threats (health, economic, etc.) were absent. At an individual level, such crises motivate people to adopt coping strategies aimed at adapting to the threat and reducing stressful experiences. In this regard, microblogging channels such as Twitter emerge as a valuable resource for accessing individuals' expressions of coping. Indeed, Twitter expressions are generally more natural, spontaneous and heterogeneous (in cognitive, affective and behavioural dimensions) than expressions found on other types of social media (e.g. blogs). Moreover, as a social media channel, Twitter provides access not only to an individual but also to a social level of analysis, i.e. a psychosocial media analysis. To show this potential, our study analysed Twitter messages produced by individuals during the 2011 EHEC/E. coli bacteria outbreak in Europe, which was caused by contaminated food products. The outbreak involved more than 3,100 cases of bloody diarrhoea and 850 of haemolytic uremic syndrome (HUS), and 53 confirmed deaths across the EU. Based on data collected in Spain, the country initially thought to be the source of the outbreak, an initial quantitative analysis considered 11,411 tweets, of which 2,099 were further analysed through a qualitative content analysis. This analysis aimed to identify: 1) the ways of coping expressed during the crisis; and 2) how uncertainty about the contaminated product, expressed through hazard notifications, influenced those ways of coping. Results revealed coping expressions to be dynamic, flexible and social, with a predominance of accommodation, information seeking and opposition (e.g. anger) strategies. The latter were more likely during the period of uncertainty, whereas the opposite was true for strategies relying on identification of the contaminated product (e.g. avoiding consumption/purchase).
Implications for food crisis communication and monitoring systems are discussed.
Article
The tragic death of 18-year-old Ryan Haight highlighted the ethical, public health, and youth patient safety concerns posed by illicit online sourcing for the nonmedical use of prescription medications (NUPM), leading to a federal law in an effort to address this concern. Yet despite the tragedy and the resulting law, the NUPM epidemic in the United States has continued to escalate and represents a dangerous and growing trend among youth and adolescents. A critical point of access associated with youth NUPM is the Internet. Internet use among this vulnerable patient group is ubiquitous and includes new, emerging, and rapidly developing technologies, particularly social media networking (eg, Facebook and Twitter). These unregulated technologies may pose a risk of enabling youth NUPM behavior. To address the limitations of current regulations and promote online safety, we advocate for legislative reform that specifically addresses NUPM promotion via social media and other new online platforms. More comprehensive, modernized federal legislation that anticipates future online developments is critical to substantively addressing youth NUPM behavior occurring through the Internet.
Conference Paper
Online social networking sites like MySpace, Facebook, and Flickr have become a popular way to share and disseminate content. Their massive popularity has led to viral marketing techniques that attempt to spread content, products, and ideas on these sites. However, little data on viral propagation in the real world is publicly available, and few studies have characterized how information spreads over current online social networks. In this paper, we collect and analyze large-scale traces of information dissemination in the Flickr social network. Our analysis, based on crawls of the favorite markings of 2.5 million users on 11 million photos, aims at answering three key questions: (a) how widely does information propagate in the social network? (b) how quickly does information propagate? and (c) what is the role of word-of-mouth exchanges between friends in the overall propagation of information in the network? Contrary to viral marketing "intuition," we find that (a) even popular photos do not spread widely throughout the network, (b) even popular photos spread slowly through the network, and (c) information exchanged between friends is likely to account for over 50% of all favorite-markings, but with a significant delay at each hop. Copyright is held by the International World Wide Web Conference Committee (IW3C2).
Article
Surveys are popular methods to measure public perceptions in emergencies but can be costly and time consuming. We suggest and evaluate a complementary "infoveillance" approach using Twitter during the 2009 H1N1 pandemic. Our study aimed to: 1) monitor the use of the terms "H1N1" versus "swine flu" over time; 2) conduct a content analysis of "tweets"; and 3) validate Twitter as a real-time content, sentiment, and public attention trend-tracking tool. Between May 1 and December 31, 2009, we archived over 2 million Twitter posts containing the keywords "swine flu," "swineflu," and/or "H1N1," using Infovigil, an infoveillance system. Tweets using "H1N1" increased from 8.8% to 40.5% (R² = .788; p < .001), indicating a gradual adoption of World Health Organization-recommended terminology. A total of 5,395 tweets were randomly selected from 9 days, 4 weeks apart, and coded using a tri-axial coding scheme. To track tweet content and to test the feasibility of automated coding, we created database queries for keywords and correlated these results with manual coding. Content analysis indicated that resource-related posts were most commonly shared (52.6%). Misinformation was identified in 4.5% of cases. News websites were the most popular sources (23.2%), while government and health agencies were linked only 1.5% of the time. Seven of 10 automated queries correlated with manual coding. Several Twitter activity peaks coincided with major news stories. Our results correlated well with H1N1 incidence data. This study illustrates the potential of using social media to conduct "infodemiology" studies for public health. H1N1-related tweets in 2009 were primarily used to disseminate information from credible sources, but were also a source of opinions and experiences. Tweets can be used for real-time content analysis and knowledge translation research, allowing health authorities to respond to public concerns.
Article
Nationally endorsed, clinical performance measures are available that allow for quality reporting using electronic health records (EHRs). To our knowledge, how well they reflect actual quality of care has not been studied. We sought to evaluate the validity of performance measures for coronary artery disease (CAD) using an ambulatory EHR. We performed a retrospective electronic medical chart review comparing automated measurement with a 2-step process of automated measurement supplemented by review of free-text notes for apparent quality failures for all patients with CAD from a large internal medicine practice using a commercial EHR. The 7 performance measures included the following: antiplatelet drug, lipid-lowering drug, beta-blocker following myocardial infarction, blood pressure measurement, lipid measurement, low-density lipoprotein cholesterol control, and angiotensin-converting enzyme inhibitor or angiotensin receptor blocker for patients with diabetes mellitus or left ventricular systolic dysfunction. Performance varied from 81.6% for lipid measurement to 97.6% for blood pressure measurement based on automated measurement. A review of free-text notes for cases failing an automated measure revealed that misclassification was common and that 15% to 81% of apparent quality failures either satisfied the performance measure or met valid exclusion criteria. After including free-text data, the adherence rate ranged from 87.5% for lipid measurement and low-density lipoprotein cholesterol control to 99.2% for blood pressure measurement. Profiling the quality of outpatient CAD care using data from an EHR has significant limitations. Changes in how data are routinely recorded in an EHR are needed to improve the accuracy of this type of quality measurement. Validity testing in different settings is required.
Article
While many studies have explored social media use and individuals' behavioral changes, few have examined the utility of social media for suicide detection and prevention. The study by Jashinsky et al. identified specific language patterns associated with a set of 12 suicide risk factors. The authors extended these methods to assess the significance of the language used on Twitter for suicide detection. This article quantifies the use of Twitter to express suicide-related language and its potential to detect users at high risk of suicide. The authors searched Twitter for tweets indicative of 12 suicide risk factors. This paper divided Twitter users into two groups, "high risk" and "at risk," based on two of the risk factors ("self-harm" and "prior suicide attempts"), and examined language patterns by computing co-occurrences of terms in tweets, which helped identify relationships between suicide risk factors in both groups.
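Computing term co-occurrences amounts to counting, for every pair of tracked terms, how many tweets contain both. The minimal sketch below assumes a small hypothetical vocabulary and invented tweets; the study's actual risk-factor term lists and tokenization are not given in the abstract. Sorting the terms found in each tweet makes each pair unordered, so ("alone", "depressed") and ("depressed", "alone") are counted together.

```python
from collections import Counter
from itertools import combinations


def cooccurrences(tweets, vocabulary):
    """Count how many tweets contain each unordered pair of vocabulary terms."""
    counts = Counter()
    for tweet in tweets:
        # Naive whitespace tokenization; real tweets need punctuation handling.
        present = sorted({tok for tok in tweet.lower().split() if tok in vocabulary})
        for pair in combinations(present, 2):
            counts[pair] += 1
    return counts


# Hypothetical vocabulary and tweets (not the study's data).
vocabulary = {"depressed", "alone", "anxious"}
tweets = [
    "feeling depressed and alone",
    "so anxious and depressed today",
    "alone again",
    "depressed alone anxious",
]
print(cooccurrences(tweets, vocabulary).most_common())
```

Splitting users into groups first (for example "high risk" and "at risk") and running the same count per group would allow the pairwise patterns of the two groups to be compared.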
Article
Dimensionality reduction methods are usually applied to molecular dynamics simulations of macromolecules for analysis and visualization purposes. Ideally, a dimensionality reduction method should clearly distinguish functionally important states with different conformations in the system of interest. However, common dimensionality reduction methods for macromolecular simulations, including predefined order parameters and collective variables (CVs), principal component analysis (PCA), and time-structure based independent component analysis (t-ICA), have had only limited success because they lose significant key structural information. Here, we introduce t‐distributed stochastic neighbor embedding (t-SNE), a dimensionality reduction method with minimal structural information loss that is widely used in bioinformatics, for the analysis of macromolecular, especially biomacromolecular, simulations. We demonstrate that both one-dimensional (1D) and two-dimensional (2D) t-SNE models are superior at distinguishing the important functional states of a model allosteric protein system for free energy and mechanistic analysis. Projections of the model protein simulations onto 1D and 2D t-SNE surfaces provide both clear visual cues and quantitative information regarding the transition mechanism between two important functional states of this protein, information that is not readily available from other methods.
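For reference, the standard t-SNE formulation is summarized below. Here x_i is a high-dimensional descriptor of simulation frame i and y_i is its 1D or 2D embedding. What x_i contains (for example, atomic coordinates or internal distances) is an assumption, since the abstract does not specify the input features. Each σ_i is tuned so that the conditional distribution p_{·|i} matches a user-chosen perplexity. The heavy-tailed Student-t kernel in q_{ij} lets dissimilar conformational states separate clearly in the low-dimensional map, and the embedding is found by gradient descent on C.

```latex
\begin{aligned}
p_{j\mid i} &= \frac{\exp\!\left(-\lVert x_i - x_j\rVert^2 / 2\sigma_i^2\right)}
                   {\sum_{k \neq i} \exp\!\left(-\lVert x_i - x_k\rVert^2 / 2\sigma_i^2\right)},
\qquad p_{ij} = \frac{p_{j\mid i} + p_{i\mid j}}{2N}, \\
q_{ij} &= \frac{\left(1 + \lVert y_i - y_j\rVert^2\right)^{-1}}
              {\sum_{k \neq l} \left(1 + \lVert y_k - y_l\rVert^2\right)^{-1}}, \\
C &= \mathrm{KL}(P \,\Vert\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}, \\
\frac{\partial C}{\partial y_i} &= 4 \sum_{j} \left(p_{ij} - q_{ij}\right)\left(y_i - y_j\right)\left(1 + \lVert y_i - y_j\rVert^2\right)^{-1}.
\end{aligned}
```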
Article
Background: This paper goes beyond detecting specific themes within Zika-related chatter on Twitter to identify the key actors who influence the diffusive process through which some themes become more amplified than others. Methods: We collected all Zika-related tweets during the 3 months immediately after the first U.S. case of Zika. After the tweets were categorized into 12 themes, a cross-section was grouped into weekly datasets to capture 12 amplifier/user groups and analyzed by 4 amplification modes: mentions, retweets, talkers, and Twitter-wide amplifiers. Results: We analyzed 3,057,130 tweets in the United States and categorized 4997 users. The most talked-about theme was Zika transmission (~58%). News media, public health institutions, and grassroots users were the most visible and frequent sources and disseminators of Zika-related Twitter content. Grassroots users were the primary sources and disseminators of conspiracy theories. Conclusions: Social media analytics enable public health institutions to quickly learn what information is being disseminated, and by whom, regarding infectious diseases. Such information can help public health institutions identify and engage with news media and other active information providers. It also provides insights into media and public concerns, accuracy of information on Twitter, and information gaps. The study identifies implications for pandemic preparedness and response in the digital era and presents the agenda for future research and practice.
Conference Paper
We propose two novel model architectures for computing continuous vector representations of words from very large data sets. The quality of these representations is measured in a word similarity task, and the results are compared to the previously best performing techniques based on different types of neural networks. We observe large improvements in accuracy at much lower computational cost, i.e., it takes less than a day to learn high-quality word vectors from a 1.6-billion-word data set. Furthermore, we show that these vectors provide state-of-the-art performance on our test set for measuring syntactic and semantic word similarities.
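The two architectures are the continuous bag-of-words (CBOW) model, which predicts a word from its surrounding context, and the continuous skip-gram model, which predicts the surrounding context from each word. The sketch below shows how skip-gram training examples are formed from a token stream with a fixed window. As we read the paper, it instead samples a smaller effective window for each word, which gives nearby words more weight; this sketch omits that step. The function name is hypothetical.

```python
def skipgram_pairs(tokens, window=2):
    """(center, context) pairs for skip-gram training with a fixed window."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


print(skipgram_pairs(["the", "cat", "sat", "down"], window=1))
```

Each pair becomes one prediction task: the model's vector for the center word is trained to score its true context word above sampled alternatives.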
Article
The widely known vocabulary gap between health consumers and healthcare professionals hinders information seeking and health dialogue of consumers on end-user health applications. The Open Access and Collaborative Consumer Health Vocabulary (OAC CHV), which contains health-related terms used by lay consumers, has been created to bridge such a gap. Specifically, the OAC CHV facilitates consumers' health information retrieval by enabling consumer-facing health applications to translate between professional language and consumer-friendly language. To keep up with constantly evolving medical knowledge and language use, new terms need to be identified and added to the OAC CHV. User-generated content on social media, including social question and answer (social Q&A) sites, affords us an enormous opportunity to mine consumer health terms. Existing methods of identifying new consumer terms from text typically use ad hoc lexical-syntactic patterns and human review. Our study extends an existing method by extracting n-grams from a social Q&A textual corpus and representing them with a rich set of contextual and syntactic features. Using K-means clustering, our method, simiTerm, was able to identify terms that are both contextually and syntactically similar to the existing OAC CHV terms. We tested our method on social Q&A corpora in two disease domains: diabetes and cancer. Our method outperformed three baseline ranking methods. A post-hoc qualitative evaluation by human experts further validated that our method can effectively identify meaningful new consumer terms on social Q&A.
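The abstract does not list simiTerm's exact features. The sketch below therefore clusters hypothetical 2-D vectors with plain k-means (Lloyd's algorithm) in standard-library Python. In the described method, each vector would instead encode an n-gram's contextual and syntactic features, and candidates landing in clusters dominated by existing OAC CHV terms would be of interest; that ranking step is not shown. Seeding with the first k points is an assumption made for determinism, and the function name is hypothetical.

```python
import math


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update.

    Initialization simply takes the first k points (an assumption for determinism;
    k-means++ or several random restarts are the usual choices in practice).
    """
    centroids = [list(p) for p in points[:k]]
    assignments = None
    for _ in range(max_iter):
        new = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new == assignments:  # converged: no point changed cluster
            break
        assignments = new
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignments, centroids


# Hypothetical 2-D feature vectors for six n-grams (not simiTerm's real features).
points = [(0, 0), (10, 10), (0, 1), (1, 0), (10, 11), (11, 10)]
labels, centroids = kmeans(points, k=2)
print(labels)  # [0, 1, 0, 0, 1, 1]
```

Lloyd's algorithm only finds a local optimum, so results on real, higher-dimensional term features depend on initialization and on the choice of k.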
Article
This paper proposes an improved random forest algorithm for classifying text data. The algorithm is designed for very high-dimensional data with multiple classes, of which text corpora are a well-known example. A novel feature weighting method and a tree selection method are developed and work together to make the random forest framework well suited to categorizing text documents with dozens of topics. With the new feature weighting method for subspace sampling and the tree selection method, we can effectively reduce subspace size and improve classification performance without increasing the error bound. We apply the proposed method to six text data sets with diverse characteristics. The results demonstrate that this improved random forest outperforms popular text classification methods in terms of classification performance.
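The abstract does not spell out its weighting scheme, so the sketch below only illustrates the general idea of weighted subspace sampling. Each tree draws its candidate features with probability proportional to a per-feature weight (for instance an informativeness score such as chi-square, which is an assumption here) instead of uniformly. For brevity we swap in the Efraimidis-Spirakis trick for weighted sampling without replacement; it is not claimed to be the paper's procedure. The weights and function name are hypothetical.

```python
import random


def sample_subspace(weights, size, rng):
    """Draw up to `size` distinct feature indices, favoring high-weight features.

    Uses Efraimidis-Spirakis keys u ** (1 / w): keeping the largest keys gives
    weighted sampling without replacement. Zero-weight features are never drawn.
    """
    keyed = [(rng.random() ** (1.0 / w), i) for i, w in enumerate(weights) if w > 0]
    keyed.sort(reverse=True)
    return [i for _, i in keyed[:size]]


# Hypothetical informativeness weights for six vocabulary features.
weights = [0.0, 0.1, 2.0, 3.0, 0.0, 1.5]
rng = random.Random(42)
for tree in range(3):
    print(f"tree {tree}: features {sample_subspace(weights, 2, rng)}")
```

Compared with uniform sampling, this concentrates each tree's small subspace on informative terms. That is the intuition behind the abstract's claim that subspace size can be reduced without hurting accuracy, although the paper's actual weighting and tree selection steps are not reproduced here.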
Article
To improve the efficiency of multi-class classifiers based on the support vector machine (SVM), the multi-sphere method was introduced to supervised learning. By training a one-class SVM (1-SVM) on the samples class by class, a classifier composed of multiple spheres was obtained. To remove the redundant region in the spheres, a compacted one-vs-rest classifier was used to separate the mixed samples. These two complementary classifiers can be combined into a weighted classifier of one-vs-rest and multi-spheres. A cross-validation-based method for setting the weight factor and other parameters was given. Simulations showed that the novel classifier achieves higher accuracy with less training time than the one-vs-rest classifier, and that it makes decisions faster than the one-vs-one classifier. Consequently, the novel classifier is helpful for solving multi-class problems on large-scale systems.
Article
A longitudinal analysis of panel data from users of a popular online social network site, Facebook, investigated the relationship between intensity of Facebook use, measures of psychological well-being, and bridging social capital. Two surveys conducted a year apart at a large U.S. university, complemented with in-depth interviews with 18 Facebook users, provide the study data. Intensity of Facebook use in year one strongly predicted bridging social capital outcomes in year two, even after controlling for measures of self-esteem and satisfaction with life. These latter psychological variables were also strongly associated with social capital outcomes. Self-esteem served to moderate the relationship between Facebook usage intensity and bridging social capital: those with lower self-esteem gained more from their use of Facebook in terms of bridging social capital than higher self-esteem participants. We suggest that Facebook affordances help reduce barriers that lower self-esteem students might experience in forming the kinds of large, heterogeneous networks that are sources of bridging social capital.
Article
Electronic medical records (EMR) provide a unique opportunity for efficient, large-scale clinical investigation in psychiatry. However, such studies will require development of tools to define treatment outcome. Natural language processing (NLP) was applied to classify notes from 127 504 patients with a billing diagnosis of major depressive disorder, drawn from out-patient psychiatry practices affiliated with multiple, large New England hospitals. Classifications were compared with results using billing data (ICD-9 codes) alone and with a clinical gold standard based on chart review by a panel of senior clinicians. These cross-sectional classifications were then used to define longitudinal treatment outcomes, which were compared with a clinician-rated gold standard. Models incorporating NLP were superior to those relying on billing data alone for classifying current mood state (area under receiver operating characteristic curve of 0.85-0.88 v. 0.54-0.55). When these cross-sectional visits were integrated to define longitudinal outcomes and incorporate treatment data, 15% of the cohort remitted with a single antidepressant treatment, while 13% were identified as failing to remit despite at least two antidepressant trials. Non-remitting patients were more likely to be non-Caucasian (p<0.001). The application of bioinformatics tools such as NLP should enable accurate and efficient determination of longitudinal outcomes, enabling existing EMR data to be applied to clinical research, including biomarker investigations. Continued development will be required to better address moderators of outcome such as adherence and co-morbidity.
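The contrast between an AUC of 0.85-0.88 and one of 0.54-0.55 is easier to read with the rank-based definition of the area under the ROC curve. This definition is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as half, so 0.5 corresponds to chance. The minimal sketch below uses invented classifier scores; the function name is hypothetical.

```python
def roc_auc(pos_scores, neg_scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney U formulation); the pairwise loop
    is O(n * m), which is fine for small illustrative inputs.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical scores for notes whose gold label is positive vs. negative.
print(round(roc_auc([0.9, 0.8, 0.4], [0.7, 0.4, 0.1]), 3))
```

On this toy input, 7.5 of the 9 positive/negative pairs are ordered correctly, giving an AUC of about 0.833. By this reading, billing-code models at 0.54-0.55 rank cases barely better than chance.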
Article
We study online social networks in which relationships can be either positive (indicating relations such as friendship) or negative (indicating relations such as opposition or antagonism). Such a mix of positive and negative links arises in a variety of online settings; we study datasets from Epinions, Slashdot and Wikipedia. We find that the signs of links in the underlying social networks can be predicted with high accuracy, using models that generalize across this diverse range of sites. These models provide insight into some of the fundamental principles that drive the formation of signed links in networks, shedding light on theories of balance and status from social psychology; they also suggest social computing applications by which the attitude of one user toward another can be estimated from evidence provided by their relationships with other members of the surrounding social network.
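Structural balance theory, one of the two theories named above, says that a triangle of signed relationships is balanced when the product of its three signs is positive. For example, "the friend of my friend is my friend" and "the enemy of my friend is my enemy" are balanced configurations. The sketch below checks only this undirected balance condition on a hypothetical network. It does not reproduce the paper's predictive models, and it ignores edge direction, which status theory relies on.

```python
from itertools import combinations


def triad_balance(signs):
    """Map each fully connected triad to True (balanced) or False (unbalanced).

    `signs` maps frozenset({u, v}) -> +1 or -1 for an undirected signed edge.
    A triad is balanced when the product of its three signs is positive.
    """
    nodes = sorted({node for edge in signs for node in edge})
    result = {}
    for a, b, c in combinations(nodes, 3):
        edges = [frozenset((a, b)), frozenset((b, c)), frozenset((a, c))]
        if all(e in signs for e in edges):
            result[(a, b, c)] = signs[edges[0]] * signs[edges[1]] * signs[edges[2]] > 0
    return result


# Hypothetical network: A and B are friends, but A distrusts C, who is B's friend.
signs = {
    frozenset(("A", "B")): +1,
    frozenset(("B", "C")): +1,
    frozenset(("A", "C")): -1,
    frozenset(("C", "D")): -1,
    frozenset(("B", "D")): -1,
}
print(triad_balance(signs))
```

Here the triad A-B-C is unbalanced (+, +, -), while B-C-D is balanced because B and C share a common adversary, D.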
Article
Only a few formal assessments of websites with drug-related content have been carried out. Here we aimed to foster the collection and analysis of data from web pages containing information on the consumption, manufacture and sale of psychoactive substances. An 8-language, two-engine assessment of the information available in a purposeful sample of 1633 unique websites was carried out. A pro-drug approach was evident in 18% of the websites accessed and a harm-reduction approach in 10%. About 1 in 10 websites offered either psychoactive compounds for sale or detailed data on drug synthesis/extraction procedures. Information was elicited on a number of psychoactive substances and on unusual drug combinations not found in Medline. This represents the first review that is both comprehensive and multilingual of the online information available on psychoactive compounds. Health professionals may need to be aware that the web is a new resource for drug information and, possibly, for drug purchases.
S. Asur, B.A. Huberman, Predicting the future with social media, in: IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, 2010, pp. 492-499.
K. Jaidka, S.C. Guntuku, L.H. Ungar, Facebook versus Twitter: differences in self-disclosure and trait prediction, in: Twelfth International AAAI Conference on Web and Social Media, 2018.
F. Schifano, P. Deluca, A. Baldacchino, T. Peltoniemi, N. Scherbaum, M. Torrens, et al., Drugs on the web; the Psychonaut 2002 EU project, Progress in Neuro-Psychopharmacology and Biological Psychiatry, vol. 30, 2006, pp. 640-646.
S. Fodeh, J. Goulet, C. Brandt, A.-T. Hamada, Leveraging Twitter to better identify suicide risk, in: Medical Informatics and Healthcare, 2017, pp. 1-7.
Y. Zhang, Y. Fan, Y. Ye, X. Li, E.L. Winstanley, Utilizing social media to combat opioid addiction epidemic: automatic detection of opioid users from Twitter, in: Workshops at the Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
P. Zhou, Z. Qi, S. Zheng, J. Xu, H. Bao, B. Xu, Text classification improved by integrating bidirectional LSTM with two-dimensional max pooling, arXiv preprint arXiv:1611.06639, 2016.
J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: pre-training of deep bidirectional transformers for language understanding, arXiv preprint arXiv:1810.04805, 2018.
Z. Yang, Z. Dai, Y. Yang, J. Carbonell, R.R. Salakhutdinov, Q.V. Le, XLNet: generalized autoregressive pretraining for language understanding, in: Advances in Neural Information Processing Systems, 2019, pp. 5754-5764.