Article

Pharmacovigilance Using Textual Data: The Need to Go Deeper and Wider into the Con(text)

Author affiliations:
  • Indraprastha Institute of Information Technology Delhi, India; All India Institute of Medical Sciences, New Delhi, India
Chapter
Text data in digital format (e.g., physician’s notes, pathology reports, notes accompanying radiographic images, operational notes, text from digital content) are a commonly underutilized resource in the pharmaceutical industry. Supervised and unsupervised natural language processing techniques have been used in the past to represent medical knowledge and to support medical evidence generation. However, challenges remain with such techniques, including accuracy, ease of transfer to new text-based datasets, and the capability for integration with other types of structured data (e.g., genomics, structures, targets, vocabularies). Graph-based natural language processing approaches have the potential to address these challenges. In this chapter, we present an overview of graph-based NLP techniques used in the pharmaceutical industry. Borrowing from the strengths of existing approaches outside the industry, we propose recommendations for graph-based NLP techniques for pharmaceutical use cases.
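The core idea behind graph-based representations of extracted text can be sketched in a few lines: entities mentioned in clinical text become nodes, and extracted relations become typed edges, which then support queries that span documents. The entity and relation names below are illustrative assumptions, not drawn from any specific vocabulary or extraction system.

```python
from collections import defaultdict

# Hypothetical (entity, relation, entity) triples, as might be extracted
# from clinical text by an upstream NLP step.
triples = [
    ("paroxetine", "TREATS", "depression"),
    ("paroxetine", "MAY_CAUSE", "nausea"),
    ("pravastatin", "TREATS", "hypercholesterolemia"),
]

# Adjacency structure: node -> list of (relation, neighbor) pairs.
graph = defaultdict(list)
for head, relation, tail in triples:
    graph[head].append((relation, tail))

def neighbors(entity, relation=None):
    """Return entities linked to `entity`, optionally filtered by relation type."""
    return [t for r, t in graph[entity] if relation is None or r == relation]

print(neighbors("paroxetine", "MAY_CAUSE"))  # ['nausea']
```

Because structured sources (genomics, targets, vocabularies) can be loaded into the same node-and-edge form, this representation is one way such data can be integrated with text-derived knowledge.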
Article
Full-text available
Background: Adverse drug reactions (ADRs) are a well-recognized public health problem and a major cause of death and hospitalization in developed countries. The safety of a new drug cannot be established until it has been on the market for several years. Keeping drug reactions under surveillance through pharmacovigilance systems is indispensable. However, underreporting is a major issue that undermines the effectiveness of spontaneous reports. Our work presents a systematic review on the use of information systems for the promotion of ADR reporting. The aim of this work is to describe the state of the art in information systems used to promote adverse drug reaction reporting. Methods: A systematic review was performed with quantitative analysis of studies describing or evaluating the use of information systems to promote adverse drug reaction reporting. Studies with data on the number of ADRs reported before and after each intervention and the follow-up period were included in the quantitative analysis. Results: From a total of 3865 articles, 33 were included in the analysis; these articles described 29 different projects. Most of the projects were on a regional scale (62%) and were performed in a hospital context (52%). A total of 76% performed passive promotion of ADR reporting, and 55% used web-based software. A total of 72% targeted healthcare professionals, and 24% were oriented to patient ADR reporting. We performed a meta-analysis of 7 of the 29 projects to calculate an aggregated measure of the increase in ADR reporting, which yielded an overall value of 2.1 (indicating that the interventions doubled the number of ADRs reported). Conclusions: We found that most of the projects performed passive promotion of ADR reporting (i.e., facilitating the process), were developed in hospitals, and were tailored to healthcare professionals. These interventions doubled the number of ADR reports.
We believe that it would be useful to develop systems to assist healthcare professionals with completing ADR reporting within electronic health records because this approach seems to be an efficient method to increase the ADR reporting rate. When this approach is not possible, it is essential to have a tool that is easily accessible on the web to report ADRs. This tool can be promoted by sending emails or through the inclusion of direct hyperlinks on healthcare professionals' desktops.
Article
Full-text available
Automatic monitoring of Adverse Drug Reactions (ADRs), defined as adverse patient outcomes caused by medications, is a challenging research problem that is currently receiving significant attention from the medical informatics community. In recent years, user-posted data on social media, primarily because of its sheer volume, has become a useful resource for ADR monitoring. Research using social media data has drawn on a variety of data sources and techniques, making it difficult to compare distinct systems and their performance. In this paper, we perform a methodical review to characterize the different approaches to ADR detection/extraction from social media and their applicability to pharmacovigilance. In addition, we present a potential systematic pathway to ADR monitoring from social media.
Article
Full-text available
Text mining is the computational process of extracting meaningful information from large amounts of unstructured text. It is emerging as a tool to leverage underutilized data sources that can improve pharmacovigilance, including the objective of adverse drug event (ADE) detection and assessment. This article provides an overview of recent advances in pharmacovigilance driven by the application of text mining, and discusses several data sources (such as biomedical literature, clinical narratives, product labeling, social media, and Web search logs) that are amenable to text mining for pharmacovigilance. Given the state of the art, it appears text mining can be applied to extract useful ADE-related information from multiple textual sources. Nonetheless, further research is required to address remaining technical challenges associated with the text mining methodologies, and to conclusively determine the relative contribution of each textual source to improving pharmacovigilance.
Article
Full-text available
With increasing adoption of electronic health records (EHRs), there is an opportunity to use the free-text portion of EHRs for pharmacovigilance. We present novel methods that annotate the unstructured clinical notes and transform them into a deidentified patient-feature matrix encoded using medical terminologies. We demonstrate the use of the resulting high-throughput data for detecting drug-adverse event associations and adverse events associated with drug-drug interactions. We show that these methods flag adverse events early (in most cases before an official alert), allow filtering of spurious signals by adjusting for potential confounding, and compile prevalence information. We argue that analyzing large volumes of free-text clinical notes enables drug safety surveillance using a yet untapped data source. Such data mining can be used for hypothesis generation and for rapid analysis of suspected adverse event risk. Clinical Pharmacology & Therapeutics (2013); advance online publication 10 April 2013; doi:10.1038/clpt.2013.47.
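The transformation described above, from free-text notes to a patient-feature matrix, can be illustrated with a minimal sketch. The toy terminology and note texts below are fabricated assumptions; real systems map mentions to standard medical vocabularies and handle negation, abbreviations, and deidentification.

```python
import re

# Toy terminology of adverse-event terms (real systems use standard vocabularies).
terminology = ["rash", "nausea", "hyperglycemia"]

# Fabricated example notes keyed by a deidentified patient ID.
notes = {
    "patient_1": "Developed a rash after starting the drug.",
    "patient_2": "Reports nausea; glucose elevated, consistent with hyperglycemia.",
}

def encode(note, terms):
    """Encode one note as a binary presence vector over the terminology."""
    tokens = set(re.findall(r"[a-z]+", note.lower()))
    return [1 if term in tokens else 0 for term in terms]

# Patient-feature matrix: one row per patient, one column per term.
matrix = {pid: encode(text, terminology) for pid, text in notes.items()}
print(matrix["patient_1"])  # [1, 0, 0]
print(matrix["patient_2"])  # [0, 1, 1]
```

Once notes are in this matrix form, drug-adverse event associations can be tested with standard statistics over rows, which is what makes the free text "high-throughput".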
Article
Full-text available
Narrative reports in medical records contain a wealth of information that may augment structured data for managing patient information and predicting trends in diseases. Pertinent negatives are evident in text but are not usually indexed in structured databases. The objective of the study reported here was to test a simple algorithm for determining whether a finding or disease mentioned within narrative medical reports is present or absent. We developed a simple regular expression algorithm called NegEx that implements several phrases indicating negation, filters out sentences containing phrases that falsely appear to be negation phrases, and limits the scope of the negation phrases. We compared NegEx against a baseline algorithm that has a limited set of negation phrases and a simpler notion of scope. In a test of 1235 findings and diseases in 1000 sentences taken from discharge summaries indexed by physicians, NegEx had a specificity of 94.5% (versus 85.3% for the baseline) and a positive predictive value of 84.5% (versus 68.4% for the baseline), while maintaining a reasonable sensitivity of 77.8% (versus 88.3% for the baseline). We conclude that with little implementation effort a simple regular expression algorithm for determining whether a finding or disease is absent can identify a large portion of the pertinent negatives from discharge summaries.
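The essential mechanism, a negation term within a bounded window before the finding, can be sketched in a few lines. This is a minimal illustration of the approach, not NegEx itself: the term list and window size below are assumptions, and the published algorithm also handles post-finding negation and pseudo-negation phrases.

```python
import re

# Tiny illustrative negation vocabulary (NegEx's published sets are larger).
NEGATION_TERMS = {"no", "denies", "without", "negative"}
WINDOW = 5  # maximum number of tokens between negation term and finding

def is_negated(sentence, finding):
    """Return True if `finding` appears in `sentence` preceded by a negation
    term within the scope window."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    target = finding.lower().split()
    for i in range(len(tokens) - len(target) + 1):
        if tokens[i:i + len(target)] == target:
            window = tokens[max(0, i - WINDOW):i]
            return any(t in NEGATION_TERMS for t in window)
    return False

print(is_negated("The patient denies chest pain", "chest pain"))  # True
print(is_negated("Chest pain on exertion", "chest pain"))         # False
```

Limiting the scope to a short window is what keeps such surface-level rules precise: a negation term far from the finding usually governs a different clause.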
Conference Paper
Full-text available
The explosion of image data on the Internet has the potential to foster more sophisticated and robust models and algorithms to index, retrieve, organize and interact with images and multimedia data. But exactly how such data can be harnessed and organized remains a critical problem. We introduce here a new database called "ImageNet", a large-scale ontology of images built upon the backbone of the WordNet structure. ImageNet aims to populate the majority of the 80,000 synsets of WordNet with an average of 500-1000 clean and full resolution images. This will result in tens of millions of annotated images organized by the semantic hierarchy of WordNet. This paper offers a detailed analysis of ImageNet in its current state: 12 subtrees with 5247 synsets and 3.2 million images in total. We show that ImageNet is much larger in scale and diversity and much more accurate than the current image datasets. Constructing such a large-scale database is a challenging task. We describe the data collection scheme with Amazon Mechanical Turk. Lastly, we illustrate the usefulness of ImageNet through three simple applications in object recognition, image classification and automatic object clustering. We hope that the scale, accuracy, diversity and hierarchical structure of ImageNet can offer unparalleled opportunities to researchers in the computer vision community and beyond.
Article
The goal of pharmacovigilance is to detect, monitor, characterize and prevent adverse drug events (ADEs) associated with pharmaceutical products. This article is a comprehensive structured review of recent advances in applying natural language processing (NLP) to electronic health record (EHR) narratives for pharmacovigilance. We review methods of varying complexity and problem focus, summarize the current state-of-the-art in methodology advancement, discuss limitations and point out several promising future directions. The ability to accurately capture both semantic and syntactic structures in clinical narratives becomes increasingly critical to enable efficient and accurate ADE detection. Significant progress has been made in algorithm development and resource construction since 2000. Since 2012, statistical analysis and machine learning methods have gained traction in automation of ADE mining from EHR narratives. Current state-of-the-art methods for NLP-based ADE detection from EHRs show promise regarding their integration into production pharmacovigilance systems. In addition, integrating multifaceted, heterogeneous data sources has shown promise in improving ADE detection and has become increasingly adopted. On the other hand, challenges and opportunities remain across the frontier of NLP application to EHR-based pharmacovigilance, including proper characterization of ADE context, differentiation between off- and on-label drug-use ADEs, recognition of the importance of polypharmacy-induced ADEs, better integration of heterogeneous data sources, creation of shared corpora, and organization of shared-task challenges to advance the state-of-the-art.
Article
Objective: Widespread application of clinical natural language processing (NLP) systems requires taking existing NLP systems and adapting them to diverse and heterogeneous settings. We describe the challenges faced and lessons learned in adapting an existing NLP system for measuring colonoscopy quality. Materials and methods: Colonoscopy and pathology reports from 4 settings during 2013-2015, varying by geographic location, practice type, compensation structure, and electronic health record. Results: Though successful, adaptation required considerably more time and effort than anticipated. Typical NLP challenges in assembling corpora, diverse report structures, and idiosyncratic linguistic content were greatly magnified. Discussion: Strategies for addressing adaptation challenges include assessing site-specific diversity, setting realistic timelines, leveraging local electronic health record expertise, and undertaking extensive iterative development. More research is needed on how to make it easier to adapt NLP systems to new clinical settings. Conclusions: A key challenge in widespread application of NLP is adapting existing systems to new clinical settings.
Article
Adverse drug events cause substantial morbidity and mortality and are often discovered after a drug comes to market. We hypothesized that Internet users may provide early clues about adverse drug events via their online information-seeking. We conducted a large-scale study of Web search log data gathered during 2010. We paid particular attention to the specific drug pairing of paroxetine and pravastatin, whose interaction was reported to cause hyperglycemia after the time period of the online logs used in the analysis. We also examined sets of drug pairs known to be associated with hyperglycemia and those not associated with hyperglycemia. We found that anonymized signals on drug interactions can be mined from search logs. Compared to analyses of other sources such as electronic health records (EHR), logs are inexpensive to collect and mine. The results demonstrate that logs of the search activities of populations of computer users can contribute to drug safety surveillance.
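The kind of signal mined from search logs can be sketched as a simple rate comparison: among queries mentioning both drugs of a pair, how often do symptom terms appear, relative to the background rate across all queries? The query strings and symptom terms below are fabricated examples under that assumption, not real log data or the study's actual statistic.

```python
# Fabricated example queries standing in for an anonymized search log.
queries = [
    "paroxetine pravastatin high blood sugar",
    "paroxetine dosage",
    "pravastatin side effects",
    "paroxetine pravastatin interaction",
    "headache remedies",
]
# Illustrative symptom vocabulary for hyperglycemia.
symptom_terms = {"hyperglycemia", "high blood sugar"}

def mentions_pair(q):
    return "paroxetine" in q and "pravastatin" in q

def mentions_symptom(q):
    return any(term in q for term in symptom_terms)

pair_queries = [q for q in queries if mentions_pair(q)]
rate_with_pair = sum(mentions_symptom(q) for q in pair_queries) / len(pair_queries)
background = sum(mentions_symptom(q) for q in queries) / len(queries)
print(rate_with_pair, background)  # 0.5 0.2
```

An elevated symptom rate among drug-pair queries relative to background is the sort of disproportionality that, at population scale, can flag a candidate interaction for follow-up.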
Article
In this paper we describe an algorithm called ConText for determining whether clinical conditions mentioned in clinical reports are negated, hypothetical, historical, or experienced by someone other than the patient. The algorithm infers the status of a condition with regard to these properties from simple lexical clues occurring in the context of the condition. The discussion and evaluation of the algorithm presented in this paper address the questions of whether a simple surface-based approach which has been shown to work well for negation can be successfully transferred to other contextual properties of clinical conditions, and to what extent this approach is portable among different clinical report types. In our study we find that ConText obtains reasonable to good performance for negated, historical, and hypothetical conditions across all report types that contain such conditions. Conditions experienced by someone other than the patient are very rarely found in our report set. A comprehensive solution to the problem of determining whether a clinical condition is historical or recent requires knowledge above and beyond the surface clues picked up by ConText.
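The surface-based inference ConText performs can be sketched as trigger-term matching per property. This is a minimal illustration of the idea under assumed trigger lists, not the published algorithm, which also bounds the scope of each trigger within the sentence.

```python
# Tiny illustrative trigger sets per contextual property (the published
# algorithm uses larger lists and scope termination rules).
TRIGGERS = {
    "negated": {"no", "denies", "without"},
    "historical": {"history", "previous", "prior"},
    "hypothetical": {"if", "should"},
}

def context_properties(sentence):
    """Return the sorted list of contextual properties whose triggers
    occur in `sentence`."""
    tokens = set(sentence.lower().replace(",", " ").split())
    return sorted(prop for prop, trig in TRIGGERS.items() if tokens & trig)

print(context_properties("History of asthma, denies current wheezing"))
# ['historical', 'negated']
```

The appeal of this approach, noted in the abstract above, is that the same cheap lexical machinery validated for negation transfers to historical and hypothetical status with reasonable accuracy.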