Chapter

Medical Entity Linking in Laypersons’ Language

Abstract

Due to the vast amount of health-related data on the Internet, a trend toward digital health literacy is emerging among laypersons. We hypothesize that providing trustworthy explanations of informal medical terms in social media can improve information quality. Entity linking (EL) is the task of associating terms with concepts (entities) in the knowledge base. The challenge with EL in lay medical texts is that the source texts are often written in loose and informal language. We propose an end-to-end entity linking approach that involves identifying informal medical terms, normalizing medical concepts according to SNOMED-CT, and linking entities to Wikipedia to provide explanations for laypersons.

Keywords: Medical entity linking; Medical concept normalization; Named entity recognition
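The three stages named in the abstract (identifying informal terms, normalizing them to concepts, and linking to an explanatory article) can be illustrated with a minimal sketch. This is not the authors' method: it uses a plain dictionary lookup in place of a trained recognizer, and the lexicon, the concept identifiers (placeholders, not real SNOMED CT codes), the article-title mapping, and all function names are hypothetical.

```python
import re

# Hypothetical toy lexicon: informal phrase -> (placeholder concept ID, preferred term).
# The IDs are made-up placeholders, NOT real SNOMED CT codes.
LAY_LEXICON = {
    "tummy ache": ("C-0001", "Abdominal pain"),
    "throwing up": ("C-0002", "Vomiting"),
}

# Hypothetical mapping from a preferred term to an encyclopedia article title.
WIKI_TITLES = {
    "Abdominal pain": "Abdominal pain",
    "Vomiting": "Vomiting",
}


def find_mentions(text):
    """Stage 1: locate informal medical terms by case-insensitive lookup."""
    mentions = []
    for phrase in LAY_LEXICON:
        pattern = r"\b" + re.escape(phrase) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            mentions.append((match.start(), match.end(), phrase))
    return sorted(mentions)


def link_entities(text):
    """Stages 2 and 3: normalize each mention, then attach an article title."""
    results = []
    for start, end, phrase in find_mentions(text):
        concept_id, preferred_term = LAY_LEXICON[phrase]
        results.append({
            "mention": text[start:end],
            "concept_id": concept_id,
            "preferred_term": preferred_term,
            # None when no article is known for the preferred term.
            "wikipedia_title": WIKI_TITLES.get(preferred_term),
        })
    return results


if __name__ == "__main__":
    for entity in link_entities("I had a Tummy ache and kept throwing up."):
        print(entity)
```

A real system would replace the lookup in `find_mentions` with a learned named entity recognizer and the dictionary in `link_entities` with a normalization model, but the data flow between the stages would stay the same.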

Conference Paper
Many people share information on social media or in forums, such as the food they eat, the sports they do, or the events they have visited. This also applies to information about a person's health status. The information we share online directly or indirectly reveals details about our lifestyle and health situation and thus provides a valuable data resource. If we can take advantage of that data, applications can be created that enable, e.g., the detection of possible risk factors for diseases or adverse drug reactions to medications. However, as most people are not medical experts, the language they use tends to be descriptive rather than the precise medical terminology that medical professionals use. To detect and use this relevant information, laymen's language has to be translated and/or linked to the corresponding medical concept. This work presents baseline data sources in order to address this challenge for German. We introduce a new data set which annotates medical laymen and technical expressions in a patient forum, along with a set of medical synonyms and definitions, and present first baseline results on the data.
Conference Paper
In this work, we consider the medical concept normalization problem, i.e., the problem of mapping a health-related entity mention in a free-form text to a concept in a controlled vocabulary, usually to the standard thesaurus in the Unified Medical Language System (UMLS). This is a challenging task since medical terminology is very different when coming from health care professionals or from the general public in the form of social media texts. We approach it as a sequence learning problem with powerful neural networks such as recurrent neural networks and contextualized word representation models trained to obtain semantic representations of social media expressions. Our experimental evaluation over three different benchmarks shows that neural architectures leverage the semantic meaning of the entity mention and significantly outperform existing state-of-the-art models.
Article
The “Psychiatric Treatment Adverse Reactions” (PsyTAR) dataset contains patients’ expression of effectiveness and adverse drug events associated with psychiatric medications. The PsyTAR was generated in four phases. In the first phase, a sample of 891 drug reviews posted by patients on an online healthcare forum, “askapatient.com”, was collected for four psychiatric drugs: Zoloft, Lexapro, Cymbalta, and Effexor XR. For each drug review, patient demographic information, duration of treatment, and satisfaction with the drugs were reported. In the second phase, sentence classification, drug reviews were split into 6009 sentences, and each sentence was labeled for the presence of Adverse Drug Reaction (ADR), Withdrawal Symptoms (WDs), Sign/Symptoms/Illness (SSIs), Drug Indications (DIs), Drug Effectiveness (EF), Drug Ineffectiveness (INF), and Others (not applicable). In the third phase, entities including ADRs (4813 mentions), WDs (590 mentions), SSIs (1219 mentions), and DIs (792 mentions) were identified and extracted from the sentences. In the fourth phase, all the identified entities were mapped to the corresponding UMLS Metathesaurus concepts (916) and SNOMED CT concepts (755). In this phase, qualifiers representing severity and persistency of ADRs, WDs, SSIs, and DIs (e.g., mild, short term) were identified. All sentences and identified entities were linked to the original post using IDs (e.g., Zoloft.1, Effexor.29, Cymbalta.31). The PsyTAR dataset can be accessed via Online Supplement #1 under the CC BY 4.0 Data license. Updated versions of the dataset will also be accessible at https://sites.google.com/view/pharmacovigilanceinpsychiatry/home.
Article
Text mining of scientific libraries and social media has already proven itself as a reliable tool for drug repurposing and hypothesis generation. The task of mapping a disease mention to a concept in a controlled vocabulary, typically to the standard thesaurus in the Unified Medical Language System (UMLS), is known as medical concept normalization. This task is challenging due to the differences in the use of medical terminology between health care professionals and social media texts coming from the lay public. To bridge this gap, we use sequence learning with recurrent neural networks and semantic representation of one- or multi-word expressions: we develop end-to-end architectures directly tailored to the task, including bidirectional Long Short-Term Memory, Gated Recurrent Units with an attention mechanism, and additional semantic similarity features based on UMLS. Our evaluation against a standard benchmark shows that recurrent neural networks improve results over an effective baseline for classification based on convolutional neural networks. A qualitative examination of mentions discovered in a dataset of user reviews collected from popular online health information platforms as well as a quantitative evaluation both show improvements in the semantic representation of health-related expressions in social media.
Article
The internet has revolutionised the ways in which patients acquire medical information, a development which has clearly been welcomed by patients: seeking out health information online is now the third most popular activity after internet searches and e-mail (Timimi 2012). However, it has led to concerns about the quality of the information, the ability of lay people to understand it (Gerber/Eiser 2001) as well as potential cyberchondria (Starcevic/Berle 2013). In light of these conflicting perspectives, this paper examines one such source of online information, namely, the patient forum where patients communicate with other patients about a particular medical condition. Although doctor-patient communication in the clinical situation has been extensively researched, little is known about how patient-patient communication is managed in online situations such as patient forums. The purpose of this paper is to contribute to research in that relatively un-researched area by examining how patients manage relational and informational aspects of communication in online patient forums. Whilst a typical interactional structure of the patient forum exchange is question and answer, we focus on responses to questions on patient forums. This paper reports on the findings of a thematic analysis (Braun/Clarke 2006) of an online thyroid disease patient forum, investigating how interpersonal aspects are negotiated where patients share condition-related knowledge. We identify themes that relate both to informational and relational aspects as well as themes that fit under a new category which we call ‘info-relational’ as it subsumes informational and relational elements. We discuss a number of theoretical implications, which are valuable as existing health communication models and understandings of patient expertise have yet to catch up with the effects of new media such as online patient forums.
Article
Health communication research and guidelines often recommend that medical terminology be avoided when communicating with patients due to their limited understanding of medical terms. However, growing numbers of e-patients use the Internet to equip themselves with specialized biomedical knowledge that is couched in medical terms, which they then share on participatory media, such as online patient forums. Given possible discrepancies between preconceptions about the kind of language that patients can understand and the terms they may actually know and use, the purpose of this paper was to investigate medical terminology used by patients in online patient forums. Using data from online patient-patient communication where patients communicate with each other without expert moderation or intervention, we coded two data samples from two online patient forums dedicated to thyroid issues. Previous definitions of medical terms (dichotomized into technical and semi-technical) proved too rudimentary to encapsulate the types of medical terms the patients used. Therefore, using an inductive approach, we developed an analytical framework consisting of five categories of medical terms: dictionary-defined medical terms, co-text-defined medical terms, medical initialisms, medication brand names and colloquial technical terms. The patients in our data set used many medical terms from all of these categories. Our findings suggest the value of a situated, condition-specific approach to health literacy that recognizes the vertical kind of knowledge that patients with chronic diseases may have. We make cautious recommendations for clinical practice, arguing for an adaptive approach to medical terminology use with patients. © 2015 The Authors. Health Expectations Published by John Wiley & Sons Ltd.
Article
We consider the task of Medical Concept Normalization (MCN) which aims to map informal medical phrases such as “loosing weight” to formal medical concepts, such as “Weight loss”. Deep learning models have shown high performance across various MCN datasets containing a small number of target concepts along with an adequate number of training examples per concept. However, scaling these models to millions of medical concepts entails the creation of much larger datasets, which is costly and effort-intensive. Recent works have shown that training MCN models using automatically labeled examples extracted from medical knowledge bases partially alleviates this problem. We extend this idea by computationally creating a distant dataset from patient discussion forums. We extract informal medical phrases and medical concepts from these forums using a synthetically trained classifier and an off-the-shelf medical entity linker respectively. We use pretrained sentence encoding models to find the k-nearest phrases corresponding to each medical concept. These mappings are used in combination with the examples obtained from medical knowledge bases to train an MCN model. Our approach outperforms the previous state-of-the-art by 15.9% and 17.1% classification accuracy across two datasets while avoiding manual labeling.
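The step of finding the k-nearest phrases for a medical concept can be sketched as a nearest-neighbour search under cosine similarity. The sketch below is only an illustration under stated assumptions: it swaps the pretrained sentence encoder mentioned in the abstract for character-trigram counts so that it runs without any model download, and the function names and example phrases are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Stand-in encoder: character-trigram counts.

    Assumption: this replaces a pretrained sentence encoder purely so the
    example is self-contained; it captures spelling overlap, not meaning.
    """
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def k_nearest_phrases(concept, phrases, k=2):
    """Return the k phrases most similar to the concept name."""
    concept_vec = embed(concept)
    ranked = sorted(phrases, key=lambda p: cosine(concept_vec, embed(p)), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    forum_phrases = ["loosing weight", "cant sleep", "lost a lot of weight", "bad headache"]
    print(k_nearest_phrases("Weight loss", forum_phrases, k=2))
```

With a real sentence encoder, only `embed` would change; the ranking logic and the resulting concept-to-phrase pairs used as distant training examples would follow the same pattern.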
Conference Paper
Automatically recognising medical concepts mentioned in social media messages (e.g. tweets) enables several applications for enhancing the health of people in a community, e.g. real-time monitoring of infectious diseases in a population. However, the discrepancy between the type of language used in social media and medical ontologies poses a major challenge. Existing studies deal with this challenge by employing techniques such as lexical term matching and statistical machine translation. In this work, we handle the medical concept normalisation at the semantic level. We investigate the use of neural networks to learn the transition between layman’s language used in social media messages and formal medical language used in the descriptions of medical concepts in a standard ontology. We evaluate our approaches using three different datasets, where social media texts are extracted from Twitter messages and blog posts. Our experimental results show that our proposed approaches significantly and consistently outperform existing effective baselines, which achieved state-of-the-art performance on several medical concept normalisation tasks, by up to 44%.
Article
Previous studies have shown that health reports in social media, such as DailyStrength and Twitter, have potential for monitoring health conditions (e.g. adverse drug reactions, infectious diseases) in particular communities. However, in order for a machine to understand and make inferences on these health conditions, the ability to recognise when laymen's terms refer to a particular medical concept (i.e. text normalisation) is required. To achieve this, we propose to adapt an existing phrase-based machine translation (MT) technique and a vector representation of words to map between a social media phrase and a medical concept. We evaluate our proposed approach using a collection of phrases from tweets related to adverse drug reactions. Our experimental results show that the combination of a phrase-based MT technique and the similarity between word vector representations outperforms the baselines that apply only one of them by up to 55%.
Article
CSIRO Adverse Drug Event Corpus (CADEC) is a new, richly annotated corpus of medical forum posts on patient-reported Adverse Drug Events (ADEs). The corpus is sourced from posts on social media, and contains text that is largely written in colloquial language and often deviates from formal English grammar and punctuation rules. Annotations contain mentions of concepts such as drugs, adverse effects, symptoms, and diseases linked to their corresponding concepts in controlled vocabularies, i.e., SNOMED Clinical Terms and MedDRA. The quality of the annotations is ensured by annotation guidelines, multi-stage annotations, measuring inter-annotator agreement, and final review of the annotations by a clinical terminologist. This corpus is useful for studies in the area of information extraction, or more generally text mining, from social media to detect possible adverse drug reactions from direct patient reports. The corpus is publicly available.
Allowing patients direct access to their electronic health record (EHR) notes has been shown to enhance medical understanding and may improve healthcare management and outcome. However, EHR notes contain medical terms, shortened forms, complex disease and medication names, and other domain specific jargon that make them difficult for patients to fathom. In this paper, we present a BioNLP system, NoteAid, that automatically recognizes medical concepts and links these concepts with consumer oriented, simplified definitions from external resources. We conducted a pilot evaluation for linking EHR notes through NoteAid to three external knowledge resources: MedlinePlus, the Unified Medical Language System (UMLS), and Wikipedia. Our results show that Wikipedia significantly improves EHR note readability. Preliminary analyses show that MedlinePlus and the UMLS need to improve both content readability and content coverage for consumer health information. A demonstration version of fully functional NoteAid is available at http://clinicalnotesaid.org.
A clinical terminology is essential for Electronic Health records. It represents clinical information input into clinical IT systems by clinicians in a machine-readable manner. Use of a Clinical Terminology, implemented within a clinical information system, will enable the delivery of many patient health benefits including electronic clinical decision support, disease screening and enhanced patient safety. For example, it will help reduce medication-prescribing errors, which are currently known to kill or injure many citizens. It will also reduce clinical administration effort and the overall costs of healthcare.