Conference Paper

Information Extraction Models for German Clinical Text


Abstract

Tools and resources to automatically process clinical text are very limited, particularly outside the English-speaking world. As much relevant patient information within electronic health records is described only in unstructured text, this is a clear drawback. To partially overcome this problem, we present information extraction models for German clinical text and make them freely available. The models have been trained on documents from the nephrology domain and do not contain personal information.


... First we ran an automatic annotation. This included the partially automatic labelling of the semantic layer using mEx (Roller et al., 2020) for named entity recognition and relation extraction. Although the schema of mEx partially differs from ours, it includes several of our entities and is a good starting point for speeding up the annotation. ...
Conference Paper
Full-text available
In recent years, machine learning for clinical decision support has gained more and more attention. In order to introduce such applications into clinical practice, good performance might be essential; however, the aspect of trust should not be underestimated. For the treating physician who uses such a system and is (legally) responsible for the decision made, it is particularly important to understand the system's recommendation. To provide insights into a model's decision, various techniques from the field of explainability (XAI) have been proposed, but their output is often not targeted at the domain experts who want to use the model. To close this gap, in this work we explore what explanations could look like in the future. To this end, this work presents a dataset of textual explanations in the context of decision support. Within a reader study, human physicians estimated the likelihood of possible negative patient outcomes in the near future and justified each decision with a few sentences. Using those sentences, we created a novel corpus annotated with different semantic layers. Moreover, we provide an analysis of how those explanations are constructed, and how they change depending on the physician, on the estimated risk, and in comparison to an automatic clinical decision support system with feature importance.
Conference Paper
Full-text available
In this work, cross-linguistic span prediction based on contextualized word embedding models is used together with neural machine translation (NMT) to transfer and apply state-of-the-art models in natural language processing (NLP) to a low-resource language clinical corpus. Two directions are evaluated: (a) English models can be applied to translated texts, with the predicted annotations subsequently transferred back to the source language, and (b) existing high-quality annotations can be transferred across the translation and then used to train NLP models in the target language. Effectiveness and transfer loss are evaluated using the German Berlin-Tübingen-Oncology Corpus (BRONCO) dataset with transferred external data from NCBI disease, SemEval-2013 drug-drug interaction (DDI) and i2b2/VA 2010 data. Applying English models to translated clinical texts aims to take full advantage of their benefits (large pre-trained biomedical word embeddings). To support further advances in this area, we provide a general-purpose pipeline to transfer any annotated BRAT or CoNLL format to various target languages. For the entity class medication, good results were obtained (0.806 F1-score after re-alignment). Success was limited for the diagnosis and treatment classes, with results just below 0.5 F1-score, due to differences in annotation guidelines.
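Since the pipeline described above accepts BRAT standoff or CoNLL input, the format-conversion step can be pictured roughly as follows. This is a minimal sketch with a naive whitespace tokenizer, not the authors' pipeline; file names and the tokenization are illustrative assumptions.

```python
# Sketch: convert BRAT standoff entity annotations (.ann) plus the raw text
# (.txt) into CoNLL-style BIO-tagged tokens. Not the authors' pipeline.
import re

def read_brat_entities(ann_path):
    """Return (start, end, label) for each entity line, e.g. 'T1\tDrug 10 19\tMetformin'."""
    spans = []
    with open(ann_path, encoding="utf-8") as f:
        for line in f:
            if not line.startswith("T"):
                continue                       # skip relations, attributes, notes
            _, info, _ = line.rstrip("\n").split("\t", 2)
            if ";" in info:
                continue                       # skip discontinuous spans for simplicity
            label, start, end = info.split(" ")[:3]
            spans.append((int(start), int(end), label))
    return spans

def to_conll(txt_path, ann_path):
    with open(txt_path, encoding="utf-8") as f:
        text = f.read()
    spans = read_brat_entities(ann_path)
    rows = []
    for m in re.finditer(r"\S+", text):        # naive whitespace tokenization
        tok_start, tok_end, tag = m.start(), m.end(), "O"
        for start, end, label in spans:
            if tok_start >= start and tok_end <= end:
                tag = ("B-" if tok_start == start else "I-") + label
                break
        rows.append(f"{m.group(0)}\t{tag}")
    return "\n".join(rows)

print(to_conll("report.txt", "report.ann"))
```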
Conference Paper
Full-text available
Recent advances in language modeling using recurrent neural networks have made it viable to model language as distributions over characters. By learning to predict the next character on the basis of previous characters, such models have been shown to automatically internalize linguistic concepts such as words, sentences, subclauses and even sentiment. In this paper, we propose to leverage the internal states of a trained character language model to produce a novel type of word embedding which we refer to as contextual string embeddings. Our proposed embeddings have the distinct properties that they (a) are trained without any explicit notion of words and thus fundamentally model words as sequences of characters, and (b) are contextualized by their surrounding text, meaning that the same word will have different embeddings depending on its contextual use. We conduct a comparative evaluation against previous embeddings and find that our embeddings are highly useful for downstream tasks: across four classic sequence labeling tasks we consistently outperform the previous state-of-the-art. In particular, we significantly outperform previous work on English and German named entity recognition (NER), allowing us to report new state-of-the-art F1-scores on the CoNLL-03 shared task. We release all code and pre-trained language models in a simple-to-use framework to the research community, to enable reproduction of these experiments and application of our proposed embeddings to other tasks: https://github.com/zalandoresearch/flair
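The flair framework linked above exposes these contextual string embeddings through a small API. A minimal usage sketch follows; the German model identifiers 'de-forward'/'de-backward' and the example sentence are assumptions and may differ between flair releases.

```python
# Sketch: embed a German sentence with stacked forward/backward contextual
# string embeddings from flair (model names assumed, may vary by release).
from flair.data import Sentence
from flair.embeddings import FlairEmbeddings, StackedEmbeddings

embeddings = StackedEmbeddings([
    FlairEmbeddings("de-forward"),    # character LM reading left-to-right
    FlairEmbeddings("de-backward"),   # character LM reading right-to-left
])

sentence = Sentence("Der Patient erhielt Metformin .")
embeddings.embed(sentence)

for token in sentence:
    # each token now carries a context-dependent vector
    print(token.text, token.embedding.shape)
```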
Conference Paper
Full-text available
In this work, we present a syntactic parser specialized for German clinical data. Our model, trained on a small gold standard nephrological dataset, outperforms the default German model of Stanford CoreNLP in parsing nephrology documents in respect to LAS (74.64 vs. 42.15). Moreover, retraining the default model via domain-adaptation to nephrology leads to further improvements on nephrology data (78.96). We also show that our model performs well on fictitious clinical data from other subdomains (69.69).
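The LAS figures quoted above are the share of tokens for which both the predicted head and the dependency label match the gold annotation. A minimal sketch of the metric; the (head, label) tuple format is an illustrative assumption.

```python
# Sketch: labeled attachment score (LAS) over one sentence.
def las(gold, pred):
    """gold/pred: lists of (head_index, deprel) per token, same length."""
    assert len(gold) == len(pred)
    correct = sum(
        1
        for (g_head, g_rel), (p_head, p_rel) in zip(gold, pred)
        if g_head == p_head and g_rel == p_rel
    )
    return correct / len(gold)

gold = [(2, "nsubj"), (0, "root"), (2, "obj")]
pred = [(2, "nsubj"), (0, "root"), (1, "obj")]   # wrong head on the last token
print(f"LAS = {las(gold, pred):.2f}")            # 0.67
```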
Article
Full-text available
Background: Natural language processing applied to clinical text or aimed at a clinical outcome has been thriving in recent years. This paper offers the first broad overview of clinical Natural Language Processing (NLP) for languages other than English. Recent studies are summarized to offer insights and outline opportunities in this area. Main body: We envision three groups of intended readers: (1) NLP researchers leveraging experience gained in other languages, (2) NLP researchers faced with establishing clinical text processing in a language other than English, and (3) clinical informatics researchers and practitioners looking for resources in their languages in order to apply NLP techniques and tools to clinical practice and/or investigation. We review work in clinical NLP in languages other than English. We classify these studies into three groups: (i) studies describing the development of new NLP systems or components de novo, (ii) studies describing the adaptation of NLP architectures developed for English to another language, and (iii) studies focusing on a particular clinical application. Conclusion: We show the advantages and drawbacks of each method, and highlight the appropriate application context. Finally, we identify major challenges and opportunities that will affect the impact of NLP on clinical practice and public health studies in a context that encompasses English as well as other languages.
Conference Paper
Full-text available
In this work we present a fine-grained annotation schema to detect named entities in German clinical data of chronically ill patients with kidney diseases. The annotation schema is driven by the needs of our clinical partners and by the linguistic properties of the German language. In order to generate annotations within a short period, the work also presents a semi-automatic annotation process which uses additional knowledge sources, such as UMLS, to pre-annotate concepts in advance. The presented schema will be used to apply novel techniques from natural language processing and machine learning to support doctors treating their patients through improved access to information in unstructured German texts.
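The semi-automatic pre-annotation can be pictured as a longest-match dictionary lookup over the tokenized text. The sketch below uses a tiny hand-crafted lexicon in place of UMLS and is only an illustration of the idea, not the actual annotation pipeline; the labels and the example sentence are assumptions.

```python
# Sketch: longest-match dictionary pre-annotation (toy lexicon instead of UMLS).
lexicon = {
    ("chronischer", "niereninsuffizienz"): "Medical_condition",
    ("dialyse",): "Treatment",
}
max_len = max(len(key) for key in lexicon)

def pre_annotate(tokens):
    annotations, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):   # longest match first
            key = tuple(t.lower() for t in tokens[i:i + n])
            if key in lexicon:
                annotations.append((i, i + n, lexicon[key]))
                i += n
                break
        else:
            i += 1
    return annotations

print(pre_annotate("Aufnahme wegen chronischer Niereninsuffizienz zur Dialyse".split()))
# [(2, 4, 'Medical_condition'), (5, 6, 'Treatment')]
```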
Conference Paper
Full-text available
An important subtask in clinical text mining is to identify whether a clinical finding is expressed as present, absent or unsure in a text. This work presents a system for detecting mentions of clinical findings that are negated or merely speculated. The system has been applied to two different types of German clinical texts: clinical notes and discharge summaries. Our approach is built on top of NegEx, a well-known algorithm for identifying non-factive mentions of medical findings. In this work, we adjust a previous adaptation of NegEx to German and evaluate the system on our data for negation and speculation detection. The results are compared to a baseline algorithm and analyzed for both types of clinical documents. Our system achieves an F1-score above 0.9 on both types of reports.
Article
Full-text available
Clinical narratives are typically produced under time pressure, which incites the use of abbreviations and acronyms. To expand such short forms in a correct way eases text comprehension and further semantic processing. We propose a completely unsupervised and data-driven algorithm for the resolution of non-lexicalised and potentially ambiguous abbreviations. Based on the lookup of word bigrams and unigrams extracted from a corpus of 30,000 pseudonymised cardiology reports in German, our method achieved an F1 score of 0.91, evaluated with a test set of 200 text excerpts. The results are statistically significantly better (p < 0.001) than a baseline approach and show that a simple and domain-independent strategy may be enough to resolve abbreviations when a large corpus of similar texts is available. Further work is needed to combine this strategy with sentence and abbreviation detection modules, to adapt it to acronym resolution and to evaluate it with different datasets.
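The bigram/unigram lookup strategy can be illustrated with a short sketch: candidate expansions are corpus words sharing the abbreviation's prefix, preferring those observed in a bigram with the same neighbouring word and falling back to corpus-wide unigram frequency. The toy counts and the exact ranking are assumptions, not the published algorithm.

```python
# Sketch: corpus-driven abbreviation expansion via bigram/unigram lookup.
from collections import Counter

unigrams = Counter({"linksventrikuläre": 40, "links": 300})
bigrams = Counter({("eingeschränkte", "linksventrikuläre"): 25,
                   ("ventrikel", "links"): 12})

def expand(abbrev, left_word):
    prefix = abbrev.rstrip(".").lower()
    # prefer expansions seen as a bigram with the same left neighbour
    candidates = {w2: c for (w1, w2), c in bigrams.items()
                  if w1 == left_word.lower() and w2.startswith(prefix) and c > 0}
    if not candidates:
        # fall back to corpus-wide unigram frequency
        candidates = {w: c for w, c in unigrams.items() if w.startswith(prefix)}
    return max(candidates, key=candidates.get) if candidates else abbrev

print(expand("linksventr.", "eingeschränkte"))   # linksventrikuläre
```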
Article
Full-text available
Background With the rapid adoption of electronic health records (EHRs), it is desirable to harvest information and knowledge from EHRs to support automated systems at the point of care and to enable secondary use of EHRs for clinical and translational research. One critical component used to facilitate the secondary use of EHR data is the information extraction (IE) task, which automatically extracts and encodes clinical information from text. Objectives In this literature review, we present a review of recent published research on clinical information extraction (IE) applications. Methods A literature search was conducted for articles published from January 2009 to September 2016 based on Ovid MEDLINE In-Process & Other Non-Indexed Citations, Ovid MEDLINE, Ovid EMBASE, Scopus, Web of Science, and ACM Digital Library. Results A total of 1,917 publications were identified for title and abstract screening. Of these publications, 263 articles were selected and discussed in this review in terms of publication venues and data sources, clinical IE tools, methods, and applications (including disease areas, drug-related studies, and clinical workflow optimizations). Conclusions Clinical IE has been used for a wide range of applications, however, there is a considerable gap between clinical studies using EHR data and studies using clinical IE. This study enabled us to gain a more concrete understanding of the gap and to provide potential solutions to bridge this gap.
Article
Full-text available
The second track of the 2014 i2b2/UTHealth Natural Language Processing shared task focused on identifying medical risk factors related to Coronary Artery Disease (CAD) in the narratives of longitudinal medical records of diabetic patients. The risk factors included hypertension, hyperlipidemia, obesity, smoking status, and family history, as well as diabetes and CAD, and indicators that suggest the presence of those diseases. In addition to identifying the risk factors, this track of the 2014 i2b2/UTHealth shared task studied the presence and progression of the risk factors in longitudinal medical records. Twenty teams participated in this track, and submitted 49 system runs for evaluation. Six of the top 10 teams achieved F1 scores over 0.90, and all 10 scored over 0.87. The most successful system used a combination of additional annotations, external lexicons, hand-written rules and Support Vector Machines. The results of this track indicate that identification of risk factors and their progression over time is well within the reach of automated systems.
Conference Paper
Full-text available
This paper reports on the 3rd CLEFeHealth evaluation lab, which continues our evaluation resource building activities for the medical domain. In this edition of the lab, we focus on easing patients and nurses in authoring, understanding, and accessing eHealth information. The 2015 CLEFeHealth evaluation lab was structured into two tasks, focusing on evaluating methods for information extraction (IE) and information retrieval (IR). The IE task introduced two new challenges. Task 1a focused on clinical speech recognition of nursing handover notes; Task 1b focused on clinical named entity recognition in languages other than English, specifically French. Task 2 focused on the retrieval of health information to answer queries issued by general consumers seeking information to understand their health symptoms or conditions. The number of teams registering their interest was 47 in Tasks 1 (2 teams in Task 1a and 7 teams in Task 1b) and 53 in Task 2 (12 teams) for a total of 20 unique teams. The best system recognized 4,984 out of 6,818 test words correctly and generated 2,626 incorrect words (i.e., 38.5% error) in Task 1a; had the F-measure of 0.756 for plain entity recognition, 0.711 for normalized entity recognition, and 0.872 for entity normalization in Task 1b; and resulted in P@10 of 0.5394 and nDCG@10 of 0.5086 in Task 2. These results demonstrate the substantial community interest and capabilities of these systems in addressing challenges faced by patients and nurses. As in previous years, the organizers have made data and tools available for future research and development.
Article
Full-text available
We translated an existing English negation lexicon (NegEx) to Swedish, French, and German and compared the lexicon on corpora from each language. We observed Zipf's law for all languages, i.e., a few phrases occur a large number of times, and a large number of phrases occur fewer times. Negation triggers "no" and "not" were common for all languages; however, other triggers varied considerably. The lexicon is available in OWL and RDF format and can be extended to other languages. We discuss the challenges in translating negation triggers to other languages and issues in representing multilingual lexical knowledge.
Article
Full-text available
Background: Temporal information detection systems have been developed by the Mayo Clinic for the 2012 i2b2 Natural Language Processing Challenge. Objective: To construct automated systems for EVENT/TIMEX3 extraction and temporal link (TLINK) identification from clinical text. Materials and methods: The i2b2 organizers provided 190 annotated discharge summaries as the training set and 120 discharge summaries as the test set. Our Event system used a conditional random field classifier with a variety of features including lexical information, natural language elements, and medical ontology. The TIMEX3 system employed a rule-based method using regular expression pattern match and systematic reasoning to determine normalized values. The TLINK system employed both rule-based reasoning and machine learning. All three systems were built in an Apache Unstructured Information Management Architecture framework. Results: Our TIMEX3 system performed the best (F-measure of 0.900, value accuracy 0.731) among the challenge teams. The Event system produced an F-measure of 0.870, and the TLINK system an F-measure of 0.537. Conclusions: Our TIMEX3 system demonstrated good capability of regular expression rules to extract and normalize time information. Event and TLINK machine learning systems required well-defined feature sets to perform well. We could also leverage expert knowledge as part of the machine learning features to further improve TLINK identification performance.
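The rule-based TIMEX3 component described above can be approximated by a small regex-plus-normalization sketch like the one below; the pattern and the output fields are illustrative assumptions rather than the Mayo system's actual rule set.

```python
# Sketch: regex-based date extraction with normalization to ISO values,
# in the spirit of rule-based TIMEX3 systems (patterns are toy assumptions).
import re

DATE = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b")   # e.g. 3/14/2012

def normalize_dates(text):
    timexes = []
    for m in DATE.finditer(text):
        month, day, year = m.groups()
        timexes.append({
            "text": m.group(0),
            "type": "DATE",
            "value": f"{year}-{int(month):02d}-{int(day):02d}",   # ISO 8601
        })
    return timexes

print(normalize_dates("Admitted on 3/14/2012, discharged on 3/20/2012."))
```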
Article
Full-text available
The Third i2b2 Workshop on Natural Language Processing Challenges for Clinical Records focused on the identification of medications, their dosages, modes (routes) of administration, frequencies, durations, and reasons for administration in discharge summaries. This challenge is referred to as the medication challenge. For the medication challenge, i2b2 released detailed annotation guidelines along with a set of annotated discharge summaries. Twenty teams representing 23 organizations and nine countries participated in the medication challenge. The teams produced rule-based, machine learning, and hybrid systems targeted to the task. Although rule-based systems dominated the top 10, the best performing system was a hybrid. Of all medication-related fields, durations and reasons were the most difficult for all systems to detect. While medications themselves were identified with better than 0.75 F-measure by all of the top 10 systems, the best F-measure for durations and reasons were 0.525 and 0.459, respectively. State-of-the-art natural language processing systems go a long way toward extracting medication names, dosages, modes, and frequencies. However, they are limited in recognizing duration and reason fields and would benefit from future research.
Article
Full-text available
Narrative reports in medical records contain a wealth of information that may augment structured data for managing patient information and predicting trends in diseases. Pertinent negatives are evident in text but are not usually indexed in structured databases. The objective of the study reported here was to test a simple algorithm for determining whether a finding or disease mentioned within narrative medical reports is present or absent. We developed a simple regular expression algorithm called NegEx that implements several phrases indicating negation, filters out sentences containing phrases that falsely appear to be negation phrases, and limits the scope of the negation phrases. We compared NegEx against a baseline algorithm that has a limited set of negation phrases and a simpler notion of scope. In a test of 1235 findings and diseases in 1000 sentences taken from discharge summaries indexed by physicians, NegEx had a specificity of 94.5% (versus 85.3% for the baseline), a positive predictive value of 84.5% (versus 68.4% for the baseline) while maintaining a reasonable sensitivity of 77.8% (versus 88.3% for the baseline). We conclude that with little implementation effort a simple regular expression algorithm for determining whether a finding or disease is absent can identify a large portion of the pertinent negatives from discharge summaries.
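A minimal NegEx-style sketch, assuming toy trigger and pseudo-negation lists and a fixed token window, illustrates the mechanism described above; the original implementation is considerably more elaborate.

```python
# Sketch of the NegEx idea: a negation trigger negates finding terms within a
# limited window, unless a pseudo-negation phrase matches (toy lists, simplified
# handling of pseudo-negation).
PSEUDO_NEG = ["no increase", "not certain if"]     # phrases that only look like negation
NEG_TRIGGERS = {"no", "denies", "without"}
WINDOW = 5                                         # max tokens a trigger may scope over

def negated_findings(sentence, findings):
    lowered = sentence.lower()
    if any(p in lowered for p in PSEUDO_NEG):
        return set()                               # simplified: skip the whole sentence
    tokens = [t.strip(".,;") for t in lowered.split()]
    negated = set()
    for i, tok in enumerate(tokens):
        if tok in NEG_TRIGGERS:
            scope = tokens[i + 1:i + 1 + WINDOW]
            negated |= {f for f in findings if f.lower() in scope}
    return negated

print(negated_findings("The patient denies fever and chills.", ["fever", "chills"]))
# {'fever', 'chills'}
```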
Article
Full-text available
We aim to build and evaluate an open-source natural language processing system for information extraction from electronic medical record clinical free-text. We describe and evaluate our system, the clinical Text Analysis and Knowledge Extraction System (cTAKES), released open-source at http://www.ohnlp.org. The cTAKES builds on existing open-source technologies: the Unstructured Information Management Architecture framework and the OpenNLP natural language processing toolkit. Its components, specifically trained for the clinical domain, create rich linguistic and semantic annotations. Performance of individual components: sentence boundary detector accuracy=0.949; tokenizer accuracy=0.949; part-of-speech tagger accuracy=0.936; shallow parser F-score=0.924; named entity recognizer and system-level evaluation F-score=0.715 for exact and 0.824 for overlapping spans, and accuracy for concept mapping, negation, and status attributes for exact and overlapping spans of 0.957, 0.943, 0.859, and 0.580, 0.939, and 0.839, respectively. Overall performance is discussed against five applications. The cTAKES annotations are the foundation for methods and modules for higher-level semantic processing of clinical free-text.
Article
Full-text available
MetaMap is a widely available program providing access to the concepts in the unified medical language system (UMLS) Metathesaurus from biomedical text. This study reports on MetaMap's evolution over more than a decade, concentrating on those features arising out of the research needs of the biomedical informatics community both within and outside of the National Library of Medicine. Such features include the detection of author-defined acronyms/abbreviations, the ability to browse the Metathesaurus for concepts even tenuously related to input text, the detection of negation in situations in which the polarity of predications is important, word sense disambiguation (WSD), and various technical and algorithmic features. Near-term plans for MetaMap development include the incorporation of chemical name recognition and enhanced WSD.
Article
Full-text available
Medication information is one of the most important types of clinical data in electronic medical records. It is critical for healthcare safety and quality, as well as for clinical research that uses electronic medical record data. However, medication data are often recorded in clinical notes as free-text. As such, they are not accessible to other computerized applications that rely on coded data. We describe a new natural language processing system (MedEx), which extracts medication information from clinical notes. MedEx was initially developed using discharge summaries. An evaluation using a data set of 50 discharge summaries showed it performed well on identifying not only drug names (F-measure 93.2%), but also signature information, such as strength, route, and frequency, with F-measures of 94.5%, 93.9%, and 96.0% respectively. We then applied MedEx unchanged to outpatient clinic visit notes. It performed similarly with F-measures over 90% on a set of 25 clinic visit notes.
Article
Full-text available
Development of a general natural-language processor that identifies clinical information in narrative reports and maps that information into a structured representation containing clinical terms. The natural-language processor provides three phases of processing, all of which are driven by different knowledge sources. The first phase performs the parsing. It identifies the structure of the text through use of a grammar that defines semantic patterns and a target form. The second phase, regularization, standardizes the terms in the initial target structure via a compositional mapping of multi-word phrases. The third phase, encoding, maps the terms to a controlled vocabulary. Radiology is the test domain for the processor and the target structure is a formal model for representing clinical information in that domain. The impression sections of 230 radiology reports were encoded by the processor. Results of an automated query of the resultant database for the occurrences of four diseases were compared with the analysis of a panel of three physicians to determine recall and precision. Without training specific to the four diseases, recall and precision of the system (combined effect of the processor and query generator) were 70% and 87%. Training of the query component increased recall to 85% without changing precision.
Article
Full-text available
We compare the performance of two part-of-speech taggers trained on a German newspaper corpus for mixed types of medical documents. TnT, a tagger based on a statistical language model, outperforms Brill's rule-based tagger, and supplied with additional lexicon resources matches state-of-the-art performance figures (close to 97% accuracy) on the medical corpus. We explain this unexpected result by focusing on the statistically significant part-of-speech type overlap between the newspaper training set and the medical test set. At least at that level, sublanguage differences seem to vanish. Thus, statistical off-the-shelf part-of-speech taggers can immediately be reused for medical language processing
Book
For a field as large as surgery, a good overview and sound didactics are the best aids, and that is exactly what Siewert/Stein, Chirurgie, offers in its 9th edition. Some chapters have been newly written, while the proven tools for understanding and learning have been retained: "Important" and "Caution" highlights, overviews, tables and summaries after every major clinical picture provide convenient learning units. The practice boxes explain surgical and diagnostic procedures in detail, and the case quiz lets readers work through the theoretical content in practice. More than 1,500 illustrations, including X-ray, CT, MRI and ultrasound images as well as photographs taken directly in the outpatient clinic and the operating room, show what surgery is about.
The automatic processing of non-English clinical documents is massively hampered by the lack of publicly available medical language resources for training, testing and evaluating NLP components. We suggest sharing statistical models derived from access-protected clinical documents as a reasonable substitute and provide solutions for sentence splitting, tokenization and POS tagging of German clinical texts. These three components were trained on the confidential FRAMED corpus, a non-sharable collection of various German-language clinical document types. The models derived therefrom outperform alternative components from OPENNLP and the Stanford POS tagger, also trained on FRAMED.
Conference Paper
Discharge summaries and other free-text reports in healthcare transfer information between working shifts and geographic locations. Patients are likely to have difficulties in understanding their content, because of their medical jargon, non-standard abbreviations, and ward-specific idioms. This paper reports on an evaluation lab with an aim to support the continuum of care by developing methods and resources that make clinical reports in English easier to understand for patients, and which helps them in finding information related to their condition. This ShARe/CLEFeHealth2013 lab offered student mentoring and shared tasks: identification and normalisation of disorders (1a and 1b) and normalisation of abbreviations and acronyms (2) in clinical reports with respect to terminology standards in healthcare as well as information retrieval (3) to address questions patients may have when reading clinical reports. The focus on patients’ information needs as opposed to the specialised information needs of physicians and other healthcare workers was the main feature of the lab distinguishing it from previous shared tasks. De-identified clinical reports for the three tasks were from US intensive care and originated from the MIMIC II database. Other text documents for Task 3 were from the Internet and originated from the Khresmoi project. Task 1 annotations originated from the ShARe annotations. For Tasks 2 and 3, new annotations, queries, and relevance assessments were created. 64, 56, and 55 people registered their interest in Tasks 1, 2, and 3, respectively. 34 unique teams (3 members per team on average) participated with 22, 17, 5, and 9 teams in Tasks 1a, 1b, 2 and 3, respectively. The teams were from Australia, China, France, India, Ireland, Republic of Korea, Spain, UK, and USA. Some teams developed and used additional annotations, but this strategy contributed to the system performance only in Task 2. The best systems had the F1 score of 0.75 in Task 1a; Accuracies of 0.59 and 0.72 in Tasks 1b and 2; and Precision at 10 of 0.52 in Task 3. The results demonstrate the substantial community interest and capabilities of these systems in making clinical reports easier to understand for patients. The organisers have made data and tools available for future research and development.
Conference Paper
We introduce the brat rapid annotation tool (BRAT), an intuitive web-based tool for text annotation supported by Natural Language Processing (NLP) technology. BRAT has been developed for rich structured annotation for a variety of NLP tasks and aims to support manual curation efforts and increase annotator productivity using NLP techniques. We discuss several case studies of real-world annotation projects using pre-release versions of BRAT and present an evaluation of annotation assisted by semantic class disambiguation on a multicategory entity mention annotation task, showing a 15% decrease in total annotation time. BRAT is available under an open-source license from: http://brat.nlplab.org
Article
Information retrieval studies that involve searching the Internet or marking phrases usually lack a well-defined number of negative cases. This prevents the use of traditional interrater reliability metrics like the kappa statistic to assess the quality of expert-generated gold standards. Such studies often quantify system performance as precision, recall, and F-measure, or as agreement. It can be shown that the average F-measure among pairs of experts is numerically identical to the average positive specific agreement among experts and that kappa approaches these measures as the number of negative cases grows large. Positive specific agreement, or the equivalent F-measure, may be an appropriate way to quantify interrater reliability and therefore to assess the reliability of a gold standard in these studies.
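The stated identity is easy to verify numerically: with a the number of cases both raters mark positive and b, c the two kinds of disagreement, the pairwise F-measure reduces to 2a/(2a+b+c), which is exactly positive specific agreement. A small check with arbitrary illustrative counts:

```python
# Sketch: F-measure between two raters equals positive specific agreement.
a, b, c = 120, 15, 9            # both positive; rater-1-only; rater-2-only

precision = a / (a + c)          # treating rater 1 as the "gold standard"
recall = a / (a + b)
f_measure = 2 * precision * recall / (precision + recall)
psa = 2 * a / (2 * a + b + c)    # positive specific agreement

print(round(f_measure, 6), round(psa, 6))   # identical values (0.909091)
```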
Névéol, A., Robert, A., Grippo, F., Morgand, C., Orsi, C., Pelikan, L., Ramadier, L., Rey, G., Zweigenbaum, P.: CLEF eHealth 2018 multilingual information extraction task overview: ICD10 coding of death certificates in French, Hungarian and Italian. In: CLEF (Working Notes) (2018)
Dörendahl, A., Leich, N., Hummel, B., Schönfelder, G., Grune, B.: Overview of the CLEF eHealth 2019 multilingual information extraction (2019)
Borchert, F., Lohr, C., Modersohn, L., Langer, T., Follmann, M., Sachs, J.P., Hahn, U., Schapranow, M.-P.: GGPONC: A corpus of German medical text with rich metadata based on clinical practice guidelines. arXiv preprint arXiv:2007.06400 (2020)
Lohr, C., Buechel, S., Hahn, U.: Sharing copies of synthetic clinical corpora without physical distribution - a case study to get around IPRs and privacy constraints featuring the German JSynCC corpus. In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018) (2018)
Seiffe, L.: Linguistic Modeling for Text Analytic Tasks for German Clinical Texts. Master's thesis, TU Berlin (2018)
Nivre, J., De Marneffe, M.-C., Ginter, F., Goldberg, Y., Hajic, J., Manning, C.D., McDonald, R., Petrov, S., Pyysalo, S., Silveira, N., et al.: Universal Dependencies v1: A multilingual treebank collection. In: Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pp. 1659-1666 (2016)
Bojanowski, P., Grave, E., Joulin, A., Mikolov, T.: Enriching word vectors with subword information. arXiv preprint arXiv:1607.04606 (2016)
Wick, J.Y.: What's in a drug name? A rose might smell as sweet by any other name, but the process of naming the growing number of medications has become quite complex and serious. Journal of the American Pharmacists Association 44(1), 12-14 (2004)
Chapman, W.W., Hilert, D., Velupillai, S., Kvist, M., Skeppstedt, M., Chapman, B.E., Conway, M., Tharp, M., Mowery, D.L., Deleger, L.: Extending the NegEx lexicon for multiple languages