
GERNERMED -- An Open German Medical NER Model


Abstract

BACKGROUND: Data mining in the field of medical data analysis often needs to rely solely on the processing of unstructured data to retrieve relevant data. For German NLP, no open medical neural named entity recognition (NER) model has been published prior to this work. A major issue can be attributed to the lack of German training data.

OBJECTIVE: We develop a novel German medical NER model for public access. In order to bypass legal restrictions due to potential data leaks through model analysis, we do not make use of internal, proprietary datasets.

METHODS: The underlying German dataset is retrieved by translation and word alignment of a public English dataset. The dataset serves as the foundation for model training and evaluation.

RESULTS: The obtained dataset consists of 8599 sentences including 30233 annotations. The model achieves an averaged F1 score of 0.82 on the test set after training across seven different NER types. The model is publicly available.

CONCLUSIONS: We demonstrate the feasibility of training a German medical NER model by the exclusive use of public training data. The sample code and the statistical model are available on GitHub.
GERNERMED - An Open German Medical
NER Model
Johann Frei, Frank Kramer
September 2021
The current state of adoption of well-structured electronic health records and the integration of digital methods for storing medical patient data in structured formats can often be considered inferior to the use of traditional, unstructured text-based patient data documentation. Data
mining in the field of medical data analysis often needs to rely solely on
processing of unstructured data to retrieve relevant data. In natural lan-
guage processing (NLP), statistical models have been shown successful
in various tasks like part-of-speech tagging, relation extraction (RE) and
named entity recognition (NER).
In this work, we present GERNERMED, the first open, neural NLP model for NER tasks dedicated to detecting medical entity types in German text data. Here, we avoid the conflicting goals of protecting sensitive patient data from training data extraction and publishing the statistical model weights by training our model on a custom dataset that was translated from publicly available datasets in a foreign language by a pretrained neural machine translation model.
The sample code and the statistical model are available at:
1 Introduction
Despite continuous efforts to transform the storage and processing of medical
data in healthcare systems into a framework of machine-readable highly struc-
tured data, implementation designs that aim to fulfill such requirements are
only slowly gaining traction in the clinical healthcare environment. In addition
to common technical challenges, physicians tend to bypass or completely avoid
inconvenient data input interfaces, which enforce structured data formats, by
encoding relevant information as free-form unstructured text.
Electronic data capturing systems are developed in order to improve the
situation of structured data capturing, yet their primary focus lies on clinical
studies and the involvement of these systems needs to be designed in early stages
and requires active software management and maintenance.

arXiv:2109.12104v1 [cs.CL] 24 Sep 2021

Such electronic data
capturing solutions are commonly considered in the context of clinical research
but are largely omitted in non research-centric healthcare services.
In the light of the rise of data mining and big data analysis, the emerging
importance of large scale data acquisition and collection for finding and un-
derstanding novel relationships of disease, disease-indicating biomarkers, drug
effects and other input variables, induces additional pressure on finding new pos-
sible data sources. While new datasets can be designed and created for specific
use cases, the amount of obtained data might be very limited and not sufficient
for modern data-driven methods. Furthermore, such data collection efforts can
turn out as rather inefficient in terms of time and work involved in creating new
datasets with respect to the number of acquired data samples.
In contrast, unstructured data from legacy systems and non-research-centric healthcare, referred to as secondary use, offers a potential alternative. However, techniques for information extraction and retrieval, mainly from the NLP domain, need to be applied to transform raw data into structured data.
While the availability of existing NLP models in English, as well as other non-NLP-based techniques, for medical use cases is the focus of active research, the situation for medical NLP models in non-English languages is less satisfying. Since the performance of an NLP model often depends on its dedicated target language, most models cannot be shared and reused easily across different languages, but require re-training on new data from the desired target language.
In particular, for the detection of entities like prescribed drugs and the level or frequency of dosage in German medical documents such as doctors' letters, no open and publicly available model has been published to the best of our knowledge. We attribute this fact to two main contributing factors:
- Lack of public German datasets: Most open public datasets are designed for English data only. Until recently, no German dataset had been published. Specifically in the context of clinical data, legal restrictions and privacy policies prevent the collection and publication of German datasets. Data-driven NLP research for medical applications therefore largely utilizes internal data for training and evaluation. In addition to the dataset itself, high-quality annotations of the dataset are essential for robust model performance when modeling relevant text features with supervised learning.

- Protection of sensitive data and privacy concerns: While a few works have been published that present data-driven models for German texts, the weights of these models have not been released. Since the respective training data has been used in non-anonymized or pseudonymized fashion, the publication of the model weights inherently carries the risk of data leakage through training data extraction from the model, potentially exposing sensitive information like patient names or IDs.
In this paper, we aim to tackle the absence of anonymized training data as well as of publicly available medical German NLP models. Our main
contributions are as follows:
- Automated retrieval of a German dataset: We create a custom dataset for our target language, based on a public English dataset. In addition, we apply a strategy to preserve relevant annotation information across the translation.

- Training of a medical German NLP model component: We trained and built a named entity recognition component on the custom dataset. The model pipeline supports multiple types of medical entities.

- Evaluation and publication of the NLP pipeline: The NER model was evaluated as part of an NLP pipeline. The trained model is publicly available for further use by third parties.
2 Materials and Methods
2.1 Related Work
In recent years, substantial progress has been made in the area of NLP, which can mostly be attributed to the joint use of large amounts of data and large language models like BERT[7] and its (bio)medical-specific variants[1, 2, 18, 19, 25, 28], providing a straightforward way to encode representations of semantic information for further processing in downstream tasks like text classification or text segmentation. These works mostly focus on the English language due to available language corpora like scientific texts from PubMed or specifically designed corpora such as n2c2[12] (with annotations) and MIMIC-III[27]. For German, only GGPONC[3] was published during our work on this project as a dataset that carries annotation information; other German datasets[5, 35] do not. Moreover, the Technical-Laymen corpus[34] provides an annotated corpus, yet it is based on crawled texts from non-professional online forums. Various other German medical text corpora exist[4, 6, 8, 9, 14–16, 20, 22, 32, 36, 37] as the basis for certain NLP and information extraction use cases, but are inaccessible for public distribution.
In the field of NLP systems for German medical texts, medSynDiKATe[10, 11] approaches information extraction from pathological finding reports by parsing and mapping text elements to (semi-)automatically built knowledge representation structures. Processing of pathological findings in German has also been applied to the task of sentence classification[4]. In the context of patient records, a hybrid RE and NER parsing approach using the SProUT[26] parser has been proposed[17]; however, the entity tags lack medical relevance. Similar general NER for non-medical entity tags has been applied in order to enable de-identification of clinical records[29] using statistical and regex-based models through the StanfordNLP parser[21].
Neural methods have been shown to perform well on certain NLP tasks. In particular, convolutional (CNN) approaches for RE[23, 33, 38] have become popular in recent years. For German texts, the performance of various methods, such as CNN-, LSTM- or SVM-based models, has been investigated for medical NER tasks[31]. In this context, the text processing platform mEx[30] uses CNN-based methods for solving medical NER in German texts. Similar to our work, mEx is built on SpaCy[13], but provides custom models for other NLP tasks such as RE. However, the platform has been partially trained on non-anonymized clinical data and thus its statistical models have not been published.
2.2 Custom Dataset Creation
We rely on the publicly available training data from n2c2 NLP 2018 Track
2[12] dataset (ADE and Medication Extraction Challenge) as our initial source
dataset. The data is composed of 303 annotated text documents that have been
postprocessed by the editor for anonymization purposes in order to explicitly
mask sensitive privacy-concerning information.
In order to transform the data into semantically plausible text, we identify the type and text span of the text masks such that we are able to replace them by sampling type-compatible data randomly from a set of sample entries. During the sampling stage, depending on the type of the mask, text samples for entities like dates, names, years or phone numbers are generated and inserted into the text. Since every replacement step may affect the location of the text annotation labels, as given by their character-wise start and stop indices, these annotation indices must be updated accordingly. As a further preprocessing step, we split the text into single sentences such that we can omit all sentences with no associated annotation labels.
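As an illustration of this index bookkeeping, the following sketch replaces one mask and shifts all downstream annotation spans. The function name and the tiny inline sample pool are illustrative stand-ins; the actual pipeline draws from larger typed sample sets.

```python
import random

# Toy replacement pools standing in for the type-compatible sample
# entries described above.
SAMPLES = {"NAME": ["Anna Schmidt", "Jonas Weber"], "DATE": ["2017-03-12"]}

def fill_masks(text, annotations, masks):
    """Replace anonymization masks and shift annotation offsets.

    text        -- document string containing mask placeholders
    annotations -- list of (start, end, label) character spans
    masks       -- list of (start, end, type) spans of the masks,
                   sorted by start index and not overlapping annotations
    """
    annotations = list(annotations)
    offset = 0
    for start, end, mask_type in masks:
        replacement = random.choice(SAMPLES[mask_type])
        start, end = start + offset, end + offset
        text = text[:start] + replacement + text[end:]
        delta = len(replacement) - (end - start)
        # Shift every annotation that begins after the replaced span.
        annotations = [
            (s + delta, e + delta, lab) if s >= end else (s, e, lab)
            for s, e, lab in annotations
        ]
        offset += delta
    return text, annotations
```

After replacement, each annotation span still covers the same surface string in the updated text, regardless of the sampled replacement length.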
For automated translation, we make use of the open source fairseq[24] (0.10.2) toolkit. fairseq provides implementations of neural machine translation models and supports automatic translation of sequential text data using pretrained models. For our purposes, we ran the transformer.wmt19.en-de pretrained model to translate our set of English sentences into German.
The reconstructive mapping of the annotation labels from the English source text to the German target text is tackled by FastAlign. FastAlign is an unsupervised method for aligning the words of two sentences in source and target language. We project the annotation labels onto the translated German sentences using the word-level mapping between the corresponding English and German sentence in order to obtain new annotation label indices in the German text.
The word alignment mapping tends to induce errors in situations of sentences
with irregular structure such as tabular or itemized text sections. We mitigate
the issue and potential subsequent error propagation by inspecting the structure
of the word mapping matrix A.
A_regular =
          The  cat  sat  on  the  mat.
Die         1    0    0    0    0    0
Katze       0    1    0    0    0    0
saß         0    0    1    0    0    0
auf         0    0    0    1    0    0
der         0    0    0    0    1    0
Matte.      0    0    0    0    0    1
In situations where FastAlign fails to establish a meaningful mapping be-
tween source and target sentence, it can be observed that the resulting mapping
table collapses to a highly non-diagonal matrix structure as illustrated by the
following example:
A_irregular =
          The  cat  sat  on  the  mat.
Die         0    0    0    0    0    0
Katze       1    1    0    0    0    0
saß         0    0    1    0    0    0
auf         0    0    0    0    0    0
der         0    0    0    0    0    0
Matte.      0    0    0    1    1    1
Severely ill-aligned word mapping matrices can be detected and removed
from the final set of sentences by applying the simple filter decision rule
$$\frac{\sum_{i=1}^{w_{de}} \sum_{j=1}^{w_{en}} A_{ij} \, \frac{\left| (w_{en}-1)(i-1) - (w_{de}-1)(j-1) \right|}{\sqrt{(w_{en}-1)^2 + (w_{de}-1)^2}}}{\max(w_{en}, w_{de})} > t \qquad (1)$$
where the average distance between a non-zero entry and the diagonal line from A_{1,1} to A_{w_de,w_en} is evaluated, given w_en as the number of words in the English sentence and w_de as the number of words in the German sentence. If
the value exceeds the threshold t, the sentence pair is disregarded for the final
set of sentences.
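Under our reading of the filter rule (1), the score is the alignment-weighted sum of distances to the diagonal, normalized by the length of the longer sentence. A self-contained sketch (function name and 0/1 matrix layout are illustrative assumptions):

```python
import math

def diagonal_deviation(A):
    """Filter score of rule (1): mean deviation of alignment entries
    from the matrix diagonal.

    A is a w_de x w_en 0/1 alignment matrix (rows: German tokens,
    columns: English tokens). Sentence pairs whose score exceeds a
    threshold t (the paper uses t = 1.8) are discarded.
    """
    w_de, w_en = len(A), len(A[0])
    diag_len = math.sqrt((w_en - 1) ** 2 + (w_de - 1) ** 2) or 1.0
    total = 0.0
    for i in range(1, w_de + 1):          # 1-based indices as in rule (1)
        for j in range(1, w_en + 1):
            if A[i - 1][j - 1]:
                # Distance of entry (i, j) to the line through
                # (1, 1) and (w_de, w_en).
                total += abs((w_en - 1) * (i - 1) - (w_de - 1) * (j - 1)) / diag_len
    return total / max(w_en, w_de)
```

For the regular example above (an identity matrix), every entry lies on the diagonal, so the score is 0.0; the irregular example yields a strictly larger score.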
The word mapping matrices describe a non-symmetric cross-correspondence between two language-dependent token sets, which enables the projection of tokens within the English annotation span onto the semantically corresponding tokens in the German translation. Therefore, the annotation label indices for the English text can be resolved to the actual indices in the translated German text at character level. Since the entity classes remain unchanged, the following annotation label types can be obtained: Drug, Route, Reason, Strength, Frequency, Duration, Form, Dosage and ADE.
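This projection step can be sketched as follows. The sketch assumes alignments are given as (source index, target index) token pairs, and tokens as (string, character offset) pairs; these layouts are illustrative, not the exact FastAlign output format.

```python
def project_span(src_tokens, tgt_tokens, alignment, span):
    """Project a character-level annotation span via word alignment.

    src_tokens, tgt_tokens -- lists of (token, char_start) pairs
    alignment              -- set of (src_idx, tgt_idx) token pairs
    span                   -- (char_start, char_end) in the source text
    Returns the projected (char_start, char_end) span in the target
    text, or None if no aligned target token exists.
    """
    start, end = span
    # Source token indices overlapping the annotation span.
    src_idx = [
        k for k, (tok, pos) in enumerate(src_tokens)
        if pos < end and pos + len(tok) > start
    ]
    # Target token indices aligned to any covered source token.
    tgt_idx = sorted({t for s, t in alignment if s in src_idx})
    if not tgt_idx:
        return None
    start_pos = tgt_tokens[tgt_idx[0]][1]
    last_tok, last_pos = tgt_tokens[tgt_idx[-1]]
    return start_pos, last_pos + len(last_tok)
```

For "He takes aspirin daily" / "Er nimmt täglich Aspirin", the span of "aspirin" resolves to the character span of "Aspirin" in the German sentence, even though the word order changed.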
2.3 NLP Model for NER Training
For the buildup of our NER model as part of an NLP pipeline, we use SpaCy
as an NLP framework for training and inference.
The SpaCy NER model follows a transducer-based parsing approach instead of a state-agnostic token tagging approach.
Embedding: The word tokens are embedded by Bloom embeddings where
different linguistic features are concatenated into a single vector and passed
through nembed separate dense layers, followed by a final maxpooling and layer
norm step. This step enables the model to learn meaningful linear combinations
of single input feature embeddings while reducing the number of dimensions.
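The hashing idea behind such embeddings can be illustrated with a minimal stdlib-only sketch. This is not SpaCy's actual implementation (which hashes each feature several times into large tables and adds dense layers, maxpooling and layer norm); bucket count, feature set and dimensionality here are arbitrary assumptions.

```python
import random
import zlib

DIM, BUCKETS = 8, 32
random.seed(0)
# One shared table of bucket vectors; features are hashed into it.
TABLE = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(BUCKETS)]

def bucket(feature):
    # Deterministic hash of a feature string into a bucket index.
    return zlib.crc32(feature.encode()) % BUCKETS

def embed(token):
    """Sum the hashed-bucket vectors of a token's linguistic features."""
    feats = [
        "low=" + token.lower(),
        "shape=" + "".join("X" if c.isupper() else "x" for c in token),
    ]
    vecs = [TABLE[bucket(f)] for f in feats]
    return [sum(v) for v in zip(*vecs)]
```

The key property is that the table size is fixed regardless of vocabulary size, while identical tokens always map to identical vectors.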
Context-aware Token Encoding: In order to extract context-aware features that are able to capture larger token windows, the final token embedding is passed through a multi-layered convolutional network. Each convolution step consists of the convolution itself as well as a subsequent maxpooling operation to keep the dimensions constrained. For each convolution step, a residual (skip) connection is added to allow the model to pass intermediate data representations from previous layers to subsequent layers.
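The encoding scheme can be caricatured as follows, where an element-wise max over the local token window stands in for the learned convolution plus maxpooling. This is an illustrative sketch of the windowing and residual wiring, not the trained network.

```python
def conv_encode(vectors, window=1, depth=4):
    """Context-window encoding with residual connections (sketch).

    Each layer replaces a token vector by an element-wise max over its
    neighborhood (standing in for convolution + maxpooling), then adds
    the layer input back as a residual (skip) connection.
    """
    n = len(vectors)
    for _ in range(depth):
        out = []
        for i in range(n):
            ctx = vectors[max(0, i - window): i + window + 1]
            pooled = [max(col) for col in zip(*ctx)]
            # Residual: previous representation flows to the next layer.
            out.append([p + v for p, v in zip(pooled, vectors[i])])
        vectors = out
    return vectors
```

With depth layers of window size 1, each output vector has an effective receptive field of 2 * depth + 1 tokens, which is the point of stacking the convolutions.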
NER Parsing: For each encoded token, a corresponding feature-token vector is precomputed in advance by a dense layer. For parsing, the document is processed token-wise in a stateful manner. For NER, the state at a given position consists of the current token, the first token of the last entity and the previous token by index. Given the state, the feature-position vectors are retrieved by indexing into the precomputed data and summed up. A dense layer is applied to predict the next action. Depending on the action, the current token is annotated and the next state is generated until the entire document has been parsed.
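The stateful parsing loop can be sketched as follows, with the learned action classifier replaced by a caller-supplied predict function. Action names ("BEGIN-<TYPE>", "IN", "OUT") and the state layout are simplified assumptions, not SpaCy's actual transition system.

```python
def decode(tokens, predict):
    """Greedy transition-based NER decoding (simplified sketch).

    predict(state) maps a state tuple (current token, first token of
    the open entity, previous token) to an action string. Returns
    entity spans as (start, end, type) with end exclusive.
    """
    entities, open_start, open_type, prev = [], None, None, None
    for idx, tok in enumerate(tokens):
        first_of_entity = tokens[open_start] if open_start is not None else None
        action = predict((tok, first_of_entity, prev))
        if action.startswith("BEGIN-"):
            if open_start is not None:          # close the previous entity
                entities.append((open_start, idx, open_type))
            open_start, open_type = idx, action[len("BEGIN-"):]
        elif action == "OUT":
            if open_start is not None:
                entities.append((open_start, idx, open_type))
            open_start = open_type = None
        # "IN": keep extending the currently open entity.
        prev = tok
    if open_start is not None:
        entities.append((open_start, len(tokens), open_type))
    return entities
```

Plugging in a trivial lookup classifier on the toy sentence "Patient nimmt 400 mg Ibuprofen" yields the spans (2, 4, "Strength") and (4, 5, "Drug").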
3 Results
3.1 Custom Dataset Creation
As an initial preprocessing step, we need to replace the anonymization masks by
meaningful regular text data to reconstruct the natural appearance of the text
and alleviate a potential dataset bias that leads to gaps between the dataset
and real world data. For numerical data, we can retrieve mask replacements by
random sampling. Similar to numerical data, dates and years are sampled and
formatted to common date formats. For semantically relevant data types, we
use the Python package Faker. The package maintains lists of plausible data
of various types such as first names, last names, addresses or phone numbers.
We make use of these data entries for certain typed anonymization masks. In
order to obtain our custom dataset, we split the texts from the original dataset
into single sentences using the sentence splitting algorithm from SpaCy. The
English sentences were translated into German by the fairseq library with beam
search (b=5). The sentence-wise word alignments were obtained by FastAlign
and cleaned up by our filter decision rule (t=1.8).
The labels Reason and ADE were removed from the dataset due to the fact
that their definitions are rather ambiguous in general contexts beyond the scope
of the initial source dataset.
Our final custom dataset consists of 8599 sentence pairs, annotated with 30233 annotations of seven different class labels. The class labels and their corresponding frequencies in absolute numbers are shown in table 1.
NER Tag    Count
Drug        8305
Route       4071
Strength    4549
Frequency   4238
Duration     409
Form        5242
Dosage      3419
Table 1: The distribution of annotations in the custom dataset in absolute
numbers. The dataset consists of 8599 sentence samples. A single tag sample
may span multiple tokens.
3.2 NLP Model for NER Training
For training, we utilize our custom German dataset and split it into a training set (80%, 6879 sentence samples), a validation set and a test set (both 10%, 860 samples each). The training setup follows the default NER setup of SpaCy; the Adam optimizer with a learning rate of 0.001 with decay (β1 = 0.9, β2 = 0.999) is used. Training took 10 minutes on an Intel i7-8665U CPU.
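Assuming the spaCy v3 config system, the optimizer settings above correspond to a config fragment along these lines (shown without the decay schedule):

```ini
[training.optimizer]
@optimizers = "Adam.v1"
learn_rate = 0.001
beta1 = 0.9
beta2 = 0.999
```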
The model performance during training is shown in figure 1. The corre-
sponding performance scores are evaluated on the validation set.
We select the final model based on the highest F1-score on the validation
set. The performance of the selected model is evaluated on the test set per NER
tag as well as in total. The results are shown in table 2.
For demonstration purposes, a generic German sentence is shown in figure
2. The annotations were inferred from the final model.
4 Discussion
In general, the availability of German NER models and methods for medical
and clinical domains leaves much to be desired as described in previous chap-
ters. Analogous to that fact, German datasets in such domain are largely kept
unpublished and are not available to the research community. However, its im-
plications are significantly broader. In the case of unpublished NLP models,
it renders independent reproduction of results and fair comparisons impossible.
Figure 1: Training scores on validation set: Evaluation scores are computed at
every 200th iteration.
NER Tag    Precision  Recall  F1-Score
Drug           67.33   66.17     66.74
Strength       92.34   90.99     91.66
Route          89.93   90.14     90.04
Form           91.94   89.24     90.57
Dosage         87.83   87.57     87.70
Frequency      79.14   76.92     78.01
Duration       67.86   52.78     59.37
Total          82.31   80.79     81.54
Table 2: The model performance scores per NER tag. The evaluation is based
on the separated test set.
In the case of lacking datasets, novel competitive data-driven techniques cannot
be developed or validated easily.
As a consequence, we cannot use such independent datasets for an extended
evaluation of our model in order to estimate the inherent dataset bias of our
custom dataset.
Regarding our custom dataset synthesis, one should emphasize that the quality of the custom dataset, and thus the quality of the model, is likely to be influenced by the quality of the English-to-German translation engine. While in the case of English-to-German, modern NMT models are often sufficient and output reasonable results for the majority of text samples, the results are likely to worsen in the context of low-resource languages where powerful NMT models are not available.

Figure 2: Demonstration of successfully detected entities from German text
The choice of the statistical model and the slim neural model architecture
in particular is attributed to its small computational footprint while being able
to achieve satisfying results. In addition, the NER pipeline of SpaCy explicitly
induces inductive bias through hand-crafted feature extraction during the token
embedding stage. However, since the focus of our work lies on the integration
of NMT-based data for training purposes, we consider an exhaustive hyperpa-
rameter optimization as well as the utilization of a transformer-based model for
improved NER performance scores as future work.
5 Conclusion
In this paper, we presented the first neural NER model for German medical text as an open, publicly available model, trained on a custom German dataset derived from a publicly available English dataset. We described the method to extract and postprocess texts from the masked English texts and to generate German texts by translation and cross-lingual token alignment. In addition, the NER model architecture was described and the final model performance was evaluated for single NER tags as well as in total.
We believe that our model is a well-suited baseline for future work in the
context of German medical entity recognition and natural language processing.
The need for independent datasets in order to further improve the situation for
the research community on this matter has been highlighted. We are looking
forward to compare our model to upcoming German medical NER models.
The model as well as the training/test data are available at the following
repository on GitHub:
Acknowledgements

This work is part of the DIFUTURE project funded by the German Ministry of Education and Research (Bundesministerium für Bildung und Forschung, BMBF), grant FKZ01ZZ1804E.
References

[1] Emily Alsentzer, John Murphy, William Boag, Wei-Hung Weng, Di Jin, Tristan Naumann, and Matthew McDermott. Publicly available clinical BERT embeddings. In Proceedings of the 2nd Clinical Natural Language Processing Workshop, pages 72–78, Minneapolis, Minnesota, USA, June 2019. Association for Computational Linguistics.
[2] Iz Beltagy, Kyle Lo, and Arman Cohan. Scibert: Pretrained language
model for scientific text. In EMNLP, 2019.
[3] Florian Borchert, Christina Lohr, Luise Modersohn, Thomas Langer, Markus Follmann, Jan Philipp Sachs, Udo Hahn, and Matthieu-P Schapranow. GGPONC: A corpus of german medical text with rich metadata based on clinical practice guidelines. In Proceedings of the 11th International Workshop on Health Text Mining and Information Analysis, pages 38–48.
[4] Claudia Bretschneider, Sonja Zillner, and Matthias Hammon. Identifying
pathological findings in german radiology reports using a syntacto-semantic
parsing approach. In Proceedings of the 2013 Workshop on Biomedical
Natural Language Processing, pages 27–35, 2013.
[5] Christina Lohr, Sven Buechel, and Udo Hahn. Sharing copies of synthetic clinical corpora without physical distribution - a case study to get around IPRs and privacy constraints featuring the german JSynCC corpus. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018). European Language Resources Association (ELRA), May 2018.
[6] Viviana Cotik, Roland Roller, Feiyu Xu, Hans Uszkoreit, Klemens Budde,
and Danilo Schmidt. Negation detection in clinical reports written in ger-
man. In Proceedings of the Fifth Workshop on Building and Evaluating Re-
sources for Biomedical Text Mining (BioTxtM2016), pages 115–124, 2016.
[7] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova.
BERT: pre-training of deep bidirectional transformers for language under-
standing. CoRR, abs/1810.04805, 2018.
[8] Georg Fette, Maximilian Ertl, Anja Wörner, Peter Kluegl, Stefan Störk, and Frank Puppe. Information extraction from unstructured electronic health records and integration into a data warehouse. INFORMATIK 2012.

[9] Udo Hahn, Franz Matthies, Christina Lohr, and Markus Löffler. 3000PA - towards a national reference corpus of german clinical language. In MIE, pages 26–30, 2018.
[10] Udo Hahn, Martin Romacker, and Stefan Schulz. How knowledge drives
understanding—matching medical ontologies with the needs of medical lan-
guage processing. Artificial Intelligence in Medicine, 15(1):25–51, 1999.
Terminology and Concept Representation.
[11] Udo Hahn, Martin Romacker, and Stefan Schulz. MedSynDiKATe - a natural language system for the extraction of medical information from findings reports. International Journal of Medical Informatics, 67(1-3):63–74, December 2002.
[12] Sam Henry, Kevin Buchan, Michele Filannino, Amber Stubbs, and Ozlem Uzuner. 2018 n2c2 shared task on adverse drug events and medication extraction in electronic health records. Journal of the American Medical Informatics Association: JAMIA, 27(1):3–12, January 2020.
[13] Matthew Honnibal, Ines Montani, Sofie Van Landeghem, and Adriane Boyd. spaCy: Industrial-strength Natural Language Processing in Python.
[14] Maximilian König, André Sander, Ilja Demuth, Daniel Diekmann, and Elisabeth Steinhagen-Thiessen. Knowledge-based best of breed approach for automated detection of clinical events based on german free text digital hospital discharge letters. PloS one, 14(11):e0224916, 2019.

[15] Jonathan Krebs, Hamo Corovic, Georg Dietrich, Max Ertl, Georg Fette, Mathias Kaspar, Markus Krug, Stefan Störk, and Frank Puppe. Semi-automatic terminology generation for information extraction from german chest x-ray reports. GMDS, 243:80–84, 2017.
[16] Markus Kreuzthaler and Stefan Schulz. Detection of sentence boundaries
and abbreviations in clinical narratives. In BMC medical informatics and
decision making, volume 15, pages 1–13. BioMed Central, 2015.
[17] Hans-Ulrich Krieger, Christian Spurk, Hans Uszkoreit, Feiyu Xu, Yi Zhang,
Frank M¨uller, and Thomas Tolxdorff. Information extraction from german
patient records via hybrid parsing and relation extraction strategies. In
LREC, pages 2043–2048, 2014.
[18] Jinhyuk Lee, Wonjin Yoon, Sungdong Kim, Donghyeon Kim, Sunkyu Kim,
Chan Ho So, and Jaewoo Kang. BioBERT: a pre-trained biomedical lan-
guage representation model for biomedical text mining. Bioinformatics, 09
[19] Fei Li, Yonghao Jin, Weisong Liu, Bhanu Pratap Singh Rawat, Pengshan Cai, and Hong Yu. Fine-tuning bidirectional encoder representations from transformers (BERT)-based models on large-scale electronic health record notes: An empirical study. JMIR Med Inform, 7(3):e14830, Sep 2019.

[20] Joann M Lohr, Daniel T McDevitt, Kenneth S Lutter, L Richard Roedersheimer, and Michael G Sampson. Operative management of greater saphenous thrombophlebitis involving the saphenofemoral junction. The American Journal of Surgery, 164(3):269–275, 1992.
[21] Christopher D. Manning, Mihai Surdeanu, John Bauer, Jenny Finkel,
Steven J. Bethard, and David McClosky. The Stanford CoreNLP natural
language processing toolkit. In Association for Computational Linguistics
(ACL) System Demonstrations, pages 55–60, 2014.
[22] Jose A Miñarro-Giménez, Ronald Cornet, Marie-Christine Jaulent, Heike Dewenter, Sylvia Thun, Kirstine Rosenbeck Gøeg, Daniel Karlsson, and Stefan Schulz. Quantitative analysis of manual annotation of clinical text samples. International Journal of Medical Informatics, 123:37–48, 2019.
[23] Thien Huu Nguyen and Ralph Grishman. Relation extraction: Perspective
from convolutional neural networks. In Proceedings of the 1st workshop on
vector space modeling for natural language processing, pages 39–48, 2015.
[24] Myle Ott, Sergey Edunov, Alexei Baevski, Angela Fan, Sam Gross, Nathan
Ng, David Grangier, and Michael Auli. fairseq: A fast, extensible toolkit for
sequence modeling. In Proceedings of NAACL-HLT 2019: Demonstrations,
[25] Yifan Peng, Shankai Yan, and Zhiyong Lu. Transfer learning in biomed-
ical natural language processing: An evaluation of bert and elmo on ten
benchmarking datasets. In Proceedings of the 18th BioNLP Workshop and
Shared Task, pages 58–65, 2019.
[26] Jakub Piskorski, Petr Homola, Malgorzata Marciniak, Agnieszka Mykowiecka, Adam Przepiórkowski, and Marcin Woliński. Information extraction for polish using the SProUT platform. In Proceedings of the International Conference on Intelligent Information Systems - New Trends in Intelligent Information Processing and Web Mining, Zakopane, Poland, May 2004.

[27] Tom J Pollard and Alistair EW Johnson. The MIMIC-III clinical database, 2016.
[28] Laila Rasmy, Yang Xiang, Ziqian Xie, Cui Tao, and Degui Zhi. Med-bert:
pretrained contextualized embeddings on large-scale structured electronic
health records for disease prediction. npj Digital Medicine, 4(1):1–13, 2021.
[29] Phillip Richter-Pechanski, Stefan Riezler, and Christoph Dieterich. De-
identification of german medical admission notes. In GMDS, pages 165–169,
[30] Roland Roller, Christoph Alt, Laura Seiffe, and He Wang. mex - an infor-
mation extraction platform for german medical text. In Proceedings of the
11th International Conference on Semantic Web Applications and Tools for
Healthcare and Life Sciences (SWAT4HCLS’2018). Semantic Web Appli-
cations and Tools for Healthcare and Life Sciences (SWAT4HCLS-2018),
December 3-5, Antwerp, Belgium, 12 2018.
[31] Roland Roller, Nils Rethmeier, Philippe Thomas, Marc Hübner, Hans Uszkoreit, Oliver Staeck, Klemens Budde, Fabian Halleck, and Danilo Schmidt. Detecting named entities and relations in german clinical reports. In International Conference of the German Society for Computational Linguistics and Language Technology, pages 146–154. Springer, Cham, 2017.
[32] Roland Roller, Hans Uszkoreit, Feiyu Xu, Laura Seiffe, Michael Mikhailov,
Oliver Staeck, Klemens Budde, Fabian Halleck, and Danilo Schmidt. A
fine-grained corpus annotation schema of german nephrology records. In
Proceedings of the Clinical Natural Language Processing Workshop (Clini-
calNLP), pages 69–77, 2016.
[33] Sunil Kumar Sahu, Ashish Anand, Krishnadev Oruganty, and Mahanan-
deeshwar Gattu. Relation extraction from clinical texts using domain in-
variant convolutional neural network. arXiv preprint arXiv:1606.09370,
[34] Laura Seiffe, Oliver Marten, Michael Mikhailov, Sven Schmeier, Sebastian Möller, and Roland Roller. From witch's shot to music making bones - resources for medical laymen to technical language and vice versa. In Proceedings of the 12th Language Resources and Evaluation Conference, pages 6185–6192, Marseille, France, May 2020. European Language Resources Association.
[35] Hanna Suominen, Liadh Kelly, Lorraine Goeuriot, and Martin Krallinger. CLEF eHealth evaluation lab 2020. In Joemon M. Jose, Emine Yilmaz, João Magalhães, Pablo Castells, Nicola Ferro, Mário J. Silva, and Flávio Martins, editors, Advances in Information Retrieval - 42nd European Conference on IR Research, ECIR 2020, Lisbon, Portugal, April 14-17, 2020, Proceedings, Part II, volume 12036 of Lecture Notes in Computer Science,
[36] Martin Toepfer, Hamo Corovic, Georg Fette, Peter Klügl, Stefan Störk, and Frank Puppe. Fine-grained information extraction from german transthoracic echocardiography reports. BMC Medical Informatics and Decision Making, 15(1):1–16, 2015.
[37] Joachim Wermter and Udo Hahn. An annotated german-language medical
text corpus as language resource. In LREC. Citeseer, 2004.
[38] Daojian Zeng, Kang Liu, Siwei Lai, Guangyou Zhou, and Jun Zhao. Rela-
tion classification via convolutional deep neural network. In Proceedings of
COLING 2014, the 25th International Conference on Computational Lin-
guistics: Technical Papers, pages 2335–2344, 2014.
Full-text available
Motivation: Biomedical text mining is becoming increasingly important as the number of biomedical documents rapidly grows. With the progress in natural language processing, extracting valuable information from biomedical literature has gained popularity among researchers, and deep learning has boosted the development of effective biomedical text mining models. However, directly applying the advancements in natural language processing to biomedical text mining often yields unsatisfactory results due to a word distribution shift from general domain corpora to biomedical corpora. In this paper, we investigate how the recently introduced pre-trained language model BERT can be adapted for biomedical corpora. Results: We introduce BioBERT (Bidirectional Encoder Representations from Transformers for Biomedical Text Mining), which is a domain specific language representation model pre-trained on large-scale biomedical corpora. With almost the same architecture across tasks, BioBERT largely outperforms BERT and previous state-of-the-art models in a variety of biomedical text mining tasks when pre-trained on biomedical corpora. While BERT obtains performance comparable to that of previous state-of-the-art models, BioBERT significantly outperforms them on the following three representative biomedical text mining tasks: biomedical named entity recognition (0.62% F1 score improvement), biomedical relation extraction (2.80% F1 score improvement), and biomedical question answering (12.24% MRR improvement). Our analysis results show that pre-training BERT on biomedical corpora helps it to understand complex biomedical texts. Availability and implementation: We make the pre-trained weights of BioBERT freely available at, and the source code for fine-tuning BioBERT available at Supplementary information: Supplementary data are available at Bioinformatics online.
Conference Paper
The lack of publicly accessible text corpora is a major obstacle for progress in natural language processing. For medical applications, unfortunately, all language communities other than English are low-resourced. In this work, we present GGPONC (German Guideline Program in Oncology NLP Corpus), a freely distributable German language corpus based on clinical practice guidelines for oncology. This corpus is one of the largest ever built from German medical documents. Unlike clinical documents, clinical guidelines do not contain any patient-related information and can therefore be used without data protection restrictions. Moreover, GGPONC is the first corpus for the German language covering diverse conditions in a large medical subfield and provides a variety of metadata, such as literature references and evidence levels. By applying and evaluating existing medical information extraction pipelines for German text, we are able to draw comparisons for the use of medical language to other corpora, medical and non-medical ones.
Objective: This article summarizes the preparation, organization, evaluation, and results of Track 2 of the 2018 National NLP Clinical Challenges shared task. Track 2 focused on extraction of adverse drug events (ADEs) from clinical records and evaluated 3 tasks: concept extraction, relation classification, and end-to-end systems. We perform an analysis of the results to identify the state of the art in these tasks, learn from it, and build on it. Materials and methods: For all tasks, teams were given raw text of narrative discharge summaries, and in all the tasks, participants proposed deep learning-based methods with hand-designed features. In the concept extraction task, participants used sequence labelling models (bidirectional long short-term memory being the most popular), whereas in the relation classification task, they also experimented with instance-based classifiers (namely support vector machines and rules). Ensemble methods were also popular. Results: A total of 28 teams participated in task 1, with 21 teams in tasks 2 and 3. The best performing systems set a high performance bar with F1 scores of 0.9418 for concept extraction, 0.9630 for relation classification, and 0.8905 for end-to-end. However, the results were much lower for concepts and relations of Reasons and ADEs. These were often missed because local context is insufficient to identify them. Conclusions: This challenge shows that clinical concept extraction and relation classification systems have a high performance for many concept types, but significant improvement is still required for ADEs and Reasons. Incorporating the larger context or outside knowledge will likely improve the performance of future systems.