GERNERMED - An Open German Medical
Johann Frei Frank Kramer
The current state of adoption of well-structured electronic health records
and integration of digital methods for storing medical patient data in
structured formats can often be considered inferior to the use
of traditional, unstructured text-based patient data documentation. Data
mining in the field of medical data analysis often needs to rely solely on
processing of unstructured data to retrieve relevant data. In natural
language processing (NLP), statistical models have proven successful
in various tasks like part-of-speech tagging, relation extraction (RE) and
named entity recognition (NER).
In this work, we present GERNERMED, the first open, neural NLP
model for NER tasks dedicated to detecting medical entity types in German
text data. Here, we avoid the conflicting goals of protecting sensitive
patient data from training data extraction and publishing the
statistical model weights by training our model on a custom dataset that
was translated from publicly available datasets in a foreign language by a
pretrained neural machine translation model.
The sample code and the statistical model are available at:
Despite continuous eﬀorts to transform the storage and processing of medical
data in healthcare systems into a framework of machine-readable highly struc-
tured data, implementation designs that aim to fulﬁll such requirements are
only slowly gaining traction in the clinical healthcare environment. In addition
to common technical challenges, physicians tend to bypass or completely avoid
inconvenient data input interfaces, which enforce structured data formats, by
encoding relevant information as free-form unstructured text.
Electronic data capturing systems are developed in order to improve the
situation of structured data capturing, yet their primary focus lies on clinical
studies, and the use of these systems needs to be planned in early stages
and requires active software management and maintenance. Such electronic data
capturing solutions are commonly considered in the context of clinical research
but are largely omitted in non-research-centric healthcare services.
arXiv:2109.12104v1 [cs.CL] 24 Sep 2021
In light of the rise of data mining and big data analysis, the emerging
importance of large-scale data acquisition and collection for finding and
understanding novel relationships between diseases, disease-indicating
biomarkers, drug effects and other input variables induces additional
pressure to find new possible data sources. While new datasets can be designed
and created for specific use cases, the amount of obtained data might be very
limited and not sufficient for modern data-driven methods. Furthermore, such
data collection efforts can turn out to be rather inefficient in terms of the
time and work involved in creating new datasets relative to the number of
acquired data samples.
In contrast, unstructured data from legacy systems and non-research-centric
healthcare, referred to as second use, offer a potential alternative.
However, techniques for information extraction and retrieval, mainly from
the NLP domain, need to be applied to transform raw data into structured
While the availability of existing NLP models in English, and of other
non-NLP-based techniques, for medical use cases is a focus of active research,
the situation of medical NLP models for non-English languages is less
satisfying. Since the performance of an NLP model often depends on its
dedicated target language, most models cannot be shared and reused easily on
different languages, but require re-training on new data from the desired
target language.
In particular, for the detection of entities like prescribed drugs and
level or frequency of dosage in German medical documents like doctor's
letters, no open and publicly available model has been published to the best
of our knowledge. We attribute this fact to two main contributing factors:
• Lack of public German datasets: Most open public datasets are
designed for English data only. Until recently, no German dataset had been
published. Specifically in the context of clinical data, legal restrictions and
privacy policies prevent collection and publication of German datasets.
Data-driven NLP research for medical applications largely utilizes internal
data for training and evaluation. In addition to the dataset itself, high
quality annotations of the dataset are essential for robust model
performance when relevant text features are modeled with supervised learning.
• Protection of sensitive data and privacy concerns: While a few works
have been published that present data-driven models for German texts,
the weights of these models have not been published. Since the respective
training data has been used in non-anonymized or pseudonymized
fashion, the publication of the model weights inherently comes at the risk
of possible data leakage through training data extraction from the
model, potentially exposing sensitive information like patient names or id
In this paper, we aim to tackle the absence of both anonymous training
data and publicly available medical German NLP models. Our main
contributions are as follows:
• Automated retrieval of German dataset: We create a custom dataset
for our target language, based on a public English dataset. In addition,
we apply a strategy to preserve relevant annotation information across
• Training of medical German NLP model component: We trained
and built a named entity recognition component on the custom dataset.
The model pipeline supports multiple types of medical entities.
• Evaluation and publication of the NLP pipeline: The NER model
was evaluated as part of an NLP pipeline. The trained model is publicly
available for further use by third parties.
2 Materials and Methods
2.1 Related Work
In recent years, substantial progress has been made in the area of NLP, which
can mostly be attributed to the joint use of large amounts of data and its
processing through large language models like BERT and its
(bio)medical-specific models [1, 2, 18, 19, 25, 28] as a straightforward way
to encode representations of semantic information for further processing in
downstream tasks like text classification or text segmentation. These works
mostly focus on the English language due to available language corpora like
scientific texts from PubMed or specifically designed corpora such as n2c2
(with annotations) or MIMIC-III. For German, GGPONC, which was published
during our work on this project, is the only dataset that carries annotation
information; other German datasets [5, 35] do not. Moreover, the
Technical-Laymen corpus provides an annotated corpus, yet it is based on
crawled texts from non-professional online forums. Various other German
medical text corpora exist [4, 6, 8, 9, 14–16, 20, 22, 32, 36, 37] as a basis
for certain NLP and information extraction use cases, but are inaccessible for
In the field of NLP systems for German medical texts, medSynDiKATe [10, 11]
approaches information extraction on pathological finding reports by parsing
and mapping text elements to (semi-)automatically built knowledge
representation structures. Processing of pathological findings in German has
also been applied to the task of sentence classification. In the context of
patient records, a hybrid RE and NER parsing approach using the SProUT parser
has been proposed; however, the entity tags lack medical relevance. Similar
general NER for non-medical entity tags has been applied in order to enable
de-identification of clinical records using statistical and regex-based models
through the StanfordNLP parser.
Neural methods have been shown to perform well on certain NLP tasks.
In particular, convolutional (CNN) approaches for RE [23, 33, 38] have become
popular in recent years. For German texts, the performance of various methods
has been investigated for medical NER tasks, such as CNN, LSTM or
SVM-based models. In this context, the text processing platform mEx uses
CNN-based methods for solving medical NER in German texts. Similar to
our work, mEx is built on SpaCy, but provides custom models for other
NLP tasks such as RE. However, the platform has been partially trained on
non-anonymized clinical data and thus, its statistical models have not been
2.2 Custom Dataset Creation
We rely on the publicly available training data from n2c2 NLP 2018 Track
2 dataset (ADE and Medication Extraction Challenge) as our initial source
dataset. The data is composed of 303 annotated text documents that have been
postprocessed by the editor for anonymization purposes in order to explicitly
mask sensitive privacy-concerning information.
In order to transform the data into semantically plausible text, we identify
the type and text span of each text mask such that we are able to replace the
text masks by sampling type-compatible data randomly from a set of sample
entries. During the sampling stage, depending on the type of the mask, text
samples for entities like dates, names, years or phone numbers are generated
and inserted into the text. Since every replacement step might affect the
location of the text annotation labels as provided by the character-wise start
and stop indices, these label annotation indices must be updated accordingly.
As a further preprocessing step, we split the text into single sentences such
that we can omit all sentences with no associated annotation labels.
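The mask replacement and the index update described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the mask syntax `[**Type**]`, the sample pools in `SAMPLES` and the function name `fill_masks` are assumptions, and the sketch assumes that annotation spans never overlap a mask.

```python
import random
import re

# Hypothetical mask syntax and sample pools; the source dataset may use other conventions.
MASK_PATTERN = re.compile(r"\[\*\*(?P<kind>[A-Za-z ]+)\*\*\]")
SAMPLES = {
    "Name": ["Anna Schmidt", "Peter Braun"],
    "Year": ["1998", "2007"],
    "Phone": ["555-0100", "555-0199"],
}


def fill_masks(text, labels, rng):
    """Replace every mask in `text` and shift (start, stop, tag) labels accordingly."""
    pieces = []
    cursor = 0
    shifts = []  # (end of mask in the original text, cumulative length change)
    delta = 0
    for match in MASK_PATTERN.finditer(text):
        replacement = rng.choice(SAMPLES.get(match.group("kind"), ["UNKNOWN"]))
        pieces.append(text[cursor:match.start()])
        pieces.append(replacement)
        cursor = match.end()
        delta += len(replacement) - (match.end() - match.start())
        shifts.append((match.end(), delta))
    pieces.append(text[cursor:])

    def shifted(position):
        # Apply the cumulative shift of the last mask that ends at or before `position`.
        total = 0
        for mask_end, cumulative in shifts:
            if position >= mask_end:
                total = cumulative
        return position + total

    new_labels = [(shifted(start), shifted(stop), tag) for start, stop, tag in labels]
    return "".join(pieces), new_labels
```

Because the shifts are stored cumulatively, a label only needs the shift of the last mask that precedes it, which keeps the update independent of the order in which labels are listed.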
For automated translation, we make use of the open source fairseq (0.10.2)
model architecture. fairseq is an implementation of a neural machine transla-
tion model, which supports automatic translation of sequential text data using
pretrained models. For our purposes, we ran the transformer.wmt19.en-de pre-
trained model to translate our set of English sentences into German.
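A translation step of this kind might look like the sketch below. It assumes that the pretrained model is loaded through the `torch.hub` entry point published by fairseq; the checkpoint file names, the tokenizer and BPE settings, and the helper names `load_en_de_translator` and `translate_sentences` are assumptions for illustration and not taken from the paper. Loading the model triggers a large download.

```python
def load_en_de_translator():
    """Load the pretrained WMT19 English-German model via torch.hub (assumed setup)."""
    import torch  # imported lazily so the helper below can be used without torch

    return torch.hub.load(
        "pytorch/fairseq",
        "transformer.wmt19.en-de",
        checkpoint_file="model1.pt:model2.pt:model3.pt:model4.pt",
        tokenizer="moses",
        bpe="fastbpe",
    )


def translate_sentences(sentences, translator, beam=5):
    """Translate sentence by sentence; `translator` must offer translate(text, beam=...)."""
    return [translator.translate(sentence, beam=beam) for sentence in sentences]
```

Translating sentence by sentence keeps a one-to-one correspondence between English and German sentences, which the later word alignment step relies on.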
The reconstructive mapping of the annotation labels from the English source
text to the German target text is tackled by FastAlign. FastAlign is an
unsupervised method for aligning words from two sentences of source and target
language. We project the annotation labels onto the translated German
sentences using the word-level mapping between the corresponding English and
German sentences in order to obtain new annotation label indices in the German
The word alignment mapping tends to induce errors in situations of sentences
with irregular structure such as tabular or itemized text sections. We mitigate
the issue and potential subsequent error propagation by inspecting the structure
of the word mapping matrix A.
         The  cat  sat  on   the  mat.
Die       1    0    0    0    0    0
Katze     0    1    0    0    0    0
saß       0    0    1    0    0    0
auf       0    0    0    1    0    0
der       0    0    0    0    1    0
Matte.    0    0    0    0    0    1
In situations where FastAlign fails to establish a meaningful mapping be-
tween source and target sentence, it can be observed that the resulting mapping
table collapses to a highly non-diagonal matrix structure as illustrated by the
         The  cat  sat  on   the  mat.
Die       0    0    0    0    0    0
Katze     1    1    0    0    0    0
saß       0    0    1    0    0    0
auf       0    0    0    0    0    0
der       0    0    0    0    0    0
Matte.    0    0    0    1    1    1
Severely ill-aligned word mapping matrices can be detected and removed
from the ﬁnal set of sentences by applying the simple ﬁlter decision rule
$$
\frac{1}{\max(w_{en}, w_{de})} \sum_{(i,j):\, A_{i,j} \neq 0}
\frac{\left| w_{en} - i \cdot w_{en} + i - w_{de} + j \cdot w_{de} - j \right|}
     {\sqrt{(w_{en} - 1)^2 + (w_{de} - 1)^2}} > t \tag{1}
$$
where the average distance between a non-zero entry $A_{i,j}$ and the diagonal
line from $A_{1,1}$ to $A_{w_{de}, w_{en}}$ is evaluated, given $w_{en}$ as the
number of words in the English sentence and $w_{de}$ as the number of words in
the German sentence. If the value exceeds the threshold $t$, the sentence pair
is disregarded for the final set of sentences.
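The filter rule can be illustrated with a short sketch. It assumes that the alignment is given as 0-based (English index, German index) pairs and that the distances of all non-zero entries are summed before normalizing by the longer sentence length; this reading of the rule, as well as the names `alignment_matrix` and `misalignment_score`, are assumptions.

```python
import math


def alignment_matrix(pairs, w_en, w_de):
    """Build the w_de x w_en 0/1 matrix A from 0-based (en, de) index pairs."""
    matrix = [[0] * w_en for _ in range(w_de)]
    for en_index, de_index in pairs:
        matrix[de_index][en_index] = 1
    return matrix


def misalignment_score(matrix):
    """Summed distance of non-zero entries to the diagonal, divided by max(w_en, w_de)."""
    w_de, w_en = len(matrix), len(matrix[0])
    norm = math.hypot(w_en - 1, w_de - 1)
    if norm == 0:
        return 0.0  # single-word sentences have no meaningful diagonal
    total = 0.0
    for i in range(1, w_de + 1):        # 1-based row index (German word)
        for j in range(1, w_en + 1):    # 1-based column index (English word)
            if matrix[i - 1][j - 1]:
                total += abs(w_en - i * w_en + i - w_de + j * w_de - j) / norm
    return total / max(w_en, w_de)


def keep_sentence_pair(matrix, threshold):
    """A sentence pair is kept only if its score does not exceed the threshold t."""
    return misalignment_score(matrix) <= threshold
```

For a perfectly diagonal mapping every term vanishes and the score is zero, while entries far from the diagonal increase the score.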
The word mapping matrices describe a non-symmetric cross-correspondence
between two language-dependent token sets, which enables the projection of
tokens within the English annotation span onto the semantically corresponding
tokens in the German translation text. Therefore, the annotation label indices
for the English text can be resolved to the actual indices for the translated
German text at character level. Since the entity classes remain unchanged,
the following annotation label types can be obtained: Drug, Route, Reason,
Strength, Frequency, Duration, Form, Dosage and ADE.
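A minimal sketch of such a label projection is given below. It assumes whitespace-tokenized sentences, alignment pairs in the `i-j` text format (English index, German index, both 0-based), and that the projected German tokens are covered by one contiguous span from the first to the last aligned token; the function names are hypothetical.

```python
def parse_alignment(line):
    """Parse an alignment line such as '0-0 1-1 2-3' into (en, de) index pairs."""
    return [tuple(int(part) for part in pair.split("-")) for pair in line.split()]


def project_span(de_tokens, pairs, en_span):
    """Map an English token span [start, stop) to German character offsets."""
    start, stop = en_span
    de_hits = sorted(de for en, de in pairs if start <= en < stop)
    if not de_hits:
        return None  # no aligned German token, the label cannot be projected

    # Character offsets of every token in the whitespace-joined German sentence.
    offsets = []
    position = 0
    for token in de_tokens:
        offsets.append((position, position + len(token)))
        position += len(token) + 1

    return offsets[de_hits[0]][0], offsets[de_hits[-1]][1]
```

Returning `None` for unaligned spans makes it explicit that some labels may be lost during projection and have to be handled by the caller.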
2.3 NLP Model for NER Training
For the construction of our NER model as part of an NLP pipeline, we use SpaCy
as an NLP framework for training and inference.
The SpaCy NER model follows a transducer-based parsing approach
instead of a state-agnostic token tagging approach.
Embedding: The word tokens are embedded by Bloom embeddings where
different linguistic features are concatenated into a single vector and passed
through $n_{embed}$ separate dense layers, followed by a final maxpooling and
layer norm step. This step enables the model to learn meaningful linear
combinations of single input feature embeddings while reducing the number of
dimensions.
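The idea behind Bloom embeddings can be sketched in plain Python: every feature value is hashed into several rows of a small table and the selected rows are summed, so that a compact table can serve a large vocabulary. The feature set (norm, prefix, suffix, shape), the table sizes and all names below are assumptions for illustration; the subsequent dense, maxpooling and layer norm steps are omitted.

```python
import hashlib
import random


class HashEmbedding:
    """Bloom-style embedding: each key is hashed into several rows of a small table."""

    def __init__(self, n_rows, width, n_hashes=4, seed=0):
        rng = random.Random(seed)
        self.table = [[rng.uniform(-0.1, 0.1) for _ in range(width)]
                      for _ in range(n_rows)]
        self.n_hashes = n_hashes

    def __call__(self, key):
        vector = [0.0] * len(self.table[0])
        for k in range(self.n_hashes):
            digest = hashlib.sha256(f"{k}:{key}".encode("utf-8")).digest()
            row = self.table[int.from_bytes(digest[:8], "big") % len(self.table)]
            vector = [v + r for v, r in zip(vector, row)]
        return vector


def word_shape(token):
    """Coarse orthographic shape: uppercase -> X, lowercase -> x, digit -> d."""
    return "".join("X" if c.isupper() else "x" if c.islower()
                   else "d" if c.isdigit() else c for c in token)


def embed_token(token, embedders):
    """Concatenate the hash embeddings of several token features into one vector."""
    features = {"norm": token.lower(), "prefix": token[:1],
                "suffix": token[-3:], "shape": word_shape(token)}
    concatenated = []
    for name, embedder in embedders.items():
        concatenated.extend(embedder(features[name]))
    return concatenated
```

Using several hash functions per key means that two different keys rarely collide in all rows at once, which is what allows the table to stay small.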
Context-aware Token Encoding: In order to extract context-aware features
that are able to capture larger token window sizes, the final token embedding
is passed through a multi-layered convolutional network. Each convolution
step consists of the convolution itself as well as the following maxpooling
operation to keep the dimensions constrained. For each convolution step, a
residual (skip) connection is added to allow the model to pass intermediate
data representations from previous layers to subsequent layers.
NER Parsing: For each encoded token, a corresponding feature-token vector
is precomputed in advance by a dense layer. For parsing, the document
is processed token-wise in a stateful manner. For NER, the state at a given
position consists of the current token, the first token of the last entity and
the previous token by index. Given the state, the feature-position vectors are
retrieved by indexing the values from the precomputed data and summed up. A
dense layer is applied to predict the next action. Depending on the action,
the current token is annotated and the next state is generated until the entire
document has been parsed.
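The stateful, action-based annotation can be illustrated with the following sketch. The action inventory (B, I, L, U, O with an entity label) is an assumption, since the text does not list the actions, and the actions are passed in directly instead of being predicted by a dense layer.

```python
def state_features(index, last_entity_start):
    """State as described above: current token, first token of last entity, previous token."""
    previous = index - 1 if index > 0 else None
    return (index, last_entity_start, previous)


def apply_actions(actions):
    """Turn one action per token into (start, stop, label) token spans."""
    entities = []
    open_start, open_label = None, None
    for index, action in enumerate(actions):
        move, _, label = action.partition("-")
        if move == "U":                      # single-token entity
            entities.append((index, index + 1, label))
        elif move == "B":                    # begin a multi-token entity
            open_start, open_label = index, label
        elif move == "L" and open_start is not None:  # last token closes the entity
            entities.append((open_start, index + 1, open_label))
            open_start, open_label = None, None
        # "I" continues an open entity, "O" leaves the token unlabelled
    return entities
```

In a full parser, the features of `state_features` would select the precomputed vectors whose sum is fed to the action classifier at every step.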
3.1 Custom Dataset Creation
As an initial preprocessing step, we need to replace the anonymization masks by
meaningful regular text data to reconstruct the natural appearance of the text
and alleviate a potential dataset bias that leads to gaps between the dataset
and real world data. For numerical data, we can retrieve mask replacements by
random sampling. Similar to numerical data, dates and years are sampled and
formatted to common date formats. For semantically relevant data types, we
use the Python package Faker. The package maintains lists of plausible data
of various types such as ﬁrst names, last names, addresses or phone numbers.
We make use of these data entries for certain typed anonymization masks. In
order to obtain our custom dataset, we split the texts from the original dataset
into single sentences using the sentence splitting algorithm from SpaCy. The
English sentences were translated into German by the fairseq library with beam
search (b=5). The sentence-wise word alignments were obtained by FastAlign
and cleaned up by our ﬁlter decision rule (t=1.8).
The labels Reason and ADE were removed from the dataset due to the fact
that their deﬁnitions are rather ambiguous in general contexts beyond the scope
of the initial source dataset.
Our final custom dataset consists of 8599 sentence pairs, annotated with
30233 annotations of 9 different class labels. The different class labels and
their corresponding frequency in absolute numbers are shown in Table 1.
NER Tag Count
Table 1: The distribution of annotations in the custom dataset in absolute
numbers. The dataset consists of 8599 sentence samples. A single tag sample
may span multiple tokens.
3.2 NLP Model for NER Training
For training, we utilize our custom German dataset as our training data and
split the dataset into training set (80%, 6879 sentence samples), validation set
and test set (both 10%, 860 samples each). The training setup follows the
default NER setup of SpaCy: the Adam optimizer with a learning rate of 0.001
with decay (β1 = 0.9, β2 = 0.999) is used. The training took 10 minutes on an Intel
The model performance during training is shown in Figure 1. The
corresponding performance scores are evaluated on the validation set.
We select the final model based on the highest F1-score on the validation
set. The performance of the selected model is evaluated on the test set per NER
tag as well as in total. The results are shown in Table 2.
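The reported split sizes follow directly from an 80/10/10 split of the 8599 sentences. A minimal sketch, assuming a simple shuffled split with a fixed seed (the actual splitting procedure and the name `split_dataset` are assumptions):

```python
import random


def split_dataset(samples, seed=0):
    """Shuffle and split into 80% training, 10% validation and 10% test data."""
    items = list(samples)
    random.Random(seed).shuffle(items)
    n_train = int(0.8 * len(items))
    n_val = (len(items) - n_train) // 2
    train = items[:n_train]
    validation = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, validation, test
```

With 8599 samples this yields 6879 training samples and 860 samples each for validation and test, matching the numbers stated above.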
For demonstration purposes, a generic German sentence is shown in Figure
2. The annotations were inferred from the final model.
In general, the availability of German NER models and methods for medical
and clinical domains leaves much to be desired, as described in previous
sections. Similarly, German datasets in such domains are largely kept
unpublished and are not available to the research community. However, the
implications are significantly broader. In the case of unpublished NLP models,
this renders independent reproduction of results and fair comparisons impossible.
Figure 1: Training scores on validation set: Evaluation scores are computed at
every 200th iteration.
NER Tag     Precision   Recall   F1-Score
Drug        67.33       66.17    66.74
Strength    92.34       90.99    91.66
Route       89.93       90.14    90.04
Form        91.94       89.24    90.57
Dosage      87.83       87.57    87.70
Frequency   79.14       76.92    78.01
Duration    67.86       52.78    59.37
total       82.31       80.79    81.54
Table 2: The model performance scores per NER tag. The evaluation is based
on the separated test set.
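The F1-scores in Table 2 are the harmonic mean of precision and recall. The following sketch recomputes them from the rounded table values; small deviations in the last digit are expected because the published precision and recall values are already rounded.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) per NER tag, copied from Table 2.
TABLE_2 = {
    "Drug": (67.33, 66.17, 66.74),
    "Strength": (92.34, 90.99, 91.66),
    "Route": (89.93, 90.14, 90.04),
    "Form": (91.94, 89.24, 90.57),
    "Dosage": (87.83, 87.57, 87.70),
    "Frequency": (79.14, 76.92, 78.01),
    "Duration": (67.86, 52.78, 59.37),
    "total": (82.31, 80.79, 81.54),
}

for tag, (precision, recall, reported) in TABLE_2.items():
    print(f"{tag:<10} recomputed={f1_score(precision, recall):.2f} reported={reported:.2f}")
```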
In the case of lacking datasets, novel competitive data-driven techniques cannot
be developed or validated easily.
As a consequence, we cannot use such independent datasets for an extended
evaluation of our model in order to estimate the inherent dataset bias of our
Regarding the synthesis of our custom dataset, one should emphasize
that the quality of the custom dataset, and thus the quality of the
model, is likely to be influenced by the quality of the English-to-German
translation engine. While in the case of English-to-German, modern NMT models
are often sufficient and output reasonable results for the majority of text
samples, the results are likely to worsen in the context of low-resource
languages where powerful NMT models are not available.
Figure 2: Demonstration of successfully detected entities from German text
The choice of the statistical model and the slim neural model architecture
in particular is attributed to its small computational footprint while being able
to achieve satisfying results. In addition, the NER pipeline of SpaCy explicitly
induces inductive bias through hand-crafted feature extraction during the token
embedding stage. However, since the focus of our work lies on the integration
of NMT-based data for training purposes, we consider an exhaustive hyperpa-
rameter optimization as well as the utilization of a transformer-based model for
improved NER performance scores as future work.
In this paper, we presented the first neural NER model for German medical
text as an open, publicly available model that is trained on a custom German
dataset derived from a publicly available English dataset. We described the
method to extract and postprocess texts from the masked English texts, and to
generate German texts through translation and cross-lingual token alignment.
In addition, the NER model architecture was described and the final model
performance was evaluated for single NER tags as well as in total.
We believe that our model is a well-suited baseline for future work in the
context of German medical entity recognition and natural language processing.
The need for independent datasets in order to further improve the situation for
the research community on this matter has been highlighted. We look
forward to comparing our model to upcoming German medical NER models.
The model as well as the training/test data are available at the following
repository on GitHub: https://github.com/frankkramer-lab/GERNERMED.
This work is a part of the DIFUTURE project funded by the German Ministry
of Education and Research (Bundesministerium für Bildung und Forschung,
BMBF) grant FKZ01ZZ1804E.
[1] Emily Alsentzer, John Murphy, William Boag, Wei-Hung Weng, Di Jin,
Tristan Naumann, and Matthew McDermott. Publicly available clinical
BERT embeddings. In Proceedings of the 2nd Clinical Natural Language
Processing Workshop, pages 72–78, Minneapolis, Minnesota, USA, June
2019. Association for Computational Linguistics.
[2] Iz Beltagy, Kyle Lo, and Arman Cohan. Scibert: Pretrained language
model for scientific text. In EMNLP, 2019.
[3] Florian Borchert, Christina Lohr, Luise Modersohn, Thomas Langer,
Markus Follmann, Jan Philipp Sachs, Udo Hahn, and Matthieu-P Schapranow.
Ggponc: A corpus of german medical text with rich metadata based
on clinical practice guidelines. In Proceedings of the 11th International
Workshop on Health Text Mining and Information Analysis, pages 38–48,
[4] Claudia Bretschneider, Sonja Zillner, and Matthias Hammon. Identifying
pathological findings in german radiology reports using a syntacto-semantic
parsing approach. In Proceedings of the 2013 Workshop on Biomedical
Natural Language Processing, pages 27–35, 2013.
[5] Sven Buechel Christina Lohr and Udo Hahn. Sharing copies of synthetic
clinical corpora without physical distribution - a case study to get around
iprs and privacy constraints featuring the german jsyncc corpus. In
Proceedings of the Eleventh International Conference on Language Resources
and Evaluation (LREC 2018). European Language Resources Association
(ELRA), May 2018.
[6] Viviana Cotik, Roland Roller, Feiyu Xu, Hans Uszkoreit, Klemens Budde,
and Danilo Schmidt. Negation detection in clinical reports written in
german. In Proceedings of the Fifth Workshop on Building and Evaluating
Resources for Biomedical Text Mining (BioTxtM2016), pages 115–124, 2016.
[7] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova.
BERT: pre-training of deep bidirectional transformers for language
understanding. CoRR, abs/1810.04805, 2018.
[8] Georg Fette, Maximilian Ertl, Anja Wörner, Peter Kluegl, Stefan Störk,
and Frank Puppe. Information extraction from unstructured electronic
health records and integration into a data warehouse. INFORMATIK 2012,
[9] Udo Hahn, Franz Matthies, Christina Lohr, and Markus Löffler.
3000pa-towards a national reference corpus of german clinical language. In MIE,
pages 26–30, 2018.
[10] Udo Hahn, Martin Romacker, and Stefan Schulz. How knowledge drives
understanding - matching medical ontologies with the needs of medical
language processing. Artificial Intelligence in Medicine, 15(1):25–51, 1999.
Terminology and Concept Representation.
[11] Udo Hahn, Martin Romacker, and Stefan Schulz. Medsyndikate–a natural
language system for the extraction of medical information from findings
reports. International journal of medical informatics, 67(1-3):63–74, De-
[12] Sam Henry, Kevin Buchan, Michele Filannino, Amber Stubbs, and Ozlem
Uzuner. 2018 n2c2 shared task on adverse drug events and medication
extraction in electronic health records. Journal of the American Medical
Informatics Association : JAMIA, 27(1):3–12, January 2020.
[13] Matthew Honnibal, Ines Montani, Sofie Van Landeghem, and Adriane
Boyd. spaCy: Industrial-strength Natural Language Processing in Python,
[14] Maximilian König, André Sander, Ilja Demuth, Daniel Diekmann, and
Elisabeth Steinhagen-Thiessen. Knowledge-based best of breed approach for
automated detection of clinical events based on german free text digital
hospital discharge letters. PloS one, 14(11):e0224916, 2019.
[15] Jonathan Krebs, Hamo Corovic, Georg Dietrich, Max Ertl, Georg Fette,
Mathias Kaspar, Markus Krug, Stefan Störk, and Frank Puppe.
Semi-automatic terminology generation for information extraction from german
chest x-ray reports. GMDS, 243:80–84, 2017.
[16] Markus Kreuzthaler and Stefan Schulz. Detection of sentence boundaries
and abbreviations in clinical narratives. In BMC medical informatics and
decision making, volume 15, pages 1–13. BioMed Central, 2015.
[17] Hans-Ulrich Krieger, Christian Spurk, Hans Uszkoreit, Feiyu Xu, Yi Zhang,
Frank Müller, and Thomas Tolxdorff. Information extraction from german
patient records via hybrid parsing and relation extraction strategies. In
LREC, pages 2043–2048, 2014.
[18] Jinhyuk Lee, Wonjin Yoon, Sungdong Kim, Donghyeon Kim, Sunkyu Kim,
Chan Ho So, and Jaewoo Kang. BioBERT: a pre-trained biomedical
language representation model for biomedical text mining. Bioinformatics, 09
[19] Fei Li, Yonghao Jin, Weisong Liu, Bhanu Pratap Singh Rawat, Pengshan
Cai, and Hong Yu. Fine-tuning bidirectional encoder representations from
transformers (bert)–based models on large-scale electronic health record
notes: An empirical study. JMIR Med Inform, 7(3):e14830, Sep 2019.
[20] Joann M Lohr, Daniel T McDevitt, Kenneth S Lutter, L Richard
Roedersheimer, and Michael G Sampson. Operative management of greater
saphenous thrombophlebitis involving the saphenofemoral junction. The
American journal of surgery, 164(3):269–275, 1992.
[21] Christopher D. Manning, Mihai Surdeanu, John Bauer, Jenny Finkel,
Steven J. Bethard, and David McClosky. The Stanford CoreNLP natural
language processing toolkit. In Association for Computational Linguistics
(ACL) System Demonstrations, pages 55–60, 2014.
[22] Jose A Miñarro-Giménez, Ronald Cornet, Marie-Christine Jaulent, Heike
Dewenter, Sylvia Thun, Kirstine Rosenbeck Gøeg, Daniel Karlsson, and
Stefan Schulz. Quantitative analysis of manual annotation of clinical text
samples. International journal of medical informatics, 123:37–48, 2019.
[23] Thien Huu Nguyen and Ralph Grishman. Relation extraction: Perspective
from convolutional neural networks. In Proceedings of the 1st workshop on
vector space modeling for natural language processing, pages 39–48, 2015.
[24] Myle Ott, Sergey Edunov, Alexei Baevski, Angela Fan, Sam Gross, Nathan
Ng, David Grangier, and Michael Auli. fairseq: A fast, extensible toolkit for
sequence modeling. In Proceedings of NAACL-HLT 2019: Demonstrations,
[25] Yifan Peng, Shankai Yan, and Zhiyong Lu. Transfer learning in
biomedical natural language processing: An evaluation of bert and elmo on ten
benchmarking datasets. In Proceedings of the 18th BioNLP Workshop and
Shared Task, pages 58–65, 2019.
[26] Jakub Piskorski, Petr Homola, Malgorzata Marciniak, Agnieszka
Mykowiecka, Adam Przepiórkowski, and Marcin Wolinski. Information
extraction for polish using the sprout platform. In Proceedings of
International Conference on Intelligent Information Systems - New Trends in
Intelligent Information Processing and Web Mining, May 2004.
International Conference on Intelligent Information Systems, Zakopane, Poland,
[27] Tom J Pollard and Alistair EW Johnson. The mimic-iii clinical database.
[28] Laila Rasmy, Yang Xiang, Ziqian Xie, Cui Tao, and Degui Zhi. Med-bert:
pretrained contextualized embeddings on large-scale structured electronic
health records for disease prediction. npj Digital Medicine, 4(1):1–13, 2021.
[29] Phillip Richter-Pechanski, Stefan Riezler, and Christoph Dieterich.
De-identification of german medical admission notes. In GMDS, pages 165–169,
[30] Roland Roller, Christoph Alt, Laura Seiffe, and He Wang. mex - an
information extraction platform for german medical text. In Proceedings of the
11th International Conference on Semantic Web Applications and Tools for
Healthcare and Life Sciences (SWAT4HCLS’2018). Semantic Web
Applications and Tools for Healthcare and Life Sciences (SWAT4HCLS-2018),
December 3-5, Antwerp, Belgium, 12 2018.
[31] Roland Roller, Nils Rethmeier, Philippe Thomas, Marc Hübner, Hans
Uszkoreit, Oliver Staeck, Klemens Budde, Fabian Halleck, and Danilo
Schmidt. Detecting named entities and relations in german clinical reports.
In International Conference of the German Society for Computational
Linguistics and Language Technology, pages 146–154. Springer, Cham, 2017.
[32] Roland Roller, Hans Uszkoreit, Feiyu Xu, Laura Seiffe, Michael Mikhailov,
Oliver Staeck, Klemens Budde, Fabian Halleck, and Danilo Schmidt. A
fine-grained corpus annotation schema of german nephrology records. In
Proceedings of the Clinical Natural Language Processing Workshop
(ClinicalNLP), pages 69–77, 2016.
[33] Sunil Kumar Sahu, Ashish Anand, Krishnadev Oruganty, and
Mahanandeeshwar Gattu. Relation extraction from clinical texts using domain
invariant convolutional neural network. arXiv preprint arXiv:1606.09370,
[34] Laura Seiffe, Oliver Marten, Michael Mikhailov, Sven Schmeier, Sebastian
Möller, and Roland Roller. From witch’s shot to music making bones -
resources for medical laymen to technical language and vice versa. In
Proceedings of the 12th Language Resources and Evaluation Conference, pages
6185–6192, Marseille, France, May 2020. European Language Resources
[35] Hanna Suominen, Liadh Kelly, Lorraine Goeuriot, and Martin Krallinger.
CLEF ehealth evaluation lab 2020. In Joemon M. Jose, Emine Yilmaz,
João Magalhães, Pablo Castells, Nicola Ferro, Mário J. Silva, and Flávio
Martins, editors, Advances in Information Retrieval - 42nd European
Conference on IR Research, ECIR 2020, Lisbon, Portugal, April 14-17, 2020,
Proceedings, Part II, volume 12036 of Lecture Notes in Computer Science,
pages 587–594. Springer, 2020.
[36] Martin Toepfer, Hamo Corovic, Georg Fette, Peter Klügl, Stefan Störk, and
Frank Puppe. Fine-grained information extraction from german
transthoracic echocardiography reports. BMC medical informatics and decision
making, 15(1):1–16, 2015.
[37] Joachim Wermter and Udo Hahn. An annotated german-language medical
text corpus as language resource. In LREC. Citeseer, 2004.
[38] Daojian Zeng, Kang Liu, Siwei Lai, Guangyou Zhou, and Jun Zhao.
Relation classification via convolutional deep neural network. In Proceedings of
COLING 2014, the 25th International Conference on Computational
Linguistics: Technical Papers, pages 2335–2344, 2014.