A Domain-adapted Dependency Parser for German Clinical Text



Elif Kara, Tatjana Zeen, Aleksandra Gabryszak, Klemens Budde, Danilo Schmidt, Roland Roller
Language Technology Lab, DFKI Berlin, Germany
Charité – Universitätsmedizin Berlin, Germany
In this work, we present a syntactic parser specialized for German clinical data. Our model, trained on a small gold standard nephrological dataset, outperforms the default German model of Stanford CoreNLP in parsing nephrology documents with respect to LAS (74.64 vs. 42.15). Moreover, re-training the default model via domain adaptation to nephrology leads to further improvements on nephrology data (78.96). We also show that our model performs well on fictitious clinical data from other subdomains (69.69).
1 Introduction
The demand for Natural Language Processing
(NLP) in the clinical domain is rapidly increas-
ing due to growing interest in clinical information
systems and their potential to enhance clinical ac-
tivities. Clinical text data exists in abundance in
unstructured format (patient records, hand-written
notes, etc.) that, once structured by NLP solutions,
could be used to improve interaction between pa-
tients and medical staff, to aid the personalization
of treatment and to automate risk stratification. Fur-
ther, NLP can aid the detection of adverse drug
events, as well as the detection and prediction of
healthcare associated infections (Dalianis, 2018).
A multitude of NLP tools has been developed to
process English clinical text, such as those of Savova
et al. (2010) or Aronson and Lang (2010), but, thus far,
few for German (see Section 2). The primary rea-
son for this is the lack of existing clinical text in
German that can be accessed for research, due to
strict laws revolving around issues of ethics, pri-
vacy and safety (Starlinger et al., 2016; Suster et
al., 2017; Lohr et al., 2018).
Added to the juridical constraints, clinical lan-
guage is by itself difficult to process and, thus, re-
quires specialized solutions. It tends to be driven by
time pressure and the need for minuteness, often
deviating from stylistic, grammatical and ortho-
graphic conventions.
Some features of clinical language problematic
for machine-readability are (Patrick and Nguyen,
2011; Roller et al., 2016; Savkov et al., 2016; Dalianis, 2018):

- Extensive use of Greek- and Latin-rooted terminology, e.g. Appendektomie (‘appendectomy’), thorakal (‘thoracic’),
- Complex syntactic embeddings, e.g. In Anbetracht der initial bestehenden Entzündungskonstellation haben wir antibiotisch mit Levofloxacin 500 mg 1-0-1 10 Tage behandelt, was sich im Nachhinein nach dem bakteriologischen Resistenzprofil als treffsicher erwies. (‘Given the initial inflammatory constellation, we treated antibiotically with Levofloxacin 500 mg 1-0-1 for 10 days, which turned out to be accurate according to the bacteriological resistance profile.’),
- Ellipses of auxiliary and copula verbs as well as sentence boundaries, e.g. Geht gut. (‘Goes well.’), Ödeme rückläufig (‘Edema declining’).
This work focuses on syntactic dependency pars-
ing of clinical text in German. Syntactic depen-
dency relations provide insight into the grammat-
ical structure of a sentence and are often used as
input for NLP applications.
We use the Stanford Parser (SP) (Chen and Man-
ning, 2014), a domain-independent syntactic neu-
ral network dependency parser from the Stanford
CoreNLP pipeline (Manning et al., 2014), and ex-
amine its accuracy on highly specialized German-
language data from the clinical domain. The re-
sult is rather poor when using the German default
model. In order to improve it, two experiments
Proceedings of the 14th Conference on Natural Language Processing (KONVENS 2018)
Vienna, Austria – September 19-21, 2018
were conducted:
1) We provide the SP with gold standard to-
kenization and PoS-tags and (re-)train two new
models on the gold standard annotation. Based on
the results, we establish that a model trained on
a small nephrological dataset already outperforms
Stanford’s own model when parsing clinical text.
However, our best-performing model is a blend of
Stanford’s data model, re-trained with the model
described here. From this, we take that re-training
models that were initially trained on large-scale
datasets of mixed domains with in-domain data (of
smaller scale) is beneficial.
2) We further test the potential of our best-
performing model on additional documents from
varying clinical subdomains, with promising re-
sults. These are fictitious as opposed to the previous
test set and can thus be published for further use.
In this work, we demonstrate how existing NLP
models can be refined to process domain- and
language-specific data. The paper is structured
as follows: In Section 2, we present a selection of
previous research. Next, we present our dataset in
Section 3, followed by the procedure of our experi-
ments in Section 4. Finally, we sum up our findings
in Section 5.
2 Related Work
The Stanford Parser is a popular language-agnostic
syntactic statistical parser (Zou et al., 2015; Ma et
al., 2015; Chaplot et al., 2015) that can be trained
on any language. As part of the unified Universal
Dependencies (UD) framework (Nivre et al., 2016),
models for various languages, including German,
are available. The German model was trained on
the UD Treebank for German – a large dataset of
heterogeneous nature (website crawls). German
uses all 17 universal Part of Speech (PoS) cate-
gories and most of the 40 dependencies due to its
morpho-syntactic complexity. For a complete list
of language-specific relations, please refer to the
UD website.
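For reference, the universal PoS inventory can be checked programmatically. The sketch below validates tagged tokens against the 17 UPOS categories (the example tokens are illustrative, not from the corpus):

```python
# The 17 universal PoS categories of UD v2 (UD v1 also has 17,
# with CONJ instead of CCONJ before the rename).
UPOS = {
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
}

def check_tags(tagged):
    """Return the (token, tag) pairs whose tag is not a valid UPOS category."""
    return [(tok, tag) for tok, tag in tagged if tag not in UPOS]

# An elliptical clinical sentence from Section 1, tagged by hand.
sample = [("Geht", "VERB"), ("gut", "ADV"), (".", "PUNCT")]
print(check_tags(sample))  # → []
```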
It is well established that the domain a parser
is trained on needs to match the domain
of the data to be parsed (McClosky et al., 2010).
As a consequence, existing parsers tend to handle
domain-specific data poorly. With the increased
interest in biomedical NLP in recent years, there
have been a number of shared domain-adaptation
initiatives, whereby pre-built, domain-independent
parser models are customized and used for re-
training (McClosky et al., 2010; Jiang et al., 2015;
Skeppstedt et al., 2014; Rimell and Clark, 2009).
The Charniak Parser (Charniak, 2000) was en-
riched with data from a variety of domains, includ-
ing abstracts from PubMed, a corpus of biomedical
and life sciences journal literature (McClosky and
Charniak, 2008). Similarity measures between tar-
get and source domains were fed into a regression
model that analyzes the effect of domain dissimilar-
ities and, subsequently, selects the input that max-
imizes the regression function. This multi-source
approach to domain adaptation improves the parse
quality of texts from all source domains compared
to a non-adapted model. The system learned quantitative
measures of domain differences and their
effects on parsing accuracy, allowing it to propose
linear combinations of the source models.
Jiang et al. (2015) compared the SP, Charniak
and Bikel (Bikel, 2004) parsers on clinical text
before and after domain-adaptation and found that
domain-adapted re-training is an effective measure
and that the SP outperformed the others.
A different approach was taken by Skeppstedt et
al. (2014) via a direct comparison between clinical
and standard Swedish text parses using a domain-
independent Swedish parser. Based on the manual
analysis and the identification of eight PoS-related
error types, pre-processing rules were formulated
and fed back to the tool, resulting in improved
parsing. Likewise, Rimell and Clark (2009) report
that simply retraining the PoS-tagger on biomedical
data leads to significant improvements in parsing
performance. This indicates the importance of a
relevant PoS-tagset applied consistently.
Contrasting these shared efforts, there is – to date
– not a single dependency processing tool available
for use on clinical German. As already mentioned
in the Introduction, the lack of shared resources
is a persisting obstacle for clinical NLP in Ger-
many (Starlinger et al., 2016). However, progress
can only be made by the sharing of models (Hell-
rich et al., 2015; Starlinger et al., 2016). Further-
more, Lohr et al. (2018) propose testing models on
synthetically generated medical corpora, which can
be made public without infringing on data privacy.
There are currently a total of ten corpora in clin-
ical German – all inaccessible (Lohr et al., 2018)
– the first and most well-known being the FraMed
corpus (Wermter and Hahn, 2004; Hellrich et al.,
2015), which contains authentic de-identified med-
ical data and is PoS-tagged using a variant of
the Stuttgart-Tübingen-TagSet (STTS). The cor-
pus was used for generating in-domain machine
learning models for different tasks, e.g. sentence
splitting, PoS-tagging and tokenization (Faessler
et al., 2014; Hahn et al., 2016). However, it is
inaccessible for research.
Hellrich et al. (2015) tested JCoRe, a newly de-
veloped NLP pipeline, on FraMed with respect
to PoS-tagging and compared the results to the
OpenNLP (Ingersoll et al., 2013) and the Stanford
PoS-tagger (all trained on FraMed). JCoRe outper-
formed the alternative components of OpenNLP and the Stanford PoS-tagger.
3 Data
This section presents the data used for training and
evaluation of our dependency tree parser.
3.1 Nephrological Dependency Corpus
A small gold standard corpus of nephrological text
documents, including PoS and dependency annota-
tions, serves as the reference point for our experi-
ments. It is henceforth referred to as Nephro Gold.
The dataset comprising our gold standard cor-
pus, presented in Table 1, consists of original de-
identified German nephrology records – clinical
notes and discharge summaries. While the first
are characterized by poor syntactic structure, mis-
spellings as well as extensive use of abbreviations
and acronyms, the discharge summaries are embed-
ded in a letter format and comprise well-formed
sentences as well as detailed lists of medical diag-
nostics and procedures.
                    clinical notes   dis. summaries
number of files           44               11
total word count        3,154           10,436
avg. words (std.)    71.7 (75.2)    948.7 (333.3)

Table 1: Annotated files comprising Nephro Gold
The syntactic annotation was carried out by two
postgraduate students of linguistics in their final
year, in roughly 150 hours each, using the UD-
tagset. The PoS-annotation had been carried out
manually in previous work (Seiffe, 2018), using the
STTS-tagset. Four clinical notes and one discharge
summary were annotated by both annotators, ini-
tially scoring an Inter-Annotator Agreement (IAA)
of 0.83, according to Cohen’s kappa. The rela-
tively low IAA can be attributed to the linguistic
challenges outlined in Section 1. The annotators
reviewed the cases of disagreement and identified a
number of systematic differences, such as the anno-
tation of coordinated compound words with a pre-
ceding truncated element or discrepancies in com-
bining nouns with other tokens in specific, com-
plex syntactic structures. Some of these cases may
have been resolved with medical knowledge that
the annotators were lacking. With an adaptation
of the annotation guidelines and a subsequent re-
annotation, the IAA was increased to a kappa score
of 0.9578.
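Cohen’s kappa corrects raw agreement for the agreement two annotators would reach by chance given their individual label distributions. A minimal sketch of the computation (the dependency labels below are illustrative, not taken from Nephro Gold):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of items both annotators label identically.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement from each annotator's label distribution.
    dist_a, dist_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(dist_a[lab] * dist_b[lab] for lab in dist_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

a = ["nsubj", "obj", "root", "obj", "nmod"]
b = ["nsubj", "obj", "root", "nmod", "nmod"]
print(round(cohens_kappa(a, b), 3))  # → 0.737
```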
An exemplary sentence parse from our clinical
data is presented in Figure 1.

Figure 1: Sentence with syntactic dependencies.
Translation: RR (Riva-Rocci – ‘blood pressure’) at home
about 130/80 mmHg.
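In CoNLL-U, the exchange format of Universal Dependencies, each token carries its head index and dependency label as columns. The sketch below encodes Figure 1’s sentence in a reduced four-column form and extracts one (form, head, label) triple per token; the attachments and labels shown are our illustrative guesses, not the gold annotation:

```python
# Figure 1's sentence, "RR zu Hause um 130/80 mmHg.", in a reduced
# CoNLL-U-style format (ID, FORM, HEAD, DEPREL). Heads and labels
# are illustrative guesses, not the corpus's gold annotation.
PARSE = """\
1\tRR\t0\troot
2\tzu\t3\tcase
3\tHause\t1\tnmod
4\tum\t6\tcase
5\t130/80\t6\tnummod
6\tmmHg\t1\tnmod
7\t.\t1\tpunct
"""

def read_parse(block):
    """Return one (form, head, deprel) triple per token line."""
    tokens = []
    for line in block.strip().splitlines():
        idx, form, head, deprel = line.split("\t")
        tokens.append((form, int(head), deprel))
    return tokens

for form, head, deprel in read_parse(PARSE):
    print(f"{form:8s} head={head} deprel={deprel}")
```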
3.2 Additional Evaluation Data
In addition to the previously presented clinical de-
pendency corpus, we use a collection of fictitious
clinical notes and discharge summaries to further
evaluate parsing accuracy. These were written by
students familiar with the nephrology data. Thus,
they may not be authentic from a medical perspec-
tive but they maintain the linguistic characteristics
and vocabulary of genuine documents.
In order to enrich the corpus semantically and
provide a more realistic setting, additional ficti-
tious discharge summaries in the subdomains of
Surgery, Cardiac Rehabilitation, Discharge, Internal
Medicine and Relocation were created by a
medical student using the template-based online
tool Arztbriefmanager, which we refer to as ABM.
Table 2 provides an overview of the fictitious data.
Within our experiment this dataset will be automat-
ically parsed before being manually annotated.
                    clinical notes      ABM
number of files           30              5
total word count        1,233          1,991
avg. words (std.)    41.1 (12.0)    398.2 (226.6)

Table 2: Fictitious data for extended experiments
4 Experiments & Evaluation
The experiments are carried out using the Stanford
Parser (SP) (Chen and Manning, 2014). We applied
a 10-fold cross-validation on the Nephro Gold
dataset. Clinical notes and discharge summaries are
equally assigned to the different folds. Within each
validation step 80% of the data is used for training,
10% for development and 10% for testing.
The Labeled Attachment Score (LAS) was ap-
plied as our accuracy metric. A given dependency
is therefore scored as correct only if both the syn-
tactic head and the label are identical.
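Under this definition, LAS can be computed directly from per-token (head, label) pairs; a minimal sketch with illustrative values:

```python
def las(gold, predicted):
    """Labeled Attachment Score: percentage of tokens whose predicted
    head index AND dependency label both match the gold annotation."""
    assert len(gold) == len(predicted)
    correct = sum(g == p for g, p in zip(gold, predicted))
    return 100.0 * correct / len(gold)

# Each token is a (head_index, label) pair; the values are illustrative.
gold = [(0, "root"), (1, "nsubj"), (1, "obj"), (3, "nmod")]
pred = [(0, "root"), (1, "nsubj"), (1, "obl"), (3, "nmod")]
print(las(gold, pred))  # → 75.0
```

Note that the third token gets the correct head (1) but the wrong label (obl instead of obj), so it does not count as correct under LAS.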
4.1 Baseline: Stanford Out-of-the-box
First of all, we would like to determine the perfor-
mance of Stanford CoreNLP on German clinical
data using the pre-existing PoS-tagger and depen-
dency parser models for German. Thus, CoreNLP
was tested out-of-the-box, without any further pro-
cessing, on the Nephro Gold test partitions, input in
plain text. It automatically performs tokenization,
PoS-tagging and dependency parsing, yielding an
average LAS of 27.09.
As expected, the original dependency tree model
for German does not perform flawlessly on our
clinical data. This may be due to the fact that the
model was trained on non-clinical data. Moreover,
we observe errors in tokenization and PoS-tagging
that lead to consequential errors in the labelling of dependencies.
4.2 Experiment 1: Dependency Parsing of
German Nephrology Reports
In order to observe the true efficiency of the SP, we
eliminate potential errors caused by automatic pre-
processing. Thus, we provide single tokens along
with PoS-labels of the gold standard set as input to
the SP and test the following three models on the
Nephro Gold test set using a 10-fold cross-validation:

- We test the default SP model again, as in the baseline test in Section 4.1, this time skipping the tokenizer and PoS-tagger, and instead, feeding the SP with tokens and their PoS from the gold standard test split. We refer to this configuration as ‘stanford-conf ’.

- We train a new parser model using only the Nephro Gold training and development set. In doing so, we aim to create a parser specialized to German clinical language (specifically the subdomain of nephrology). We refer to this model as ‘nephro’.

- Stanford’s given German parser model contains optimized parameters to label dependencies on general text. We use this already existing model as baseline, re-train it (250 epochs) with the Nephro Gold training set and optimize it against the Nephro Gold development set. This way, we train a specialized dependency parser for clinical data that retains previously learnt knowledge about dependency parsing of more general data. We refer to this configuration as ‘transfer’.
baseline   stanford-conf   nephro   transfer
 27.09         42.15        74.64     78.96

Table 3: Average LAS, based on a 10-fold cross-
validation, on German nephrology data (Baseline +
Experiment 1).
The results of the cross-validation presented in
Table 3 show that simply by including gold an-
notation PoS-tokens into the input data and, thus,
avoiding consequential parse errors, stanford-conf
achieves better results than baseline. More-
over, nephro, trained solely on the small gold stan-
dard corpus, significantly outperforms stanford-
conf, and transfer outperforms both other models.
All tested setups yield promising results, though,
they have three drawbacks: 1) Inputting gold stan-
dard PoS-tokens does not represent a realistic sce-
nario. 2) The gold standard data applied in nephro
and transfer is relatively small. 3) Applying the
parser to linguistically distinct nephrology data ob-
scures its performance on more diverse German
clinical data. These issues will be addressed in the
next section.
4.3 Experiment 2: Extended Experiments
For the second experiment, we address several of
the problems described above: 1) We increase the
size and the semantic
variety of our test set (in comparison to the size of
the test set in each single cross-validation step), 2)
we use an external tool for tokenization and auto-
mated PoS-tagging and 3) we circumvent the legal
obstacle by using fictitious clinical data which we
can make available for further use.
In the first step, the fictitious data described in
Section 3.2 is automatically pre-processed using
JPOS, a PoS-tagger trained on medical data that
utilizes a Maximum Entropy model (Hellrich et al.,
2015). As the fictitious dataset is not annotated,
and evaluation has to be carried out manually, only
the best performing model from the previous ex-
periment in Section 4.2, transfer, is applied to the
fictitious data. Our re-trained model takes JPOS-
processed text (sentence-split, tokenized and PoS-
tagged) as input.
In the final step, the output files are manually
corrected by human evaluators (eval-1 and eval-2),
who previously carried out the gold standard anno-
tation (see Section 3.1), in roughly three hours per
person. They evaluated PoS-tags and dependencies,
and made amendments where required. Two clini-
cal notes and one ABM discharge summary were
examined by both evaluators, scoring an IAA of
0.9686 in terms of Cohen’s kappa.
subset           eval-1   eval-2
clinical notes    75.96    81.75
ABM               69.69    76.26

Table 4: LAS for ‘transfer’-model on the fictitious
dataset (Experiment 2).
Table 4 provides an overview of the parse ac-
curacy determined manually by the evaluators. It
shows that the performance of our system attains
an LAS of over 75 on the clinical notes from the
nephrology domain. Moreover, the results show
that the performance on the clinical notes exceeds
the results on ABM, which is not surprising
as our dependency parser is trained on data of the
same domain. However, a performance of above
69 on German clinical data outside the nephrology
domain is still promising.
5 Conclusion
In this work, we examined the accuracy of the Stan-
ford Parser on German clinical data. As expected,
the default parser model, trained on the general
domain, yielded disappointing results. We presented
our solution of re-training the existing model with
a small gold standard dataset from the nephrol-
ogy domain, which shows an improvement from
42.15 (stanford-conf ) to 78.96 (transfer) (Experi-
ment 1) when tested on the same domain. We fur-
ther demonstrate that the re-trained model is able to
process other clinical data outside the nephrology
domain, despite the relatively small size of train-
ing and evaluation data. The fictitious data and
the models trained on the confidential corpus are
available here.
Acknowledgments

This research was supported by the German Fed-
eral Ministry of Economics and Energy (BMWi)
through the project MACSS (01MD16011F).
References

[Aronson and Lang2010] A. Aronson and F. Lang.
2010. An overview of MetaMap: historical perspec-
tive and recent advances. Journal of the American
Medical Informatics Association, 17(3):229–236.
[Bikel2004] Daniel M Bikel. 2004. A Distributional
Analysis of a Lexicalized Statistical Parsing Model.
In Conference on Empirical Methods in Natural Lan-
guage Processing (EMNLP).
[Chaplot et al.2015] Devendra Singh Chaplot, Pushpak
Bhattacharyya, and Ashwin Paranjape. 2015. Unsu-
pervised word sense disambiguation using markov
random field and dependency parser. In AAAI, pages
[Charniak2000] Eugene Charniak. 2000. A Maximum-
Entropy-Inspired Parser. In NAACL 2000 Proceed-
ings of the 1st North American chapter of the As-
sociation for Computational Linguistics conference,
pages 132–139. Brown Laboratory for Linguistic In-
formation Processing.
[Chen and Manning2014] Danqi Chen and Christopher
Manning. 2014. A Fast and Accurate Dependency
Parser using Neural Networks. In Proceedings of
EMNLP, pages 740–750. Association for Computational Linguistics.
[Dalianis2018] Hercules Dalianis. 2018. Clinical Text
Mining. Springer International Publishing, Cham.
[Faessler et al.2014] Erik Faessler, Johannes Hellrich,
and Udo Hahn. 2014. Disclose Models, Hide
the Data - How to Make Use of Confidential Cor-
pora without Seeing Sensitive Raw Data. In Nico-
letta Calzolari (Conference Chair), Khalid Choukri,
Thierry Declerck, Hrafn Loftsson, Bente Maegaard,
Joseph Mariani, Asuncion Moreno, Jan Odijk, and
Stelios Piperidis, editors, Proceedings of the Ninth
International Conference on Language Resources
and Evaluation (LREC’14), Reykjavik, Iceland,
may. European Language Resources Association
[Hahn et al.2016] Udo Hahn, Franz Matthies, Erik
Faessler, and Johannes Hellrich. 2016. UIMA-based
JCoRe 2.0 goes GitHub and Maven Central – state-of-the-art
software resource engineering and distribution of
NLP pipelines. In LREC.
[Hellrich et al.2015] Johannes Hellrich, Franz Matthies,
Erik Faessler, and Udo Hahn. 2015. Sharing models
and tools for processing German clinical texts. Stud-
ies in Health Technology and Informatics, 210:734–
[Ingersoll et al.2013] Grant S. Ingersoll, Thomas S.
Morton, and Andrew L. Farris. 2013. Taming Text:
How to Find, Organize, and Manipulate It. Manning
Publications Co., Greenwich, CT, USA.
[Jiang et al.2015] Min Jiang, Yang Huang, Jung-wei
Fan, Buzhou Tang, Josh Denny, and Hua Xu. 2015.
Parsing clinical text: how good are the state-of-the-
art parsers? BMC Medical Informatics and Decision
Making, 15(1):S2.
[Lohr et al.2018] Christina Lohr, Sven Buechel, and
Udo Hahn. 2018. Sharing Copies of Synthetic
Clinical Corpora without Physical Distribution – A
Case Study to Get Around IPRs and Privacy Con-
straints Featuring the German JSYNCC Corpus. In
Proceedings of the Eleventh International Confer-
ence on Language Resources and Evaluation (LREC
2018), pages 1259–1266. European Language Re-
sources Association (ELRA).
[Ma et al.2015] Mingbo Ma, Liang Huang, Bowen
Zhou, and Bing Xiang. 2015. Tree-based convolu-
tion for sentence modeling. CoRR, abs/1507.01839.
[Manning et al.2014] Christopher Manning, Mihai Sur-
deanu, John Bauer, Jenny Finkel, Steven Bethard,
and David McClosky. 2014. The Stanford CoreNLP
Natural Language Processing Toolkit. In Proceedings
of ACL: System Demonstrations, pages 55–60.
Association for Computational Linguistics.
[McClosky and Charniak2008] David McClosky and
Eugene Charniak. 2008. Self-Training for Biomedi-
cal Parsing. In Proceedings of ACL-08: HLT, Short
Papers, pages 101–104, Columbus, Ohio, June. As-
sociation for Computational Linguistics.
[McClosky et al.2010] David McClosky, Eugene Char-
niak, and Mark Johnson. 2010. Automatic do-
main adaptation for parsing. In Human Language
Technologies: The 2010 Annual Conference of the
North American Chapter of the Association for Com-
putational Linguistics, pages 28–36. Association for
Computational Linguistics.
[Nivre et al.2016] Joakim Nivre, Marie-Catherine
De Marneffe, Filip Ginter, Yoav Goldberg, Jan
Hajic, Christopher D Manning, Ryan T McDonald,
Slav Petrov, Sampo Pyysalo, Natalia Silveira, et al.
2016. Universal dependencies v1: A multilingual
treebank collection. In LREC.
[Patrick and Nguyen2011] Jon Patrick and Dung
Nguyen. 2011. Automated Proof Reading of
Clinical Notes. page 10.
[Rimell and Clark2009] Laura Rimell and Stephen
Clark. 2009. Porting a lexicalized-grammar parser
to the biomedical domain. Journal of biomedical in-
formatics, 42(5):852–865.
[Roller et al.2016] Roland Roller, Hans Uszkoreit,
Feiyu Xu, Laura Seiffe, Michael Mikhailov, Oliver
Staeck, Klemens Budde, Fabian Halleck, and
Danilo Schmidt. 2016. A fine-grained corpus
annotation schema of german nephrology records.
In Proceedings of the Clinical Natural Language
Processing Workshop (ClinicalNLP). The COLING
2016 Organizing Committee.
[Savkov et al.2016] Aleksandar Savkov, John Carroll,
Rob Koeling, and Jackie Cassell. 2016. Annotating
patient clinical records with syntactic chunks and
named entities: the harvey corpus. Language Re-
sources and Evaluation, 50(3):523–548.
[Savova et al.2010] Guergana K Savova, James J
Masanz, Philip V Ogren, Jiaping Zheng, Sungh-
wan Sohn, Karin C Kipper-Schuler, and Christo-
pher G Chute. 2010. Mayo clinical text analysis
and knowledge extraction system (ctakes): architec-
ture, component evaluation and applications. Jour-
nal of the American Medical Informatics Associa-
tion, 17(5):507–513.
[Seiffe2018] Laura Seiffe. 2018. Linguistic Modeling
for Text Analytic Tasks for German Clinical Texts.
Master’s thesis, TU Berlin. To appear.
[Skeppstedt et al.2014] Maria Skeppstedt, Maria Kvist,
Gunnar H. Nilsson, and Hercules Dalianis. 2014.
Automatic recognition of disorders, findings, phar-
maceuticals and body structures from clinical text:
An annotation and machine learning study. Journal
of Biomedical Informatics, 49:148–158.
[Starlinger et al.2016] Johannes Starlinger, Madeleine
Kittner, Oliver Blankenstein, and Ulf Leser. 2016.
How to improve information extraction from Ger-
man medical records. IT-Information Technology.
[Suster et al.2017] Simon Suster, Stéphan Tulkens, and
Walter Daelemans. 2017. A short review of ethi-
cal challenges in clinical natural language process-
ing. CoRR, abs/1703.10090.
[Wermter and Hahn2004] Joachim Wermter and Udo
Hahn. 2004. An annotated German-language medi-
cal text corpus. In Proceedings of the Fourth Interna-
tional Conference on Language Resources and Eval-
uation, Lisbon, Portugal.
[Zou et al.2015] Huang Zou, Xinhua Tang, Bin Xie,
and Bing Liu. 2015. Sentiment classification using
machine learning techniques with syntax features.
In Computational Science and Computational Intel-
ligence (CSCI), 2015 International Conference on,
pages 175–179. IEEE.
... The authors published the tool, as they were not allowed to publish the underlying FRAMED corpus [21] itself. In addition to that, two NegEx [22] versions for German exist [23] [24], as well as a dependency tree parser [25], an abbreviation expansion [26] and a tool to pseudonomize protected health information (PHI) in German clinical text [27]. ...
... In addition to the semantic clinical corpus, also a small clinical syntactic dataset has been created in previous work [38] [25]. We refer to this corpus as Nephro Gold. ...
... We evaluated all models using a 5-fold cross validation with a training, development, and test split of 75/10/15. The reported results only include part-ofspeech tagging, concept detection, and relation extraction, because dependency tree parsing and attribute (negation) detection results were partly already presented in previous work and tools made available (see Kara et al. [25] and Cotik et al. [24]). Table 7 presents the results of part-of-speech tagging. ...
Full-text available
Background: In the information extraction and natural language processing domain, accessible datasets are crucial to reproduce and compare results. Publicly available implementations and tools can serve as benchmark and facilitate the development of more complex applications. However, in the context of clinical text processing the number of accessible datasets is scarce -- and so is the number of existing tools. One of the main reasons is the sensitivity of the data. This problem is even more evident for non-English languages. Approach: In order to address this situation, we introduce a workbench: a collection of German clinical text processing models. The models are trained on a de-identified corpus of German nephrology reports. Result: The presented models provide promising results on in-domain data. Moreover, we show that our models can be also successfully applied to other biomedical text in German. Our workbench is made publicly available so it can be used out of the box, as a benchmark or transferred to related problems.
... The authors published the tool, as they were not allowed to publish the underlying FRAMED corpus [21] itself. In addition to that, two NegEx versions for German exist [22] [23], a dependency tree parser [24] and an abbreviation expansion [25]. Up to now, no freely available tool to extract concepts or relations from German clinical text exists so far. ...
... In addition to the semantic clinical corpus, also a small clinical syntactic dataset has been created in previous work [36] [24]. We refer to this corpus as Nephro Gold. ...
... We evaluate all models using a 5-fold cross validation with a training, development, and test split of 75/10/15. The reported results only include part-ofspeech tagging, concept detection, and relation extraction, because dependency tree parsing and attribute (negation) detection results were partly already presented in previous work and tools made available (see Kara et al. [24] and Cotik et al. [44]). Table 7 presents the results of part-of-speech tagging. ...
Conference Paper
Tools and resources to automatically process clinical text are very limited, particularly outside the English speaking world. As many relevant patient information within electronic health records are described in unstructured text, this is a clear drawback. In order to slightly overcome this problem, we present information extraction models for German clinical text and make them freely available. The models have been trained on documents of the nephrology domain and do not contain personal information.
... Word distributions might differ between general corpora and biomedical corpora. Clinical sublanguage provides features that differ from general domain language and they complicate data processing [10,11]. Features include an extensive use of Greek and Latin-rooted terminology, complex syntactic embeddings and reduction, i.e. ellipsis of auxiliary and copula verbs, and complex compound words, often built on the fly. ...
Full-text available
In the medical domain, multiple ontologies and terminology systems are available. However, existing classification and prediction algorithms in the clinical domain often ignore or insufficiently utilize semantic information as it is provided in those ontologies. To address this issue, we introduce a concept for augmenting embeddings, the input to deep neural networks, with semantic information retrieved from ontologies. To do this, words and phrases of sentences are mapped to concepts of a medical ontology aggregating synonyms in the same concept. A semantically enriched vector is generated and used for sentence classification. We study our approach on a sentence classification task using a real world dataset which comprises 640 sentences belonging to 22 categories. A deep neural network model is defined with an embedding layer followed by two LSTM layers and two dense layers. Our experiments show, classification accuracy without content enriched embeddings is for some categories higher than without enrichment. We conclude that semantic information from ontologies has potential to provide a useful enrichment of text. Future research will assess to what extent semantic relationships from the ontology can be used for enrichment.
... Both approaches were implemented for the Russian language by the DeepPavlov project [21]. Several studies have shown that medical text is parsed better by a model that was also trained on medical data [22,23]. However, we had no labeled data for re-training, so we used a model already trained on the UD Russian SynTagRus corpus (version 2.3). ...
Electronic medical records (EMRs) contain much valuable data about patients, which is, however, unstructured. There is also a lack of both labeled medical text data in Russian and tools for automatic annotation. As a result, it is hardly feasible today for researchers to use the text data of EMRs to train machine learning models in the biomedical domain. We present an unsupervised approach to medical data annotation. Syntactic trees are produced from the input sentences using morphological and syntactic analysis. In the retrieved trees, similar subtrees are grouped using Node2Vec and Word2Vec and labeled using domain vocabularies and Wikidata categories. Using Wikidata categories increased the fraction of labeled sentences 5.5-fold compared to labeling with domain vocabularies only. We show on a validation dataset that the proposed labeling method generates correct, meaningful labels for 92.7% of the groups. Annotation with domain vocabularies and Wikidata categories covered more than 82% of the sentences in the corpus; extended with timestamp and event labels, 97% of the sentences were covered. The resulting method can be used to automatically label EMRs in Russian. The methodology can also be applied to other languages that lack resources for automatic labeling and domain vocabularies.
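The vocabulary-plus-fallback labeling and the coverage metric reported above can be sketched minimally as follows. The vocabularies below are invented placeholders standing in for the study's domain lexicons and Wikidata categories:

```python
# Hypothetical domain vocabulary (primary source of labels).
DOMAIN_VOCAB = {"creatinine": "LAB_VALUE", "nephritis": "DIAGNOSIS"}
# Hypothetical fallback lexicon, standing in for Wikidata categories.
FALLBACK = {"aspirin": "MEDICATION"}

def label_sentence(tokens):
    """Collect labels for a sentence: domain vocabulary first, fallback second."""
    labels = set()
    for tok in tokens:
        t = tok.lower()
        if t in DOMAIN_VOCAB:
            labels.add(DOMAIN_VOCAB[t])
        elif t in FALLBACK:
            labels.add(FALLBACK[t])
    return labels

def coverage(sentences):
    """Fraction of sentences that received at least one label."""
    labeled = sum(1 for s in sentences if label_sentence(s))
    return labeled / len(sentences)

corpus = [
    ["Creatinine", "elevated"],
    ["Patient", "takes", "aspirin"],
    ["No", "complaints"],
]
# Two of the three sentences receive at least one label.
assert abs(coverage(corpus) - 2 / 3) < 1e-9
```

Adding a broader fallback lexicon raises coverage exactly the way the abstract describes: sentences that the narrow domain vocabulary misses can still be caught by the general categories.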
To deliver better healthcare to patients and advance healthcare solutions, as well as to increase the efficiency of manufacturing processes and thus reduce material and energy consumption in production, more and more artificial intelligence (AI) methods are being applied in both medicine and manufacturing. Some of the exciting applications in these areas are remote patient treatment, transcription and storage of digitized medical data, new drug development, support tools for faster disease diagnosis, visual quality control in production processes, intelligent supply chain and logistics solutions, and machine parameter optimization. The applications of artificial intelligence are manifold, and many experts therefore call AI a disruptive and cross-sectional technology. In general, AI is mainly used to optimize internal processes or to develop new business models. However, AI for medicine and manufacturing faces obstacles such as a shortage of high-dimensional data, privacy concerns regarding the data collected, and the risk of creating biased algorithms if data are not collected from a representative population. The articles presented in these conference proceedings were selected from the oral and poster presentations of the conference held on 19 October 2022 at Furtwangen University, Campus Schwenningen, and are assigned to the chapters MANUFACTURING, MACHINE LEARNING, and MEDICAL TECHNOLOGY. URAI 2022 is the fourth conference on artificial intelligence organized by TriRhenaTech, the tri-national alliance of universities of applied sciences in the Upper Rhine region.
TBase is an electronic health record (EHR) for kidney transplant recipients (KTR) that combines automated entry of key clinical data (e.g., laboratory values, medical reports, radiology and pathology data) via standardized interfaces with manual data entry during routine treatment (e.g., clinical notes, medication list, and transplantation data). In this way, a comprehensive database for KTR is created, with benefits for routine clinical care and research: it enables both easy everyday clinical use and quick access for research questions with high data quality. This is achieved through the concept of data validation in the clinical routine, in which clinical users and patients have to rely on correct data for treatment and medication plans and thereby validate and correct the clinical data in their daily practice. The EHR is tailored to the needs of transplant outpatient care and has proven its clinical utility over more than 20 years at Charité - Universitätsmedizin Berlin. It facilitates efficient routine work with well-structured, comprehensive long-term data and allows their easy use for clinical research. Its functionality covers, among others, automated transmission of routine data via standardized interfaces from different hospital information systems, availability of transplant-specific data, a medication list with an integrated check for drug-drug interactions, and semi-automated generation of medical reports. Key elements of the latest reengineering are a robust privacy-by-design concept, modularity (and hence portability into other clinical contexts), as well as usability and platform independence enabled by HTML5 (Hypertext Markup Language) based responsive web design. This allows fast and easy scaling to other disease areas and other university hospitals. The comprehensive long-term datasets are the basis for investigating machine learning algorithms, and the modular structure allows these to be rapidly implemented into clinical care.
Patient-reported data and telemedicine services are integrated into TBase to meet the future needs of patients. These novel features aim to improve clinical care as well as to create new research options and therapeutic interventions.
In this work we present a fine-grained annotation schema for detecting named entities in German clinical data of chronically ill patients with kidney diseases. The annotation schema is driven by the needs of our clinical partners and by linguistic aspects of the German language. To generate annotations within a short period, we also present a semi-automatic annotation process that uses additional knowledge sources, such as UMLS, to pre-annotate concepts. The presented schema will be used to apply novel natural language processing and machine learning techniques that support doctors in treating their patients through improved information access to unstructured German texts.
Vast amounts of medical information are still recorded as unstructured text. The knowledge contained in this textual data has great potential to improve routine clinical care, to support clinical research, and to advance the personalization of medicine. To access this knowledge, the underlying data has to be semantically integrated, an essential prerequisite for which is information extraction from clinical documents. A body of work and a good selection of openly available tools for information extraction and semantic integration in the medical domain exist, yet almost exclusively for English-language documents. For German texts the situation is rather different: research is sparse, tools are proprietary or unpublished, and hardly any freely available textual resources exist. In this survey, we (1) describe the challenges of information extraction from German medical documents and the hurdles they pose to research in this area, (2) specifically address the problems of missing German language resources and privacy implications, and (3) identify the steps necessary to overcome these hurdles and fuel research in the semantic integration of textual clinical data.
We introduce JCoRe 2.0, the relaunch of a UIMA-based open software repository for full-scale natural language processing originating from the Jena University Language & Information Engineering (JULIE) Lab. In an attempt to put the new release of JCoRe on firm software engineering ground, we uploaded it to GitHub, a social coding platform with an underlying source code versioning system and various means to support collaboration for software development and code modification management. To automate the builds of complex NLP pipelines and properly represent and track dependencies of the underlying Java code, we incorporated Maven as part of our software configuration management efforts. In the meantime, we have deployed our artifacts on Maven Central as well. JCoRe 2.0 offers a broad range of text analytics functionality (mostly) for English-language scientific abstracts and full-text articles, especially from the life sciences domain.
The free-text notes typed by physicians during patient consultations contain valuable information for the study of disease and treatment. These notes are difficult to process with existing natural language analysis tools since they are highly telegraphic (omitting many words) and contain many spelling mistakes, inconsistent punctuation, and non-standard word order. To support information extraction and classification tasks over such text, we describe a de-identified corpus of free-text notes, a shallow syntactic and named entity annotation scheme for this kind of text, and an approach to training domain specialists with no linguistic background to annotate the text. Finally, we present a statistical chunking system for such clinical text with a stable learning rate and good accuracy, indicating that the manual annotation is consistent and that the annotation scheme is tractable for machine learning.
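Shallow syntactic (chunking) annotation of this kind is commonly serialized with BIO tags. A small, generic decoder that recovers chunk spans from BIO tags (a sketch of the standard encoding, not the paper's implementation) might look like:

```python
def decode_bio(tokens, tags):
    """Extract (chunk_type, text) spans from BIO tags.

    Tolerant of stray I- tags: an I- tag whose type does not match the
    open chunk starts a new chunk instead of raising an error.
    """
    chunks, current, ctype = [], [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and ctype != tag[2:]):
            if current:
                chunks.append((ctype, " ".join(current)))
            current, ctype = [tok], tag[2:]
        elif tag.startswith("I-"):
            current.append(tok)
        else:  # "O" -- outside any chunk
            if current:
                chunks.append((ctype, " ".join(current)))
            current, ctype = [], None
    if current:
        chunks.append((ctype, " ".join(current)))
    return chunks

# Telegraphic clinical-style example: "pt denies chest pain"
toks = ["pt", "denies", "chest", "pain"]
tags = ["B-NP", "B-VP", "B-NP", "I-NP"]
assert decode_bio(toks, tags) == [("NP", "pt"), ("VP", "denies"), ("NP", "chest pain")]
```

The tolerance for stray I- tags matters in practice: statistical chunkers occasionally emit an I- tag without a preceding B-, and a robust decoder should recover a chunk rather than fail.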
In sentence modeling and classification, convolutional neural network approaches have recently achieved state-of-the-art results, but all such efforts process word vectors sequentially and neglect long-distance dependencies. To exploit both deep learning and linguistic structure, we propose a tree-based convolutional neural network model that exploits various long-distance relationships between words. Our model improves on the sequential baselines on all three sentiment and question classification tasks and achieves the highest published accuracy on TREC.
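The core idea of convolving over tree edges rather than a token sequence can be sketched with NumPy. Everything below is a toy illustration of the general technique (random weights, a four-node dependency tree), not the authors' architecture: one convolution window is applied per parent-child edge, and max-pooling over the resulting feature vectors yields a fixed-size sentence representation.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 4

# Toy dependency tree: node -> list of children; each node has a word vector.
children = {0: [1, 2], 1: [], 2: [3], 3: []}
vectors = {n: rng.normal(size=DIM) for n in children}

# Shared convolution weights over (parent, child) pairs.
W_parent = rng.normal(size=(DIM, DIM))
W_child = rng.normal(size=(DIM, DIM))

def tree_conv(node):
    """Apply one convolution window per (node, child) edge, recursively."""
    feats = []
    for c in children[node]:
        h = np.tanh(W_parent @ vectors[node] + W_child @ vectors[c])
        feats.append(h)
        feats.extend(tree_conv(c))  # descend into the subtree
    return feats

def encode(root):
    """Max-pool over all edge features -> fixed-size sentence vector."""
    feats = tree_conv(root)
    return np.max(feats, axis=0)

sent_vec = encode(0)
assert sent_vec.shape == (DIM,)
```

Because the windows follow tree edges, a head word and its dependent interact directly even when many tokens separate them in the surface string, which is exactly the long-distance relationship a sequential convolution misses.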
Word sense disambiguation (WSD) is a difficult problem to solve in the unsupervised setting, because without learning resources, inference becomes more dependent on the interplay between the different senses in the context. Using two basic ideas, sense dependency and selective dependency, we model the WSD problem as a maximum a posteriori (MAP) inference query on a Markov random field (MRF) built using WordNet and the Link Parser or the Stanford Parser. To the best of our knowledge this combination of dependency and MRF is novel, and our graph-based unsupervised WSD system beats the state-of-the-art systems on the SensEval-2, SensEval-3, and SemEval-2007 English all-words datasets while being over 35 times faster.
Misspellings, abbreviations and acronyms are very common in clinical notes and can be an obstacle to high-quality information extraction and classification. Clinical scores and measurements are another important part of narrative reports, as doctors infer a patient's status by analyzing them. We introduce a knowledge discovery process that resolves unknown tokens and converts scores and measures into a standard layout so as to improve the quality of semantic processing of the corpus. System performance is evaluated before and after an automatic proofreading process by comparing the computed SNOMED CT codes to the coding originally created by the clinical staff. The automatic coding of the texts increased the coded content by 15% after the automatic correction process, and the number of unique codes increased by 4.7%. The system output suggests the accuracy of the automatic coding and annotations in notes that had not been coded by the clinical staff.
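Resolving unknown tokens of this kind is often a two-step lookup: a dictionary of known abbreviations first, then fuzzy matching against a lexicon of valid terms for misspellings. A minimal sketch with an invented clinical lexicon (the abbreviation table and term list are placeholders, not the paper's resources):

```python
import difflib

# Hypothetical lexicon of valid clinical terms and known abbreviations.
LEXICON = {"hypertension", "diabetes", "creatinine"}
ABBREVIATIONS = {"htn": "hypertension", "dm": "diabetes"}

def resolve(token: str) -> str:
    """Expand known abbreviations, then fall back to fuzzy spell correction."""
    t = token.lower()
    if t in ABBREVIATIONS:
        return ABBREVIATIONS[t]
    if t in LEXICON:
        return t
    # Closest lexicon entry above the similarity cutoff, if any.
    match = difflib.get_close_matches(t, LEXICON, n=1, cutoff=0.8)
    return match[0] if match else token

assert resolve("HTN") == "hypertension"
assert resolve("creatinin") == "creatinine"  # misspelling corrected
```

Normalizing tokens this way before concept mapping is what allows a coder to assign a SNOMED CT concept to a note that would otherwise contain only out-of-vocabulary strings.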
Sentiment classification has adopted machine learning techniques to improve its precision and efficiency. However, the features are usually produced by basic bag-of-words methods with little consideration of words' syntactic properties, which can play an important role in judging sentiment. To remedy this, we first generate syntax trees of the sentences and analyze their syntactic features. We then add multiple sentiment features to the basic bag-of-words features. These features were trained on movie review data with machine learning methods (naive Bayes and support vector machines). The features and factors introduced by the syntax trees were examined to produce a more accurate solution for sentiment classification.
The automatic processing of non-English clinical documents is massively hampered by the lack of publicly available medical language resources for training, testing and evaluating NLP components. We suggest sharing statistical models derived from access-protected clinical documents as a reasonable substitute and provide solutions for sentence splitting, tokenization and POS tagging of German clinical texts. These three components were trained on the confidential FRAMED corpus, a non-sharable collection of various German-language clinical document types. The models derived from it outperform alternative components from OpenNLP and the Stanford POS tagger, also trained on FRAMED.