Udo Hahn

Friedrich Schiller University Jena | FSU · Jena University Language & Information Engineering (JULIE) Lab

Doctor of Philosophy


481
Publications
49,241
Reads
6,888
Citations
Citations since 2017
68 Research Items
1962 Citations


Publications (481)
Article
Phosphorylation-dependent signal transduction plays an important role in regulating the functions and fate of skeletal muscle cells. Central players in the phospho-signaling network are the protein kinases AKT, S6K, and RSK as part of the PI3K-AKT-mTOR-S6K and RAF-MEK-ERK-RSK pathways. However, despite their functional importance, knowledge about t...
Article
Background: Childhood asthma is a result of a complex interaction of genetic and environmental components causing epigenetic and immune dysregulation, airway inflammation and impaired lung function. Although different microarray based EWAS studies have been conducted, the impact of epigenetic regulation in asthma development is still widely unknow...
Article
Background: As the number of concomitantly used drugs increases, the prevalence of medication risks increases. These include, for example, drug interactions which may reduce or increase the desired and undesired effects of individual drugs. Objectives: The POLypharmacy, drug interActions and Risks (POLAR) project aims to contribute to the detection of...
Chapter
Full-text available
We describe the creation of GRASCCO, a novel German-language corpus composed of some 60 clinical documents with more than 43,000 tokens. GRASCCO is a synthetic corpus resulting from a series of alienation steps to obfuscate privacy-sensitive information contained in real clinical documents, the true origin of all GRASCCO texts. Therefore, it is pub...
Preprint
Motivation Knowledge about interactions between genes and proteins is vital for bio-molecular research. A large part of this knowledge is published in written text and not accessible in a structured way. To remedy this situation, several repositories of automatically extracted interaction facts were proposed over the years. However, existing soluti...
Preprint
We describe EmoBank, a corpus of 10k English sentences balancing multiple genres, which we annotated with dimensional emotion metadata in the Valence-Arousal-Dominance (VAD) representation format. EmoBank excels with a bi-perspectival and bi-representational design. On the one hand, we distinguish between writer's and reader's emotions, on the othe...
Chapter
Full-text available
We describe the adaptation of a non-clinical pseudonymization system, originally developed for a German email corpus, for clinical use. This tool replaces previously identified Protected Health Information (PHI) items as carriers of privacy-sensitive information (original names for people, organizations, places, etc.) with semantic type-conformant,...
Preprint
Full-text available
We present EmoCoder, a modular encoder-decoder architecture that generalizes emotion analysis over different tasks (sentence-level, word-level, label-to-label mapping), domains (natural languages and their registers), and label formats (e.g., polarity classes, basic emotions, and affective dimensions). Experiments on 14 datasets indicate that EmoCo...
Conference Paper
The lack of publicly accessible text corpora is a major obstacle for progress in natural language processing. For medical applications, unfortunately, all language communities other than English are low-resourced. In this work, we present GGPONC (German Guideline Program in Oncology NLP Corpus), a freely distributable German language corpus based o...
Article
Full-text available
Automated identification of advanced chronic kidney disease (CKD ≥ III) and of no known kidney disease (NKD) can support both clinicians and researchers. We hypothesized that identification of CKD and NKD can be improved, by combining information from different electronic health record (EHR) resources, comprising laboratory values, discharge summar...
Article
Aryl hydrocarbon receptor (AHR) activation by tryptophan (Trp) catabolites enhances tumor malignancy and suppresses anti-tumor immunity. The context specificity of AHR target genes has so far impeded systematic investigation of AHR activity and its upstream enzymes across human cancers. A pan-tissue AHR signature, derived by natural language proces...
Article
Full-text available
Objectives: We survey recent developments in medical Information Extraction (IE) as reported in the literature from the past three years. Our focus is on the fundamental methodological paradigm shift from standard Machine Learning (ML) techniques to Deep Neural Networks (DNNs). We describe applications of this new paradigm concentrating on two basi...
Preprint
Full-text available
The lack of publicly available text corpora is a major obstacle for progress in clinical natural language processing, for non-English speaking countries in particular. In this work, we present GGPONC (German Guideline Program in Oncology NLP Corpus), a freely distributable German language corpus based on clinical practice guidelines in the field of...
Article
We here describe the evolution of annotation guidelines for major clinical named entities, namely Diagnosis, Findings and Symptoms, on a corpus of approximately 1,000 German discharge letters. Due to their intrinsic opaqueness and complexity, clinical annotation tasks require continuous guideline tuning, beginning from the initial definition of cru...
Preprint
From 2017 to 2019 the Text REtrieval Conference (TREC) held a challenge task on precision medicine using documents from medical publications (PubMed) and clinical trials. Despite lots of performance measurements carried out in these evaluation campaigns, the scientific community is still pretty unsure about the impact individual system features and...
Article
Full-text available
The PI3K/Akt pathway promotes skeletal muscle growth and myogenic differentiation. Although its importance in skeletal muscle biology is well documented, many of its substrates remain to be identified. We here studied PI3K/Akt signaling in contracting skeletal muscle cells by quantitative phosphoproteomics. We identified the extended basophilic pho...
Preprint
Emotion lexicons describe the affective meaning of words and thus constitute a centerpiece for advanced sentiment and emotion analysis. Yet, manually curated lexicons are only available for a handful of languages, leaving most languages of the world without such a precious resource for downstream applications. Even worse, their coverage is often li...
Conference Paper
Full-text available
Genes and proteins are the fundamental entities of molecular genetics and deeper knowledge about their interactions constitutes a cornerstone for advancing precision medicine. We here introduce PROGENE (formerly called FSU-PRGE), a corpus that reflects our efforts to cope with this important class of named entities within the framework of a long-la...
Conference Paper
Full-text available
The 2019 Precision Medicine Track at TREC (TREC-PM) aimed at identifying relevant documents from two collections, namely PubMed (biomedical abstracts) and ClinicalTrials.gov (clinical trials), given 40 precision medicine topics representing (virtual) patients. The organizers also proposed a new subtask on treatment retrieval from PubMed. We describ...
Preprint
We examine the affective content of central bank press statements using emotion analysis. Our focus is on two major international players, the European Central Bank (ECB) and the US Federal Reserve Bank (Fed), covering a time span from 1998 through 2019. We reveal characteristic patterns in the emotional dimensions of valence, arousal, and dominanc...
Article
We devised annotation guidelines for the de-identification of German clinical documents and assembled a corpus of 1,106 discharge summaries and transfer letters with 44K annotated protected health information (PHI) items. After three iteration rounds, our annotation team finally reached an inter-annotator agreement of 0.96 on the instance level and...
Conference Paper
We here explore a new corpus construction workflow which exploits the inherent potential of the growing number of Digital Libraries worldwide and the ever-expanding Internet Archive. Rather than building corpora from scratch (which typically consumes a huge amount of resources), we search the Web for fragments of relevant digitized contents scatter...
Article
Full-text available
All cells and organisms exhibit stress-coping mechanisms to ensure survival. Cytoplasmic protein-RNA assemblies termed stress granules are increasingly recognized to promote cellular survival under stress. Thus, they might represent tumor vulnerabilities that are currently poorly explored. The translation-inhibitory eIF2α kinases are established as...
Article
Full-text available
We present the outcome of an annotation effort targeting the content-sensitive segmentation of German clinical reports into sections. We recruited an annotation team of up to eight medical students to annotate a clinical text corpus on a sentence-by-sentence basis in four pre-annotation iterations and one final main annotation step. The annotation...
Conference Paper
This paper introduces the Jena Document Information System (JeDIS). The focus lies on its capability to partition annotation graphs into modules. Annotation modules are defined in terms of types from the annotation schema. Modules allow easy manipulation of their annotations (deletion or update) and the creation of alternative annotations of indivi...
Preprint
The reliability of word embeddings algorithms, i.e., their ability to provide consistent computational judgments of word similarity when trained repeatedly on the same data set, has recently raised concerns. We compared the effect of probabilistic and weighting as downsampling strategies. We found the latter to provide superior reliability while be...
Conference Paper
Full-text available
We introduce JOCO, a novel text corpus for NLP analytics in the field of economics, business and management. This corpus is composed of corporate annual and social responsibility reports of the top 30 US, UK and German companies in the major (DJIA, FTSE 100, DAX), middle-sized (S&P 500, FTSE 250, MDAX) and technology (NASDAQ, FTSE AIM 100, TECDAX)...
Preprint
Full-text available
We here introduce a substantially extended version of JeSemE, a website for visually exploring computationally derived time-variant information on word meaning and lexical emotion assembled from five large diachronic text corpora. JeSemE is intended as an interactive tool for scholars in the (digital) humanities who are mostly limited to consulting...
Preprint
Full-text available
In the past years, sentiment analysis has increasingly shifted attention to representational frameworks more expressive than semantic polarity (being positive, negative or neutral). However, these richer formats (like Basic Emotions or Valence-Arousal-Dominance, and variants therefrom), rooted in psychological research, tend to proliferate the numb...
Article
Full-text available
Introduction: This article is part of the Focus Theme of Methods of Information in Medicine on the German Medical Informatics Initiative. "Smart Medical Information Technology for Healthcare (SMITH)" is one of four consortia funded by the German Medical Informatics Initiative (MI-I) to create an alliance of universities, university hospitals, rese...
Preprint
Full-text available
Emotion Representation Mapping (ERM) has the goal to convert existing emotion ratings from one representation format into another one, e.g., mapping Valence-Arousal-Dominance annotations for words or sentences into Ekman's Basic Emotions and vice versa. ERM can thus not only be considered as an alternative to Word Emotion Induction (WEI) techniques...
Preprint
Full-text available
The emotional connotation attached to words undergoes language change. In this paper, we aim at estimating the emotion which is associated with a given word in former language stages of English and German. Emotion is represented following the popular Valence-Arousal-Dominance (VAD) annotation scheme. While being more expressive than polarity alone,...
Article
Full-text available
The legal culture in the European Union imposes almost unsurmountable hurdles to exploit copyright protected language data (in terms of intellectual property rights (IPRs) of media contents) and privacy protected medical health data (in terms of the notion of informational self-determination) as language resources for the NLP community. These jurid...
Article
Full-text available
We introduce 3000PA, a clinical document corpus composed of 3,000 EPRs from three different clinical sites, which will serve as the backbone of a national reference language resource for German clinical NLP. We outline its design principles, results from a medication annotation campaign and the evaluation of a first medication information extractio...
Conference Paper
Full-text available
In the past years, sentiment analysis has increasingly shifted attention to representational frameworks more expressive than semantic polarity (being positive, negative or neutral). However, these richer formats (like Basic Emotions or Valence-Arousal-Dominance, and variants therefrom), rooted in psychological research, tend to proliferate the numb...
Conference Paper
Full-text available
Predicting the emotional value of lexical items is a well-known problem in sentiment analysis. While research has focused on polarity for quite a long time, meanwhile this early focus has been shifted to more expressive emotion representation models (such as Basic Emotions or Valence-Arousal-Dominance). This change resulted in a proliferation of he...
Conference Paper
Full-text available
Emotion Representation Mapping (ERM) has the goal to convert existing emotion ratings from one representation format into another one, e.g., mapping Valence-Arousal-Dominance annotations for words or sentences into Ekman's Basic Emotions and vice versa. ERM can thus not only be considered as an alternative to Word Emotion Induction (WEI) techniques...
Conference Paper
The CRC AquaDiva is a large collaborative project spanning a variety of domains, such as biology, geology, chemistry, and computer science with the common goal to better understand the Earth's critical zone, in particular how environmental conditions and surface properties shape the structure, properties, and functions of the subsurface. This necess...
Conference Paper
We introduce ADOnIS, an information system which coherently integrates two important, yet mostly disparate data sources, namely structured, tabular data, and unstructured data in terms of publications. The integration is achieved by providing the underlying background knowledge of the domains involved in terms of adequately tailored ontologies. Onc...
Conference Paper
Full-text available
While research on emotions has become one of the most productive areas at the intersection of cognitive science, artificial intelligence and natural language processing, the diversity and incommensurability of emotion models seriously hampers progress in the field. We here propose kNN regression as a simple, yet effective method for computationally...
Conference Paper
We present Joyce, a scalable tool for identifying and assembling relevant (pieces of) ontologies from a repository of source ontologies, thus enabling the effective and efficient reuse of formalized domain knowledge. Joyce includes a conceptual filter to identify relevant classes, minimizes unintended redundancies, i.e. concept duplicates, and excl...
Conference Paper
Full-text available
We here examine how different perspectives of understanding written discourse, like the reader's, the writer's or the text's point of view, affect the quality of emotion annotations. We conducted a series of annotation experiments on two corpora, a popular movie review corpus and a genre-and domain-balanced corpus of standard English. We found stat...
Conference Paper
Full-text available
We describe EmoBank, a corpus of 10k English sentences balancing multiple genres, which we annotated with dimensional emotion metadata in the Valence-Arousal-Dominance (VAD) representation format. EmoBank excels with a bi-perspectival and bi-representational design. On the one hand, we distinguish between writer's and reader's emotions, on the othe...
Article
With the increasing availability of complete full texts (journal articles), rather than their surrogates (titles, abstracts), as resources for text analytics, entirely new opportunities arise for information extraction and text mining from scholarly publications. Yet, we gathered evidence that a range of problems are encountered for full-text proce...
Conference Paper
Full-text available
We describe a novel method for measuring affective language in historical texts by expanding an affective lexicon and jointly adapting it to prior language stages. We automatically construct a lexicon for word-emotion association of 18th and 19th century German which is then validated against expert ratings. Subsequently, this resource is used to i...
Poster
Full-text available
Assuming that organizations can be granted a status as actors, resting on an organizational identity, we ask whether organizations also “have” emotions (as part of their identity and as a human-like trait) or at least can be attributed emotions in the sense of distinct and temporally stable emotional profiles.
Data
Supplementary Figures 1-39, Supplementary Tables 1-7, Supplementary References.
Data
List of differentially regulated phosphopeptides. Phosphopeptides showing fold ratio larger than 2 or 1.5 fold changes (FC) and p-values lower than 0.05 between time points 5 minutes, 10 minutes and 15 minutes after amino acid readdition compared to the starting time (0 minutes).
Data
SBML model including only the canonical amino acid input on mTORC1.
Data
SBML model including four amino acids input in the network (simple p70-S6K module).
Data
List of differentially regulated phosphosites. Phosphosites showing fold ratio larger than 2 or 1.5 fold changes (FC) and p-values lower than 0.05 between time points 5 minutes, 10 minutes and 15 minutes after amino acid readdition compared to the starting time (0 minutes).
Data
Text mining input and results for the detection of molecular event partners of AMPK reported in scientific texts (Medline and PubMed Central). Genes and proteins were mapped to their respective UniProt ID to avoid ambiguity. The event partners as well as the textual contexts of the events themselves are listed.
Data
Phosphoproteomic identification data. Contains excerpts from the output files "proteinGroups" including information on protein group identification and quantification, "peptides" including information about peptide identification and quantification and "PhosphoSTY" containing information about phosphopeptide identification and quantification as wel...
Data
SBML model similar to Supplementary Model 2, but including a more complex p70-S6K module.
Article
Full-text available
Amino acids (aa) are not only building blocks for proteins, but also signalling molecules, with the mammalian target of rapamycin complex 1 (mTORC1) acting as a key mediator. However, little is known about whether aa, independently of mTORC1, activate other kinases of the mTOR signalling network. To delineate aa-stimulated mTOR network dynamics, we...
Conference Paper
Full-text available
Emotion analysis (EA) and sentiment analysis are closely related tasks differing in the psychological phenomenon they aim to catch. We address fine-grained models for EA which treat the computation of the emotional status of narrative documents as a regression rather than a classification problem, as performed by coarse-grained approaches. We intro...
Conference Paper
Full-text available
We introduce JCoRe 2.0, the relaunch of a UIMA-based open software repository for full-scale natural language processing originating from the Jena University Language & Information Engineering (JULIE) Lab. In an attempt to put the new release of JCoRe on firm software engineering ground, we uploaded it to GitHub, a social coding platform, with an u...