
Udo Hahn
Friedrich Schiller University Jena | FSU · Jena University Language & Information Engineering (JULIE) Lab
Doctor of Philosophy
About
Publications: 481
Reads: 49,241
Citations: 6,888
Publications (481)
Phosphorylation-dependent signal transduction plays an important role in regulating the functions and fate of skeletal muscle cells. Central players in the phospho-signaling network are the protein kinases AKT, S6K, and RSK as part of the PI3K-AKT-mTOR-S6K and RAF-MEK-ERK-RSK pathways. However, despite their functional importance, knowledge about t...
Background:
Childhood asthma is a result of a complex interaction of genetic and environmental components causing epigenetic and immune dysregulation, airway inflammation and impaired lung function. Although different microarray-based EWAS studies have been conducted, the impact of epigenetic regulation in asthma development is still widely unknow...
Background
As the number of concomitantly used drugs increases, the prevalence of medication risks increases. These include, for example, drug interactions which may reduce or increase the desired and undesired effects of individual drugs.
Objectives
The POLypharmacy, drug interActions and Risks (POLAR) project aims to contribute to the detection of...
We describe the creation of GRASCCO, a novel German-language corpus composed of some 60 clinical documents with more than 43,000 tokens. GRASCCO is a synthetic corpus resulting from a series of alienation steps to obfuscate privacy-sensitive information contained in real clinical documents, the true origin of all GRASCCO texts. Therefore, it is pub...
Motivation
Knowledge about interactions between genes and proteins is vital for bio-molecular research. A large part of this knowledge is published in written text and not accessible in a structured way. To remedy this situation, several repositories of automatically extracted interaction facts were proposed over the years. However, existing soluti...
We describe EmoBank, a corpus of 10k English sentences balancing multiple genres, which we annotated with dimensional emotion metadata in the Valence-Arousal-Dominance (VAD) representation format. EmoBank excels with a bi-perspectival and bi-representational design. On the one hand, we distinguish between writer's and reader's emotions, on the othe...
We describe the adaptation of a non-clinical pseudonymization system, originally developed for a German email corpus, for clinical use. This tool replaces previously identified Protected Health Information (PHI) items as carriers of privacy-sensitive information (original names for people, organizations, places, etc.) with semantic type-conformant,...
We present EmoCoder, a modular encoder-decoder architecture that generalizes emotion analysis over different tasks (sentence-level, word-level, label-to-label mapping), domains (natural languages and their registers), and label formats (e.g., polarity classes, basic emotions, and affective dimensions). Experiments on 14 datasets indicate that EmoCo...
The lack of publicly accessible text corpora is a major obstacle for progress in natural language processing. For medical applications, unfortunately, all language communities other than English are low-resourced. In this work, we present GGPONC (German Guideline Program in Oncology NLP Corpus), a freely distributable German language corpus based o...
Automated identification of advanced chronic kidney disease (CKD ≥ III) and of no known kidney disease (NKD) can support both clinicians and researchers. We hypothesized that identification of CKD and NKD can be improved by combining information from different electronic health record (EHR) resources, comprising laboratory values, discharge summar...
Aryl hydrocarbon receptor (AHR) activation by tryptophan (Trp) catabolites enhances tumor malignancy and suppresses anti-tumor immunity. The context specificity of AHR target genes has so far impeded systematic investigation of AHR activity and its upstream enzymes across human cancers. A pan-tissue AHR signature, derived by natural language proces...
Objectives: We survey recent developments in medical Information Extraction (IE) as reported in the literature from the past three years. Our focus is on the fundamental methodological paradigm shift from standard Machine Learning (ML) techniques to Deep Neural Networks (DNNs). We describe applications of this new paradigm concentrating on two basi...
The lack of publicly available text corpora is a major obstacle for progress in clinical natural language processing, for non-English speaking countries in particular. In this work, we present GGPONC (German Guideline Program in Oncology NLP Corpus), a freely distributable German language corpus based on clinical practice guidelines in the field of...
We here describe the evolution of annotation guidelines for major clinical named entities, namely Diagnosis, Findings and Symptoms, on a corpus of approximately 1,000 German discharge letters. Due to their intrinsic opaqueness and complexity, clinical annotation tasks require continuous guideline tuning, beginning from the initial definition of cru...
From 2017 to 2019 the Text REtrieval Conference (TREC) held a challenge task on precision medicine using documents from medical publications (PubMed) and clinical trials. Despite the many performance measurements carried out in these evaluation campaigns, the scientific community is still quite unsure about the impact individual system features and...
The PI3K/Akt pathway promotes skeletal muscle growth and myogenic differentiation. Although its importance in skeletal muscle biology is well documented, many of its substrates remain to be identified. We here studied PI3K/Akt signaling in contracting skeletal muscle cells by quantitative phosphoproteomics. We identified the extended basophilic pho...
Emotion lexicons describe the affective meaning of words and thus constitute a centerpiece for advanced sentiment and emotion analysis. Yet, manually curated lexicons are only available for a handful of languages, leaving most languages of the world without such a precious resource for downstream applications. Even worse, their coverage is often li...
Genes and proteins are the fundamental entities of molecular genetics and deeper knowledge about their interactions constitutes a cornerstone for advancing precision medicine. We here introduce PROGENE (formerly called FSU-PRGE), a corpus that reflects our efforts to cope with this important class of named entities within the framework of a long-la...
The 2019 Precision Medicine Track at TREC (TREC-PM) aimed at identifying relevant documents from two collections, namely PubMed (biomedical abstracts) and ClinicalTrials.gov (clinical trials), given 40 precision medicine topics representing (virtual) patients. The organizers also proposed a new subtask on treatment retrieval from PubMed. We describ...
We examine the affective content of central bank press statements using emotion analysis. Our focus is on two major international players, the European Central Bank (ECB) and the US Federal Reserve Bank (Fed), covering a time span from 1998 through 2019. We reveal characteristic patterns in the emotional dimensions of valence, arousal, and dominanc...
We devised annotation guidelines for the de-identification of German clinical documents and assembled a corpus of 1,106 discharge summaries and transfer letters with 44K annotated protected health information (PHI) items. After three iteration rounds, our annotation team finally reached an inter-annotator agreement of 0.96 on the instance level and...
We here explore a new corpus construction workflow which exploits the inherent potential of the growing number of Digital Libraries worldwide and the ever-expanding Internet Archive. Rather than building corpora from scratch (which typically consumes a huge amount of resources), we search the Web for fragments of relevant digitized contents scatter...
All cells and organisms exhibit stress-coping mechanisms to ensure survival. Cytoplasmic protein-RNA assemblies termed stress granules are increasingly recognized to promote cellular survival under stress. Thus, they might represent tumor vulnerabilities that are currently poorly explored. The translation-inhibitory eIF2α kinases are established as...
We present the outcome of an annotation effort targeting the content-sensitive segmentation of German clinical reports into sections. We recruited an annotation team of up to eight medical students to annotate a clinical text corpus on a sentence-by-sentence basis in four pre-annotation iterations and one final main annotation step. The annotation...
This paper introduces the Jena Document Information System (JeDIS). The focus lies on its capability to partition annotation graphs into modules. Annotation modules are defined in terms of types from the annotation schema. Modules allow easy manipulation of their annotations (deletion or update) and the creation of alternative annotations of indivi...
The reliability of word embedding algorithms, i.e., their ability to provide consistent computational judgments of word similarity when trained repeatedly on the same data set, has recently raised concerns. We compared the effect of probabilistic and weighting-based downsampling strategies. We found the latter to provide superior reliability while be...
We introduce JOCO, a novel text corpus for NLP analytics in the field of economics, business and management. This corpus is composed of corporate annual and social responsibility reports of the top 30 US, UK and German companies in the major (DJIA, FTSE 100, DAX), middle-sized (S&P 500, FTSE 250, MDAX) and technology (NASDAQ, FTSE AIM 100, TECDAX)...
We here introduce a substantially extended version of JeSemE, a website for visually exploring computationally derived time-variant information on word meaning and lexical emotion assembled from five large diachronic text corpora. JeSemE is intended as an interactive tool for scholars in the (digital) humanities who are mostly limited to consulting...
In the past years, sentiment analysis has increasingly shifted attention to representational frameworks more expressive than semantic polarity (being positive, negative or neutral). However, these richer formats (like Basic Emotions or Valence-Arousal-Dominance, and variants thereof), rooted in psychological research, tend to proliferate the numb...
Introduction:
This article is part of the Focus Theme of Methods of Information in Medicine on the German Medical Informatics Initiative. "Smart Medical Information Technology for Healthcare (SMITH)" is one of four consortia funded by the German Medical Informatics Initiative (MI-I) to create an alliance of universities, university hospitals, rese...
Emotion Representation Mapping (ERM) aims to convert existing emotion ratings from one representation format into another one, e.g., mapping Valence-Arousal-Dominance annotations for words or sentences into Ekman's Basic Emotions and vice versa. ERM can thus not only be considered as an alternative to Word Emotion Induction (WEI) techniques...
The emotional connotation attached to words undergoes language change. In this paper, we aim at estimating the emotion which is associated with a given word in former language stages of English and German. Emotion is represented following the popular Valence-Arousal-Dominance (VAD) annotation scheme. While being more expressive than polarity alone,...
The legal culture in the European Union imposes almost insurmountable hurdles to exploit copyright protected language data (in terms of intellectual property rights (IPRs) of media contents) and privacy protected medical health data (in terms of the notion of informational self-determination) as language resources for the NLP community. These jurid...
We introduce 3000PA, a clinical document corpus composed of 3,000 EPRs from three different clinical sites, which will serve as the backbone of a national reference language resource for German clinical NLP. We outline its design principles, results from a medication annotation campaign and the evaluation of a first medication information extractio...
Predicting the emotional value of lexical items is a well-known problem in sentiment analysis. While research has focused on polarity for quite a long time, this early focus has meanwhile shifted to more expressive emotion representation models (such as Basic Emotions or Valence-Arousal-Dominance). This change resulted in a proliferation of he...
The CRC AquaDiva is a large collaborative project spanning a variety of domains, such as biology, geology, chemistry, and computer science, with the common goal to better understand the Earth's critical zone, in particular how environmental conditions and surface properties shape the structure, properties, and functions of the subsurface. This necess...
We introduce ADOnIS, an information system which coherently integrates two important, yet mostly disparate data sources, namely structured, tabular data, and unstructured data in terms of publications. The integration is achieved by providing the underlying background knowledge of the domains involved in terms of adequately tailored ontologies. Onc...
While research on emotions has become one of the most productive areas at the intersection of cognitive science, artificial intelligence and natural language processing, the diversity and incommensurability of emotion models seriously hampers progress in the field. We here propose kNN regression as a simple, yet effective method for computationally...
We present Joyce, a scalable tool for identifying and assembling relevant (pieces of) ontologies from a repository of source ontologies, thus enabling the effective and efficient reuse of formalized domain knowledge. Joyce includes a conceptual filter to identify relevant classes, minimizes unintended redundancies, i.e. concept duplicates, and excl...
We here examine how different perspectives of understanding written discourse, like the reader's, the writer's or the text's point of view, affect the quality of emotion annotations. We conducted a series of annotation experiments on two corpora, a popular movie review corpus and a genre- and domain-balanced corpus of standard English. We found stat...
With the increasing availability of complete full texts (journal articles), rather than their surrogates (titles, abstracts), as resources for text analytics, entirely new opportunities arise for information extraction and text mining from scholarly publications. Yet, we gathered evidence that a range of problems are encountered for full-text proce...
We describe a novel method for measuring affective language in historical texts by expanding an affective lexicon and jointly adapting it to prior language stages. We automatically construct a lexicon for word-emotion association of 18th and 19th century German which is then validated against expert ratings. Subsequently, this resource is used to i...
Assuming that organizations can be granted a status as actors, resting on an organizational identity, we ask whether organizations also “have” emotions (as part of their identity and as a human-like trait) or at least can be attributed emotions in the sense of distinct and temporally stable emotional profiles.
Supplementary Figures 1-39, Supplementary Tables 1-7, Supplementary References.
List of differentially regulated phosphopeptides. Phosphopeptides showing fold changes (FC) larger than 2 or 1.5 and p-values lower than 0.05 at time points 5, 10 and 15 minutes after amino acid re-addition compared to the starting time (0 minutes).
SBML model including only the canonical amino acid input on mTORC1.
SBML model including four amino acids input in the network (simple p70-S6K module).
List of differentially regulated phosphosites. Phosphosites showing fold changes (FC) larger than 2 or 1.5 and p-values lower than 0.05 at time points 5, 10 and 15 minutes after amino acid re-addition compared to the starting time (0 minutes).
Text mining input and results for the detection of molecular event partners of AMPK reported in scientific texts (Medline and PubMed Central). Genes and proteins were mapped to their respective UniProt ID to avoid ambiguity. The event partners as well as the textual contexts of the events themselves are listed.
Phosphoproteomic identification data. Contains excerpts from the output files "proteinGroups" including information on protein group identification and quantification, "peptides" including information about peptide identification and quantification and "PhosphoSTY" containing information about phosphopeptide identification and quantification as wel...
SBML model similar to Supplementary Model 2, but including a more complex p70-S6K module.
Amino acids (aa) are not only building blocks for proteins, but also signalling molecules, with the mammalian target of rapamycin complex 1 (mTORC1) acting as a key mediator. However, little is known about whether aa, independently of mTORC1, activate other kinases of the mTOR signalling network. To delineate aa-stimulated mTOR network dynamics, we...
Emotion analysis (EA) and sentiment analysis are closely related tasks differing in the psychological phenomenon they aim to capture. We address fine-grained models for EA which treat the computation of the emotional status of narrative documents as a regression rather than a classification problem, as performed by coarse-grained approaches. We intro...
We introduce JCoRe 2.0, the relaunch of a UIMA-based open software repository for full-scale natural language processing originating from the Jena University Language & Information Engineering (JULIE) Lab. In an attempt to put the new release of JCoRe on firm software engineering ground, we uploaded it to GitHub, a social coding platform, with an u...