
Alan Akbik
Humboldt-Universität zu Berlin | HU Berlin · Department of Computer Science
Doctor of Engineering
Publications: 25
Reads: 5,192
Citations: 1,803
Publications (25)
Current state-of-the-art approaches to text classification typically leverage BERT-style Transformer models with a softmax classifier, jointly fine-tuned to predict class labels of a target task. In this paper, we instead propose an alternative training objective in which we learn task-specific embeddings of text: our proposed objective learns embe...
Medical coding (MC) is an essential pre-requisite for reliable data retrieval and reporting. Given a free-text reported term (RT) such as "pain of right thigh to the knee", the task is to identify the matching lowest-level term (LLT) - in this case "unilateral leg pain" - from a very large and continuously growing repository of standardized medical...
Named entity recognition (NER) is an important step in biomedical information extraction pipelines. Tools for NER should be easy to use, cover multiple entity types, be highly accurate, and be robust towards variations in text genre and style. We present HunFlair, an NER tagger fulfilling these requirements. HunFlair is integrated into the widely-u...
Current state-of-the-art approaches for named entity recognition (NER) using BERT-style transformers typically use one of two different approaches: (1) The first fine-tunes the transformer itself on the NER task and adds only a simple linear layer for word-level predictions. (2) The second uses the transformer only to provide features to a standard...
Recent advances in language modeling using recurrent neural networks have made it viable to model language as distributions over characters. By learning to predict the next character on the basis of previous characters, such models have been shown to automatically internalize linguistic concepts such as words, sentences, subclauses and even sentime...
Neural language models (LMs) are typically trained using only lexical features, such as surface forms of words. In this paper, we argue this deprives the LM of crucial syntactic signals that can be detected at high confidence using existing parsers. We present a simple but highly effective approach for training neural LMs using both lexical and syn...
Digitally collected data suffers from many data quality issues, such as duplicate, incorrect, or incomplete data. A common approach for counteracting these issues is to formulate a set of data cleaning rules to identify and repair incorrect, duplicate and missing data. Data cleaning systems must be able to treat data quality rules holistically, to...
The task of Relation Extraction (RE) is concerned with creating extractors that automatically find structured, relational information in unstructured data such as natural language text. Motivated by an explosion of sources of readily available text data such as the Web, RE offers intriguing possibilities for querying, organizing, and analyzing info...
Semantic role labeling (SRL) is crucial to natural language understanding as it identifies the predicate-argument structure in text with semantic labels. Unfortunately, resources required to construct SRL models are expensive to obtain and simply do not exist for most languages. In this paper, we present a two-stage method to enable the constructio...
We present SCHNÄPPER, a web toolkit for Exploratory Relation Extraction (ERE). The tool allows users to identify relations of interest in a very large text corpus in an exploratory and highly interactive fashion. With this tool, we demonstrate the ease-of-use and intuitive nature of ERE, as well as its applicability to large corpora. We show how us...
In this paper, we propose to build a repository of events and event references from clusters of news articles. We present an automated approach that is based on the hypothesis that if two sentences are a) found in the same cluster of news articles and b) contain temporal expressions that reference the same point in time, they are likely to refer to t...
Current techniques for Open Information Extraction (OIE) focus on the extraction of binary facts and suffer significant quality loss for the task of extracting higher order N-ary facts. This quality loss may not only affect the correctness, but also the completeness of an extracted fact. We present KrakeN, an OIE system specifically designed to cap...
Unsupervised Relation Extraction (URE) is the task of extracting relations of a priori unknown semantic types using clustering methods on a vector space model of entity pairs and patterns. In this paper, we show that an informed feature generation technique based on dependency trees significantly improves clustering quality, as measured by the F-sc...
A great share of applications in modern information technology can benefit from large-coverage, machine-accessible knowledge bases. However, the bigger part of today's knowledge is provided in the form of unstructured data, mostly plain text. As an initial step to exploit such data, we present Wanderlust, an algorithm that automatically extracts s...