Ulrich Heid's research while affiliated with Universität Hildesheim and other places

Publications (149)

Conference Paper
Legal documents often have a complex layout with many different headings, headers and footers, side notes, etc. For further processing, it is important to extract these individual components correctly from a legally binding document, for example a signed PDF. A common approach is to classify each (text) region of a page using its geome...
Chapter
Online reviews of artistic artefacts can initiate educational processes. Both the productive engagement with a work and the presentation of that experience in a review text for a specific audience hold great potential for cultural participation and for overcoming educational barriers....
Chapter
In order to ensure validity in legal texts like contracts and case law, lawyers rely on standardised formulations that are written carefully but also represent a kind of code with a meaning and function known to all legal experts. Using directed (acyclic) graphs to represent standardized text fragments, we are able to capture variations concerning...
Conference Paper
We design a bilingual electronic dictionary for the mathematical domain of graph theory. The target group of the dictionary is students in the field, and the dictionary should support them in both cognitive and communicative situations. Therefore, it will provide not only equivalents but also an ontology of the terminology. The dictionary is based...
Conference Paper
We present a comparative evaluation study for splitting German compounds which belong to general language or to a specific domain. For the domain, we focus on DIY ("do-it-yourself"). The study consists of two parts: First, we evaluate three tools for compound splitting in German, one based on lexicons and corpus frequencies and two based on languag...
Conference Paper
For the analysis of contract texts, validated model texts, such as model clauses, can be used to identify the clauses used in a contract. This paper investigates how the similarity between titles of model clauses and headings extracted from contracts can be computed, and which similarity measure is most suitable for this. For the calculation of the similari...
Article
This paper describes the resources and software procedures used or developed in a major enabling step towards the revision of the scholarly reference work A Dictionary of South African English on Historical Principles (DSAE, Silva et al. 1996), namely the semi-automatic generation of a digitally-sourced lexical database on which new and updated dic...
Article
The aim of this article is to give an overview of the nature of current lexicographic user guidance devices and to suggest a possible classification or taxonomy to serve as a guideline for future compilations of such tools to assist users in communicative and cognitive situations, especially in text production, text reception and computer assisted...
Conference Paper
This paper presents preliminary considerations regarding objectives and workflow of LexBib, a project which is currently being developed at the University of Hildesheim. We briefly describe the state of the art in electronic bibliographies in general, and bibliographies of lexicography and dictionary research in particular. The LexBib project is in...
Article
This article introduces a prototype of a writing (and learning) assistant for verbal relative clauses of the African language Sepedi, accessible from within a dictionary or from a word processor. It is an example of how a user support tool for complicated grammatical structures in a scarcely resourced language can be compiled. We describe a dynamic...
Chapter
In recent years, the number of domain-specific modelling techniques increased. Method engineering already provides text-based and semantic approaches which aim to unify constructs and allocate terminologies. As existing procedures are usually carried out manually, challenges arise such as reproducibility and standardization as well as ensuring qual...
Article
An interactive, dynamic electronic dictionary aimed at text production should guide the user in innovative ways, especially in respect of difficult, complicated or confusing issues. This paper proposes a design for bilingual dictionaries intended to guide users in text production; we focus on complex phenomena of the interaction between lexis and g...
Article
This paper describes the TTC Web platform, an online demonstrator of the whole pipeline for compiling bilingual terminologies from comparable corpora gathered from the web, using the tools developed in the TTC project (Terminology Extraction, Translation Tools and Comparable Corpora). We present the whole chain which has been integrated into the p...
Conference Paper
Testing a theory against real world data can sometimes be helpful in figuring out the shortcomings of your current theory. In this paper, we test a theory about the syntax-semantics interface of German nach-particle verbs against data from a web corpus in order to see if we can use our automatic NLP machinery to corroborate the predictions of the t...
Article
Electronic dictionaries should support dictionary users by giving them guidance in text production and text reception, alongside a user-definable offer of lexicographic data for cognitive purposes. In this article, we sketch the principles of an interactive and dynamic electronic dictionary aimed at text production and text reception guiding users...
Article
The aim of this article is to discuss the design of a new English to Setswana dictionary for two narrowly defined target user groups of Setswana learners, namely Upper Primary (10 to 12 years old) and Junior Secondary (13 to 15 years old). The dictionary is intended to be a guide to text and speech production in the foreign language L2 (English) and...
Article
In this article, we describe an element of a suite of computational tools for assigning word-class tags (as a preparation for part of speech (POS) tagging) to word forms in unrestricted Northern Sotho texts. POS-tagging is a step towards a linguistic analysis of the texts, which in turn allows for advanced data extraction. The tool component that i...
Conference Paper
In this paper, we discuss practical and methodological issues of the creation of reference term lists (RTLs) for the evaluation of monolingual and bilingual term candidate extraction from comparable corpora in the domains of wind energy and mobile technology. These reference term lists are intended to serve as a "gold standard" for the qualitativ...
Conference Paper
The TTC project (Terminology Extraction, Translation Tools and Comparable Corpora) has contributed to leveraging computer-assisted translation tools, machine translation systems and multilingual content (corpora and terminology) management tools by generating bilingual terminologies automatically from comparable corpora in seven EU languages, as we...
Article
In this paper, we deal with bilingual terminology extraction from comparable corpora. The extraction can be seen as a pipeline of processing steps. We will discuss grouping of term variants and describe two methods for bilingual term alignment of neoclassical terms: a knowledge-poor approach using string similarity measures and a linguistically m...
Conference Paper
This paper presents usage scenarios of the platform being developed within the TTC project (Terminology Extraction, Translation Tools and Comparable Corpora) along with the first feedback from potential users. The TTC project aims at leveraging translation tools, computer-assisted translation tools, and terminology management tools by automatically...
Article
In this paper, we have given an overview of computational linguistic tools available to us, which can be used to produce raw material for the lexicographic description of a specialised language. The underlying idea of our method is the following: what is significantly more frequent in a domain-specific text than in a general language reference text...
Article
Between classical symbolic word sense disambiguation (WSD) using explicit deep semantic representations of sentences and texts and statistical WSD using word co-occurrence information, there is a recent tendency towards mediating methods. Similar to so-called lightweight semantics (Marek, 2009), we suggest making only sparse use of semantic informa...
Conference Paper
This paper presents the work on terminology extraction from comparable corpora for Latvian. In the first section we introduce our work; the second section briefly describes the concept of the project and the implemented general terminology processing chain; the following two sections focus on terminology extraction workflow for Latvian and evaluati...
Conference Paper
The Internet is an ever growing source of information stored in documents of different languages. Hence, cross-lingual resources are needed for more and more NLP applications. This paper presents (i) a graph-based method for creating one such resource and (ii) a resource created using the method, a cross-lingual relatedness thesaurus. Given a word...
Conference Paper
The need for linguistic resources in any natural language application is undeniable. Lexicons and terminologies indeed play a central role in any machine translation tool, regardless of the theoretical foundations upon which the tool is based (e.g. statistical machine translation or rule-based machine translation). The EU projec...
Conference Paper
Most of the research on the extraction of idiomatic multiword expressions (MWEs) has focused on the acquisition of MWE types. In the present work we investigate whether a text instance of a potentially idiomatic MWE is actually used idiomatically in a given context or not. Inspired by the dataset provided by Cook et al. (2008), we manually analysed 9,...
Conference Paper
We present a new method, based on graph theory, for bilingual lexicon extraction without relying on resources with limited availability like parallel corpora. The graphs we use represent linguistic relations between words such as adjectival modification. We experiment with a number of ways of combining different linguistic relations and present a n...
Conference Paper
Ambiguous function words are a major obstacle to part-of-speech (POS) tagging of Northern Sotho (Bantu, S 32). Many are highly polysemous and very frequent in texts, and their local context is not always distinctive. With certain taggers, this issue leads to comparatively poor results (between 88 and 92% accuracy), especially when sizeable tagset...
Article
The aim of this article is to describe the design and implementation of a verb guesser that will enhance the results of statistical part of speech (POS) tagging of verbs in Northern Sotho. It will be illustrated that verb stems in Northern Sotho can successfully be recognised by examining their suffixes and combinations of suffixes. Two approaches...
Article
On the development of a tagset for Northern Sotho with special reference to the issue of standardisation. Working with corpora in the South African Bantu languages has up till now been limited to the utilisation of raw corpora. Such corpora, however, have limited functionality. Thus the next logical step in any NLP application is the development o...
Article
We present a general approach to formally modelling corpora with multi-layered annotation in a typed logical representation language, OWL DL. By defining abstractions over the corpus data, we can generalise from a large set of individual corpus annotations, thereby inducing a lexicon model. The resulting combined corpus and lexicon model can be int...
Conference Paper
We present the main findings and preliminary results of an ongoing project aimed at developing a system for collocation extraction based on contextual morpho-syntactic properties. We explored two hybrid extraction methods: the first method applies language-independent statistical techniques followed by linguistic filtering, while the second appro...
Conference Paper
Word sketches are part of the Sketch Engine corpus query system. They represent automatic, corpus-derived summaries of the words' grammatical and collocational behaviour. Besides the corpus itself, word sketches require a sketch grammar, a regular expression-based shallow grammar over the part-of-speech tags, to extract evidence for the properties...
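
The idea of a regular expression-based shallow grammar over part-of-speech tags, as mentioned in the last abstract, can be illustrated with a minimal sketch. It does not use the Sketch Engine's actual sketch grammar format; the `word/TAG` encoding, the tag names, the single adjective-noun rule, and the function names are all assumptions made for illustration only.

```python
import re
from collections import Counter

# Toy POS-tagged sentence as (word, tag) pairs. The tagset is hypothetical.
TAGGED = [
    ("the", "DET"), ("strong", "ADJ"), ("tea", "NOUN"),
    ("and", "CONJ"), ("strong", "ADJ"), ("coffee", "NOUN"),
    ("cooled", "VERB"),
]

def encode(tokens):
    """Serialise tagged tokens as 'word/TAG word/TAG ...' so a regex can scan them."""
    return " ".join(f"{word}/{tag}" for word, tag in tokens)

# Hypothetical rule: an adjective immediately followed by a noun
# counts as evidence for a "modifier" relation.
ADJ_NOUN = re.compile(r"(\S+)/ADJ (\S+)/NOUN")

def modifier_pairs(tokens):
    """Count (adjective, noun) pairs matched by the shallow rule."""
    return Counter(ADJ_NOUN.findall(encode(tokens)))

if __name__ == "__main__":
    for (adjective, noun), count in modifier_pairs(TAGGED).items():
        print(adjective, noun, count)
```

Aggregating such counts per headword over a whole corpus would give the raw material for a collocation summary; a real grammar would need many rules and a way to handle intervening words.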