Ulrich Heid’s research while affiliated with University of Hildesheim and other places


Publications (154)


EMLex 2024
  • Article
  • Full-text available

November 2024 · 2 Reads

Lexicographica - International Annual for Lexicography / Internationales Jahrbuch für Lexikographie

Ulrich Heid

Learning from students. On the design and usability of an e-dictionary of mathematical graph theory

September 2022 · 14 Reads

We created a prototype of an electronic dictionary for the mathematical domain of graph theory. We evaluate our prototype and compare its effectiveness in task-based tests with that of Wikipedia. Our dictionary is based on a corpus; the terms and their definitions were automatically extracted and annotated by experts (cf. Kruse/Heid 2020). The dictionary is bilingual, covering German and English; it gives equivalents, definitions and semantically related terms. For the implementation of the dictionary, we used LexO (Bellandi et al. 2017). The dictionary's target group is students of mathematics who attend lectures in German and work with English resources. We carried out tests to understand which items the students search for when they work on graph-theoretical tasks. We ran the same test twice, with comparable student groups, allowing either Wikipedia or our dictionary as an information source. The dictionary seems to be especially helpful for students who already have a vague idea of a term, because they can use the resource to check whether their idea is right.


Figure 1. Example of a page from a contract with marking of text regions and their classes.
Figure 2. Pipeline for using global page features to classify layout elements.
Size of the corpus.
Confusion matrix from SVC for page layout classes
Results of the CNN implementation in PyTorch for page layout classification, treated as image classification, with an accuracy of 0.89.


Preparing Legal Documents for NLP Analysis: Improving the Classification of Text Elements by Using Page Features

January 2022 · 1,157 Reads · 2 Citations

Legal documents often have a complex layout with many different headings, headers and footers, side notes, etc. For further processing, it is important to extract these individual components correctly from a legally binding document, for example a signed PDF. A common approach is to classify each (text) region of a page using its geometric and textual features. This approach works well when the training and test data have a similar structure and when the documents of the collection to be analyzed have a rather uniform layout. We show that the use of global page properties can improve the accuracy of text element classification: we first classify each page into one of three layout types. We then train a classifier for each of the three page types and thereby improve the accuracy on a manually annotated collection of 70 legal documents consisting of 20,938 text elements. When we split by page type, we achieve an improvement from 0.95 to 0.98 for single-column pages with left marginalia and from 0.95 to 0.96 for double-column pages. We developed our own feature-based method for page layout detection, which we benchmark against a standard implementation of a CNN image classifier. The approach presented here is based on a corpus of freely available German contracts and general terms and conditions. Both the corpus and all manual annotations are made freely available. The method is language agnostic.
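The two-stage idea in this abstract (classify the page layout first, then use an element classifier trained only on pages of that layout) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the nearest-centroid classifier is a stand-in chosen to keep the example dependency-free, and the layout names, element labels, and feature values (`TOY_PAGES`, `marginalia`, `two_column`, `side_note`, and so on) are all hypothetical.

```python
from collections import defaultdict
from math import dist


class NearestCentroid:
    """Stand-in classifier (an assumption): predicts the label of the closest class mean."""

    def fit(self, X, y):
        groups = defaultdict(list)
        for features, label in zip(X, y):
            groups[label].append(features)
        # One centroid per label: the column-wise mean of its feature vectors.
        self.centroids = {
            label: tuple(sum(col) / len(rows) for col in zip(*rows))
            for label, rows in groups.items()
        }
        return self

    def predict_one(self, features):
        return min(self.centroids, key=lambda label: dist(features, self.centroids[label]))


def train_two_stage(pages):
    """Train one page-layout classifier plus one element classifier per layout type."""
    page_clf = NearestCentroid().fit(
        [p["page_features"] for p in pages], [p["page_type"] for p in pages]
    )
    by_type = defaultdict(lambda: ([], []))
    for p in pages:
        X, y = by_type[p["page_type"]]
        for element_features, element_label in p["elements"]:
            X.append(element_features)
            y.append(element_label)
    element_clfs = {t: NearestCentroid().fit(X, y) for t, (X, y) in by_type.items()}
    return page_clf, element_clfs


def classify_page(page_clf, element_clfs, page_features, element_features):
    """Stage 1 picks the layout type; stage 2 routes elements to that type's classifier."""
    page_type = page_clf.predict_one(page_features)
    clf = element_clfs[page_type]
    return page_type, [clf.predict_one(f) for f in element_features]


# Hypothetical toy data. Page features: (column count, share of text in the left margin).
# Element features: (normalized x position, normalized width).
TOY_PAGES = [
    {
        "page_features": (1.0, 0.3),
        "page_type": "marginalia",
        "elements": [((0.05, 0.15), "side_note"), ((0.30, 0.60), "body")],
    },
    {
        "page_features": (2.0, 0.0),
        "page_type": "two_column",
        "elements": [
            ((0.05, 0.42), "body"),
            ((0.53, 0.42), "body"),
            ((0.05, 0.90), "heading"),
        ],
    },
]

page_clf, element_clfs = train_two_stage(TOY_PAGES)
# The same narrow, left-aligned region is labeled differently depending on the page layout.
print(classify_page(page_clf, element_clfs, (1.0, 0.25), [(0.05, 0.20)]))
print(classify_page(page_clf, element_clfs, (2.0, 0.05), [(0.05, 0.20)]))
```

In this toy setup, a narrow region at the left edge is treated as a side note on the marginalia page but as body text on the two-column page, which is the kind of ambiguity that routing by page type is meant to resolve.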


Remembering Sue Atkins

January 2022 · 12 Reads

International Journal of Lexicography

Michael Rundell · Jem Clear · Thierry Fontenelle · [...] · Krista Varantola

1. Introduction

Michael Rundell

Sue Atkins, who died in September at the age of 90, was a true visionary, and one of the most important and influential lexicographers of this or any other era. Her long and distinguished career began in the 1960s, when dictionary-making inhabited a rather amateurish, index-carded (and male-dominated) milieu, and continued well into the 21st century. By the time Sue retired (if she ever truly did), corpus-based lexicography was the norm, collaborations between lexicographers and linguists were almost routine, and tools for semi-automatic dictionary compilation were already well advanced. But the important point here is not simply that Sue lived through these dramatic changes: she was one of the key people driving them, and her impact will continue to be felt for years to come.

The eight pieces collected here come from friends and colleagues who worked with Sue at different points in her career. They testify to her massive contributions across the range of activities (technical, theoretical, and practical) which her career spanned. They also show the great admiration and affection in which Sue was held by all who knew her and worked with her. Reading these reminiscences, one can’t help noticing a number of recurrent themes: her remarkable insight into the nature of language; her readiness to engage with people from other communities (academic linguists, computer scientists, and so on) in order to take things forward; her boundless energy and infectious enthusiasm; her organisational skills and talent for getting things done (and for knowing what needed to be done); and her wit, charm, and great capacity for fun and friendship. Whatever the situation, Sue was always good company, as many of these reminiscences attest.






10 Themenfeld Themenvielfalt [Topic Area: Diversity of Topics]

December 2021 · 12 Reads

Online reviews of artistic artefacts can trigger educational processes. Both the productive engagement with a work and the preparation of that experience as a review text for a specific audience hold great potential for cultural participation and for overcoming educational barriers. But which processes, contents, and contexts play a role here? This question was addressed by the interdisciplinary research project Rez@Kultur, whose results are presented comprehensively here for the first time. The findings are supplemented by follow-up perspectives and commentaries from research and practice.


21 Forschungsdatenmanagement und Nachnutzung der Daten [Research Data Management and Reuse of the Data]

December 2021 · 4 Reads



Citations (58)


... The queries can incorporate the hierarchical structures implied by many annotation types (e.g. syllables contained in words) but can also relate multiple independent annotation categories, such as prosodic annotation and part-of-speech-tagging (Gut et al. 2004). ...

Reference:

Mesotext. Digitised Emblems, Modelled Annotations and Humanities Scholarship
Querying annotated speech corpora
  • Citing Conference Paper
  • March 2004

... Based on their origin, electronic dictionaries can also be divided into two types: dictionaries transferred from existing print dictionaries, or digitized print dictionaries, and dictionaries compiled for the electronic medium, or purpose-built electronic dictionaries (Svensén, 2009: 438-439). The properties of a print dictionary that has been adapted to the electronic medium have been described by Debus-Gregor and Heid (2013: 1002) as 'somewhat "in between" those of a paper dictionary and those of a dictionary conceived to exist exclusively as an electronic tool'. It should be noted, though, that these electronic dictionaries may be very close or nearly identical to the original print dictionaries or, due to extensive use of the advantages offered by the electronic medium, may already differ from them considerably. ...

67. Design criteria and ‘added value’ of electronic dictionaries for human users
  • Citing Chapter
  • December 2013

... It should be noted that, within the discussion of the language of legislative documents, the focus on their titles/headings/headlines emerged, among other issues, in the past century (Carlson, 1968). The 21st century consistently recognizes the importance of titles/headlines/headings in legal documents, as the verbal representation of legal titles/headings either contributes to or downplays the visibility of the concept hierarchy within the above sources (Josi et al., 2022; Sanchez, 2019). Currently, scholars underline the importance of accurate and consistent verbal representation of legal titles/headings in the process of avoiding legal uncertainty (Mavroidis, 2022). ...

Preparing Legal Documents for NLP Analysis: Improving the Classification of Text Elements by Using Page Features

... Lü and Zhou (2004) propose a model for translating between English and Chinese collocations that they extract from monolingual corpora parsed with the NLPWin parser (Heidorn, 2000). Heid et al. (2008) extract German juridical terminology and use FSPAR (Schiehlen, 2003) to extract verb-object pairs. Weller and Heid (2010) use the same parser to extract German multiword expressions and their morphosyntactic features. ...

Providing corpus data for a dictionary for German juridical phraseology
  • Citing Chapter
  • September 2008

... Today, corpora and corpus-based tools are considered almost a conventional approach to building lexicographic materials (Sinclair 1992; Abdelzaher 2022). Therefore, it comes as a surprise that it has generally not been adequately acknowledged and precisely defined by technical dictionaries and glossaries, especially as the vocabulary volume in technical and scientific texts is not as large as in GE texts, thus having higher frequency (density) of core vocabulary (Chung and Nation 2004; Kovalev et al. 2019; Kruse and Heid 2021). This may be the case because of frequently and ad hoc developed technical (often bilingual) specialized glossaries (e.g. of some medical, business or nautical terms) which are often not compiled by language professionals, thus not receiving significant attention from lexicographers (cf. ...

Lemma Selection and Microstructure: Definitions and Semantic Relations of a Domain-Specific e-Dictionary of the Mathematical Field of Graph Theory
  • Citing Conference Paper
  • November 2020

... Both corpora were compiled from open-access specialized journals and conference proceedings, and are similar in nature. In the case of the lexicography corpus, we used one compiled by Lindemann, Kliche and Heid (2018). The lexicography corpus, which can be considered a subfield (though not a subset) of the other, includes proceedings of Euralex and other conferences, as well as papers from open-source lexicography and lexicology journals. ...

LexBib: A Corpus and Bibliography of Metalexicographical Publications

... (Rundell 2012: 72) Good electronic dictionaries are characterised by the utilisation of electronic features enabled by computer technology and utilisation of virtually unlimited space on the internet. The interested reader is referred to De Schryver (2003) and Prinsloo (2019a) for a more detailed discussion of such features and to Bothma, Prinsloo and Heid (2018), Prinsloo (2019a), Prinsloo and Bothma (2020) and Prinsloo and Taljard (2019) for detailed discussions on user support tools in electronic dictionaries. Bukantswe has more than 10,000 Sesotho entries with their English equivalents available from http://bukantswe.sesotho.org/. ...

A taxonomy of user guidance devices for e-lexicography
  • Citing Article
  • August 2018

Lexicographica - International Annual for Lexicography / Internationales Jahrbuch für Lexikographie

... G-MdS: The digital format offers far more options to 'solve' problematic lexicographical issues, especially for languages with complex morphology. English is trivial lexicographically speaking (it has typically two forms for nouns, four forms for verbs, and very little morphology and thus lemmatisation issues elsewhere); this is very different for many other language families, including for instance the Bantu languages, which are agglutinative, and where digital dictionaries may truly simplify look-up for both decoding as well as encoding purposes (Prinsloo, 2005; Prinsloo et al., 2012; Prinsloo et al., 2017). Moreover, for polysynthetic languages, such as several of the Amerindian languages, the only truly successful way to lemmatise lexis in a user-friendly way is in a digital product (Frawley et al., 2002). ...

Direct User Guidance in e-Dictionaries for Text Production and Text Reception — The Verbal Relative in Sepedi as a Case Study

Lexikos

... This research uses qualitative methods and is built based on modeling based on main theories (Mani, 2022; Chen & Luo, 2019; Schoormann et al., 2017). Research with a strong theoretical literature review and consistently summarizing previous research results can summarize theoretical prepositions and build modeling. ...

Semi-Automatic Development of Modelling Techniques with Computational Linguistics Methods – A Procedure Model and its Application

Lecture Notes in Business Information Processing