Sarah Schulz

Universität Stuttgart · Institute for Natural Language Processing

PhD

About

22
Publications
5,049
Reads
302
Citations
Introduction
Sarah Schulz currently works at the Institute for Natural Language Processing, Universität Stuttgart. She does research in computational linguistics. Her current project is the Center for Reflected Text Analytics (CRETA).

Publications (22)
Book
Full-text available
The Center for Reflected Text Analytics (CRETA) develops interdisciplinary mixed methods for text analytics in the research fields of the digital humanities. This volume is a collection of text analyses from specialty fields including literary studies, linguistics, the social sciences, and philosophy. It thus offers an overview of the methodology o...
Chapter
Full-text available
In computational linguistics (CL), annotation is used with the goal of compiling data as the basis for machine learning approaches and automation. At the same time, scholars in the humanities use annotation in the form of note-taking while reading texts. We claim that with the development of the Digital Humanities (DH), annotation has become a method th...
Article
Full-text available
By building a part-of-speech (POS) tagger for Middle High German, we investigate strategies for dealing with a low-resource, diverse, and non-standard language in the domain of natural language processing. We highlight various aspects such as the quantity of data needed for training and the influence of data quality on tagger performance. Since the lac...
Conference Paper
Full-text available
Coreference resolution is the task of grouping together references to the same discourse entity. Resolving coreference in literary texts could benefit a number of Digital Humanities (DH) tasks, such as analyzing the depiction of characters and/or their relations. Domain-specific training data has been shown to improve coreference resolution for many do...
Conference Paper
Full-text available
In computational linguistics (CL), annotation is used with the goal of compiling data as the basis for machine learning approaches and automation. At the same time, scholars in the humanities use annotation in the form of note-taking while reading texts. We claim that with the development of the Digital Humanities (DH), annotation has become a method t...
Conference Paper
Full-text available
The structure of the Digital Humanities master's program at the University of Stuttgart is characterized by a large proportion of classes related to natural language processing. In this paper, we discuss the motivation for this design and the associated challenges that students and teachers face. To provide background information, we also sum up our und...
Conference Paper
Full-text available
The goal of this tutorial is to give participants concrete, practical insights into a standard case of automatic text analysis. Using the example of the automatic recognition of entity references, we address general assumptions, procedures, and methodological standards of machine learning approaches. The participants...
Presentation
Full-text available
The goal of this tutorial is to give participants concrete, practical insights into a standard case of automatic text analysis. Using the example of the automatic recognition of entity references, we address general assumptions, procedures, and methodological standards of machine learning approaches. The participants...
Article
Full-text available
We characterize three notions of explainable AI that cut across research fields: opaque systems that offer no insight into their algorithmic mechanisms; interpretable systems whose algorithmic mechanisms users can mathematically analyze; and comprehensible systems that emit symbols enabling user-driven explanations of how a conclusion is reac...
Conference Paper
Full-text available
One of the main obstacles for many Digital Humanities projects is low data availability. Texts have to be digitized in an expensive and time-consuming process, in which Optical Character Recognition (OCR) post-correction is one of the time-critical factors. Using the example of OCR post-correction, we show the adaptation of a generic system to solve...
Presentation
Full-text available
With this tagger, we aim not only to fill a gap (so far there is no freely usable PoS tagger for Middle High German) but also to achieve the broadest possible applicability to Middle High German texts of different genres, centuries, and regional varieties, and to support further work with Middle High German text...
Article
Full-text available
This paper addresses challenges of Natural Language Processing (NLP) on non-canonical multilingual data in which two or more languages are mixed. It refers to code-switching, which has become more common in daily life and therefore receives increasing attention from the research community. We report our experience that covers not...
Poster
Full-text available
In this paper, we describe computer-aided authorship testing on the Middle High German (MHG) text Apollonius von Tyrland, written by Heinrich von Neustadt (HvN) in the late 13th century. Since the text is based on a Latin original, HvN is suspected of having incorporated other sources into the translation. We investigate assumptions regarding a segmentation of this tex...
Article
As social media constitutes a valuable source for data analysis for a wide range of applications, the need for handling such data arises. However, the nonstandard language used on social media poses problems for natural language processing (NLP) tools, as these are typically trained on standard language material. We propose a text normalization app...
Conference Paper
Full-text available
This paper addresses challenges of Natural Language Processing (NLP) on non-canonical multilingual data in which two or more languages are mixed. It refers to code-switching, which has become more common in daily life and therefore receives increasing attention from the research community. We report our experience that covers not...
Conference Paper
Full-text available
In this paper, we report on the creation of a web corpus for the variety of German spoken in South Tyrol. We hence provide an example for the compilation of a corpus for a language variety that has neighboring varieties and for which the content on the internet is both sparse and published under various top-level domains. We discuss how we tackled...
Article
This paper describes a phrase-based machine translation approach to normalize Dutch user-generated content (UGC). We compiled a corpus of three different social media genres (text messages, message board posts and tweets) to have a sample of this recent domain. We describe the various characteristics of this noisy text material and explain how it h...
Conference Paper
Full-text available
Linguists working on the diachronic development of German observe a change in the inflectional verb system between Middle and Modern High German. Partly due to the sparseness of the data, it is difficult to determine the factors influencing this development. We investigated this with a model that simulates the change in the verbal morphology usin...
Conference Paper
Linguists working on the diachronic development of German observe a change in the inflectional verb system between Middle and Modern High German. Partly due to the sparseness of the data, it is difficult to determine the factors influencing this development. We investigated this with a model that simulates the change in the verbal morphology usin...
Conference Paper
Full-text available
In this paper we show that the change in the German verb system from Middle High German to New High German can be simulated and explained by an adaptation of the Iterated Learning Model. We claim that the change in the German verb system is due to frequency of usage and the process of overgeneralization. This is the first application of...

Project (1)
Project
CRETA focuses on the development of technical tools and a general workflow methodology for text analysis within the Digital Humanities. Of particular importance are the transparency of the tools and the traceability of results, so that they can be employed in a critically reflected way. CRETA is a collaborative project with partners from literary studies, linguistics, history, political science, philosophy, computational linguistics, and data visualization. http://www.creta.uni-stuttgart.de