Elke Teich

Universität des Saarlandes | UKS · Language Science and Technology

Professor

About

122 Publications
16,260 Reads
1,114 Citations
Citations since 2016
34 Research Items
577 Citations
Introduction
My current research focuses on language change, language variation and (human) translation, using computational language models combined with information-theoretic measures of variation. The models range from plain n-gram models to neural models, and the measures include surprisal, entropy and relative entropy as measures of information content. The main perspective adopted is rational communication: what role do communicative concerns play in linguistic variation and change?
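For reference, the three information-theoretic measures named above have the following standard textbook definitions, given here in bits. The conditioning context c (for example the preceding n-1 words in an n-gram model, or the full left context in a neural model) depends on the model used in each study and is an assumption of this summary.

```latex
% Surprisal of a unit w in context c
S(w \mid c) = -\log_2 P(w \mid c)

% Entropy of a distribution P over units w (expected surprisal)
H(P) = -\sum_{w} P(w)\,\log_2 P(w)

% Relative entropy (Kullback-Leibler divergence) of P from Q
D_{\mathrm{KL}}(P \,\|\, Q) = \sum_{w} P(w)\,\log_2 \frac{P(w)}{Q(w)}
```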

Publications (122)
Chapter
Full-text available
This chapter will present lessons learned from CLARIN-D, the German CLARIN national consortium. Members of the CLARIN-D communities and of the CLARIN-D consortium have been engaged in innovative, data-driven, and community-based research, using language resources and tools in the humanities and neighbouring disciplines. We will present different...
Chapter
Full-text available
While some authors have suggested that translationese fingerprints are universal, others have shown that there is a fair amount of variation among translations due to source language shining through, translation type or translation mode. In our work, we attempt to gain empirical insights into variation in translation, focusing here on translation m...
Cover Page
Full-text available
Translatology is the theoretical and practical study of translation. It combines insights from linguistics, the humanities, cognitive and computer science to understand the process of translating between languages and the particular features characterizing language in translation. Central concepts of contemporary translatology are translationese, l...
Article
Full-text available
We present empirical evidence of the communicative utility of conventionalization, i.e., convergence in linguistic usage over time, and diversification, i.e., linguistic items acquiring different, more specific usages/meanings. From a diachronic perspective, conventionalization plays a crucial role in language change as a condition for innovation...
Article
Full-text available
This paper provides an overview of metadata generation and management for the Royal Society Corpus (RSC), aiming to encourage discussion about the specific challenges in building substantial diachronic corpora intended to be used for linguistic and humanistic analysis. We discuss the motivations and goals of building the corpus, describe its compos...
Conference Paper
Full-text available
We report on an application of universal dependencies for the study of diachronic shifts in syntactic usage patterns. Our focus is on the evolution of Scientific English in the Late Modern English period (ca. 1700-1900). Our data set is the Royal Society Corpus (RSC), comprising the full set of publications of the Royal Society of London between 16...
Article
Full-text available
We trace the evolution of Scientific English through the Late Modern period to modern time on the basis of a comprehensive corpus composed of the Transactions and Proceedings of the Royal Society of London, the first and longest-running English scientific journal established in 1665. Specifically, we explore the linguistic imprints of specializatio...
Book
Full-text available
This book is concerned with the contrastive-linguistic properties of English and German in system and text. The focus here is on the specific properties of English translations from German and German translations from English compared to their source language texts and to original texts in the same languages as the target languages. This inv...
Conference Paper
Translationese is a phenomenon present in human translations, simultaneous interpreting, and even machine translations. Some translationese features tend to appear in simultaneous interpreting with higher frequency than in human text translation, but the reasons for this are unclear. This study analyzes translationese patterns in translation, inter...
Presentation
Full-text available
There is a rich body of research on translationese in corpus-based translation studies (cf. Baker 1993, Olohan & Baker 2000, Teich 2003), where a set of predefined features (for instance, type-token ratio, lexical density and sentence length) is typically applied and tested for significance, as well as in computational linguistics, where translat...
Presentation
Full-text available
Our aim is to identify the features distinguishing simultaneously interpreted texts from translations (apart from being more oral) and the characteristics they have in common which set them apart from originals (translationese features).
Article
Full-text available
We present a model of the linguistic development of scientific English from the mid-seventeenth to the late-nineteenth century, a period that witnessed significant political and social changes, including the evolution of modern science. There is a wealth of descriptive accounts of scientific English, both from a synchronic and a diachronic perspect...
Presentation
Full-text available
It has been argued that the process of translation leaves specific "fingerprints" on the translation product known as translationese (Gellerstam, 1986). While some authors have suggested that these fingerprints are universal (Baker, 1993; Chesterman, 2004), others have shown that there is a fair amount of variation among translations due to source...
Conference Paper
Full-text available
We present a data-driven approach to detect periods of linguistic change and the lexical and grammatical features contributing to change. We focus on the development of scientific English in the late modern period. Our approach is based on relative entropy (Kullback-Leibler Divergence) comparing temporally adjacent periods and sliding over the time...
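The core idea of comparing temporally adjacent periods with relative entropy can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: each period is assumed to be a list of word tokens, distributions are additively smoothed unigram distributions over a shared vocabulary (the abstract also mentions grammatical features, which would need different feature extraction), and the direction D(later || earlier) is an assumption. The function names and the smoothing constant `alpha` are hypothetical.

```python
from collections import Counter
from math import log2


def unigram_dist(tokens, vocab, alpha=0.5):
    """Additively smoothed unigram distribution over a shared vocabulary.

    Smoothing (alpha) is an assumption so that no probability is zero.
    """
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def sliding_kld(periods, alpha=0.5):
    """Compare each period with the one immediately before it.

    periods: chronologically ordered list of token lists (hypothetical format).
    Returns (index, D(later || earlier) in bits, top 3 contributing words).
    """
    vocab = set()
    for tokens in periods:
        vocab.update(tokens)
    results = []
    for i in range(1, len(periods)):
        p = unigram_dist(periods[i], vocab, alpha)      # later period
        q = unigram_dist(periods[i - 1], vocab, alpha)  # earlier period
        # Per-word contributions show which words drive the divergence.
        contrib = {w: p[w] * log2(p[w] / q[w]) for w in vocab}
        top = sorted(contrib, key=contrib.get, reverse=True)[:3]
        results.append((i, sum(contrib.values()), top))
    return results


periods = [
    "the experiment was made".split(),
    "the experiment was made".split(),
    "it is shown that the experiment".split(),
]
for i, div, top in sliding_kld(periods):
    print(f"period {i - 1} -> {i}: D = {div:.3f} bits, top words {top}")
```

Returning per-word contributions alongside the total makes it possible to inspect which items are responsible for a peak in divergence, which is the kind of question the abstract raises about features contributing to change.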
Conference Paper
Full-text available
Multilingual parliaments have been a useful source for monolingual and multilingual corpus collection. However, extra-textual information about speakers is often absent, and as a result, these resources cannot be fully used in translation studies. In this paper we present a method for processing and building a parallel corpus consisting of parliame...
Conference Paper
Full-text available
We present an approach to investigate the differences between lexical words and function words and the respective parts-of-speech from an information-theoretical point of view (cf. Shannon, 1949). We use average surprisal (AvS) to measure the amount of information transmitted by a linguistic unit. We expect to find function words to be more predictab...
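Average surprisal per part-of-speech can be illustrated on a toy tagged corpus. The sketch below is an assumption-laden illustration rather than the method of the paper: it estimates P(word | previous word) with an add-alpha smoothed bigram model and averages each token's surprisal over its tag. The input format, the bigram context, the smoothing and the helper name `average_surprisal` are all hypothetical.

```python
from collections import Counter, defaultdict
from math import log2


def average_surprisal(tagged_sents, alpha=1.0):
    """Return average surprisal (AvS, in bits) per tag.

    tagged_sents: list of sentences, each a list of (word, tag) pairs.
    Assumptions: context = previous word only (bigram), add-alpha smoothing.
    """
    bigrams = Counter()   # counts of (previous word, word)
    contexts = Counter()  # counts of each word used as a context
    vocab = set()
    for sent in tagged_sents:
        words = ["<s>"] + [w for w, _ in sent]  # <s> marks sentence start
        vocab.update(words[1:])
        for prev, cur in zip(words, words[1:]):
            bigrams[(prev, cur)] += 1
            contexts[prev] += 1

    size = len(vocab)
    total = defaultdict(float)  # summed surprisal per tag
    n = Counter()               # number of tokens per tag
    for sent in tagged_sents:
        prev = "<s>"
        for word, tag in sent:
            p = (bigrams[(prev, word)] + alpha) / (contexts[prev] + alpha * size)
            total[tag] += -log2(p)  # surprisal of this token
            n[tag] += 1
            prev = word
    return {tag: total[tag] / n[tag] for tag in n}


corpus = [
    [("the", "DET"), ("cat", "NOUN")],
    [("the", "DET"), ("dog", "NOUN")],
    [("the", "DET"), ("cow", "NOUN")],
]
print(average_surprisal(corpus))
```

In this toy corpus the determiner always follows the sentence start, so it receives a lower average surprisal than the nouns, which is the direction of the expectation stated in the abstract for function words.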
Conference Paper
Full-text available
The Royal Society Corpus is a corpus of Early and Late Modern English built in an agile process covering publications of the Royal Society of London from 1665 to 1869 (Kermes et al., 2016), with a size of approximately 30 million words. In this paper we will provide details on two aspects of the building process, namely the mining of patterns for OCR...
Conference Paper
Full-text available
We present a new approach for modeling diachronic linguistic change in grammatical usage. We illustrate the approach on English scientific writing in Late Modern English, focusing on grammatical patterns that are potentially indicative of shifts in register, genre and/or style. Commonly, diachronic change is characterized by the relative frequency...
Conference Paper
We introduce a diachronic corpus of English scientific writing - the Royal Society Corpus (RSC) - adopting a middle ground between big and ‘poor’ and small and ‘rich’ data. The corpus has been built from an electronic version of the Transactions and Proceedings of the Royal Society of London and comprises c. 35 million tokens from the period 1665-1...
Article
Full-text available
We introduce IDeaL (Information Density and Linguistic Encoding), a collaborative research center that investigates the hypothesis that language use may be driven by the optimal use of the communication channel. From the point of view of linguistics, our approach promises to shed light on selected aspects of language variation that are hitherto not...
Conference Paper
We report on a project investigating the development of scientific writing in English from the mid-17th century to present. While scientific discourse is a much researched topic, including its historical development (see e.g. Banks (2008) in the context of Systemic Functional Grammar), it has so far not been modeled from the perspective of informat...
Conference Paper
Full-text available
The linguistic evolution of scientific writing is characterized by two major motifs: specialization and conventionalization. The assumption is that as scientific domains become more specialized, particular meanings become more predictable in these domains and call for denser encodings that minimize redundancy while maintaining accuracy in transmiss...
Article
Full-text available
We analyze the linguistic evolution of selected scientific disciplines over a 30-year time span (1970s to 2000s). Our focus is on four highly specialized disciplines at the boundaries of computer science that emerged during that time: computational linguistics, bioinformatics, digital construction, and microelectronics. Our analysis is driven by th...
Conference Paper
Full-text available
A key task in corpus-based approaches to linguistic analysis is to filter, order, and group language data according to some criteria, e.g., to find terms and phrases typical for some register, or to analyse the distributional context of terms. There exist two complementary approaches to this end: One is to start with a concrete conceptual hypothes...
Conference Paper
Full-text available
This paper presents TeLeMaCo, a collaborative portal for training and teaching materials relevant in linguistics and digital humanities hosted at the CLARIN-D centre at Saarland University in Saarbrücken. The portal is easy to use both for casual users who search for teaching and training material and for community members who want to contribute de...
Conference Paper
Full-text available
We present a methodology to analyze the linguistic evolution of scientific registers with data mining techniques, comparing the insights gained from shallow vs. linguistic features. The focus is on selected scientific disciplines at the boundaries of computer science (computational linguistics, bioinformatics, digital construction, microelectronics...
Conference Paper
Full-text available
Language resources are often compiled for the purpose of variational analysis, such as studying differences between genres, registers, and disciplines, regional and diachronic variation, influence of gender, cultural context, etc. Often the sheer number of potentially interesting contrastive pairs can get overwhelming due to the combinatorial explo...
Conference Paper
We present a semi-automatic approach to detect evaluative expressions, evaluated targets and the attribution structure involved in academic writing. The aim is to uncover the linguistic properties of evaluative expressions used in this genre. In terms of targets, evaluative expressions might precede (pre-modification, e.g., the importance of linear...
Conference Paper
Full-text available
This paper presents ongoing work on a collaborative portal and repository for training and teaching materials in linguistics. It is implemented as a web service and allows access to a wide range of multimedia materials, including text, PowerPoint presentations and videos. Supporting semantic tagging of learning contents, our framework allows for fi...
Conference Paper
Research strands: systemic functional linguistics, register theory, corpus linguistics
Keywords: register/genre analysis, diachronic linguistic evolution, computational stylistics, automatic text classification
We present a corpus-based study on recent diachronic trends in scientific writing. The focus is on selected academic domains at the boundar...
Article
The overall goal of our research is to uncover the linguistic options of expressing negative attitude and experience. To this end, a small corpus of English newsgroup texts about relationship problems and eating disorders, part of the Englische & Deutsche Newsgroup Texte – Annotiertes Korpus (EDNA corpus), has been annotated manually, using Systemi...
Book
Full-text available
Aims and Scope
The intuition that translations are somehow different from texts that are not translations has been around for many years, but most of the common linguistic frameworks are not comprehensive enough to account for the wealth and complexity of linguistic phenomena that make a translation a special kind of text. The present book provides...
Chapter
Introduction
Choice is a central concept in Systemic Functional Linguistics (SFL), acting as the fundamental metaphor for what we do as language users. Apart from its use for representing options or terms in the linguistic system, choice is used to denote the process of choosing as well as the result of that process. As systemic linguists we make a...
Conference Paper
In the last few decades, a number of new scientific disciplines have emerged at the boundaries of computer science with other disciplines, e.g., computational linguistics or bioinformatics. Linguistically speaking, the registers of existing disciplines (e.g., computer science and linguistics, computer science and biology/genetics) come into contact...
Article
We report on a project investigating the lexico-grammatical properties of English scientific texts. The goal of this project is to gain insight into the linguistic effects of two scientific disciplines coming into contact with one another (e.g., computer science and linguistics) and possibly forming a merged, new discipline (i.e. computational ling...
Article
Full-text available
We report on a project investigating the linguistic properties of English scientific texts on the basis of a corpus of journal articles from nine academic disciplines. The goal of the project is to gain insights into registers emerging at the boundaries of computer science and some other discipline (e.g., bioinformatics, computational linguistics, co...
Article
Full-text available
Knowledge about Theme-Rheme serves the interpretation of a text in terms of its thematic progression and provides a window into the topicality of a text as well as text type (genre). This is potentially relevant for NLP tasks such as information extraction and text classification. To explore this potential, large corpora annotated for Theme-Rheme...
Article
Full-text available
We present an XML-based data model that is deployed in a system for querying corpora with multiple layers of linguistic annotation. The model is based upon the simple but effective idea of leaving each layer of annotation intact at annotation time and relating the layers to each other only at query time. Queries select parts of the layers or of the...
Conference Paper
Full-text available
It is commonly acknowledged that the evolution of new scientific disciplines is accompanied by specific communicative needs, thus posing particular requirements on linguistic expression. New registers develop that instantiate the linguistic system in novel ways and in the long run may cause changes in the linguistic system itself (cf. e.g., (5) o...
Article
Full-text available
As the interest in annotated corpora is spreading, there is increasing concern with using existing language technology for corpus processing. In this paper we explore the idea of using natural language generation systems for corpus annotation. Resources for generation systems often focus on areas of linguistic variability that are under-represent...