Swantje Westpfahl

Institute for the German Language · Pragmatics


5 Publications · 573 Reads · 32 Citations


Publications (5)
Conference Paper · Preprint
Full-text available (preprint)
Unlike corpora of written language, where segmentation can mainly be derived from orthographic punctuation marks, the basis for segmenting spoken language corpora is not predetermined by the primary data, but rather has to be established by the corpus compilers. This impedes consistent querying and visualization of such data. Several ways of segment...
Conference Paper
Full-text available
This contribution presents the background, design and results of a study of users of three oral corpus platforms in Germany. Roughly 5,000 registered users of the Database for Spoken German (DGD), the GeWiss corpus and the corpora of the Hamburg Centre for Language Corpora (HZSK) were asked to participate in a user survey. This quantitative approac...
Conference Paper
Full-text available
In this paper, we present a GOLD standard of part-of-speech tagged transcripts of spoken German. The GOLD standard data consists of four annotation layers – transcription (modified orthography), normalization (standard orthography), lemmatization and POS tags – all of which have undergone careful manual quality control. It comes with guidelines for...
Conference Paper
Full-text available
Part-of-speech tagging (POS-tagging) of spoken data requires different means of annotation than POS-tagging of written and edited texts. To capture the features of spoken German, a distinct tagset is needed that accounts for the kinds of elements which only occur in speech. In order to create such a coherent tagset, the most prominent...


Projects (2)
Project
The project aims to develop a method of segmentation that is adequate for the analysis of data from talk-in-interaction at different levels and for various communities of researchers. It evaluates and further develops approaches to segmentation put forward in the literature on conversation analysis, interactional linguistics, pragmatics and corpus linguistics by applying them to samples from three large collections of audio and video recordings of various interaction types (the French databases CLAPI and ESLO and the German database FOLK). The project will result in a systematic segmentation guideline applicable across different interaction types and to French as well as German data. It is the first approach to segmentation that both rests on a comprehensive treatment of a sufficiently large and diverse empirical basis and takes the cross-linguistic dimension into account. The results will improve the usability of the three databases, contribute to best practices for working with oral corpora more generally, and enhance our understanding of structures of talk-in-interaction. The project will thus address current needs in conversation analysis, corpus-based language teaching, and contrastive analysis of spoken German and French, as well as in the development of language technology for interaction data.