
Manfred Stede
Universität Potsdam · Department Linguistik
PhD
About
189 Publications
43,751 Reads
5,366 Citations
Additional affiliations
April 2001 - June 2020
Publications (189)
Hulme et al. (Nat Clim Change, 8:515–521, 2018) manually coded ‘frames’ in 490 Nature and Science editorials (1966–2016) they found relevant for climate change. We produced a digital version of the corpus and conducted a set of experiments: We explored many variants of supervised categorization for automatically reproducing the manual frame coding,...
We investigate the variation in oral and written language in terms of anaphoric distance (i.e., the textual distance between anaphors and their antecedents), expanding corpus-based research with experimental evidence. Contrastive corpus studies demonstrate that oral genres include longer average anaphoric distance than written genres, if the distan...
Adverbial connectives like therefore, which link a preceding ‘external’ to an ‘internal’ argument, can be regarded as anaphoric: The external argument is selected by an interpretation process akin to that of an event anaphor, and intervening material can appear between both arguments. We report on a crowdsourcing experiment on the German connectiv...
Recent years have seen increased interest in applying the latest technological innovations, including artificial intelligence (AI) and machine learning (ML), to the field of education. One of the main areas of interest to researchers is the use of ML to assist teachers in assessing students’ work on the one hand and to promote effective self-tutor...
In the last decade, the field of argument mining has grown notably. However, relatively few studies have investigated argumentation in social media, and specifically on Twitter. Here, we provide what is, to our knowledge, the first critical in-depth survey of the state of the art in tweet-based argument mining. We discuss approaches to modelling the st...
Background: Coherence is the quality that distinguishes discourse from a random collection of sentences. People with aphasia have been reported to produce less-coherent discourse than non-language-impaired speakers. It is largely unclear how coherence is established in natural language and what leads to its impairment in aphasia.
Aims: This paper p...
Reflecting in written form on one’s teaching enactments has been considered a facilitator for teachers’ professional growth in university-based preservice teacher education. Writing a structured reflection can be facilitated through external feedback. However, researchers noted that feedback in preservice teacher education often relies on holistic,...
Parsing of argumentative structures has become a very active line of research in recent years. Like discourse parsing or any other natural language task that requires prediction of linguistic structures, most approaches choose to learn a local model and then perform global decoding over the local probability distributions, often imposing constraint...
In response to (i) inconclusive results in the literature as to the properties of coreference chains in written versus spoken language, and (ii) a general lack of work on automatic coreference resolution on both spoken language and social media, we undertake a corpus study involving the various genre sections of Ontonotes, the Switchboard corpus, a...
The performance of standard coreference resolution is known to drop significantly on Twitter texts. We improve the performance of the Lee et al. (2018) system, which was originally trained on OntoNotes, by retraining on manually annotated Twitter conversation data. Further experiments by combining different portions of OntoNotes with Twitter data s...
Background: Computational linguistic methodology allows quantification of speech abnormalities in non-affective psychosis. For this patient group, incoherent speech has long been described as a symptom of formal thought disorder. Our study is an interdisciplinary attempt at developing a model of incoherence in non-affective psychosis, informed by co...
Argumentation mining is a subfield of Computational Linguistics that aims (primarily) at automatically finding arguments and their structural components in natural language text. We provide a short introduction to this field, intended for an audience with a limited computational background. After explaining the subtasks involved in this problem of...
We present DiMLex-Bangla, a newly developed lexicon of discourse connectives in Bangla. The lexicon, upon completion of its first version, contains 123 Bangla connective entries, which are primarily compiled from the linguistic literature and translation of English discourse connectives. The lexicon compilation is later augmented by adding more con...
In this paper, we present a tangible outcome of the TextLink network: a joint online database project displaying and linking existing and newly-created lexicons of discourse connectives in multiple languages. We discuss the definition and demarcation of the class of connectives that should be included in such a resource, and present the syntactic,...
Argumentation Mining aims at finding components of arguments, as well as relations between them, in text. One of the largely unsolved problems is implicitness, where the text invites the reader to infer a missing component, such as the claim or a supporting statement. In the work of Wojatzki and Zesch (2016), an interesting implicitness problem is...
Computer-assisted text coding can facilitate the analysis of large text collections. To evaluate the functionality of providing an analyst with a ranked list of suggestions for suitable text codes, we used a data set of discussion posts, which had been manually coded for reasons given for taking a stance on the topic of vaccination. We trained a lo...
The OntoNotes corpus is widely used for training and testing coreference resolution systems, but little attention has so far been given to the differences between the genres of language that the corpus is composed of. We are primarily interested in the contrast between spoken and written language, and thus we conducted in-depth analy...
We present an approach to the extraction of arguments for explicit discourse relations in German, as a sub-task of the larger task of shallow discourse parsing for German. Using the Potsdam Commentary Corpus, we evaluate two methods (one based on constituency trees, the other based on dependency trees) to extract both the internal and the external...
Speech deficits are common symptoms among Parkinson's Disease (PD) patients. The automatic assessment of speech signals is promising for the evaluation of the neurological state and the speech quality of the patients. Recently, progress has been made in applying machine learning and computational methods to automatically evaluate the speech of PD...
Speech deficits are common symptoms among Parkinson's Disease (PD) patients. The automatic assessment of speech signals is promising for the evaluation of the neurological state and the speech quality of the patients. Recently, progress has been made in applying machine learning and computational methods to automatically evaluate the speech of PD p...
Incoherent discourse in schizophrenia has long been recognized as a dominant symptom of the mental disorder (Bleuler, 1911/1950). Recent studies have used modern sentence and word embeddings to compute coherence metrics for spontaneous speech in schizophrenia. While clinical ratings always have a subjective element, computational linguistic methodo...
This paper presents RST-Tace, a tool for automatic comparison and evaluation of RST trees. RST-Tace serves as an implementation of Iruskieta's comparison method, which allows trees to be compared and evaluated without the influence of decisions at lower levels in a tree in terms of four factors: constituent, attachment point, nuclearity as well as...
Strictly speaking, the generation (or synthesis) of argumentative text is outside the scope of mining, but nonetheless we consider the topic here, as it is a part of the wider field of argumentation technology and will be increasingly relevant for many applications. However, in contrast to analysis, the generation of arguments has so far received m...
Unlike many of the standard tasks in NLP, argumentation mining is not a single unified process, but a constellation of subtasks, which are of different prominence depending on the goals of the underlying target application. For a (hypothetical) example, in order to obtain the gist of a Twitter conversation, it can be sufficient to extract claims an...
The tasks explained in the previous two chapters were to label the components of an argument and, along the way, to identify spans of text that are not part of the argument. We illustrate this with an extended version of Example 6.1, using the subscripts C, S, A for claim, support, attack: (7.1) Last week I bought this new camera here. [You should...
Argumentation mining is an application of natural language processing (NLP) that emerged a few years ago and has recently enjoyed considerable popularity, as demonstrated by a series of international workshops and by a rising number of publications at the major conferences and journals of the field. Its goals are to identify argumentation in text o...
Starting from the perspective that discourse structure arises from the presence of coherence relations, we provide a map of linguistic discourse structuring devices (DRDs), and then focus on those found in written text: connectives. To subdivide this class further, we follow the recent idea of structuring the set of connectives by differentiating b...
Arguments used when vaccination is debated on Internet discussion forums might give us valuable insights into reasons behind vaccine hesitancy. In this study, we applied automatic topic modelling on a collection of 943 discussion posts in which vaccine was debated, and six distinct discussion topics were detected by the algorithm. When manually cod...
We present a text classifier that can distinguish Italian news stories from editorials. Inspired by earlier work on English, we built a suitable train/test corpus and implemented a range of features, which can predict the distinction with an accuracy of 89.12%. As demonstrated by the earlier work, such a feature-based approach outperforms simple ba...
We present a lexicon of Dutch Discourse Connectives (DisCoDict). Its content was obtained using a two-step process, in which we first exploited a parallel corpus and a German seed lexicon, and then manually evaluated the candidate entries against existing connective resources for Dutch, using these resources to complete our lexicon. We compared con...
The physical formats used to represent linguistic data and its annotations have evolved over the past four decades, accommodating different needs and perspectives as well as incorporating advances in data representation generally. This chapter provides an overview of representation formats with the aim of surveying the relevant issues for represent...
Newspaper text can be broadly divided into the classes ‘opinion’ (editorials, commentary, letters to the editor) and ‘neutral’ (reports). We describe a classification system for performing this separation, which uses a set of linguistically motivated features. Working with various English newspaper corpora, we demonstrate that it significantly outper...
Despite substantial progress made in developing new sentiment lexicon generation (SLG) methods for English, the task of transferring these approaches to other languages and domains in a sound way still remains open. In this paper, we contribute to the solution of this problem by systematically comparing semi-automatic translations of common Engli...
This book offers a clear, critical, and comprehensive overview of theoretical and experimental work on information structure. Different chapters examine the main theories of information structure in syntax, phonology, and semantics as well as perspectives from psycholinguistics and other relevant fields. Following the editors’ introduction the book...
This chapter describes the contributions that Corpus Linguistics (the study of linguistic phenomena by means of systematically exploiting collections of naturally-occurring linguistic data) can make to IS research. It discusses issues of designing a corpus that can serve as a basis for qualitative or quantitative studies, and then turns to the cent...
For more than 10 years now, Sentiment Analysis has enjoyed enormous popularity in Computational Linguistics, one main reason being its great potential for practical applications, predominantly (but not only) for industrial purposes. We observe a tendency that early work referred to certain theoretical notions of Subjectivity, whereas a lot of the l...
Argument mining has started to yield early results in automatic analysis of text to produce representations of reason-conclusion structures. This paper addresses for the first time the question of automatically extracting such structures from dialogical settings of argument. More specifically, we introduce theoretical foundations for dialogical arg...
A simple conceptual model is employed to investigate events and to break the task of coreference resolution into two steps: semantic class detection and similarity-based matching. With this perspective, an algorithm is implemented to cluster event mentions in a large-scale corpus. Results on test data from AQUAINT TimeML, which we annotated manually wi...
We investigate the differences and the levels of difficulty for sentiment analysis on the two genres of newspaper text and Twitter text (tweets). Two existing systems are compared with respect to their performance on both genres: SentiStrength (Thelwall et al., 2012) and SO-CAL (Taboada et al., 2011). Both have similar architectures, using hand-built polar...
In this paper, the authors consider argument mining as the task of building a formal representation for an argumentative piece of text. Their goal is to provide a critical survey of the literature on both the resulting representations (i.e., argument diagramming techniques) and on the various aspects of the automatic analysis process. For represent...
Annotating linguistic data has become a major field of interest, both for supplying the necessary data for machine learning approaches to NLP applications, and as a research issue in its own right. This comprises issues of technical formats, tools, and methodologies of annotation. We provide a brief overview of these notions and then introduce the...
The meaning of linguistic connectives has often been characterized in terms of their position in a bipartite (semantic, pragmatic) or a tripartite (content, epistemic, speech act) structure of domains, depending on what kinds of entities are being connected (largely: propositions or speech acts). This paper argues that a more fine-grained analysis...
Speech synthesis nowadays is of acceptable quality for many purposes. Nonetheless there are applications where contextual and other pragmatic factors play an important role, which cannot be accounted for by straightforward text-to-speech (TTS) systems. This is the case for systems giving product comparisons and recommendations: For instance, an app...
This white paper is part of a series that promotes knowledge about language technology and its potential. It addresses educators, journalists, politicians, language communities and others. The availability and use of language technology in Europe varies between languages. Consequently, the actions that are required to further support research and d...
Given the contemporary trend to modular NLP architectures and multiple annotation frameworks, the existence of concurrent tokenizations of the same text represents a pervasive problem in everyday NLP practice and poses a non-trivial theoretical problem to the integration of linguistic annotations and their interpretability in general. This paper...
Discourse Processing here is framed as marking up a text with structural descriptions on several levels, which can serve to support many language-processing or text-mining tasks. We first explore some ways of assigning structure on the document level: the logical document structure as determined by the layout of the text, its genre-specific content...
Images and videos resulting from diagnostic imaging procedures such as echocardiography need to be analyzed and interpreted by physicians in order to diagnose diseases of the patient. This process can be split into two steps: in a first step, various morphological features depicted in the images have to be interpreted and described. Then, a diagnos...
We present a lexicon-based approach to extracting sentiment from text. The Semantic Orientation CALculator (SO-CAL) uses dictionaries of words annotated with their semantic orientation (polarity and strength), and incorporates intensification and negation. SO-CAL is applied to the polarity classification task, the process of assigning a positive or...
As a result of the aging societies in the western world, the impact of dementia, with its characteristics like disorientation and obliviousness, is becoming a significant problem for an increasing number of people and for the health system. To enable such dementia patients to regain a self-determined life, we have developed a mobile orientation system c...
Text documents are structured on (at least) two separate levels: The “logical” structure is largely reflected in the layout (headlines, paragraphs, etc.), and the “content” structure specifies the functional zones that serve a part of the text’s overall communicative purpose. The latter is clearly genre-specific, whereas the former is independent o...
A crucial step in the development of NLP systems is a detailed error analysis. Our system demonstration presents the infrastructure and the workflow for training classifiers for different NLP tasks and the verification of their predictions on annotated corpora. We describe an enhancement cycle of subsequent steps of classification and context-sensi...
A central step in the automatic processing of court decisions is the identification of the various content zones, i.e., breaking up the document into functionally independent areas. We assembled a corpus of German court decisions and argue that this genre belongs to the class of semi-structured text documents. Currently, we are implementing zone id...
We present a taxonomy and classification system for distinguishing between different types of paragraphs in movie reviews: formal vs. functional paragraphs and, within the latter, between description and comment. The classification is used for sentiment extraction, achieving improvement over a baseline without paragraph classification.
Given the contemporary trend to modular NLP architectures and multiple annotation frameworks, the existence of concurrent tokenizations of the same text represents a pervasive problem in everyday NLP practice and poses a non-trivial theoretical problem to the integration of linguistic annotations and their interpretability in general. This paper...
This short paper describes our ongoing work on representing the argument structure of a particular class of persuasive texts, and on reading experiments designed to investigate the effects of certain rhetorical devices, in particular the use of explicit argumentative connectives.
Empirical studies of text coherence often use tree-like structures in the spirit of Rhetorical Structure Theory (RST) as a representational device. This paper identifies several sources of ambiguity in RST-inspired trees and argues that such structures are therefore not as explanatory as a text representation should be. As an alternative, an approach...