MODERN (subproject A1)
Discourse segmentation is an important step in the process of annotating coherence relations. Ideally, implementing segmentation rules results in text segments that correspond to the units of thought related to each other. This paper demonstrates that accurate segmentation is in part dependent on the propositional content of text fragments, and that completely separating segmentation and annotation does not always yield text segments that correspond to the text units between which a conceptual relationship holds. In addition, it argues that elements belonging to the propositional content of the discourse should necessarily be included in the segmentation, but that inclusion of other text elements, for instance stance markers, should be optional.
The Cognitive approach to Coherence Relations (Sanders, Spooren, & Noordman, 1992) was originally proposed as a set of cognitively plausible primitives to order coherence relations, but is also increasingly used as a discourse annotation scheme. This paper provides an overview of new CCR distinctions that have been proposed over the years, summarizes the most important discussions about the operationalization of the primitives, and introduces a new distinction (DISJUNCTION) to the taxonomy to improve the descriptive adequacy of CCR. In addition, it reflects on the use of CCR as an annotation scheme in practice. The overall aim of the paper is to provide an overview of state-of-the-art CCR for discourse annotation that can form, together with the original 1992 proposal, a comprehensive starting point for anyone interested in annotating discourse using CCR.
Connectives and cue phrases are the most prototypical linguistic elements that signal coherence relations, but by limiting our attention to connectives, we are likely missing out on important other cues readers and listeners use when establishing coherence relations. However, defining the role of other types of linguistic elements in the signaling of coherence relations is not straightforward, and it is also not obvious why and how non-connective elements function as signals for coherence relations. In this paper, we aim to develop a systematic way of categorizing segment-internal elements as signals of coherence relations on the basis of a literature review and evidence from parallel corpora. We propose a three-way distinction between division of labor, agreement, and general collocation to categorize the different ways in which elements inside discourse segments interact with connectives in the marking of coherence relations. In each type of interaction, segment-internal elements can function as signals for coherence relations, but the mechanism behind it is slightly different for each type.
Coherence relations can be made linguistically explicit by means of connectives (e.g., but, because) or cue phrases (e.g., on the other hand, which is why), but can also be left implicit and conveyed through the juxtaposition of two clauses or sentences. However, it seems that not all relations are equally easy to reconstruct when they are implicit. In this paper, we explore which features of coherence relations make them more, or less, likely to be conveyed implicitly. We adopt the assumption that expected relations are more often implicit than relations that are not expected, and propose to determine a relation's expectedness using the notion of cognitive complexity. We test our hypotheses by means of a parallel corpus study, in which we analyze the translations of explicit English coherence relations from the Europarl Direct corpus into four target languages: Dutch, German, French, and Spanish. We find that cognitive complexity indeed influences the linguistic marking of coherence relations, and that this does not vary between the languages in our corpus. In addition, we find that a relation's relational and syntactic dependency also influences its linguistic marking, but that these measures are not completely independent of relation type.