Article

Text Structure in a Contrastive and Translational Perspective

Article
This paper investigates text-level differences in exo- and endocentric languages. It takes its point of departure in Italian and Danish anaphoric relations, distinguishing between coreferential, associative and resumptive anaphors and their distribution. The text-informational value of the anaphors shows parallel characteristics to the lexical distribution of informational “weight” in exo- and endocentric nouns and verbs, and the anaphor distribution also reveals important typological differences in exo- and endocentric text structure. I argue that several of these structural differences may be accounted for by differences in exo- and endocentric “cognitive schemas”, caused by differences in the lexical systems. Very likely, socio-cultural differences between the speech communities also play a role in the difference between a more complex and hierarchic exocentric text structure and a simpler and more linear endocentric one.
Article
[No abstract is available for this article]
Technical Report
Full-text available
abstraction:instance whole:part process:step object:attribute

It is also anomalous in another way: the widely used pattern of presenting a problem and its solution does not occur in this text.

The Conditional Schema --- 6;
This second use of the Conditional schema is unusual principally because the condition (clause 7) is expressed after the consequence (clause 6). This may make the consequence more prominent or make it seem less uncertain.

The Justify Schema --- 9;
The writer has argued his case to a conclusion, and now wants to argue for this unpopular conclusion again. To gain acceptance for this tactic, and perhaps to show that a second argument is beginning, he says "Let's be clear". This is an instance of the Justify schema, shown in Figure 2. Here the satellite is attempting to make acceptable the act of expressing the nuclear conceptual span.

The Concessive Schema --- 10;
The writer again employs the Concessive schema, this time to show that favoring the NFI is consistent with voting against having CCC endorse it. In clause 10, the writer concedes that he personally favors the NFI.

The Thesis/Antithesis Schema --- 11; 12
The writer states his position by contrasting two actions: CCC endorsing the NFI, which he does not approve, and CCC acting on matters of process, which he does approve.

The Mechanisms of Descriptive RST
In the preceding example we have seen how rhetorical schemas can be used to describe text. This section describes the three basic mechanisms of descriptive RST which have been exemplified above: Schemas, Relation Definitions, and Schema Application Conventions.
Chapter
Full-text available
The papers collected in this volume (including a comprehensive introduction) investigate semantic and discourse-related aspects of subordination and coordination, in particular the relationship between subordination/coordination at the sentence level and subordination/coordination – or hierarchical/non-hierarchical organization – at the discourse level. The contributions in part I are concerned with central theoretical questions; part II consists of corpus-based cross-linguistic studies of clause combining and discourse structure, involving at least two of the languages English, German, Dutch, French and Norwegian; part III contains papers addressing specific – predominantly semantic – topics relating to German, English or French; and the papers in part IV approach the topic of subordination, coordination and rhetorical relations from a diachronic (Old Indic and Early Germanic) perspective. The book aims to contribute to a better understanding of information packaging at the sentence and text levels, and of how the two are related, within a particular language as well as cross-linguistically.
Chapter
Full-text available
Traditionally the study of syntax is restricted to the study of what goes on within the boundaries of the prosodic sentence. Although the nature of clause combining within a prosodic sentence has always been a central concern of traditional syntax (in GG, e.g. it underlies important research on deletion and anaphora), work within a discourse analysis framework has hardly been done. Analyses like this are given in the present volume.
Article
Full-text available
The author proposes a contrastive analysis intended to determine how information packaging is realized structurally in various languages. The analysis shows that English and many other Germanic languages mainly exploit intonation for informational purposes, whereas other languages, such as Catalan, rely instead on syntax, and that in yet other groups of languages the primary structural correlate is morphological. The author thus attempts to identify a set of primitives related to information packaging, applies them to cross-linguistic facts, and examines how they interact with other structural, pragmatic and semantic factors.
Chapter
Full-text available
This contribution has three main goals. Firstly, so as to be able to accommodate classifying modifiers, I will propose a revised model of the noun phrase, which consists of five concentric layers of modification rather than four, as it did in the previous model (section 2). It will also be claimed that discourse-referential modifiers, which are specified at the Interpersonal Level, only relate to the status of the referent in the world of discourse (Rijkhoff fc. a; Rijkhoff and Seibt 2005). That is to say, attitudinal modifiers of the kind proposed in Hengeveld (2004b) (see also Hengeveld this volume; Butler this volume) are not deemed to have their own slot in the layered model of the noun phrase defended here. Secondly, I will propose some modifications regarding the contextual component with special attention to noun phrases and argue that external reality (the “context of situation”) needs to be represented by a separate component and that each component of the FDG model constitutes a different kind of context (section 3). A single rule will be proposed to capture the influence of any contextual factor on the form, function or meaning of a linguistic expression. Section 4, finally, argues that in the current FDG model the Interpersonal Level in the grammatical component contains elements that actually belong to the external component (or “E-context”) and proposes alternative schemas for the NP and the clause without variables for Speaker and Addressee. In this proposal the descriptive modifiers (i.e. classifying, qualifying, quantifying, and localizing modifiers) are specified at the Representational Level (“language as carrier of content”). The Interpersonal Level is regarded as that part of the grammatical component that is concerned with “language as exchange” and accommodates three layers of modification, one for things or events, one for propositions, and one for clauses.
Article
Full-text available
Taking translation mismatches between (clause or VP) coordination and non-coordinated structures (sentence sequences and syntactic subordination) as an observational point of departure, we discuss the interpretation of coordinated structures and their alternatives with a view to the relative discourse salience of (the units corresponding to) the conjuncts. We show that coordination is used somewhat differently in Norwegian than in German and English, in particular, that syntactic coordination seems to be compatible with some discourse relations in Norwegian that are blocked in German or English. We argue that this might challenge the cross-linguistic validity of the definition of discourse relations in theories like SDRT or RST. In particular, this concerns the distinction between coordinating (SDRT) or multinuclear (RST) and subordinating (SDRT) or nucleus-satellite (RST) relations and the diagnostic value of the coordination marker and (or its counterparts) as a signal of discourse coordination. We conclude that a more refined approach to discourse structure may be needed to account for the mapping of discourse relations onto syntactic constructions and lexical items across languages.
Article
Full-text available
The present paper is concerned with theoretical and practical aspects of paraphrasing or translating (German) texts showing a relatively high degree of syntactic complexity and informational density into (Norwegian) texts characterized by a less complex, more paratactical style, and vice versa. The theoretical setting is (segmented) discourse representational theory (Kamp and Reyle 1993; Asher 1993), which allows a whole family of informationally equivalent texts to be represented by one discourse representation structure and thus opens the way for a theoretical explication of the notion of (relative) informational density. It is shown that paraphrasing/translating a "hypotactical" into a "paratactical" text is governed by two principles, information splitting and discourse structure fidelity, that are, to a certain degree, in conflict with each other. The more information splitting is done, the more difficult it will be to reconstruct the segmented discourse representational structure (SDRS) of the original text, that is, the overall discourse/text structure in the more traditional sense. Translation from "paratactical" into "hypotactical" texts calls for information collecting instead of information splitting; the main difficulty lies in assigning an SDRS to the text and determining which part of the information given in the text should be syntactically downgraded and how that should be done.
Article
Full-text available
Most linguists who have investigated linguistic categories from a universal viewpoint have accepted the existence of two basic parts of speech, NOUN and VERB. Other categories are found to be only inconsistently represented; thus ADJECTIVE is manifested in many languages as a class of stative verb. Furthermore, individual languages often have intermediate categories such as GERUND, which cannot be unambiguously assigned to a single category. We suggest here that the basic categories N and V are to be viewed as universal lexicalizations of the prototypical discourse functions of 'discourse-manipulable participant' and 'reported event', respectively. We find that the grammars of languages tend to label the categories N and V with morpho-syntactic markers which are iconically characteristic of these categories to the degree that a given instance of N or V approaches its prototypical function. In other words, the closer a form is to signaling this prime function, the more the language tends to recognize its function through morphemes typical of the category, e.g. deictic markers for N, tense markers for V. We conclude by suggesting that categoriality itself is another fundamental property of grammars which may be directly derived from discourse function.
Article
Full-text available
In recent proposals for the crucial internal structure of the framing Contextual Component within Functional Discourse Grammar (Hengeveld & Mackenzie 2006, 2008), for example by Rijkhoff (2008:88-97) and Connolly (2007), what is here called text is considered as equivalent to discourse within an account of the NP (Rijkhoff) or of context (Connolly). The article purports to show that this conflation of text and discourse is not adequate to the task of describing and accounting satisfactorily for discourse-anaphoric reference in actual texts, in particular, and that a principled distinction between the two is needed. Discourse anaphora is a particularly good diagnostic of context, since it clearly involves a (co-)textual dimension, but also a discourse one, relating to the world of referents, properties and states of affairs. The context relevant for a given act of utterance is in constant development: the discourse derived via the text both depends on the context and at the same time changes it as the discourse is constructed on line. So both the (co-)text and the discourse (a provisional, hence revisable, interpretation of the preceding co-text and/or context), as well as the anchoring situation of utterance, must be represented within the Contextual Component within an FDG representation of a given communicative event.
Chapter
Full-text available
In this introductory chapter, Granger traces the development of Contrastive Linguistics and Translation Studies over the last decades to the present day, focusing on the role of the computer corpus in giving new impetus to each field and bringing them closer together. She discusses the different types of monolingual and multilingual corpora being used in CL and TS research, proposing at the same time a common corpus terminology. She then relates the contribution of the different corpus types to the major research interests of each discipline, highlighting the complementarity of the research and calling for increased cross-fertilization and resource-pooling. She then examines the practical issues of corpus exploitation, with a review of corpus analysis tools of particular value for CL and TS and finally, the contribution of corpus-based CL and TS research in the teaching of foreign languages and translation.

0. Contrastive Linguistics and Translation Studies, two converging disciplines

Although the disciplines of Contrastive Linguistics (CL) and Translation Studies (TS) cover partly common ground, it is only recently, with the emergence of corpora, that they have started to converge. This rapprochement is apparent from recent publications and conferences that have brought together specialists from the two fields, bearing witness to the vitality of multilingual studies in general. The history of Contrastive Linguistics has been characterized by a pattern of success-decline-success. CL was originally a purely applied enterprise, aiming to produce more efficient foreign language teaching methods and tools. Based on the general assumption that difference equals difficulty, CL, which in those days was called Contrastive Analysis (CA), consisted in charting areas of similarity and difference between languages and basing the teaching syllabus on the contrastive findings.
Advances in the understanding of Second Language Acquisition (SLA) mechanisms led to a questioning of the very basis of CA. Interlingual factors were found to be less prevalent than other factors, including intralingual mechanisms such as the overgeneralization of target rules, external factors such as the influence of teaching methods, and personal factors like motivation. This led to the decline of CA, but not to its death. At first, it gave rise to some drastic pedagogical decisions, which in some cases culminated in a total ban of the mother tongue in FL teaching. But research (see Odlin 1989, Selinker 1992, James 1998) re-established transfer as a major - if not the major - factor in SLA, which in turn led to a progressive - albeit limited - return of contrastive
Article
Full-text available
The use and nature of clause combining in natural discourse are explored in this paper. First, a theory of text structure, Rhetorical Structure Theory, is introduced and illustrated for a number of short texts. Then, it is shown how the grammar of clause combining can be explained in terms of the structuring of text. The paper focuses on one particular way of combining clauses and shows how it is used to express a nuclear-satellite structuring of text identified by Rhetorical Structure Theory. Keywords: clause combining, discourse structure, functional grammar, natural language processing, Rhetorical Structure Theory, text generation, text linguistics.
Article
Full-text available
Rhetorical Structure Theory is a descriptive theory of a major aspect of the organization of natural text. It is a linguistically useful method for describing natural texts, characterizing their structure primarily in terms of relations that hold between parts of the text. This paper establishes a new definitional foundation for RST. The paper also examines three claims of RST: the predominance of nucleus/satellite structural patterns, the functional basis of hierarchy, and the communicative role of text structure.
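The nucleus/satellite pattern named in this abstract can be pictured as a small tree structure. The following is only an illustrative sketch: the class names `Span` and `Relation` and the helpers `nuclear_text` and `count_spans` are hypothetical and are not taken from the paper, and multinuclear relations are deliberately left out.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Span:
    """A leaf: one stretch of text (hypothetical name, not from the paper)."""
    text: str


@dataclass
class Relation:
    """A named relation holding between one nucleus and its satellites."""
    name: str
    nucleus: Union["Span", "Relation"]
    satellites: List[Union["Span", "Relation"]] = field(default_factory=list)


def nuclear_text(node):
    """Follow the nucleus at every level until a text span is reached."""
    while isinstance(node, Relation):
        node = node.nucleus
    return node.text


def count_spans(node):
    """Count the text spans covered by a node, nucleus and satellites alike."""
    if isinstance(node, Span):
        return 1
    return count_spans(node.nucleus) + sum(count_spans(s) for s in node.satellites)


# Toy analysis with placeholder text: a Concession nested inside a Justify.
inner = Relation("Concession", nucleus=Span("main claim"),
                 satellites=[Span("conceded point")])
outer = Relation("Justify", nucleus=inner,
                 satellites=[Span("framing remark")])
```

Here `nuclear_text(outer)` descends through both relations to the innermost nucleus, which is one simple way to read hierarchy off a nucleus/satellite analysis.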
Conference Paper
Full-text available
This paper shows that it is very often possible to identify the source language of medium-length speeches in the EUROPARL corpus on the basis of frequency counts of word n-grams (87.2%-96.7% accuracy depending on classification method). The paper also examines in detail which positive markers are most powerful and identifies a number of linguistic aspects as well as culture- and domain-related ones.
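To make "frequency counts of word n-grams" concrete, here is a toy sketch. It is not the classification method used in the paper, which the abstract does not specify: the functions `word_ngrams`, `ngram_counts` and `guess_source`, the overlap scoring, and the two tiny profiles are all assumptions made for illustration.

```python
from collections import Counter


def word_ngrams(text, n):
    """Return the word n-grams of a text as tuples (lowercased, whitespace-split)."""
    tokens = text.lower().split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts(texts, n=2):
    """Build an n-gram frequency profile from a list of texts."""
    counts = Counter()
    for text in texts:
        counts.update(word_ngrams(text, n))
    return counts


def guess_source(text, profiles, n=2):
    """Pick the label whose profile gives the text's n-grams the highest total count.

    This overlap score is an assumption for illustration, not the paper's classifier.
    """
    grams = word_ngrams(text, n)
    scores = {label: sum(profile[g] for g in grams)
              for label, profile in profiles.items()}
    return max(scores, key=scores.get)


# Invented training snippets standing in for speeches translated from two languages.
profiles = {
    "A": ngram_counts(["i am of the opinion that", "of the opinion that we"]),
    "B": ngram_counts(["it is necessary that we", "it is necessary to"]),
}
label = guess_source("we are of the opinion", profiles)
```

A real experiment would need far larger profiles and a proper classifier; the sketch only shows how n-gram counts can act as markers of a source language.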
Conference Paper
Full-text available
We describe a method for incorporating syntactic information in statistical machine translation systems. The first step of the method is to parse the source language string that is being translated. The second step is to apply a series of transformations to the parse tree, effectively reordering the surface string on the source language side of the translation system. The goal of this step is to recover an underlying word order that is closer to the target language word-order than the original string. The reordering approach is applied as a pre-processing step in both the training and decoding phases of a phrase-based statistical MT system. We describe experiments on translation from German to English, showing an improvement from 25.2% Bleu score for a baseline system to 26.8% Bleu score for the system with reordering, a statistically significant improvement.
Conference Paper
Full-text available
We present the second version of the Penn Discourse Treebank, PDTB-2.0, describing its lexically-grounded annotations of discourse relations and their two abstract object arguments over the 1 million word Wall Street Journal corpus. We describe all aspects of the annotation, including (a) the argument structure of discourse relations, (b) the sense annotation of the relations, and (c) the attribution of discourse relations and each of their arguments. We list the differences between PDTB-1.0 and PDTB-2.0. We present representative statistics for several aspects of the annotation in the corpus.
Chapter
This volume of new work explores the forms and functions of serial verbs. The introduction sets out the cross-linguistic parameters of variation, and the final chapter draws out a set of conclusions. These frame fourteen explorations of serial verb constructions and similar structures in languages from Asia, Africa, North, Central and South America, and the Pacific. Chapters on well-known languages such as Cantonese and Thai are set alongside the languages of small hunter-gatherer and slash-and-burn agriculturalist groups. A serial verb construction (sometimes just called serial verb) is a sequence of verbs which acts together as one. Each describes what can be conceptualized as a single event. They are monoclausal; their intonational properties are those of a monoverbal clause; they generally have just one tense, aspect, mood, and polarity value; and they are an important tool in cognitive packaging of events. Serial verb constructions are a pervasive feature of isolating languages of Asia and West Africa, and are also found in the languages of the Pacific, South, Central and North America, most of them endangered. Serial verbs have been a subject of interest among linguists for some time. This outstanding book is the first to study the phenomenon across languages of different typological and genetic profiles. The authors, all experienced linguistic fieldworkers, follow a unified typological approach and avoid formalisms. The book will interest students, at graduate level and above, of syntax, typology, language universals, information structure, and language contact in departments of linguistics and anthropology.
Article
This paper investigates the use and function of the apposition. On the basis of a general distinction between “indicative” constituents (subjects, objects and complements of prepositions, whose text pragmatic function is to indicate - introduce or reiterate - text referents) and “predicative” constituents (predicatives and attributives, which describe - ascribe properties to - referents designated by indicative constituents), it is argued that the apposition belongs to the latter group. It can therefore never be referential or co-referential. The paper also examines the frequency of all the apposition types encountered in a corpus of written and oral Italian and Danish texts. The distribution found confirms the general predominance of nominalising structures in Romance over Germanic languages, and in written over oral texts.
Book
Why is a raven like a writing-desk? The concept of similarity lies at the heart of this new book on contrastive analysis. Similarity judgements depend partly on properties of the objects being compared, and partly on what the person judging considers to be relevant to the assessment; similarity thus has both objective and subjective aspects. The author shows how contrastive analysis and translation theory make use of the concept in different ways, and explains how it relates to the problematic notions of equivalence and tertium comparationis. The book then develops a meaning-based contrastive methodology, and outlines one theory of semantic structure which can be used in this methodology. The approach is illustrated with four sample studies covering different kinds of phenomena in some European languages. The final part of the book proposes an extension of the theoretical framework to cover contrastive rhetoric: the aim is to suggest a unified approach linking aspects of semantics, pragmatics and rhetoric. Keywords: similarity, contrastive analysis, functional grammar, semantics, rhetoric, translation.
Chapter
This paper presents a method for the analysis of connected speech (or writing). The method is formal, depending only on the occurrence of morphemes as distinguishable elements; it does not depend upon the analyst’s knowledge of the particular meaning of each morpheme. By the same token, the method does not give us any new information about the individual morphemic meanings that are being communicated in the discourse under investigation. But the fact that such new information is not obtained does not mean that we can discover nothing about the discourse but how the grammar of the language is exemplified within it. For even though we use formal procedures akin to those of descriptive linguistics, we can obtain new information about the particular text we are studying, information that goes beyond descriptive linguistics.
Book
This book sheds new light on Appositive Relative Clauses (ARCs), a structure that is generally studied from a merely syntactic point of view, in opposition to Determinative (or Restrictive) Relative Clauses (DRCs). In this volume, ARCs are examined from a discourse/pragmatic point of view, independently of DRCs, in order to provide a positive definition of the structure. After a presentation of the morphosyntactic, semantic and pragmatic characteristics of ARCs, a taxonomy of their functions in discourse is established for both written and spoken English based on the results of a corpus-based investigation. Constraints are then defined within an information-packaging approach to syntactic structures to show why speakers choose ARCs over other competing allostructures, i.e. syntactic structures that fulfil similar discourse functions (e.g. nominal appositives, independent clauses, adverbials, noun premodifiers, topicalization). The end result is a deeper understanding of the richness of ARCs in their natural contexts of use. Details available at: http://www.benjamins.com/cgi-bin/t_bookview.cgi?bookid=SiDaG%2022
Book
Written by a leading researcher in the field, this fascinating examination of the relations between grammar, text, and discourse is designed to provoke critical discussion on key issues in discourse analysis which are not always clearly identified and examined. It continues the enquiry into discourse analysis that Zellig Harris initiated 50 years ago, which raised a number of problematic issues that have remained unresolved ever since. It introduces the notion of pretext as an additional factor in the general interpretative process, and focuses attention specifically on the work of critical discourse analysis (CDA) in light of the issues discussed.
Article
In this paper we consider the crucial problem of providing ‘coherence relation’ analyses for natural texts. Among the several accounts currently being pursued, claims of broad compatibility usually combine simultaneously with a lack of detailed consensus. To improve on this situation, we take a selection of approaches from the current state of the art—ranging from theories rooted in functional linguistics to formal semantic discourse theory—and apply them to the concrete task of text analysis. Contrasting these analysis styles discloses a range of information assumed by the distinct approaches. We organize this information according to principles of linguistic “stratification,” “metafunction,” and “paradigmatic/syntagmatic axiality” in order to provide a bi‐stratal, three‐way classification system from which individual discourse structure relations can be motivated. This provides a more effective decomposition of the space of “discourse structure information” which at the same time synthesizes most previous views. We illustrate some of the benefits of the specification for linguistic analysis and point out problematic areas for future formal and computational specification.
Article
Michael Herslund, Le participe présent comme co-verbe. The present article proposes an analysis of the French present participle (the -ant form) as a converb, i.e. as a non-finite verb form which forms a secondary predication with a primary verb and the subject of this verb. Contrary to the gérondif (the en -ant form), which always denotes a situation which is contemporary with but distinct from the situation denoted by the primary verb, the present participle as a converb denotes either the same situation as the primary verb or a salient aspect of it. In the verbal couple, verb + converb, the converb most often carries the most important information, cf. the prototypical example: Elle s'est levée mettant fin à l'entretien.
Article
The author proposes a contrastive study of how the categories of nouns and verbs function in French, Danish and English. For verbs, the author observes that in motion verbs French includes a semantic component DIRECTION or PATH, whereas the Danish verb instead carries a MANNER component. English turns out to be an intermediate case. For nouns, the French noun preferentially lexicalizes a CONFIGURATION component, whereas Danish selects FUNCTION. From these observations, the author distinguishes two major language types: (1) endocentric, where the lexical weight is located at the centre of the proposition; (2) exocentric, where information converges on the peripheral units (e.g. the arguments of the verb).
Article
It is planned to publish this paper in three parts, in this and the two subsequent issues of the Journal of Linguistics. The three parts will consist respectively of the numbered sections 1–3, 4–7 and 8–10; references to section 4 onwards are thus to forthcoming parts of the paper. Sections 1–3 contain observations concerning transitivity; 4–7 deal with what is here referred to as ‘theme’, a general term for all those choices involving the distribution of information in the clause; in 8–10, transitivity is reconsidered in the light of certain further problems and of what has been said about theme, and some generalization is attempted.
Article
Structured meanings have evolved as a well-suited tool to describe the semantics of focus constructions (cf. von Stechow 1990; Jacobs 1991; Krifka 1992). In this paper, I will show how structured meanings can be combined with a framework of dynamic interpretation that allows for a cogent expression of anaphoric relations and presuppositions. I will concentrate in particular on the semantics of the focusing particle only and discuss several phenomena that have gone unnoticed or unsolved so far, for example the introduction of discourse markers in the scope of only and alternatives that are anaphorically related to quantifiers. In particular, I will show that the proposed representation format can handle sentences with multiple occurrences of focusing particles. The paper also includes a discussion of the behavior of negation with respect to presuppositions, and of principles that govern the interpretation of focus on quantified NPs.
Article
Explicitness or implicitness as assumed properties of translated texts and other texts in multilingual communication have for some time been the object of speculation and, at a later stage, of more systematic research in linguistics and translation studies. This paper undertakes an investigation of explicitness/implicitness and related phenomena of translated texts on the level of cohesion. A corpus-based research architecture, embedded in an empirical research methodology, will be outlined, and first results and possible explanations will be discussed. The paper starts with a terminological clarification of the concepts of 'explicitness' and 'explicitation' in terms of dependent variables to be investigated. The two terms, and their usage by other scholars, will be discussed. An electronic corpus will then be described which provides the empirical data and techniques for information extraction. For the investigation carried out using our corpus, indicators will then be derived on the basis of which operationalizations and hypotheses can be formulated for patterns of explicitation occurring between source and target texts. Some initial results relating to cohesive explicitness and explicitation in the data will be presented and discussed, with particular attention being paid to the areas of 'reference', 'substitution', 'ellipsis', 'conjunction', and 'lexical cohesion'. First attempts will also be made at explaining the findings.
Article
We collected a corpus of parallel text in 11 languages from the proceedings of the European Parliament, which are published on the web. This corpus has found widespread use in the NLP community. Here, we focus on its acquisition and its application as training data for statistical machine translation (SMT). We trained SMT systems for 110 language pairs, which reveal interesting clues into the challenges ahead.
Chapter
We describe our experience in developing a discourse-annotated corpus for community-wide use. Working in the framework of Rhetorical Structure Theory, we were able to create a large annotated resource with very high consistency, using a well-defined methodology and protocol. This resource is made publicly available through the Linguistic Data Consortium to enable researchers to develop empirically grounded, discourse-specific applications. Keywords: discourse, corpus, annotation, rhetorical structure.
Article
This paper studies the distinction between subordinating and coordinating discourse relations, a distinction that governs the hierarchical structure of discourse. We provide linguistic tests to clarify which discourse relations are subordinating and which are coordinating. We argue that some relations are classified as subordinating or coordinating by default, a default that can be overridden in specific contexts. The distinction between subordinating and coordinating relations thus belongs to the level of information packaging in discourse and not to the level of information content or the semantics of the relations themselves.
Article
This chapter outlines what tools are used in corpus linguistics (CL) and how they can be applied, then discusses the benefits of employing corpus linguistics methods in the analysis of intercultural encounters. It is argued that intercultural studies can benefit from the application of corpus methods in terms of improving rigor and reducing perceived arbitrariness, specifically through the creation and analysis of smaller specialized corpora. CL has been effectively employed in cross-cultural comparisons. There are three necessary components in CL: a researcher, the corpus data stored in electronic form on a computer, and corpus software. The chapter also discusses the compatibility of corpus methods with other methods, and examines the issue of empiricism in relation to corpus linguistics. Interview and other background data were referred to during analysis. CL is also useful in pinpointing the absence of something, which can then be discussed with the analyzed discourse community.