Table 1 - uploaded by Julia Lavid
Source publication
In this paper we focus on the contrastive corpus annotation of certain aspects of the phenomena of Thematisation, on the one hand, and of Modality, on the other, in the framework of the CONTRANOT project, a research effort aimed at the creation and validation of contrastive functional descriptions through corpus analysis and annotation. Our most im...
Contexts in source publication
Context 1
... designed two annotation schemes, one for English and one for Spanish, including both coarse-grained and more fine-grained annotations. Table 1 below displays the English and the Spanish core tagsets which include six coarse-grained tags reflecting the range of possible thematic types which can occur as part of the thematic field in English and Spanish declarative clauses, both in news reports and in commentaries. Definitions and realisations of these tags for each language are provided in Appendix 1 (for English) and Appendix 2 (for Spanish) at the end of the paper. ...
Context 2
... and might were not included, and additional examples of the remaining verbs were introduced so that the total of 480 remained the same. This was done to focus on the modals which had proved problematic in the first experiment and on which consensus had been reached, as described above. The overall inter-annotator results obtained are summarized in Table 9 below. Tables 10 and 11 specify the agreement level obtained for each of the English and the Spanish modals, respectively. ...
Context 3
... The general proportion of agreement does not seem to change very significantly (there is not much difference between Table 9 and Table 5), but Tables 10 and 11 show that agreement varies across individual verbs. It can be seen that the level of agreement increases significantly for can and poder (present). ...
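Context 3 contrasts the overall proportion of agreement with agreement for individual verbs. The following is a minimal Python sketch of that distinction. It assumes agreement is measured as the simple share of items to which two annotators assign the same label. The verbs and the modality labels (dynamic, epistemic, deontic) below are hypothetical toy data, not the project's tagset or results.

```python
from collections import defaultdict


def proportion_of_agreement(pairs):
    """Share of items on which both annotators chose the same label."""
    if not pairs:
        raise ValueError("no items to compare")
    agreed = sum(1 for a, b in pairs if a == b)
    return agreed / len(pairs)


def agreement_by_verb(items):
    """Group (verb, label_a, label_b) triples by verb and score each group."""
    groups = defaultdict(list)
    for verb, label_a, label_b in items:
        groups[verb].append((label_a, label_b))
    return {verb: proportion_of_agreement(pairs) for verb, pairs in groups.items()}


# Hypothetical toy data: (modal verb, annotator A's label, annotator B's label).
# The labels are illustrative placeholders, not the CONTRANOT tagset.
items = [
    ("can", "dynamic", "dynamic"),
    ("can", "epistemic", "epistemic"),
    ("can", "dynamic", "epistemic"),
    ("must", "deontic", "deontic"),
    ("must", "deontic", "epistemic"),
]

overall = proportion_of_agreement([(a, b) for _, a, b in items])
per_verb = agreement_by_verb(items)
print(f"overall: {overall:.2f}")  # overall: 0.60
for verb, score in sorted(per_verb.items()):
    print(f"{verb}: {score:.2f}")  # can: 0.67, then must: 0.50
```

The pooled figure (0.60) hides the fact that one verb scores noticeably better than the other. This is why per-verb tables can reveal variation that an overall table does not.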
Citations
... The work developed has basically focused on English, although some cross-linguistic studies involving both European and non-European languages have emerged during the last decade. These include contrastive work between English and Spanish journalistic texts (Marín and Perucha 2006; McCabe 2007), consumer reviews (Mora 2011; Carretero and Taboada 2009, 2010a, 2010b) and other text types (Taboada, Carretero and Hinnel 2014; Lavid et al. 2014; Lavid, Carretero and Zamorano 2016). ...
Evaluation, opinion and subjectivity are related phenomena currently receiving attention in both the linguistic and the computational communities. One of the most influential theories of evaluation and subjectivity is the ‘Appraisal’ framework, which proposes that linguistic expressions of evaluative meanings such as emotion, attitude and opinion can be divided along three axes: Attitude, Engagement and Graduation. To date, no large-scale cross-linguistic work in either the computational or the linguistic community has set out to validate the different features of Appraisal Theory empirically through corpus annotation, or to investigate its application to a relatively new genre, namely mobile application reviews.
The work developed in this dissertation is an attempt to validate aspects of Appraisal Theory contrastively (i.e. comparing English and Spanish) and to provide a cross-linguistic characterisation of this new review genre in terms of Appraisal features. The categories, empirically validated with a high degree of inter-annotator agreement, are used to annotate a larger bilingual corpus. The results show interesting language-specific differences, such as greater use of straightforward strategies in the Spanish texts, as opposed to greater modulation of meanings in the English ones. In addition, some tendencies can be observed across different products (applications and games vs. books and films), as well as specific aspects typically included in negative reviews (Pseudo-Questions and Judgement) or in positive reviews (Capacity).
... As an illustrative example, I will refer to the phenomenon of thematisation in English and Spanish. It has been annotated both in English alone and contrastively (English-Spanish) by the author of this work and colleagues from the FUNCAP research group (see Lavid et al. 2014, 2013; Arús, Lavid and Moratón 2013). ...
Abstract
This lecture focuses on the main challenges researchers face when creating bilingual corpora that can be exploited from both a linguistic and a computational point of view. Using the MULTINOT project as an example, the paper describes the different stages in building a high-quality bilingual parallel corpus, annotated multidimensionally and diversified in terms of registers and genres. The corpus compiled in the project consists of original texts and translations in both directions. It has been designed as a multifunctional resource for disciplines such as contrastive linguistics, translation (including machine translation), computer-assisted language learning and terminology extraction.
Keywords: compilation, annotation, bilingual corpora, English-Spanish
1. Introduction
Natural Language Processing (NLP) has a growing need for high-quality corpora enriched with linguistic annotation at different levels and in different languages, and translation studies need linguistically interpreted parallel corpora. Despite this, there are no multifunctional corpora for the English-Spanish language pair whose features, in terms of preprocessing quality, register diversity and
This volume assesses the state of the art of parallel corpus research as a whole. It reports on recent developments in parallel corpora (with some particular references to comparable corpora as well) and on ways of exploiting them for a variety of purposes. The first part of the book is devoted to new roles that parallel corpora can and should assume in translation studies and in contrastive linguistics, to the usefulness and usability of parallel corpora, and to advances in parallel corpus alignment, annotation and retrieval. There follows an up-to-date presentation of a number of parallel corpus projects currently being carried out in Europe, some of them multimodal, with certain chapters illustrating case studies developed on the basis of the corpora at hand. In most of these chapters, attention is paid to specific technical issues of corpus building. The third part of the book reflects on specific applications and on the creation of bilingual resources from parallel corpora. This volume will be welcomed by scholars, postgraduate and PhD students in the fields of contrastive linguistics, translation studies, lexicography, language teaching and learning, machine translation, and natural language processing.
This chapter reports on the contrastive analysis of interpersonal discourse markers (IDMs) in a sample of English and Spanish newspaper texts in three genres: news reports, editorials and letters to the editor. The sample was divided into a training dataset of eighteen comparable English-Spanish texts and a larger dataset of 220 texts (60 news reports, 60 editorials and 100 letters to the editor). Following the methodology of Hovy & Lavid (2010), we present a preliminary annotation scheme validated by an inter-annotator agreement study. We then present the results of annotating the larger dataset, which reveal genre-related and language-specific variation in the distribution of IDMs in these newspaper genres. We discuss these results and offer some possible explanations for them.
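The chapter does not say which agreement coefficient its validation study used. One common choice is Cohen's kappa, which corrects observed agreement for the agreement two annotators would reach by chance. The following is a minimal sketch. It assumes two annotators and a single categorical label per item. The labels `idm` and `other` are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and
    p_e is the chance agreement implied by each annotator's label distribution.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equally long, non-empty label sequences")
    n = len(labels_a)
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1:
        return 1.0  # both annotators used one identical label for every item
    return (p_o - p_e) / (1 - p_e)


# Hypothetical labels: whether each candidate expression was tagged as an IDM.
annotator_a = ["idm", "idm", "other", "other", "idm", "other"]
annotator_b = ["idm", "idm", "other", "idm", "idm", "other"]
kappa = cohens_kappa(annotator_a, annotator_b)
print(f"kappa = {kappa:.3f}")  # kappa = 0.667
```

Here raw agreement is 5/6, but kappa is only 2/3, because half of the agreement would be expected by chance given the two label distributions. This gap is why chance-corrected figures are often reported alongside the simple proportion.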
This thesis, drawing on insights from Appraisal Theory, Pattern Grammar and Corpus Linguistics, explores the association between grammar patterns and attitudinal meanings. Particular attention is paid to adjective complementation patterns and Judgement, i.e. the ethical evaluation of human behaviour and character. Using a corpus of biographical discourse, this study addresses four research questions: 1) whether the current JUDGEMENT system is sufficiently comprehensive and systematic to deal with the Judgement resources identified in this corpus, 2) what insights a detailed scrutiny of adjective-in-pattern exemplars can offer into the description and characterisation of attitudinal resources, 3) how local grammars of evaluation can be developed with the help of grammar patterns, and 4) what local grammars of evaluation may be useful for. It is suggested that the original JUDGEMENT system should be refined so that it can deal effectively with the Judgement resources found. Drawing on evidence from both personality psychology and corpus analysis, Emotivity is proposed as a new sub-type of Judgement to account for those resources which construe attitudes towards emotional types of personality traits. The examination of adjective-in-pattern exemplars in terms of Attitude shows that grammar patterns are of limited use in distinguishing types of attitudinal meanings, but are a very useful heuristic for investigating attitudinal resources. Further, it is demonstrated that grammar patterns are a good starting point for the construction of local grammars of evaluation, as exemplified by the local grammar of Judgement developed in the current study. Lastly, it is argued that local grammars of evaluation in theory provide an alternative way to model attitudinal meanings, and in practice offer some insights into the automation of appraisal analysis. Other related issues (e.g. local grammar analyses of some special cases, replicability of the methodology) are also discussed.
This paper outlines current work on the construction of a high-quality, richly annotated and register-diversified parallel corpus for the English-Spanish language pair, carried out within the framework of the MULTINOT project. The corpus consists of original and translated texts in both directions. It is designed as a multifunctional resource for a number of disciplines, such as corpus-based contrastive linguistics and translation studies, machine translation, computer-assisted translation, computer-assisted language learning and terminology extraction. The paper describes the structure of the corpus, which includes four subcorpora: English originals (EO), Spanish originals (SO), English translations (Etrans) and Spanish translations (Strans). It also describes the registers selected for inclusion in the corpus and the methodology used to guarantee the quality of the processing steps that enrich the corpus with linguistic information at different levels.
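The four-way subcorpus split follows from two properties of each text: its language and whether it is an original or a translation. The following is a minimal sketch of that mapping. The `Text` record, its fields and the `ALIGNED_WITH` table are hypothetical. The table assumes that, because translations run in both directions, each English translation is aligned with a Spanish original and each Spanish translation with an English original.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Text:
    """Hypothetical record for one corpus text."""
    text_id: str
    language: str         # "en" or "es"
    is_translation: bool


# The four subcorpus labels named in the paper, keyed by (language, is_translation).
SUBCORPUS = {
    ("en", False): "EO",
    ("es", False): "SO",
    ("en", True): "Etrans",
    ("es", True): "Strans",
}

# Assumption: translations in each direction are aligned with the originals
# written in the other language.
ALIGNED_WITH = {"Etrans": "SO", "Strans": "EO"}


def subcorpus(text: Text) -> str:
    """Return the subcorpus label for a text, rejecting unsupported languages."""
    key = (text.language, text.is_translation)
    if key not in SUBCORPUS:
        raise ValueError(f"unsupported language: {text.language!r}")
    return SUBCORPUS[key]


# Hypothetical texts, one per subcorpus.
texts = [
    Text("t1", "en", False),
    Text("t2", "es", True),
    Text("t3", "es", False),
    Text("t4", "en", True),
]
labels = [subcorpus(t) for t in texts]
print(labels)  # ['EO', 'Strans', 'SO', 'Etrans']
```

Keying on the pair (language, translation status) rather than on free-form labels keeps the four subcorpora mutually exclusive by construction.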