Chapter

Detección, descripción y contraste de las unidades fraseológicas mediante tecnologías lingüísticas [Detection, description and contrast of phraseological units by means of language technologies]


Abstract

This chapter addresses recent developments in computational phraseology, a field that encompasses both the processing of multiword expressions and the application of NLP techniques to the analysis of phraseological units (PUs). As a backdrop, it starts from the place phraseology occupies in the different proficiency levels of the CEFR and highlights the possibilities that language technologies offer for resolving open issues such as the detection, selection and analysis of PUs. It then reviews the types of units most frequently studied by phraseology and by multiword expression processing, in order to build bridges between the two disciplines. To this end, it presents a corpus methodology with a pragmatic-discursive orientation that allows an exhaustive analysis of PUs, with special reference to their formal boundaries and residual cores, as well as to phraseological productivity and creativity. The proposed methodology is then extended to accommodate the comparison and contrast of phraseological units with a view to their translation. The work draws on a wide range of NLP applications, including online corpora, web-based corpus management systems, knowledge databases and CLIR systems. Lastly, the chapter advocates a pragmatic-technological approach to phraseology based on the analysis of real data.
Keywords: computational phraseology, NLP, corpus-based approaches, phraseological unit, multiword expression.
... In addition, there are certainly heterogeneous phraseological challenges involved in translating any fantastic work [36]. For that reason, mastery of phraseology is crucial for literary translators, who face numerous challenges that test their skills [41]. ...
... Indeed, they hold the same view as the participants in these studies, i.e., they truly believe that NMT systems are still unable to handle all literary challenges, although not all of them were entirely opposed to these systems. This partial acceptance might be due not only to the current challenges that NMT systems face in dealing with literary texts [19], but also to the difficulty of translating MWEs [41,49], especially discontinuous or manipulated MWEs [36]. ...
Article
In the digital era, the (r)evolution of neural machine translation (NMT) has reshaped both the market and translators' workflow. However, the adoption of this technology has not fully reached the creative field of literary translation. Against this background, this study aims to explore to what extent NMT systems can be used to translate the creative challenges posed by idioms, specifically manipulated multiword expressions (MWEs) found in literary texts. To carry out this pilot study, five manipulated MWEs were selected from a fantasy novel and machine-translated (English > Spanish) by four NMT systems (DeepL, Google Translate, Bing Translator, and Reverso). Each NMT output, as well as a human translation (HT), was then assessed by six professional literary translators using a human evaluation sheet. Based on these results, the creativity of each translation method was calculated. Despite the satisfactory performance of both DeepL and Google Translate, HT creativity was markedly superior for almost all manipulated MWEs. This paper not only contributes to the ongoing study of NMT applied to literature but is also, to the best of our knowledge, one of the few studies to delve into the almost unexplored field of assessing creativity in neural machine-translated MWEs.
... In other words, we attempt to elucidate English speakers' mental, linguistic and non-linguistic representations of irony. Among these representations, we will pay special attention to phraseological units, which, in line with authors in the field of phraseology, serve as an umbrella term (Corpas Pastor, 2013a) for ready-made recurrent units of two or more lexical elements (e.g., collocations, idiomatic expressions) that are cognitively salient across languages (Corpas Pastor, 1996, 2003, 2013b; Naciscione, 2010). Furthermore, we aim to offer informed options for a follow-up study on irony teaching with American-English learners of Spanish (Martín-Gascón, 2023). ...
Chapter
The investigation of phraseology through corpus-based and computational approaches holds significant relevance for various professionals, including translators, interpreters, terminologists, lexicographers, language instructors, and learners. Computational Phraseology, and in particular the computational analysis of multiword expressions (also known as multiword units), has gained prominence in recent years and is essential for a number of Natural Language Processing and Translation Technology applications. The failure to detect these units automatically could result in incorrect and problematic automatic translations and could hinder the performance of applications such as text summarisation and web search. Against this background, the volume offers 13 articles carefully selected and organised into two parts: ‘Computational treatment of multiword units’ and ‘Corpus-based and linguistic studies in phraseology’. The contributions not only highlight the latest advancements in computational and corpus-based phraseology but also reiterate its vital role in all areas of language technologies, including basic and applied research.
... The interweaving of disciplines such as Corpus Linguistics (the branch of linguistics that studies data obtained from corpora), Natural Language Processing (NLP) (the branch of Artificial Intelligence that helps machines understand and process spoken and written human language) or Computational Linguistics (the area of NLP that studies the development of linguistic applications using computational technologies) has had a significant impact on Translation and Interpreting Studies (González Fernández, 2018; Corpas Pastor et al., 2021). Against this background, translators can draw on a range of ICT tools, such as corpora, online glossaries, repertoires, encyclopaedias or databases, spell checkers, online monolingual or bilingual dictionaries, revision tools, parallel texts, or lexicons, among others (Merlo Vega, 2005; Biau Gil and Pym, 2006; Corpas Pastor, 2013; Surià López, 2014; Bowker and Corpas Pastor, 2022; Rothwell et al., 2023). These resources can be used to uncover relevant information about a particular author or the socio-cultural context of the (literary) text, as well as the countless nuances that give the text its coherence. ...
Article
This article compares the output of three neural machine translation systems (Google Translate, DeepL, and Phrase TMS) with human translation (by undergraduate students, English into Spanish). It focuses on five formal neologisms extracted from literary texts, thereby addressing creativity as well as technology adoption and training.
... This first category comprises those PUs in the source language that were translated by other PUs in the target language. Following Corpas (2013), a distinction can be drawn in translation between full phraseological equivalence and partial phraseological equivalence. In the first case, the target text (the subtitles) uses a unit that is fully equivalent to the source-text sequence, that is, one that belongs to the same class and has the same lexical-semantic meaning, the same register and the same structure, as in the examples in Table 2, where we find fully equivalent PUs in both the Spanish source dialogue and the Russian subtitles. ...
Article
One of the difficulties translators face when translating film dialogue is how to convey colloquial language while ensuring the orality and comprehensibility of the metatext. According to Hatim and Mason (2005), pragmatic factors, which depend on the communicative context, and cultural factors, which are characterised by each person's particular way of perceiving the world through the parameters of his or her cultural community, come together in the activity of translation. The present research focuses on the role of phraseology in audiovisual translation in the Russian/Spanish language combination. The methodology is based on the analysis of concrete examples of idiomatic phrases subtitled into Russian by fansubbers. The methodological approach to the fansub subtitling of phraseologisms involves a careful selection of relevant idiomatic units in the film dialogue, followed by the identification of suitable equivalents in the target language. This systematic approach sheds light on the relevance of phraseology in Russian-Spanish audiovisual translation, contributing to an understanding of this process and of the choices translators must make in this specific context. Keywords: VAT; phraseology; pragmatics; subtitling; culture
Article
Computer-assisted interpreting (CAI) has become an essential element in an increasingly globalized and interconnected world. With the growing demand for instant online communication and the need to overcome language limitations in a global context, CAI has become a valuable tool for a wide range of applications. However, CAI is not without its challenges. The interpretation of idiomatic expressions remains a significant barrier, as these linguistic constructs can be particularly difficult to interpret accurately because of their culturally embedded nature. In this context, this article addresses the problem of interpreting idiomatic expressions, in our case 52 verbal idioms, within the framework of CAI, focusing on the Spanish-Russian language combination. The aim is to analyze how this technology meets the challenges of idiomatic phraseology and how it influences accurate and effective intercultural communication. To this end, a comprehensive methodology was applied, combining linguistic analysis with practical observations of real-time technology-assisted interpreting situations using the live on-air speech translator and multimedia content provided by Yandex. The results provide a deeper understanding of how translation and interpreting technology addresses the challenges of idiomatic expressions, while also offering critical insight into the effectiveness and limitations of technological solutions in the field of intercultural communication.
Article
It is undeniable that machine translation has become a constant in everyday life and has transformed the way users approach the translation process. This phenomenon has had a significant impact in various areas, especially in tourism, owing to its international nature. Companies, particularly small and medium-sized ones, increasingly turn to machine translation tools to reach a broader, multilingual audience. However, despite their popularity, these tools can deliver limited results in terms of quality and appropriateness. This study examines the possibilities and limitations that machine translation systems present when dealing with multiword expressions in the field of gastronomic tourism. To this end, a monolingual (ES) corpus was compiled, comprising thirty brochures and guides from different Spanish regions, following the compilation protocol proposed by Seghiri (2017). The multiword expressions under study were extracted from this corpus together with their contexts and machine-translated using four engines (DeepL, Google Translate, Microsoft Translator and Yandex) belonging to the paradigms most widely used today in machine translation for specific purposes. The results, categorised using a modified version of the model proposed by Ortiz Boix (2016), made it possible to identify differences in performance among the most popular systems and revealed the communicative obstacles users may face when dealing with phraseology.
Book
This book explores the pragmatics of specialized language with a focus on multiword terms: complex phrases consisting of sequences of nouns or adjectives whose meaning depends on unstated but implicit links between their components, with implications for their use and translation. The volume adopts an innovative approach rooted in Frame-Based Terminology, which allows multiword term formation (that is, the formation of compound terms in specialized language, such as horizontal-axis wind turbine) to be analyzed from an integrated semantic and pragmatic perspective. The book draws on data from a corpus on wind power in English, Spanish, and French, comprising specialized texts such as research articles, books, reports, and PhD theses, to address term extraction and the identification of terminological correspondences. Cabezas-García highlights the ways in which pragmatic analysis is integral to understanding multiword terms, given the need to infer the information implicit within them, with applications for future research on pragmatics and specialized language more broadly. This book will be of interest to students and researchers in pragmatics, semantics, corpus linguistics, and terminology.
Chapter
Drawing on the onomasiologically arranged German-Spanish SCHWEIGEN/CALLAR corpus from the research project FRASESPAL, the first part of the paper examines how silence is conceptualized through phraseologisms and proverbs, and how these differ at the denotative and non-denotative levels. The second part of the paper aims to show the need for pragmatic analysis in contrastive paremiology. The objects of study are the proverbs Reden ist Silber, Schweigen ist Gold ('Speech is silver, silence is golden') and En boca cerrada no entran moscas ('No flies get into a closed mouth'), which are regarded as interlingual functional equivalents in German and Spanish. The analysis uncovers differences that contrastive paremiology must take into account in the textual behaviour, the type of illocutionary functions, the social function, the cognitive concepts and the typology of modifications of the two proverbs. In addition, the argumentative and communicative functions of both proverbs are specified in more detail against the background of a distinction between conceptual writtenness and orality, using abundant examples.
Chapter
In the linguistics literature, “discourse” is often defined in two, not mutually exclusive, ways, namely, structurally, for instance, “language above the sentence or above the clause” (Stubbs 1983: 1) and functionally, for example, “language that is doing some job in some context” (Halliday 1985: 10). We shall privilege the functional viewpoint here, though analyzing the structures of discourses is important for shedding light on the jobs being done. It has to be stressed that discourse is not a special form of language, but a perspective upon it: language described not only as a set of interacting units and systems, but also, precisely as Halliday implies, as an instrument put to work. The work which it does is the attempt by one participant or set of participants to influence the ideas, opinions, and behavior of other participants. Such work can be studied in a single text or in a number of tokens of similar texts to try to infer generalities of behaviors and responses (which may well then in turn serve as background to studying particular language events for particular, special meanings). Most forms of traditional non-corpus-assisted discourse analysis have practiced the close reading (that is, “qualitative analysis”) of single texts or a small number of texts in an attempt to highlight both textual structures and how meanings are conveyed. Some types, such as much work in critical discourse analysis (CDA), use few concepts from linguistics proper, tending to rely on the analyst's knowledge and experience (and prejudices) of similar texts, in a manner reminiscent of literary analysis (though with a politically driven purpose). Other traditional discourse analysis is more linguistically grounded. Thompson (1996a: 108-112), for instance, demonstrates the power of functional grammar, in particular transitivity analysis, in displaying how meanings, including what we might call non-obvious meanings, are communicated.
Chapter
Languages are made up of words, which combine via morphosyntax to encode meaning in the form of phrases and sentences. While it may appear relatively innocuous, the question of what constitutes a “word” is a surprisingly vexed one. First, are dog and dogs two separate words, or variants of a single word? The traditional view from lexicography and linguistics is to treat them as separate inflected wordforms of the lexeme dog, as any difference in the syntax/semantics of the two words is predictable from the general process of noun pluralization in English. Second, what is the status of expressions like top dog and dog days? A speaker of English who knew top, dog, and day in isolation but had never been exposed to these two expressions would be hard put to predict the semantics of “person who is in charge” and “period of inactivity,” respectively. To be able to retrieve the semantics of these expressions, they must have lexical status of some form in the mental lexicon, which encodes their particular semantics. Expressions such as these that have surprising properties not predicted by their component words are referred to as multiword expressions (MWEs). The focus of this chapter is the precise nature and types of MWEs, and the current state of MWE research in NLP.
Chapter
The cycle of lexicographic and linguistic work involved in compiling a computational phraseological database is divided into three phases and described in relation to the specific challenges multi-word expressions (MWEs) pose for a lexical database. Data collection is a process that is far from complete for the MWEs found in English, with the variability of some phrases making identification of all occurrences in large corpora a major challenge. Formalization of the form and variability of MWEs is an interrelated process which can improve tools for data collection and other applications. Increased use of the phraseological lexical database in NLP applications can ultimately lead to further insights into the nature of MWEs and to improvements in the database. Due to the volume of lexicographic data on MWEs that still needs to be collected, analysed and formalized, and the cyclical nature of the work, the resulting lexical database should be reusable in as many applications as possible. WordManager-PhraseManager, the lexical resource described in the second part of the chapter, can capture the variability of MWEs in a way that allows for maximum reusability of lexical data.
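Why variability complicates corpus identification can be illustrated with a toy matcher. The sketch below is not the WordManager-PhraseManager formalism; it is a hypothetical example that assumes a hand-listed table of inflected verb forms (`VERB_FORMS`) and allows up to `max_gap` inserted tokens. It shows how a surface pattern captures some variants (inflection, insertions) while missing others (passivization).

```python
import re

# Hypothetical inflection table for illustration only; a real system would
# take verb forms from a morphological lexicon.
VERB_FORMS = {"spill": ["spill", "spills", "spilled", "spilt", "spilling"]}


def build_pattern(verb: str, noun: str, max_gap: int = 2) -> re.Pattern:
    """Match any listed form of VERB, up to `max_gap` inserted tokens, then 'the' + NOUN."""
    forms = "|".join(VERB_FORMS[verb])
    gap = r"(?:\w+\s+){0,%d}" % max_gap  # optional inserted words, e.g. "all"
    return re.compile(rf"\b(?:{forms})\s+{gap}the\s+{noun}\b", re.IGNORECASE)


pattern = build_pattern("spill", "beans")

sentences = [
    "She finally spilled the beans about the merger.",  # inflected verb
    "He spills all the beans eventually.",               # inserted modifier
    "Don't spill the coffee beans.",                      # literal use, different NP
    "The beans were spilled by the intern.",              # passive: not covered
]
matches = [bool(pattern.search(s)) for s in sentences]
print(matches)
```

With these sentences the pattern accepts the first two and rejects the last two. It correctly ignores the literal *spill the coffee beans*, but it also misses the passive *The beans were spilled*, which is exactly the kind of variant that makes exhaustive identification in large corpora hard without a richer formalization.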
Chapter
This chapter aims at bridging the functionalist theoretical perspective on word usage with corpus-based studies. We deal with the construction of reliable lists of what are called 'phraseological units' in the general linguistics literature or 'multi-word expressions' (MWEs) in the computational linguistics literature. The two groups of constructions under investigation in this chapter are phrasal verbs and light verb constructions. Another distinguishing feature of this study is its multilingual aspect. Previous computational approaches to MWEs have mainly focussed on English, and there has been little research on computational approaches to MWEs in other languages. In this chapter, we examine phrasal verbs in English and their translation equivalents in Russian, and compare English-Russian/Russian-English translation equivalents of selected light verb constructions in two case studies. Our study reveals some interesting cross-language structural divergences between the languages under consideration and shows that a phraseological expression in one language may have equivalent expressions in other languages with different morpho-syntactic structures and semantic properties. However, our investigation not only reveals marked differences between English and Russian, but also uncovers some general structural correspondences between them; for example, English phrasal verbs usually have single-word translation equivalents in Russian. Moreover, our study of phrasal and light verbs demonstrates that corpus-based resources can be of invaluable help to a practising translator, as dictionaries do not cover the wide variety of real-life language examples.
Chapter
This chapter describes computational linguistic work which deals with phraseological units. This includes both computational support for human-use tools and resources in the field of phraseology, such as electronic learners' dictionaries, and work in natural language processing aimed at the automatic treatment of phraseological units, for example in the analysis of texts, or in machine translation. It points to a number of issues which have received attention from computational linguists, but space prevents a detailed account of them. The issues are grouped around the phenomena targeted (Section 2), questions of how to represent phraseological units in lexicons and corpora (Section 3), as well as methods for automatically identifying and/or classifying phraseological units in texts (Section 4). The chapter gives an overview of the current approaches in these fields and points to relevant literature.
Chapter
In this chapter, we discuss the foundations of computational methods of collocation extraction from text corpora. First, we analyse the extent to which the collocation features stipulated by theoretical studies are taken into account in practice. Then, we introduce the basic concepts of statistical modelling of collocations as significant word associations, and describe the association measures typically used in the field. We then survey the role played by lemmatizers, POS taggers, chunkers and syntactic parsers in preprocessing source corpora with the aim of improving extraction performance. The rest of the chapter contains a thorough review of the state of the art in collocation extraction which provides relevant details about the linguistic preprocessing performed in existing extraction work. Despite the fact that efficient syntactic parsers are now increasingly available, this preprocessing is currently limited in most cases to shallow methods such as parsing based on pattern matching over POS tags. These methods are inadequate, however, as many researchers have pointed out, for those languages exhibiting a freer word order and richer morphology than English. With this in mind, we will argue that successful collocation extraction across languages therefore requires a more elaborate structural analysis which can only be provided by deep parsing.
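The basic idea of modelling collocations as significant word associations can be sketched as follows. This is a minimal illustration, assuming adjacent bigrams over a whitespace-tokenized toy corpus and no linguistic preprocessing (no lemmatization, POS filtering or parsing). It computes two widely used association measures, pointwise mutual information (PMI) and the log-likelihood ratio G², from a 2×2 contingency table; the function name `score_bigrams` and the corpus are hypothetical.

```python
import math
from collections import Counter


def score_bigrams(tokens):
    """Score every observed adjacent bigram with PMI and log-likelihood (G^2)."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    first, second = Counter(), Counter()  # marginals by position in the bigram
    for (w1, w2), c in bigrams.items():
        first[w1] += c
        second[w2] += c
    n = sum(bigrams.values())  # total number of bigram tokens

    scores = {}
    for (w1, w2), o11 in bigrams.items():
        # 2x2 contingency table: rows = w1 / not w1, columns = w2 / not w2
        o12 = first[w1] - o11
        o21 = second[w2] - o11
        o22 = n - o11 - o12 - o21
        observed = [[o11, o12], [o21, o22]]
        rows = [o11 + o12, o21 + o22]
        cols = [o11 + o21, o12 + o22]

        # PMI compares the joint count with what independence would predict
        pmi = math.log2(o11 * n / (rows[0] * cols[0]))

        # G^2 = 2 * sum O * ln(O / E); cells with O = 0 contribute nothing
        g2 = 0.0
        for i in range(2):
            for j in range(2):
                o = observed[i][j]
                if o > 0:
                    e = rows[i] * cols[j] / n
                    g2 += o * math.log(o / e)
        scores[(w1, w2)] = (pmi, 2 * g2)
    return scores


tokens = ("he drank strong tea and she drank strong tea "
          "while they drank the tea").split()
scores = score_bigrams(tokens)
for pair in [("strong", "tea"), ("the", "tea")]:
    pmi, ll = scores[pair]
    print(pair, round(pmi, 2), round(ll, 2))
```

On this toy corpus, *strong tea* and *the tea* receive the same PMI (about 2.12), because PMI does not reward how often a pair occurs, whereas G² ranks *strong tea* clearly higher (about 7.34 vs. 3.23). Scoring only adjacent surface bigrams is precisely the kind of shallow preprocessing that, as the chapter argues, falls short for languages with freer word order and richer morphology.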