Conference Paper

A Multimedia Parallel Corpus of English-Galician Film Subtitling


Abstract

In this paper, we present an ongoing research project focused on building, processing and exploiting a multimedia parallel corpus of English-Galician film subtitling. We describe the TMX-based XML specification designed to encode both the audiovisual features and the translation alignments in the corpus, and the solutions adopted to make the data available on the web in multimedia format.
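To make the encoding idea concrete, here is a minimal Python sketch of how one aligned subtitle might be represented as a TMX translation unit. It assumes, hypothetically, that subtitle timing is stored in `<prop>` elements with the invented types `x-start` and `x-end` (following the TMX convention of an `x-` prefix for user-defined property types); the paper's actual specification may use different elements and attributes. The header carries the attributes TMX 1.4b requires, filled with placeholder values, and the subtitle text and timecodes are invented for illustration.

```python
import xml.etree.ElementTree as ET

# ElementTree's Clark-notation name for the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def subtitle_tu(en_text, gl_text, start, end):
    """Build one <tu> aligning an English subtitle with its Galician translation.

    The x-start / x-end property types are hypothetical placeholders for the
    audiovisual features (subtitle in/out times) the corpus encodes.
    """
    tu = ET.Element("tu")
    # In TMX, <prop> children of <tu> come before its <tuv> children.
    for prop_type, value in (("x-start", start), ("x-end", end)):
        prop = ET.SubElement(tu, "prop", type=prop_type)
        prop.text = value
    for lang, text in (("en", en_text), ("gl", gl_text)):
        tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
        ET.SubElement(tuv, "seg").text = text
    return tu

def build_tmx(units):
    """Wrap translation units in a minimal TMX 1.4 document (placeholder header values)."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "subtitle-sketch",
        "creationtoolversion": "0.1",
        "segtype": "block",
        "o-tmf": "none",
        "adminlang": "en",
        "srclang": "en",
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    body.extend(units)
    return tmx

tu = subtitle_tu("Good morning.", "Bos días.", "00:01:02,500", "00:01:04,000")
print(ET.tostring(build_tmx([tu]), encoding="unicode"))
```

ElementTree serializes the namespace-qualified attribute as `xml:lang`, so the output is ordinary TMX that any TMX-aware tool can read, with the audiovisual data riding along as properties.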


... In addition, the PRESEEA-Valledupar corpus is the first speech corpus of this city and one of the first corpora collected in Colombia and the Caribbean, so any research on the speech of Valledupar must take it as a reference point. It should also be noted that this corpus has been underused since, until now, the only previous research on phonetic variation is the study by Olmos and Gómez (2012). In this sense, the present study of the PRESEEA-Valledupar corpus makes it possible to broaden the characterization of this speech community and to establish comparisons with other corpora ...
Conference Paper
The Outsider Art project is a transdisciplinary research initiative that aims to deconstruct the definition of outsider art through the computational analysis of texts and images. It is transdisciplinary because it fosters collaboration and knowledge exchange among artists, researchers, and citizens. The project's goals are the ontologization of the Outsider Art domain, the computational analysis of the outsider art painting style, and the deployment of the outsider art web portal.
Conference Paper
The social networks among characters that graphs and other Digital Humanities (DH) tools allow us to visualize provide new perspectives that in some cases support traditional critical interpretations, but they also bring different approaches that enrich the study of these texts (Fernández et al., 2015; Fisher et al., 2017). Within Spanish literature, some research has already drawn on the study of character social networks; particularly relevant to the case at hand are the studies on the theatre of the so-called Silver Age (Martínez Carro, 2019) and on two of the Episodios Nacionales (Isasi, 2017). In this year, which marks the centenary of the death of the Spanish writer Benito Pérez Galdós, it is worth studying the social networks established by the characters of three plays: Doña Perfecta, which premiered at the Teatro de la Comedia on 28 January 1896; Electra, which premiered in 1905 at the Teatro Español; and Casandra, first performed at that same theatre in 1910. The three plays belong to the TEI-encoded digital theatre corpus of the Biblioteca Electrónica Textual del Teatro en Español de 1868-1939 (BETTE) (Jiménez et al. 2017; Martínez Carro and Santa María, 2019), which has been integrated into the Drama Corpora Project under the name Spanish Drama Corpora (DraCor).
Article
This article explores bilingual subtitling, a relatively under-researched mode of audiovisual translation, and its role in the ever-evolving landscape of global media streaming. Originally used for cinema productions in officially bilingual countries and international film festivals, bilingual subtitling has now resurfaced as a response to the growing affordances of streaming media. This article investigates the proliferation of bilingual subtitling tools and practices in different contexts, from PC-based tools and Chrome extensions that add bilingual subtitle features to streaming platforms (Netflix, YouTube) to amateur (optionally bilingual) subtitling streaming services (Viki Rakuten), video sharing websites (Bilibili), and online channels with open bilingual subtitles embedded in their videos (Easy Languages). Bilingual subtitling is further promoted as a pedagogical tool for foreign-language learning that matches the expectations of contemporary learners, especially ‘digital natives’ who have grown up with new online modalities. The conventional ways in which audiences used to engage with audiovisual content have, arguably, been superseded as streaming platforms that offer an abundance of options in terms of language and content are gradually reshaping viewing patterns. Shifting away from long-established patterns of passive TV consumption, this article also sets out to present online collaborations and initiatives that seek to incorporate bilingual subtitles in language learning while promoting the active participation of the audience within the emerging media streaming landscape.
Article
As stated by Jay and Janschewitz (2008), the primary pragmatic function of swear words is to express emotions, such as anger and frustration. The main objective of the present paper is to analyse the translation into Galician of the two commonest English swear words, fuck and shit (Jay 2009: 156), together with their morphological variants. The research instrument used for this purpose has been the Veiga Corpus, a bilingual English-Galician corpus of subtitles. Regarding the results obtained in this study, the most frequent solution has been pragmatic correspondence, followed by omission, softening, and de-swearing. However, a closer analysis reveals clear differences between the treatment of the two words. Thus, the tendency to sanitize the Galician subtitles by omitting, neutralizing or softening swear words is much more evident in the case of fuck. This finding may be explained by the difference in tone between the two taboo words analysed: as shit is considered milder, translators may feel there is no need to tone it down. In addition, while shit has a literal translation that is perfectly natural in Galician, that is not the case with fuck. Finally, the grammatical category variable has also been found to affect the choice of translation solution.
Book
This volume seeks to investigate how humour translation has developed since the beginning of the 21st century, focusing in particular on new ways of communication. The authors, drawn from a range of countries, cultures and academic traditions, address and debate how today’s globalised communication, media and new technologies are influencing and shaping the translation of humour. Examining both how humour translation exploits new means of communication and how the processes of humour translation may be challenged and enhanced by technologies, the chapters cover theoretical foundations and implications, and methodological practices and challenges. They include a description of current research or practice, and comments on possible future developments. The contributions interconnect around the issue of humour creation and translation in the 21st century, which can truly be labelled as the age of multimedia. Accessible and engaging, this is essential reading for advanced students and researchers in Translation Studies and Humour Studies.
Article
Following an overview of corpus linguistics in audiovisual translation, and more specifically in audio description, this article presents the VIW (Visuals Into Words) project and its resulting corpus. It describes the compilation and annotation processes, highlighting the main challenges found. The article also presents the web application that has been developed, explaining in detail various data visualisation and search possibilities.
Chapter
Considerable mileage has been covered since the early days, in the late 1950s, when audiovisual translation (AVT) began to be addressed as a subject in its own right. Concerns have moved from the dubbing/subtitling debate to more specific domains such as the analysis of issues pertaining to discourse analysis, technical constraints and audience design. Comprehensive lists of audiovisual translation typologies (Luyken et al., 1991; Gambier, 1996; Díaz Cintas, 2003) have reflected the gradual change in scope and are making space for newer forms of language transfer within the audiovisual context. Indeed, the lines once drawn between language transfer types in the media are becoming less and less visible. Principles and techniques are merging to give way to specific offers, directed to specific audiences. This implies that the very concept of ‘mass’ media is changing; technology is now allowing masses to be broken down into smaller groups and products are tailor-made to the expectations and the needs of defined sub-groups. AVT will inevitably need to follow the general trend in the audiovisual market and, rather than aiming to cater for a general audience, audiovisual translation now finds itself focusing on the needs of smaller distinct audiences in order to respond to them in a more adequate manner.
Article
Translation Memories are very useful for translators but are difficult to share and reuse in a community of translators. This article presents the concept of Distributed Translation Memories, where all users can contribute and share translations. Implementation details using WebServices are shown, as well as an example of a distributed system between Portugal and Spain.
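A distributed translation memory can be pictured as a fuzzy lookup fanned out across several contributors' memories. The Python sketch below is a hypothetical, in-process stand-in for the web-service architecture the article describes: each node is just a dictionary, `difflib.SequenceMatcher` supplies the similarity score, and the function names, node names and 0.7 threshold are invented for illustration.

```python
from difflib import SequenceMatcher

def contribute(memories, node, source, target):
    """Store a translation pair in the memory held by one (hypothetical) node."""
    memories.setdefault(node, {})[source] = target

def fuzzy_lookup(memories, query, threshold=0.7):
    """Search every node's memory; return (score, node, source, target) hits, best first."""
    hits = []
    for node, memory in memories.items():
        for source, target in memory.items():
            # Case-insensitive similarity in [0, 1]; 1.0 means an exact match.
            score = SequenceMatcher(None, query.lower(), source.lower()).ratio()
            if score >= threshold:
                hits.append((score, node, source, target))
    hits.sort(reverse=True)
    return hits

memories = {}
contribute(memories, "node-pt", "Open the file", "Abrir o ficheiro")
contribute(memories, "node-es", "Close the window", "Cerrar la ventana")
print(fuzzy_lookup(memories, "open the file"))
```

In a real deployment each node would be queried over the network and the results merged, but the ranking step would look much the same.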
Chapter
The crucial role played by audiovisual translation (AVT) in contemporary international communication invites translator trainers to contemplate the different possibilities available when training translators for the modern mass-communication market. The acquisition of new skills in the use of Information and Communication Technology (ICT) is a challenge that instructors of subtitling and trainees must face up to. In online multimedia courses, the use of digital technology is an imperative, but professional computer programs are expensive and educational licences are often unavailable. In this situation, how can students get the hands-on training they need? In response to this situation, some academic institutions try to generate their own in-house solutions.
Article
This paper presents a parallel corpora-based bilingual terminology extraction method based on the occurrence of bilingual morphosyntactic patterns in probabilistic translation dictionaries. We discuss an experiment focused on two language pairs, English-Galician and English-Portuguese, and present results that experimentally confirm the high accuracy of the proposed extraction technique.
Conference Paper
This paper presents research on parallel corpora-based bilingual terminology extraction based on the occurrence of bilingual morphosyntactic patterns in the probabilistic translation dictionaries generated by NATools. To evaluate this method, we carried out an experiment that took into account both the lexical cohesion of the term candidates and their specificity with respect to a non-terminological corpus of the target language. The evaluation results show that terminology extraction based on probabilistic translation dictionaries complemented by bilingual syntactic patterns achieves a high degree of accuracy.
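One plausible reading of the pattern-based filter is sketched below in Python. It assumes each English-Galician candidate drawn from a probabilistic translation dictionary has already been POS-tagged on both sides, and keeps a pair only if its translation probability passes a threshold and its two tag sequences form an allowed bilingual pattern. The pattern table, tag names and threshold are hypothetical, not those used in the paper, and the cohesion and specificity measures from the evaluation are not modelled.

```python
# Hypothetical bilingual morphosyntactic patterns:
# English tag sequence -> set of acceptable Galician tag sequences.
PATTERNS = {
    ("NOUN", "NOUN"): {("NOUN", "ADP", "NOUN")},
    ("ADJ", "NOUN"): {("NOUN", "ADJ")},
}

def extract_terms(candidates, min_prob=0.2):
    """Keep candidates whose probability and bilingual tag pattern both qualify.

    Each candidate is (en_tokens, en_tags, gl_tokens, gl_tags, prob).
    Returns (english_term, galician_term, prob) triples, most probable first.
    """
    terms = []
    for en, en_tags, gl, gl_tags, prob in candidates:
        allowed = PATTERNS.get(tuple(en_tags), set())
        if prob >= min_prob and tuple(gl_tags) in allowed:
            terms.append((" ".join(en), " ".join(gl), prob))
    return sorted(terms, key=lambda t: t[2], reverse=True)
```

For example, a candidate pairing "translation memory" (NOUN NOUN) with "memoria de tradución" (NOUN ADP NOUN) passes, while a pair whose English side is DET NOUN is rejected because no pattern licenses it.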
Chapter
Probabilistic Translation Dictionaries (PTDs) are translation resources that can be obtained automatically from parallel corpora. Although this process is simple, it requires a parallel corpus for the languages involved. Minoritized languages have a limited amount of available resources: while they may have a few parallel corpora, the number of language pairs covered tends to be restricted. We argue that if a minoritized language A has a parallel corpus with a language B, and language B has a parallel corpus with another language C, then we can obtain a useful probabilistic translation dictionary between A and C. In this document, we formalize probabilistic translation dictionary triangulation, perform experiments triangulating between Galician, English and Italian, and conclude with an evaluation of the proposed approach.
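A common way to triangulate is to marginalize over the pivot language, estimating P(c | a) as the sum over pivot words b of P(b | a) * P(c | b). The Python sketch below assumes that formulation and represents each PTD as a nested dictionary mapping a source word to its translation probabilities; the chapter's own formalization may differ, for instance in how it prunes low-probability entries. The Galician-English-Italian probabilities in the example are invented.

```python
from collections import defaultdict

def triangulate(ptd_ab, ptd_bc, min_prob=0.0):
    """Compose an A->B and a B->C dictionary into an A->C dictionary.

    Each PTD maps a word to {translation: probability}. The result uses
    P(c | a) = sum over pivot words b of P(b | a) * P(c | b),
    keeping only entries whose probability exceeds min_prob.
    """
    ptd_ac = {}
    for a, b_probs in ptd_ab.items():
        scores = defaultdict(float)
        for b, p_b_given_a in b_probs.items():
            # Pivot words missing from the B->C dictionary contribute nothing.
            for c, p_c_given_b in ptd_bc.get(b, {}).items():
                scores[c] += p_b_given_a * p_c_given_b
        kept = {c: p for c, p in scores.items() if p > min_prob}
        if kept:
            ptd_ac[a] = kept
    return ptd_ac

gl_en = {"casa": {"house": 0.7, "home": 0.3}}
en_it = {"house": {"casa": 0.9, "edificio": 0.1}, "home": {"casa": 0.5, "dimora": 0.5}}
print(triangulate(gl_en, en_it))
```

Here Galician "casa" reaches Italian "casa" through both pivots, so its score (0.7 * 0.9 + 0.3 * 0.5 = 0.78) outranks the translations reachable through only one.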
Article
This paper sets out to describe a multimedia database, the Forlì Corpus of Screen Translation, developed at the University of Bologna's Department of Interdisciplinary Studies in Translation, Languages and Culture (SITLeC), emphasizing the advantages it offers for audiovisual translator training. The database is actually a corpus of 20 original and dubbed films, fully indexed on the basis of a set of predefined linguistic, cultural and pragmatic categories. The tagging of the corpus makes it possible to extract concordances in the form of transcripts of film dialogues along with the original audiovisual scenes. Thanks to these features, the database is a valid tool to be used as a component in the training of audiovisual translators, in that it can help develop traditional linguistic, but also communicative and cultural skills. In addition, from a methodological point of view, it offers an ideal basis on which to ground empirical and quantitative research, thus helping establish a scientific approach in the new discipline of audiovisual translation studies.
Chapter
In this paper, we present the methodology developed by the SLI (Computational Linguistics Group of the University of Vigo) for the building and processing of the CLUVI Corpus, showing the TMX-based XML specification designed to encode both morphosyntactic features and translation alignments in parallel corpora, and the solutions adopted for making the CLUVI parallel corpora freely available over the WWW (http://sli.uvigo.es/CLUVI/).
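Once a corpus is encoded in TMX, a basic bilingual concordance reduces to walking the `<tu>` elements and matching on one language's `<seg>`. This Python sketch assumes a plain TMX file whose `<tuv>` elements carry `xml:lang` codes such as `en` and `gl`; it ignores any morphosyntactic annotation, and the function name and sample segments are hypothetical rather than part of the CLUVI web interface.

```python
import re
import xml.etree.ElementTree as ET

# ElementTree's Clark-notation name for the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def concordance(tmx_text, word, src="en", tgt="gl"):
    """Return (source, target) segment pairs whose source segment contains `word`."""
    root = ET.fromstring(tmx_text)
    hits = []
    for tu in root.iter("tu"):
        # Map each language code to the text of its <seg>.
        segs = {tuv.get(XML_LANG): tuv.findtext("seg", default="") for tuv in tu.iter("tuv")}
        source = segs.get(src, "")
        # Whole-word, case-insensitive match on the source side.
        if word.lower() in re.findall(r"\w+", source.lower()):
            hits.append((source, segs.get(tgt, "")))
    return hits

sample = """<tmx version="1.4"><header srclang="en"/><body>
<tu><tuv xml:lang="en"><seg>The corpus is parallel.</seg></tuv><tuv xml:lang="gl"><seg>O corpus é paralelo.</seg></tuv></tu>
<tu><tuv xml:lang="en"><seg>Good morning.</seg></tuv><tuv xml:lang="gl"><seg>Bos días.</seg></tuv></tu>
</body></tmx>"""
print(concordance(sample, "corpus"))
```

Matching on whole tokens rather than substrings keeps a query such as "corpus" from also returning "corpusbased" or similar strings.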
Article
Corpus-based research has become widely accepted as a factor in improving the performance of machine translation systems, and corpus-based terminology compilation is now the norm rather than the exception. Within translation studies proper, Lindquist (1984) has advocated the use of corpora for training translators, and Baker (1993a) has argued that theoretical research into the nature of translation will receive a powerful impetus from corpus-based studies. It is becoming increasingly important to take stock of what is happening on this front and to start working towards an explicit and coherent methodology for corpus-based research in the discipline. This paper discusses the current and potential use of corpora in translation studies, with particular reference to theoretical issues.
Aplicacións da lexicografía bilingüe baseada en córpora na elaboración do Dicionario CLUVI inglés-galego
  • Xavier Gómez Guinovart
  • Eva Díaz Rodríguez
  • Alberto Álvarez Lugrís
Xavier Gómez Guinovart, Eva Díaz Rodríguez, and Alberto Álvarez Lugrís. Aplicacións da lexicografía bilingüe baseada en córpora na elaboración do Dicionario CLUVI inglés-galego. Viceversa: Revista Galega de Tradución, 14:71-87, 2008.
TMX 1.4b Specification. Technical report. Localisation Industry Standards Association
  • Yves Savourel
  • Arle Lommel
Yves Savourel and Arle Lommel. TMX 1.4b Specification. Technical report. Localisation Industry Standards Association. <http://www.gala-global.org/oscarStandards/tmx/tmx14b.html>, 2005.
A hybrid corpus-based approach to bilingual terminology extraction
  • Xavier Gómez Guinovart
Xavier Gómez Guinovart. A hybrid corpus-based approach to bilingual terminology extraction. In Isabel Moskowich-Spiegel Fandiño and Begoña Crespo, editors, Encoding the Past, Decoding the Future: Corpora in the 21st Century, pages 147–175, Newcastle upon Tyne, 2012. Cambridge Scholars Publishing.