Alina Karakanta

Fondazione Bruno Kessler | FBK · Human Language Technologies (HLT)

Doctor of Philosophy
Machine translation researcher

About

31
Publications
4,266
Reads
135
Citations
Citations since 2016
31 Research Items
135 Citations
[Chart: x-axis 2016 to 2022, y-axis 0 to 40]
Introduction
I am a Machine Translation researcher and professional translator. I received my PhD in Computer Science from the University of Trento on the topic of Automatic Subtitling while conducting research in the HLT-MT group at FBK. My research focuses on novel Speech Translation methods for translating audiovisual content and new evaluation methodologies and benchmarks.
Additional affiliations
January 2019 - November 2022
Fondazione Bruno Kessler
Position
  • PhD Student
November 2017 - December 2018
Universität des Saarlandes
Position
  • Research Assistant
Description
  • SFB B7 Project: Modelling human translation with a noisy channel
December 2016
Ionian University
Position
  • Invited Speaker
Description
  • "Machine translation; why and how?". Workshop for undergraduate students of the Department of Foreign Languages, Translation and Interpreting.
Education
November 2022 - November 2023
University of Macerata
Field of study
  • Media Accessibility
January 2016 - December 2022
Ionian University
Field of study
  • Interpreting Studies
October 2014 - July 2017
Universität des Saarlandes
Field of study
  • Language Science and Technology

Publications (31)
Preprint
Full-text available
Automatic subtitling is the task of automatically translating the speech of an audiovisual product into short pieces of timed text, in other words, subtitles and their corresponding timestamps. The generated subtitles need to conform to multiple space and time requirements (length, reading speed) while being synchronised with the speech and segment...
Preprint
Full-text available
Speech translation for subtitling (SubST) is the task of automatically translating speech data into well-formed subtitles by inserting subtitle breaks compliant to specific displaying guidelines. Similar to speech translation (ST), model training requires parallel data comprising audio inputs paired with their textual translations. In SubST, howeve...
Article
Full-text available
Recent developments in neural machine translation, and especially speech translation, are gradually but firmly entering the field of audiovisual translation (AVT). Automation in subtitling is extending from a machine translation (MT) component to fully automatic subtitling, which comprises MT, auto-spotting and automatic segmentation. The rise of t...
Preprint
Full-text available
Subtitles appear on screen as short pieces of text, segmented based on formal constraints (length) and syntactic/semantic criteria. Subtitle segmentation can be evaluated with sequence segmentation metrics against a human reference. However, standard segmentation metrics cannot be applied when systems generate outputs different than the reference,...
Chapter
Full-text available
While some authors have suggested that translationese fingerprints are universal, others have shown that there is a fair amount of variation among translations due to source language shining through, translation type or translation mode. In our work, we attempt to gain empirical insights into variation in translation, focusing here on translation m...
Preprint
Full-text available
With the increased audiovisualisation of communication, the need for live subtitles in multilingual events is more relevant than ever. In an attempt to automatise the process, we aim at exploring the feasibility of simultaneous speech translation (SimulST) for live subtitling. However, the word-for-word rate of generation of SimulST systems is not...
Preprint
Full-text available
Speech translation (ST) has lately received growing interest for the generation of subtitles without the need for an intermediate source language transcription and timing (i.e. captions). However, the joint generation of source captions and target subtitles does not only bring potential output quality advantages when the two decoding processes info...
Preprint
Full-text available
Five years after the first published proofs of concept, direct approaches to speech translation (ST) are now competing with traditional cascade solutions. In light of this steady progress, can we claim that the performance gap between the two is closed? Starting from this question, we present a systematic comparison between state-of-the-art systems...
Conference Paper
Full-text available
Dubbing has two shades; synchronisation constraints are applied only when the actor's mouth is visible on screen, while the translation is unconstrained for off-screen dubbing. Consequently, different synchronisation requirements, and therefore translation strategies, are applied depending on the type of dubbing. In this work, we manually annotate...
Conference Paper
Full-text available
Subtitles, in order to achieve their purpose of transmitting information, need to be easily readable. The segmentation of subtitles into phrases or linguistic units is key to their readability and comprehension. However, automatically segmenting a sentence into subtitles is a challenging task and data containing reliable human segmentation decision...
Preprint
Subtitling is becoming increasingly important for disseminating information, given the enormous amounts of audiovisual content becoming available daily. Although Neural Machine Translation (NMT) can speed up the process of translating audiovisual content, large manual effort is still required for transcribing the source language, and for spotting a...
Article
Full-text available
Growing needs in localising multimedia content for global audiences have resulted in Neural Machine Translation (NMT) gradually becoming an established practice in the field of subtitling in order to reduce costs and turn-around times. Contrary to text translation, subtitling is subject to spatial and temporal constraints, which greatly increase th...
Preprint
Full-text available
Growing needs in localising audiovisual content in multiple languages through subtitles call for the development of automatic solutions for human subtitling. Neural Machine Translation (NMT) can contribute to the automatisation of subtitling, facilitating the work of human subtitlers and reducing turn-around times and related costs. NMT requires hi...
Conference Paper
Full-text available
Growing needs in translating multimedia content have resulted in Neural Machine Translation (NMT) gradually becoming an established practice in the field of subtitling. Contrary to text translation, subtitling is subject to spatial and temporal constraints, which greatly increase the post-processing effort required to restore the NMT output to a p...
Preprint
Full-text available
Multilingual Neural Machine Translation (MNMT) for low-resource languages (LRL) can be enhanced by the presence of related high-resource languages (HRL), but the relatedness of HRL usually relies on predefined linguistic assumptions about language similarity. Recently, adapting MNMT to an LRL has been shown to greatly improve performance. In this work, w...
Presentation
Full-text available
There is a rich body of research on translationese in corpus-based translation studies (cf. Baker 1993, Olohan & Baker 2000, Teich 2003), where a set of predefined features (for instance, type-token ratio, lexical density and sentence length) are typically applied and tested for significance, as well as in computational linguistics, where translat...
Presentation
Full-text available
Our aim is to identify the features distinguishing simultaneously interpreted texts from translations (apart from being more oral) and the characteristics they have in common which set them apart from originals (translationese features).
Presentation
Full-text available
It has been argued that the process of translation leaves specific "fingerprints" on the translation product known as translationese (Gellerstam, 1986). While some authors have suggested that these fingerprints are universal (Baker, 1993, Chesterman, 2004), others have shown that there is a fair amount of variation among translations due to source...
Article
Full-text available
The problem of a total absence of parallel data is present for a large number of language pairs and can severely harm the quality of machine translation. We describe a language-independent method to enable machine translation between a low-resource language (LRL) and a third language, e.g. English. We deal with cases of LRLs for which there is...
Conference Paper
Full-text available
Multilingual parliaments have been a useful source for monolingual and multilingual corpus collection. However, extra-textual information about speakers is often absent, and as a result, these resources cannot be fully used in translation studies. In this paper we present a method for processing and building a parallel corpus consisting of parliame...

Projects (4)
Project
Unlike text translation, subtitling is subject to spatial and temporal constraints, which increase the post-processing effort required to restore an NMT output to a proper subtitle format. Can we reduce this effort?
Project
The project aims to model human translation on the basis of a noisy channel, as commonly done in machine translation. The two main objectives of translation, source language fidelity and target language conformity, as well as translation effort, are modelled probabilistically. We observe differences depending on mode (interpreting, translation), level of expertise (learner, professional) and source language (German, Spanish).