
Eleftherios AvramidisDeutsches Forschungszentrum für Künstliche Intelligenz | DFKI · Language Technology
Eleftherios Avramidis
Doctor of Philosophy
About
92
Publications
80,094
Reads
How we measure 'reads'
A 'read' is counted each time someone views a publication summary (such as the title, abstract, and list of authors), clicks on a figure, or views or downloads the full-text. Learn more
944
Citations
Introduction
Senior Researcher at German Research Center for Artificial Intelligence (DFKI Lab Berlin). Teaching at the Technical University and the University of Arts of Berlin. Working on automatic translation of Sign Language, Quality Estimation and Evaluation of Machine Translation
Additional affiliations
September 2015 - December 2015
September 2005 - September 2006
December 2007 - January 2008
Publications
Publications (92)
A deeper analysis on Comparative Quality Estimation is presented by extending the state-of-the-art methods with adequacy and grammatical features from other Quality Estimation tasks. The previously used linear method, unable to cope with the augmented features, is replaced with a boosting classifier assisted by feature selection. The methods indica...
We present an analysis of 16 state-of-the-art MT systems on German-English based on a linguistically-motivated test suite. The test suite has been devised manually by a team of language professionals in order to cover a broad variety of linguistic phenomena that MT often fails to translate properly. It contains 5,000 test sentences covering 106 lin...
This paper describes a test suite submission providing detailed statistics of linguistic performance for the state-of-the-art German-English systems of the Fifth Conference of Machine Translation (WMT20). The analysis covers 107 phenomena organized in 14 categories based on about 5,500 test items, including a manual annotation effort of 45 person h...
This paper describes the proof-of-concept evaluation for a system that provides translation of speech to virtually performed sign language on augmented reality (AR) glasses. The discovery phase via interviews confirmed the idea for a signing avatar displayed within the users field of vision through AR glasses. In the evaluation of the first prototy...
In this paper we present a prototypical implementation of a pipeline that allows the automatic generation of a German Sign Language avatar from 2D video material. The presentation is accompanied by the source code. We record human pose movements during signing with computer vision models. The joint coordinates of hands and arms are imported as land...
This is the preliminary ranking of WMT24 General MT systems based on automatic metrics. The official ranking will be a human evaluation, which is superior to the automatic ranking and supersedes it. The purpose of this report is not to interpret any findings but only provide preliminary results to the participants of the General MT task that may be...
High-quality Machine Translation (MT) evaluation relies heavily on human judgments. Comprehensive error classification methods, such as Multidimensional Quality Metrics (MQM), are expensive as they are time-consuming and can only be done by experts, whose availability may be limited especially for low-resource languages. On the other hand, just ass...
This document describes the submission of the very first version of the Occiglot open-source large language model to the General MT Shared Task of the 9th Conference of Machine Translation (WMT24).
Occiglot is an open-source, community-based LLM based on Mistral-7B, which went through language-specific continual pre-training and subsequent instruc...
Despite the importance of mouth actions in Sign Languages,
previous work on Automatic Sign Language Recognition (ASLR) has limited use of the mouth area. Disambiguation of homonyms is one of the
functions of mouth actions, making them essential for tasks involving ambiguous hand signs. To measure their importance for ASLR, we trained a
classifier t...
This paper describes the concept and the software architecture of a fully integrated system supporting a dialog between a deaf person and a hearing person through a virtual sign language interpreter (aka avatar) projected in the real space by an Augmented Reality device. In addition, a Visual Simultaneous Localization and Mapping system provides in...
We examine methods and techniques, proven to be helpful for the text-to-text translation of spoken languages in the context of gloss-to-text translation systems, where the glosses are the written representation of the signs. We present one of the first works that include experiments on both parallel corpora of the German Sign Language (PHOENIX14T a...
We employ a linguistically motivated challenge set in order to evaluate the state-of-the-art machine translation metrics submitted to the Met-rics Shared Task of the 7th Conference for Machine Translation. The challenge set includes about 20,000 items extracted from 145 MT systems for two language directions (German ⇔ English), covering more than 1...
This paper presents the results of the WMT22 Metrics Shared Task. Participants submitting automatic MT evaluation metrics were asked to score the outputs of the translation systems competing in the WMT22 News Translation Task on four different domains: news, social, e-commerce, and chat. All metrics were evaluated on how well they correlate with hu...
This paper presents the results of the First WMT Shared Task on Sign Language Translation (WMT-SLT22) 1. This shared task is concerned with automatic translation between signed and spoken 2 languages. The task is novel in the sense that it requires processing visual information (such as video frames or human pose estimation) beyond the well-known p...
This document describes a fine-grained linguistically motivated analysis of 29 machine translation systems submitted at the Shared Task of the 7th Conference of Machine Translation (WMT22). This submission expands the test suite work of previous years by adding the language direction of English-Russian. As a result , evaluation takes place for the...
This paper describes the participation of DFKI-SLT at the Sign Language Translation Task of the Seventh Conference of Machine Translation (WMT22). The system focuses on the translation direction from the Swiss German Sign Language (DSGS) to written German. The original videos of the sign language were analyzed with computer vision models to provide...
In this paper, we investigate the capability of convolutional neural networks to recognize in sign language video frames the six basic Ekman facial expressions for 'fear', 'disgust', 'surprise', 'sadness', 'happiness' and 'anger' along with the 'neutral' class. Given the limited amount of annotated facial expression data for the sign language domai...
This paper presents a fine-grained test suite for the language pair German-English. The test suite is based on a number of linguistically motivated categories and phenomena and the semi-automatic evaluation is carried out with regular expressions. We describe the creation and implementation of the test suite in detail, providing a full list of all...
This paper describes the development of the first test suite for the language direction Portuguese-English. Designed for fine-grained linguistic analysis, the test suite comprises 330 test sentences for 66 linguistic phenomena and 14 linguistic categories. Eight different MT systems were compared using quantitative and qualitative methods via the t...
We are using a semi-automated test suite in order to provide a fine-grained linguistic evaluation for state-of-the-art machine translation systems. The evaluation includes 18 German to English and 18 English to German systems, submitted to the Translation Shared Task of the 2021 Conference on Machine Translation. Our submission adds up to the submi...
In this paper, we describe the current main approaches to sign language translation which use deep neural networks with videos as input and text as output. We highlight that, under our point of view, their main weakness is the lack of generalization in daily life contexts. Our goal is to build a state-of-the-art system for the automatic interpretat...
Influencing transportation demand can significantly reduce CO 2 emissions. Individual user mobility models are key to influencing demand at the personal and structural levels. Constructing such models is a challenging task that depends on a number of interdependent steps. Progress on this task is hamstrung by the lack of high quality public dataset...
We present the results of the application of a grammatical test suite for German→English MT on the systems submitted at WMT19, with a detailed analysis for 107 phenomena organized in 14 categories. The systems still translate wrong one out of four test items in average. Low performance is indicated for idioms, modals, pseudo-clefts, multi-word expr...
We present the results of the application of a grammatical test suite for German$\rightarrow$English MT on the systems submitted at WMT19, with a detailed analysis for 107 phenomena organized in 14 categories. The systems still translate wrong one out of four test items in average. Low performance is indicated for idioms, modals, pseudo-clefts, mul...
We describe a workshop on applications of Artificial Intelligence on Textile Technology through the usage of Machine Learning. The participants are given an overview of the state-of-the-art methods and technologies and are introduced to them through practical exercises. The exercises involve simple hands-on group work with a basic pre-built electri...
We describe a workshop on applications of Artificial Intelligence on Textile Technology through the usage of Machine Learning. The participants are given an overview of the state-of-the-art methods and technologies and are introduced to them through practical exercises. The exercises involve simple hands-on group work with a basic pre-built electri...
Evaluating translation models is a trade-off between effort and detail. On the one end of the spectrum there are automatic count-based methods such as BLEU, on the other end linguistic evaluations by humans, which arguably are more informative but also require a disproportionately high effort. To narrow the spectrum, we propose a general approach o...
Evaluating translation models is a trade-off between effort and detail. On the one end of the spectrum there are automatic count-based methods such as BLEU, on the other end linguistic evaluations by humans, which arguably are more informative but also require a disproportionately high effort. To narrow the spectrum, we propose a general approach o...
We present an analysis of 16 state-of-the-art MT systems on German-English based on a linguistically-motivated test suite. The test suite has been devised manually by a team of language professionals in order to cover a broad variety of linguistic phenomena that MT often fails to translate properly. It contains 5,000 test sentences covering 106 lin...
We present an alternative method of evaluating Quality Estimation systems, which is based on a linguistically-motivated Test Suite. We create a test-set consisting of 14 linguistic error categories and we gather for each of them a set of samples with both correct and erroneous translations. Then, we measure the performance of 5 Quality Estimation s...
We present an alternative method of evaluating Quality Estimation systems, which is based on a linguistically-motivated Test Suite. We create a test-set consisting of 14 linguistic error categories and we gather for each of them a set of samples with both correct and erroneous translations. Then, we measure the performance of 5 Quality Estimation s...
We present an analysis of 16 state-of-the-art MT systems on German-English based on a linguistically-motivated test suite. The test suite has been devised manually by a team of language professionals in order to cover a broad variety of linguistic phenomena that MT often fails to translate properly. It contains 5,000 test sentences covering 106 lin...
Despite its wide applicability, Quality Estimation (QE) of Machine Translation (MT) poses a difficult entry barrier since there are no open source tools with a graphical user interface (GUI). Here we present a tool in this direction by connecting the back-end of the QE decision-making mechanism with a web-based GUI. The interface allows the user to...
This submission investigates alternative machine learning models for predicting the HTER score on the sentence level. Instead of directly predicting the HTER score, we suggest a model that jointly predicts the amount of the 4 distinct post-editing operations, which are then used to calculate the HTER score. This also gives the possibility to correc...
In this article we present a novel linguistically driven evaluation method and apply it to the main approaches of Machine Translation (Rule-based, Phrase-based, Neural) to gain insights into their strengths and weaknesses in much more detail than provided by current evaluation schemes. Translating between two languages requires substantial modellin...
This report on the web services and datasets for applications provides an update of the contents/plans described in D3.4 and D3.1 and of the ongoing report (D3.13 v0.91) submitted earlier.
The first part of this document (Section 2) will be concerned with web services. The MT Pilots developed in the QTLeap project are documented in Deliverables D2....
We are presenting the development contributions of the last two years to our Python opensource Quality Estimation tool, a tool that can function in both experiment-mode and online web-service mode. The latest version provides a new MT interface, which communicates with SMT and rule-based translation engines and supports on-the-fly sentence selectio...
This work addresses the need to aid Machine Translation (MT) development cycles with a complete workflow of MT evaluation methods. Our aim is to assess, compare and improve MT system variants. We hereby report on novel tools and practices that support various measures, developed in order to support a principled and informed approach of MT developme...
The tool described in this article has been designed to help MT developers by implementing a web-based graphical user interface that allows to systematically compare and evaluate various MT engines/experiments using comparative analysis via automatic measures and statistics. The evaluation panel provides graphs, tests for statistical significance a...
DFKI participated in the shared translation task of WMT 2015 with the German-English language pair in each translation direction. The submissions were generated using an experimental hybrid system based on three systems: a statistical Moses system, a commercial rule-based system, and a serial coupling of the two where the output of the rule-based s...
The idea to improve MT quality by using deep linguistic and knowledge-driven information has frequently been expressed. If the goal is to use deep information for building an MT system, there are two extreme options: (1) to start from a purely knowledge-driven approach (RBMT) and try to arrive at the same recall found in current SMT systems; (2) to...
This paper demonstrates the possibility to make an existing automatic error classi-fier for machine translations independent from the requirement of lemmatisation. This makes it usable also for smaller and under-resourced languages and in situations where there is no lemmatiser at hand. It is shown that cutting all words into the first four letters...
This work investigates situations in the de-coding process of Phrase-based SMT that cause particular errors on the output of the translation. A set of translations post-edited by professional translators is used to automatically identify errors based on edit distance. Binary classifiers predicting the sentence-level existence of an error are fitted...
Significant breakthroughs in machine translation (MT) only seem possible if human translators are taken into the loop. While automatic evaluation and scoring mechanisms such as BLEU have enabled the fast development of systems, it is not clear how systems can meet real-world (quality) requirements in industrial translation scenarios today. The tara...
We present benchmarking experiments for the intrinsic and extrinsic evaluation of an extended version of our open source framework for machine translation quality estimation Q U E ST , which is described in D2.1.2. We focus on the application of quality predictions for dissemination by estimating post-editing effort. As an extrinsic task, we use qu...
In order to support the integration of the MT Pilots generated in WP2 in the real usage
scenario (WP3), they will be made available as web services. The web services will
provide the essential translation functionalities while encapsulating the details of the
whole development process, except for the necessary parameters required by the
application...
In this paper we describe experiments on predicting HTER, as part of our submission in the Shared Task on Quality Estimation, in the frame of the 9th Workshop on Statistical Machine Translation. In our experiment we check whether it is possible to achieve better HTER prediction by training four individual regression models for each one of the edit...
We present experiments using state of the art quality estimation models to improve the performance of machine translation systems without changing the internal functioning of such systems. The experiments include the following approaches: (i) n-best list re-ranking, where translation candidates (segments) produced by a machine translation system ar...
"Qualitative" is a python toolkit for ranking and selection of sentence-level output by different MT systems using Quality Estimation. The toolkit implements a basic pipeline for annotating the given sentences with black-box features. Consequently, it applies a machine learning mechanism in order to rank data based on models pre-trained on human pr...
Human translators are the key to evaluating machine translation (MT) quality and also to addressing the so far unanswered question when and how to use MT in professional translation workflows. This paper describes the corpus developed as a result of a detailed large scale human evaluation consisting of three tightly connected tasks: ranking, error...
Despite the growing interest in and use of machine translation post-edited outputs, there is little research work exploring different types of post-editing operations, i.e. types of translation errors corrected by post-editing. This work investigates five types of post-edit operations and their relation with cognitive post-editing effort (quality l...
This work presents the new flexible Multidimensional Quality Metrics (MQM) framework and uses it to analyze the performance of state-of-the-art machine translation systems, focusing on "nearly acceptable" translated sentences. A selection of WMT news data and "customer" data provided by language service providers (LSPs) in four language pairs was a...
Starting from human annotations, we provide a strategy based on machine learning that performs preference ranking on alternative machine translations of the same source, at sentence level. Rankings are decomposed into pairwise comparisons so that they can be learned by binary classifiers, using black-box features derived from linguistic analysis. I...
In this paper we present QUEST, an open source framework for machine translation quality estimation. The framework includes a feature extraction component and a machine learning component. We describe the architecture of the system and its use, focusing on the feature extraction component and on how to add new feature extractors. We also include ex...
Recent research and applications for evaluation and quality estimation of Machine Translation require statistical measures for comparing machine-predicted ranking against gold sets annotated by humans. Additional to the existing practice of measuring segment-level correlation with Kendall tau, we propose using ranking metrics from the research fiel...