Eleftherios Avramidis

Deutsches Forschungszentrum für Künstliche Intelligenz | DFKI · Language Technology

Doctor of Philosophy

About

92
Publications
80,094
Reads
944
Citations
Introduction
Senior Researcher at the German Research Center for Artificial Intelligence (DFKI Lab Berlin). Teaching at the Technical University and the University of Arts of Berlin. Working on automatic translation of Sign Language, Quality Estimation and Evaluation of Machine Translation.
Additional affiliations
September 2015 - December 2015
Google Inc.
Position
  • Software Engineering PhD Intern
September 2005 - September 2006
University of Macedonia
Position
  • Student
December 2007 - January 2008
University of Edinburgh
Position
  • Research Assistant
Education
May 2010 - December 2018
Saarland University
Field of study
  • Computational Linguistics, Quality Estimation of Machine Translation
September 2007 - September 2008
University of Edinburgh
Field of study
  • Artificial Intelligence
September 2000 - September 2006
University of Macedonia
Field of study
  • Applied Informatics

Publications (92)
Article
Full-text available
A deeper analysis on Comparative Quality Estimation is presented by extending the state-of-the-art methods with adequacy and grammatical features from other Quality Estimation tasks. The previously used linear method, unable to cope with the augmented features, is replaced with a boosting classifier assisted by feature selection. The methods indica...
Conference Paper
Full-text available
We present an analysis of 16 state-of-the-art MT systems on German-English based on a linguistically-motivated test suite. The test suite has been devised manually by a team of language professionals in order to cover a broad variety of linguistic phenomena that MT often fails to translate properly. It contains 5,000 test sentences covering 106 lin...
Conference Paper
Full-text available
This paper describes a test suite submission providing detailed statistics of linguistic performance for the state-of-the-art German-English systems of the Fifth Conference of Machine Translation (WMT20). The analysis covers 107 phenomena organized in 14 categories based on about 5,500 test items, including a manual annotation effort of 45 person h...
Conference Paper
Full-text available
This paper describes the proof-of-concept evaluation for a system that provides translation of speech to virtually performed sign language on augmented reality (AR) glasses. The discovery phase via interviews confirmed the idea for a signing avatar displayed within the users' field of vision through AR glasses. In the evaluation of the first prototy...
Conference Paper
Full-text available
In this paper we present a prototypical implementation of a pipeline that allows the automatic generation of a German Sign Language avatar from 2D video material. The presentation is accompanied by the source code. We record human pose movements during signing with computer vision models. The joint coordinates of hands and arms are imported as land...
Preprint
Full-text available
This is the preliminary ranking of WMT24 General MT systems based on automatic metrics. The official ranking will be a human evaluation, which is superior to the automatic ranking and supersedes it. The purpose of this report is not to interpret any findings but only provide preliminary results to the participants of the General MT task that may be...
Preprint
Full-text available
High-quality Machine Translation (MT) evaluation relies heavily on human judgments. Comprehensive error classification methods, such as Multidimensional Quality Metrics (MQM), are expensive as they are time-consuming and can only be done by experts, whose availability may be limited especially for low-resource languages. On the other hand, just ass...
Conference Paper
Full-text available
This document describes the submission of the very first version of the Occiglot open-source large language model to the General MT Shared Task of the 9th Conference of Machine Translation (WMT24). Occiglot is an open-source, community-based LLM based on Mistral-7B, which went through language-specific continual pre-training and subsequent instruc...
Conference Paper
Full-text available
Despite the importance of mouth actions in Sign Languages, previous work on Automatic Sign Language Recognition (ASLR) has limited use of the mouth area. Disambiguation of homonyms is one of the functions of mouth actions, making them essential for tasks involving ambiguous hand signs. To measure their importance for ASLR, we trained a classifier t...
Conference Paper
Full-text available
This paper describes the concept and the software architecture of a fully integrated system supporting a dialog between a deaf person and a hearing person through a virtual sign language interpreter (aka avatar) projected in the real space by an Augmented Reality device. In addition, a Visual Simultaneous Localization and Mapping system provides in...
Conference Paper
Full-text available
We examine methods and techniques, proven to be helpful for the text-to-text translation of spoken languages in the context of gloss-to-text translation systems, where the glosses are the written representation of the signs. We present one of the first works that include experiments on both parallel corpora of the German Sign Language (PHOENIX14T a...
Conference Paper
Full-text available
We employ a linguistically motivated challenge set in order to evaluate the state-of-the-art machine translation metrics submitted to the Metrics Shared Task of the 7th Conference for Machine Translation. The challenge set includes about 20,000 items extracted from 145 MT systems for two language directions (German ⇔ English), covering more than 1...
Conference Paper
Full-text available
This paper presents the results of the WMT22 Metrics Shared Task. Participants submitting automatic MT evaluation metrics were asked to score the outputs of the translation systems competing in the WMT22 News Translation Task on four different domains: news, social, e-commerce, and chat. All metrics were evaluated on how well they correlate with hu...
Conference Paper
Full-text available
This paper presents the results of the First WMT Shared Task on Sign Language Translation (WMT-SLT22). This shared task is concerned with automatic translation between signed and spoken languages. The task is novel in the sense that it requires processing visual information (such as video frames or human pose estimation) beyond the well-known p...
Conference Paper
Full-text available
This document describes a fine-grained linguistically motivated analysis of 29 machine translation systems submitted at the Shared Task of the 7th Conference of Machine Translation (WMT22). This submission expands the test suite work of previous years by adding the language direction of English-Russian. As a result, evaluation takes place for the...
Conference Paper
Full-text available
This paper describes the participation of DFKI-SLT at the Sign Language Translation Task of the Seventh Conference of Machine Translation (WMT22). The system focuses on the translation direction from the Swiss German Sign Language (DSGS) to written German. The original videos of the sign language were analyzed with computer vision models to provide...
Conference Paper
Full-text available
In this paper, we investigate the capability of convolutional neural networks to recognize in sign language video frames the six basic Ekman facial expressions for 'fear', 'disgust', 'surprise', 'sadness', 'happiness' and 'anger' along with the 'neutral' class. Given the limited amount of annotated facial expression data for the sign language domai...
Conference Paper
Full-text available
This paper presents a fine-grained test suite for the language pair German-English. The test suite is based on a number of linguistically motivated categories and phenomena and the semi-automatic evaluation is carried out with regular expressions. We describe the creation and implementation of the test suite in detail, providing a full list of all...
Chapter
Full-text available
This paper describes the development of the first test suite for the language direction Portuguese-English. Designed for fine-grained linguistic analysis, the test suite comprises 330 test sentences for 66 linguistic phenomena and 14 linguistic categories. Eight different MT systems were compared using quantitative and qualitative methods via the t...
Conference Paper
Full-text available
We are using a semi-automated test suite in order to provide a fine-grained linguistic evaluation for state-of-the-art machine translation systems. The evaluation includes 18 German to English and 18 English to German systems, submitted to the Translation Shared Task of the 2021 Conference on Machine Translation. Our submission adds up to the submi...
Conference Paper
Full-text available
In this paper, we describe the current main approaches to sign language translation which use deep neural networks with videos as input and text as output. We highlight that, under our point of view, their main weakness is the lack of generalization in daily life contexts. Our goal is to build a state-of-the-art system for the automatic interpretat...
Conference Paper
Full-text available
Influencing transportation demand can significantly reduce CO₂ emissions. Individual user mobility models are key to influencing demand at the personal and structural levels. Constructing such models is a challenging task that depends on a number of interdependent steps. Progress on this task is hamstrung by the lack of high quality public dataset...
Conference Paper
Full-text available
We present the results of the application of a grammatical test suite for German→English MT on the systems submitted at WMT19, with a detailed analysis for 107 phenomena organized in 14 categories. The systems still translate wrong one out of four test items in average. Low performance is indicated for idioms, modals, pseudo-clefts, multi-word expr...
Preprint
Full-text available
We present the results of the application of a grammatical test suite for German→English MT on the systems submitted at WMT19, with a detailed analysis for 107 phenomena organized in 14 categories. The systems still translate wrong one out of four test items in average. Low performance is indicated for idioms, modals, pseudo-clefts, mul...
Preprint
Full-text available
We describe a workshop on applications of Artificial Intelligence on Textile Technology through the usage of Machine Learning. The participants are given an overview of the state-of-the-art methods and technologies and are introduced to them through practical exercises. The exercises involve simple hands-on group work with a basic pre-built electri...
Conference Paper
We describe a workshop on applications of Artificial Intelligence on Textile Technology through the usage of Machine Learning. The participants are given an overview of the state-of-the-art methods and technologies and are introduced to them through practical exercises. The exercises involve simple hands-on group work with a basic pre-built electri...
Conference Paper
Full-text available
Evaluating translation models is a trade-off between effort and detail. On the one end of the spectrum there are automatic count-based methods such as BLEU, on the other end linguistic evaluations by humans, which arguably are more informative but also require a disproportionately high effort. To narrow the spectrum, we propose a general approach o...
Preprint
Evaluating translation models is a trade-off between effort and detail. On the one end of the spectrum there are automatic count-based methods such as BLEU, on the other end linguistic evaluations by humans, which arguably are more informative but also require a disproportionately high effort. To narrow the spectrum, we propose a general approach o...
Preprint
Full-text available
We present an analysis of 16 state-of-the-art MT systems on German-English based on a linguistically-motivated test suite. The test suite has been devised manually by a team of language professionals in order to cover a broad variety of linguistic phenomena that MT often fails to translate properly. It contains 5,000 test sentences covering 106 lin...
Conference Paper
Full-text available
We present an alternative method of evaluating Quality Estimation systems, which is based on a linguistically-motivated Test Suite. We create a test-set consisting of 14 linguistic error categories and we gather for each of them a set of samples with both correct and erroneous translations. Then, we measure the performance of 5 Quality Estimation s...
Preprint
Full-text available
We present an alternative method of evaluating Quality Estimation systems, which is based on a linguistically-motivated Test Suite. We create a test-set consisting of 14 linguistic error categories and we gather for each of them a set of samples with both correct and erroneous translations. Then, we measure the performance of 5 Quality Estimation s...
Conference Paper
Full-text available
We present an analysis of 16 state-of-the-art MT systems on German-English based on a linguistically-motivated test suite. The test suite has been devised manually by a team of language professionals in order to cover a broad variety of linguistic phenomena that MT often fails to translate properly. It contains 5,000 test sentences covering 106 lin...
Article
Full-text available
Despite its wide applicability, Quality Estimation (QE) of Machine Translation (MT) poses a difficult entry barrier since there are no open source tools with a graphical user interface (GUI). Here we present a tool in this direction by connecting the back-end of the QE decision-making mechanism with a web-based GUI. The interface allows the user to...
Article
Full-text available
This submission investigates alternative machine learning models for predicting the HTER score on the sentence level. Instead of directly predicting the HTER score, we suggest a model that jointly predicts the amount of the 4 distinct post-editing operations, which are then used to calculate the HTER score. This also gives the possibility to correc...
Article
Full-text available
In this article we present a novel linguistically driven evaluation method and apply it to the main approaches of Machine Translation (Rule-based, Phrase-based, Neural) to gain insights into their strengths and weaknesses in much more detail than provided by current evaluation schemes. Translating between two languages requires substantial modellin...
Technical Report
Full-text available
This report on the web services and datasets for applications provides an update of the contents/plans described in D3.4 and D3.1 and of the ongoing report (D3.13 v0.91) submitted earlier. The first part of this document (Section 2) will be concerned with web services. The MT Pilots developed in the QTLeap project are documented in Deliverables D2....
Article
Full-text available
We are presenting the development contributions of the last two years to our Python open-source Quality Estimation tool, a tool that can function in both experiment-mode and online web-service mode. The latest version provides a new MT interface, which communicates with SMT and rule-based translation engines and supports on-the-fly sentence selectio...
Conference Paper
Full-text available
This work addresses the need to aid Machine Translation (MT) development cycles with a complete workflow of MT evaluation methods. Our aim is to assess, compare and improve MT system variants. We hereby report on novel tools and practices that support various measures, developed in order to support a principled and informed approach of MT developme...
Article
Full-text available
The tool described in this article has been designed to help MT developers by implementing a web-based graphical user interface that allows to systematically compare and evaluate various MT engines/experiments using comparative analysis via automatic measures and statistics. The evaluation panel provides graphs, tests for statistical significance a...
Conference Paper
Full-text available
DFKI participated in the shared translation task of WMT 2015 with the German-English language pair in each translation direction. The submissions were generated using an experimental hybrid system based on three systems: a statistical Moses system, a commercial rule-based system, and a serial coupling of the two where the output of the rule-based s...
Conference Paper
Full-text available
The idea to improve MT quality by using deep linguistic and knowledge-driven information has frequently been expressed. If the goal is to use deep information for building an MT system, there are two extreme options: (1) to start from a purely knowledge-driven approach (RBMT) and try to arrive at the same recall found in current SMT systems; (2) to...
Conference Paper
Full-text available
This paper demonstrates the possibility to make an existing automatic error classifier for machine translations independent from the requirement of lemmatisation. This makes it usable also for smaller and under-resourced languages and in situations where there is no lemmatiser at hand. It is shown that cutting all words into the first four letters...
Conference Paper
Full-text available
This work investigates situations in the decoding process of Phrase-based SMT that cause particular errors on the output of the translation. A set of translations post-edited by professional translators is used to automatically identify errors based on edit distance. Binary classifiers predicting the sentence-level existence of an error are fitted...
Article
Full-text available
Significant breakthroughs in machine translation (MT) only seem possible if human translators are taken into the loop. While automatic evaluation and scoring mechanisms such as BLEU have enabled the fast development of systems, it is not clear how systems can meet real-world (quality) requirements in industrial translation scenarios today. The tara...
Technical Report
Full-text available
We present benchmarking experiments for the intrinsic and extrinsic evaluation of an extended version of our open source framework for machine translation quality estimation QUEST, which is described in D2.1.2. We focus on the application of quality predictions for dissemination by estimating post-editing effort. As an extrinsic task, we use qu...
Technical Report
Full-text available
In order to support the integration of the MT Pilots generated in WP2 in the real usage scenario (WP3), they will be made available as web services. The web services will provide the essential translation functionalities while encapsulating the details of the whole development process, except for the necessary parameters required by the application...
Conference Paper
Full-text available
In this paper we describe experiments on predicting HTER, as part of our submission in the Shared Task on Quality Estimation, in the frame of the 9th Workshop on Statistical Machine Translation. In our experiment we check whether it is possible to achieve better HTER prediction by training four individual regression models for each one of the edit...
Technical Report
Full-text available
We present experiments using state of the art quality estimation models to improve the performance of machine translation systems without changing the internal functioning of such systems. The experiments include the following approaches: (i) n-best list re-ranking, where translation candidates (segments) produced by a machine translation system ar...
Article
Full-text available
"Qualitative" is a python toolkit for ranking and selection of sentence-level output by different MT systems using Quality Estimation. The toolkit implements a basic pipeline for annotating the given sentences with black-box features. Consequently, it applies a machine learning mechanism in order to rank data based on models pre-trained on human pr...
Conference Paper
Full-text available
Human translators are the key to evaluating machine translation (MT) quality and also to addressing the so far unanswered question when and how to use MT in professional translation workflows. This paper describes the corpus developed as a result of a detailed large scale human evaluation consisting of three tightly connected tasks: ranking, error...
Conference Paper
Full-text available
Despite the growing interest in and use of machine translation post-edited outputs, there is little research work exploring different types of post-editing operations, i.e. types of translation errors corrected by post-editing. This work investigates five types of post-edit operations and their relation with cognitive post-editing effort (quality l...
Conference Paper
Full-text available
This work presents the new flexible Multidimensional Quality Metrics (MQM) framework and uses it to analyze the performance of state-of-the-art machine translation systems, focusing on "nearly acceptable" translated sentences. A selection of WMT news data and "customer" data provided by language service providers (LSPs) in four language pairs was a...
Article
Full-text available
Starting from human annotations, we provide a strategy based on machine learning that performs preference ranking on alternative machine translations of the same source, at sentence level. Rankings are decomposed into pairwise comparisons so that they can be learned by binary classifiers, using black-box features derived from linguistic analysis. I...
Article
Full-text available
In this paper we present QUEST, an open source framework for machine translation quality estimation. The framework includes a feature extraction component and a machine learning component. We describe the architecture of the system and its use, focusing on the feature extraction component and on how to add new feature extractors. We also include ex...
Article
Full-text available
Recent research and applications for evaluation and quality estimation of Machine Translation require statistical measures for comparing machine-predicted ranking against gold sets annotated by humans. Additional to the existing practice of measuring segment-level correlation with Kendall tau, we propose using ranking metrics from the research fiel...