Project

A Comparative Evaluation of Phrase-Based SMT and Neural Machine Translation

Goal: This study reports on a comparative human evaluation of phrase-based SMT and NMT in four language pairs, using the PET tool to compare output from both systems across a variety of metrics. These metrics comprise automatic evaluation, human rankings of adequacy and fluency, error-type markup, and post-editing effort (technical and temporal). This evaluation is part of the work of the TraMOOC project, which aims to create a replicable semi-automated methodology for high-quality MT of educational data. While the primary intention of this evaluation is to identify the best MT paradigm for our proposed methodology for TraMOOC, we believe that our evaluation results will be of interest to the wider research community and to those in the translation industry interested in the deployment of cutting-edge MT systems.


Project log

Joss Moorkens
added a research item
This article reports a multifaceted comparison between statistical and neural machine translation (MT) systems that were developed for translation of data from massive open online courses (MOOCs). The study uses four language pairs: English to German, Greek, Portuguese, and Russian. Translation quality is evaluated using automatic metrics and human evaluation, carried out by professional translators. Results show that neural MT is preferred in side-by-side ranking, and is found to contain fewer overall errors. Results are less clear-cut for some error categories, and for temporal and technical post-editing effort. In addition, results are reported based on sentence length, showing advantages and disadvantages depending on the particular language pair and MT paradigm.
Sheila Castilho
added a research item
This paper reports on a comparative evaluation of phrase-based statistical machine translation (PBSMT) and neural machine translation (NMT) for four language pairs, using the PET interface to compare educational domain output from both systems across a variety of metrics, including automatic evaluation as well as human rankings of adequacy and fluency, error-type markup, and post-editing (technical and temporal) effort, performed by professional translators. Our results show a preference for NMT in side-by-side ranking for all language pairs, texts, and segment lengths. In addition, perceived fluency is improved and annotated errors are fewer in the NMT output. Results are mixed for perceived adequacy and for errors of omission, addition, and mistranslation. Despite far fewer segments requiring post-editing, document-level post-editing performance was not found to have significantly improved in NMT compared to PBSMT. This evaluation was conducted as part of the TraMOOC project, which aims to create a replicable semi-automated methodology for high-quality machine translation of educational data.
Joss Moorkens
added a research item
This paper discusses neural machine translation (NMT), a new paradigm in the MT field, comparing the quality of NMT systems with statistical MT by describing three studies using automatic and human evaluation methods. Automatic evaluation results presented for NMT are very promising; however, human evaluations show mixed results. We report increases in fluency but inconsistent results for adequacy and post-editing effort. NMT undoubtedly represents a step forward for the MT field, but one that the community should be careful not to oversell.