To fully exploit the huge potential of existing open SMT technologies and user-provided content, we have created an innovative online platform for data sharing and MT building. This platform is being developed in the EU collaboration project LetsMT!. This paper presents the motivation for developing this platform, its architecture, and its main features.
All content in this area was uploaded by Andrejs Vasiļjevs
... Neural translation is becoming a "direct competitor to statistical machine translation and, in a certain sense, also to rule-based machine translation" (Forcada, 2017). For more information on machine translation based on neural networks see, for example, Koehn (2017) (Vasiļjevs et al., 2011), which offer a free online service for customized machine translation based on parallel corpora and terminology databases supplied by the client. By contrast, there are a great many paid services for developing customized machine translation systems. Most of them are not limited to a particular domain, such as Globalese, Tauyou, Tilde MT and others, while some are specialized for a single domain, such as Lingua Custodia for the financial sector. ...
... There are research projects, such as LetsMT! [4] (Vasiļjevs et al., 2011), that offer a free online service for customized machine translation based on parallel corpora and terminology databases supplied by the client. By contrast, there are a great many paid services for developing customized machine translation systems. Most of them are not limited to a particular domain, such as Globalese, Tauyou, Tilde MT and others, while some are specialized for a single domain, such as Lingua Custodia for the financial sector. ...
... Latvian e-Government MT Platform (Vasiļjevs et al., 2014) is built by Tilde using LetsMT technologies (Vasiļjevs et al., 2011; 2012), which are based on the Moses toolkit. LetsMT includes facilities to process parallel and monolingual corpora and build translation and language models for phrase-based statistical machine translation. ...
This paper describes the corpora collection activity for building large machine translation systems for the Latvian e-Government platform. We describe the requirements for corpora, the selection and assessment of data sources, the collection of public corpora, and the creation of new corpora from miscellaneous sources. Methodology, tools and assessment methods are also presented, along with the results achieved, challenges faced and conclusions drawn. Several approaches to addressing data scarcity are discussed. We summarize the volume of the obtained corpora and provide quality metrics for MT systems trained on this data. The resulting MT systems for English-Latvian, Latvian-English and Latvian-Russian are integrated into the Latvian e-service portal and are freely available on the website HUGO.LV. This paper can serve as guidance for similar activities initiated in other countries, particularly in the context of the European Language Resource Coordination action.
... Both the baseline SMT system and the SMT systems trained on pre-processed data were trained using the LetsMT! platform [8], which is based on the Moses SMT toolkit [3]. In all the experiments the corpus was transformed using finite state transducers. ...
In this paper, we present the results of a series of experiments done to improve the quality of a Lithuanian-English statistical MT (SMT) system. We focus in particular on word alignment and out-of-vocabulary issues in SMT when translating from a morphologically rich language into English.
A machine translation system (MTS) consists of functionally heterogeneous modules for processing a source language into a given target language. Deploying such an application on a stand-alone system requires considerable time and knowledge and brings complications. It is even more challenging for an ordinary user to utilize such a complex application. This paper presents an MTS that has been developed using a combination of linguistically rich, rule-based and prominent neural-based approaches. The proposed MTS is deployed in the cloud to offer translation as a cloud service and to improve the quality of service (QoS) over a stand-alone system. It is developed on TensorFlow and deployed on a cluster of virtual machines on Amazon Web Services (EC2). The contribution of this paper is to demonstrate the management of recurrent changes in terms of corpus, domain, algorithm and rules. Further, the paper compares the MTS as deployed on a stand-alone machine and in the cloud for different QoS parameters such as response time, server load, CPU utilization and throughput. The experimental results indicate that, with the availability of elastic computing resources in the cloud environment, the completion time of a translation job can be kept within a fixed time limit, with high accuracy, irrespective of the job's size.
In this article we will describe the design and implementation of Jane, an efficient hierarchical phrase-based (HPB) toolkit developed at RWTH Aachen University. The system has been used by RWTH at several international evaluation campaigns, including ...
In this work, we show how an existing rule-based, general-purpose machine translation system may be improved and adapted automatically to a given domain, whenever parallel corpora are available. We perform this adaptation by extracting dictionary entries from the parallel data. From this initial set, the application of these rules is tested against the baseline performance. Rules are then pruned depending on sentence-level improvements and deteriorations, as evaluated by an automatic string-based metric. Experiments using the Europarl dataset show a 3% absolute improvement in BLEU over the original rule-based system.
We describe an open-source toolkit for statistical machine translation whose novel contributions are (a) support for linguistically motivated factors, (b) confusion network decoding, and (c) efficient data formats for translation models and language models. In addition to the SMT decoder, the toolkit also includes a wide variety of tools for training, tuning and applying the system to many translation tasks.
this article the problem of finding the word alignment of a bilingual sentence-aligned corpus by using language-independent statistical methods. There is a vast literature on this topic, and many different systems have been suggested to solve this problem. Our work follows and extends the methods introduced by Brown, Della Pietra, Della Pietra, and Mercer (1993) by using refined statistical models for the translation process. The basic idea of this approach is to develop a model of the translation process with the word alignment as a hidden variable of this process, to apply statistical estimation theory to compute the "optimal" model parameters, and to perform alignment search to compute the best word alignment
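The idea of treating the word alignment as a hidden variable can be illustrated with a minimal sketch of IBM Model 1, the simplest of the models introduced by Brown et al. (1993). This is not the refined models the abstract refers to, only an assumption-laden toy: it omits the NULL word, the function names `train_model1` and `best_alignment` are hypothetical, and the three-sentence German-English corpus is invented for illustration.

```python
from collections import defaultdict


def train_model1(corpus, iterations=20):
    """Estimate p(target_word | source_word) with EM (IBM Model 1, no NULL word).

    corpus: list of (source_tokens, target_tokens) sentence pairs.
    Returns a dict mapping (target_word, source_word) to a probability.
    """
    # Start from a uniform (unnormalised) table over co-occurring word pairs;
    # the E-step normalises per target word, so the scale does not matter.
    prob = {(t, s): 1.0 for src, tgt in corpus for s in src for t in tgt}

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (t, s) links
        total = defaultdict(float)  # expected count of links leaving s

        # E-step: distribute each target word over the source words of its
        # sentence in proportion to the current translation probabilities.
        for src, tgt in corpus:
            for t in tgt:
                norm = sum(prob[(t, s)] for s in src)
                for s in src:
                    frac = prob[(t, s)] / norm
                    count[(t, s)] += frac
                    total[s] += frac

        # M-step: re-estimate p(t | s) from the expected counts.
        prob = {(t, s): c / total[s] for (t, s), c in count.items()}

    return prob


def best_alignment(src, tgt, prob):
    """Alignment search: link each target word to its most probable source index."""
    return [
        max(range(len(src)), key=lambda i: prob.get((t, src[i]), 0.0))
        for t in tgt
    ]


# Toy corpus (invented for illustration).
corpus = [
    (["das", "Haus"], ["the", "house"]),
    (["das", "Buch"], ["the", "book"]),
    (["ein", "Buch"], ["a", "book"]),
]
prob = train_model1(corpus)
print(best_alignment(["das", "Haus"], ["the", "house"], prob))
```

The three stages named in the abstract map onto the sketch as follows: the translation table is the model, the EM loop is the statistical estimation of its parameters, and `best_alignment` is the alignment search. Under Model 1 that search decomposes into an independent choice per target word, which is not true of the more refined models.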
We evaluated the productivity increase of statistical MT post-editing as compared to traditional translation in a two-day test involving twelve participants translating from English to French, Italian, German, and Spanish. The test setup followed an empirical methodology. A random subset of the entire new content produced in our company during a given year was translated with statistical MT engines trained on data from the previous year. The translation environment recorded translation and post-editing times for each sentence. The results show a productivity increase for each participant, with significant variance across individuals.
Randomised techniques allow very big language models to be represented succinctly. However, being batch-based they are unsuitable for modelling an unbounded stream of language whilst maintaining a constant error rate. We present a novel randomised language model which uses an online perfect hash function to efficiently deal with unbounded text streams. Translation experiments over a text stream show that our online randomised model matches the performance of batch-based LMs without incurring the computational overhead associated with full retraining. This opens up the possibility of randomised language models which continuously adapt to the massive volumes of texts published on the Web each day.
A. Joscelyne. 2010. TDA Members doing business with Moses. TAUS DA blog on October 7, 2010. URL: http://www.tausdata.org/blog/2010/10/doing-business-with-moses-open-source-translation/. (Archived by WebCite® at http://www.webcitation.org/617g6iKGN)
L. Dugast, J. Senellart, P. Koehn. 2009. Selective addition of corpus-extracted phrasal lexical rules to a rule-based machine translation system. In Proceedings of MT Summit XII.