Incremental Adaptation Using Translation Information and Post-Editing Analysis

Frédéric Blain, Holger Schwenk
LIUM
Université du Maine, Avenue Laennec
72085 Le Mans, France
lastname@lium.univ-lemans.fr
Jean Senellart
Systran SA
5, rue Feydeau
75002 Paris, France
lastname@systran.fr
Abstract
It is well known that statistical machine translation systems
perform best when they are adapted to the task. In this paper
we propose new methods to quickly perform incremental
adaptation without the need to obtain word-by-word alignments
from GIZA or similar tools. The main idea is to use an
automatic translation as a pivot to infer alignments between
the source sentence and the reference translation, or user
correction. We compare our approach to the standard method
for incremental re-training and achieve similar BLEU scores
using fewer computational resources. Fast retraining is
particularly interesting when we want to integrate user
feedback almost instantly, for instance in a post-editing
context or in a machine-translation-assisted CAT tool. We also
explore several methods to combine the translation models.
1. Introduction
Due to the multiplication of resources and the diversity of
languages, Machine Translation (MT) systems are widely used
as a valuable aid for human translators. Most of the systems
used today are based on the statistical approach and extract
all their knowledge from the provided data. Nevertheless,
these systems have limits: in particular, the specific
resources available at time t may be less appropriate at time
t+1. Consequently, the systems need to be re-trained regularly
in order to stay up to date, which is usually computationally
demanding. The goal of incremental adaptation is therefore
twofold: to adapt the system on the fly when new resources
become available, and to do so without re-training the entire
system.
Post-Editing (PE) the output of SMT systems is widely used,
amongst others, by professional translators in localization
services, who need for example to translate technical data in
specific domains into several languages. However, the use of
PE is restricted by some aspects that must be taken into
consideration. As summarized in [1], the time spent by the
post-editor is a commonly used measure of PE effort; when
translation quality is poor, this time should not exceed the
time needed to translate from scratch. Even if this temporal
aspect can be seen as the most important one, PE effort can
also be evaluated using automatic metrics based on the edit
distance. These metrics commonly use the number of edits
required to turn the MT system output into a reference
translation. The combination of PE and incremental adaptation
can therefore be seen as a way to reduce the task effort by
allowing an MT system to gradually learn from its own errors,
especially considering the repetitive nature of the task
highlighted by [2].
However, incremental adaptation is still a tricky task: how
should the system be adapted correctly? Adaptation should not
degrade system performance and accuracy. Several approaches
are possible, and we examine the impact of some of them in the
second part of this article.
First of all, we present a new experimental approach for
incremental adaptation of an MT system using PE analysis.
Starting from a generic baseline, we gradually adapted our
system by translating an in-domain corpus which was split
beforehand. Each part of the corpus was translated using the
translation model adapted at the previous step, i.e. updated
with newly extracted phrases. These phrases are the result of
the word-to-word alignment combination that we present below.
1.1. Similar work
The most similar approach in the literature is proposed in
[3], which presents an incremental re-training algorithm to
simulate a post-editing situation. New phrases are extracted
from approximate alignments obtained with a modified version
of Giza-pp [4]. An initial alignment with one-to-one links
between the same sentence positions is created and then
iteratively updated as long as improvements are observed. In
practice, a greedy search algorithm is used to find the
locally optimal word alignment. All source positions carrying
only one link are tried, and the single link change which
produces the highest probability increase according to the
Giza-pp Model 4 is kept. The resulting alignment is improved
with two simple post-processing steps. First, each unknown
word on the source side is aligned with the first non-aligned
unknown word on the target side. Second, unaligned pairs of
positions surrounded by corresponding alignments are
automatically aligned.
In this paper, we present a very fast word-to-word alignment
algorithm which is partially based on the edit-distance
algorithm. As argued in [3], “to be practical, incremental
retraining must be performed in less than one second”. For
comparison, our entire alignment process takes a few
hundredths of a second for 1500 sentences, compared to several
seconds per sentence as reported in [3].
Figure 1: Incremental adaptation workflow as a three-step protocol: 1. Translation and source-translation alignment: source
sentences are translated using the SMT system Moses. Alignment links are generated during the translation step; 2. Edit distance
on translation-reference: the MT system output and its reference translation are aligned using the edit distance algorithm of TER; 3.
Source-reference alignment: the alignment links are deduced from the combination of the alignments of steps 1 and 2. Phrase pairs
are then extracted, scored and added to the translation model, which is finally re-trained.
The authors of [5] present stream-based incremental
adaptation using an on-line version of the EM algorithm. This
approach, designed for large amounts of incoming data, is not
well suited to the post-editing context. Like [3], we propose
an incremental adaptation workflow that is more oriented
towards real-time processing.
As part of our experiments, we have compared our approach
with the freely available tool Inc-Giza-pp,¹ an incremental
version of Giza-pp. It is precisely intended to inject new
data into an SMT system without having to restart the entire
word alignment procedure. To our knowledge, this is the
standard method currently used in the field. In our
experiments, we achieve similar BLEU scores in less time.
The remainder of this paper is organized as follows. In the
next section we first describe our incremental adaptation
workflow and more particularly the word-to-word alignment
methodology based on the edit distance. Section 3 is dedicated
to the experimental protocols and compares the performance of
our approach with the standard method using Inc-Giza-pp. The
paper concludes with a discussion of perspectives of this
work.
¹ http://code.google.com/p/inc-giza-pp/
2. Incremental Adaptation Workflow
In this paper, we present a new methodology to perform
incremental training and domain adaptation. Starting with a
generic phrase-based MT (PBMT) baseline system, we have
sequentially translated the source side of an in-domain
corpus. At each step, like [3], we have simulated a human
post-editing the translations by using the corresponding
reference translations of the data. At the sentence level, the
source and its reference translation are aligned in order to
subsequently retrieve the corresponding phrase pairs. The
extracted phrase pairs are then scored and used to retrain
(i.e. adapt) the translation model of our PBMT system.
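To make the phrase-pair retrieval step more concrete, the following Python sketch shows a simplified form of consistency-based phrase extraction from a word alignment. It is only an illustration under stated assumptions, not the Moses extraction script used in the experiments: the function name `extract_phrases`, the `max_len` limit and the data layout (0-based `(source, target)` index pairs) are hypothetical, and the usual extension over unaligned boundary words is left out.

```python
def extract_phrases(src, tgt, links, max_len=4):
    """Return phrase pairs consistent with the word alignment.

    src, tgt: lists of tokens.
    links:    iterable of (src_index, tgt_index) pairs (assumed layout).
    A pair of spans is kept when no word of the target span is linked
    to a source word outside the source span (simplified criterion).
    """
    links = list(links)
    pairs = set()
    for s1 in range(len(src)):
        for s2 in range(s1, min(s1 + max_len, len(src))):
            # Target positions linked to the current source span.
            tgt_positions = [t for s, t in links if s1 <= s <= s2]
            if not tgt_positions:
                continue
            t1, t2 = min(tgt_positions), max(tgt_positions)
            if t2 - t1 >= max_len:
                continue
            # Reject spans whose target side is linked outside the source span.
            if any(t1 <= t <= t2 and not (s1 <= s <= s2) for s, t in links):
                continue
            pairs.add((" ".join(src[s1:s2 + 1]), " ".join(tgt[t1:t2 + 1])))
    return pairs
```

For a two-word sentence pair with crossing links, `extract_phrases(["a", "b"], ["x", "y"], [(0, 1), (1, 0)])` yields the two single-word pairs and the full two-word pair.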
We have developed an alignment protocol which operates in
three steps, named “translation”, “analysis” and
“adaptation”. These three steps are linked together by a
word-to-word alignment algorithm which allows us to align a
source sentence and its reference translation and then to
extract new phrase pairs with which the MT system will be
adapted. This algorithm is illustrated in Figure 1 and
explained in detail in the next section.
2.1. Word-to-word alignment combination
Our approach to aligning the source and its corresponding
reference translation can be seen as a combination of the
source-to-hypothesis word alignments and an analysis of the
edit distance between the hypothesis and the reference. The
central element of this approach is an automatic translation
of the source sentence into the target language. The principle
of this idea is illustrated in Figure 2.
Figure 2: Example of a source-to-reference alignment using the automatic translation as pivot. The alignment links
between the source sentence and the translation are generated by the MT system. Those between the translation and its post-edited
version (i.e. the reference) are calculated by TER. Finally, the source-to-reference alignment links are deduced by an
alignment combination based on both alignment sets computed before.
2.1.1. Translation: source to translation alignment
The SMT system used to translate the source sentences is
based on the Moses SMT toolkit [6]. Moses can provide the
word-to-word alignments between the source sentence and the
translation hypothesis. This alignment information represents
the first part of our alignment combination. The automatic
translation is then “compared” with the reference translation
using an edit distance algorithm.
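As an illustration of how such source-to-hypothesis links might be consumed downstream, the sketch below parses a line of word-alignment links written as space-separated `i-j` pairs (source index, target index). This textual format is a common convention for word alignments, but the paper does not state how the alignments were exported from the decoder, so both the format and the function name `parse_alignment` are assumptions.

```python
def parse_alignment(line):
    """Parse 'i-j' pairs into a sorted list of (src_index, tgt_index) tuples.

    Assumed input format: space-separated tokens such as '0-0 1-2 2-1',
    with 0-based word positions.
    """
    links = set()
    for token in line.split():
        src, tgt = token.split("-")
        links.add((int(src), int(tgt)))
    return sorted(links)
```

Storing the links as integer pairs keeps this first alignment set in the same shape as the hypothesis-to-reference links produced by the edit-distance analysis.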
2.1.2. Analysis: edit distance alignment
In this paper, we use the Translation Edit Rate (TER)
algorithm as proposed in [7]. TER is an extension of the Word
Error Rate (WER) which is more suitable for machine
translation since it can take into account word reorderings.
TER uses the following edit types: insertion, deletion,
substitution and shift.
The TER is computed between the output of our SMT system and
the corresponding reference translation, and the word-to-word
alignments are inferred from the resulting edit path. We only
keep the aligned and substituted edit types in order to
extract what we consider to be the most interesting phrase
pairs. Indeed, we argue that what is aligned corresponds to
what our system knows, while what is substituted corresponds
to what our system does not know.
Our approach can be extended to use TER-Plus [8], an extension
of TER using paraphrases, stemming and synonyms in order to
obtain better word-to-word alignments.
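The way an edit path yields word-to-word links can be sketched with a plain word-level edit distance. The Python code below is a simplified stand-in, not TER itself: it implements a WER-style Levenshtein alignment without the shift operation, and the function names `edit_alignment` and `kept_links` are hypothetical. It only illustrates how ‘align’ and ‘substitution’ operations translate into hypothesis-to-reference links.

```python
def edit_alignment(hyp, ref):
    """Return the edit path between two token lists as (op, hyp_idx, ref_idx).

    Operations: 'align', 'substitution', 'deletion', 'insertion'.
    Shifts (word reorderings), which TER also handles, are not modelled.
    """
    n, m = len(hyp), len(ref)
    # cost[i][j] = minimum number of edits turning hyp[:i] into ref[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (hyp[i - 1] != ref[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)
    # Backtrace from the bottom-right corner, preferring the diagonal move.
    ops, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (hyp[i - 1] != ref[j - 1]):
            op = "align" if hyp[i - 1] == ref[j - 1] else "substitution"
            ops.append((op, i - 1, j - 1))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            ops.append(("deletion", i - 1, None))
            i -= 1
        else:
            ops.append(("insertion", None, j - 1))
            j -= 1
    ops.reverse()
    return ops


def kept_links(ops):
    """Keep only the links for 'align' and 'substitution' operations."""
    return [(h, r) for op, h, r in ops if op in ("align", "substitution")]
```

Restricting `kept_links` to `"substitution"` alone would give the filtered variant in which only corrected words contribute links.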
2.1.3. Adaptation: source to reference alignment
Considering the SMT translation hypothesis as a “pivot” for
aligning the source and its reference sentence, we have
designed the word-to-word alignment algorithm shown in
Algorithm 1. It combines the source-to-translation and
translation-to-reference alignments, and then deduces the
source-to-reference alignment path. From this path, the
translation model is finally updated using the standard
phrase extraction and scoring scripts provided with Moses.
Data: src-to-tgt word alignments, tgt-to-ref edit-path
foreach src-to-tgt word alignment do
    alignment(src-word, tgt-word) = 1;
end
if edit-path has shift then
    foreach shift do
        updateWordPosition(tgt, shift);
    end
end
foreach edit-type of edit-path do
    if edit-type is ‘align’ or ‘substitution’ then
        alignment(tgt-word, ref-word) = 1;
    end
end
foreach ref-word of ref do
    foreach tgt-word aligned to ref-word do
        if isAligned?(src-word, tgt-word) then
            alignment(src-word, ref-word) = 1;
        end
    end
end
Algorithm 1: Source-to-reference alignment algorithm at word level. Using both the source-to-translation alignments and the translation-to-reference edit path, the source-to-reference alignment path is built.
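The combination step of Algorithm 1 can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name `combine_alignments` and the representation of links as 0-based index pairs are assumptions, and the hypothesis positions in `tgt_ref` are assumed to refer to the original hypothesis word order (i.e. any TER shifts have already been mapped back, as done by `updateWordPosition` in the pseudocode).

```python
from collections import defaultdict


def combine_alignments(src_tgt, tgt_ref):
    """Deduce source-to-reference links using the hypothesis as pivot.

    src_tgt: iterable of (src_index, tgt_index) links from the decoder.
    tgt_ref: iterable of (tgt_index, ref_index) links kept from the edit
             path ('align' and 'substitution' operations only).
    """
    # Index the source words linked to each hypothesis position.
    src_by_tgt = defaultdict(set)
    for s, t in src_tgt:
        src_by_tgt[t].add(s)
    # A source word is linked to a reference word when both are linked
    # to the same hypothesis word.
    src_ref = set()
    for t, r in tgt_ref:
        for s in src_by_tgt.get(t, ()):
            src_ref.add((s, r))
    return sorted(src_ref)
```

Hypothesis words that were inserted or deleted in the edit path simply have no entry in `tgt_ref`, so the source words linked to them stay unaligned on the reference side.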
3. Experimental evaluation
The approach described in the previous section is compared
to Inc-Giza-pp, which is considered the state-of-the-art tool
for incremental training. In our first experiments, each
system uses a single translation model which is updated and
entirely retrained after each iteration. In the results
presented below, the system using Inc-Giza-pp is called
“inc-Giza-pp” and the system using our approach is called
“noGizapp”.
3.1. Training data
The experiments were performed on data made available by the
French COSMAT project. The goal of this project is to provide
task-specific automatic translations of scientific texts on
the French HAL archive.² This archive contains a large number
of scientific publications and PhD theses. The MT system is
closely integrated into the workflow of the HAL archive. In
particular, authors have the possibility to correct the
provided automatic translations. These corrected translations
will then be used to improve the system. In this paper, we
consider automatic translation from English into French.
Three corpora of parallel data are available to train the
translation model: two generic corpora and an in-domain corpus
for adaptation. The first two corpora are Europarl and News
Commentary, with 50 million and 3 million words respectively.
They were used to train our SMT baseline systems. The third
corpus, named “absINFO”, contains 500 thousand words randomly
selected from abstracts of scientific papers in the domain of
Computer Science. Information on the sub-domains is also
available (networks, AI, databases, theoretical CS, ...), but
was not used in this study. The corpus is freely available to
support research in domain adaptation and was already used by
the 2012 JHU summer workshop on this topic. A detailed
description of this corpus can be found in [9].
This in-domain corpus was split into three sub-corpora:
absINFO.corr.train is composed of 350k words and is used to
  simulate the user post-editing or corrective training.
absINFO.dev is a set of 75k words used for development.
absINFO.test is another set of 75k words used as a test corpus
  to monitor the performance of our adaptation workflow.
Moreover, in order to better simulate a sequential
post-editing process, the absINFO.corr.train corpus was split
into 10 sub-sets (about 1.5k sentences with 35k words each).
This corresponds quite well to the update of an MT system
after the post-correction of an entire document.
3.2. Baseline Training
The baseline SMT systems were constructed using the standard
Moses pipeline and Giza-pp for word alignment. In order to
later use Inc-Giza-pp, the incremental version of Giza-pp, we
had to train a specific baseline system using the Hidden
Markov Model (HMM) word alignment model option. However, to
make a fair comparison of the two adaptation techniques, the
baseline and subsequent systems were trained on the same data
and tuned with MERT [10] with the same initial
parametrization. The inc-Giza-pp and noGizapp baseline SMT
systems achieve 35.27 and 35.32 BLEU points respectively on
the development corpus, and 31.89 and 32.27 BLEU points on the
test corpus.
² http://hal.archives-ouvertes.fr/
3.3. Analysis of processing time and alignment quality
The two incremental training approaches are compared with
respect to the BLEU score obtained after adding the newly
aligned data. We also report the time needed to perform the
word alignments. For inc-Giza-pp, the alignment protocol is
composed of several steps (for more details, see “Incremental
Training” in the “Advanced Features” section of the Moses user
documentation³). First, one has to preprocess the data for use
by Giza-pp. This involves updating the vocabulary files,
converting the sentences into the snt format of Giza-pp, and
then updating the co-occurrence file. Then, Giza-pp is
executed to update and compute the alignments for the new
data. This is performed in both directions,
source-to-translation and translation-to-source. For each
iteration of our experiment, this process takes about 14
minutes.
For the noGizapp system, the time required to perform the
source-to-translation alignment can be considered null because
the alignment is obtained implicitly during translation. The
TER between the SMT translation and the reference translation
is computed using a fast and freely available C++
implementation.⁴ This tool can align about 35k words in about
three seconds (corresponding to the 1.5k sentences of one 10%
subset of the absINFO.corr.train corpus). The alignment
combination of the source and reference translation, described
in Algorithm 1, takes less than a second. Overall, we can
obtain the source-to-reference alignments of 35k words in only
a few seconds.
The BLEU scores on the development (left part) and test data
(right part) are compared in Figure 3. The following systems
were built:
Gizapp: for each subcorpus of the absINFO.corr.train training
  data (10%, 20%, 30%, ..., 100%), all the available training
  data is concatenated and the full training pipeline is
  performed, including a new word alignment which considers
  all the training data. We consider this the upper limit of
  the performance we could achieve by incremental training.
  This procedure is very time consuming.
inc-Giza-pp: the sub-corpora of the absINFO.corr.train
  training data are added using the incremental version of
  Giza. This resulted in a slight decrease of the BLEU score
  on the development data and a quite unstable performance on
  the test data.
noGizapp: incremental training using the new approach
  described in this paper. We always used the same baseline
  SMT system to translate the additional adaptation data.
inc-noGizapp: like noGizapp, but using the system adapted in
  the previous step to translate the additional adaptation
  data.
³ Available online: http://www.statmt.org
⁴ http://sourceforge.net/projects/tercpp/
Figure 3: Incremental adaptation in BLEU score for our two PBMT systems on both development and test corpora. The Inc-Giza-pp
system uses the incremental version of Giza-pp for aligning sentence pairs, while the noGizapp system uses the approach we
present in this paper, which is based on the combination of translation information and edit distance. The ‘Gizapp’ and ‘noGizapp’
curves represent the BLEU scores obtained with an in-domain adaptation of our baseline systems, without the incremental approach,
while the ‘Inc-Giza-pp’ and ‘(incremental) noGizapp’ curves represent the in-domain adaptation scores over an incremental
process.
The proposed approach to obtain incremental word alignments
achieves slightly better BLEU scores than inc-Giza-pp on both
the development and the test corpus, and is much faster.
The large variations on the test corpus could be explained by
two potential reasons. The first one could be the
characteristics of the absINFO.corr.train corpus. It was
created from abstracts of randomly selected (Computer Science)
sub-domains. Consequently, a sub-domain predominantly
represented in one sub-corpus of the absINFO corpus may not be
represented in the test corpus. The second reason could be the
use of only one translation model. As explained above, this
translation model is updated with new phrase pairs extracted
at each iteration. Because we only keep the ‘align’ and
‘substitution’ edit types during the edit distance analysis
(see Section 2.1.2), the extracted phrase pairs can be either
generic or in-domain. Added to all the entries already in the
translation model, these new phrases disturb the probability
distribution. This could also explain why our incremental
systems perform worse than the non-incremental systems (what
we have called “oracle systems”), for which the probability
distribution is better tuned.
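The effect on the probability distribution can be illustrated with relative-frequency estimates, which are the usual basis of phrase translation probabilities in phrase-based SMT. The sketch below is a toy illustration only: the function names `merge_counts` and `phrase_probs` are hypothetical, the counts are invented, and the actual scoring also involves other features such as lexical weights.

```python
from collections import Counter


def merge_counts(old_counts, new_counts):
    """Add newly extracted phrase-pair counts to the existing ones."""
    merged = Counter(old_counts)
    merged.update(new_counts)
    return merged


def phrase_probs(pair_counts):
    """Relative frequency p(tgt | src) = count(src, tgt) / count(src)."""
    src_totals = Counter()
    for (src, _tgt), count in pair_counts.items():
        src_totals[src] += count
    return {(src, tgt): count / src_totals[src]
            for (src, tgt), count in pair_counts.items()}


# Invented counts: a generic model and a small batch of new phrase pairs.
generic = {("table", "table"): 90, ("table", "tableau"): 10}
new_pairs = {("table", "tableau"): 10}
before = phrase_probs(generic)
after = phrase_probs(merge_counts(generic, new_pairs))
```

With these invented counts, the probability of the generic option drops from 90/100 to 90/110 once the new pairs are merged into the single table, although nothing was learned about that option itself.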
Another possibility would be to use two translation models,
as in [3]. In this way, we can quickly create a phrase table
from the word alignments of the additional data.
3.4. Combination of translation models
In this section, we present results achieved by combining
several translation models. The techniques described in the
previous sections can significantly speed up the word
alignment process in comparison to running incremental
Giza-pp, but we still need to create a new phrase table on all
the data. Therefore, we propose to create a new phrase table
on the newly added data only and to combine it with the
original unadapted phrase table.
3.4.1. Back-off Models
Moses supports several modes for using multiple phrase
tables. We first explored the back-off mode, which favors the
principal phrase table: the second phrase table is only
considered if the word or phrase is not found in the first
one. The results are shown in Figure 4. The dotted curve
represents the use of the incrementally trained in-domain
translation model with the generic one as back-off. The
crossed curve represents the use of these same models but in
reverse order.
As we can see, we obtain very different results depending on
which translation model is used first, but this can be easily
explained by the nature of the back-off models. Our in-domain
translation model is built from the incrementally added data
only, i.e. from very small amounts of data, in particular
during the first iterations.
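The order sensitivity of back-off can be seen in a small model of the lookup. The Python sketch below is a simplified illustration of the behaviour described above, not Moses code: phrase tables are represented as plain dictionaries, and the function name `lookup_backoff` and the toy entries are assumptions.

```python
def lookup_backoff(phrase, primary, secondary):
    """Return the translation options of `phrase` in back-off fashion.

    The secondary table is consulted only when the primary table has no
    entry for the phrase; options from the two tables are never mixed.
    """
    options = primary.get(phrase)
    if options:
        return options
    return secondary.get(phrase, {})


# Toy tables (invented scores): a tiny in-domain table and a generic one.
in_domain = {"graph": {"graphe": 1.0}}
generic = {"graph": {"graphique": 0.7, "graphe": 0.3},
           "house": {"maison": 1.0}}
```

With the in-domain table first, `graph` receives only the in-domain option; with the generic table first, the in-domain entry for `graph` is never consulted. This is consistent with the observation that the order of the two tables matters, especially while the in-domain table is still very small.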
Figure 5 presents the results obtained when jointly using
both translation models. In this configuration, separate
translation options are created for each occurrence, the
scores being combined if the same translation option is found
in both translation models. Compared to the use of only one
translation model, we can observe a significant degradation
near 80% of the adaptation data before finally achieving a
similar final BLEU score (up to +0.2 points) compared to
inc-Giza-pp and noGizapp.
Once again, we believe that the nature of our absINFO corpus
may explain the evolution of our score. When our SMT system
has to translate more generic sentences, it is likely that the
translation options are provided by our generic translation
model rather than by our in-domain model.
Based on this observation, we tried to limit the edit distance
analysis to substitutions only.
Figure 4: Results for the use of “back-off” models. The crossed
curve represents our PBMT system using only one translation
model, while the dotted and third curves represent the impact
of using two back-off models in the two possible orders.
Figure 5: Comparison between the use of back-off (dotted curve)
and non-back-off models. The crossed curve represents our PBMT
system using only one translation model. The third curve
represents a PBMT system using both of its translation models
for the decoding path, while the dotted curve shows our
results when using our translation models in back-off mode.
Figure 6: Use of two translation models without back-off,
keeping either all edit types or substitutions only.
3.4.2. Filtering by edit-distance type
Figure 6 shows the results obtained with an in-domain
translation model trained only from the substitutions detected
during the edit distance analysis. As we argued in Section
2.1.2, we consider that the “substitution” edit type
corresponds to what the MT system does not know, since it was
necessary to fix its output.
As we can see, the previous degradation is less pronounced.
Overall, the evolution of the BLEU score is smoother than for
the other approaches tested so far. By keeping only the phrase
pairs corresponding to substitutions (in the edit path), we
have also limited the contextual phrases in our in-domain
translation model. One should also take into account the
alignment errors, which would have a greater impact on the
quality of the translation model in this configuration.
3.4.3. N-best alignment generation
One of the key points presented in this paper is the use of
the translations to generate the alignment links between a
source sentence and its translation generated by the system.
By default, our MT system returns the best translation
candidate after decoding. This translation has obtained the
highest decoding score, but that does not necessarily mean
that the alignment associated with it is the best one.
Based on this observation, we explored the n most likely
translation hypotheses (n-best list). Indeed, a source
sentence can be translated into the same translation using
different segmentations into phrase pairs. With our approach,
if we have multiple alignment candidates for the same
sentence-translation pair, we can generate more
source-to-reference alignments and thus potentially reinforce
our in-domain translation model. Using only the two best
non-distinct translation candidates, we obtained the results
shown in Figure 7. Unfortunately, the results are worse than
expected. In future work, we will investigate other options to
use the information in the n-best lists.
Figure 7: Use of n-best translation candidates to reinforce the
alignment possibilities and thus extend our phrase-pair
generation. The starred curve represents our PBMT system for
which we used the first two translation candidates in order to
extract phrase pairs, while the second curve represents the
same system using only the 1-best translation candidate.
3.4.4. No tuning step
In the final part of the paper, we present results from an
incremental adaptation of a PBMT system without a tuning step.
This procedure is very time-efficient and stable since we do
not apply tuning at every adaptation step. We argue that we do
not need to re-tune our models since adaptation only adds
small amounts of information. Tuning is only applied at the
creation of the model, and the resulting parameters are
maintained during the adaptation process. The results of this
procedure are shown in Figure 8.
First, we can observe a clear difference between the squared
and the dotted curves at the 10% adaptation level, even though
they result from the same approach. This is due to the
baseline that we applied: by default, our PBMT system uses a
translation model with only one phrase table. However, we need
to tune on a “new baseline system” using two phrase tables
(the one at the 10% level), whose tuning weights are then kept
fixed throughout adaptation.
Second, the resulting curve is rather smooth, which suggests
that the fluctuations observed when tuning at every step are
due to the instability of the tuning process.
To sum up, by applying our incremental adaptation, we obtain a
clear improvement in BLEU scores (+0.5 points) without the
need to retune at every adaptation step. Tuning can be
performed at larger time intervals, for example, in an
industrial post-editing context, every night or as soon as
processing resources become available.
Figure 8: Results for incremental adaptation without a tuning
step. The squared curve represents a PBMT system with the
normal tuning process performed at each adaptation iteration,
while the dotted curve represents the same system for which
the tuning weights obtained at the 10% level are kept fixed
throughout the entire adaptation.
4. Conclusion and Future Work
In this paper, we have presented a new word-to-word alignment
methodology for incremental adaptation using a phrase-based MT
system. This method uses the information generated during the
translation step and then relies on an analysis of a
(simulated) post-editing step to infer a source-to-reference
alignment at the word level.
Compared to incremental Giza, the standard method currently
used in the field, the first part of our experiments shows
that our approach allows us to obtain similar BLEU scores at a
significantly improved speed. Incremental Giza needs several
minutes to align two corpora of about 35k words, while the
approach proposed in this paper runs in a few seconds. Our
approach could therefore be integrated into an interface
dedicated to post-editing which would exploit user feedback in
real time.
The second part of this article was dedicated to experiments
on translation model combination. These experiments show that
we can get better results by jointly using two translation
models instead of only one. The results of these experiments
suggest some directions for future research. For example, the
use of the TER algorithm for analyzing the post-editing result
could be reinforced by the notion of “Post Edit Actions”
introduced by [2], in order to better identify the errors of
the SMT system.
5. Acknowledgment
This research was partially financed by the DGA and the ANRT
under CIFRE-Defense 7/2009, the French ANR project COSMAT
under ANR-09-CORD-004, and the European Commission under the
project MateCat, ICT-2011.4.2 – 287688.
6. References
[1] M. Koponen, “Comparing human perceptions of post-editing
effort with post-editing operations,” in Proceedings of the
Seventh Workshop on Statistical Machine Translation, June
2012, pp. 181–190.
[2] F. Blain, J. Senellart, H. Schwenk, M. Plitt, and
J. Roturier, “Qualitative analysis of post-editing for high
quality machine translation,” in Machine Translation Summit
XIII, Asia-Pacific Association for Machine Translation
(AAMT), Ed., Xiamen (China), 19-23 Sept. 2011.
[3] D. Hardt and J. Elming, “Incremental re-training for
post-editing SMT,” 2010.
[4] F. Och and H. Ney, “A systematic comparison of various
statistical alignment models,” Computational Linguistics,
vol. 29, no. 1, pp. 19–51, 2003.
[5] A. Levenberg, C. Callison-Burch, and M. Osborne,
“Stream-based translation models for statistical machine
translation,” in Human Language Technologies: The 2010 Annual
Conference of the North American Chapter of the Association
for Computational Linguistics. Association for Computational
Linguistics, 2010, pp. 394–402.
[6] P. Koehn, H. Hoang, A. Birch, C. Callison-Burch,
M. Federico, N. Bertoldi, B. Cowan, W. Shen, C. Moran,
R. Zens, et al., “Moses: Open source toolkit for statistical
machine translation,” in Annual Meeting of the Association for
Computational Linguistics, vol. 45, no. 2, 2007, p. 2.
[7] M. Snover, B. Dorr, R. Schwartz, L. Micciulla, and
J. Makhoul, “A study of translation edit rate with targeted
human annotation,” in Proceedings of Association for Machine
Translation in the Americas, 2006, pp. 223–231.
[8] M. Snover, N. Madnani, B. Dorr, and R. Schwartz, “Fluency,
adequacy, or HTER? Exploring different human judgments with a
tunable MT metric,” in Proceedings of the Fourth Workshop on
Statistical Machine Translation, vol. 30. Association for
Computational Linguistics, 2009, pp. 259–268.
[9] P. Lambert, H. Schwenk, and F. Blain, “Automatic
translation of scientific documents in the HAL archive,” in
Proceedings of the Eighth International Conference on Language
Resources and Evaluation (LREC’12). Istanbul, Turkey: European
Language Resources Association (ELRA), May 2012, pp.
3933–3936.
[10] F. Och, “Minimum error rate training in statistical
machine translation,” in Proceedings of the 41st Annual
Meeting on Association for Computational Linguistics, Volume
1. Association for Computational Linguistics, 2003, pp.
160–167.
... Concerning the adaptation strategy, Formiga et al. (2012) combined UE-specific translation models with the baseline models using Foster and Kuhn's (2007) interpolation with empirically-set weights. Blain et al. (2012) studied two decoding strategies considering the baseline and UE phrase-tables separately: back-off, in which the UE phrase table is considered only if a phrase is not found in the baseline translation model, and multiple decoding, in which both translations and scores are used if the same phrase is found in both translation models. They found multiple decoding to be the best strategy while restricting the TER alignment only to the substitution operations (i.e., neglecting addition, deletion, and shifting edits). ...
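The two lookup strategies named in the excerpt above can be illustrated with a toy sketch. It assumes phrase tables are plain dictionaries mapping a source phrase to a list of (translation, score) pairs; the function names `lookup_backoff` and `lookup_multiple` and the example entries are hypothetical, and this is not how Moses or the cited systems implement these modes.

```python
def lookup_backoff(phrase, baseline, adapted):
    """Back-off: use baseline options; consult the adapted table
    only when the baseline has no entry for the phrase."""
    options = baseline.get(phrase)
    if options:
        return list(options)
    return list(adapted.get(phrase, []))


def lookup_multiple(phrase, baseline, adapted):
    """Multiple decoding: keep the options of both tables,
    tagged with the table they came from."""
    tagged = [("baseline", t, s) for t, s in baseline.get(phrase, [])]
    tagged += [("adapted", t, s) for t, s in adapted.get(phrase, [])]
    return tagged


# Invented toy tables, for illustration only.
baseline = {"chat": [("cat", 0.9)]}
adapted = {"chat": [("kitty", 0.6)], "souris": [("mouse", 0.8)]}

backoff_known = lookup_backoff("chat", baseline, adapted)
backoff_new = lookup_backoff("souris", baseline, adapted)
both = lookup_multiple("chat", baseline, adapted)
```

In this sketch, back-off never lets the adapted table override a phrase the baseline already covers, whereas multiple decoding leaves the choice between competing options to the scores.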
Article
In this article we present a three-step methodology for dynamically improving a statistical machine translation (SMT) system by incorporating human feedback in the form of free edits on the system translations. We target feedback provided by casual users, which is typically error-prone. Thus, we first propose a filtering step to automatically identify the better user-edited translations and discard the useless ones. A second step produces a pivot-based alignment between source and user-edited sentences, focusing on the errors made by the system. Finally, a third step produces a new translation model and combines it linearly with the one from the original system. We perform a thorough evaluation on a real-world dataset collected from the Reverso.net translation service and show that every step in our methodology contributes significantly to improve a general purpose SMT system. Interestingly, the quality improvement is not only due to the increase of lexical coverage, but to a better lexical selection, reordering, and morphology. Finally, we show the robustness of the methodology by applying it to a different scenario, in which the new examples come from an automatically Web-crawled parallel corpus. Using exactly the same architecture and models provides again a significant improvement of the translation quality of a general purpose baseline SMT system.
... Further insights can be drawn from the analysis of the word level corrections produced by the two translator profiles. To this aim, word insertions, deletions, substitutions and phrase shifts are extracted using the TERcom software similar to Blain et al. (2012) and Wisniewski et al. (2013). For each error type, the ratio between the number of edit operations and the total number of occurred errors operations performed is computed. ...
... ing the model components, they incrementally learn an automatic post-editing (APE) system that translates MT output into user-corrected output and employ it on top of the SMT pipeline, applying automatic post-edits before the output is passed to the user. A number of related papers have been published on adaptation experiments for the MateCat tool. Blain et al. (2012) align source and user translation using the MT hypothesis as a pivot followed by full re-training of the phrase table, which is then interpolated using the Moses backoff mode. They compare the results of their alignment procedure to incremental GIZA++ alignments. Bertoldi et al. (2013) was published as a companion paper to Wäschle et al. ( ...
Article
Translation and cross-lingual access to information are key technologies in a global economy. Even though the quality of machine translation (MT) output is still far from the level of human translations, many real-world applications have emerged, for which MT can be employed. Machine translation supports human translators in computer-assisted translation (CAT), providing the opportunity to improve translation systems based on human interaction and feedback. Besides, many tasks that involve natural language processing operate in a cross-lingual setting, where there is no need for perfectly fluent translations and the transfer of meaning can be modeled by employing MT technology. This thesis describes cumulative work in the field of cross-lingual natural language processing in a user-oriented setting. A common denominator of the presented approaches is their anchoring in an alignment between texts in two different languages to quantify the similarity of their content.
... Furthermore, the running time of this approach is not discussed, and it is not clear how effective this approach is in online scenarios. Blain et al. (2012) have recently studied the problem of incremental learning from post-editing data, with minimum computational complexity and acceptable quality. They use the MT output (hypothesis) as a pivot to find the word alignments between the source sentence and its corresponding reference. ...
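The pivot idea described in the excerpt above can be sketched as the composition of two word alignments: source to hypothesis and hypothesis to reference. The sketch below assumes each alignment is a set of (index, index) pairs; the function name `compose_alignments` and the toy indices are hypothetical and are not taken from the cited work.

```python
from collections import defaultdict


def compose_alignments(src_to_hyp, hyp_to_ref):
    """Link source position s to reference position r whenever some
    hypothesis position h is aligned to both of them."""
    refs_by_hyp = defaultdict(set)
    for h, r in hyp_to_ref:
        refs_by_hyp[h].add(r)
    links = {(s, r) for s, h in src_to_hyp for r in refs_by_hyp.get(h, ())}
    return sorted(links)


# Invented toy alignments: (source index, hypothesis index) and
# (hypothesis index, reference index).
src_to_hyp = {(0, 0), (1, 2), (2, 1)}
hyp_to_ref = {(0, 0), (1, 1), (2, 3)}

src_to_ref = compose_alignments(src_to_hyp, hyp_to_ref)
```

Hypothesis words with no counterpart in the reference simply contribute no link, so source words aligned only to such words stay unaligned in the result.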
Conference Paper
Post-editing is the most popular approach to improve accuracy and speed of human translators by applying the machine translation (MT) technology. During the translation process, human translators generate the translation by correcting MT outputs in the post-editing scenario. To avoid repeating the same MT errors, in this paper, we propose an efficient framework to update MT in real-time by learning from user feedback. This framework includes: (1) an anchor-based word alignment model, being specially designed to get correct alignments for unknown words and new translations of known words, for extracting the latest translation knowledge from user feedback; (2) an online translation model, being based on random forests (RFs), updating translation knowledge in real-time for later predictions and having a strong adaptability with temporal noise as well as context changes. The extensive experiments demonstrate that our proposed framework significantly improves translation quality as the number of feedback sentences increases, and the translation quality is comparable to that of the off-line baseline system with all training data.
Article
We present online learning techniques for statistical machine translation (SMT). The availability of large training data sets that grow constantly over time is becoming more and more frequent in the field of SMT, for example, in the context of translation agencies or the daily translation of government proceedings. When new knowledge is to be incorporated in the SMT models, the use of batch learning techniques requires very time-consuming estimation processes over the whole training set that may take days or weeks to be executed. By means of the application of online learning, new training samples can be processed individually in real time. For this purpose, we define a state-of-the-art SMT model composed of a set of submodels, as well as a set of incremental update rules for each of these submodels. To test our techniques, we have studied two well-known SMT applications that can be used in translation agencies: post-editing and interactive machine translation. In both scenarios, the SMT system collaborates with the user to generate high-quality translations. These user-validated translations can be used to extend the SMT models by means of online learning. Empirical results in the two scenarios under consideration show the great impact of frequent updates in the system performance. The time cost of such updates was also measured, comparing the efficiency of a batch learning SMT system with that of an online learning system, showing that online learning is able to work in real time whereas the time cost of batch retraining soon becomes infeasible. Empirical results also showed that the performance of online learning is comparable to that of batch learning. Moreover, the proposed techniques were able to learn from previously estimated models or from scratch. We also propose two new measures to predict the effectiveness of online learning in SMT tasks.
The translation system with online learning capabilities presented here is implemented in the open-source Thot toolkit for SMT.
Article
Although machine translation research has made great progress over several years, the output of an automated system cannot be published without prior revision by human annotators. Based on this fact, we wanted to exploit the user feedback from the review process in order to incrementally adapt our statistical system over time. As part of this thesis, we are therefore interested in post-editing, one of the most active fields of research and one that is widely used in the translation and localization industry. However, the integration of user feedback is not an obvious task. On the one hand, we must be able to identify the information that will be useful for the system, among all changes made by the user. To address this problem, we introduced a new concept (the “Post-Editing Actions”), and proposed an analysis methodology for automatic identification of this information from post-edited data. On the other hand, for the continuous integration of user feedback, we have developed an algorithm for incremental adaptation of a statistical machine translation system, which gets higher performance than the standard procedure. This is even more interesting as both development and optimization of this type of translation system have a very high computational cost, sometimes requiring several days of computing. Conducted jointly with SYSTRAN and LIUM, the research work of this thesis is part of the French Government Research Agency project COSMAT 2. This project aimed to provide a collaborative machine translation service for scientific content to the scientific community. The collaborative aspect of this service, with the possibility for users to review the translations, gives an application framework for our research.
Article
Advanced computer-assisted translation (CAT) tools include automatic quality estimation (QE) mechanisms to support post-editors in identifying and selecting useful suggestions. Based on supervised learning techniques, QE relies on high-quality data annotations obtained from expensive manual procedures. However, as the notion of MT quality is inherently subjective, such procedures might result in unreliable or uninformative annotations. To overcome these issues, we propose an automatic method to obtain binary annotated data that explicitly discriminate between useful (suitable for post-editing) and useless suggestions. Our approach is fully data-driven and bypasses the need for explicit human labelling. Experiments with different language pairs and domains demonstrate that it yields better models than those based on the adaptation into binary datasets of the available QE corpora. Furthermore, our analysis suggests that the learned thresholds separating useful from useless translations are significantly lower than as suggested in the existing guidelines for human annotators. Finally, a verification experiment with several translators operating with a CAT tool confirms our empirical findings.
Article
Globalization has dramatically increased the need of translating information from one language to another. Frequently, such translation needs should be satisfied under very tight time constraints. Machine translation (MT) techniques can constitute a solution to this overly complex problem. However, the documents to be translated in real scenarios are often limited to a specific domain, such as a particular type of medical or legal text. This situation seriously hinders the applicability of MT, since it is usually expensive to build a reliable translation system, no matter what technology is used, due to the linguistic resources that are required to build them, such as dictionaries, translation memories or parallel texts. In order to solve this problem, we propose the application of automatic post-editing in an online learning framework. Our proposed technique allows the human expert to translate in a specific domain by using a base translation system designed to work in a general domain whose output is corrected (or adapted to the specific domain) by means of an automatic post-editing module. This automatic post-editing module learns to make its corrections from user feedback in real time by means of online learning techniques. We have validated our system using different translation technologies to implement the base translation system, as well as several texts involving different domains and languages. In most cases, our results show significant improvements in terms of BLEU (up to 16 points) with respect to the baseline systems. The proposed technique works effectively when the n-grams of the document to be translated present a certain rate of repetition, a situation which is common according to the document-internal repetition property.
Conference Paper
Post-editing performed by translators is an increasingly common use of machine translated texts. While high quality MT may increase productivity, post-editing poor translations can be a frustrating task which requires more effort than translating from scratch. For this reason, estimating whether machine translations are of sufficient quality to be used for post-editing and finding means to reduce post-editing effort are an important field of study. Post-editing effort consists of different aspects, of which temporal effort, or the time spent on post-editing, is the most visible and involves not only the technical effort needed to perform the editing, but also the cognitive effort required to detect and plan necessary corrections. Cognitive effort is difficult to examine directly, but ways to reduce the cognitive effort in particular may prove valuable in reducing the frustration associated with post-editing work. In this paper, we describe an experiment aimed at studying the relationship between technical post-editing effort and cognitive post-editing effort by comparing cases where the edit distance and a manual score reflecting perceived effort differ. We present results of an error analysis performed on such sentences and discuss the clues they may provide about edits requiring great cognitive effort compared to the technical effort, on one hand, or little cognitive effort, on the other.
Conference Paper
This paper describes the development of a statistical machine translation system between French and English for scientific papers. This system will be closely integrated into the French HAL open archive, a collection of more than 100.000 scientific papers. We describe the creation of in-domain parallel and monolingual corpora, the development of a domain specific translation system with the created resources, and its adaptation using monolingual resources only. These techniques allowed us to improve a generic system by more than 10 BLEU points.
Conference Paper
In the context of massive adoption of Machine Translation (MT) by human localization services in Post-Editing (PE) workflows, we analyze the activity of post-editing high quality translations through a novel PE analysis methodology. We define and introduce a new unit for evaluating post-editing effort based on Post-Editing Action (PEA), for which we provide human evaluation guidelines and propose a process to automatically evaluate these PEAs. We applied this methodology on data sets from two technologically different MT systems. In that context, we could show that more than 35% of the remaining effort can be saved by introducing global PEA and edit propagation.
Article
Automatic Machine Translation (MT) evaluation metrics have traditionally been evaluated by the correlation of the scores they assign to MT output with human judgments of translation performance. Different types of human judgments, such as Fluency, Adequacy, and HTER, measure varying aspects of MT performance that can be captured by automatic MT metrics. We explore these differences through the use of a new tunable MT metric: TER-Plus, which extends the Translation Edit Rate evaluation metric with tunable parameters and the incorporation of morphology, synonymy and paraphrases. TER-Plus was shown to be one of the top metrics in NIST’s Metrics MATR 2008 Challenge, having the highest average rank in terms of Pearson and Spearman correlation. Optimizing TER-Plus to different types of human judgments yields significantly improved correlations and meaningful changes in the weight of different types of edits, demonstrating sig-
Article
We examine a new, intuitive measure for evaluating machine-translation output that avoids the knowledge intensiveness of more meaning-based approaches, and the labor-intensiveness of human judgments. Translation Edit Rate (TER) measures the amount of editing that a human would have to perform to change a system output so it exactly matches a reference translation. We show that the single-reference variant of TER correlates as well with human judgments of MT quality as the four-reference variant of BLEU. We also define a human-targeted TER (or HTER) and show that it yields higher correlations with human judgments than BLEU, even when BLEU is given human-targeted references. Our results indicate that HTER correlates with human judgments better than HMETEOR and that the four-reference variants of TER and HTER correlate with human judgments as well as, or better than, a second human judgment does.
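As a rough illustration of the edit-rate idea in the abstract above, the sketch below counts word-level insertions, deletions and substitutions against a single reference and divides by the reference length. It is a simplification under stated assumptions: TER as defined by its authors also allows block shifts, which this sketch omits, and the function name `edit_rate` is hypothetical.

```python
def edit_rate(hypothesis, reference):
    """Word-level edit distance (no shifts) divided by reference length."""
    hyp, ref = hypothesis.split(), reference.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit count between the hypothesis prefix
    # processed so far and the first j reference words.
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            curr.append(min(
                prev[j] + 1,             # delete the hypothesis word
                curr[j - 1] + 1,         # insert the reference word
                prev[j - 1] + (h != r),  # keep or substitute
            ))
        prev = curr
    return prev[-1] / len(ref)


missing_word = edit_rate("the cat sat on mat", "the cat sat on the mat")
identical = edit_rate("a b c", "a b c")
```

Because shifts are ignored, a reordered phrase is charged as several insertions and deletions here, so this value can only overestimate the shift-aware score for the same sentence pair.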
Conference Paper
We describe an open-source toolkit for statistical machine translation whose novel contributions are (a) support for linguistically motivated factors, (b) confusion network decoding, and (c) efficient data formats for translation models and language models. In addition to the SMT decoder, the toolkit also includes a wide variety of tools for training, tuning and applying the system to many translation tasks.
Conference Paper
Typical statistical machine translation systems are trained with static parallel corpora. Here we account for scenarios with a continuous incoming stream of parallel training data. Such scenarios include daily governmental proceedings, sustained output from translation agencies, or crowd-sourced translations. We show incorporating recent sentence pairs from the stream improves performance compared with a static baseline. Since frequent batch retraining is computationally demanding, we introduce a fast incremental alternative using an online version of the EM algorithm. To bound our memory requirements we use a novel data-structure and associated training regime. When compared to frequent batch retraining, our online time and space-bounded model achieves the same performance with significantly less computational overhead.
Article
A method is presented for incremental re-training of an SMT system, in which a local phrase table is created and incrementally updated as a file is translated and post-edited. It is shown that translation data from within the same file has higher value than other domain-specific data. In two technical domains, within-file data increases BLEU score by several full points. Furthermore, a strong recency effect is documented; nearby data within the file has greater value than more distant data. It is also shown that the value of translation data is strongly correlated with a metric defined over new occurrences of n-grams. Finally, it is argued that the incremental re-training prototype could serve as the basis for a practical system which could be interactively updated in real time in a post-editing setting. Based on the results here, such an interactive system has the potential to dramatically improve translation quality.
Conference Paper
Minimum Error Rate Training (MERT) is an effective means to estimate the feature function weights of a linear model such that an automated evaluation criterion for measuring system performance can directly be optimized in training. To accomplish this, the training procedure determines for each feature function its exact error surface on a given set of candidate translations. The feature function weights are then adjusted by traversing the error surface combined over all sentences and picking those values for which the resulting error count reaches a minimum. Typically, candidates in MERT are represented as N-best lists which contain the N most probable translation hypotheses produced by a decoder. In this paper, we present a novel algorithm that allows for efficiently constructing and representing the exact error surface of all translations that are encoded in a phrase lattice. Compared to N-best MERT, the number of candidate translations thus taken into account increases by several orders of magnitude. The proposed method is used to train the feature function weights of a phrase-based statistical machine translation system. Experiments conducted on the NIST 2008 translation tasks show significant runtime improvements and moderate BLEU score gains over N-best MERT.
Article
Often, the training procedure for statistical machine translation models is based on maximum likelihood or related criteria. A general problem of this approach is that there is only a loose relation to the final translation quality on unseen text. In this paper, we analyze various training criteria which directly optimize translation quality.