Assessing User Interface Needs of Post-Editors of Machine Translation
Joss Moorkens & Sharon O’Brien, ADAPT/Dublin City University
Introduction
Translation Memory (TM) and Machine Translation (MT) were until quite recently considered to
be distinct and diverging technologies. The current trend in translation technology, however, is to attempt to create a synergy of the two. At present, the TM tools used over recent decades to recycle human translation are also being adopted for the task of post-editing MT output. We consider that,
while these existing translation editor interfaces are by now familiar and functional for translators
working with TM, any support for post-editing or integration with MT has tended to be appended
as an afterthought. Post-editing of MT output is different from revision of an existing translation
suggested by a TM, or, indeed, from translation without any suggestion whatsoever, primarily
because the types of revision differ. MT output tends to include mistakes that would not generally
be made by professional human translators. When this is coupled with the fact that few professional
translators have received training either in machine translation technology or in post-editing
practices to date, the result is often apprehension among translators with regard to the post-editing
task, along with a high level of frustration. Some of the most common complaints from translators
about the task of post-editing stem from the fact that it is an edit-intensive, mechanical task that
requires correction of basic linguistic errors over and over again (Guerberof 2013; Moorkens and
O’Brien 2014). Understandably, translators see this task as boring and demeaning and especially
despise it when the ‘machine’ does not ‘learn’ from its mistakes or from translators’ edits. Kelly
(2014: online) even goes so far as to call this task “linguistic janitorial work”.
This Chapter describes our first steps in an ongoing effort towards creating specifications for user
interfaces (UIs) that better support the post-editing task, with a view to making the task less boring,
repetitive and edit-intensive for the human translator. While our focus is on features for post-editing,
our results demonstrate that, according to translator-users, even the human translation task is not
well-supported by existing tools. We therefore also discuss basic features of TM tools, as well as
those features that are used to facilitate MT post-editing using TM interfaces.
The project began with a pre-design survey of professional translators, focusing on features of the
translation UI that they commonly use. The survey was followed by a series of interviews with
professional translators, all of whom have experience with post-editing, to examine their survey
responses in more detail, to discuss the specific challenges of post-editing, and so that potential
users could “instil their knowledge and concern into the design process from the very beginning”
(Gould and Lewis 1985). Interim results from the survey were published in Moorkens and O’Brien
(2013), which we extend here by reporting on all survey responses and by including interview data.
We assume from the outset that a standalone editor for post-editing is not required, but that features and
functionality as specified could instead be built into existing translation editing environments in
order to better integrate with MT systems, and to support the post-editing task.
In this Chapter, we first define post-editing and present previous relevant research on interfaces used
in post-editing. We explain how software designers are currently responding to the emergent
integration of TM and MT. We go on to describe our survey and interview design, and present a
summary of the results of the survey, followed by the interview findings. For reasons of space, the
final UI specifications will be published elsewhere. Our conclusions are presented in the final
section.
Interfaces for Editing Human Translation and Post-Editing Machine Translation
Somers (2001 p.138) describes post-editing of MT as “tidying up the raw output, correcting mistakes,
revising entire, or, in the worst case, retranslating entire sections” of a machine translated text.
This task can be further subdivided into ‘light’ (or ‘fast’) post-editing, to create output that is
understandable and readable, but not necessarily stylistically perfect, and ‘full’ (or ‘conventional’)
post-editing, with the aim of producing quality equivalent to “that of a text that has been translated
by a human linguist” (de Almeida 2013 p.13). The type of post-editing chosen depends on the
purpose of the translated text, and the financial resources available.
In professional translation workflows, once the source text has been machine translated, the output
is presented to the translator-user using a suitable interface. It is increasingly common for this
interface to be one and the same as that provided by the translation memory tool. As described in
the introduction to this volume, the TM UI is where specialized translators translate from scratch,
or edit legacy translations when the source text segment is the same as or similar to one that has
been translated previously. Target text segments from the TM are assigned a match percentage based
on the difference between the source text to be translated and the source text in the TM (a match of
less than 100% is known as a ‘fuzzy match’). This gives the translator an estimate of the degree of similarity to a source segment previously stored in the TM. The new translation or edited fuzzy
match is then saved to the TM, dynamically improving future match suggestions. It is increasingly
common for a TM UI to have a facility to use MT when no TM target text match is available, or
when the TM match’s fuzzy match percentage is low. There is no universally accepted threshold
above which MT output is considered to require less editing effort than a TM fuzzy match but
research has suggested that raw MT editing effort may be equivalent to that required for 80%-90%
fuzzy matches (O’Brien 2006; Guerberof 2008), although this would depend on the text type,
language pair, and the raw quality from the MT engine.
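To make the fuzzy match calculation concrete, the following minimal Python sketch (our own illustration; commercial tools use their own, typically token-based and tag-aware, similarity measures) scores a new source segment against a stored TM segment:

    import difflib

    def fuzzy_match_score(new_source, tm_source):
        # Character-level similarity between the two source segments,
        # expressed as a percentage; 100 is an exact match, anything
        # lower is a 'fuzzy match'.
        ratio = difflib.SequenceMatcher(None, new_source, tm_source).ratio()
        return round(ratio * 100)

    print(fuzzy_match_score("Click the Save button.",
                            "Click the Cancel button."))  # ~83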
Given the growing importance of TM interfaces in the post-editing process it is worthwhile
considering the extent to which they meet their users’ basic translation editing needs, even before
post-editing requirements are factored in. Previous research suggests unfortunately that user-centred
design (UCD) has not been general practice in TM technology. In Lagoudaki’s survey of TM UIs
in 2006, it was found that industry research and development was mostly motivated by “technical
improvement of the TM system and not how the TM system can best meet the needs of its users”
(Lagoudaki 2008 p.17). She added that “systems’ usability and end-users’ demands seem to have
been of only subordinate interest” in TM system development (2008 p.17). In practice, TM users
are usually “invited to provide feedback on an almost finished product with limited possibilities for
changes” (Lagoudaki 2006 p.1). Lagoudaki concluded that TM tool users wanted simplicity in their
UI, not necessarily meaning fewer features, but focusing on stable performance, improved
interoperability, and high compatibility between file formats.
The non-UCD design process goes some way to explaining why Lagoudaki found that users were
widely dissatisfied with their translation editing interface, despite (at that time) 14 years of TM tool
development. This dissatisfaction was reiterated in McBride (2009 p.125), where one forum
contributor said that the user was not the focus of the design process as tool developers hope “above
all to sell to giant corporations, who will put pressure on translation agencies to buy, who will
likewise pressure translators to buy”. According to Ozcelik et al. (2011 p.7), end-user involvement in software development is frequently limited, as “often the person who decides on purchase is not really the end user.”
Other research has looked at the types of edit typically made by human post-editors. De Almeida (2013), for example, found that post-edits (in English to French and English to Brazilian
Portuguese) typically include changes such as word reordering, addition or removal of capitalization,
and changes to the gender or number of a word. A post-editor may find him or herself repeating
edits throughout a project, e.g. changing the same word from masculine to feminine inflection every
time it occurs, with no associated improvement to the MT suggestions (ibid.). Koponen (2012)
reported similar edits in English to Spanish post-editing, noting that word order changes were
perceived by her study participants as being more cognitively challenging than correction of an
individual word.
Unfortunately, current TM UIs are usually incapable of providing the post-editor with an estimate
of editing effort required for each segment, or of assisting with these common edits, although several
research projects are now underway that aim to create new, enhanced CAT tools by adding
functionality to assist post-editing. An early example of such a project, TransType, integrated
Interactive Machine Translation (IMT), with varying levels of success, to suggest completions of a
segment that the translator had already started to translate (Langlais, LaPalme, and Loranger 2002).
The technique used was similar to that used in predictive texting. Later projects integrated functions
that have only recently become feasible, such as the use of translation quality estimation to
recommend TM or MT matches (as proposed by Dara et al. (2013)), and most are not in use by
professional translators or post-editors at the time of writing. Another tool under active research and
development, iOmega-T, is a version of the popular open source CAT tool Omega-T that retains
information on edits carried out by translators for post-translation analysis. This information can
give valuable detail to researchers and managers of post-editing activities, but the UI itself offers no
novel functionality for post-editors (Moran and Lewis 2011). The Matecat project, meanwhile, was
an industry/academia collaboration (since commercialised) that aimed to create a web-based CAT
tool to include estimation of MT quality, and incremental “tuning” of the MT output based on post-
edits (Cettolo, Bertoldi, and Federico 2013). At the time of writing it is in full production use by the
project’s industry partner, although MT quality estimation has not yet been incorporated (de Souza
et al. 2014). The associated Casmacat project focused further on novel functionality deployed in a
web-based platform, adding interactive machine translation prediction, intelligent auto-completion,
word alignment visualisations, and confidence measures.[i] Casmacat also added integration with eye-trackers and e-pens, and has subsequently been made available as open source software for end users
(Koehn et al. 2015). Two other tools, PET (Aziz, de Sousa, and Specia 2012) and Caitra (Koehn
2009), were developed for post-editing research purposes, although they are not actually used in
production. Notably, prior to the current research there has not been a focus on what functionality
users would like to see in a tool for post-editing, as the MT research community has had a tendency
to “focus on system development and evaluation” rather than considering end users (Doherty and
O’Brien 2013 p.4). Our work builds on previous research by gathering several possibilities for a
post-editing UI and inviting user input into whether and how these may be implemented.
Research design
The main objective in this research is to create user-focused specifications for editing interfaces to
better support the post-editing task. Our two research questions are:
1. Can we get pre-design information from users in order to redress the balance of
user/engineering input that is common in translation tool development?
2. What are the ‘pain points’ in post-editing and how can these be addressed in a translation
tool?
The method employed in answering these questions was a pre-design user survey (Tidwell 2006),
followed by detailed interviews with several of the survey participants. The findings from this initial
research may form a starting point for tool development, which should involve evaluation and
validation (or otherwise) of the specifications as gathered from direct observation of users.
Survey
The first phase of this research was a pre-design survey focusing on five broad areas so as to better
understand what support features post-editors might require. These five areas were: (1) biographical
details, (2) current working methods, (3) concepts of the ideal UI, (4) presentation of TM matches
and MT output, and (5) intelligent functions to combine TM and MT matches. The survey contained
ideas for specific features that we considered might serve post-editors, based on common edits
reported in research, and post-editing functions currently in development within the research
community (see above). Respondents to the survey were also able to give more detailed comments
or suggestions immediately following specific questions.
The survey was carried out via the Internet – after ethics approval had been granted by Dublin City
University Research Ethics Committee – using the LimeService platform (www.limeservice.com),
and required completion of an Informed Consent section prior to beginning the main body of the
survey. In section 1 of the survey, participants were asked about their length of experience as a
translator and as a post-editor, in years and months. They were asked about their professional status
(freelance or employed by a company), and their views on TM and MT technology respectively (I
like using it; I dislike using it; I use it because I have to; I use it because it helps in my work; MT is
now an advanced technology; MT is still problematic). In section 2, participants were asked for their
source and target languages, what editing environments they currently use, what they like most
about their current tools, and to “describe one aspect of your current editing environment that
frustrates you.” These were all to be answered in free text. They were asked whether they customize
their translation UI and, if so, what elements they customize. Then, finally, they were asked about
preferred screen layouts for source/target texts and for glossaries.
Section 3 focused on features respondents would like to see in the post-editing environment and that
are “not currently available in (their) regular translation editing environments”, again leaving a free
text box for response. Questions about keyboard shortcuts and about a preference for a simple or feature-rich UI were followed by a series of questions about specific types of post-edit that respondents might like a keyboard shortcut to automate, with answers chosen on a four-point Likert scale (see Figure 1), and a query about whether a macro creation tool would be useful.
Fig. 1. Survey question with radio button responses.
Questions in section 4 addressed the combination of TM features with support features for MT and
post-editing. Finally, participants were asked whether they would leave an email address for further
participation in an interview, noting that in doing so, they would waive their anonymity. The survey
went live on May 7th 2013, and a link was sent to six localization companies, who disseminated it
internally (see Acknowledgments). The survey was closed at the end of business hours on June 6th
2013.
Interviews
The follow-on interviews were largely based on the survey results. They were carried out in order
to ascertain what post-editors consider most troublesome about post-editing, and to ask them in more
detail about some of the features suggested in the survey. These interviews also presented an
opportunity to see if participants had requirements for a post-editing UI that had not been identified
in the survey responses. The final question was an open one: “Do you have any other suggestions
on how to support post-editing through the UI?” Interviews took place between July 2nd and August
2nd 2013 via the Skype voice-over-IP tool, and were recorded using Callnote.
Survey Results
Demographics
There was a total of 403 survey participants, of whom 231 answered all sections. The number of participants who completed a section is given when reporting on that section. Where percentages are given, they represent the percentage of participants who completed that particular section. (In such cases, absolute values are given in parentheses.) As the survey was
advertised by six localization service providers to their translator base, responses were somewhat
biased by IT localization practices and experiences. At the same time, this sector has in recent
years embraced the deployment of MT and we therefore expected to access survey respondents
who had acquired post-editing experience and who would be in a good position to respond to the
questions. 280 participants completed the biographical section. Figure 2 shows the age ranges of
these participants.
Fig. 2. Participants’ age range.
Most participants reported that they had 6-12 years’ translation experience. 26 participants
claimed more than 20 years’ experience. As post-editing has only recently become commonplace,
reported post-editing experience was mostly 1-3 years, with 69 participants reporting no
experience of post-editing. All but three of the 42 respondents aged 20-30 had some experience of
post-editing (at most 2 years). Roughly 80% of respondents aged between 31 and 50 (125
respondents) had experience of post-editing (usually 2 to 6 years), and just over half (17) aged
over 50 had post-editing experience.
In response to a question about professional status, 29% of participants (81) reported that they
work as freelancers without an agency, 31% (85) work closely with one agency on a freelance
basis (9 participants work on a freelance basis with several agencies), and 23% (63) are translation
or localization company employees. 21 respondents run their own companies. This cohort
represents a good spread of work profiles typical of the translation industry. A statistically
significant association was found between translators’ age and professional status. Respondents
under 30 are more likely to be company employees (67%, or 23), whereas those over 30 are more
likely to work on a freelance basis (71%, or 148). The proportion employed directly by a company drops to 26% (23) for those aged 31-40, falling to 6% (2) for those over 50.
56% of participants (153) reported that they like using TM technology, as compared with 18%
(49) who said that they like using MT. 75% of participants (206) report using TM because it helps
with their work, whereas 30% (83) report using MT because it helps. 56% (149) hold the view that
MT is “still a problematic technology”. Fewer respondents aged over 40 agreed that “MT was still
problematic”, which suggests that they do not feel threatened by MT, but taken in conjunction
with the older group’s lesser post-editing experience, it could also mean that they have less
familiarity with MT and its associated errors. These differences aside, responses were consistent
between age ranges.
223 of 280 participants (80%) translate from English, a reflection of the nature of the respondents
and the companies who promoted the survey, although many translate from more than one source
language. (In the IT localization sector, English is the main source language (Kelly et al. 2012).)
Target languages are reasonably well spread amongst participants, and are listed in Table 1. This
spread of target languages was important for the survey results as the post-editing task can vary
depending on the target language in question and its typical linguistic features.
Table 1. Participants’ target languages.

Target Language   No.
Arabic              5
Chinese            24
Czech              11
Danish              3
Dutch               7
English            49
Finnish             4
French             34
German             26
Greek               5
Hindi               3
Hungarian           5
Italian            18
Japanese           17
Korean              4
Malay               1
Norwegian           3
Polish              1
Portuguese         24
Russian             7
Spanish            27
Swedish             6
Thai                3
Turkish             4
Urdu                3
Current Editing Environment
246 participants provided details of their current editing environments (63%, or 155, use more than
one environment regularly). 74% (182) of these use a version of the SDL Trados TM tool. Company
employees are more likely to use SDL Trados; 109 of 152 freelance translators (72%) and 65 of 76
(84%) company employees said that they use a version of SDL Trados.[ii] 62% of company employees (39) in our current survey said that they use multiple tools, but the rate was even higher among freelancers (68%, or 116).[iii] Contrary to our expectations, 38% of participants (94) use
Microsoft Word for post-editing, which suggests that MT and TM are not currently as integrated as
we had thought, despite the increasing industry focus on MT integration as reported above. Figure
3 shows the number of users per editing UI among survey participants. Some other tools used by
fewer than 15 participants were XTM (13), Alchemy Catalyst (12), OmegaT (8), Star Transit (5),
TransStudio (5), and Alchemy Publisher (1). 28 participants also listed proprietary tools (Translation
Workspace, Helium, and MS LocStudio).
Fig. 3. Tools used for translation and post-editing.
Roughly half of the participants in this survey section reported unhappiness with the default layout,
colouring, and display of mark-up tags in their current editing UI. 15 complained specifically about
their current UI, citing poor layout or visibility, outdated UIs, and unhappiness with too many
product updates. “The UI is not user friendly,” wrote one, “each UI uses their own different shortcuts,
there is an inability to see segments comfortably.” Seven participants mentioned compatibility
issues and problems with tags. 66% (167) of participants would rather customize their editor than
use the default set-up. 79% of those 167 respondents adjust their onscreen layout, 74% adjust tag
visibility, 68% adjust font type, and 23% adjust colours.
Performance issues figured strongly amongst survey comments, with 19 participants complaining
about bugs, errors, and slow response times within their current UI. One participant wrote: “I work
mainly in [software name], which is useful [but] an incredibly fragile piece of software that has
caused me to lose time due to crashing or failing to save output files correctly.” 21 other participants
stated that they have experienced formatting problems.
(Fig. 3 data: SDL Trados, 182 respondents; MS Word, 94; Wordfast, 50; Idiom Worldserver, 46; MemoQ, 41; SDLX, 26; Passolo, 15.)
25 participants expressed unhappiness with the quality of MT output and MT support within their
current tool. One wrote that “sometimes the quality of MT makes me stay longer at a (translation
unit) than I would having no MT to deal with”. This was a recurring bugbear mentioned in open
responses throughout the survey, as some participants appear to be dissatisfied with and suspicious
of MT. Another problem mentioned was the steep learning curve of CAT tools. On the other hand,
30 participants said that they are happy with their current UI, although not necessarily in the most
positive terms: “I'm so used to it, that I can't find anything frustrating”.
When asked what they liked most about their current tools, many (33) mentioned performance, ease
of use, and stability. 17 mentioned specific features such as auto-propagation, integrated QA (quality
assurance) checking, and concordance searches. Six participants wrote that they liked their current
UI, with one writing “the editing changes are clearly marked and text before and after are displayed
side by side”.
UI wish list
The importance of customizability was emphasized in many of the 245 responses to this section of
the survey. 63% of participants (152) expressed a preference for a customizable UI, and 57% (138)
a clean and uncluttered UI. In response to a question about features currently unavailable in regular
translation editing UIs, but that participants would like to see in a UI that supports post-editing, 14
users said they would like to see improved glossaries and dictionaries, with six wanting to be able
to make glossary changes that would be propagated throughout a document. Three suggestions
related to MT and involved improved display of provenance data (e.g. the origin of the suggested
translation), improved pre-processing, and dynamic changes to the MT system in the case of a
recurrent MT error which needs to be fixed many times during post-editing. Other UI wishes
included a global find-and-replace function, reliable concordance features, and grammar checking.
Notably, these latter UI requests are for features to support the general translation task, adjudged to
be lacking in users’ current tools (see Figure 3) despite two decades of TM tool development.
Participants appear to be keen users of keyboard shortcuts; 29% (70) of 241 participants use
keyboard shortcuts often, and 40% (96) use them “very often”, while only 5% never use them at all.
80% (193) responded that their productivity was improved by using keyboard shortcuts. For specific
operations required by MT post-editing, and some operations not covered by current default shortcut options, participants were asked whether a keyboard shortcut might be useful. Responses to specific
keyboard shortcut suggestions are shown in Table 2. Proposed shortcuts are listed in the left-hand
column and the number of respondents who considered these shortcuts useful is in the right-hand
column.
Table 2. Keyboard shortcuts requested.

Shortcut                                No.
Dictionary search                       203
One-click rejection of MT suggestion    185
Web-based parallel text lookup          158
Change capitalization                   149
Apply source punctuation to target      128
Add/delete spaces                       124
Dictionary search and the suggestion of a shortcut for web-based parallel text lookup, while popular
in survey responses, are not post-editing-specific. Post-editing-specific responses included a 77%
preference (185) for a keyboard shortcut that would allow a one-click rejection of an MT suggestion.
This assumes that the MT suggestion is automatically pasted to the edit window, of course, which
could be configurable in an editing interface. In XTM, for example, the MT suggestion may either
be automatically pasted or added electively using a keyboard shortcut, depending on user settings.
Incorrect letter casing is also often problematic in MT output, reflected by the 62% (149) who would
like to see a keyboard shortcut for changing capitalization.
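As a minimal sketch of how such a shortcut might behave (the mechanics here are our own assumption, not a description of any shipping tool), a capitalization command could invert the casing of the currently selected span:

    def toggle_capitalization(segment, start, end):
        # Invert the casing of the selected span [start:end); a real editor
        # would bind this to a shortcut and act on the current text
        # selection rather than on explicit character indices.
        selected = segment[start:end]
        toggled = selected.lower() if selected[:1].isupper() else selected.capitalize()
        return segment[:start] + toggled + segment[end:]

    print(toggle_capitalization("the File could not be saved", 4, 8))
    # -> "the file could not be saved"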
Fewer participants considered the suggested language-specific keyboard shortcuts useful, possibly due to the large spread of target languages among participants. The most popular suggested
shortcut would change the word order in a machine translated segment (considered useful by 42%
or 102 participants), followed by a change in the grammatical number of a word (e.g. from singular
to plural). Further responses to suggested language-specific shortcuts may be seen in Table 3.
Notably, the suggestions for prepositions and for postpositions do not apply to all languages, which
may have led to lower numbers considering these options useful. Additionally, participants may be unable to measure usefulness without first testing these features in practice; a minimal sketch of one such operation follows Table 3.
Table 3. Language-specific keyboard shortcuts requested.

Shortcut                          No.
Adjust word order                 102
Change number (singular/plural)    99
Change gender                      79
Change verb form                   68
Add/delete preposition             67
Add/delete postposition            65
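As an illustration of the most requested operation above, a word-order shortcut could be modelled as a swap of two selected tokens in the target segment. This sketch uses token indices to stand in for an editor's text selection, and ignores the morphological adjustments a real implementation would need:

    def swap_words(segment, i, j):
        # Swap the tokens at positions i and j, e.g. to move an adjective
        # that the MT engine placed on the wrong side of its noun.
        tokens = segment.split()
        tokens[i], tokens[j] = tokens[j], tokens[i]
        return " ".join(tokens)

    print(swap_words("le noir chat dort", 1, 2))  # -> "le chat noir dort"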
Participants expressed their opinions relating to these shortcuts in the open comments sections of
the survey. Of 125 commenters, 34 were in favour of the shortcuts: “It seems obvious to me that all
such keyboard shortcuts would be useful. I reckon I use 15-25 keyboard shortcuts in each of the
main CAT and other productivity applications I use on a daily basis.” Nine comments had further
suggestions for shortcuts, such as Internet-based text lookup, parallel text searching, and ‘copy
source formatting to target’. 41 participants provided negative comments, with many unable to
understand how the shortcuts might work in practice. 18 participants had misgivings specific to one
of their languages. Several thought that manual changes would be easier or less time-consuming
than memorizing a large number of shortcuts, an opinion that recurs in the interviews.
Participants appeared to favour customizable shortcuts. 68% (164) would like to be able to adapt
the UI functionality using macros or scripts. 52% (125) would like to be helped or guided in creating
such a macro. Three comments expressed a desire for instructions to be clear and simple, and for interoperability considerations to be taken into account, so that macros from other programs (MS Word was suggested) might work in the UI. Of 20 comments, all but one were positive about user-added macros. Two commenters use AutoHotkey to set shortcuts globally on their systems, but would like to be able to add program-specific shortcuts.
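At its simplest, a guided macro facility of the kind requested could map user-defined names (which the tool would bind to key chords) to sequences of editing operations. The sketch below is our own minimal illustration of that design, not a feature of any existing tool:

    macros = {}

    def register_macro(name, *operations):
        # Store a user-defined sequence of edit operations under a name;
        # a real tool would also bind the name to a keyboard shortcut.
        macros[name] = operations

    def run_macro(name, segment):
        for operation in macros[name]:
            segment = operation(segment)
        return segment

    # Example: a macro that collapses double spaces and fixes a recurring term.
    register_macro("tidy",
                   lambda s: " ".join(s.split()),
                   lambda s: s.replace("web site", "website"))
    print(run_macro("tidy", "Visit  the web site  today"))  # -> "Visit the website today"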
Presentation of TM matches and MT output
233 participants completed this section of the survey, of whom 81% (189) would like to be presented
with confidence scores for each target text segment from the MT engine. 70% of those 189 would
like confidence scores to be expressed as a percentage (like a fuzzy match score in a TM tool), and
25% expressed a preference for color-coding. If a machine translation suggestion received a higher
confidence score than any fuzzy match available from the translation memory, 88% of participants
(205) said that they would nevertheless like to see both MT suggestions and TM matches. Only
three participants (just above 1%) would like to see the MT match only. Both of these findings
suggest a lack of translator confidence in MT output. This scepticism is also expressed by the 14
participants who would like to see the TM match only, even when a higher-rated MT suggestion is
available, and in the choice by many participants of the lowest possible fuzzy match value below
which they would prefer to see MT output rather than a TM fuzzy match. (The thresholds chosen
by participants may be seen in Figure 4.) But despite other evidence suggesting a low level of post-
editor confidence in MT, 80% of respondents (186) would like to see ‘the best MT suggestion’
automatically appear in the UI target window when no TM match is available.
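Taken together, these preferences suggest display logic along the following lines. The sketch is our own reading of the responses; the default threshold and data shapes are illustrative assumptions rather than any existing tool's API:

    def suggestions_to_show(tm_match, mt_suggestion, fuzzy_threshold=75):
        # tm_match: (target_text, fuzzy_score) or None
        # mt_suggestion: (target_text, confidence) or None
        # Respondents largely wanted to see both suggestions when both
        # exist, with MT auto-populated in the target window only when
        # the TM offers nothing, or only a match below the user's
        # chosen fuzzy threshold.
        shown = []
        if tm_match is not None:
            shown.append(("TM", *tm_match))
        if mt_suggestion is not None:
            shown.append(("MT", *mt_suggestion))
        auto_populate_mt = mt_suggestion is not None and (
            tm_match is None or tm_match[1] < fuzzy_threshold)
        return shown, auto_populate_mt

    print(suggestions_to_show(("Fichier introuvable", 68),
                              ("Fichier non trouvé", 81)))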
Fig. 4. Fuzzy match thresholds below which an MT match is favoured.
62% (144) felt it would be useful if the editing environment could combine the best sub-segments
(or phrases) from the MT system and the TM to produce an MTM (Machine Translation/Translation
Memory) combined suggestion. 47 participants commented about this proposed function showing
mixed opinions. 21 commenters responded positively about a potential MTM match, while nine
commenters were not in favour of the feature, with one writing that it “seems theoretically useful,
but when really applied it (could) create confusion”.
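At its simplest, such an MTM match could be assembled by choosing, for each aligned sub-segment, whichever source scores higher, while retaining provenance for the colour-coding discussed below. This is a deliberately naive sketch of the idea; real sub-segment combination would also require alignment and fluency checks:

    def combine_mtm(phrases):
        # phrases: list of (tm_phrase, tm_score, mt_phrase, mt_score)
        # tuples, one per aligned sub-segment. Provenance is kept so the
        # UI can colour-code each phrase by origin.
        combined = []
        for tm_phrase, tm_score, mt_phrase, mt_score in phrases:
            if tm_score >= mt_score:
                combined.append((tm_phrase, "TM"))
            else:
                combined.append((mt_phrase, "MT"))
        return combined

    print(combine_mtm([("Cliquez sur", 0.9, "Cliquer", 0.6),
                       ("le bouton Enregistrer", 0.4, "le bouton Sauvegarder", 0.7)]))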
Five commenters had suggestions such as allowing the feature to be disabled, while 87% of
respondents (203) said they would like to see the provenance of MT or TM suggestions denoted by
colour at a sub-segment level. The importance of provenance and retention of metadata showing
the origin of match suggestions appeared clear across the whole survey.
Features to Support Post-Editing
231 participants completed this section of the survey, in which they were asked for their opinion on some functions that have been suggested for supporting post-editing. As these functions
have not yet been implemented in a commercial tool, they remain largely untested. We considered
it worthwhile to gather participants’ opinions on these functions, but they would, of course, require
evaluation and validation prior to being added to a UI for release. We have already outlined above how
some participants expressed a desire to see dynamic, real-time improvements to MT systems. Some
work on this topic has been published by Alabau et al. (2012), who suggest that MT systems could
use human post-edits as “additional information to achieve improved suggestions”. In our survey
71% of respondents (164) said that their edits should be used to improve a client-specific MT system.
23% (53) were unsure, with concerns expressed in 42 comments. Four commenters were concerned
about issues relating to client confidentiality, while others resented further reuse of their translation
work. This intellectual property (IP) concern was expressed by one participant who wrote: “Who
would pay a translator for his intellectual work in improving the TM/MTM?”
69% of participants (159) would like to see edits not just used to improve suggestions, but also to
retrain an MT system in real time. Most commenters felt positively about potential improvements,
one writing that “That should be one of (the) main goals of MT, not only lower rates.” Again, several
participants would like to use this function electively. Three commenters believe that the client
should decide whether the content should be added to the MT engine, and five were not in favour
of this function. One participant feels that, were it possible to incorporate this function, it might lead to further complications depending on the workflow and the steps required for review or approval. “If
immediately incorporated, I'd like to know where each segment is coming from (i.e. what is from
(the) old MT engine, what is a recent addition/my own work, what's been reviewed as accurate, and
what's still pending.” Another commenter wondered how a system could learn only the ‘right’
changes (i.e. changes to recurring incorrect phrases or terms): “a too-generalized auto-adaptation
feature may create errors”.
Participants were asked how useful they would find two variants of IMT. Using the first variant, the
editing environment could dynamically alter pre-populated MT suggestions depending on edits as
the user moves through a segment, so that as the user edits the MT suggestion, the system would
offer “context dependent completions”, adjusting the remainder of the target segment based on the
user’s edit (Langlais, LaPalme, and Loranger, 2002 p.78). 48% (111) considered that this feature
would be useful, with 28% (65) unsure. Participants were more certain that they would like to be
able to turn this feature on and off, with 93% (215) requesting that the function could be used
electively. 46% (106) were in favour of a second variant of IMT, whereby the editing environment
could dynamically auto-complete segments translated from scratch based on MT proposals. (20%
(46) thought that this would not be at all useful.) A slightly higher proportion of 54% (125) would
like to see MT suggestions at a sub-segment level. 184 participants (80%) said that they would like
to see sub-segment MT suggestions provided as a drop-down list, with 35% (64) of those suggesting
two to three list items, and a further 34% (63) preferring the ability to customize the number of list
items themselves.
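In outline, interactive MT keeps the machine's suggestion consistent with what the translator has already typed. The toy sketch below simply filters a fixed list of scored hypotheses by the typed prefix; a real IMT decoder would instead regenerate the suffix on every keystroke:

    def complete_from_prefix(typed_prefix, mt_hypotheses):
        # Return the suffix of the best-scoring MT hypothesis that is
        # consistent with the text the translator has typed so far.
        compatible = [(text, score) for text, score in mt_hypotheses
                      if text.startswith(typed_prefix)]
        if not compatible:
            return None  # a real IMT engine would re-decode instead
        best = max(compatible, key=lambda pair: pair[1])
        return best[0][len(typed_prefix):]

    hypotheses = [("Das Haus ist klein", 0.8), ("Das Haus ist alt", 0.6)]
    print(complete_from_prefix("Das Haus ist k", hypotheses))  # -> "lein"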
The final questions related to user feedback on productivity, something that is of great importance in the translation industry and is largely driving MT deployment (DePalma et al. 2013). Responses also revealed participants’ suspicion of data dispossession. 70% (162) would like to see their productivity
reported dynamically, such as in words per hour or percentage completed, as long as this reporting function can be turned on and off and the information is for their personal use only. 48% would
like to see dynamic reporting of their earnings. Among 58 comments about this proposed feature,
36 participants consider this to be a great idea. One participant wrote “Even after 6 years in the
industry, I still find estimating time vs. fees to be quite difficult. Even when I am able to view the
source text beforehand to make my estimate, I often misjudge the quantity or technicality of the
work. I think having an automated tracker would be fairer for me and the client.” Ten commenters
specified that this sort of information should not be available to the client (“For client tracking – oh
hell no.”) and 13 would not be in favour of this function at all, saying that it is unnecessary, will
create more clutter, and put post-editors under too much pressure.
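A dynamic productivity display of the kind most respondents favoured needs little more than timestamps and word counts. The sketch below keeps the calculation local to the user, in line with the concern that such figures should not be visible to clients:

    import time

    class ProductivityTracker:
        # Tracks words confirmed per hour for the translator's own use;
        # respondents asked for such reporting to be elective and private.
        def __init__(self):
            self.start = time.time()
            self.words_confirmed = 0

        def confirm_segment(self, target_text):
            self.words_confirmed += len(target_text.split())

        def words_per_hour(self):
            elapsed_hours = (time.time() - self.start) / 3600
            return self.words_confirmed / elapsed_hours if elapsed_hours else 0.0

    tracker = ProductivityTracker()
    tracker.confirm_segment("Le fichier a été enregistré.")
    # After an hour's work, tracker.words_per_hour() reports the rate.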
Interview results
43 survey participants provided email addresses, agreeing to waive their anonymity and to provide
more details about their preferences for a post-editing user interface. In order to better answer our
research questions (see Section 3), we contacted 16 of these survey participants, choosing only those
who had post-editing experience and attempting to cover a wide range of language pairs. Ten
participants agreed to participate in follow-on interviews, all but one of whom listed English as their
source language. Interviewee profiles are shown in Table 4.
Table 4. Interviewee profiles.

Post-Editor   Languages                                     Post-Editing Experience       Tools
A             English > French                              2 years                       SDLX
B             English > Portuguese                          6 years                       SDLT, GTT
C             English > Spanish                             2 years                       MemoQ, SDLX, SDL Trados, Worldserver, others
D             English > French                              5 years                       SDLX, Trados (2007 and Studio), TagEditor, Idiom
E             English > Italian                             2 years                       Wordfast Pro or SDL Trados Studio 2011
F             English > Spanish                             9 months                      MemoQ, Trados (2007 and Studio), TagEditor, Idiom
G             English, German, Spanish, Catalan > Italian   Various projects over years   OmegaT
H             Russian > English                             5 months or so                SDL Trados
I             English > Finnish                             1 year                        Trados, Wordfast, Idiom Worldserver
J             English > Arabic                              2 years                       SDL Trados, MemoQ, WordFast, Catalyst (SDL Trados 2009 most efficient)
Interface design
In response to the question “What existing software comes closest to your ideal UI?”, four
interviewees chose SDL Trados (all but one specified the Studio version), four interviewees
mentioned MemoQ, one chose SDLX, and one chose Omega-T. Informant D chose the SDLX tool,
but only because she had been able to customize the tool to link with online dictionaries. Interviewees
were asked “What do you think is most important: simplicity in the UI or a feature-rich UI?” Three
interviewees chose ‘feature-rich’ and three ‘simplicity’, but for many this was not an appropriate
distinction. Rather than having many or few items onscreen, they considered it important to have
‘the right’ features. Informant B complained of too much information displayed to the user in the
SDL Trados interface, saying “you have to use a big screen to be able to leverage from all of that
information”.
Several informants felt that the solution to onscreen clutter is in having a highly customizable UI.
H said “what I'd like would be more opportunity to build buttons into the interface myself.” He
continued, “There are some functions that I want to use repeatedly, and I have to go through various
dropdown menus (to access them)”. Another possible approach could be to create a UI that adapts
to the user or one that presupposes a learning curve whereby more functionality may be revealed as
users gain experience with the new interface. Tidwell (2006 p.45) recommends designing a UI,
particularly for new users, that hides all but the “most commonly used, most important items” by
default. Informant C explained that user needs may change as they become familiar with the UI:
“I have a lot of difficulties learning shortcuts at the beginning, but then after 6 months
using the same tool, you find that those buttons, you don't need any more, so maybe
something that could be customized to the user.”
F added that, for him, performance ranks higher than the look of a UI, and additional functionality
is only useful if allied with performance. “In the end,” he said, “what decides is how fast we can
work with it.” In his office, although he considers that SDL Trados Studio has good features, “we
use MemoQ because it’s quicker.” Good performance makes this compromise worthwhile. “We
miss some features, but we make up for it in speed.” Referring to ease of use of editing interfaces,
F said that many of the edits he makes “are very easy to see but very cumbersome to make with the
common tools”. “When a segment is wrong, you have to start all over again from scratch, then the
user interface has no influence whatsoever, but if a segment is almost right, these little things that
could make it better, those are the things that could really speed up the process.”
Current post-editing problems
Interviewees were asked about common edits they make during post-editing and possible ways to
support these changes in the UI. Six informants mentioned word order changes. Four informants
reported problems with terminology, such as term updates since the MT engine was trained. Three
informants mentioned grammar problems generally, with D asserting that for her, the “main pitfall
of Statistical Machine Translation (SMT)” is “all the agreement: all the gender, number,
conjugation.” Other orthographical and grammatical pain points mentioned were letter case,
prepositions, number, and gender. F said that these kinds of changes “are the most frustrating for us
because it’s mechanics, and if it’s mechanic, there must be a way it could be done by a machine”.
He noted that gender changes should also include “other elements… like associated articles and the
words surrounding the noun.” Many of these suggested features are similar to those proposed in the
survey, which raises the question: did the survey prime participants for these interviews, or set them
thinking on the topic of a post-editing UI? It is unlikely that they remembered an online survey in
detail from one month previously (and once closed, the survey was not available to browse), but it
is quite possible that specific suggested features remained in mind at the time of the interviews.
Three interviewees would like to see automated language-specific changes to word order with B
commenting that “one of my dream features that I haven’t seen so far is changing word order.” She
said that in Portuguese “we use an adjective after the noun and it's quite the opposite in English, so
that is a really common error in machines that are not well-trained.” Two interviewees requested
that highlighted words could be swapped around within a segment. E said she would like to drag
and drop words into the correct order, while G thought such a feature would be superfluous: “It will
always be quicker doing that with a few keystrokes rather than clicking some button or special
shortcut.”
Two interviewees requested propagation of edits in subsequent segments, one would like to see a
feature like Autosuggest in SDL Trados Studio (a feature whereby a word or phrase from the TM
or glossary is suggested to the user based on context while the user types), and one suggested a list
of alternative translations for a word when hovered over by a mouse. When interviewees were asked
more generally about pain points in the post-editing task, five returned to the topic of MT quality.
D stated “I will only work with clients who customize their MT. I will not work with anybody who
just sends something to Google Translate and says ‘go ahead and post-edit this’.” A finds that,
because in her current UI MT is highlighted in violet, she tends to “jump automatically to the target
segment as if I were reviewing.” Even though she finds it “really is important to look at the source
first,” the highlighting makes it “difficult to focus on this ideal process of looking at the source
segment before looking at the target.”
Four participants were dissatisfied with the lack of confidence scores for MT. Interviewee I
complained that, when the MT is completely wrong, “it’s more work to post-edit machine translation
than just to translate from scratch.” Nine interviewees would like to see improved terminology
features. B requested “a shortcut to get different (language-specific) variations for the same term.”
She added that “having a glossary that automatically produces variation regarding gender and
number; I think that would be a killer glossary.” Four would like to make global term changes, with
D suggesting a scenario where she could update the second part of a project “taking into account
what you’ve added during the first part - that would be amazing”.
Features to Support Post-Editing
When asked in interviews about combined sub-segment matches from MT and TM, all but one of
the interviewees responded positively. Some were concerned with how it would work in practice.
An advantage of the interview stage was the possibility of yielding these more detailed and
considered opinions from the participants. B felt it would be useful “where you don’t have a
dedicated glossary but you have a really rich TM.” J, however, believes that this is a “far-fetched
feature that will complicate things further.”
All interviewees would like to see edits used to improve an MT system in real time, although five
would like these improvements communicated to the user (mostly using color-coding), and five
would like it to happen without notification. F said “It’s very important to know where a segment
has been assembled from, because you have to take a different approach to post-edit it.” E suggests
feedback if the user hovers with the mouse over a word that has been updated, and C would like to
see prompting as unsupervised improvements could be “very dangerous”. B and J felt that
notifications might delay them. J said “I don’t want to go through too many details while I work... I
just want the suggestion in order to be used as quickly as possible.”
Interviewees were asked about a function whereby the MT system would provide suggestions that
would be dynamically altered as they type, based on their edits. Six interviewees were in favour of
such a function whereas four interviewees thought that this would be distracting or might delay their
translation – this could also be tested during development. D said “Although I type fast, I still look
quite a lot at my keyboard, so I wouldn’t see the edits as they happen while I’m typing.”
Discussion
The survey and interviews elicited user input to potential future UIs for post-editing, and identified
pain points in the post-editing process. They also revealed a prevailing level of frustration amongst
users concerning fundamental support for the translation task quite aside from post-editing. This
finding is in keeping with recent workplace studies such as the one carried out by Ehrensberger-
Dow (2014) in which 19 hours of screen recordings were collected in the translation workplace for
analysis. Ehrensberger-Dow and Massey (2014) found that certain features of the text editing
software used by the translators in their study were slowing down the process. For example, the
small window in which the translators had to type resulted in them frequently losing their position
when they had to shift between areas of the UI to obtain other information.
Our modus operandi could be criticized on the basis that participants were asked to comment on
features that are not yet available in commercial tools. Sleeswijk Visser et al. warn that such user
responses may “offer a view on people’s current and past experiences” rather than exposing “latent
needs” (2005 p.122). While we accept that such responses have their drawbacks, we consider user
input very much worthwhile as part of a pre-design survey that may lead to a new step in interactive
translation support. We also stress the necessity of testing and evaluating all recommended features prior to implementation; features are sometimes implemented on the supposition that they will be useful, without this being rigorously tested with a significant number of relevant users.
In much of the survey and in particular in the interviews, we focused on post-editing-specific
problems and requirements. We found evidence of scepticism towards MT based on perceptions of
MT quality and a feeling of dispossession related to the translator’s work being used for MT
retraining. Survey respondents stated a preference for even low-value TM matches (<65%) over MT output, despite previous findings that such TM matches require more effort to edit than MT
output would (O’Brien 2006; Guerberof 2008). Scepticism about MT might also be behind survey
respondents’ enthusiasm for one-click rejection of MT suggestions. Both survey respondents and
interviewees expressed frustration at having to make the same post-edits over and over again, and
would like to see ‘on-the-fly’ improvements of MT output based on their edits (increasingly
plausible due to recent breakthroughs in SMT retraining speeds (Du et al. 2015)). Nonetheless, MT
and UI developers need to find an efficient method of making incremental improvements to MT
output in real time, in order to lessen the frustration of post-editors who are currently obliged to
make the same edits repeatedly.
A further issue that emerged in discussions of the integration of MT with TM related to post-editors’
strong desire to know the provenance of data that would be re-used in such scenarios. Retaining this
level of provenance data would, however, require user tracking and careful retention of metadata from both TMs and the training data used in SMT; the latter would be particularly technically demanding.
Conclusion
In this Chapter we have reported on the results of an online survey with 231 full participants and a
series of interviews with 10 informants, focusing on the task of post-editing Machine Translation
(MT) and associated UI requirements. Our results provide an update to survey work by Lagoudaki
(2006 and 2008) on TM tools and what users consider important and desirable in a translation editor.
The focus of the study was on post-editing and MT, but an unexpected pain point was continued
dissatisfaction with translation tools in general. Despite updates of popular tools in recent years and
new UI features appearing in tools such as Matecat, Casmacat and Lilt[iv], our study identified a
perceived lack of development within existing translation interfaces, with survey and interview
participants complaining of long-standing issues with their current interfaces. This highlights a lack of human-computer interaction (HCI) input in translation tool development and design, and suggests a real need for involvement of HCI experts.
In addition, we must introduce a note of caution when basing user requirements on users’ opinions.
We hope that some latent needs may have been revealed during the interview stage, and consider
that unforeseen requirements may arise during user testing. We furthermore hope that this research
may contribute to translation editor interfaces optimised for post-editing that reflect the functional
and design needs of users.
Acknowledgments
This research is supported by Science Foundation Ireland (Grant 12/CE/I2267) as part of the
CNGL (www.cngl.ie) at Dublin City University. The authors would like to thank the companies that
helped promote the post-editing survey: Alchemy/TDC, Lingo24, Pactera, Roundtable, VistaTEC,
and WeLocalize. We are grateful to Brian Kelly of TimeTrade Systems for advice on specifications
creation, and Prof. Dave Lewis and Joris Vreeke for feedback on the ensuing specifications. Finally,
we would like to thank Dr. Johann Roturier (Symantec), Morgan O’Brien (Intel), and Linda Mitchell
(DCU) for comments and advice on survey questions.
References
Alabau, V., Leiva, L. A., Ortiz-Martínez, D., Casacuberta, F. 2012. User Evaluation of Interactive
Machine Translation Systems. In Proceedings of the 16th Annual Conference of the European
Association for Machine Translation (EAMT). Association for Computational Linguistics,
Stroudsburg, PA, 20-23.
Aziz, W., Castilho de Sousa, S., Specia, L. 2012. PET: a tool for post-editing and assessing machine
translation. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12). Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis (Eds.) European Language
Resources Association, Paris, France.
Blatz, J., Fitzgerald, E., Foster, G., Gandrabur, S., Goutte, C., Kulesza, A., Sanchis, A., Ueffing, N. 2004. Confidence Estimation for Machine Translation. In Proceedings of COLING '04, the 20th International Conference on Computational Linguistics. Association for Computational Linguistics, Stroudsburg, PA, 315-321. DOI: http://dx.doi.org/10.3115/1220355.1220401.
De Almeida, G. 2013. Translating the Post-Editor: An investigation of post-editing changes and
correlations with professional experience across two romance languages. PhD dissertation.
Dublin City University (DCU), Dublin, Ireland.
DePalma, D. A., Hegde, V., Pielmeier, H., Stewart, R. G. 2013. The Language Services Market:
2013. Common Sense Advisory, Boston, USA.
De Souza, J. G. C., Turchi, M., Negri, M. 2014. Machine Translation Quality Estimation Across
Domains. In Proceedings of COLING 2014, the 25th International Conference on
Computational Linguistics: Technical Papers, 409–420, Dublin, Ireland, August 23-29 2014.
Du, J., Moorkens, J., Srivastava, A., Lauer, M., Way, A., Lewis, D. 2015. D4.3: Translation Project-
Level Evaluation. FALCON Project EU FP7 deliverable.
Ehrensberger-Dow, M. 2014. Challenges of Translation Process Research at the Workplace. MonTI Monographs in Translation and Interpreting 7: 355-383.
Ehrensberger-Dow, M., Massey, G. 2014. Cognitive Ergonomic Issues in Professional Translation.
In The Development of Translation Competence: Theories and Methodologies from
Psycholinguistics and Cognitive Science, John W. Schwieter and Aline Ferreira (Eds.), 58-86.
Newcastle upon Tyne: Cambridge Scholars Publishing.
Gould, J. D., Lewis, C. 1985. Designing for Usability: Key Principles and What Designers Think.
Commun. ACM 28, 3 (1985), 300-311. ACM Press, New York, NY. DOI: http://dx.doi.org/10.1145/3166.3170
González-Rubio, J., Ortiz-Martínez, D., Casacuberta, F. 2010. On the Use of Confidence Measures
within an Interactive-predictive Machine Translation System. In Proceedings of the 14th
Annual Conference of the European Association for Machine Translation (EAMT). Association
for Computational Linguistics, Stroudsburg, PA, 8pp.
Guerberof, A. 2008. Productivity and Quality in the Post-editing of Outputs from Translation
Memories and Machine Translation. Minor dissertation. Universitat Rovira i Virgili, Tarragona,
Spain.
Guerberof, A. 2013. What do professional translators think about post-editing? The Journal of
Specialised Translation 19, 75-95.
Hutchins, W. J. 2001. Machine translation over fifty years. Histoire, Epistémologie, Langage. Vol.
23 (1), 2001: Le traitement automatique des langues (ed. Jacqueline Léon); 7-31. Retrieved
October 18, 2013 from http://www.mt-archive.info/HEL-2001-Hutchins.pdf
Kelly, N. 2014. Why so many translators hate translation technology. The Huffington Post. Posted online on 19/06/2014. http://www.huffingtonpost.com/nataly-kelly/why-so-many-translators-h_b_5506533.html. Last accessed: 04/11/2015.
Kelly, N., DePalma, D., Hegde, V. 2012. Voices from the Freelance Translator Community.
Common Sense Advisory, Boston, USA.
Kluger, A. N., Denisi, A. 1996. The effects of feedback interventions on performance: A historical
review, a meta-analysis, and a preliminary feedback intervention theory. Psychological Bulletin,
119, 2, Mar 1996, 254-284.
Koehn, P. 2009. A process study of computer aided translation. Machine Translation 23, 4, 241-
263.
Koehn, P., Alabau, V., Carl, M., Casacuberta, F., García-Martínez, M., González-Rubio, J., Keller,
F., Ortiz-Martínez, D., Sanchis-Trilles, G., Germann, U. 2015. CASMACAT Final Public Report.
CASMACAT EU FP7 Deliverable.
Koponen, M. 2012. Comparing human perceptions of post-editing effort with post-editing
operations. In Proceedings of the Seventh Workshop on Statistical Machine Translation. 181-
190.
Lagoudaki, E. 2006. Translation Memories Survey 2006: Users’ Perceptions Around TM Use. In Proceedings of ASLIB Translating and the Computer 28. London, UK. 15-16 November 2006.
Lagoudaki, E. 2008. Expanding the Possibilities of Translation Memory Systems: From the
Translator’s Wishlist to the Developer’s Design. PhD dissertation. Imperial College, London,
UK.
Langlais, P., Lapalme, G., Loranger, M. 2002. TransType: Development-evaluation cycles to boost
translator’s productivity. Machine Translation (Special Issue on Embedded Machine
Translation Systems), 15, 2, 77-98.
McBride, C. 2009. Translation Memory Systems: An Analysis of Translators’ Attitudes and
Opinions. Master’s thesis. University of Ottawa, Canada.
Moorkens, J., O’Brien, S. 2013. User Attitudes to the Post-Editing Interface. In Proceedings of
Machine Translation Summit XIV Workshop on Post-editing Technology and Practice. Sharon
O’Brien, Michel Simard, Lucia Specia (Eds.) Association for Computational Linguistics,
Stroudsburg, PA, 19-25.
Moorkens, J., O’Brien, S. 2014. Post-Editing Evaluations: Trade-offs between Novice and
Professional Participants. El‐Kahlout, İ. D., Özkan, M., Sánchez‐Martínez, F., Ramírez‐Sánchez,
G., Hollowood, F., Way, A. (Eds.). Proceedings of the 18th Annual Conference of the European
Association for Machine Translation (EAMT 2015), 75-81.
Moran, J., Lewis, D. 2011. Unobtrusive methods for low-cost manual evaluation of machine
translation. In Proceedings of Tralogy 2011, Centre National de la Recherche Scientifique, Paris,
France, 3-4 March 2011.
O’Brien, S. 2006. Pauses as indicators of cognitive effort in post-editing machine translation output. Across Languages and Cultures, 7, 1, 1-21. DOI: http://dx.doi.org/10.1556/Acr.7.2006.1.1
Ozcelik, D., Quevedo-Fernandez, J., Thalen, J., Terken, J. 2011. Engaging users in the early phases
of the design process: attitudes, concerns and challenges from industrial practice. In Proceedings of the 2011 Conference on Designing Pleasurable Products and Interfaces (DPPI '11). ACM, New York. DOI: http://dx.doi.org/10.1145/2347504.2347519
Sleeswijk Visser, F., Stappers, P. J., van der Lugt, R., Sanders, E. B. N. 2005. Contextmapping: experiences from practice. CoDesign, 1, 2, 119–149. DOI: http://dx.doi.org/10.1080/15710880500135987
Somers, H. 2001. Machine Translation: Applications. Mona Baker (Ed.) Routledge Encyclopaedia
of Translation Studies. London: Routledge, 140–143.
Tidwell, J. 2006. Designing Interfaces. Sebastopol, CA: O’Reilly.
i. A confidence measure is an automated estimate of raw MT quality, and may be presented at a segment or sub-segment level (Blatz et al. 2004). At the segment level, it may inform a user of the likely usefulness of the raw MT, and at the sub-segment level it may inform a user about incorrect or possibly incorrect translations (Alabau et al. 2013). There have been various suggestions as to how to estimate and display confidence scores (Blatz et al. 2004; González-Rubio et al. 2010), but a confidence estimation feature has not yet been included in a commercial translation editor.
ii. Lagoudaki (2006) found that SDL Trados was also the most widely-used tool within her 2006 cohort, with 51% reporting that they used the tool regularly.
iii. Lagoudaki (ibid.), on the contrary, reported that company employees were more likely than freelancers to use multiple tools.
iv. www.lilt.com