Translation Dictation vs. Post-editing with
Cloud-based Voice Recognition:
A Pilot Experiment
Julián Zapata julianz@intr.co
University of Ottawa & InTr Technologies, Ottawa-Gatineau, Canada
Sheila Castilho sheila.castilho@adaptcentre.ie
ADAPT Centre/School of Computing, Dublin City University, Dublin, Ireland
Joss Moorkens joss.moorkens@adaptcentre.ie
ADAPT Centre/School of Applied Language & Intercultural Studies, Dublin City
University, Dublin, Ireland
Abstract
In this paper, we report on a pilot mixed-methods experiment investigating the effects on
productivity and on the translator experience of integrating machine translation (MT) post-
editing (PE) with voice recognition (VR) and translation dictation (TD). The experiment
was performed with a sample of native Spanish participants. In the quantitative phase of the
experiment, they performed four tasks under four different conditions, namely (1)
conventional TD; (2) PE in dictation mode; (3) TD with VR; and (4) PE with VR (PEVR).
In the follow-on qualitative phase, the participants filled out an online survey, providing
details of their perceptions of the task and of PEVR in general. Our results suggest that
PEVR may be a usable way to add MT to a translation workflow, with some caveats. When
asked about their experience with the tasks, our participants preferred translation without the
‘constraint’ of MT, though the quantitative results show that PE tasks were generally more
efficient. This paper provides a brief overview of past work exploring VR for from-scratch
translation and PE purposes, describes our pilot experiment in detail, presents an overview
and analysis of the data collected, and outlines avenues for future work.
1. Introduction
Machine translation (MT) post-editing (PE) and voice recognition (VR) technology are
gaining ground in both translation technology research and the translation industry. Over 50%
of international Language Service Providers now offer a PE service using dedicated MT
engines integrated into translators’ computer-aided translation environments (Lommel and
DePalma, 2016). In a recent survey of 586 translators in the UK, 15% responded that they use
VR technology in their work (Chartered Institute of Linguists et al., 2017). These disparate
technologies tend not to be deployed in tandem, although both offer translators the potential to
increase productivity and reduce the technical effort usually required to translate from scratch
when using conventional word-processing hardware and software.
We carried out a pilot experiment to investigate the effects on productivity and on the
translator experience (TX) (Zapata, 2016a) of integrating PE with VR and translation
dictation (TD) using a sequential mixed-methods design. In the quantitative phase, four
Proceedings of MT Summit XVI, Vol.2: Users and Translators Track
Nagoya, Sep. 18-22, 2017 | p. 123
translators performed four translation tasks under four different conditions: (1) conventional
TD (i.e., sight-translating using a digital dictaphone), (2) PE in dictation mode (PED) (i.e.,
dictating approved or amended segments into the same dictaphone), (3) TD with VR (TDVR)
(using a cloud-based VR system on a tablet), and (4) PE with VR (PEVR) (using the same VR
system as in task 3). The quantitative experiments consisted of three phases during which task
times were measured and some input data were collected. Phase I consisted of dictating and
post-editing with dictaphone or the VR system; phase II consisted of manually transcribing
the recordings from tasks 1 and 2 on the researcher’s laptop; and phase III consisted of
revising/editing all four translations. As has been noted in a great deal of research about PE,
productivity increases alone do not make a tool desirable for translators (see Teixeira, 2014;
Moorkens and O’Brien, 2017). Translator attitudes and usability, the TX, are important
factors in the adoption of any technology. For this reason, we have appended a follow-on
qualitative phase, wherein the participants filled out an online survey, providing details of
their perceptions of the task and of PEVR in general.
In this paper, we present our pilot experiment in detail. The paper is structured as
follows: First, we provide a brief overview of past work exploring VR for from-scratch
translation and PE purposes. Then, we describe the experimental setup, and present an
overview and analysis of the quantitative and qualitative results. In the conclusion, we
describe avenues for future work.
2. Related Work
2.1. TD and VR
The idea of using the human voice to interact with computers and process texts is as old as the
idea of computers themselves. For decades, and in recent years more than ever, voice input
has been widely used in a vast array of domains and applications, from virtual assistants on
mobile phones to automated telephone customer services, and from professional translation to
legal and clinical documentation.
Simply put, VR (also known as voice/speech-to-text or automatic speech recognition)
technology recognizes human-voice signals and converts them into digital data. Early
experiments in VR anticipated that voice input would replace other input modes, such as the
keyboard and the mouse, in full natural-language communication tasks. However, it was
soon discovered that speech often performed better in combination with other input modes,
such as the keyboard itself, or touch, stylus and gesture input on multimodal interfaces
(Bolt, 1980; Pausch and Leatherby, 1991; Oviatt, 2012).
In translation, there has been long-standing interest in speaking translations rather than typing
them. In the 1960s and 1970s, professional translators often collaborated with
transcriptionists, dictating their translations either directly to the transcriptionist or into a
voice recorder (or dictaphone) to be transcribed later (a technique often referred to as TD).
In the 1990s and 2000s, researchers began to explore VR adaptation for TD purposes.
Such developments focused mainly on reducing VR word error rates by
combining VR and MT. Hybrid VR/MT systems are presented with the source text and use
MT probabilistic models to improve recognition; translators simply dictate their translation
from scratch without being presented with the MT output (Brousseau et al., 1995; Désilets et
al., 2008; Dymetman et al., 1994; Reddy and Rose, 2010; Rodriguez et al., 2012; Vidal et al.,
2006). More recently, further efforts have been made to evaluate the performance of
translation students and professionals when using commercial VR systems for straight TD
(Dragsted et al., 2009; Dragsted et al., 2011; Mees et al., 2013); to assess and analyze
professional translators’ needs and opinions about VR technology (Ciobanu, 2014 and 2016;
Zapata, 2012), and to explore TD in mobile and multimodal environments (Zapata and
Kirkedal, 2015; Zapata, 2016a,b).
2.2. PE and VR
In recent years, the potential of using VR for PE purposes has also been investigated (García-
Martínez et al., 2014; Mesa-Lao, 2014; Torres-Hostench et al., 2017). García-Martínez and
her collaborators (2014) tested a VR system integrated into a PE environment (both research-
level cloud-based systems). They argue that voice input is more attractive than the keyboard
alone in a PE environment, not only because some segments may need major changes and
could therefore be dictated, but also because, if the post-editor is not a touch typist, shifting
visual attention back and forth between the source text, the MT output and the keyboard adds
to the complexity of the PE task.
Mesa-Lao (2014) surveyed student translators, 80% of whom (n=15) reported that they
would welcome the integration of voice as one of the possible input modes for performing PE
tasks. Thus, voice input offers a third dimension to the PE task, making it possible to combine
different input modes or to alternate between them according to the difficulty of the task and
to the changing conditions of human-computer interaction. Some experiments have also
suggested that the benefits of integrating VR and PE (e.g. in terms of efficiency, productivity
and cognitive effort) may vary with the translator, the text type and the language combination
(see Carl et al., 2016a and 2016b).
Tests with VR within a mobile PE app were reported, first by Moorkens et al. (2016),
then by Torres-Hostench et al. (2017). Participants were impressed by VR quality and found it
useful for long segments. However, they mostly preferred to use the keyboard due to
limitations of the software for making minor edits to MT output.
In the following section, we describe our pilot experiment in more detail: our
participants' profiles and our methodology.
3. Experimental Setup
3.1. Participants' Profile
This experiment included a sample of native (Latin American) Spanish speakers. All four
participants are either pursuing or have recently completed a doctoral degree in translation
studies. Participants had in common at least a minimum level of acquaintance with the notions
of MT, PE and VR. Our sample includes two men and two women between the ages of 26 and
43. Participants reported 3 to 12 years of translation experience, two have training in
interpreting, and both of those are regular users of VR (and were therefore familiar with voice
commands and other specificities related to dictating with VR). All participants reported to be
occasional post-editors.
3.2. Methodology
For this study, we applied a sequential, explanatory mixed-methods design, using the follow-
up explanations model, in which the qualitative data is intended to expand upon the
quantitative results (Creswell and Plano Clark, 2007:72). We chose this methodology to
answer the following two research questions:
1. Can PEVR be as or more productive than comparable approaches, with or without
MT and VR?
2. Does the participants’ TX suggest that combining MT and VR is feasible for
translation projects?
As mentioned in the introduction, four tasks were involved in the quantitative phase of this
experiment, namely:
1) Conventional TD;
2) PED;
3) TDVR; and
4) PEVR.
A digital dictaphone was used for tasks 1 and 2. A commercial cloud-based speaker-independent
VR system¹ was used on an Android tablet for tasks 3 and 4. (See Zapata and
Kirkedal (2015) for a description of the different approaches to VR technology with respect to
users, i.e. speaker-dependent, speaker-adapted and speaker-independent systems.)
Source texts were 20-segment sections of newstest 2013 data used in WMT² translation
tasks. The test sets were analysed using the Wordsmith Wordlist³ tool to ensure that they were
statistically similar, based on measurements of type/token ratio, average sentence length, and
average word length. Table 1 shows the statistics of the test sets.
Text file  | Type/token ratio (TTR) | Mean word length (chars) | Word length std. dev. | Mean sentence length (words)
Test Set 1 | 55.12 | 4.99 | 2.51 | 18.05
Test Set 2 | 55.73 | 4.80 | 2.63 | 19.65
Test Set 3 | 54.31 | 5.00 | 2.62 | 21.09
Test Set 4 | 54.20 | 5.18 | 2.69 | 17.25
Table 1. Test set statistics for source texts
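The similarity measures reported in Table 1 can be computed in a few lines of Python. The sketch below is illustrative only, not the Wordsmith Wordlist tool itself, and the sample sentences are invented for the example:

```python
# Illustrative computation of the Table 1 measures: type/token ratio,
# mean word length, word-length standard deviation, and mean sentence
# length. The sample sentences are invented; the actual test sets came
# from the WMT newstest 2013 data.
import re
import statistics

def text_stats(sentences):
    tokens = [t.lower() for s in sentences for t in re.findall(r"[A-Za-z']+", s)]
    lengths = [len(t) for t in tokens]
    return {
        "ttr": 100 * len(set(tokens)) / len(tokens),   # as a percentage
        "mean_word_len": statistics.mean(lengths),      # in characters
        "word_len_stdev": statistics.pstdev(lengths),
        "mean_sent_len": len(tokens) / len(sentences),  # words per sentence
    }

sample = [
    "The committee approved the proposal after a short debate.",
    "Several members raised concerns about the projected costs.",
]
stats = text_stats(sample)
print(stats)
```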
A commercial-level MT system⁴ was used to translate the texts. All texts were printed
out separately and presented to the participants in hard copy. Naturally, only in tasks 2 and 4
were participants presented with the segmented source and MT texts. The MT texts for tasks 1
and 3 were used only to calculate HTER scores (Snover et al., 2006); more details are
provided in section 4.1.2.
Experiments were run individually (i.e. one participant at a time) over four days. A
university study room was booked to perform the experiments.
Tasks were randomized as follows:
1. Dragon Dictation, integrated in the Swype+Dragon app. See http://www.swype.com/.
2. http://www.statmt.org/wmt13/
3. http://lexically.net/wordsmith/
4. Google Translate. See https://translate.google.com/.
Participant | Order of tasks
ES1 | 1, 2, 3, 4
ES2 | 3, 4, 1, 2
ES3 | 4, 3, 2, 1
ES4 | 2, 1, 4, 3
Table 2. Participants and order of tasks
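The orders in Table 2 form a balanced Latin square: each task occupies each position exactly once. The paper does not state how the orders were generated, but one simple scheme that reproduces them, sketched below, takes a base order, its half-rotation, and the reversals of both:

```python
# One way to produce counterbalanced task orders like those in Table 2
# (an illustrative scheme, not necessarily the one used in the study):
# base order, half-rotation, and the reversals of both, so that every
# task appears in every position exactly once.
def counterbalanced_orders(tasks):
    half = len(tasks) // 2
    base = list(tasks)
    rotated = base[half:] + base[:half]
    return [base, rotated, base[::-1], rotated[::-1]]

orders = counterbalanced_orders([1, 2, 3, 4])
print(orders)  # [[1, 2, 3, 4], [3, 4, 1, 2], [4, 3, 2, 1], [2, 1, 4, 3]]
```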
Before performing any of the experimental tasks, participants were briefly instructed
in how to use the digital dictaphone (for tasks 1 and 2) and the VR system on the tablet (for
tasks 3 and 4); i.e., they were given the opportunity to dictate while testing a few voice
commands, such as punctuation marks.
The quantitative experiments consisted of three phases during which task times were
measured and some input data were collected:
Phase I - dictating and post-editing with dictaphone or the VR system on the tablet,
Phase II - manually transcribing the recordings from tasks 1 and 2 (for TD and
PED) on the researcher’s laptop; and
Phase III - revising/editing all four translations on the researcher’s laptop.
It is important to highlight that during phase II, participants were instructed not to edit
the translation but only to transcribe what they heard. The documents in which dictations
were performed on the tablet for tasks 3 and 4 in phase I were automatically saved to a
cloud-based drive⁵ after dictation, and were therefore immediately synchronized and available
to be edited/revised on the researcher's laptop in phase III.
In phase I, task times were measured using a stopwatch. In both phases II and III,
Inputlog (Leijten and Van Waes, 2013) was used. Inputlog is a research-level program
designed to log, analyse and visualize writing processes. The program provides data such as
total time spent in the document, total time in active writing mode (i.e., of actual keystrokes),
total time spent moving/clicking with the mouse, total number of characters typed, total
switches between the keyboard and the mouse, etc. Beyond total task times alone, we were
interested in collecting this kind of detailed input data, particularly for phase III. We are not
reporting data other than task times here given the scope and limitations of this paper; we do
consider, however, that input data analysis will be essential in larger-scale experiments.
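As a rough illustration of the kind of measures Inputlog derives, the sketch below summarizes a hypothetical event log. The (timestamp, device) tuple format and the two-second idle threshold are our assumptions for the example, not Inputlog's actual output format:

```python
# Sketch of summary measures of the kind Inputlog provides, computed
# from a hypothetical event log. Each event is (timestamp_in_seconds,
# device), where device is "keyboard" or "mouse"; this format is an
# invention for illustration, not Inputlog's real data model.
def summarize(events, idle_gap=2.0):
    keystrokes = sum(1 for _, dev in events if dev == "keyboard")
    # Count switches between input devices (keyboard <-> mouse).
    switches = sum(1 for (_, a), (_, b) in zip(events, events[1:]) if a != b)
    # Active writing time: sum of gaps between consecutive events that
    # are shorter than the idle threshold (longer pauses are excluded).
    active = sum(
        t2 - t1
        for (t1, _), (t2, _) in zip(events, events[1:])
        if t2 - t1 < idle_gap
    )
    return {"keystrokes": keystrokes, "switches": switches, "active_time": active}

log = [(0.0, "keyboard"), (0.4, "keyboard"), (0.9, "mouse"), (5.0, "keyboard")]
print(summarize(log))
```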
Thereafter, in the qualitative phase, participants responded to a short online
questionnaire, with socio-demographic questions, retrospective questions about the
experiment, as well as questions providing insight on the TX with multimodal/mobile VR-
enabled TD and PE applications (more details to be provided in section 4.4).
In the following section, some of the data collected is presented and analysed.
4. Results and Analysis
4.1. Task Time Measures (Quantitative Phase)
In order to investigate the effects on productivity of integrating PE with VR and TD in the
quantitative phase of this research, we analysed the task times as follows:
5. Dropbox. See https://www.dropbox.com.
1. Comparing tasks of the same nature with and without VR, that is, a) TD vs.
TDVR, and b) PED vs. PEVR;
2. Comparing translation vs. PE within the same input mode, that is, a) TD vs. PED,
and b) TDVR vs. PEVR.
We consider:
a) Translation and/or PE time (phase I + phase II), that is, the time participants
needed to translate and/or post-edit, as well as the transcription time (for TD and
PED);
b) Revision duration (phase III), that is, the total time participants needed to
review/edit their translation/post-editing;
c) Total task time (phase I + phase II + phase III), that is, the total time the
participants needed to perform each task.
TD versus TDVR
When comparing the two TD tasks (Table 3), i.e. the one performed with a dictaphone (TD)
and the one performed with a VR program (TDVR), we can see that the total translation time
is always shorter when participants use VR. Note that the total translation time in the
dictaphone task includes the time participants needed to transcribe their translations
(phase II).
Regarding revision duration, however, tasks performed with VR seem to take longer to
complete. We speculate that this is because, during revision, participants not only review
their translation but must also correct errors produced by the VR program.
Participant | Task | Translation time | Transcription time | Total | Revision time | Total task time
ES1 | TD   | 537  | 716  | 1253 | 402  | 1655
ES1 | TDVR | 796  | n/a  | 796  | 656  | 1452
ES2 | TD   | 688  | 1197 | 1885 | 405  | 2290
ES2 | TDVR | 1330 | n/a  | 1330 | 1191 | 2521
ES3 | TD   | 846  | 1116 | 1962 | 227  | 2189
ES3 | TDVR | 377  | n/a  | 377  | 722  | 1099
ES4 | TD   | 700  | 1432 | 2132 | 454  | 2586
ES4 | TDVR | 460  | n/a  | 460  | 1046 | 1506
Table 3. TD vs TDVR (in seconds)
Overall, when considering all phases, total task time seems to be lower for TDVR, apart from
participant ES2, whose total time was lower when performing TD.
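The per-participant totals can be recomputed directly from the phase times in Table 3. A minimal sketch, with the values copied from the table:

```python
# Recomputing Table 3 totals from the phase times (in seconds):
# total task time = translation + transcription (0 where n/a) + revision.
times = {
    "ES1": {"TD": (537, 716, 402), "TDVR": (796, 0, 656)},
    "ES2": {"TD": (688, 1197, 405), "TDVR": (1330, 0, 1191)},
    "ES3": {"TD": (846, 1116, 227), "TDVR": (377, 0, 722)},
    "ES4": {"TD": (700, 1432, 454), "TDVR": (460, 0, 1046)},
}
for who, tasks in times.items():
    totals = {task: sum(phases) for task, phases in tasks.items()}
    faster = min(totals, key=totals.get)  # condition with the lower total
    print(who, totals, "faster:", faster)
```

Running this confirms the pattern described above: TDVR yields the lower total for every participant except ES2.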
PED versus PEVR
Results for the two PE tasks (PED and PEVR) were also compared (Table 4). We notice that
the total PE time is lower for all participants in the VR condition. As for revision, the time is
higher in PEVR, which we assume is for the same reason described above: participants
also need to correct errors produced by the VR application. However, when considering all
phases, participants were still faster post-editing with VR than with the dictaphone.
To compare how much post-editing was performed in each task, we calculated the
human-targeted translation edit rate (HTER) (Snover et al., 2006). The HTER score compares
the raw MT output and its post-edited version and ranges from 0 to 1, where a higher
number means that more modifications were made to the raw MT output. We can see in
Table 4 that most of the participants have an average score of around 0.2, which indicates that
little post-editing was performed. Participant ES3, however, performed considerably more
post-editing in the PED task (0.52).
Participant | Task | PE time | Transcription time | Total | Revision time | Total task time | HTER
ES1 | PED  | 633  | 692  | 1325 | 238  | 1563 | 0.24
ES1 | PEVR | 623  | n/a  | 623  | 776  | 1399 | 0.23
ES2 | PED  | 822  | 604  | 1426 | 537  | 1963 | 0.24
ES2 | PEVR | 910  | n/a  | 910  | 606  | 1516 | 0.17
ES3 | PED  | 612  | 1366 | 1978 | 270  | 2248 | 0.52
ES3 | PEVR | 344  | n/a  | 344  | 475  | 819  | 0.25
ES4 | PED  | 396  | 1725 | 2121 | 654  | 2775 | 0.26
ES4 | PEVR | 1176 | n/a  | 1176 | 1007 | 2183 | 0.14
Table 4. PED vs PEVR (in seconds)
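As a rough sketch of the HTER-style computation used for Table 4, the following approximates the score as word-level edit distance between the raw MT output and its post-edited version, divided by the length of the post-edited reference. True HTER (Snover et al., 2006) also counts block shifts as single edits, so this simplification can overestimate the score; the example sentences are invented:

```python
# Shift-free approximation of HTER: word-level Levenshtein distance
# between raw MT output and post-edited reference, divided by the
# reference length. Real HTER also handles block shifts, which this
# sketch deliberately omits for brevity.
def edit_distance(a, b):
    # Standard dynamic-programming Levenshtein distance over word lists.
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, 1):
        cur = [i]
        for j, wb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (wa != wb)))  # substitution
        prev = cur
    return prev[-1]

def hter_approx(mt_output, post_edited):
    mt, pe = mt_output.split(), post_edited.split()
    return edit_distance(mt, pe) / len(pe)

# Two edits (one deletion, one insertion) over a 4-word reference -> 0.5.
score = hter_approx("el gato negro duerme", "el gato duerme tranquilo")
print(round(score, 2))
```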
TD versus PED
As mentioned above, we also considered the differences between translation and PE when
both were performed in the same manner, that is, TD vs. PED and TDVR vs. PEVR.
Table 5 compares the results for TD and PED. When looking at the translation and PE
times for phase I alone (first data column), we notice that the results are mixed: while
participants ES1 and ES2 were faster with TD, the other two participants (ES3 and ES4) were
faster with PED. Interestingly, the transcription times show the inverse pattern: participants
ES1 and ES2 had higher transcription times in the TD task, whereas ES3 and ES4 had higher
transcription times in PED. When considering the total translation/PE time (phases I and II),
the results are very close, with the more visible differences lying with ES1 and ES2, where
the former was faster with TD and the latter with PED.
In sum, when looking at the different time measures across phases, we notice no clear
trend in the results. This indicates that, in general, there were not many differences between
TD and PED.
Participant | Task | Translation/PE time | Transcription time | Total | Revision time | Total task time
ES1 | TD  | 537 | 716  | 1253 | 402 | 1655
ES1 | PED | 633 | 692  | 1325 | 238 | 1563
ES2 | TD  | 688 | 1197 | 1885 | 405 | 2290
ES2 | PED | 822 | 604  | 1426 | 537 | 1963
ES3 | TD  | 846 | 1116 | 1962 | 227 | 2189
ES3 | PED | 612 | 1366 | 1978 | 270 | 2248
ES4 | TD  | 700 | 1432 | 2132 | 454 | 2586
ES4 | PED | 396 | 1725 | 2121 | 654 | 2775
Table 5. TD vs PED (in seconds)
TDVR versus PEVR
Table 6 compares the results for TDVR and PEVR. We can see that total task times are
lower for the first three participants when post-editing with VR than when translating from
scratch; only participant ES4 was faster in the translation task. Interestingly, participant ES4
displayed close revision times in both conditions, whereas participant ES1 showed a lower
revision time for the from-scratch translation. In sum, only participant ES4 showed higher
times when post-editing than when translating from scratch, which suggests that PE with the
help of VR could generally lead to higher productivity.
Participant | Task | Translation/PE time | Revision time | Total task time
ES1 | TDVR | 796  | 656  | 1452
ES1 | PEVR | 623  | 776  | 1399
ES2 | TDVR | 1330 | 1191 | 2521
ES2 | PEVR | 910  | 606  | 1516
ES3 | TDVR | 377  | 722  | 1099
ES3 | PEVR | 344  | 475  | 819
ES4 | TDVR | 460  | 1046 | 1506
ES4 | PEVR | 1176 | 1007 | 2183
Table 6. TDVR vs PEVR (in seconds)
4.2. TX Analysis (Qualitative Phase)
In the follow-on qualitative phase of this experiment, participants responded to an online
questionnaire with sociodemographic questions (see Participants' Profile in section 3.1
above) and retrospective questions about the experiment, as well as questions providing
insight on the TX with multimodal/mobile VR-enabled TD and PE applications. The notion of
TX is inspired by the notion of user experience (UX), extensively investigated in the field
of human-computer interaction and is defined as “a translator’s perceptions of and responses
to the use or anticipated use of a product, system or service” (Zapata, 2016a).
In this section, we report on the results of our questionnaire.
Subjectively Experienced Productivity
The questionnaire included an item asking participants to indicate which of the four
translation tasks they felt made them most productive, and which made them least
productive. Three participants believed that TDVR made them most productive, when in fact
they had performed the PEVR task faster. Two participants felt that they were slowest in the
PED condition. This perception of a slower pace once MT has been introduced, contradicting
quantitative measurements that recorded increased speed, has been observed elsewhere by
Plitt and Masselot (2010) and Gaspari et al. (2014). When compared with their actual
productivity times, we note that apart from ES1 regarding TD (where he/she was indeed least
productive), the participants' perceptions diverge from the actual numbers. Table 7 below
shows perceived productivity against actual productivity, where l/L = least and m/M = most;
lower-case letters denote perceived productivity and capital letters actual productivity.
Participant | TD  | PED | TDVR | PEVR
ES1 | l/L |     | m    | M
ES2 |     | l   | m/L  | M
ES3 |     | l   | m/L  | M
ES4 | m   | L   | l/M  |
Table 7. Subjectively experienced productivity against actual productivity
Subjectively Perceived Quality
The questionnaire also included an item to ask participants to indicate which one of the four
translation tasks they felt would result in the best quality, and which one would result in the
worst quality (that is, quality of the final target text). Table 8 shows that two of the four
participants were confident enough in the PEVR process, that they expected the output texts
from that process to be of high quality.
Participant | TD   | PED   | TDVR | PEVR
ES1 |      | worst |      | best
ES2 |      | worst | best |
ES3 |      | worst |      | best
ES4 | best |       |      | worst
Table 8. Subjectively perceived quality
Challenges for VR-enabled TD and PE
A further question asked participants to elaborate on what they thought are the challenges of
VR, on the one hand, and of MT, on the other hand, to provide translators with a useful VR-
enabled TD and PE tool.
Participants found VR to be reasonably accurate, but with room for improvement,
particularly regarding “proper names and figures”. Participants preferred translation without
the ‘constraint’ of MT as they considered the suggestions artificial. Participant ES2 wrote that
“the Spanish translation sounded more like a transliteration of a technical text in English, and
this is not translation as far as I understand”. The added cognitive load when MT is added to
source and target texts may be initially off-putting for translators, and may add to the
perception of decreased speed when MT is introduced to the workflow. They recognized that
VR and MT could aid productivity, but would prefer to add MT electively. Participant ES1
wrote that “a translator or post-editor should have the option to translate from scratch by
default, and request the help from the machine only when needed”. Participant ES2 agreed:
“For quality purposes, I prefer the [VR] translation from scratch or post-editing from
[translation memories] where you have more leeway.” In the opinion of participant ES4, “MT
makes work faster but not necessarily better. It somehow guides the work towards the
paradigmatic level. I think the overall cohesion of the document is affected.”
Advantages and Disadvantages of Mobile versus PC-based TD and PE
Finally, participants were asked to elaborate on the perceived advantages and disadvantages
of using a mobile TD and PE tool (i.e., on a mobile device such as a smartphone or a tablet)
versus a laptop- or PC-based tool. Several mentioned the flexibility of a mobile device, and
participant ES2 suggested that “it may help translators to develop interpreting strategies; such
as segmentation, quick thinking, anticipation, short-term memory, etc.” Two participants
mentioned the difficulties of working in a noisy environment and of speaking translations in a
public place. Participant ES3 felt that, although PEVR felt fast to him/her, it was difficult to
edit retrospectively. He/she added that if there was “a way to make it more seamless between
the keyboard and the mic, a balance so to say, then that'd be amazing.”
5. Conclusion and Future Work
We have reported a pilot experiment on the use of a cloud-based voice recognition (VR)
application for translation dictation (TD) and post-editing (PE), using both quantitative and
qualitative methods.
In answer to our first research question, based on this small-scale pilot experiment, PE
with VR can be as or more productive than comparable approaches, with or without machine
translation (MT) and VR. When looking at quantitative data alone, our results showed that, in
general, PE with the aid of a VR system was the most efficient method, being the fastest for
three of the participants. Interestingly, PE in dictation mode (PED) was the slowest for two
participants, followed by TD and TD with VR (TDVR). In the qualitative data, however, we
observe that most participants perceived productivity to be higher in the TDVR condition, and
expressed a preference to translate/dictate from scratch and have PE added as an option.
One of the issues we identified in our experiment is the high revision/editing time in the VR
tasks: transcriptions by the VR system were far from flawless, which drove revision/editing
times up. VR applications may produce errors due to translators' lack of familiarity with TD
and insufficient training in how to speak to a VR system, especially with regard to adding
punctuation using the appropriate commands. Trainers and researchers in translation have
explicitly affirmed that training in sight translation, TD, and VR will be essential to succeed
with (mobile) voice-enabled tools and devices (Mees et al., 2013; Zapata
and Quirion, 2016). We noted also that some foreign-language words (e.g. Russian names) in
the source texts caused a few misrecognitions in Spanish VR. Moreover, we noticed that some
participants would often wait until the software had transcribed a sentence or chunk of a
sentence onto the word processor page to continue speaking, which tends to confuse the
system (as opposed to when the dictation is continuous). Lastly, if the user pauses for several
seconds, the VR system “stops listening” and disconnects, which also causes both the system
and the user to lose the flow of the dictation.
Another point to highlight is that the participants' typing skills may considerably affect
translation times: if our task time measures excluded the transcription time in TD and PED,
the whole productivity picture would change. Considering this and the issues described in the
previous paragraph, the ideal scenario would be one in which translators do not need to
transcribe their dictation, either in TD or PE. Instead, they would have a VR system with
human-like transcription capabilities, keeping dictation, transcription, and editing/revision
times (as well as recognition errors) to a minimum.
In answer to our second research question, participants’ TX suggests that combining
MT and VR is indeed feasible for translation projects, with some caveats. When asked about
their experience with the tasks, our participants seem to have preferred translation without the
‘constraint’ of MT as they considered the suggestions artificial, though the quantitative results
show that the PE task was more efficient than that of translation from scratch. The results of
this small-scale experiment suggest that PE with VR (PEVR) may be a usable way to add MT
to a translation workflow, and is worth testing at a larger scale.
For future work, we intend to carry out experiments with more participants and
language pairs. Further experimentation will include input logging, as well as eye-tracking
technologies to collect empirical data on cognitive effort when using VR for TD and PE. We
also seek to evaluate the impact of training translators in TD and VR over a period of time
before performing TDVR and PEVR tasks. Also, we will include objective measures of
quality (with the participation of expert evaluators) to compare it with the participants’
perceived quality of the target texts. Another avenue for future work is to investigate a
collaborative scenario in which translators/post-editors collaborate with transcriptionists
and/or revisers who would take part in the different phases of the experiment. This list of
ideas for future work is of course non-exhaustive; the possibilities seem endless.
The unprecedented robustness of VR technology and its availability on mobile devices
via the cloud opens a world of possibilities for human-aided MT and human translation
environments. By keeping human translators at the core of research, with strong consideration
of their perceptions and preferences for new technologies and applications, we can advance
towards finding the right balance in translator-computer interaction (O’Brien, 2012), towards
establishing what it is that the machine can do better than humans, and what it is that humans
can do better than the machine.
Acknowledgement
We would like to thank our anonymous participants for their time and involvement in
this pilot experiment. This work was supported by the ADAPT Centre for Digital Content
Technology, funded under the SFI Research Centres Programme (Grant 13/RC/2106) and co-
funded under the European Regional Development Fund.
References
Bolt, R. A. (1980). “Put-that-there”: Voice and gesture at the graphics interface. In Proceedings of the
SIGGRAPH’80, pages 262-270. ACM Press.
Proceedings of MT Summit XVI, Vol.2: Users and Translators Track
Nagoya, Sep. 18-22, 2017 | p. 133
Brousseau, J., Drouin, C., Foster, G., Isabelle, P., Kuhn, R., Normandin, Y., and Plamondon, P. (1995).
French speech recognition in an automatic dictation system for translators: The TransTalk project.
In Proceedings of Eurospeech’95, pages 193-196, Madrid, Spain.
Carl, M., Aizawa, A., and Yamada, M. (2016a). English-to-Japanese Translation vs. Dictation vs. Post-
editing: Comparing Translation Modes in a Multilingual Setting. In The LREC 2016 Proceedings:
Tenth International Conference on Language Resources and Evaluation, pages 4024-4031,
Portorož, Slovenia.
Carl, M., Lacruz, I., Yamada, M., and Aizawa, A. (2016b). Comparing spoken and written translation
with post-editing in the ENJA15 English to Japanese Translation Corpus. In The 22nd Annual
Meeting of the Association for Natural Language Processing (NLP2016), Sendai, Japan.
Chartered Institute of Linguists, European Commission Representation in the UK, and the Institute of
Translation and Interpreting. (2017). UK Translator Survey: Final Report. Technical Report.
Chartered Institute of Linguists (CIOL), London, UK.
Ciobanu, D. (2014). Of Dragons and Speech Recognition Wizards and Apprentices. Revista
Tradumàtica, 12: 524-538.
Ciobanu, D. (2016). Automatic Speech Recognition in the Professional Translation Process. Translation
Spaces, 5(1): 124-144.
Désilets, A., Stojanovic, M., Lapointe, J.-F., Rose, R., and Reddy, A. (2008). Evaluating Productivity
Gains of Hybrid ASR-MT Systems for Translation Dictation. In Proceedings of the International
Workshop on Spoken Language Translation, pages 158-165, Waikiki, USA.
Dragsted, B., Hansen, I. G., and Selsøe Sørensen, H. (2009). Experts Exposed. Copenhagen Studies in
Language, 38: 293-317.
Dragsted, B., Mees, I. M., and Hansen, I. G. (2011). Speaking your translation: students’ first encounter
with speech recognition technology. Translation & Interpreting, 3(1): 10-43.
Dymetman, M., Brousseau, J., Foster, G., Isabelle, P., Normandin, Y., and Plamondon, P. (1994).
Towards an Automatic Dictation System for Translators: the TransTalk Project. In Fourth
European Conference on Speech Communication and Technology, page 4, Yokohama, Japan.
Garcia-Martinez, M., Singla, K., Tammewar, A., Mesa-Lao, B., Thakur, A., Anusuya, M. A., Bangalore,
S., and Carl, M. (2014). SEECAT: ASR & Eye-tracking Enabled Computer-Assisted Translation. In
Proceedings of the 17th Annual Conference of the European Association for Machine Translation,
pages 81-88, Dubrovnik, Croatia.
Gaspari, F., Toral, A., Kumar Naskar, S., Groves, D., and Way, A. (2014). Perception vs Reality: Measuring
Machine Translation Post-Editing Productivity. In Proceedings of AMTA 2014 Workshop on Post-
editing Technology and Practice, pages 60-72, Vancouver, Canada.
Leijten, M., and Van Waes, L. (2013). Keystroke Logging in Writing Research: Using Inputlog to
Analyze and Visualize Writing Processes. Written Communication, 30(3): 358-392.
Lommel, A. R., and DePalma, D. A. (2016). Europe’s Leading Role in Machine Translation: How
Europe Is Driving the Shift to MT. Technical Report. Common Sense Advisory, Boston, USA.
Mees, I. M., Dragsted, B., Hansen, I. G., and Jakobsen, A. L. (2013). Sound effects in translation.
Target, 25(1): 140-154.
Mesa-Lao, B. (2014). Speech-Enabled Computer-Aided Translation: A Satisfaction Survey with Post-
Editor Trainees. In Workshop on Humans and Computer-assisted Translation, pages 99-103,
Gothenburg, Sweden.
Moorkens, J., and O’Brien, S. (2017). Assessing user interface needs of post-editors of machine
translation. In Human Issues in Translation Technology: The IATIS Yearbook, pages 109-130.
Taylor & Francis.
Moorkens, J., O’Brien, S., and Vreeke, J. (2016). Developing and testing Kanjingo: a mobile app for
post-editing. Tradumàtica, 14: 58-65.
O’Brien, S. (2012). Translation as human–computer interaction. Translation Spaces, 1(1): 101-122.
Oviatt, S. (2012). Multimodal Interfaces. In J. A. Jacko (Ed.), The Human-Computer Interaction
Handbook: Fundamentals, Evolving Technologies and Emerging Applications (3rd ed., pages 415-
429). Lawrence Erlbaum Associates.
Pausch, R., and Leatherby, J. H. (1991). An Empirical Study: Adding Voice Input to a Graphical Editor.
Journal of the American Voice Input/Output Society 9(2): 55-66.
Plitt, M., and Masselot, F. (2010). A productivity test of statistical machine translation post-editing in a
typical localization context. Prague Bulletin of Mathematical Linguistics, 93: 7-16.
Reddy, A., and Rose, R. C. (2010). Integration of Statistical Models for Dictation of Document
Translations in a Machine Aided Human Translation Task. IEEE Transactions on Audio, Speech
and Language Processing, 18(8): 1-11.
Rodriguez, L., Reddy, A., and Rose, R. (2012). Efficient Integration of Translation and Speech Models
in Dictation Based Machine Aided Human Translation. In Proceedings of the IEEE 2012
International Conference on Acoustics, Speech, and Signal Processing, 2: 4949-4952.
Snover, M., Dorr, B., Schwartz, R., Micciulla, L., and Makhoul, J. (2006). A Study of Translation Edit
Rate with Targeted Human Annotation. In Proceedings of Association for Machine Translation in
the Americas, pages 223-231, Cambridge, USA.
Teixeira, C. S. C. (2014). Perceived vs. measured performance in the post-editing of suggestions from
machine translation and translation memories. In Proceedings of AMTA 2014 Workshop on Post-
editing Technology and Practice, pages 45-59, Vancouver, Canada.
Torres-Hostench, O., Moorkens, J., O’Brien, S., and Vreeke, J. (2017). Testing interaction with a Mobile
MT post-editing app. Translation & Interpreting, 9(2):138-150.
Vidal, E., Casacuberta, F., Rodríguez, L., Civera, J., and Martínez Hinarejos, C. D. (2006). Computer-
assisted translation using speech recognition. IEEE Transactions on Audio, Speech and Language
Processing, 14(3): 941-951.
Zapata, J. (2012). Traduction dictée interactive : intégrer la reconnaissance vocale à l’enseignement et
à la pratique de la traduction professionnelle. M.A. thesis. University of Ottawa.
Zapata, J. (2016a). Translating On the Go? Investigating the Potential of Multimodal Mobile Devices for
Interactive Translation Dictation. Tradumàtica, 14: 66-74.
Zapata, J. (2016b). Translators in the Loop: Observing and Analyzing the Translator Experience with
Multimodal Interfaces for Interactive Translation Dictation Environment Design. PhD thesis.
University of Ottawa.
Zapata, J., and Kirkedal, A. S. (2015). Assessing the Performance of Automatic Speech Recognition
Systems When Used by Native and Non-Native Speakers of Three Major Languages in Dictation
Workflows. In Proceedings of the 20th Nordic Conference of Computational Linguistics, pages
201-210, Vilnius, Lithuania.
Zapata, J., and Quirion, J. (2016). La traduction dictée interactive et sa nécessaire intégration à la
formation des traducteurs. Babel, 62(4): 531-551.