Proceedings of the 1st Workshop on NLP for Positive Impact, pages 135–142
Bangkok, Thailand (online), August 5, 2021. ©2021 Association for Computational Linguistics
A Speech-enabled Fixed-phrase Translator for Healthcare Accessibility
Pierrette Bouillon1, Johanna Gerlach1, Jonathan Mutal1, Nikos Tsourakis1, and Hervé Spechbach2
1FTI/TIM, University of Geneva, Switzerland
2Hôpitaux Universitaires de Genève (HUG), Switzerland
{Pierrette.Bouillon, Johanna.Gerlach, Jonathan.Mutal, Nikolaos.Tsourakis}@unige.ch
Herve.Spechbach@hcuge.ch
Abstract
In this overview article we describe an application designed to enable communication between health practitioners and patients who do not share a common language, in situations where professional interpreters are not available. Built on the principle of a fixed-phrase translator, the application implements different natural language processing (NLP) technologies, such as speech recognition, neural machine translation and text-to-speech, to improve usability. Its design allows easy portability to new domains and integration of different types of output for multiple target audiences. Even though BabelDr is far from solving the problem of miscommunication between patients and doctors, it is a clear example of NLP in a real-world application designed to help minority groups to communicate in a medical context. It also gives some insights into the relevant criteria for the development of such an application.
1 Motivation
Access to healthcare is an important component of quality of life, but it is often compromised by the language barrier which prevents effective communication. In hospitals, medical staff are increasingly confronted with patients with whom they do not share a common language. Lack of clear communication can lead to increased risk for patients (Flores et al., 2003) but also discourages vulnerable groups from seeking medical assistance. When professional interpreters are not easily available, for example in emergency situations, there is a crucial need for tools to overcome the language barrier in order to provide medical care. While many generic translation solutions are available on the web, they present numerous disadvantages, including the unreliability of machine translation (Bouillon et al., 2017), the insufficient data confidentiality of cloud services or the absence of resources for minority languages. To overcome these issues, specifically designed tools based on a limited set of pre-translated sentences have been developed. These phraselators (Seligman and Dillinger, 2013) have the advantage of portability, accuracy and reliability. Although these tools have limited coverage, and do not solve all communication issues, recent studies have shown that they are generally preferred to machine translation as they are perceived as more reliable and trustworthy in these safety-critical contexts (Panayiotou et al., 2019; Turner et al., 2019).
This paper aims to provide an overview of the NLP components included in the speech-enabled phraselator called BabelDr. In Section 2 we give an overview of BabelDr usage. We then explain the artificial training data derived from the grammar to specialise the different components in Section 3. In Sections 4, 5, 6, 7 and 8 we explain BabelDr's components in detail, as well as the possible outputs available to users. We then present several usage studies with target groups in Section 9.1, report on the performance of the whole system in Section 9.2 and conclude in Section 10.
2 BabelDr
BabelDr¹ is a joint project between the Faculty of Translation and Interpreting of the University of Geneva and Geneva University Hospitals (HUG) (Bouillon et al., 2017). The aim of the project is to develop a speech to speech translation system for emergency settings which meets three criteria: reliability, data security and portability to low-resource target languages relevant for HUG. It is designed to allow French-speaking medical practitioners to carry out triage and diagnostic interviews with patients speaking Albanian, Arabic, Dari, Farsi, Spanish, Swiss French sign language and Tigrinya.

¹More information available at https://babeldr.unige.ch/
Figure 1: Overview of BabelDr usage
BabelDr is a web application designed to function on desktops and mobiles. Built on the principle of a phraselator, it relies on a limited set of pre-translated sentences, hereafter called core-sentences, collected with doctors. For improved usability and more natural interaction with the patient, it includes a speech recognition component: instead of searching for utterances in menus, medical staff can speak freely and the system will map the spoken utterances to the closest pre-translated core-sentence. This sentence is then presented for validation, in a backtranslation step, ensuring that the doctor knows exactly what is being translated for the patient. The patient can then respond by means of a pictogram-based interface. All components can be deployed on a local server with no dependency on cloud services, thus ensuring the data confidentiality that is essential for medical applications. Figure 1 illustrates the usage of BabelDr.
3 Training data and grammars
Due to confidentiality issues, training data for spoken French medical dialogues is scarce. For this reason, the first version of the system was built around a manually defined Synchronous Context Free Grammar (SCFG, Aho and Ullman, 1969), used for grammar-based speech recognition and parsing (Rayner et al., 2017). This grammar is now leveraged to generate artificial data used both for backtranslation (Section 5) and for specialising speech recognition (Section 4).

The grammar maps source variation patterns, described in a formalism similar to regular expressions, to core-sentences. Due to the repetitive nature of the content, the grammars make use of compositional sentences to make resources more compact. These sentences contain one or more variables, which are replaced by different values at system compile time. Figure 2 gives an example of a compositional utterance rule.

The current version of the grammar includes 2629 utterance rules, organised by medical domain, which expand to 10'991 core-sentences once variables are replaced by values. These core-sentences are mapped to hundreds of millions of surface sentences. Figure 3 shows an example of the aligned core-sentence–variation corpus that can be generated from the grammar.
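As a rough illustration of this compile-time expansion, a compositional rule with one variable can be expanded into core-sentences as sketched below. The template syntax, variable name and example values are invented for this sketch and do not reflect BabelDr's actual grammar formalism:

```python
from itertools import product

def expand(template, values):
    """Expand a compositional template into core-sentences by substituting
    every combination of variable values (illustrative sketch only)."""
    keys = list(values)
    return [template.format(**dict(zip(keys, combo)))
            for combin in []] if False else [
        template.format(**dict(zip(keys, combo)))
        for combo in product(*(values[k] for k in keys))
    ]

# hypothetical rule: one variable, three values -> three core-sentences
template = "do you have pain in your {bodypart}?"
values = {"bodypart": ["stomach", "head", "chest"]}
core_sentences = expand(template, values)
```

With several variables, the Cartesian product of their values explains how a couple of thousand rules can expand to over ten thousand core-sentences.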
4 Speech-to-Text
To ensure both accuracy and usability, the system uses a hybrid approach for speech recognition, combining two recognisers. The first is a grammar-based speech recogniser using GRXMLs generated from the original SCFG (see Section 3). While this is fast and accurate, since it directly yields a core-sentence, it is unable to handle utterances that are out of grammar coverage. It is therefore complemented by a large-vocabulary recogniser specialised with the monolingual artificial corpus described in Section 3. The results of the two approaches are combined based on the confidence score provided by the grammar-based recogniser: if the score is over a pre-defined threshold, this result is kept, else the system falls back on the large-vocabulary result. Their performance is evaluated in terms of WER, which is 38.9% for the GRXML grammar and 14.4% for the large-vocabulary model (see Table 2). In this case we have used the dataset of the user study described in (Bouillon et al., 2017).
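The confidence-based combination of the two recognisers can be sketched as follows. The function name and the threshold value are illustrative assumptions, not the system's actual parameters:

```python
def combine_results(grxml_result, grxml_confidence, lv_result, threshold=0.75):
    """Hybrid selection: keep the grammar-based (GRXML) result when its
    confidence score clears the threshold, otherwise fall back on the
    large-vocabulary recogniser's result.
    The 0.75 threshold is an illustrative placeholder."""
    if grxml_result is not None and grxml_confidence >= threshold:
        return grxml_result
    return lv_result
```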
Figure 2: Example of a grammar rule
Figure 3: Example of the aligned corpus generated from the grammar: core-sentences with corresponding source variations
For the GRXML recogniser we use Nuance ASR v10, and for the large-vocabulary one the Nuance Transcription Engine 4. Both can be accessed over the network through our custom API using HTTP POST requests. Recognition is file-based, which proves to work well for real-time interaction. The distributed nature of our back-end platform permits easy scaling and load balancing, so that multiple users can interact simultaneously with the recognisers. In particular, for the GRXML case, we can load and compile grammars on the fly or change the parameters of the recogniser dynamically. We can also parse any text against a specific grammar using an HTTP request.
5 Backtranslation
The backtranslation (introduced in Section 2) is an essential step in BabelDr, since it maps the speech recognition result to a core-sentence that is presented to the doctor for validation. For the GRXML recogniser, backtranslation is performed directly by the grammar. For the large-vocabulary recogniser, as the set of core-sentences is limited (see Section 3), the backtranslation task can be seen as a sentence classification task where the core-sentences are the categories, or as a translation task into a controlled language. As a resource, we use the bilingual corpus generated from the grammar as training data. Rayner et al. (2017) introduced an approach based on tf-idf indexing and dynamic programming (DP), achieving 91.8% accuracy (assuming perfect speech recognition and 1-best). Mutal et al. (2019) then applied different approaches using deep learning methods, neural machine translation (NMT) and sentence classification, achieving 93.2% accuracy on core-sentence matching for transcriptions (assuming perfect speech recognition, see Table 2), improving on the previous approach. This latter approach is currently used in BabelDr.
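As a toy sketch of the earlier retrieval view of backtranslation, an utterance can be mapped to the most similar core-sentence by tf-idf cosine similarity. This is only an illustration of the idea: it omits the dynamic programming component of Rayner et al. (2017), and the deployed system now uses neural models instead.

```python
import math
from collections import Counter

def _tokens(sentence):
    # naive tokenisation: lowercase and strip trailing punctuation
    return [w.strip("?,.!").lower() for w in sentence.split()]

def tfidf_backtranslate(utterance, core_sentences):
    """Map an utterance to the closest core-sentence by tf-idf cosine
    similarity (illustrative sketch, not the deployed approach)."""
    docs = [_tokens(s) for s in core_sentences]
    n = len(docs)
    df = Counter(w for d in docs for w in set(d))
    idf = {w: math.log(n / df[w]) + 1.0 for w in df}  # smoothed idf

    def vec(tokens):
        tf = Counter(t for t in tokens if t in idf)
        return {w: c * idf[w] for w, c in tf.items()}

    def cosine(a, b):
        dot = sum(a[w] * b.get(w, 0.0) for w in a)
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    query = vec(_tokens(utterance))
    scores = [cosine(query, vec(d)) for d in docs]
    return core_sentences[max(range(n), key=scores.__getitem__)]
```

For instance, a recognised variation such as "is your stomach painful" would be retrieved as the core-sentence "do you have pain in your stomach?" when that sentence is in the inventory.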
6 Elliptical Sentences
In dialogues, elliptical utterances are very common, since they ensure the principle of economy and usually avoid duplication (Hamza, 2019). In BabelDr, they allow doctors to question patients in a more efficient way (Tanguy et al., 2011). However, literal translation of these utterances could affect communication, as illustrated in Table 1. In BabelDr, elliptical utterances are not translated literally, but are instead mapped to the closest non-elliptical core-sentence, based on the context.
To avoid a wrong backtranslation of elliptical sentences, context-level information (the previous accepted utterance) is added to the model. Therefore, when an utterance is identified as an ellipsis, it is concatenated with the previous translated utterance before backtranslating. In the context of BabelDr, elliptical utterances are detected using a binary classifier. The model was trained using handcrafted features, such as sentence length, absence of verbs or nouns, part of speech of the first word, and identification of pronouns that refer to entities in the context (using morphological features). On an artificial ellipsis data set, the model achieves 98% accuracy on detecting elliptical sentences and 88% on backtranslating them to a core-sentence (for details, see Mutal et al., 2020).

Utterance                            Translation
do you have pain in your stomach?    ¿le duele el estómago?
in your head?                        *¿en tu cabeza?
                                     Good translation: ¿Le duele la cabeza?

Table 1: Example of a bad translation of an ellipsis. The * marks a bad translation.
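The concatenation step described above can be sketched as follows. The function names are illustrative; the actual ellipsis classifier and backtranslation model are those described in Mutal et al. (2020):

```python
def backtranslate_with_context(utterance, previous, is_ellipsis, backtranslate):
    """When the classifier flags an utterance as elliptical, concatenate it
    with the previously accepted utterance before backtranslating, so the
    backtranslation model can resolve the ellipsis from context
    (illustrative sketch)."""
    if is_ellipsis and previous:
        return backtranslate(previous + " " + utterance)
    return backtranslate(utterance)
```

For the example of Table 1, the ellipsis "in your head?" would thus be backtranslated together with "do you have pain in your stomach?", letting the model select the full core-sentence about head pain rather than translating the fragment literally.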
7 Output
After validation of the backtranslation, BabelDr presents the target language output to the patient in written and spoken form, both based on the same human translations of the core-sentences. In the following sections we first outline the translation approach and then describe how the translations are rendered for the patient, in audio (for spoken languages) or video format (for sign language).
7.1 Translation
High translation quality is essential for a medical phraselator, therefore the translations are produced by professional translators. Translating for BabelDr presents technical challenges, since language resources must be in a specific structured data format not easily accessible to translators. An online translation platform, which includes a translation memory and allows translators to efficiently handle the compositional items, was developed to facilitate the translators' task and ensure the quality and coherence of the translations (Gerlach et al., 2018).

The translations are aimed at patients with no medical knowledge and designed to be understandable by patients with a low level of literacy. Sentences were also adapted to account for cultural aspects, such as sensitive or intimate topics that are not commonly discussed, related for example to sexual habits (Halimi et al., 2020). Since the system provides translations both in written and spoken form, the translators had to choose phrasings that would function in both. A recent evaluation of the translations for two of the system's target languages (Albanian and Arabic) has shown that these translations are easy to understand, and thereby make the system more trustworthy in comparison to MT (Gerlach et al., 2021, in publication).

Ongoing developments include the extension of the system to new target languages and modalities to make the system accessible to further population groups. One addition involves translation to pictographs targeted at people with intellectual disabilities, another is translation into easy language, beginning with Simple English.
7.2 Text-to-Speech
Audio has been an important output modality for the BabelDr system, as it offers several advantages for the patients. It alleviates the burden of looking at the screen, which can prove challenging in a medical setting, e.g. due to the positioning of the physician and patient. For illiterate users in particular it is an essential component, and having a system talking in their own language can improve the user experience. While it would be possible to have a human record all the pre-translated sentences, due to the number and repetitive nature of the sentences, the time and cost involved in recording were considered too high. The option of a Text-to-Speech (TTS) system was therefore adopted from the beginning of the project in order to announce the translated questions of the physician. State-of-the-art systems like Nuance Vocalizer are now part of our content creation pipeline for crafting the prompts.

Figure 4: Doctor and patient interfaces

Task               Model             Metric    Result
Speech to Text     GRXML             WER       38.9%
                   Large Vocabulary  WER       14.4%
Back Translation   NMT               Accuracy  93.2%
Overall (3-best)                     SER       5%

Table 2: Performance by component and overall

Systems of this kind, however, lack support for low-resource languages that the BabelDr system also targets. For this reason, we have investigated the option of building our own TTS for those languages from scratch. In a previous study, positive feedback in terms of comprehensibility was
received (Tsourakis et al., 2020) after building a synthetic female voice for the Albanian language based on Tacotron 2, a neural network architecture for speech synthesis directly from text (Shen et al., 2017). Among the target languages supported by BabelDr, Tigrinya is one for which no public TTS is available.

For this reason, a female voice talent was recruited to record all the prompts that were subsequently used in the online system. This allowed us to create a corpus with 18 hours of speech, which we exploited to create the Tigrinya synthesized voice. The training process is similar to the one described in (Tsourakis et al., 2020). As new content is constantly added to the system, new recordings of the translations are requested. This time we first generate the output with the TTS and ask the voice talent to listen to the prompts. If the result is acceptable the TTS version is kept; otherwise, a human recording is necessary. In a set of 2150 prompts the human had to record 573 files (26.7%).
7.3 French Sign Language
Establishing effective and reliable communication between a doctor and a deaf patient is a complicated task. The scarcity of professional interpreters and the lack of awareness of medical staff of deaf culture severely impede communication. To create sign language output for our fixed-phrase translator, we have investigated two different approaches: recorded human signers and an avatar (using JASigning, Glauert and Elliott, 2011). An evaluation carried out with the deaf community showed that the recorded human signers are superior in terms of understandability and acceptability, but it was found that the avatar could also be useful in this context (Bouillon et al., 2021, in print). The videos were recorded by a sign language interpreter in collaboration with a deaf nurse, and are freely accessible in the online system, providing a human translation reference in sign language for medical questions. These resources present opportunities to evaluate what affects the communication task with deaf people in this specialised context.
8 Patient response interface
The original BabelDr system was limited to yes-no questions or questions where the patient could respond non-verbally, for example by pointing at a body part. This restrictive approach was problematic both for doctors, who are used to asking open questions, and for patients, who had little means to actively contribute to the direction of the dialogue. To build a bidirectional version that would allow more complex responses from the patient, we considered different options. Building a system that would allow patients to respond with speech presents numerous difficulties. No speech recognisers exist for many of the minority languages targeted by our system, and few or no resources such as speech corpora are available to build such systems. A text interface, as found in traditional phraselators, while easier to implement, would not be accessible to patients with low literacy. Additionally, in the context of a fixed-phrase translator, some user training is necessary to become familiar with system coverage, which is not possible for patients who arrive at an emergency service. For these reasons, we chose to add a simple pictograph-based response interface, shown in Figure 4. Each core-sentence is linked to a set of corresponding response pictographs among which the patient can select their response. Evaluation of these pictographs in terms of understandability and acceptability by patients of different educational and cultural backgrounds is ongoing (Norré et al., 2020). A task-based evaluation showed that all patients preferred the bidirectional version, since they could explain their symptoms more efficiently.
9 Evaluation
9.1 Task based
A translation system for the healthcare domain should be evaluated on the task it is designed to assist, which in the case of BabelDr is the diagnostic interview. To this end, we carried out several usage studies. In a preliminary study, we asked four medical students and five doctors to diagnose two standardised Arabic-speaking patients, using BabelDr and Google Translate (GT). Results showed that in comparison to the generic machine translation tool, BabelDr provides higher-quality translations and led to a higher number of correct diagnoses (8/9 for BabelDr against 5/9 for GT), in particular with medical students (Bouillon et al., 2017). A subsequent crossover study, where 12 French-speaking doctors were asked to diagnose two Arabic-speaking standardised patients using BabelDr, confirmed that the application allows doctors to reach accurate and reliable diagnoses (24/24 correct). Participating medical professionals agreed that BabelDr could be used in their everyday medical practice (Spechbach et al., 2019).

The system is currently in use at the HUG outpatient emergency unit, and a user satisfaction study is ongoing to collect patients' and doctors' feedback on system usage in real emergency settings by means of questionnaires (Janakiram, et al., 2020). The study includes only patients with no understanding of French and no common language with the doctor. Overall, 90% of the 30 patients included so far reported a positive level of satisfaction, as did 87% of the doctors.
9.2 System performance
To evaluate the performance of the current version of the complete system, we have used the spoken data set collected during the usage study described above (Spechbach et al., 2019). Since the system relies on human pre-translation, it is sufficient to evaluate the output in terms of backtranslation, as a correct core-sentence will result in a correct translation for the patient. We measured the performance using sentence error rate (SER), which is defined as the percentage of core-sentences that are not identical to the annotated correct core-sentences. Since the system interface presents a selection of core-sentences to the doctor, for this evaluation we considered 3-best backtranslation results, including the GRXML result when it was above the confidence threshold and two or three backtranslations of large-vocabulary recogniser results. With this configuration, the system achieved 5% SER on this data set.
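The n-best SER computation described above can be sketched as follows. This is a minimal sketch: it assumes candidate lists have already been assembled, and omits the confidence-based inclusion of the GRXML result:

```python
def sentence_error_rate(nbest_lists, references):
    """SER with n-best matching: an utterance counts as an error only if the
    annotated correct core-sentence appears in none of the candidates
    presented to the doctor (illustrative sketch)."""
    assert len(nbest_lists) == len(references)
    errors = sum(ref not in nbest
                 for nbest, ref in zip(nbest_lists, references))
    return 100.0 * errors / len(references)
```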
10 Conclusion
Healthcare translation is needed to facilitate engagement with people of diverse language, cultural, and literacy backgrounds. The development of culturally effective and patient-oriented translation tools has become increasingly urgent. Although BabelDr is far from solving the problem of miscommunication, it is an example of a concrete application of natural language processing to help minority groups communicate in a medical context.

The developed tool, resources and evaluations are a first step toward accessible healthcare apps. This research is essential to define criteria which can be used in the development and evaluation of new medical interpreting technologies, with a view to enhancing usability among patients from refugee, migrant, or other socioeconomically disadvantaged populations.
Acknowledgements
This project was supported by the "Fondation Privée des Hôpitaux Universitaires de Genève". We would also like to thank Nuance Inc for generously making their software available to us for research purposes.
References
A.V. Aho and J.D. Ullman. 1969. Syntax directed translations and the pushdown assembler. Journal of Computer and System Sciences, 3(1):37–56.

Pierrette Bouillon, Bastien David, Irene Strasly, and Hervé Spechbach. 2021. A speech translation system for medical dialogue in sign language - questionnaire on user perspective of videos and the use of avatar technology. In Proceedings of the 3rd Swiss Conference on Barrier-free Communication (BfC 2020), Winterthur, Switzerland.

Pierrette Bouillon, Johanna Gerlach, Hervé Spechbach, Nikos Tsourakis, and Ismahene S. Halimi Mallem. 2017. BabelDr vs Google Translate: a user study at Geneva University Hospitals (HUG). In 20th Annual Conference of the European Association for Machine Translation (EAMT), Prague, Czech Republic. ID: unige:94511.

Glenn Flores, M. Barton Laws, Sandra J. Mayo, Barry Zuckerman, Milagros Abreu, Leonardo Medina, and Eric J. Hardt. 2003. Errors in medical interpretation and their potential clinical consequences in pediatric encounters. Pediatrics, 111(1):6–14.

Johanna Gerlach, Pierrette Bouillon, Rovena Troqe, Sonia Halimi, and Hervé Spechbach. 2021. Patient acceptance of translation technology for medical dialogues in emergency situations. In Translation in Times of Cascading Crisis. Bloomsbury Academic.

Johanna Gerlach, Hervé Spechbach, and Pierrette Bouillon. 2018. Creating an online translation platform to build target language resources for a medical phraselator. In Proceedings of the 40th edition of Translating and the Computer Conference (TC40), pages 60–65. AsLing, The International Association for Advancement in Language Technology, Geneva. ID: unige:111776.

John Glauert and Ralph Elliott. 2011. Extending the SiGML notation: a progress report. In Second International Workshop on Sign Language Translation and Avatar Technology (SLTAT), Dundee, Scotland.

Sonia Halimi, Razieh Azari, Pierrette Bouillon, and Hervé Spechbach. 2020. Pee or urinate? A corpus-based analysis of medical communication for context-specific responses. In Corpus Exploration of Lexis and Genres in Translation. Routledge, Taylor & Francis Group.

Anissa Hamza. 2019. La détection et la traduction automatiques de l'ellipse : enjeux théoriques et pratiques. Ph.D. thesis, Université de Strasbourg, Strasbourg.

Antony A. Janakiram, Johanna Gerlach, Alyssa Vuadens-Lehmann, Pierrette Bouillon, and Hervé Spechbach. 2020. User satisfaction with a speech-enabled translator in emergency settings. In Digital Personalized Health and Medicine, pages 1421–1422. IOS. ID: unige:139233.

Jonathan Mutal, Pierrette Bouillon, Johanna Gerlach, Paula Estrella, and Hervé Spechbach. 2019. Monolingual backtranslation in a medical speech translation system for diagnostic interviews - a NMT approach. In Proceedings of Machine Translation Summit XVII Volume 2: Translator, Project and User Tracks, pages 196–203, Dublin, Ireland. European Association for Machine Translation.

Jonathan Mutal, Johanna Gerlach, Pierrette Bouillon, and Hervé Spechbach. 2020. Ellipsis translation for a medical speech to speech translation system. In Proceedings of the 22nd Annual Conference of the European Association for Machine Translation, pages 281–290, Lisboa, Portugal. European Association for Machine Translation.

Magali Norré, Pierrette Bouillon, Johanna Gerlach, and Hervé Spechbach. 2020. Évaluation de la compréhension de pictogrammes Arasaac et Sclera pour améliorer l'accessibilité du système de traduction médicale BabelDr. In Handicap 2020 : technologies pour l'autonomie et l'inclusion, pages 179–182. 11e conférence de l'IFRATH sur les technologies d'assistance. ID: unige:144565.

Anita Panayiotou, Anastasia Gardner, Sue Williams, Emiliano Zucchi, Monita Mascitti-Meuter, Anita MY Goh, Emily You, Terence WH Chong, Dina Logiudice, Xiaoping Lin, Betty Haralambous, and Frances Batchelor. 2019. Language translation apps in health care settings: expert opinion. JMIR Mhealth Uhealth, 7(4):e11316.

Manny Rayner, Nikos Tsourakis, and Johanna Gerlach. 2017. Lightweight spoken utterance classification with CFG, tf-idf and dynamic programming. In N. Camelin, Y. Estève, and C. Martín-Vide (eds), Statistical Language and Speech Processing, SLSP 2017.

Mark Seligman and Mike Dillinger. 2013. Automatic speech translation for healthcare: some internet and interface aspects. In Proceedings of the 10th International Conference on Terminology and Artificial Intelligence (TIA-13), Paris, France.

Jonathan Shen, Ruoming Pang, Ron J. Weiss, Mike Schuster, Navdeep Jaitly, Zongheng Yang, Zhifeng Chen, Yu Zhang, Yuxuan Wang, R. J. Skerry-Ryan, Rif A. Saurous, Yannis Agiomyrgiannakis, and Yonghui Wu. 2017. Natural TTS synthesis by conditioning WaveNet on mel spectrogram predictions. CoRR, abs/1712.05884.

Hervé Spechbach, Johanna Gerlach, Sanae Mazouri Karker, Nikos Tsourakis, Christophe Combescure, and Pierrette Bouillon. 2019. A speech-enabled fixed-phrase translator for emergency settings: crossover study. JMIR Med Inform, 7(2):e13167.

Ludovic Tanguy, Cécile Fabre, Lydia-Mai Ho-Dac, and Josette Rebeyrolle. 2011. Caractérisation des échanges entre patients et médecins : approche outillée d'un corpus de consultations médicales. Corpus, 10|2011, pages 137–154.

Nikos Tsourakis, Rovena Troqe, Johanna Gerlach, Pierrette Bouillon, and Hervé Spechbach. 2020. An Albanian text-to-speech system for the BabelDr medical speech translator. In Digital Personalized Health and Medicine - Proceedings of MIE 2020, Medical Informatics Europe, Geneva, Switzerland, April 28 - May 1, 2020, volume 270 of Studies in Health Technology and Informatics, pages 527–531. IOS Press.

Anne M. Turner, Yong K. Choi, Kristin Dew, Ming-Tse Tsai, Alyssa L. Bosold, Shuyang Wu, Donahue Smith, and Hendrika Meischke. 2019. Evaluating the usefulness of translation technologies for emergency response communication: a scenario-based study. JMIR Public Health Surveill, 5(1):e11171.
... Pour cette étude, nous avons sélectionné 20 phrases extraites de l'échelle de tri des urgences hospitalières et provenant du système BabelDr [15]. Nous n'avons retenu que des phrases dont les traductions automatiques en pictogrammes Arasaac étaient correctes, afin d'éviter la post-édition. ...
Conference Paper
Full-text available
Cette étude a pour but de déterminer les facteurs qui influencent la compréhensibilité par des personnes avec une Déficience Intellectuelle (DI) d'interactions médicales traduites automatiquement en pictogrammes via deux systèmes : Text-to-Picto et PictoDr. Nous présentons une méthodologie d'évaluation originale. Les personnes avec une DI avaient pour tâches de dire ce qu'ils comprenaient des phrases et des pictogrammes évalués. Nous montrons que plusieurs facteurs influencent la compréhensibilité des - phrases traduites en - pictogrammes. Bien que ces personnes aient souvent des difficultés à communiquer avec un médecin, notre étude montre aussi que recourir uniquement à des phrases en pictogrammes n'est pas une solution pour toutes les personnes avec une DI. Les pictogrammes restent toutefois un moyen utile pour améliorer la communication, notamment dans une approche multimodale.
... Speech-to-text Translation (ST) is a technology that converts speech input in one language into text output in another language [1,2,3,4]. This technology has numerous real-world applications, such as simultaneous translation for international conferences [4,5,6], enhancing accessibility [7,8] and communication across language barriers [9,10]. Additionally, ST can be integrated with Text-to-Speech Synthesis (TTS) [11,12,13] to form a core component of cascaded Speech-to-Speech Translation systems [14,15], which are particularly useful in video dubbing applications [16,17]. ...
Preprint
Full-text available
End-to-end speech translation (ST), which translates source language speech directly into target language text, has garnered significant attention in recent years. Many ST applications require strict length control to ensure that the translation duration matches the length of the source audio, including both speech and pause segments. Previous methods often controlled the number of words or characters generated by the Machine Translation model to approximate the source sentence's length without considering the isochrony of pauses and speech segments, as duration can vary between languages. To address this, we present improvements to the duration alignment component of our sequence-to-sequence ST model. Our method controls translation length by predicting the duration of speech and pauses in conjunction with the translation process. This is achieved by providing timing information to the decoder, ensuring it tracks the remaining duration for speech and pauses while generating the translation. The evaluation on the Zh-En test set of CoVoST 2, demonstrates that the proposed Isochrony-Controlled ST achieves 0.92 speech overlap and 8.9 BLEU, which has only a 1.4 BLEU drop compared to the ST baseline.
... Lastly, the literature also describes that these technologies enable early diagnoses by detecting biomarkers in voice, potentially making them a tool to support clinical decision-making. (8,(11)(12)(13)(14)(15) Although the literature describes various advantages of applying these technologies in the healthcare area (enhancing efficiency, accuracy, reducing costs and overall patient care within the healthcare sector) (8,16), it is essential to understand the perceptions of healthcare professionals about implementing them, not only to do it successfully but also to make these systems truly useful and improve the quality of care. (17) According to Table 1, for healthcare providers, SRTs save significant money on administrative costs, boost productivity, lower errors, and shorten editing times-all of which save professionals' time. ...
Preprint
Full-text available
BACKGROUND: The healthcare system faces challenges regarding responsiveness, quality, and safety, as costs and administrative burdens rise. Speech recognition technologies (SRT) have emerged as a solution to these problems, as speech can convey detailed information, potentially improving patient engagement and clinical priorities. However, understanding the perspectives of key users is critical for successfully implementing this technological solution.
OBJECTIVE: This study aimed to explore healthcare professionals' perspectives in Portugal regarding the significance of speech recognition technologies in enhancing healthcare.
METHODS: The study used a qualitative approach with a semi-structured interview survey, recruiting eleven participants from different regions of Portugal and various healthcare areas. Data was collected from November 2023 to February 2024, and thematic analysis was conducted using MAXQDA software, allowing participants to express their experiences and opinions.
RESULTS: The interviews revealed four themes: applications, advantages, disadvantages, and implementation recommendations. 73% of participants had never used these technologies in clinical practice. Most potential applications involve report generation and medical procedures, with surgery being the most common. 82% believed these technologies would increase time and quality of care, while 55% feared they would increase costs.
CONCLUSIONS: This study analyzes healthcare professionals' perspectives on speech recognition technologies in Portugal. It offers practical recommendations to stakeholders who wish to incorporate these technologies into the healthcare system. We suggest conducting more investigation and implementing the changes gradually, considering the unique opportunities and challenges in Portuguese healthcare.
Chapter
“Language Technologies (LTs) study and develop the means by which computer programs or data processing devices can analyze, produce, modify or respond to texts and human speech” (EU Host Paper 2019: 1). It is a broad field that includes various natural language processing (NLP) methods such as machine translation (MT) and speech technologies, as well as multilingual content management. LTs and computational linguistics have been steadily gaining popularity over the past 25 years, becoming both an exciting area of scientific research and practical technology that is increasingly being incorporated into consumer products (Hirschberg/Manning 2015).
Chapter
Digital health translation is an important application of machine translation and multilingual technologies, and there is a growing need for accessibility in digital health translation design for disadvantaged communities. This book addresses that need by highlighting state-of-the-art research on the design and evaluation of assistive translation tools, along with systems to facilitate cross-cultural and cross-lingual communications in health and medical settings. Using case studies as examples, the principles of designing assistive health communication tools are illustrated. These are (1) detectability of errors to boost user confidence by health professionals; (2) customizability for health and medical domains; (3) inclusivity of translation modalities to serve people with disabilities; and (4) equality of accessibility standards for localised multilingual websites of health contents. This book will appeal to readers from natural language processing, computer science, linguistics, translation studies, public health, media, and communication studies. This title is available as open access on Cambridge Core.
Conference Paper
Full-text available
Communication between physicians and patients can lead to misunderstandings, especially for disabled people. An automatic system that translates natural language into a pictographic language is one solution that could help overcome this issue. In this preliminary study, we present the French version of a translation system using the Arasaac pictographs and investigate the strategies used by speech therapists to translate into pictographs. We also evaluate the medical coverage of this tool for translating physician questions and patient instructions.
Article
Full-text available
In medical emergency situations, the language barrier is often a problem for healthcare quality. To address this situation, we developed BabelDr, an innovative and reliable fixed-phrase speech-enabled translator specialised for medical language. The majority of participants (>85%) reported a positive level of satisfaction when using BabelDr.
Article
Full-text available
Background: In the context of the current refugee crisis, emergency services often have to deal with patients who have no language in common with the staff. As interpreters are not always available, especially in emergency settings, medical personnel rely on alternative solutions such as machine translation, which raises reliability and data confidentiality issues, or medical fixed-phrase translators, which sometimes lack usability. A collaboration between Geneva University Hospitals and Geneva University led to the development of BabelDr, a new type of speech-enabled fixed-phrase translator. Similar to other fixed-phrase translators (such as Medibabble or UniversalDoctor), it relies on a predefined list of pretranslated sentences, but instead of searching for sentences in this list, doctors can freely ask questions. Objective: This study aimed to assess if a translation tool, such as BabelDr, can be used by doctors to perform diagnostic interviews under emergency conditions and to reach a correct diagnosis. In addition, we aimed to observe how doctors interact with the system using text and speech and to investigate if speech is a useful modality in this context. Methods: We conducted a crossover study in December 2017 at Geneva University Hospitals with 12 French-speaking doctors (6 doctors working at the outpatient emergency service and 6 general practitioners who also regularly work in this service). They were asked to use the BabelDr tool to diagnose two standardized Arabic-speaking patients (one male and one female). The patients received an a priori list of symptoms for the condition they presented with and were instructed to provide a negative or noncommittal answer for all other symptoms during the diagnostic interview. The male patient was standardized for nephritic colic and the female, for cystitis. Doctors used BabelDr as the only means of communication with the patient and were asked to make their diagnosis at the end of the dialogue.
The doctors also completed a satisfaction questionnaire. Results: All doctors were able to reach the correct diagnosis based on the information collected using BabelDr. They all agreed that the system helped them reach a conclusion, even if one-half felt constrained by the tool and some considered that they could not ask enough questions to reach a diagnosis. Overall, participants used more speech than text, thus confirming that speech is an important functionality in this type of tool. There was a negative association (P=.02) between the percentage of successful speech interactions (spoken sentences sent for translation) and the number of translated text items, showing that the doctors used more text when they had no success with speech. Conclusions: In emergency settings, when no interpreter is available, speech-enabled fixed-phrase translators can be a good alternative to reliably collect information from the patient.
Conference Paper
Full-text available
We describe a simple spoken utterance classification method suitable for data-sparse domains which can be approximately described by CFG grammars. The central idea is to perform robust matching of CFG rules against output from a large-vocabulary recogniser, using a dynamic programming method which optimises the tf-idf score of the matched grammar string. We present results of experiments carried out on a substantial CFG-based medical speech translator and the publicly available Spoken CALL Shared Task. Robust utterance classification using the tf-idf method strongly outperforms plain CFG-based recognition for both domains. When comparing with Naive Bayes classifiers trained on data sampled from the CFG grammars, the tf-idf/dynamic programming method is much better on the complex speech translation domain, but worse on the simple Spoken CALL Shared Task domain.
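The tf-idf matching idea can be illustrated with a simplified bag-of-words variant. This sketch is hypothetical: the function names and scoring are assumptions, and the actual system matches CFG rules against recognizer output with a dynamic programming alignment rather than the whole-sentence set intersection used here.

```python
# Simplified sketch of tf-idf utterance matching: score each grammar
# sentence against the recognizer output by summing the idf weights of
# shared words, then return the best-scoring sentence. Rare words thus
# count more than frequent ones when resolving noisy recognition.
import math
from collections import Counter

def build_idf(sentences):
    """Inverse document frequency over the grammar's sentence set."""
    n = len(sentences)
    df = Counter(w for s in sentences for w in set(s.split()))
    return {w: math.log(n / df[w]) for w in df}

def best_match(recognised, grammar_sentences):
    """Pick the grammar sentence sharing the most idf weight with the input."""
    idf = build_idf(grammar_sentences)
    rec = set(recognised.split())
    def score(s):
        return sum(idf[w] for w in set(s.split()) if w in rec)
    return max(grammar_sentences, key=score)

grammar = ["do you have a fever",
           "do you have chest pain",
           "where is the pain"]
match = best_match("uh you have chest pain maybe", grammar)
```

Because "chest" occurs in only one grammar sentence, its high idf weight pulls the noisy input toward the intended question even though several words are shared with other candidates.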
Conference Paper
Full-text available
We describe Converser for Healthcare, Version 4.0, a real-time, multi-modal, broad-coverage, highly interactive translation system. Version 3.0 was successfully tested in three departments of a large hospital complex belonging to a major US healthcare organization. Based on lessons learned, some implications of online use are discussed, along with selected interface issues.
Article
Full-text available
About 19 million people in the United States are limited in English proficiency, but little is known about the frequency and potential clinical consequences of errors in medical interpretation. To determine the frequency, categories, and potential clinical consequences of errors in medical interpretation. During a 7-month period, we audiotaped and transcribed pediatric encounters in a hospital outpatient clinic in which a Spanish interpreter was used. For each transcript, we categorized each error in medical interpretation and determined whether errors had a potential clinical consequence. Thirteen encounters yielded 474 pages of transcripts. Professional hospital interpreters were present for 6 encounters; ad hoc interpreters included nurses, social workers, and an 11-year-old sibling. Three hundred ninety-six interpreter errors were noted, with a mean of 31 per encounter. The most common error type was omission (52%), followed by false fluency (16%), substitution (13%), editorialization (10%), and addition (8%). Sixty-three percent of all errors had potential clinical consequences, with a mean of 19 per encounter. Errors committed by ad hoc interpreters were significantly more likely to be errors of potential clinical consequence than those committed by hospital interpreters (77% vs 53%). Errors of clinical consequence included: 1) omitting questions about drug allergies; 2) omitting instructions on the dose, frequency, and duration of antibiotics and rehydration fluids; 3) adding that hydrocortisone cream must be applied to the entire body, instead of only to facial rash; 4) instructing a mother not to answer personal questions; 5) omitting that a child was already swabbed for a stool culture; and 6) instructing a mother to put amoxicillin in both ears for treatment of otitis media. Errors in medical interpretation are common, averaging 31 per clinical encounter, and omissions are the most frequent type. 
Most errors have potential clinical consequences, and those committed by ad hoc interpreters are significantly more likely to have potential clinical consequences than those committed by hospital interpreters. Because errors by ad hoc interpreters are more likely to have potential clinical consequences, third-party reimbursement for trained interpreter services should be considered for patients with limited English proficiency.
Chapter
Translating and interpreting in crises is emotionally and cognitively demanding, with crisis communication in intercultural and multilingual disaster settings relying on a multitude of cross-cultural mediators and ever-emerging new technologies. This volume explores the challenges and demands involved in translating crises and the ways in which people, technologies and organisations look for effective, impactful solutions to the communicative problems. Problematising the major issues, but also providing solutions and recommendations, chapters reflect on and evaluate the role of translation and interpreting in crisis settings. Covering a diverse range of situations from across the globe, such as health emergencies, severe weather events, earthquakes, terrorist attacks, conflicts, and mass migration, this volume analyses practices and investigates the effectiveness of current approaches and communication strategies. The book considers perspectives, from interpreting specialists, educators, emergency doctors, healthcare professionals, psychologists, and members of key NGOs, to reflect the complex and multifaceted nature of crisis communication. Placing an emphasis on lessons learnt and innovative solutions, Translating Crises points the way towards more effective multilingual emergency communication in future crises.
Conference Paper
In diagnostic interviews, elliptical utterances allow doctors to question patients in a more efficient and economical way. However, literal translation of such incomplete utterances is rarely possible without affecting communication. Previous studies have focused on automatic ellipsis detection and resolution, but only a few specifically address the problem of automatic translation of ellipsis. In this work, we evaluate four different approaches to translating ellipsis in medical dialogues in the context of the speech-to-speech translation system BabelDr. We also investigate the impact of training data, using an under-sampling method and data with elliptical utterances in context. Results show that the best model is able to translate 88% of elliptical utterances.
Article
In this paper we present work on creating and evaluating a Text-to-Speech system for the Albanian language to be used in the BabelDr medical speech translation system. Its quality was assessed by twelve native speakers who provided feedback on 60 prompts generated by the synthesizer and on 60 real human recordings across three dimensions, namely comprehensibility, naturalness and likeability. The results suggest that the newly created voice can be incorporated in the content creation pipeline of the BabelDr platform.
Thesis
This thesis deals with the automatic processing of the elliptical phenomenon. At the crossroads of several disciplines (theoretical linguistics, corpus linguistics, computational linguistics, and translation studies), it takes an experimental approach in pursuit of two main objectives: first, to verify whether ellipsis can be detected automatically in English, and then to explore procedures that facilitate its machine translation from English into French. Automatic detection relies on morphosyntactic analyses, which appear sufficient for detecting certain categories of ellipsis, since by decomposing the phenomenon they make it possible to identify it among others. A parallel, multi-genre corpus, collected and designed to address the research hypotheses, is used. To develop detection patterns and exploit the corpus, this research uses the CoreNLP tools developed at Stanford University (USA) and highlights their limits when confronted with ellipsis. The results centre on the link established between automatic detection and automatic translation of the elliptical phenomenon, a decisive factor in understanding the translation errors generated during its automatic processing.