A Pragmatic Assessment of Google Translate for Emergency
Department Instructions
Breena R. Taira, MD, MPH(1), Vanessa Kreger, MD, MPH(1), Aristides Orue, NP(1), and Lisa C. Diamond, MD, MPH(2)
(1) Olive View-UCLA Medical Center, Sylmar, CA, USA; (2) Memorial Sloan Kettering Cancer Center, New York, NY, USA.
BACKGROUND: Because many hospitals have no mechanism for written translation, ED providers resort to the use of automated translation software, such as Google Translate (GT), for patient instructions. A recent study of discharge instructions in Spanish and Chinese suggested that the accuracy rates of GT were high.
STUDY OBJECTIVE: To perform a pragmatic assessment of GT for the written translation of commonly used ED discharge instructions in seven commonly spoken languages.
METHODS: A prospective assessment of the accuracy of GT for 20 commonly used ED discharge instruction phrases, as evaluated by a convenience sample of native speakers of seven commonly spoken languages (Spanish, Chinese, Vietnamese, Tagalog, Korean, Armenian, and Farsi). Translations were evaluated using a previously validated matrix for scoring machine translation, containing 5-point Likert scales for fluency, adequacy, meaning, and severity, in addition to a dichotomous assessment of retention of the overall meaning.
RESULTS: Twenty volunteers evaluated 400 Google-translated discharge statements. Volunteers were 50% female and spoke Spanish (5), Armenian (2), Chinese (3), Tagalog (4), Vietnamese (2), Korean (2), and Farsi (2). The overall meaning was retained for 82.5% (330/400) of the translations. Spanish had the highest accuracy rate (94%), followed by Tagalog (90%), Korean (82.5%), Chinese (81.7%), Vietnamese (77.5%), Farsi (67.5%), and Armenian (55%). Mean Likert scores (on a 5-point scale) were high for fluency (4.2), adequacy (4.4), meaning (4.3), and severity (4.3) but varied by language.
CONCLUSION: GT for discharge instructions in the ED is inconsistent between languages and should not be relied on for patient instructions.
KEY WORDS: communication barriers; translation; machine translation;
language services.
J Gen Intern Med
DOI: 10.1007/s11606-021-06666-z
© The Author(s) 2021
INTRODUCTION
Patients with limited English proficiency (LEP) have low rates of understanding of appointment type and medications,1 higher rates of medication errors,2 and unplanned return visits to the emergency department.3 The discharge process is a particularly important point for patient-provider communication. Written discharge instructions contain critical information about the patient's diagnosis, treatment plan, and follow-up.
Whereas most hospitals in the USA have access to spoken language assistance via phone interpreters, a gap exists in the capacity for written translation.4 Many electronic health records (EHRs) have pre-written patient education sheets for specific diagnoses such as "Upper Respiratory Infection" in a variety of languages, and providers can easily use these to provide written materials in the patient's preferred language. The challenge, however, is when the provider must convey patient-specific instructions such as "Come to the ophthalmology clinic at 8 am on Thursday and bring your records from the outside hospital." Frequently, there is no mechanism for requesting written translations in the acute setting. While the optimal response in this situation is to write the patient's discharge instructions in English and have the instructions verbally interpreted to the patient by a certified health care interpreter, many providers resort to the use of machine translation for efficiency. Google Translate is an increasingly popular option for written translation5,6 and, in some hospital systems, has become the go-to source of written translations, especially for patient-specific discharge instructions given to LEP patients.
In a frequently cited study, Patil and Davies found that Google Translate was only 57% accurate and concluded that it could not be trusted for the translation of medical phrases.7 This 2014 study, however, was completed prior to an improvement in the Google Translate algorithm,8 and the phrases chosen for evaluation were in British English rather than the English used in the USA, and thus may misestimate the accuracy of Google Translate for discharge instructions in US hospitals. Conversely, in an abstract published in 2010, Khanna et al. found that Google Translate was relatively accurate for patient education, but they assessed only Spanish.9
Recently, Khoong et al. studied the use of Google Translate (GT) for ED discharge instructions in Spanish and Chinese and concluded that GT had high accuracy and that GT translations can supplement, but not replace, written English instructions and should include a warning about potentially inaccurate instructions.10 That study, however, assessed only Spanish and Chinese, two of the most commonly spoken languages, and used professional translators to evaluate the translations. Because Google Translate improves its algorithms from user feedback, it would be expected to perform differently for more common languages compared with languages with fewer speakers. In addition, the understanding of a professional interpreter who is trained in the nuances of both languages may not be representative of the understanding of the average community member who presents to the ED for care.
The primary objective of this study was to perform a pragmatic assessment of the accuracy of GT for the written translation of commonly used ED discharge instructions given to patients in each of the most common languages spoken by LEP patients, as assessed by bilingual community members. The secondary objective was to compare the performance of GT between languages.
(This study was presented in part at the ACEP Research Forum in October 2019. Received September 11, 2020; accepted February 14, 2021.)
METHODS
Study Design
We selected frequently used instructions written when discharging a patient from an emergency department visit that convey critical information about the treatment or follow-up plan. We constructed a list of candidate statements reflecting those most often used in free-form written patient instructions in our ED. The candidate statements were then reviewed by a group of practicing ED clinicians (MD, NP, and RN) not involved in the study. The group was asked to comment and, based on the responses, a final set of 20 ED discharge instructions was chosen. The five most frequently spoken languages in Los Angeles County were selected for the study (Spanish, Chinese (including Mandarin and Cantonese), Tagalog (including Filipino), Vietnamese, and Korean). Armenian and Farsi are very common in our ED and were added not only because of the direct utility of the data in our setting, but also to compare the accuracy of GT for these languages of lesser diffusion. Each of the 20 discharge instruction statements was then translated using GT into all 7 of the target languages.
Subjects
Volunteer native speakers of each of the target languages were identified. Volunteers were included if they were native speakers of one of the target languages (not heritage speakers), currently fluent in English, and able to read both languages. Participants were excluded if they worked in any aspect of health care or were professional interpreters or linguists, to ensure a pragmatic assessment of these instructions by community members. IRB approval was obtained before the initiation of the research.
Measures
Basic demographics of the participants included gender, years in the USA, self-reported ability to understand English, and self-reported ability to understand the target language. In addition, we asked each volunteer to complete a 4-question acculturation scale.11 Although this scale has been validated for study participants of Hispanic origin, it has properties similar to those of validated tools for other groups.12
Outcomes
Participants received a worksheet with each of the Google-translated instruction statements in their native language and were asked to verbally explain to the research team member the meaning of each of the statements in English. The primary outcome was whether the intent of the statement was retained (yes/no). The bilingual volunteer then used the machine translation scoring rubric to evaluate each statement. Volunteers were given standardized instructions and oriented to the rubric. This rubric contains 5-point Likert scales for fluency, adequacy, meaning, and severity and is standard for rating machine translation.13 The volunteers gave their ratings on fluency, adequacy, and meaning, and the research team member (an MD or NP) chose the clinical severity based on the explanation given by the volunteer.
Analysis
Descriptive statistics (proportions with 95% confidence intervals) were used for the accuracy rate of Google Translate for simple discharge instructions (statements in which the meaning was retained / total statements), both overall and for each language. Scores for each of the rubric categories were reported as means.
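The accuracy rates and confidence intervals in Table 2 can be reproduced from the raw counts. The paper does not state which interval method was used (the reported bounds appear closest to an exact binomial interval), so the sketch below is an illustrative assumption: it computes the Wilson score interval, a common choice for proportions, which yields values near those reported.

```python
import math

def wilson_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score confidence interval for a binomial proportion (z=1.96 for 95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = p + z**2 / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return (center - margin) / denom, (center + margin) / denom

# Accurate statements / statements evaluated, taken from Table 2
counts = {
    "Spanish": (94, 100), "Armenian": (22, 40), "Chinese": (49, 60),
    "Tagalog": (54, 60), "Vietnamese": (31, 40), "Korean": (33, 40),
    "Farsi": (27, 40), "ALL": (330, 400),
}

for lang, (k, n) in counts.items():
    low, high = wilson_ci(k, n)
    print(f"{lang}: {k / n:.1%} (95% CI {low:.1%}-{high:.1%})")
```

For Spanish (94/100), this gives roughly 87.5%-97.2%, close to but not identical to the 87.4-97.7% in Table 2, which is consistent with the paper having used an exact (Clopper-Pearson) interval instead.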
RESULTS
Between March 5, 2019, and February 6, 2020, we recruited a total of twenty participants, each of whom evaluated twenty discharge instructions, for a total of 400 discharge instructions examined. There were equal numbers of male and female volunteers. They spoke Spanish (5), Armenian (2), Chinese (3), Tagalog (4), Vietnamese (2), Korean (2), and Farsi (2). Their mean years living in the USA was 23.7 (range 3-47). All self-reported that they spoke English well (4/20) or very well (16/20) and the target language very well (18/20) or well (2/20). Rates of acculturation were high (see Table 1). Mean scores for fluency, adequacy, meaning, and severity were high, ranging from 4.2 to 4.4 on a 5-point Likert scale, but varied by language (see Table 2). Overall, GT accurately conveyed the meaning of 330/400 (82.5%) instructions examined, but accuracy varied by language from 55 to 94%. Some of the translation errors reported by the volunteers made the GT translations nonsensical (see Table 3 for illustrative examples).
DISCUSSION
As the practice of using GT for medical communication becomes more widespread, it is crucial that we understand its accuracy and limitations in the medical setting. Khoong et al. studied the use of GT for ED discharge instructions in Spanish and Chinese. They had professional translators rate the translations for accuracy and potential harm. They reported 8% inaccuracies in Spanish and 19% in Chinese translations, and potential harm in 2% of Spanish discharge instructions and 8% of Chinese. The authors concluded that GT had high accuracy and that GT translations can supplement but not replace written English instructions and should include a warning about potentially inaccurate instructions.10 Our accuracy rates for these two languages as assessed by volunteers from the community (Spanish 6% inaccuracies and Chinese 18%) were almost identical to those of the professional translators. This is important information for future work in this area, as the difference between patient perception of machine translations and a professional translator's perception has been an ongoing question. While we, like Khoong et al., found the overall accuracy of GT to be better than historically reported, this did not hold true for all languages. Alarmingly, Armenian and Farsi, which are commonly spoken in our community, had accuracy rates of 55% and 67.5%, respectively.
Beyond the variability in accuracy rates, we also found several issues related to GT use that may not be appreciated by clinicians with limited knowledge of the target languages. For instance, when we first created our GT worksheets in Farsi, we found that the directionality of the written language was not accounted for by the software, i.e., that Farsi is written right to left. When we presented the Farsi GT worksheet to the initial volunteer, it had been transposed to left to right by GT and was illegible. If these were real discharge instructions, they would
Table 1 Participant Demographics

Native language: Spanish 5 (25%); Armenian 2 (10%); Chinese 3 (15%); Tagalog 4 (20%); Vietnamese 2 (10%); Korean 2 (10%); Farsi 2 (10%)
Female: 10/20 (50%)
Mean years in the USA: 23.7 years (range 3-47)
English proficiency: Very well 16/20 (80%); Well 4/20 (20%)
Target language proficiency: Very well 18/20 (90%); Well 2/20 (10%)
Acculturation scale
  Read and speak: English better than native 3 (15%); Both equally 14 (70%); Native better than English 3 (15%)
  Speak at home: More English than native language 3 (15%); Both equally 9 (45%); More native language than English 3 (15%); Only native language 5 (25%)
  Think: More English than native 7 (35%); Both equally 7 (35%); More native than English 3 (15%); Only target language 3 (15%)
  Speak with friends: Only English 2 (10%); More English than native 2 (10%); Both equally 13 (65%); More native than English 3 (15%)
Table 2 Mean Fluency, Adequacy, Meaning, and Severity on 5-Point Likert Scales, and Overall Accuracy (# Accurate Statements/# Statements Evaluated) by Language

Language (# participants) | Fluency | Adequacy | Meaning | Severity | # Accurate/# evaluated | Accuracy rate | 95% CI
Spanish (5)               | 4.8     | 4.8      | 4.8     | 4.8      | 94/100                 | 94%           | 87.4-97.7
Armenian (2)              | 3.7     | 3.6      | 3.3     | 3.4      | 22/40                  | 55%           | 38.4-70.7
Chinese (3)               | 4.1     | 4.5      | 4.1     | 4.1      | 49/60                  | 81.7%         | 69.6-90.5
Tagalog (4)               | 4.4     | 4.7      | 4.6     | 4.7      | 54/60                  | 90%           | 79.5-96.2
Vietnamese (2)            | 4.1     | 4.5      | 4.4     | 4.3      | 31/40                  | 77.5%         | 61.6-89.2
Korean (2)                | 3.8     | 4.2      | 4.2     | 4.4      | 33/40                  | 82.5%         | 67.2-92.7
Farsi (2)                 | 3.1     | 3.7      | 3.6     | 3.7      | 27/40                  | 67.5%         | 50.9-81.4
ALL                       | 4.2     | 4.4      | 4.3     | 4.3      | 330/400                | 82.5%         | 78.4-86.1
be unreadable to the patient. Furthermore, volunteers mentioned potential issues with traditional versus modern Chinese writing systems and Persian versus Afghan versus Tajiki Farsi. It is easy to imagine a well-meaning provider Google Translating instructions into one of these languages without awareness of these potential issues and potentially causing harm.
The important implication of our study is that, despite recent reports of improvement in accuracy and the suggestion that GT has a role in the clinical setting, we found that GT accuracy varies substantially by language and that GT is not yet a reliable tool in the clinical setting. Even for languages in which accuracy is high, there is still the potential for important inaccuracies and for patient harm. The best practice remains to use prewritten, professionally translated discharge instructions in the patient's native language for general information about a diagnosis when such handouts are available in the electronic health record. For patient-specific instructions, clinicians should hand the patient a copy of their discharge instructions in English and use an interpreter to have the instructions verbally interpreted to the patient. While the interpreter is on the line, use a teach-back to be sure the patient understands the information.
LIMITATIONS
Our study may overestimate accuracy rates because the participants had lived in the USA for long periods, had high levels of acculturation, and may not be representative of recent immigrants, who face the added barrier of unfamiliarity with our health system. All of our volunteers were literate in both English and the target language, and we did not formally assess health literacy; this may also lead to an overestimation of the accuracy levels that would be reported by participants with limited literacy. Similarly, we used bilingual participants, whose language abilities may not accurately represent the understanding of patients who are monolingual in a language other than English. Finally, GT uses an artificial intelligence algorithm that is always changing, and it is possible that further improvements have been made since the time of this study.
CONCLUSIONS
Accuracy rates of translations by GT for ED discharge instructions vary by language. Although the future of written translation in hospitals is likely machine translation, GT is not ready for prime-time use in the emergency department.
Corresponding Author: Breena R. Taira, MD, MPH; Olive View-
UCLA Medical Center, Sylmar, CA, USA (e-mail: btaira@ucla.edu).
Author Contribution Concept and design: BT, LD
Logistics, recruitment, data acquisition, and analysis: BT, VK, AO
Writing: BT
Editing: VK, AO, LD
Declarations:
Conflict of Interest: The authors have no conflicts of interest to report.
Open Access This article is licensed under a Creative Commons
Attribution 4.0 International License, which permits use, sharing,
adaptation, distribution and reproduction in any medium or format,
as long as you give appropriate credit to the original author(s) and the
source, provide a link to the Creative Commons licence, and indicate if
changes were made. The images or other third party material in this
article are included in the article's Creative Commons licence, unless
indicated otherwise in a credit line to the material. If material is not
included in the article's Creative Commons licence and your intended
use is not permitted by statutory regulation or exceeds the permitted
use, you will need to obtain permission directly from the copyright
holder. To view a copy of this licence, visit http://creativecommons.
org/licenses/by/4.0/.
REFERENCES
1. Karliner LS, Auerbach A, Napoles A, Schillinger D, Nickleach D, Perez-Stable EJ. Language barriers and understanding of hospital discharge instructions. Med Care 2012;50:283-9.
2. Samuels-Kalow ME, Stack AM, Porter SC. Parental language and dosing errors after discharge from the pediatric emergency department. Pediatr Emerg Care 2013;29:982-7.
3. Ngai KM, Grudzen CR, Lee R, Tong VY, Richardson LD, Fernandez A. The association between limited English proficiency and unplanned emergency department revisit within 72 hours. Ann Emerg Med 2016;68:213-21.
4. Regenstein M, Andres E. Hospital language service programs: a closer look at translation practices. J Health Care Poor Underserved 2014;25:2003-18.
5. Wade RG. Try Google Translate to overcome language barriers. BMJ 2011;343:d7217.
6. Randhawa G, Ferreyra M, Ahmed R, Ezzat O, Pottie K. Using machine translation in clinical practice. Can Fam Physician 2013;59:382-3.
7. Patil S, Davies P. Use of Google Translate in medical communication: evaluation of accuracy. BMJ 2014;349:g7392.
8. Castelvecchi D. Deep learning boosts Google Translate tool. Nature 2016.
Table 3 Selected Examples of Gross Errors in Translation

English statement: "You can take over the counter ibuprofen as needed for pain."
  Armenian: "You may take anti-tank missile as much as you need for pain."

English statement: "Your Coumadin level was too high today. Do not take any more Coumadin until your doctor reviews the results."
  Chinese: "Your soybean level was too high today. Do not take any more soybean until your doctor reviews the results."

English statement: "Do not blow your nose or put pressure on your facial fracture."
  Chinese: The character chosen for "blow" is more commonly used in relation to the wind blowing.
  Farsi: "Do not explode your nose because it could put pressure on the break in your face."
9. Khanna R, Eck M, Koenig C, Karliner L, Fang M. Accuracy of Google Translate for medical education material. J Hosp Med 2010;5.
10. Khoong EC, Steinbrook E, Brown C, Fernandez A. Assessing the use of Google Translate for Spanish and Chinese translations of emergency department discharge instructions. JAMA Intern Med 2019.
11. Ellison J, Jandorf L, Duhamel K. Assessment of the Short Acculturation Scale for Hispanics (SASH) among low-income, immigrant Hispanics. J Cancer Educ 2011;26:478-83.
12. Dela Cruz FA, Yu CH, Vindua KI. The factor structure of a short acculturation scale for Filipino Americans in an adult U.S.-born sample. J Community Psychol 2018;46:535-50.
13. Chen X, Acosta S, Barry AE. Evaluating the accuracy of Google Translate for diabetes education material. JMIR Diabetes 2016;1.
Publisher's Note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
... Phrase-based communication technologies are well placed to advance research into personalization and the lived experience of caring and living with communication difficulty. Instead of generic technology such as translation AI tools like Google Translate, which still bring risks of inaccuracy, 18 phrase-based apps can contain phrases targeted to the individual and their care context. This can include consideration of the individual's language repertoire, the carer's language repertoire, and communication needs in the specific care environment. ...
... While the accuracy of the translations provided by Google Translate have improved over time, they are still inconsistent across different languages. 18,19 This means that non-speakers cannot be sure of the accuracy of the translation output even for widely spoken languages. Greater accuracy is required before translation tools such as Google Translate can be used safely as a communication tool in aged care settings. ...
... 1 Despite the flexibility offered by Google Translate, fixed-phrase translation tools, particularly those with audiovisual features such as iTranslate, were preferred by nursing staff and older people in a health care context. 18 Like iTranslate, Listen and Talk allows a corpus of verified phrases to be developed. However, unlike other fixed-phrase translators, these can be targeted for particular contexts and individual residents. ...
Article
Full-text available
Background Rich communication between staff and residents in aged care settings is essential. Digital communication devices used to support communication in aged care settings are often not well targeted to individual needs and contexts. In this pilot study, we investigate the adaptation of a phrase-based language learning app, to support communication between carestaff and residents in a residential aged care setting in Western Sydney, Australia. Methods An interdisciplinary team of researchers and aged care professionals worked with three aged care residents for whom English was not their first language, to co-design and trial a prototype digital language resource. Insights from carestaff members into communication issues they face in their roles were documented through focus group sessions. A database of phrases was developed and then loaded into the Listen N Talk app. Carestaff trialed the resource with the residents for six weeks. Feedback from carestaff was gathered through semi-structured interviews. Results Based on responses of carestaff to an initial focus group, the language content of the prototype was focused on the context of daily care. The residents who participated in the study were long-term residents already familiar with the daily routine of the facility and staff had already established communication strategies regarding residents’ needs or preferences. Three contexts were identified in which an app of this kind could be useful to facilitate communication: in a medical emergency, as a tool to strengthen English language skills of residents and staff and in the transition to residential aged care to support the development of routines with the new resident. Conclusion This study identified three contexts in which a phrase-based app can facilitate communication with culturally and linguistically diverse residents and carestaff. 
Feedback suggested avenues for further development such as the use of more open-ended translation together with sets of personalized phrases.
... A significant portion of the U.S. population speaks languages other than English, with top languages being Spanish/Spanish Creole, Chinese, Tagalog, Vietnamese, and Arabic [1]. However, limited access to linguistically appropriate healthcare information significantly contributes to healthcare inequities [2,3]. For the approximately 25 million individuals in the United States who speak English less than "very well," linguistic barriers corelate with worse comprehension, adherence, clinical outcomes, and higher healthcare costs [3][4][5]. ...
... However, limited access to linguistically appropriate healthcare information significantly contributes to healthcare inequities [2,3]. For the approximately 25 million individuals in the United States who speak English less than "very well," linguistic barriers corelate with worse comprehension, adherence, clinical outcomes, and higher healthcare costs [3][4][5]. ...
... While AI-based translation tools offer a promising solution due to rapid turnaround time, risks include potential inaccuracies, privacy concerns, and lack of understanding of cultural nuances [2,3,6]. In some cases, AI translations can perpetuate errors, especially for languages with limited representation in training data [3,6]. ...
... Aproximación al análisis de la calidad de la traducción automática desde la perspectiva de la evaluación ... calidad significativamente menor. Podemos ilustrar esta cuestión con los resultados obtenidos por Taira et al. (2021) a partir de la traducción generada por Google Translate de veinte instrucciones de alta hospitalaria frecuentemente empleadas en el servicio de urgencias. Veinte anotadores bilingües evaluaron, entre marzo de 2019 y febrero de 2020, traducciones automáticas del inglés a siete lenguas de uso común en los servicios de urgencias de Los Ángeles, California (EE. ...
... En primer lugar, porque la inmensa mayoría de estos trabajos analizan pares de lenguas con gran cantidad de datos o high-resource language pairs (Fomicheva et al. 2018) y, muy especialmente, traducciones del o al inglés en combinación con otra lengua mayoritaria como el francés, el alemán o el chino (Ranathunga 2023: 12). Por supuesto, también han salido a la luz estudios parcial o completamente dedicados a combinaciones entrenadas con un volumen de datos mucho menor, es decir, low-resource language pairs, como sería el caso, a modo de ejemplo, del inglés-farsi e inglés-armenio comentado en la sección 2 (Taira et al. 2021) o de las traducciones literarias en la combinación inglés-catalán (Toral y Way 2018). Sin embargo, existe una sorprendente (e incluso preocupante) escasez de trabajos de investigación sobre TQA en combinaciones no anglocéntricas. ...
Article
La generalización del uso de las aplicaciones gratuitas de traducción automática basadas en redes neuronales (NMT) demanda un mayor esfuerzo por parte de la comunidad científica por evaluar su calidad. En este artículo se presenta el estado de la cuestión, así como los resultados de un análisis piloto que trata de sacar a la luz el grado de satisfacción de los potenciales usuarios de estas traducciones en función de tres variables: fluidez, corrección gramatical y usabilidad. Con este fin, se realizó un experimento en el que veinte anotadores nativos de español evaluaron mediante una escala de valoración Likert las traducciones generadas por humanos profesionales y por las aplicaciones DeepL, Google Translate y ChatGPT de tres textos checos de diversa tipología (uno técnico, uno de marketing y uno literario). Los resultados muestran que, a pesar de que las traducciones humanas son las mejores valoradas, existe un elevado grado de satisfacción por parte de los usuarios respecto a las traducciones generadas por los sistemas NMT diseñados específicamente para este fin (DeepL y Google Translate) y, muy especialmente, en términos de fluidez y usabilidad.
... Technological tools can help overcome language barriers in healthcare contexts when professional interpreters are unavailable [11,18]. These tools and applications (apps) include online dictionaries, pre-translated fixedphrase apps, and machine translation (MT), such as Google Translate (GT) [18][19][20][21][22]. A 2021 study found that GT correctly conveyed the basic meaning of discharge instructions in 82% of cases, though accuracy varied by language, with errors leading to nonsensical translations in some instances [20]. ...
... These tools and applications (apps) include online dictionaries, pre-translated fixedphrase apps, and machine translation (MT), such as Google Translate (GT) [18][19][20][21][22]. A 2021 study found that GT correctly conveyed the basic meaning of discharge instructions in 82% of cases, though accuracy varied by language, with errors leading to nonsensical translations in some instances [20]. In 2023, GT's accuracy in translating mental health content was assessed, revealing challenges with medical terms and high error rates in Arabic, Persian, and Romanian, while Turkish showed fewer errors despite being a low-resource language [21]. ...
Article
Full-text available
Introduction In many healthcare contexts globally, where the languages of care providers and service users do not match, miscommunication can lead to inaccurate diagnoses and subpar treatment outcomes. The development and use of technological tools to overcome language barriers are increasing, but usability and evaluation of these tools vary widely. Objectives This scoping review’s objectives are (i) to identify and describe the technological tools used in direct service user–provider communication to overcome language barriers in a healthcare setting, (ii) to identify how the usability of these tools was evaluated, and (iii) to identify the challenges and benefits of using such technological tools. Methods and analysis The scoping review followed the JBI methodology. Studies published between January 2019 and July 2024 were identified using a search strategy with variations of the keywords “technological tools,” “language barrier,” and “health care” in the following six databases and research platforms: PubMed, PsycArticle, Scopus, EBSCOhost, ProQuest, and Web of Science. All literature on individuals using a technological tool to overcome language barriers in a healthcare context was included and exported into the screening assistant software Rayyan. The search was limited to articles written in German or English. The literature was screened twice by three independent reviewers in a blinded fashion, and all relevant data were presented in a descriptive summary. Results Based on 16 publications, this scoping review identified 16 technological tools, categorized as fixed-phrase or machine translation apps, to overcome language barriers in a healthcare setting. Usability was assessed in 13 publications applying diverse methods, i.e., surveys, observations, and application data analysis. Technological tools hold potential as a means to address language barriers in healthcare by facilitating communication and supporting diagnostic processes. 
However, their usability is often constrained by challenges related to translation accuracy, accessibility, and learnability. Conclusion Future research and policy efforts should focus on standardizing evaluation methods and on diversifying development regionally, linguistically, and across disciplines. Rather than broadly promoting these tools, emphasis should be placed on ensuring they are reliable and efficient for their intended use to maximize their effectiveness and relevance in specific healthcare contexts. Supplementary Information The online version contains supplementary material available at 10.1186/s13690-025-01543-1.
... Ahmed Zaki Yamani (1930-2021) was the Minister of Petroleum and Mineral Resources in Saudi Arabia and represented the country in OPEC for 25 years. ...
Article
Full-text available
Machine Translation has seen rapid advancements due to progress in neural sciences and Large Language Models. While human translation remains highly competitive across many language pairs, especially those that are linguistically and culturally close like English and Spanish (Moslem et al., 2023), the situation between English and Arabic requires further exploration. This study investigates the performance of Neural Machine Translation (represented by Google Translate), Large Language Models (represented by ChatGPT), and human translation in translating English texts into Arabic across four genres: general, literary, scientific, and media. Human evaluations were used to measure translation quality based on accuracy, fluency, style, cultural fit, and terminology. The results indicate that human translation is the most accurate, especially in capturing cultural and contextual nuances. ChatGPT, when used with detailed prompts, often outperforms both Google Translate and ChatGPT with simple prompts, particularly in literary and media genres. On the contrary, Google Translate performs the worst overall, especially with scientific and general texts, due to issues like word confusion and cultural inaccuracies. These results offer practical insights for translation educators, professionals, and students on effectively integrating Machine Translation tools while appreciating the irreplaceable value of human translators.
... However, it is noteworthy that while Google Translate is very useful in our analyses, it can be inaccurate for some languages. One study showed that Google Translate had an 82.5% accuracy rate for Korean (Taira et al., 2021), indicating room for improvement. Another error arises from Japanese, with the terms "Prius" and "missile" co-occurring in many Japanese topics, which confused us. ...
Article
Full-text available
Journal of Social Media Research (JSOMER) is a multidisciplinary, blind peer-reviewed, open access, free of charge, international scientific academic journal published twice a year (June, December) focusing on the social, cultural, educational, psychological, economic, technological, and sociological dimensions of social media. JSOMER is an interdisciplinary journal with a broad scope that includes social sciences, humanities, arts, health, medicine, psychiatry, psychology, computational social sciences, artificial intelligence, and natural sciences focusing on, or related to, social media. We are pleased to publish current and innovative research articles, reviews and argumentative essays focusing on social media. Articles published in JSOMER are expected to raise issues related to social media in various fields, open discussions about these issues, and propose different methods to address these issues or solve related problems. It is also hoped that the papers published in JSOMER will provide a basis for current debates on various areas of social media and guide innovative research and practice. JSOMER welcomes a variety of theoretical paradigms and methodologies and considers this a scientific enrichment. JSOMER aims to contribute to scientific accumulation by including original and qualified studies written in accordance with academic standards, copyright, and ethical rules, and to be among the first reference sources for those doing research in the field of social media. Researchers who want to publish their works in JSOMER are required to be aware that: • the research studies they submit may take any form of quantitative, qualitative, or mixed-method research; • meta-analyses, systematic reviews, literature analyses, meta-synthesis studies, book reviews, and brief reports can be sent to JSOMER for reviewing and publication. • JSOMER is published in English only (full text).
... GT translations can supplement but not replace written English instructions and should include a warning about potentially inaccurate instructions (Taira et al., 2021). ...
Article
Full-text available
The author chose this topic because Google Translate is a tool commonly used by students, but it still has limitations in accuracy and understanding of context. Many students rely on this tool without adequate evaluation skills for the translation results and their academic quality. By understanding the existing challenges, this study can provide recommendations for students and teachers to improve the effectiveness of using Google Translate in translation learning. This study identifies and analyzes third-semester students' challenges in using Google Translate. The research method used is a qualitative approach with data collection techniques through interviews and questionnaires. The data obtained were analyzed using thematic analysis methods to find patterns and main obstacles faced by students. The results of the study show that students experience various challenges, such as translation inaccuracy, lack of understanding of context, and difficulty in translating technical and academic terms. In addition, dependence on Google Translate also has an impact on students' ability to develop translation skills independently. The contribution of this study in the field of translation is to provide insight into the limitations of Google Translate and suggest strategies to improve the effectiveness of its use in academic contexts. These findings can be the basis for developing a better curriculum for students to learn translation.
Article
Full-text available
This study uses multiple languages to investigate the emergence of geopolitical topics on X / Twitter across two different time intervals: daily and hourly. For the daily interval, we examined the emergence of topics from February 4th, 2023, to March 23rd, 2023, at random three-hour intervals, compiling the topic modeling results for each day into a time series. For the hourly interval, we considered two days of data, June 1st, 2023, and June 6th, 2023, where we tracked the growth of topics for those days. We collected our data through the X / Twitter Filtered Stream using key bigrams (two-word phrases) for various geopolitical topics for multiple languages to identify emerging geopolitical events at the global and regional levels. Lastly, we compared the trends created by tracking emerging topics over time to Google Trends data, another data source for emerging topics. At the daily level, we found that our X / Twitter-based algorithm was able to identify multiple geopolitical events at least a day before they became relevant on Google Trends, and in the case of North Korean missile launches during this period, several languages identified more missile launches than the Google Trends data. As for the hourly data, we again found several topics that emerged hours before they started appearing on Google Trends. Our analyses also found that the different languages allowed for greater diversity in topics that would not have been possible if only one language had been used.
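The bigram-based filtering the study describes can be sketched in a few lines. This is an illustrative reconstruction, not the authors' code: the tracked phrases, the tokenizer, and the matching rule are all hypothetical stand-ins for whatever the X / Twitter Filtered Stream rules actually used.

```python
import re

# Hypothetical tracked bigrams for one geopolitical topic.
TRACKED = {("missile", "launch"), ("north", "korea")}

def bigrams(text):
    """Lowercase the text and return the set of adjacent word pairs."""
    words = re.findall(r"[a-z']+", text.lower())
    return set(zip(words, words[1:]))

def matches_topic(text):
    """True if the post contains any tracked bigram."""
    return bool(bigrams(text) & TRACKED)
```

Counting matches per hour (or per day) over the stream would yield the kind of topic time series the authors compare against Google Trends.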
... For one, since lay users may have limited to no proficiency in at least one of the involved languages, they are more vulnerable to errors. Mistranslations can lead to discomfort, misunderstandings, and even life-threatening errors (Taira et al., 2021) and arrests (The Guardian, 2017). Besides, non-experts can have requirements and expectations of which little is known, and that cannot be directly informed by existing research on professionals, as shown in the context of LLMs (Szymanski et al., 2024). ...
Preprint
Full-text available
Converging societal and technical factors have transformed language technologies into user-facing applications employed across languages. Machine Translation (MT) has become a global tool, with cross-lingual services now also supported by dialogue systems powered by multilingual Large Language Models (LLMs). This accessibility has expanded MT's reach to a vast base of lay users, often with little to no expertise in the languages or the technology itself. Despite this, the understanding of MT consumed by this diverse group of users -- their needs, experiences, and interactions with these systems -- remains limited. This paper traces the shift in MT user profiles, focusing on non-expert users and how their engagement with these systems may change with LLMs. We identify three key factors -- usability, trust, and literacy -- that shape these interactions and must be addressed to align MT with user needs. By exploring these dimensions, we offer insights to guide future MT with a user-centered approach.
Article
Background The readability of most online patient educational materials (OPEMs) in orthopaedic surgery is above the American Medical Association/National Institutes of Health recommended reading level of sixth grade for both English- and Spanish-language content. The current project evaluates ChatGPT’s performance across English- and Spanish-language orthopaedic OPEMs when prompted to rewrite the material at a sixth-grade reading level. Methods We performed a cross-sectional study evaluating the readability of 57 English- and 56 Spanish-language publicly available OPEMs found by querying online in both English and Spanish for 6 common orthopaedic procedures. Five distinct, validated readability tests were used to score the OPEMs before and after ChatGPT 4.0 was prompted to rewrite the OPEMs at a sixth-grade reading level. We compared the averages of each readability test, the cumulative average reading grade level, average total word count, average number of complex words (defined as ≥3 syllables), and average number of long sentences (defined as >22 words) between original content and ChatGPT-rewritten content for both languages using paired t tests. Results The cumulative average reading grade level of original English- and Spanish-language OPEMs was 9.6 ± 2.6 and 9.5 ± 1.5, respectively. ChatGPT significantly lowered the reading grade level (improved comprehension) to 7.7 ± 1.9 (95% CI of difference, 1.68 to 2.15; p < 0.05) for English-language content and 8.3 ± 1.3 (95% CI, 1.17 to 1.45; p < 0.05) for Spanish-language content. English-language OPEMs saw a reduction of 2.0 ± 1.8 grade levels, whereas Spanish-language OPEMs saw a reduction of 1.5 ± 1.2 grade levels. Word count, use of complex words, and long sentences were also reduced significantly in both languages while still maintaining high accuracy and similarity compared with original content. 
Conclusions Our study supports the potential of artificial intelligence as a low-cost, accessible tool to assist health professionals in improving the readability of orthopaedic OPEMs in both English and Spanish. Clinical Relevance TK.
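The readability tests the study applies are formula-based; the Flesch-Kincaid grade level, one of the commonly used validated measures, is computed from word, sentence, and syllable counts. A minimal sketch follows, using a naive vowel-group syllable heuristic (production implementations use dictionaries or better heuristics); the two sample sentences are invented to show how simplifying vocabulary lowers the score.

```python
import re

def fk_grade(text):
    """Flesch-Kincaid grade level from naive word, sentence, and
    syllable counts; syllables are approximated as vowel groups."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n = max(1, len(words))
    syllables = sum(
        max(1, len(re.findall(r"[aeiouy]+", w.lower()))) for w in words
    )
    return 0.39 * (n / sentences) + 11.8 * (syllables / n) - 15.59

# Invented before/after pair illustrating a grade-level reduction.
original = "Postoperative immobilization facilitates osseous consolidation."
rewritten = "Keep the joint still so the bone can heal."
```

Scoring a document before and after a ChatGPT rewrite, then comparing the paired scores, mirrors the paired t-test design used in the study.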
Article
Full-text available
The influx of non‐European immigrants since 1965 ushered the development and use of acculturation measures in immigrant health studies. A Short Acculturation Scale for Filipino Americans (ASASFA) represents a validated, unidirectional ethnic‐specific measure used with first‐generation FAs. ASASFA's psychometric properties with adult U.S.‐born children—the second generation—remain unexplored. This study determined (a) the factor structure of ASASFA with adult U.S.‐born FAs and (b) the predictors of their acculturation scores. A secondary analysis was conducted on ASASFA data from a mental health survey of 116 U.S.‐born FAs. Exploratory factor and parallel analyses showed a two‐factor solution: language use and preference (Factor 1) and ethnic social relations (Factor 2). Ordinary least squares regression indicated gender and ethnic self‐identification predict Factor 1 scores; self‐identification solely predicts Factor 2 scores. Results demonstrate ASASFA's validity and parsimony, supporting its use in FA health studies when lengthy bidirectional acculturation measures become impractical.
Article
Full-text available
Background: Approximately 21% of the US population speaks a language other than English at home; many of these individuals cannot effectively communicate in English. Hispanic and Chinese Americans, in particular, are the two largest minority groups having low health literacy in the United States. Fortunately, machine-generated translations represent a novel tool that non-English speakers can use to receive and relay health education information when human interpreters are not available. Objective: The purpose of this study was to evaluate the accuracy of the Google Translate website when translating health information from English to Spanish and English to Chinese. Methods: The pamphlet, "You are the heart of your family…take care of it," is a health education sheet for diabetes patients that outlines six tips for behavior change. Two professional translators translated the original English sentences into Spanish and Chinese. We recruited 6 certified translators (3 Spanish and 3 Chinese) to conduct blinded evaluations of the following versions: (1) sentences translated by Google Translate, and (2) sentences translated by a professional human translator. Evaluators rated the sentences on four scales: fluency, adequacy, meaning, and severity. We performed descriptive analysis to examine differences between these two versions. Results: Cronbach's alpha values exhibited high degrees of agreement on the rating outcome of both evaluator groups: .919 for the Spanish evaluators and .972 for the Chinese evaluators. The readability of the sentences in this study ranged from 2.8 to 9.0 (mean 5.4, SD 2.7). The correlation coefficients between the grade level and translation accuracy for all sentences translated by Google were negative (eg, r=-.660 for the meaning scale), which indicates that Google provided accurate translation for simple sentences. However, the likelihood of incorrect translation increased when the original English sentences required higher grade levels to comprehend. 
The Chinese human translator provided more accurate translation compared to Google. The Spanish human translator, on the other hand, did not provide a significantly better translation compared to Google. Conclusion: Google produced a more accurate translation from English to Spanish than English to Chinese. Some sentences translated by Google from English to Chinese exhibit the potential to result in delayed patient care. We recommend continuous training and credential practice standards for professional medical translators to enhance patient safety as well as providing health education information in multiple languages.
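The negative correlation the study reports between reading grade level and translation quality is an ordinary Pearson coefficient. A self-contained sketch follows; the per-sentence data here are invented for illustration (the study's actual ratings are not reproduced in the abstract), chosen so that harder sentences receive lower mean ratings.

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical data: sentence reading grade level vs. mean 5-point
# "meaning" rating of its machine translation.
grade = [2.8, 3.5, 4.1, 5.4, 6.7, 7.9, 9.0]
rating = [4.9, 4.8, 4.6, 4.2, 3.8, 3.1, 2.7]
```

A strongly negative coefficient on data like these expresses the study's finding: the higher the grade level needed to comprehend the source sentence, the lower the rated accuracy of its machine translation.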
Article
Full-text available
Communication is the cornerstone of medicine, without which we cannot interact with our patients.1 The General Medical Council’s Good Medical Practice states that “Doctors must listen to patients, take account of their views, and respond honestly to their questions.”2 However, we still often interact with patients who do not speak the local language. In the United Kingdom most hospitals have access to translation services, but they are expensive and often cumbersome. A complex and nuanced medical, ethical, and treatment discussion with patients whose knowledge of the local language is inadequate remains challenging. Indeed, even in a native language there is an element of translation from medical to lay terminology. We recently treated a very sick child in our paediatric intensive care unit. The parents did not speak English, and there were no human translators available. Reluctantly we resorted to a web based translation tool. We were uncertain whether Google Translate was accurately translating our complex medical phrases.3 4 Fortunately our patient recovered, and a …
Article
Patients with limited English proficiency experience communication barriers to health care in English-speaking countries. Written communication improves comprehension,¹ but pretranslated standard instructions cannot address patient-specific issues (eg, medication titration). Machine translation tools, including Google Translate (GT), have potential to improve communication with these patients, but prior studies showed limited accuracy; 1 study found that GT Spanish translations of patient education materials were 60% accurate, with 4% resulting in serious error.
Article
Study objective: Language barriers are known to negatively affect many health outcomes among limited English proficiency patient populations, but little is known about the quality of care such patients receive in the emergency department (ED). This study seeks to determine whether limited English proficiency patients experience different quality of care than English-speaking patients in the ED, using unplanned revisit within 72 hours as a surrogate quality indicator. Methods: We conducted a retrospective cohort study in an urban adult ED in 2012, with a total of 41,772 patients and 56,821 ED visits. We compared 2,943 limited English proficiency patients with 38,829 English-speaking patients presenting to the ED after excluding patients with psychiatric complaints, altered mental status, and nonverbal states, and those with more than 4 ED visits in 12 months. Two main outcomes-the risk of inpatient admission from the ED and risk of unplanned ED revisit within 72 hours-were measured with odds ratios from generalized estimating equation multivariate models. Results: Limited English proficiency patients were more likely than English speakers to be admitted (32.0% versus 27.2%; odds ratio [OR]=1.20; 95% confidence interval [CI] 1.11 to 1.30). This association became nonsignificant after adjustments (OR=1.04; 95% CI 0.95 to 1.15). Included in the analysis of ED revisit within 72 hours were 32,857 patients with 45,546 ED visits; 4.2% of all patients (n=1,380) had at least 1 unplanned revisit. Limited English proficiency patients were more likely than English speakers to have an unplanned revisit (5.0% versus 4.1%; OR=1.19; 95% CI 1.02 to 1.45). This association persisted (OR=1.24; 95% CI 1.02 to 1.53) after adjustment for potential confounders, including insurance status. Conclusion: We found no difference in hospital admission rates between limited English proficiency patients and English-speaking patients. 
Yet limited English proficiency patients were 24% more likely to have an unplanned ED revisit within 72 hours, with an absolute difference of 0.9%, suggesting challenges in ED quality of care.
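The odds ratios and 95% confidence intervals quoted above follow the standard 2×2-table (Wald) computation. A sketch follows with hypothetical cell counts chosen to roughly mirror the reported 5.0% vs. 4.1% revisit rates; the study itself adjusted for confounders with generalized estimating equations, which this unadjusted calculation does not reproduce.

```python
import math

def odds_ratio_ci(a, b, c, d):
    """Unadjusted odds ratio and Wald 95% CI for a 2x2 table:
    a = exposed with event,   b = exposed without event,
    c = unexposed with event, d = unexposed without event."""
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lo = math.exp(math.log(or_) - 1.96 * se)
    hi = math.exp(math.log(or_) + 1.96 * se)
    return or_, lo, hi

# Hypothetical counts: 150/3000 LEP revisits (5.0%) vs.
# 1230/30000 English-speaker revisits (4.1%).
result = odds_ratio_ci(150, 2850, 1230, 28770)
```

When the lower CI bound stays above 1.0, as in the study's adjusted result (OR=1.24; 95% CI 1.02 to 1.53), the excess revisit risk is statistically significant.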
Article
Much of the information we have about the delivery of language services for patients with limited English proficiency (LEP) relates to interpreter services. Very little is known about hospitals' experiences responding to LEP patients' needs for written materials in their preferred languages. This study describes the translation practices of 35 hospitals with large interpreter services programs to inform guidance for the effective delivery of translation services in health care settings. We conducted in-depth telephone interviews with hospital staff members responsible for overseeing translation services at their hospitals. Translation practices varied considerably among study participants, with participants relying on a combination of interpreters serving as translators and contract translators to translate between 5 and 5,000 documents per year. This study showcases examples of hospitals with surprisingly robust translation service programs despite limited external funding. The variance in translation practices underscores a lack of guidance in this area.
Article
Safe and effective care after discharge requires parental education in the pediatric emergency department (ED). Parent-provider communication may be more difficult with parents who have limited health literacy or English-language fluency. This study examined the relationship between language and discharge comprehension regarding medication dosing. We completed a prospective observational study of the ED discharge process using a convenience sample of English- and Spanish-speaking parents of children 2 to 24 months presenting to a single tertiary care pediatric ED with fever and/or respiratory illness. A bilingual research assistant interviewed parents to ascertain their primary language and health literacy and observed the discharge process. The primary outcome was parental demonstration of an incorrect dose of acetaminophen for the weight of his or her child. A total of 259 parent-child dyads were screened. There were 210 potential discharges, and 145 (69%) of 210 completed the postdischarge interview. Forty-six parents (32%) had an acetaminophen dosing error. Spanish-speaking parents were significantly more likely to have a dosing error (odds ratio, 3.7; 95% confidence interval, 1.6-8.1), even after adjustment for language of discharge, income, and parental health literacy (adjusted odds ratio, 6.7; 95% confidence interval, 1.4-31.7). Current ED discharge communication results in a significant disparity between English- and Spanish-speaking parents' comprehension of a crucial aspect of medication safety. These differences were not explained purely by interpretation, suggesting that interventions to improve comprehension must address factors beyond language alone.
Article
Effective communication at hospital discharge is necessary for an optimal transition and to avoid adverse events. We investigated the association of a language barrier with patient understanding of discharge instructions. Spanish-speaking, Chinese-speaking, and English-speaking patients admitted to 2 urban hospitals between 2005 and 2008, comparing patient understanding of follow-up appointment type, and medication category and purpose between limited English-proficient (LEP) and English-proficient patients. Of the 308 patients, 203 were LEP. Rates of understanding were low overall for follow-up appointment type (56%) and the 3 medication outcomes (category 48%, purpose 55%, both 41%). In unadjusted analysis, LEP patients were less likely than English-proficient patients to know appointment type (50% vs. 66%; P=0.01), medication category (45% vs. 54%; P=0.05), and medication category and purpose combined (38% vs. 47%; P=0.04), but equally likely to know medication purpose alone. These results persisted in the adjusted models for medication outcomes: LEP patients had lower odds of understanding medication category (odds ratio 0.63; 95% confidence interval, 0.42-0.95); and category/purpose (odds ratio 0.59; 95% confidence interval, 0.39-0.89). Understanding of appointment type and medications after discharge was low, with LEP patients demonstrating worse understanding of medications. System interventions to improve communication at hospital discharge for all patients, and especially those with LEP, are needed.