RESEARCH ARTICLE
Assessment of ChatGPT-generated medical
Arabic responses for patients with metabolic
dysfunction–associated steatotic liver disease
Saleh A. Alqahtani1,2*, Reem S. AlAhmed3, Waleed S. AlOmaim4, Saad Alghamdi5,
Waleed Al-Hamoudi5, Khalid Ibrahim Bzeizi5, Ali Albenmousa5, Alessio Aghemo6,7,
Nicola Pugliese6,7, Cesare Hassan6,7, Faisal A. Abaalkhail8,9
1Liver, Digestive, and Lifestyle Health Research Section, and Organ Transplant Center of Excellence, King
Faisal Specialist Hospital and Research Center, Riyadh, Saudi Arabia, 2Division of Gastroenterology and
Hepatology, Weill Cornell Medicine, New York, New York, United States of America, 3Liver, Digestive, and
Lifestyle Health Research Section, and Biostatistics, Epidemiology and Scientific Computing Department,
King Faisal Specialist Hospital and Research Center, Riyadh, Saudi Arabia, 4Department of Pathology and
Laboratory Medicine, King Faisal Specialist Hospital and Research Center, Riyadh, Saudi Arabia, 5Organ
Transplant Center of Excellence, King Faisal Specialist Hospital and Research Center, Riyadh, Saudi Arabia,
6Department of Biomedical Sciences, Humanitas University, Pieve Emanuele (MI), Italy, 7Division of
Internal Medicine and Hepatology, Department of Gastroenterology, IRCCS Humanitas Research Hospital,
Rozzano (MI), Italy, 8Gastroenterology Section, Department of Medicine, King Faisal Specialist Hospital and
Research Center, Riyadh, Saudi Arabia, 9College of Medicine, Alfaisal University, Riyadh, Saudi Arabia
*salalqahtani@kfshrc.edu.sa
Abstract
Background and aim
Artificial intelligence (AI)-powered chatbots, such as Chat Generative Pretrained Trans-
former (ChatGPT), have shown promising results in healthcare settings. These tools can
help patients obtain real-time responses to queries, ensuring immediate access to relevant
information. The study aimed to explore the potential use of ChatGPT-generated medical
Arabic responses for patients with metabolic dysfunction–associated steatotic liver disease
(MASLD).
Methods
An English patient questionnaire on MASLD was translated to Arabic. The Arabic questions
were then entered into ChatGPT 3.5 on November 12, 2023. The responses were evaluated
for accuracy, completeness, and comprehensibility by 10 Saudi MASLD experts who were
native Arabic speakers. Likert scales were used to evaluate: 1) Accuracy, 2) Completeness,
and 3) Comprehensibility. The questions were grouped into 3 domains: (1) Specialist refer-
ral, (2) Lifestyle, and (3) Physical activity.
Results
Accuracy mean score was 4.9 ±0.94 on a 6-point Likert scale corresponding to “Nearly all
correct.” Kendall’s coefficient of concordance (KCC) ranged from 0.025 to 0.649, with a
mean of 0.28, indicating moderate agreement between all 10 experts. Mean completeness
score was 2.4 ±0.53 on a 3-point Likert scale corresponding to “Comprehensive” (KCC:
0.03–0.553; mean: 0.22). Comprehensibility mean score was 2.74 ±0.52 on a 3-point Likert
scale, which indicates the responses were “Easy to understand” (KCC: 0.00–0.447; mean:
0.25).
Conclusion
MASLD experts found that ChatGPT responses were accurate, complete, and comprehensible.
The results support the increasing trend of leveraging the power of AI chatbots to
revolutionize the dissemination of information for patients with MASLD. However, many
AI-powered chatbots require further enhancement of scientific content to avoid the risks of
circulating medical misinformation.
PLOS ONE | https://doi.org/10.1371/journal.pone.0317929 February 3, 2025
OPEN ACCESS
Citation: Alqahtani SA, AlAhmed RS, AlOmaim WS, Alghamdi S, Al-Hamoudi W, Bzeizi KI, et al. (2025)
Assessment of ChatGPT-generated medical Arabic responses for patients with metabolic
dysfunction–associated steatotic liver disease. PLoS ONE 20(2): e0317929.
https://doi.org/10.1371/journal.pone.0317929
Editor: Anna Di Sessa, Universita degli Studi della Campania Luigi Vanvitelli Scuola di
Medicina e Chirurgia, ITALY
Received: October 25, 2024
Accepted: January 7, 2025
Published: February 3, 2025
Copyright: ©2025 Alqahtani et al. This is an open access article distributed under the terms
of the Creative Commons Attribution License, which permits unrestricted use, distribution, and
reproduction in any medium, provided the original author and source are credited.
Data Availability Statement: All relevant data are within the article and its Supporting
information files.
Funding: The author(s) received no specific funding for this work.
Competing interests: The authors have declared that no competing interests exist.
Introduction
Metabolic dysfunction–associated steatotic liver disease (MASLD), formerly known as non-
alcoholic fatty liver disease (NAFLD), is a global health concern, closely linked to the obesity
epidemic and sedentary lifestyles [1,2]. MASLD involves a full spectrum of conditions result-
ing from metabolic imbalances, such as metabolic dysfunction-associated steatohepatitis
(MASH), previously called non-alcoholic steatohepatitis (NASH) [1,3]. MASLD and MASH
pose enormous financial and health burdens across countries, including those in the
Arabic-speaking world [4–6]. Early detection and treatment of MASLD are crucial to prevent
progression to more severe stages such as cirrhosis and hepatocellular carcinoma [1,7]. However,
barriers to healthcare access and patient literacy may create challenges in managing this condi-
tion effectively [8,9].
In the digital age, artificial intelligence (AI) applications in healthcare offer innovative solu-
tions to such challenges. Chatbots powered by advanced AI models, like the Chat Generative
Pretrained Transformer (ChatGPT), can supplement patient education and engagement out-
side the clinical setting [10]. With their ability to process and produce human-like text, these
AI tools can deliver instant, reliable medical information and support, potentially transform-
ing patient self-management practices [11,12].
A previous study that assessed ChatGPT’s effectiveness in answering patient inquiries
concerning MASLD and associated lifestyle factors found that ChatGPT delivered accurate
(mean score of 4.84 on a 6-point Likert scale), comprehensive (mean score of 2.08 on a
3-point scale), and easy-to-understand (mean score of 2.87 on a 3-point scale) responses.
Nonetheless, the variability in ChatGPT’s responses may be attributed to factors such as
the training dataset, context, and language [13].
Despite the promise of AI-powered interventions, their effectiveness for Arabic-speaking
patients with MASLD remains underexplored. We aimed to explore the potential use of
ChatGPT in generating medical responses in Arabic for patients with MASLD, assessing its
accuracy, reliability, and comprehensiveness as an informative resource.
Materials and methods
A cross-sectional study assessed the effectiveness of ChatGPT in providing medical responses
to Arabic-speaking patients with MASLD. The process followed three main steps: 1) A validated
English-language patient questionnaire on MASLD [13] was translated into Arabic by
the MASLD experts and an independent researcher, ensuring linguistic and contextual accu-
racy from a patient perspective; 2) The translated questions were then entered separately into
ChatGPT 3.5 on November 12, 2023, simulating a realistic scenario where a patient seeks
information regarding MASLD; and 3) Ten MASLD experts from Saudi Arabia, who were
native Arabic speakers and fluent in English, independently evaluated the AI-generated
responses. The data was collected from 01/31/2024 through 02/10/2024. For the survey and
questionnaire, we primarily used Classical Arabic, which is the standard for formal and busi-
ness writing, ensuring a common linguistic framework across diverse Arabic-speaking
populations.
Three domains were assessed using respective Likert scales: 1) Accuracy: Responses were
rated on a 6-point Likert scale ranging from ’Completely incorrect’ to ’Correct’; 2) Complete-
ness: A 3-point Likert scale was utilized, categorizing responses as ’Incomplete’, ’Adequate’, or
’Comprehensive’; and 3) Comprehensibility: The intelligibility of responses was determined
using a 3-point Likert scale marked by ’Difficult’, ’Partly difficult’, and ’Easy to understand’.
An additional open-ended question was integrated into the Arabic questionnaire to gather
detailed feedback and expert commentary on the AI-generated response quality. This struc-
tured evaluation method aimed to capture the nuanced perspectives of clinical experts regard-
ing the application of ChatGPT in patient education and its potential role in improving
MASLD patient care within Arabic-speaking populations.
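The scoring described above can be illustrated with a minimal sketch. It assumes that the labels of each 3-point scale are coded 1 to 3 in the order listed in the text and that the reported "±" values are sample standard deviations; neither detail is stated explicitly in the article. The ratings in the example are hypothetical and are not study data.

```python
from statistics import mean, stdev

# Assumed numeric coding: labels scored 1..3 in the order the text lists them.
COMPLETENESS = {"Incomplete": 1, "Adequate": 2, "Comprehensive": 3}
COMPREHENSIBILITY = {"Difficult": 1, "Partly difficult": 2, "Easy to understand": 3}

def summarize(labels, scale):
    """Return (mean, sample SD) of the coded ratings for one question."""
    scores = [scale[label] for label in labels]
    return mean(scores), stdev(scores)

# Hypothetical ratings from 10 experts for a single question (not study data).
example = ["Comprehensive"] * 4 + ["Adequate"] * 5 + ["Incomplete"]
m, sd = summarize(example, COMPLETENESS)
print(f"{m:.2f} ± {sd:.2f}")
```

The same helper could be applied to the comprehensibility scale, or to a 6-point accuracy scale once its full label set (S1 Table) is coded in the same way.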
Statistical analysis
The data were analyzed using the Statistical Package for the Social Sciences (SPSS), version 28
(IBM Corp., N.Y., USA). To assess the potential usability of ChatGPT’s Arabic responses for
patients with MASLD, the non-parametric Kendall’s tau correlation test was employed. It
examined the association between experts’ ratings, using ordinal data from the Likert scale
assessments for the three domains, to determine the direction and strength of the relationships
between the variables under study. Mean scores on the 6- and 3-point Likert scales, Kendall’s
coefficient of concordance, and range values were reported.
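For readers unfamiliar with Kendall's coefficient of concordance (W), the following sketch shows the textbook tie-corrected formula, W = 12S / (m²(n³ − n) − mΣT), where m is the number of raters, n the number of rated items, S the sum of squared deviations of the item rank sums from their mean, and T the per-rater tie correction. It is an illustration only: the analysis in this study was run in SPSS, the article does not describe the exact procedure or tie handling, and the function names and example ratings below are hypothetical.

```python
from collections import Counter

def average_ranks(scores):
    """Rank scores ascending, giving tied scores the mean of their rank positions."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1  # 1-based average rank of the tie group
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks

def kendalls_w(ratings):
    """Tie-corrected Kendall's W for ratings[rater][item]."""
    m = len(ratings)      # number of raters
    n = len(ratings[0])   # number of rated items
    ranked = [average_ranks(r) for r in ratings]
    rank_sums = [sum(ranked[j][i] for j in range(m)) for i in range(n)]
    mean_sum = m * (n + 1) / 2
    s = sum((r - mean_sum) ** 2 for r in rank_sums)
    # Tie correction: sum of (t^3 - t) over every group of tied scores, per rater.
    ties = sum(t ** 3 - t for r in ratings for t in Counter(r).values())
    denom = m ** 2 * (n ** 3 - n) - m * ties
    if denom == 0:
        raise ValueError("W is undefined when every rater gives all items the same score")
    return 12 * s / denom

# Hypothetical accuracy ratings (6-point scale): 3 raters x 4 responses.
print(kendalls_w([[5, 6, 4, 5], [5, 5, 4, 6], [6, 6, 5, 5]]))
```

W ranges from 0 (no agreement in how raters rank the items) to 1 (identical rankings). Likert ratings produce many ties, which is why the tie-corrected denominator is used in this sketch.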
Ethical statement
Ethical approval for this study was obtained from the Research Ethics Committee (REC) of
King Faisal Specialist Hospital & Research Center, Riyadh, Saudi Arabia (RAC #2241013) on
01/29/2024. The REC recommended the approval of the study with a waiver of signing and
documentation of consent. The decision of participant MASLD experts to submit the survey
was considered consent.
Results
Accuracy
The mean score for accuracy was 4.92 ±0.94 on a 6-point Likert scale corresponding to
“Nearly all correct”. Kendall’s coefficient of concordance ranged from 0.025 to 0.649, with a
mean of 0.28, indicating a moderate level of agreement among all 10 experts. The highest
mean score was for question 5 (5.3), corresponding to “Correct”. The lowest mean score was
for question 13 (4.3), corresponding to “More correct than incorrect”. Among the three
domains, Physical Activity had the highest accuracy mean of 5.07 ±0.83, while Specialist
Referral had the lowest mean score of 4.70 ±1.02 (Fig 1).
Completeness
The mean score for completeness was 2.37 ±0.53 on a 3-point Likert scale, corresponding to
“Comprehensive”. Kendall’s coefficient ranged from 0.03 to 0.553, with a mean of 0.22,
indicating a moderate level of agreement among all 10 experts. The highest mean score was
for question 8 (2.6), corresponding to “Comprehensive”. The lowest mean score was for
question 1 (2.1), corresponding to “Adequate”. Among the three domains, Physical Activity
had the highest mean score of 2.43 ±0.57, while specialist referral had the lowest mean score
of 2.20 ±0.48 (Fig 2).
Fig 1. Accuracy score. Box plot showing the distribution of accuracy scores for each question. Graph shows the interquartile range
(box), median (horizontal line), mean (dot), and outliers (whiskers).
https://doi.org/10.1371/journal.pone.0317929.g001
Fig 2. Completeness score. Box plot showing the distribution of completeness scores for each question. Graph shows the
interquartile range (box), median (horizontal line), mean (dot), and outliers (whiskers).
https://doi.org/10.1371/journal.pone.0317929.g002
Comprehensibility
The average comprehensibility score was 2.74 ±0.52, which indicates that the ChatGPT-gener-
ated responses were “Easy to understand”. Kendall’s coefficient ranged from 0.00 to 0.447,
with a mean of 0.25, indicating a moderate level of agreement among all 10 experts. The
highest mean score, 2.9, was observed for questions 2, 3, 6, 8, and 10. The lowest mean
score, 2.4, was observed for questions 7 and 14. Among the three domains, Physical Activity had the highest
mean score of 2.83 ±0.38, while specialist referral had the lowest mean score of 2.50 ±0.63
(Fig 3).
Expert comments
When responses were compared by the frequency of expert comments, the responses to
questions 8–10 and 12 received no comments from the experts. The responses that received
more than one expert comment were those to questions 1, 5, and 14. Grouped by theme, the
most frequently repeated comments among the experts concerned: 1) the use of the term
“NAFLD/NASH” instead of “MASLD/MASH” in the generated responses; 2) the Arabic translation
of “Biopsy” in the generated responses; 3) the Arabic translation of “MRI” in the generated
responses; and 4) the Arabic sentences on alcohol consumption in the generated responses.
Fig 3. Comprehensibility score. Box plot showing the distribution of comprehensibility scores for each question. Graph shows the
interquartile range (box), median (horizontal line), mean (dot), and outliers (whiskers).
https://doi.org/10.1371/journal.pone.0317929.g003
Discussion
AI is significantly impacting the medical field, including gastroenterology and hepatology
[14,15]. In recent years, AI has been successfully applied in liver pathology and radiology to
improve diagnostic accuracy and reduce inter- and intra-observer variability [14–16].
Recently, significant attention has been paid to the clinical applications of AI-based chatbots,
specifically ChatGPT, in various contexts, including its potential use as an immediate, free,
and on-demand information dissemination tool for patients with MASLD [14]. Identifying
effective information dissemination tools for patients with MASLD is a clinical priority for
disease management, as MASLD management needs a multidisciplinary approach [17].
Patient education and information dissemination are essential components for helping
patients achieve and maintain lifestyle changes [18,19]. AI-based chatbots could be a
valuable tool for patients by providing simplified explanations and guidance on first-line
treatment options and disease management, such as weight loss and physical activity
recommendations.
Pugliese et al. [13] recently conducted the first study on ChatGPT 3.5 as an information dis-
semination tool for patients with MASLD, demonstrating that ChatGPT 3.5 can provide
understandable and complete answers from the patient’s perspective to 15 pre-defined
MASLD-related questions in English. The AI-generated answers were evaluated by 10 experts
and found to be relatively accurate [13]. In addition, preliminary data from another study by
the same authors showed that using a different language from English did not seem to affect
the effectiveness of ChatGPT as a resource tool for patients with MASLD [20]. To date, no
study has assessed the effectiveness of AI-powered interventions for Arabic-speaking patients
with MASLD.
In our study, we involved 10 MASLD experts from Saudi Arabia who were native Arabic
speakers and evaluated the same set of questions that were previously analyzed in English. We
found that ChatGPT’s ability to advise patients with MASLD was not affected by language, as
the Arabic answers were deemed to be complete (with a mean score of 2.4 on a 3-point scale)
and comprehensible (with a mean score of 2.74 on a 3-point scale). However, consistent with
other studies, the accuracy of ChatGPT still requires improvement, with a mean score of 4.9
on a 6-point Likert scale (Table 1). Thus, while the Arabic language does not influence the
completeness and accuracy of ChatGPT-generated answers, it also does not improve the
inaccuracies observed in clinically meaningful answers. Similar to a previous study conducted
in the English language [13], the Physical Activity domain also had the highest score for the
Arabic questionnaire (Table 2).
Limitations
Ten experts in the field of MASLD conducted the ratings using Likert scales. However, it is
important to note that such scales have limitations as they allow for partial accuracy ratings.
Table 1. Comparing the mean score result between the Arabic and English responses [13].
Evaluation Parameters    Arabic        English
Accuracy                 4.92 ±0.94    4.84 ±0.74
Completeness             2.37 ±0.53    2.08 ±0.3
Comprehensibility        2.74 ±0.52    2.87 ±0.14
https://doi.org/10.1371/journal.pone.0317929.t001
Table 2. Comparing domains mean score result between the Arabic and English responses [13].
Accuracy Mean Score        Arabic                             English
Highest domain             Physical Activity, 5.07 ±0.83      Physical Activity, 5.56 ±0.56
Lowest domain              Specialist Referral, 4.70 ±1.02    Specialist Referral, 3.9 ±1.44
Completeness Mean Score    Arabic                             English
Highest domain             Physical Activity, 2.43 ±0.57      Physical Activity, 2.46 ±0.5
Lowest domain              Specialist Referral, 2.20 ±0.48    Specialist Referral, 1.73 ±0.82
https://doi.org/10.1371/journal.pone.0317929.t002
This is unacceptable in the medical field as it can lead to misunderstandings and dangerous
consequences for patients. Another limitation is the availability of new and potentially better
versions of ChatGPT (ChatGPT 4), as the study used version 3.5. However, it should be noted
that ChatGPT 4 is not freely accessible to patients and thus it is unlikely to be used any time
soon. While a variety of large language models are accessible, including free options, our deci-
sion to employ ChatGPT was primarily driven by methodological consistency. To ensure a
reliable comparison between English and Arabic responses, it was crucial to maintain a stan-
dardized approach. By utilizing the same AI tool, we could isolate the impact of language dif-
ferences on the generated content. We acknowledge the rapid advancements in AI technology
and the potential benefits of exploring diverse models which may improve in accuracy and cul-
tural relevance. Future research endeavors will undoubtedly involve a comparative analysis of
various AI tools to assess their relative strengths and weaknesses in different language
contexts.
In addition, it is crucial to consider the impact of socio-cultural factors on ChatGPT
responses. The sociocultural background of the patient determines the tool’s capacity to offer
culturally sensitive guidance, and patient preferences, health literacy levels, and cultural quirks
may all affect how successful the responses are. Therefore, even if ChatGPT is a useful tool, it
must be used with consideration for patients’ cultural diversity [20]. Chatbots also
have other known limitations, including the risk of generating content that may not be
grounded in evidence-based knowledge, a phenomenon known as ’hallucinations’ [21].
Retrieval-augmented generation (RAG) is a potential method to address this issue. RAG
combines the response-generating ability of AI-based chatbots with the ability to pull in verified
information from external sources, resulting in more accurate and complete answers. There is
a growing trend not only in acquiring information from AI-based apps and services but also in
decision-making based on such information. Hence, the professional community should use
AI responsibly by following the principles and ethics associated with it.
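The retrieval step of RAG mentioned above can be sketched in a few lines. This is a toy illustration under stated assumptions, not a description of any system used in this study: the snippet store, the keyword-overlap scoring, and the function names are all hypothetical, and a production system would typically retrieve from a curated guideline corpus using semantic search before passing the assembled prompt to a language model.

```python
import re

# Hypothetical vetted snippets; in practice these would come from a curated source.
SNIPPETS = [
    "MASLD was formerly known as non-alcoholic fatty liver disease (NAFLD).",
    "Weight loss and physical activity are first-line management options for MASLD.",
    "Early detection of MASLD helps prevent progression to cirrhosis.",
]

def tokens(text):
    """Lowercase word set used for a crude keyword-overlap score."""
    return set(re.findall(r"[a-z]+", text.lower()))

def retrieve(question, snippets, k=2):
    """Return the k snippets sharing the most words with the question."""
    q = tokens(question)
    ranked = sorted(snippets, key=lambda s: len(q & tokens(s)), reverse=True)
    return ranked[:k]

def build_prompt(question, snippets):
    """Assemble a prompt that asks the model to answer only from retrieved text."""
    context = "\n".join(f"- {s}" for s in retrieve(question, snippets))
    return f"Answer using only these sources:\n{context}\n\nQuestion: {question}"

print(build_prompt("Does physical activity help with weight loss?", SNIPPETS))
```

The sketch stops at prompt assembly and makes no model call. The point is that the generated answer is constrained to verified text rather than to whatever the model recalls from its training data.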
Conclusions
This study addresses the critical requirement for AI tools in the Arabic-speaking world, where
the prevalence of MASLD is estimated to be higher than in Western countries [22]. Although
our study confirms the promising results obtained by previous studies, the universal adoption
of ChatGPT as a resource tool for MASLD patients is challenging [13,20]. The identified
limitations highlight the need for continued improvement of AI models in healthcare settings,
which requires collaboration between AI experts and healthcare professionals. While the
study results show that the AI-generated responses are accurate and consistent, patients
should be informed not to replace conventional doctor visits with these technologies, which
serve as patient education material and are not a means of obtaining a medical diagnosis or
consultation.
Supporting information
S1 Table. Accuracy Likert scale reference.
(DOCX)
S2 Table. Completeness Likert scale reference.
(DOCX)
S3 Table. Comprehensiveness Likert scale reference.
(DOCX)
S4 Table. Accuracy coded responses.
(DOCX)
S5 Table. Completeness coded responses.
(DOCX)
S6 Table. Completeness coded responses.
(DOCX)
S7 Table. Accuracy—Kendall’s tau analysis.
(DOCX)
S8 Table. Completeness Kendall’s tau.
(DOCX)
S9 Table. Comprehensiveness Kendall’s tau.
(DOCX)
S10 Table. Arabic questionnaire.
(DOCX)
S11 Table. ChatGPT generated responses.
(DOCX)
Author Contributions
Conceptualization: Alessio Aghemo, Nicola Pugliese, Cesare Hassan.
Data curation: Saad Alghamdi, Waleed Al-Hamoudi.
Formal analysis: Reem S. AlAhmed, Waleed S. AlOmaim, Khalid Ibrahim Bzeizi, Ali Albenmousa,
Faisal A. Abaalkhail.
Supervision: Saleh A. Alqahtani.
Validation: Khalid Ibrahim Bzeizi.
Writing – original draft: Saleh A. Alqahtani, Reem S. AlAhmed, Alessio Aghemo, Nicola Pugliese,
Cesare Hassan.
Writing – review & editing: Saleh A. Alqahtani, Waleed S. AlOmaim, Saad Alghamdi, Waleed
Al-Hamoudi, Khalid Ibrahim Bzeizi, Ali Albenmousa, Alessio Aghemo, Nicola Pugliese,
Cesare Hassan, Faisal A. Abaalkhail.
References
1. Chan WK, Chuah KH, Rajaram RB, Lim LL, Ratnasingam J, Vethakkan SR. Metabolic Dysfunction-
Associated Steatotic Liver Disease (MASLD): A State-of-the-Art Review. J Obes Metab Syndr. 2023;
32(3):197–213. https://doi.org/10.7570/jomes23052 PMID: 37700494
2. Zelber-Sagi S, Ratziu V, Oren R. Nutrition and physical activity in NAFLD: An overview of the epidemio-
logical evidence. World J Gastroenterol WJG. 2011 Aug 7; 17(29):3377–89. https://doi.org/10.3748/
wjg.v17.i29.3377 PMID: 21876630
3. Staufer K, Stauber RE. Steatotic Liver Disease: Metabolic Dysfunction, Alcohol, or Both? Biomedicines.
2023 Jul 26; 11(8):2108. https://doi.org/10.3390/biomedicines11082108 PMID: 37626604
4. Alqahtani S. A., Broering D. C., Alghamdi S. A., Bzeizi K. I., Alhusseini N., Alabbad S. I., et al. (2021).
Changing trends in liver transplantation indications in Saudi Arabia: from hepatitis C virus infection to
nonalcoholic fatty liver disease. BMC gastroenterology, 21(1), 245. https://doi.org/10.1186/s12876-
021-01828-z PMID: 34074270
5. Coker T., Saxton J., Retat L., Alswat K., Alghnam S., Al-Raddadi R. M., et al. (2022). The future health
and economic burden of obesity-attributable type 2 diabetes and liver disease among the working-age
population in Saudi Arabia. PloS one, 17(7), e0271108. https://doi.org/10.1371/journal.pone.0271108
PMID: 35834577
6. Golabi P., Paik J. M., AlQahtani S., Younossi Y., Tuncer G., & Younossi Z. M. (2021). Burden of non-
alcoholic fatty liver disease in Asia, the Middle East and North Africa: Data from Global Burden of Dis-
ease 2009–2019. Journal of hepatology, 75(4), 795–809. https://doi.org/10.1016/j.jhep.2021.05.022
PMID: 34081959
7. Yin X, Guo X, Liu Z, Wang J. Advances in the Diagnosis and Treatment of Non-Alcoholic Fatty Liver Dis-
ease. Int J Mol Sci. 2023 Feb 2; 24(3):2844. https://doi.org/10.3390/ijms24032844 PMID: 36769165
8. Lazarus JV, Colombo M, Cortez-Pinto H, Huang TTK, Miller V, Ninburg M, et al. NAFLD—sounding the
alarm on a silent epidemic. Nat Rev Gastroenterol Hepatol. 2020 Jul; 17(7):377–9. https://doi.org/10.
1038/s41575-020-0315-7 PMID: 32514153
9. Allen-Meares P, Lowry B, Estrella ML, Mansuri S. Health Literacy Barriers in the Health Care System:
Barriers and Opportunities for the Profession. Health Soc Work. 2020 Jan 28; 45(1):62–4. https://doi.
org/10.1093/hsw/hlz034 PMID: 31993624
10. Chakraborty C, Pal S, Bhattacharya M, Dash S, Lee SS. Overview of Chatbots with special emphasis
on artificial intelligence-enabled ChatGPT in medical science. Front Artif Intell. 2023 Oct 31; 6:1237704.
https://doi.org/10.3389/frai.2023.1237704 PMID: 38028668
11. Javaid M, Haleem A, Singh RP. ChatGPT for healthcare services: An emerging stage for an innovative
perspective. BenchCouncil Trans Benchmarks Stand Eval. 2023 Feb 1; 3(1):100105.
12. Alowais SA, Alghamdi SS, Alsuhebany N, Alqahtani T, Alshaya AI, Almohareb SN, et al. Revolutionizing
healthcare: the role of artificial intelligence in clinical practice. BMC Med Educ. 2023 Sep 22; 23(1):689.
https://doi.org/10.1186/s12909-023-04698-z PMID: 37740191
13. Pugliese N, Wai-Sun Wong V, Schattenberg JM, Romero-Gomez M, Sebastiani G, NAFLD Expert
Chatbot Working Group, et al. Accuracy, Reliability, and Comprehensibility of ChatGPT-Generated
Medical Responses for Patients With Nonalcoholic Fatty Liver Disease. Clin Gastroenterol Hepatol Off
Clin Pract J Am Gastroenterol Assoc. 2023 Sep 15; S1542-3565(23)00704–8. https://doi.org/10.1016/j.
cgh.2023.08.033 PMID: 37716618
14. Le Berre C, Sandborn WJ, Aridhi S, et al. Application of Artificial Intelligence to Gastroenterology and
Hepatology. Gastroenterology. 2020; 158(1):76–94.e2. https://doi.org/10.1053/j.gastro.2019.08.058
PMID: 31593701
15. Schattenberg JM, Chalasani N, Alkhouri N. Artificial Intelligence Applications in Hepatology. Clin Gas-
troenterol Hepatol. 2023; 21(8):2015–2025. https://doi.org/10.1016/j.cgh.2023.04.007 PMID: 37088460
16. Nam D, Chapiro J, Paradis V, Seraphin TP, Kather JN. Artificial intelligence in liver diseases: Improving
diagnostics, prognostics and response prediction. JHEP Rep. 2022; 4(4):100443. Published 2022 Feb
2. https://doi.org/10.1016/j.jhepr.2022.100443 PMID: 35243281
17. Rinella ME, Neuschwander-Tetri BA, Siddiqui MS, et al. AASLD Practice Guidance on the clinical
assessment and management of nonalcoholic fatty liver disease. Hepatology. 2023; 77(5):1797–1835.
https://doi.org/10.1097/HEP.0000000000000323 PMID: 36727674
18. Pugliese N, Plaz Torres MC, Petta S, Valenti L, Giannini EG, Aghemo A. Is there an ’ideal’ diet for
patients with NAFLD?. Eur J Clin Invest. 2022; 52(3):e13659. https://doi.org/10.1111/eci.13659 PMID:
34309833
19. Balakrishnan M, Liu K, Schmitt S, et al. Behavioral weight-loss interventions for patients with NAFLD: A
systematic scoping review. Hepatol Commun. 2023; 7(8):e0224. Published 2023 Aug 3. https://doi.org/
10.1097/HC9.0000000000000224 PMID: 37534947
20. Pugliese N., Polverini D., Lombardi R., Pennisi G., Ravaioli F., Armandi A., et al., & NAFLD Expert Chat-
bot Working Group. (2024). Evaluation of ChatGPT as a Counselling Tool for Italian-Speaking MASLD
Patients: Assessment of Accuracy, Completeness and Comprehensibility. Journal of Personalized
Medicine, 14(6), 568. https://doi.org/10.3390/jpm14060568 PMID: 38929789
21. Goddard J. Hallucinations in ChatGPT: A Cautionary Tale for Biomedical Researchers. Am J Med.
2023; 136(11):1059–1060. https://doi.org/10.1016/j.amjmed.2023.06.012 PMID: 37369274
22. Younossi ZM, Golabi P, Paik JM, Henry A, Van Dongen C, Henry L. The global epidemiology of nonal-
coholic fatty liver disease (NAFLD) and nonalcoholic steatohepatitis (NASH): a systematic review.
Hepatology. 2023; 77(4):1335–1347. https://doi.org/10.1097/HEP.0000000000000004 PMID:
36626630