
Assessment of ChatGPT-generated medical Arabic responses for patients with metabolic dysfunction–associated steatotic liver disease

RESEARCH ARTICLE
Saleh A. Alqahtani (1,2,*), Reem S. AlAhmed (3), Waleed S. AlOmaim (4), Saad Alghamdi (5), Waleed Al-Hamoudi (5), Khalid Ibrahim Bzeizi (5), Ali Albenmousa (5), Alessio Aghemo (6,7), Nicola Pugliese (6,7), Cesare Hassan (6,7), Faisal A. Abaalkhail (8,9)
1 Liver, Digestive, and Lifestyle Health Research Section, and Organ Transplant Center of Excellence, King Faisal Specialist Hospital and Research Center, Riyadh, Saudi Arabia
2 Division of Gastroenterology and Hepatology, Weill Cornell Medicine, New York, New York, United States of America
3 Liver, Digestive, and Lifestyle Health Research Section, and Biostatistics, Epidemiology and Scientific Computing Department, King Faisal Specialist Hospital and Research Center, Riyadh, Saudi Arabia
4 Department of Pathology and Laboratory Medicine, King Faisal Specialist Hospital and Research Center, Riyadh, Saudi Arabia
5 Organ Transplant Center of Excellence, King Faisal Specialist Hospital and Research Center, Riyadh, Saudi Arabia
6 Department of Biomedical Sciences, Humanitas University, Pieve Emanuele (MI), Italy
7 Division of Internal Medicine and Hepatology, Department of Gastroenterology, IRCCS Humanitas Research Hospital, Rozzano (MI), Italy
8 Gastroenterology Section, Department of Medicine, King Faisal Specialist Hospital and Research Center, Riyadh, Saudi Arabia
9 College of Medicine, Alfaisal University, Riyadh, Saudi Arabia
* salalqahtani@kfshrc.edu.sa
Abstract
Background and aim
Artificial intelligence (AI)-powered chatbots, such as Chat Generative Pretrained Transformer (ChatGPT), have shown promising results in healthcare settings. These tools can help patients obtain real-time responses to queries, ensuring immediate access to relevant information. The study aimed to explore the potential use of ChatGPT-generated medical Arabic responses for patients with metabolic dysfunction–associated steatotic liver disease (MASLD).
Methods
An English patient questionnaire on MASLD was translated to Arabic. The Arabic questions were then entered into ChatGPT 3.5 on November 12, 2023. The responses were evaluated for accuracy, completeness, and comprehensibility by 10 Saudi MASLD experts who were native Arabic speakers. Likert scales were used to evaluate: 1) Accuracy, 2) Completeness, and 3) Comprehensibility. The questions were grouped into 3 domains: (1) Specialist referral, (2) Lifestyle, and (3) Physical activity.
Results
Accuracy mean score was 4.9 ± 0.94 on a 6-point Likert scale corresponding to “Nearly all correct.” Kendall’s coefficient of concordance (KCC) ranged from 0.025 to 0.649, with a mean of 0.28, indicating moderate agreement between all 10 experts. Mean completeness score was 2.4 ± 0.53 on a 3-point Likert scale corresponding to “Comprehensive” (KCC: 0.03–0.553; mean: 0.22). Comprehensibility mean score was 2.74 ± 0.52 on a 3-point Likert scale, which indicates the responses were “Easy to understand” (KCC: 0.00–0.447; mean: 0.25).
Conclusion
MASLD experts found that ChatGPT responses were accurate, complete, and comprehensible. The results support the increasing trend of leveraging the power of AI chatbots to revolutionize the dissemination of information for patients with MASLD. However, many AI-powered chatbots require further enhancement of scientific content to avoid the risks of circulating medical misinformation.
OPEN ACCESS
Citation: Alqahtani SA, AlAhmed RS, AlOmaim WS, Alghamdi S, Al-Hamoudi W, Bzeizi KI, et al. (2025) Assessment of ChatGPT-generated medical Arabic responses for patients with metabolic dysfunction–associated steatotic liver disease. PLoS ONE 20(2): e0317929. https://doi.org/10.1371/journal.pone.0317929
Editor: Anna Di Sessa, Universita degli Studi della Campania Luigi Vanvitelli Scuola di Medicina e Chirurgia, ITALY
Received: October 25, 2024
Accepted: January 7, 2025
Published: February 3, 2025
Copyright: © 2025 Alqahtani et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability Statement: All relevant data are within the article and its Supporting information files.
Funding: The author(s) received no specific funding for this work.
Competing interests: The authors have declared that no competing interests exist.
Introduction
Metabolic dysfunction–associated steatotic liver disease (MASLD), formerly known as non-alcoholic fatty liver disease (NAFLD), is a global health concern closely linked to the obesity epidemic and sedentary lifestyles [1,2]. MASLD encompasses a spectrum of conditions resulting from metabolic imbalances, including metabolic dysfunction–associated steatohepatitis (MASH), previously called non-alcoholic steatohepatitis (NASH) [1,3]. MASLD and MASH pose enormous financial and health burdens across countries, including those in the Arabic-speaking world [4–6]. Early detection and treatment of MASLD are crucial to prevent progression to more severe stages such as cirrhosis and hepatocellular carcinoma [1,7]. However, barriers to healthcare access and limited health literacy may create challenges in managing this condition effectively [8,9].
In the digital age, artificial intelligence (AI) applications in healthcare offer innovative solutions to such challenges. Chatbots powered by advanced AI models, like the Chat Generative Pretrained Transformer (ChatGPT), can supplement patient education and engagement outside the clinical setting [10]. With their ability to process and produce human-like text, these AI tools can deliver instant, reliable medical information and support, potentially transforming patient self-management practices [11,12].
A previous study that evaluated ChatGPT’s effectiveness in answering patient inquiries concerning MASLD and associated lifestyle factors found that ChatGPT delivered accurate (mean score of 4.84 on a 6-point Likert scale), comprehensive (mean score of 2.08 on a 3-point scale), and easy-to-understand (mean score of 2.87 on a 3-point scale) responses. Nonetheless, it is noteworthy that the variability in ChatGPT’s responses may be attributed to factors such as the training dataset, context, and language [13].
Despite the promise of AI-powered interventions, their effectiveness for Arabic-speaking
patients with MASLD remains underexplored. We aimed to explore the potential use of
ChatGPT in generating medical responses in Arabic for patients with MASLD, assessing its
accuracy, reliability, and comprehensiveness as an informative resource.
Materials and methods
A cross-sectional study assessed the effectiveness of ChatGPT in providing medical responses to Arabic-speaking patients with MASLD. The process followed three main steps: 1) A validated English-language patient questionnaire on MASLD [13] was translated into Arabic by the MASLD experts and an independent researcher, ensuring linguistic and contextual accuracy from a patient perspective; 2) The translated questions were then entered separately into ChatGPT 3.5 on November 12, 2023, simulating a realistic scenario where a patient seeks information regarding MASLD; and 3) Ten MASLD experts from Saudi Arabia, who were native Arabic speakers and fluent in English, independently evaluated the AI-generated responses. The data was collected from 01/31/2024 through 02/10/2024. For the survey and questionnaire, we primarily used Classical Arabic, which is the standard for formal and business writing, ensuring a common linguistic framework across diverse Arabic-speaking populations.
Three domains were assessed using respective Likert scales: 1) Accuracy: Responses were rated on a 6-point Likert scale ranging from ‘Completely incorrect’ to ‘Correct’; 2) Completeness: A 3-point Likert scale was utilized, categorizing responses as ‘Incomplete’, ‘Adequate’, or ‘Comprehensive’; and 3) Comprehensibility: The intelligibility of responses was determined using a 3-point Likert scale marked by ‘Difficult’, ‘Partly difficult’, and ‘Easy to understand’. An additional open-ended question was integrated into the Arabic questionnaire to gather detailed feedback and expert commentary on the AI-generated response quality. This structured evaluation method aimed to capture the nuanced perspectives of clinical experts regarding the application of ChatGPT in patient education and its potential role in improving MASLD patient care within Arabic-speaking populations.
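As a rough illustration of how ratings on the scales described above can be reduced to mean ± SD summaries per question and per domain, consider the sketch below. The function names, the example ratings, and the question-to-domain mapping are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not state which variant was reported.

```python
from statistics import mean, stdev
from typing import Dict, List, Tuple


def summarize(ratings: List[int]) -> Tuple[float, float]:
    """Return (mean, sample SD) for one list of Likert ratings."""
    return mean(ratings), stdev(ratings)


def summarize_by_domain(
    scores_by_question: Dict[str, List[int]],
    domain_of: Dict[str, str],
) -> Dict[str, Tuple[float, float]]:
    """Pool the ratings of all questions in a domain, then summarize."""
    pooled: Dict[str, List[int]] = {}
    for question, ratings in scores_by_question.items():
        pooled.setdefault(domain_of[question], []).extend(ratings)
    return {domain: summarize(r) for domain, r in pooled.items()}


# Hypothetical accuracy ratings (6-point scale) from 10 raters; not study data.
example_scores = {
    "Q1": [5, 6, 4, 5, 5, 6, 5, 4, 6, 5],
    "Q2": [4, 5, 5, 4, 6, 5, 4, 5, 5, 4],
}
# Hypothetical mapping of questions to domains.
example_domains = {"Q1": "Lifestyle", "Q2": "Lifestyle"}

if __name__ == "__main__":
    for question, ratings in example_scores.items():
        m, sd = summarize(ratings)
        print(f"{question}: {m:.2f} ± {sd:.2f}")
    for domain, (m, sd) in summarize_by_domain(example_scores, example_domains).items():
        print(f"{domain}: {m:.2f} ± {sd:.2f}")
```

Pooling all ratings within a domain before averaging is only one possible choice; averaging the per-question means would give the same domain mean when every question has the same number of raters, but a different SD.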
Statistical analysis
The data was analyzed using the Statistical Package for Social Sciences (SPSS), version 28 (IBM Corp., N.Y., USA). To assess the potential usability of ChatGPT’s Arabic responses for patients with MASLD, the non-parametric Kendall’s tau correlation test was employed. It examined the association between experts’ ratings, using ordinal data from Likert scale assessments for the three domains, to determine the direction and strength of relationships between the variables under study. Mean scores (on the 6- and 3-point Likert scales), Kendall’s coefficient of concordance, and range values were reported.
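For readers unfamiliar with Kendall’s coefficient of concordance (W), the following is a minimal sketch of the standard tie-corrected formula, W = 12S / (m²(n³ − n) − mΣT), for m raters scoring n items, where S is the sum of squared deviations of the rank totals and T sums t³ − t over each rater’s groups of tied scores. It is not the authors’ SPSS procedure: how raters and items were arranged in their analysis is not described, so the data layout and function names here are assumptions.

```python
from typing import Dict, List, Sequence


def average_ranks(scores: Sequence[float]) -> List[float]:
    """Rank scores from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kendalls_w(ratings: Sequence[Sequence[float]]) -> float:
    """Tie-corrected Kendall's W; ratings[r][q] is rater r's score for item q."""
    m = len(ratings)
    n = len(ratings[0])
    rank_rows = [average_ranks(row) for row in ratings]
    totals = [sum(row[q] for row in rank_rows) for q in range(n)]
    mean_total = m * (n + 1) / 2
    s = sum((t - mean_total) ** 2 for t in totals)

    # Tie correction: sum of (t^3 - t) over each rater's groups of equal scores.
    tie_term = 0.0
    for row in ratings:
        counts: Dict[float, int] = {}
        for value in row:
            counts[value] = counts.get(value, 0) + 1
        tie_term += sum(c ** 3 - c for c in counts.values())

    denominator = m ** 2 * (n ** 3 - n) - m * tie_term
    if denominator == 0:
        # Every rater gave every item the same score, so W is undefined.
        return float("nan")
    return 12 * s / denominator
```

W runs from 0 (no agreement in how raters order the items) to 1 (identical orderings). Because Likert ratings produce many ties, the tie correction matters here; without it, W would be biased downward.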
Ethical statement
Ethical approval for this study was obtained from the Research Ethics Committee (REC) of
King Faisal Specialist Hospital & Research Center, Riyadh, Saudi Arabia (RAC #2241013) on
01/29/2024. The REC recommended the approval of the study with a waiver of signing and
documentation of consent. The decision of participant MASLD experts to submit the survey
was considered consent.
Results
Accuracy
The mean score for accuracy was 4.92 ± 0.94 on a 6-point Likert scale, corresponding to “Nearly all correct”. Kendall’s coefficient of concordance ranged from 0.025 to 0.649, with a mean of 0.28, indicating a moderate level of agreement among all 10 experts. The highest mean score was for question 5 (5.3), corresponding to “Correct”. The lowest was for question 13 (4.3), corresponding to “More correct than incorrect”. Among the three domains, Physical Activity had the highest mean accuracy score of 5.07 ± 0.83, while Specialist Referral had the lowest mean score of 4.70 ± 1.02 (Fig 1).
Completeness
The mean score for completeness was 2.37 ± 0.53 on a 3-point Likert scale, corresponding to “Comprehensive”. Kendall’s coefficient of concordance ranged from 0.03 to 0.553, with a mean of 0.22, indicating a moderate level of agreement among all 10 experts. Question 8 had the highest mean score (2.6), corresponding to “Comprehensive”. Question 1 had the lowest mean score (2.1), corresponding to “Adequate”. Among the three domains, Physical Activity had the highest mean score of 2.43 ± 0.57, while Specialist Referral had the lowest mean score of 2.20 ± 0.48 (Fig 2).
Fig 1. Accuracy score. Box plot showing the distribution of accuracy scores for each question. Graph shows the interquartile range
(box), median (horizontal line), mean (dot), and outliers (whiskers).
https://doi.org/10.1371/journal.pone.0317929.g001
Fig 2. Completeness score. Box plot showing the distribution of completeness scores for each question. Graph shows the
interquartile range (box), median (horizontal line), mean (dot), and outliers (whiskers).
https://doi.org/10.1371/journal.pone.0317929.g002
Comprehensibility
The average comprehensibility score was 2.74 ± 0.52, which indicates that the ChatGPT-generated responses were “Easy to understand”. Kendall’s coefficient of concordance ranged from 0.00 to 0.447, with a mean of 0.25, indicating a moderate level of agreement among all 10 experts. Questions 2, 3, 6, 8, and 10 had the highest mean score (2.9). Questions 7 and 14 had the lowest mean score (2.4). Among the three domains, Physical Activity had the highest mean score of 2.83 ± 0.38, while Specialist Referral had the lowest mean score of 2.50 ± 0.63 (Fig 3).
Expert comments
When responses were compared by the frequency of expert comments, the responses to Questions 8–10 and 12 received no comments from the experts. The questions that drew more than one expert comment were Questions 1, 5, and 14. Grouped by theme, the comments most often repeated among the experts concerned: 1) The use of the term “NAFLD/NASH” instead of “MASLD/MASH” in the generated responses; 2) The Arabic-generated response translation of “Biopsy”; 3) The Arabic-generated response translation of “MRI”; and 4) The Arabic-generated response sentences on alcohol consumption.
Fig 3. Comprehensibility score. Box plot showing the distribution of comprehensibility scores for each question. Graph shows the interquartile range (box), median (horizontal line), mean (dot), and outliers (whiskers).
https://doi.org/10.1371/journal.pone.0317929.g003
Discussion
AI is significantly impacting the medical field, including gastroenterology and hepatology [14,15]. In recent years, AI has been successfully applied in liver pathology and radiology to improve diagnostic accuracy and reduce inter- and intra-observer variability [14–16]. Recently, significant attention has been paid to the clinical applications of AI-based chatbots, specifically ChatGPT, in various contexts, including its potential use as an immediate, free, and on-demand information dissemination tool for patients with MASLD [14]. Identifying effective information dissemination tools for patients with MASLD is a clinical priority, as MASLD management requires a multidisciplinary approach [17]. Patient education and information dissemination are essential for helping patients achieve and maintain lifestyle changes [18,19]. AI-based chatbots could be a valuable tool for patients by providing simplified explanations and guidance on first-line treatment options and disease management, such as weight loss and physical activity recommendations.
Pugliese et al. [13] recently conducted the first study on ChatGPT 3.5 as an information dissemination tool for patients with MASLD, demonstrating that ChatGPT 3.5 can provide understandable and complete answers from the patient’s perspective to 15 pre-defined MASLD-related questions in English. The AI-generated answers were evaluated by 10 experts and found to be relatively accurate [13]. In addition, preliminary data from another study by the same authors showed that using a language other than English did not seem to affect the effectiveness of ChatGPT as a resource tool for patients with MASLD [20]. To date, no study has assessed the effectiveness of AI-powered interventions for Arabic-speaking patients with MASLD.
In our study, 10 MASLD experts from Saudi Arabia who were native Arabic speakers evaluated responses to the same set of questions that were previously analyzed in English. We found that ChatGPT’s ability to advise patients with MASLD was not affected by language, as the Arabic answers were deemed to be complete (mean score of 2.4 on a 3-point scale) and comprehensible (mean score of 2.74 on a 3-point scale). However, consistent with other studies, the accuracy of ChatGPT still requires improvement, with a mean score of 4.9 on a 6-point Likert scale (Table 1). So, while the Arabic language does not influence the completeness and accuracy of ChatGPT-generated answers, it also does not resolve the inaccuracies observed in clinically meaningful answers. As in a previous study conducted in English [13], the Physical Activity domain had the highest score for the Arabic questionnaire (Table 2).
Table 1. Comparing the mean score result between the Arabic and English responses [13].
Evaluation Parameters    Arabic         English
Accuracy                 4.92 ± 0.94    4.84 ± 0.74
Completeness             2.37 ± 0.53    2.08 ± 0.3
Comprehensibility        2.74 ± 0.52    2.87 ± 0.14
https://doi.org/10.1371/journal.pone.0317929.t001
Table 2. Comparing domains mean score result between the Arabic and English responses [13].
Accuracy Mean Score        Arabic                              English
Highest domain             Physical Activity, 5.07 ± 0.83      Physical Activity, 5.56 ± 0.56
Lowest domain              Specialist Referral, 4.70 ± 1.02    Specialist Referral, 3.9 ± 1.44
Completeness Mean Score    Arabic                              English
Highest domain             Physical Activity, 2.43 ± 0.57      Physical Activity, 2.46 ± 0.5
Lowest domain              Specialist Referral, 2.20 ± 0.48    Specialist Referral, 1.73 ± 0.82
https://doi.org/10.1371/journal.pone.0317929.t002
Limitations
Ten experts in the field of MASLD conducted the ratings using Likert scales. However, it is important to note that such scales have limitations as they allow for partial accuracy ratings. This is unacceptable in the medical field as it can lead to misunderstandings and dangerous consequences for patients. Another limitation is the availability of new and potentially better versions of ChatGPT (ChatGPT 4), as the study used version 3.5. However, it should be noted that ChatGPT 4 is not freely accessible to patients and thus it is unlikely to be used any time soon. While a variety of large language models are accessible, including free options, our decision to employ ChatGPT was primarily driven by methodological consistency. To ensure a reliable comparison between English and Arabic responses, it was crucial to maintain a standardized approach. By utilizing the same AI tool, we could isolate the impact of language differences on the generated content. We acknowledge the rapid advancements in AI technology and the potential benefits of exploring diverse models which may improve in accuracy and cultural relevance. Future research endeavors will undoubtedly involve a comparative analysis of various AI tools to assess their relative strengths and weaknesses in different language contexts.
In addition, it is crucial to consider the impact of sociocultural factors on ChatGPT responses. The sociocultural background of the patient determines the tool’s capacity to offer culturally sensitive guidance, and patient preferences, health literacy levels, and cultural nuances may all affect how successful the responses are. Therefore, even if ChatGPT is a useful tool, it should be used with consideration for patients’ cultural diversity [20]. Chatbots also have other known limitations, including the risk of generating content that may not be grounded in evidence-based knowledge, a phenomenon known as ‘hallucinations’ [21]. Retrieval-augmented generation (RAG) is a potential method to address this issue. RAG combines the response-generating ability of AI-based chatbots with the ability to pull in verified information from external sources, resulting in more accurate and complete answers. There is a growing trend not only in acquiring information from AI-based apps and services but also in decision-making based on such information. Hence, the professional community should use AI responsibly by following the principles and ethics associated with it.
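The retrieval step of RAG mentioned above can be pictured with a toy sketch: select the verified passage that best matches the patient’s question and place it in the prompt sent to the chatbot. Everything here is hypothetical, including the function names, the word-overlap scoring, and the placeholder passages; practical systems typically use more sophisticated retrieval, and the call to the chatbot itself is omitted because the study names no specific interface for it.

```python
from typing import List, Set

PUNCTUATION = ".,;:?!()"


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and split it into a set of punctuation-free words."""
    words = (w.strip(PUNCTUATION).lower() for w in text.split())
    return {w for w in words if w}


def retrieve(question: str, passages: List[str], k: int = 1) -> List[str]:
    """Return the k passages sharing the most words with the question."""
    question_words = tokenize(question)
    ranked = sorted(
        passages,
        key=lambda p: len(question_words & tokenize(p)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question: str, passages: List[str]) -> str:
    """Combine the retrieved context and the question into one prompt string."""
    context = "\n".join(f"- {p}" for p in retrieve(question, passages))
    return (
        "Answer using only the verified context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )


# Placeholder passages standing in for a vetted knowledge source; not real guidance.
VERIFIED_PASSAGES = [
    "Placeholder verified text about physical activity.",
    "Placeholder verified text about specialist referral.",
]

if __name__ == "__main__":
    print(build_prompt("When is specialist referral needed?", VERIFIED_PASSAGES))
```

The point of the design is that the chatbot is asked to answer from vetted text rather than from its training data alone, which is how RAG aims to reduce hallucinated content.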
Conclusions
This study addresses the critical need for AI tools in the Arabic-speaking world, where the prevalence of MASLD is estimated to be higher than in Western countries [22]. Although our study confirms the promising results obtained by previous studies, the universal adoption of ChatGPT as a resource tool for MASLD patients is challenging [13,20]. The identified limitations highlight the need for continued improvement of AI models in healthcare settings. Such improvement requires collaboration between AI experts and healthcare professionals. While the study results show that the AI-generated responses are accurate and consistent, patients should be advised not to replace conventional doctor visits with these technologies, which serve as educational material for patients and are not a means of obtaining a medical diagnosis or consultation.
Supporting information
S1 Table. Accuracy Likert scale reference.
(DOCX)
S2 Table. Completeness Likert scale reference.
(DOCX)
S3 Table. Comprehensiveness Likert scale reference.
(DOCX)
S4 Table. Accuracy coded responses.
(DOCX)
S5 Table. Completeness coded responses.
(DOCX)
S6 Table. Completeness coded responses.
(DOCX)
S7 Table. Accuracy—Kendall’s tau analysis.
(DOCX)
S8 Table. Completeness Kendall’s tau.
(DOCX)
S9 Table. Comprehensiveness Kendall’s tau.
(DOCX)
S10 Table. Arabic questionnaire.
(DOCX)
S11 Table. ChatGPT generated responses.
(DOCX)
Author Contributions
Conceptualization: Alessio Aghemo, Nicola Pugliese, Cesare Hassan.
Data curation: Saad Alghamdi, Waleed Al-Hamoudi.
Formal analysis: Reem S. AlAhmed, Waleed S. AlOmaim, Khalid Ibrahim Bzeizi, Ali Albenmousa, Faisal A. Abaalkhail.
Supervision: Saleh A. Alqahtani.
Validation: Khalid Ibrahim Bzeizi.
Writing – original draft: Saleh A. Alqahtani, Reem S. AlAhmed, Alessio Aghemo, Nicola Pugliese, Cesare Hassan.
Writing – review & editing: Saleh A. Alqahtani, Waleed S. AlOmaim, Saad Alghamdi, Waleed Al-Hamoudi, Khalid Ibrahim Bzeizi, Ali Albenmousa, Alessio Aghemo, Nicola Pugliese, Cesare Hassan, Faisal A. Abaalkhail.
References
1. Chan WK, Chuah KH, Rajaram RB, Lim LL, Ratnasingam J, Vethakkan SR. Metabolic Dysfunction-Associated Steatotic Liver Disease (MASLD): A State-of-the-Art Review. J Obes Metab Syndr. 2023; 32(3):197–213. https://doi.org/10.7570/jomes23052 PMID: 37700494
2. Zelber-Sagi S, Ratziu V, Oren R. Nutrition and physical activity in NAFLD: An overview of the epidemiological evidence. World J Gastroenterol. 2011 Aug 7; 17(29):3377–89. https://doi.org/10.3748/wjg.v17.i29.3377 PMID: 21876630
3. Staufer K, Stauber RE. Steatotic Liver Disease: Metabolic Dysfunction, Alcohol, or Both? Biomedicines. 2023 Jul 26; 11(8):2108. https://doi.org/10.3390/biomedicines11082108 PMID: 37626604
4. Alqahtani SA, Broering DC, Alghamdi SA, Bzeizi KI, Alhusseini N, Alabbad SI, et al. Changing trends in liver transplantation indications in Saudi Arabia: from hepatitis C virus infection to nonalcoholic fatty liver disease. BMC Gastroenterol. 2021; 21(1):245. https://doi.org/10.1186/s12876-021-01828-z PMID: 34074270
5. Coker T, Saxton J, Retat L, Alswat K, Alghnam S, Al-Raddadi RM, et al. The future health and economic burden of obesity-attributable type 2 diabetes and liver disease among the working-age population in Saudi Arabia. PLoS One. 2022; 17(7):e0271108. https://doi.org/10.1371/journal.pone.0271108 PMID: 35834577
6. Golabi P, Paik JM, AlQahtani S, Younossi Y, Tuncer G, Younossi ZM. Burden of non-alcoholic fatty liver disease in Asia, the Middle East and North Africa: Data from Global Burden of Disease 2009–2019. J Hepatol. 2021; 75(4):795–809. https://doi.org/10.1016/j.jhep.2021.05.022 PMID: 34081959
7. Yin X, Guo X, Liu Z, Wang J. Advances in the Diagnosis and Treatment of Non-Alcoholic Fatty Liver Disease. Int J Mol Sci. 2023 Feb 2; 24(3):2844. https://doi.org/10.3390/ijms24032844 PMID: 36769165
8. Lazarus JV, Colombo M, Cortez-Pinto H, Huang TTK, Miller V, Ninburg M, et al. NAFLD—sounding the
alarm on a silent epidemic. Nat Rev Gastroenterol Hepatol. 2020 Jul; 17(7):377–9. https://doi.org/10.1038/s41575-020-0315-7 PMID: 32514153
9. Allen-Meares P, Lowry B, Estrella ML, Mansuri S. Health Literacy Barriers in the Health Care System: Barriers and Opportunities for the Profession. Health Soc Work. 2020 Jan 28; 45(1):62–4. https://doi.org/10.1093/hsw/hlz034 PMID: 31993624
10. Chakraborty C, Pal S, Bhattacharya M, Dash S, Lee SS. Overview of Chatbots with special emphasis on artificial intelligence-enabled ChatGPT in medical science. Front Artif Intell. 2023 Oct 31; 6:1237704. https://doi.org/10.3389/frai.2023.1237704 PMID: 38028668
11. Javaid M, Haleem A, Singh RP. ChatGPT for healthcare services: An emerging stage for an innovative perspective. BenchCouncil Trans Benchmarks Stand Eval. 2023 Feb 1; 3(1):100105.
12. Alowais SA, Alghamdi SS, Alsuhebany N, Alqahtani T, Alshaya AI, Almohareb SN, et al. Revolutionizing healthcare: the role of artificial intelligence in clinical practice. BMC Med Educ. 2023 Sep 22; 23(1):689. https://doi.org/10.1186/s12909-023-04698-z PMID: 37740191
13. Pugliese N, Wai-Sun Wong V, Schattenberg JM, Romero-Gomez M, Sebastiani G, NAFLD Expert Chatbot Working Group, et al. Accuracy, Reliability, and Comprehensibility of ChatGPT-Generated Medical Responses for Patients With Nonalcoholic Fatty Liver Disease. Clin Gastroenterol Hepatol Off Clin Pract J Am Gastroenterol Assoc. 2023 Sep 15; S1542-3565(23)00704–8. https://doi.org/10.1016/j.cgh.2023.08.033 PMID: 37716618
14. Le Berre C, Sandborn WJ, Aridhi S, et al. Application of Artificial Intelligence to Gastroenterology and Hepatology. Gastroenterology. 2020; 158(1):76–94.e2. https://doi.org/10.1053/j.gastro.2019.08.058 PMID: 31593701
15. Schattenberg JM, Chalasani N, Alkhouri N. Artificial Intelligence Applications in Hepatology. Clin Gastroenterol Hepatol. 2023; 21(8):2015–2025. https://doi.org/10.1016/j.cgh.2023.04.007 PMID: 37088460
16. Nam D, Chapiro J, Paradis V, Seraphin TP, Kather JN. Artificial intelligence in liver diseases: Improving diagnostics, prognostics and response prediction. JHEP Rep. 2022; 4(4):100443. Published 2022 Feb 2. https://doi.org/10.1016/j.jhepr.2022.100443 PMID: 35243281
17. Rinella ME, Neuschwander-Tetri BA, Siddiqui MS, et al. AASLD Practice Guidance on the clinical assessment and management of nonalcoholic fatty liver disease. Hepatology. 2023; 77(5):1797–1835. https://doi.org/10.1097/HEP.0000000000000323 PMID: 36727674
18. Pugliese N, Plaz Torres MC, Petta S, Valenti L, Giannini EG, Aghemo A. Is there an ’ideal’ diet for patients with NAFLD? Eur J Clin Invest. 2022; 52(3):e13659. https://doi.org/10.1111/eci.13659 PMID: 34309833
19. Balakrishnan M, Liu K, Schmitt S, et al. Behavioral weight-loss interventions for patients with NAFLD: A systematic scoping review. Hepatol Commun. 2023; 7(8):e0224. Published 2023 Aug 3. https://doi.org/10.1097/HC9.0000000000000224 PMID: 37534947
20. Pugliese N, Polverini D, Lombardi R, Pennisi G, Ravaioli F, Armandi A, et al.; NAFLD Expert Chatbot Working Group. Evaluation of ChatGPT as a Counselling Tool for Italian-Speaking MASLD Patients: Assessment of Accuracy, Completeness and Comprehensibility. J Pers Med. 2024; 14(6):568. https://doi.org/10.3390/jpm14060568 PMID: 38929789
21. Goddard J. Hallucinations in ChatGPT: A Cautionary Tale for Biomedical Researchers. Am J Med. 2023; 136(11):1059–1060. https://doi.org/10.1016/j.amjmed.2023.06.012 PMID: 37369274
22. Younossi ZM, Golabi P, Paik JM, Henry A, Van Dongen C, Henry L. The global epidemiology of nonalcoholic fatty liver disease (NAFLD) and nonalcoholic steatohepatitis (NASH): a systematic review. Hepatology. 2023; 77(4):1335–1347. https://doi.org/10.1097/HEP.0000000000000004 PMID: 36626630
... These conversational tools, such as ChatGPT, are trained on large language datasets and can generate new content by identifying and replicating patterns from their training data [17,38]. ChatGPT, for example, is based on OpenAI's Generative Pretrained Transformer (GPT) model and has demonstrated its effectiveness in providing answers to a wide range of queries in a variety of healthcare settings, including mental health support and chronic disease management [17,39]. Despite their potential, LLMs also raise concerns, especially around privacy, the adequacy of their training, and the reliability of their output [17,38]. ...
... Several studies have investigated the performance of ChatGPT in responding to MASLD-related queries [23,24,26,39]. A study involving ten key opinion leaders in the field of MASLD evaluated ChatGPT 3.5's responses to patient queries in English, focusing on accuracy, completeness, and comprehensibility, using three-and six-point Likert scales. ...
Article
Full-text available
Metabolic dysfunction-associated steatotic liver disease (MASLD) is emerging as a leading cause of chronic liver disease. In recent years, artificial intelligence (AI) has attracted significant attention in healthcare, particularly in diagnostics, patient management, and drug development, demonstrating immense potential for application and implementation. In the field of MASLD, substantial research has explored the application of AI in various areas, including patient counseling, improved patient stratification, enhanced diagnostic accuracy, drug development, and prognosis prediction. However, the integration of AI in hepatology is not without challenges. Key issues include data management and privacy, algorithmic bias, and the risk of AI-generated inaccuracies, commonly referred to as “hallucinations”. This review aims to provide a comprehensive overview of the applications of AI in hepatology, with a focus on MASLD, highlighting both its transformative potential and its inherent limitations.
Article
Artificial intelligence (AI) methods enable humans to analyse large amounts of data, which would otherwise not be feasibly quantifiable. This is especially true for unstructured visual and textual data, which can contain invaluable insights into disease. The hepatology research landscape is complex and has generated large amounts of data to be mined. Many open questions can potentially be addressed with existing data through AI methods. However, the field of AI is sometimes obscured by hype cycles and imprecise terminologies. This can conceal the fact that numerous hepatology research groups already use AI methods in their scientific studies. In this review article, we aim to assess the contemporaneous use of AI methods in hepatology in Europe. To achieve this, we systematically surveyed all scientific contributions presented at the EASL Congress 2024. Out of 1,857 accepted abstracts (1,712 posters and 145 oral presentations), 6 presentations (∼4%) and 69 posters (∼4%) utilised AI methods. Of these, 55 posters were included in this review, while the others were excluded due to missing posters or incomplete methodologies. Finally, we summarise current academic trends in the use of AI methods and outline future directions, providing guidance for scientific stakeholders in the field of hepatology.
Article
Background: Artificial intelligence (AI)-based chatbots have shown promise in providing counseling to patients with metabolic dysfunction-associated steatotic liver disease (MASLD). While ChatGPT3.5 has demonstrated the ability to comprehensively answer MASLD-related questions in English, its accuracy remains suboptimal. Whether language influences these results is unclear. This study aims to assess ChatGPT’s performance as a counseling tool for Italian MASLD patients. Methods: Thirteen Italian experts rated the accuracy, completeness and comprehensibility of ChatGPT3.5 in answering 15 MASLD-related questions in Italian using six-point accuracy, three-point completeness and three-point comprehensibility Likert scales. Results: Mean scores for accuracy, completeness and comprehensibility were 4.57 ± 0.42, 2.14 ± 0.31 and 2.91 ± 0.07, respectively. The physical activity domain achieved the highest mean scores for accuracy and completeness, whereas the specialist referral domain achieved the lowest. Overall, Fleiss’s coefficient of concordance for accuracy, completeness and comprehensibility across all 15 questions was 0.016, 0.075 and −0.010, respectively. Age and academic role of the evaluators did not influence the scores. The results were not significantly different from our previous study focusing on English. Conclusion: Language does not appear to affect ChatGPT’s ability to provide comprehensible and complete counseling to MASLD patients, but accuracy remains suboptimal in certain domains.
Article
The release of ChatGPT has prompted new thinking about AI-based chatbots and their applications and has drawn huge public attention worldwide. Over the past few months, researchers and doctors have started to consider the promise and application of AI-related large language models in medicine. This comprehensive review gives an overview of chatbots and ChatGPT and their current role in medicine. First, the general idea of chatbots, their evolution, architecture, and medical use are discussed. Second, ChatGPT is discussed with special emphasis on its application in medicine, architecture and training methods, medical diagnosis and treatment, and research ethical issues, and a comparison of ChatGPT with other NLP models is presented. The article also discusses the limitations and prospects of ChatGPT. In the future, these large language models and ChatGPT will have immense promise in healthcare. However, more research is needed in this direction.
Article
Metabolic dysfunction-associated steatotic liver disease (MASLD) is the latest term for steatotic liver disease associated with metabolic syndrome. MASLD is the most common cause of chronic liver disease and is the leading cause of liver-related morbidity and mortality. It is important that all stakeholders be involved in tackling the public health threat of obesity and obesity-related diseases, including MASLD. A simple and clear assessment and referral pathway using non-invasive tests is essential to ensure that patients with severe MASLD are identified and referred to specialist care, while patients with less severe disease remain in primary care, where they are best managed. While lifestyle intervention is the cornerstone of the management of patients with MASLD, cardiovascular disease risk must be properly assessed and managed because cardiovascular disease is the leading cause of mortality. No pharmacological agent has been approved for the treatment of MASLD, but novel anti-hyperglycemic drugs appear to have benefit. Medications used for the treatment of diabetes and other metabolic conditions may need to be adjusted as liver disease progresses to cirrhosis, especially decompensated cirrhosis. Based on non-invasive tests, the concepts of compensated advanced chronic liver disease and clinically significant portal hypertension provide a practical approach to stratifying patients according to the risk of liver-related complications and can help manage such patients. Finally, prevention and management of sarcopenia should be considered in the management of patients with MASLD.
Article
Introduction: Healthcare systems are complex and challenging for all stakeholders, but artificial intelligence (AI) has transformed various fields, including healthcare, with the potential to improve patient care and quality of life. Rapid AI advancements can revolutionize healthcare by integrating it into clinical practice. Reporting AI’s role in clinical practice is crucial for successful implementation by equipping healthcare providers with essential knowledge and tools. Research Significance: This review article provides a comprehensive and up-to-date overview of the current state of AI in clinical practice, including its potential applications in disease diagnosis, treatment recommendations, and patient engagement. It also discusses the associated challenges, covering ethical and legal considerations and the need for human expertise. By doing so, it enhances understanding of AI’s significance in healthcare and supports healthcare organizations in effectively adopting AI technologies. Materials and Methods: The current investigation analyzed the use of AI in the healthcare system with a comprehensive review of relevant indexed literature, such as PubMed/Medline, Scopus, and EMBASE, with no time constraints but limited to articles published in English. The focused question explores the impact of applying AI in healthcare settings and the potential outcomes of this application. Results: Integrating AI into healthcare holds excellent potential for improving disease diagnosis, treatment selection, and clinical laboratory testing. AI tools can leverage large datasets and identify patterns to surpass human performance in several healthcare aspects. AI offers increased accuracy, reduced costs, and time savings while minimizing human errors. It can revolutionize personalized medicine, optimize medication dosages, enhance population health management, establish guidelines, provide virtual health assistants, support mental health care, improve patient education, and influence patient-physician trust. Conclusion: AI can be used to diagnose diseases, develop personalized treatment plans, and assist clinicians with decision-making. Rather than simply automating tasks, AI is about developing technologies that can enhance patient care across healthcare settings. However, challenges related to data privacy, bias, and the need for human expertise must be addressed for the responsible and effective implementation of AI in healthcare.
Article
Background: Clinically significant weight loss, which requires sustained dietary and physical activity changes, is central to treating NAFLD. Although behavioral interventions have demonstrated effectiveness in promoting weight loss among primary prevention populations, the data are limited among patients with NAFLD who need weight loss for treatment. We undertook this scoping review to map the existing data on the characteristics, weight-loss outcomes, and determinants of success of interventions evaluated among patients with NAFLD. Methods: We searched Medline, EMBASE, Cochrane, PsycINFO, and Web of Science from inception to January 1, 2023 to identify publications reporting weight loss among adults with NAFLD in behavioral weight-loss interventions. We summarized interventions and classified them as successful if there was an average weight loss of ≥ 5% from baseline across enrolled participants or achieved by ≥ 50% of enrolled participants. Results: We included 28 studies: 10 randomized controlled trials, 10 quasi-experimental studies, and 8 observational studies. Intervention delivery, duration, and counseling frequency varied; 12 were successful. Retention was highest among telephone interventions and lowest among "real-world" face-to-face interventions. Patients who were women, younger, and/or had multiple metabolic conditions were most likely to drop out. Successful interventions had biweekly counseling, specific physical activity and calorie targets, and behavioral theory grounding, and promoted goal-setting, self-monitoring, and problem-solving. Conclusion: There are limited data on behavioral weight-loss interventions in NAFLD. Research is needed to develop effective interventions generalizable to diverse patient populations and that maximize adherence, particularly among patients who are diabetic, women, and younger.
Article
Non-alcoholic fatty liver disease (NAFLD) and alcohol-related liver disease (ALD), both of which fall under fatty liver disease (FLD), are among the most common chronic liver diseases globally, contributing to a substantial public health burden. NAFLD and ALD share a similar clinical presentation yet may differ in prognosis and treatment, which renders early and accurate diagnosis difficult but necessary. While NAFLD is the fastest increasing chronic liver disease, the prevalence of ALD has seemingly remained stable in recent years. Lately, the term steatotic liver disease (SLD) has been introduced, replacing FLD to reduce stigma. SLD is an overarching term that primarily comprises metabolic dysfunction-associated steatotic liver disease (MASLD), formerly known as non-alcoholic fatty liver disease (NAFLD), as well as alcohol-related liver disease (ALD), and MetALD, defined as a continuum across which the contribution of MASLD and ALD varies. The present review discusses current knowledge on common denominators of NAFLD/MASLD and ALD in order to highlight clinical and research needs to improve our understanding of SLD.
Article
Generative Pretrained Transformer, often known as GPT, is an innovative kind of Artificial Intelligence (AI) which can produce writing that seems to have been written by a person. OpenAI created this AI language model called ChatGPT. It is built using the GPT architecture and is trained on a large corpus of text data to respond to natural language inquiries that resemble a person’s requirements. This technology has many applications in healthcare. The need for accurate and current data is one of the major obstacles to adopting ChatGPT in healthcare. GPT must have access to precise and up-to-date medical data to provide trustworthy suggestions and treatment options. This might be accomplished by ensuring that the data used by GPT comes from reliable sources and is updated regularly. Since sensitive medical information would be involved, it will also be crucial to consider privacy and security issues while utilising GPT in the healthcare industry. This paper gives a brief account of ChatGPT and the need for it in healthcare, its significant workflow dimensions, and typical features of ChatGPT for the healthcare domain. Finally, it identifies and discusses significant applications of ChatGPT for healthcare. ChatGPT can comprehend the conversational context and provide contextually appropriate replies. Its effectiveness as a conversational AI tool makes it useful for chatbots, virtual assistants, and other applications. However, we see many limitations in medical ethics, data interpretation, accountability and other issues related to privacy. Regarding specialised tasks like text creation, language translation, text categorisation, text summarisation, and creating conversation systems, ChatGPT has been pre-trained on a large corpus of text data, and somewhat satisfactory results can be expected. Moreover, it can also be utilised for various Natural Language Processing (NLP) activities, including sentiment analysis, part-of-speech tagging, and named entity identification.
Article
Over the past two decades, the field of hepatology has witnessed major developments in diagnostic tools, prognostic models, and treatment options making it one of the most complex medical subspecialties. Through artificial intelligence (AI) and machine learning, computers are now able to learn from complex and diverse clinical datasets to solve real-world medical problems with performance that surpasses that of physicians in certain areas. AI algorithms are currently being implemented in liver imaging, interpretation of liver histopathology, noninvasive tests, prediction models and more. In this review, we provide a summary of the state of AI in hepatology and discuss current challenges for large-scale implementation including some ethical aspects. We would like to emphasize to the readers that most AI-based algorithms that will be discussed in this review are still considered in early development and their utility and impact on patient outcomes still need to be assessed in future large-scale and inclusive studies. Our vision is that the use of AI in hepatology will enhance physician performance, decrease the burden and time spent on documentation, and re-establish the personalized patient-physician relationship that is of utmost importance for obtaining good outcomes.