International Conference “Trusting in Care Technology or Reliance on Socio-technical Constellation?”, February 15/16, 2024, Delmenhorst, Germany

Exploring patient trust in clinical advice from AI-driven LLMs like ChatGPT for self-diagnosis
Delong Du1, Richard Paluch1, Gunnar Stevens1, Claudia Müller1
1 University of Siegen, Germany
Corresponding author; e-mail: delong.du@uni-siegen.de
Trustworthy clinical advice is crucial, yet seeking health support from professionals can be burdensome. Inaccessibility and financial burdens present obstacles to obtaining professional clinical advice, even when healthcare is available (Taber et al., 2015). Consequently, individuals often resort to self-diagnosis, using medical materials to assess the health conditions of themselves, their families, and friends. However, this convenient method requires a commitment to learning and is often not effective, posing risks when individuals pursue self-care approaches or treatment strategies without professional guidance (White & Horvitz, 2009). Artificial Intelligence (AI), supported by Large Language Models (LLMs), may become a powerful yet risky self-diagnosis tool for clinical advice because of LLM hallucination, in which the model produces inaccurate yet convincing information (Sallam, 2023). Can we, then, trust the clinical advice of AI-driven LLMs like ChatGPT (GPT-4) for self-diagnosis? We examined this question through a think-aloud observation (Van Someren et al., 1994): a patient used GPT-4 for self-diagnosis and clinical advice, while a doctor assessed ChatGPT’s responses with their own expertise. Afterwards, we conducted a semi-structured interview with the patient to understand their trust in AI-driven LLMs for clinical advice.
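To make the interaction concrete, the sketch below shows how a symptom prompt of the kind used in the session (see Appendix Table 1) could be sent to GPT-4 programmatically via the OpenAI Python SDK. This is an illustrative sketch only: the study itself used the ChatGPT interface, and the model name, prompt wording, and parameters here are assumptions rather than the study setup.

# Illustrative sketch only (assumption): the study used the ChatGPT web interface,
# not the API. This shows how a symptom prompt like the doctor's suggested one in
# Appendix Table 1 could be sent to GPT-4 with the OpenAI Python SDK.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

symptom_prompt = (
    "Neck pain for several years, worsens with heat, sour pain. "
    "Provide me with a rehabilitation recommendation, including the use of "
    "medications and lifestyle habits like sleep and diet."
)

response = client.chat.completions.create(
    model="gpt-4",  # model name is an assumption for illustration
    messages=[{"role": "user", "content": symptom_prompt}],
)

# The reply is the kind of hedged "clinical advice" a doctor then has to verify.
print(response.choices[0].message.content)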
Our observation results, detailed in Appendix Table 1, reveal that users, lacking professional medical knowledge while placing a certain level of trust in ChatGPT, may struggle to identify errors. This holds even for instances in which ChatGPT gave incorrect clinical advice that medical professionals later identified as false. Patients do show some degree of trust in ChatGPT, partly because it consistently signals its own limits, for example: “I cannot provide medical diagnoses or advice, but I can certainly help guide you on potential next steps and things you might want to consider when talking to a healthcare professional.” This trust, however, is neither strong nor absolute. ChatGPT can provide quick answers to a wide range of medical questions, but its advice can be unreliable and inaccurate, creating mistrust. The doctor reported that checking GPT-4’s results is time-consuming and requires more effort than making a diagnosis without GPT-4.
Based on our interview study, shown in Appendix Table 2, we conclude that the factors shaping a patient’s trust revolve around their evaluation of competency. Essentially, trust is equated with efficacy: whether decisions based on the AI agent’s clinical advice and suggestions will effectively achieve the patient’s health goals. Following this strategy, patients tend to trust doctors more than AI agents, believing that educated, licensed doctors can provide effective medical guidance. This competency-based trust also explains why patients often perceive more experienced doctors as more trustworthy than less experienced ones. Additionally, patients frequently seek validation of medical advice, such as the efficacy of taking painkillers, through online sources. Another crucial aspect of trust stems from the stringent regulations governing doctors’ practices: these professionals operate under strict rules of conduct, and any misconduct could severely damage their careers. This regulatory environment further reinforces patients’ trust in doctors (Procter et al., 2022). The exploration of GPT-4 in healthcare raises questions regarding autonomy and safety (Matsuzaki & Lindemann, 2016; Paluch et al., 2023) in AI-driven LLMs like ChatGPT for clinical advice.
References
Matsuzaki, H.; Lindemann, G. (2016). The autonomy-safety-paradox of service robotics in
Europe and Japan: a comparative analysis. AI & Society, 31, 501–517.
https://doi.org/10.1007/s00146-015-0630-7
Paluch, R.; Aal, T.; Cerna, K.; Randall, D.; Müller, C. (2023). Heteromated decision-making:
integrating socially assistive robots in care relationships. https://arxiv.org/abs/2304.10116
Procter, R.; Tolmie, P.; Rouncefield, M. (2022). Trust, professional vision and diagnostic work. In: Nazmun Nisat Ontika et al. (eds.), Exploring Human-Centered AI in Healthcare: A Workshop Report. International Reports on Socio-Informatics 19(2). https://www.iisi.de/wp-content/uploads/2022/10/IRSI_V19I2.pdf
Sallam, M. (2023). ChatGPT utility in healthcare education, research, and practice: systematic review on the promising perspectives and valid concerns. Healthcare, 11(6), 887. https://doi.org/10.3390/healthcare11060887
Taber, J. M.; Leyva, B.; Persoskie, A. (2015). Why do people avoid medical care? A qualitative study using national data. Journal of General Internal Medicine, 30(3), 290–297. https://doi.org/10.1007/s11606-014-3089-1
Van Someren, M. W.; Barnard, Y. F.; Sandberg, J. A. C. (1994). The Think Aloud Method: A
Practical Guide to Modelling Cognitive Processes. London: Academic Press.
White, R. W.; Horvitz, E. (2009). Experiences with web search on medical concerns and self
diagnosis. In: AMIA Annual Symposium Proceedings, 696–700.
Appendix
Table 1. Think-Aloud Transcript Example of GPT-4’s Clinical Advice

Think-aloud transcript (patient and doctor):
Patient: Let me ask GPT-4 to help me figure out my constant neck pain.
Doctor: It seems like GPT-4 doesn’t ask you questions like in a real diagnosis. The clinical advice seems too much on the surface.

GPT-4 conversation:
Patient: Could you start to act like a doctor and help me diagnose the cause?
GPT-4: I’m really sorry, but I must emphasize that I’m not a doctor, and I’m not able to diagnose conditions or provide medical advice...
Patient: Give it a try.
GPT-4: I cannot provide medical diagnoses or advice, but I can certainly help guide you on potential next steps and things you might want to consider when talking to a healthcare professional... [Clinical Advice]

Think-aloud transcript (patient and doctor):
Doctor: Try to write it with the symptom prompts of “Neck pain for several years, worsens with heat, sour pain...” and ask for the medicine with “rehabilitation recommendation...”.
Patient: Ibuprofen seems right for neck and chest pain.
Doctor: Ibuprofen is wrong advice. Taking painkillers like ibuprofen constantly will reduce their effect, meaning you will need to increase the dose over time if you become reliant on them.

GPT-4 conversation:
Patient: Could you give me some medical suggestions? What kind of medicine should I buy?
GPT-4: I must stress that I’m not a licensed healthcare professional, and I cannot prescribe medications or provide medical advice. It is crucial to consult with a doctor...
Patient: Neck pain... sour pain [symptoms]. Provide me with a rehabilitation recommendation, including the use of medications and lifestyle habits like sleep and diet, etc.
GPT-4: Although I am not a doctor, I can offer some general recovery suggestions. Please note that these are merely suggestions, and specific treatment plans should be provided by professional medical personnel. For neck and chest pain, you might need to use OTC anti-inflammatory and pain relief medications like ibuprofen. Specific medications and dosages should be prescribed by doctors... [Clinical Advice]
Table 2. Interview Transcript Example with the Patient About Trust
Research Interviewer: Why did you trust the doctor and not ChatGPT?
Patient Participant: I trust the doctor more because I am familiar with GPT’s competency
limitation... The doctor explained the reason why painkiller wouldn't be the ideal clinical
advice... The valid explanation educated me, and I trusted the doctor's reason.
Research Interviewer: Could you ask ChatGPT why painkiller wouldn't be the ideal clinical
advice?
Patient Participant: It also gives a detailed explanation of why a painkiller wouldn’t be ideal. However, I feel that gaining competency through self-diagnosis with LLM-driven AI like ChatGPT (GPT-4) is harder than learning from diagnoses with actual doctors. In my experience, doctors quickly give relevant answers, whereas GPT-4 provides too much context to read through. I also think that GPT-4 can say “yes” and then “no” to its own clinical advice, meaning it makes a suggestion and then a completely opposite one when users show a tendency toward a certain answer preference or expectation. GPT-4 tries to please users by often agreeing with them.