Article

Abstract

Animal owners may increasingly rely on large language models for gathering animal health information alongside internet sources in the future. This study therefore aims to provide initial results on the accuracy of ChatGPT-4o in triage and tentative diagnostics, using horses as a case study. Ten test vignettes were used to prompt situation assessments from the tool, which were then compared to original assessments made by a veterinary specialist for horses. The most probable diagnosis suggested by ChatGPT-4o was found to be quite accurate in most cases, although the urgency of contacting a veterinarian was sometimes assessed as higher than necessary. When provided with all relevant information, the tool does not seem to compromise horse health by recommending excessively long waiting times, although its potential to relieve veterinarians’ workload could still be improved.
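As a rough illustration of the evaluation workflow described above, the sketch below sends a hypothetical case vignette to GPT-4o through the OpenAI chat completions API and prints the model's answer next to an invented reference assessment. The prompt wording, the vignette, and the reference labels are assumptions for illustration only, not the study's actual material or scoring procedure.

```python
# Minimal sketch of a vignette-based triage/diagnosis check (hypothetical data;
# not the authors' prompts or scoring procedure).
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

# One invented vignette with an invented specialist reference assessment.
vignettes = [
    {
        "case": (
            "A 12-year-old warmblood gelding shows mild colic signs after feeding: "
            "pawing, flank watching, reduced gut sounds, normal temperature."
        ),
        "reference_diagnosis": "spasmodic/gas colic",
        "reference_urgency": "contact a veterinarian the same day",
    },
]

PROMPT = (
    "You are assisting a horse owner. Based on the following description, name the "
    "most probable diagnosis and state how urgently a veterinarian should be "
    "contacted (immediately / same day / within a few days / not needed).\n\n{case}"
)

for v in vignettes:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": PROMPT.format(case=v["case"])}],
        temperature=0,
    )
    # The study compared answers like this against the specialist's original
    # assessment; here both are simply printed for manual side-by-side review.
    print("Model answer:\n", response.choices[0].message.content)
    print("Reference:", v["reference_diagnosis"], "|", v["reference_urgency"])
```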

References
Article
Full-text available
Clinical decision-making is one of the most impactful parts of a physician’s responsibilities and stands to benefit greatly from artificial intelligence solutions and large language models (LLMs) in particular. However, while LLMs have achieved excellent performance on medical licensing exams, these tests fail to assess many skills necessary for deployment in a realistic clinical decision-making environment, including gathering information, adhering to guidelines, and integrating into clinical workflows. Here we have created a curated dataset based on the Medical Information Mart for Intensive Care database spanning 2,400 real patient cases and four common abdominal pathologies as well as a framework to simulate a realistic clinical setting. We show that current state-of-the-art LLMs do not accurately diagnose patients across all pathologies (performing significantly worse than physicians), follow neither diagnostic nor treatment guidelines, and cannot interpret laboratory results, thus posing a serious risk to the health of patients. Furthermore, we move beyond diagnostic accuracy and demonstrate that they cannot be easily integrated into existing workflows because they often fail to follow instructions and are sensitive to both the quantity and order of information. Overall, our analysis reveals that LLMs are currently not ready for autonomous clinical decision-making while providing a dataset and framework to guide future studies.
Article
Full-text available
Large language models (LLMs) are rapidly advancing and demonstrating high performance in understanding textual information, suggesting potential applications in interpreting patient histories and documented imaging findings. As LLMs continue to improve, their diagnostic abilities are expected to be enhanced further. However, there is a lack of comprehensive comparisons between LLMs from different manufacturers. In this study, we aimed to test the diagnostic performance of the three latest major LLMs (GPT-4o, Claude 3 Opus, and Gemini 1.5 Pro) using Radiology Diagnosis Please Cases, a monthly diagnostic quiz series for radiology experts. Clinical history and imaging findings, provided textually by the case submitters, were extracted from 324 quiz questions originating from Radiology Diagnosis Please cases published between 1998 and 2023. The top three differential diagnoses were generated by GPT-4o, Claude 3 Opus, and Gemini 1.5 Pro, using their respective application programming interfaces. A comparative analysis of diagnostic performance among these three LLMs was conducted using Cochran’s Q and post hoc McNemar’s tests. The respective diagnostic accuracies of GPT-4o, Claude 3 Opus, and Gemini 1.5 Pro for primary diagnosis were 41.0%, 54.0%, and 33.9%, which further improved to 49.4%, 62.0%, and 41.0%, when considering the accuracy of any of the top three differential diagnoses. Significant differences in the diagnostic performance were observed among all pairs of models. Claude 3 Opus outperformed GPT-4o and Gemini 1.5 Pro in solving radiology quiz cases. These models appear capable of assisting radiologists when supplied with accurate evaluations and worded descriptions of imaging findings.
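The statistical workflow described above (per-case binary correctness for three models, an omnibus Cochran’s Q test, and post hoc pairwise McNemar tests) can be outlined as follows. The per-case results are simulated from the reported accuracies purely for illustration; they are not the study's data.

```python
# Sketch of the omnibus and pairwise comparisons on binary per-case correctness.
import numpy as np
from statsmodels.stats.contingency_tables import cochrans_q, mcnemar

rng = np.random.default_rng(0)
n_cases = 324
names = ["GPT-4o", "Claude 3 Opus", "Gemini 1.5 Pro"]

# Simulated correctness of the primary diagnosis (1 = correct), one column per model.
results = np.column_stack([
    rng.binomial(1, 0.41, n_cases),
    rng.binomial(1, 0.54, n_cases),
    rng.binomial(1, 0.34, n_cases),
])

# Omnibus test across all three models.
print(cochrans_q(results))

# Post hoc pairwise McNemar tests on the 2x2 agreement/disagreement tables.
for i in range(3):
    for j in range(i + 1, 3):
        a, b = results[:, i], results[:, j]
        table = [
            [np.sum((a == 1) & (b == 1)), np.sum((a == 1) & (b == 0))],
            [np.sum((a == 0) & (b == 1)), np.sum((a == 0) & (b == 0))],
        ]
        print(f"{names[i]} vs {names[j]}:", mcnemar(table, exact=True))
```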
Article
Full-text available
ChatGPT, the most accessible generative artificial intelligence (AI) tool, offers considerable potential for veterinary medicine, yet a dedicated review of its specific applications is lacking. This review concisely synthesizes the latest research and practical applications of ChatGPT within the clinical, educational, and research domains of veterinary medicine. It intends to provide specific guidance and actionable examples of how generative AI can be directly utilized by veterinary professionals without a programming background. For practitioners, ChatGPT can extract patient data, generate progress notes, and potentially assist in diagnosing complex cases. Veterinary educators can create custom GPTs for student support, while students can utilize ChatGPT for exam preparation. ChatGPT can aid in academic writing tasks in research, but veterinary publishers have set specific requirements for authors to follow. Despite its transformative potential, careful use is essential to avoid pitfalls like hallucination. This review addresses ethical considerations, provides learning resources, and offers tangible examples to guide responsible implementation. A table of key takeaways was provided to summarize this review. By highlighting potential benefits and limitations, this review equips veterinarians, educators, and researchers to harness the power of ChatGPT effectively.
Article
Full-text available
The integration of artificial intelligence (AI) into health care has seen remarkable advancements, with applications extending to animal health. This article explores the potential benefits and challenges associated with employing AI chatbots as tools for pet health care. Focusing on ChatGPT, a prominent language model, the authors elucidate its capabilities and its potential impact on pet owners' decision‐making processes. AI chatbots offer pet owners access to extensive information on animal health, research studies and diagnostic options, providing a cost‐effective and convenient alternative to traditional veterinary consultations. The outcome of a case involving a Border Collie named Sassy demonstrates the potential benefits of AI in veterinary medicine. In this instance, ChatGPT played a pivotal role in suggesting a diagnosis that led to successful treatment, showcasing the potential of AI chatbots as valuable tools in complex cases. However, concerns arise regarding pet owners relying solely on AI chatbots for medical advice, potentially resulting in misdiagnosis, inappropriate treatment and delayed professional intervention. We emphasize the need for a balanced approach, positioning AI chatbots as supplementary tools rather than substitutes for licensed veterinarians. To mitigate risks, the article proposes strategies such as educating pet owners on AI chatbots' limitations, implementing regulations to guide AI chatbot companies and fostering collaboration between AI chatbots and veterinarians. The intricate web of responsibilities in this dynamic landscape underscores the importance of government regulations, the educational role of AI chatbots and the symbiotic relationship between AI technology and veterinary expertise. In conclusion, while AI chatbots hold immense promise in transforming pet health care, cautious and informed usage is crucial. By promoting awareness, establishing regulations and fostering collaboration, the article advocates for a responsible integration of AI chatbots to ensure optimal care for pets.
Article
Full-text available
Introduction: One way to support veterinarians in times of a vet shortage is to provide animal owners with technical decision support for deciding whether their animal needs to be seen by a vet. As the first step in the user-centered development of such an mHealth application for equestrians, an analysis of the context of use was carried out. Methods: The analysis was based on a review of existing literature and an online survey with 100 participants. Results: Characteristics of the user group and the usage context are presented using an adaptation of the four layers of diversity. Many equestrians lack health-related knowledge and competencies as well as social networks that support them in decision making and in gaining further information. This may apply broadly to owners of other animal species as well. Conclusion: The results of the analysis help software developers and researchers working on mHealth applications for pet owners in general, and equestrians in particular, to focus their work on users' needs and thus deliver effective software.
Preprint
Full-text available
Importance: Artificial intelligence (AI) applications in health care have been effective in many areas of medicine, but they are often trained for a single task using labeled data, making deployment and generalizability challenging. Whether a general-purpose AI language model can perform diagnosis and triage is unknown. Objective: Compare the general-purpose Generative Pre-trained Transformer 3 (GPT-3) AI model's diagnostic and triage performance to attending physicians and lay adults who use the Internet. Design: We compared the accuracy of GPT-3's diagnostic and triage ability for 48 validated case vignettes of both common (e.g., viral illness) and severe (e.g., heart attack) conditions to lay people and practicing physicians. Finally, we examined how well calibrated GPT-3's confidence was for diagnosis and triage. Setting and Participants: The GPT-3 model, a nationally representative sample of lay people, and practicing physicians. Exposure: Validated case vignettes (<60 words; <6th grade reading level). Main Outcomes and Measures: Correct diagnosis, correct triage. Results: Among all cases, GPT-3 replied with the correct diagnosis in its top 3 for 88% (95% CI, 75% to 94%) of cases, compared to 54% (95% CI, 53% to 55%) for lay individuals (p<0.001) and 96% (95% CI, 94% to 97%) for physicians (p=0.0354). GPT-3 triaged (71% correct; 95% CI, 57% to 82%) similarly to lay individuals (74%; 95% CI, 73% to 75%; p=0.73); both were significantly worse than physicians (91%; 95% CI, 89% to 93%; p<0.001). As measured by the Brier score, GPT-3 confidence in its top prediction was reasonably well-calibrated for diagnosis (Brier score = 0.18) and triage (Brier score = 0.22). Conclusions and Relevance: A general-purpose AI language model without any content-specific training could perform diagnosis at levels close to, but below, physicians and better than lay individuals. The model performed less well on triage, where its performance was closer to that of lay individuals.
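For context, the Brier score mentioned above is simply the mean squared difference between the model's stated confidence in its top prediction and the binary outcome (1 if that prediction was correct, 0 otherwise), so values closer to 0 indicate better calibration. A minimal sketch with invented numbers:

```python
# Brier score for binary outcomes: mean((confidence - outcome)^2).
# The confidences and outcomes below are invented for illustration.
import numpy as np

confidence = np.array([0.9, 0.6, 0.8, 0.4, 0.7])  # stated probability of top answer
correct = np.array([1, 1, 0, 0, 1])               # was the top answer right?

brier = np.mean((confidence - correct) ** 2)
print(f"Brier score: {brier:.2f}")  # lower is better; always answering 0.5 gives 0.25
```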
Chapter
Full-text available
Applied Research in Business Informatics 2022: Proceedings of the 35th Annual Conference of the Arbeitskreis Wirtschaftsinformatik at Universities of Applied Sciences in German-speaking Countries (AKWI), held from 11 to 13 September 2022 and hosted by the Hochschule für Technik und Wirtschaft Berlin (HTW Berlin) and the Hochschule für Wirtschaft und Recht Berlin (HWR Berlin)
Article
The internet has been found to be a popular source for human health information. However, there is a lack of information on pet owners’ use of the internet to source pet health information and implications for the owner–veterinarian relationship. Therefore, the aim of this study was to address this gap in knowledge by focusing on UK pet owners’ general use of the internet to find online pet health information and the impact of this behaviour on the owner–veterinarian relationship. An online survey targeting UK pet owners resulted in 571 respondents. Respondents reported the most frequently used source for pet health information was the internet (78.6 per cent), followed by their veterinarian (72 per cent). Veterinarians and other pet owners, however, were rated as the most trustworthy sources. The topics searched for most often online were specific medical problems (61.3 per cent) and diet/nutrition (58.5 per cent). Regarding the owner–veterinarian relationship, 42.1 per cent of participants reported discussing information they found online ‘sometimes’ with their veterinarian. When asked if their veterinarian recommended specific websites, nearly half (49.6 per cent) stated that their veterinarian ‘never’ made such recommendations, yet over 90 per cent said they would visit veterinarian-recommended websites.
Burtsev M, Reeves M, Job A. The Working Limitations of Large Language Models. MIT SMR. 2023 Nov 30;(Winter 2024):2-4.
Abani S, De Decker S, Tipold A, Nessler JN, Volk HA. Can ChatGPT diagnose my collapsing dog? Front Vet Sci. 2023 Oct 10;10:1245168. doi: 10.3389/fvets.2023.1245168.
Pferdeklinik Aschheim. Husten bei Pferden: Akute und chronische Atemwegserkrankungen [Coughing in horses: acute and chronic respiratory diseases] [Internet]. Pferdeklinik Aschheim. [cited 2024 Jun 11]. Available from: https://www.pferdeklinikaschheim.de/husten-bei-pferden/