Article

Is This Chatbot Safe and Evidence-Based? A Call for the Critical Evaluation of Gen AI Mental Health Chatbots (Preprint)

Authors:
  • Happify.com

No full-text available. To read the full text of this research, you can request a copy directly from the authors.

ResearchGate has not been able to resolve any citations for this publication.
Article
Full-text available
Smartphone applications are one of the main delivery modalities in digital health. Many of these mHealth apps use gamification to engage users, improve user experience, and achieve better health outcomes. Yet, it remains unclear whether gamified approaches help to deliver effective, safe, and clinically beneficial products to users. This study examines the compliance of 69 gamified mHealth apps with the EU Medical Device Regulation and assesses the specific risks arising from the gamified nature of these apps. Of the identified apps, 32 (46.4%) were considered non-medical devices; seven (10.1%) were already cleared/approved by the regulatory authorities, and 31 (44.9%) apps were assessed as likely or potentially non-compliant with regulatory requirements. These applications, together with one approved application, were assessed as being on the market without the required regulatory approvals. According to our analysis, a higher proportion of these apps would be classified as medical devices in the US. The level of risk posed by gamification remains ambiguous. While most apps showed only a weak link between the degree of gamification and potential risks, this link was stronger for those apps with a high degree of gamification or an immersive game experience.
Article
Full-text available
This study conducted an in-depth analysis of responses generated by ChatGPT, a popular language model-based chat application, in the context of mental health conversations. The analysis encompassed three key dimensions: word count, sentiment analysis, and response quality. ChatGPT consistently outperformed the dataset, delivering longer responses with an average word count of 352, compared to the dataset’s 70. This raises questions about the depth of information conveyed. ChatGPT responses demonstrated an average sentiment score of 0.53, surpassing the dataset’s average of 0.23. This heightened sentiment in ChatGPT responses, coupled with their more extensive word count, suggests a potential for enhanced emotional engagement and nuanced communication, which is particularly crucial in mental health support scenarios. Finally, ChatGPT responses incorporated more relevant information, showed a higher degree of empathy toward user queries, and achieved elevated sentiment scores, positioning ChatGPT as a potentially effective mental health assistant. These findings offer valuable insights into the potential role of ChatGPT as a mental health assistant. However, it is imperative to emphasize the need for further research employing larger and more diverse datasets to comprehensively assess its practical applicability. Additionally, ethical considerations surrounding chatbots in mental health support warrant careful examination in future studies.
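The comparison described above rests on two simple descriptive metrics: average word count and an average sentiment score (the reported values of 0.53 versus 0.23 suggest a scale of roughly -1 to 1). The study does not specify its tooling, so the sketch below uses NLTK's VADER analyzer and a pair of hypothetical replies purely to illustrate how such a comparison might be computed; it is not the authors' pipeline.

```python
# Minimal sketch, not the authors' pipeline: compares average response length
# and average sentiment between a reference dataset and chatbot-generated replies.
# VADER (via NLTK) is a stand-in; the study does not name its sentiment tool.
from statistics import mean

import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)
analyzer = SentimentIntensityAnalyzer()


def summarize(responses):
    """Return (average word count, average VADER compound score in [-1, 1])."""
    word_counts = [len(text.split()) for text in responses]
    compounds = [analyzer.polarity_scores(text)["compound"] for text in responses]
    return mean(word_counts), mean(compounds)


# Hypothetical example replies; the study used real mental health conversations.
dataset_replies = ["That sounds really hard. Have you been able to talk to anyone about it?"]
chatbot_replies = ["I'm sorry you're going through this. It may help to reach out to someone you trust and to consider speaking with a professional."]

for label, replies in (("dataset", dataset_replies), ("ChatGPT", chatbot_replies)):
    words, sentiment = summarize(replies)
    print(f"{label}: avg words = {words:.0f}, avg sentiment = {sentiment:.2f}")
```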
Article
Full-text available
Objective: Smartphone apps (apps) are widely recognised as promising tools for improving access to mental healthcare. However, a key challenge is the development of digital interventions that are acceptable to end users. Co-production with providers and stakeholders is increasingly positioned as the gold standard for improving uptake, engagement, and healthcare outcomes. Nevertheless, clear guidance around the process of co-production is lacking. The objectives of this review were to: (i) present an overview of the methods and approaches to co-production when designing, producing, and evaluating digital mental health interventions; and (ii) explore the barriers and facilitators affecting co-production in this context. Methods: A pre-registered (CRD42023414007) systematic review was completed in accordance with the Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) guidelines. Five databases were searched. A co-produced bespoke quality appraisal tool was developed with an expert by experience to assess the quality of the co-production methods and approaches. A narrative synthesis was conducted. Results: Twenty-six studies across 24 digital mental health interventions met inclusion criteria. App interventions were rarely co-produced with end users throughout all stages of design, development, and evaluation. Co-producing digital mental health interventions added value by creating culturally sensitive and acceptable interventions. Reported challenges included resource issues exacerbated by the digital nature of the intervention, variability across stakeholder suggestions, and power imbalances between stakeholders and researchers. Conclusions: Variation in approaches to co-producing digital mental health interventions is evident, with inconsistencies in the stakeholder groups involved, the stage of involvement, stakeholders’ roles, and the methods employed.
Article
Full-text available
Background: Chatbots are an emerging technology that shows potential for mental health care apps to enable effective and practical evidence-based therapies. As this technology is still relatively new, little is known about recently developed apps, their characteristics, and their effectiveness. Objective: In this study, we aimed to provide an overview of popular commercially available mental health chatbots and how they are perceived by users. Methods: We conducted an exploratory observation of 10 apps that offer support and treatment for a variety of mental health concerns with a built-in chatbot feature and qualitatively analyzed 3621 consumer reviews from the Google Play Store and 2624 consumer reviews from the Apple App Store. Results: We found that although chatbots' personalized, humanlike interactions were positively received by users, improper responses and assumptions about the personalities of users led to a loss of interest. Because chatbots are always accessible and convenient, users can become overly attached to them and prefer them over interacting with friends and family. Furthermore, a chatbot may offer crisis care whenever the user needs it because of its 24/7 availability, but even recently developed chatbots lack the ability to properly identify a crisis. The chatbots considered in this study fostered a judgment-free environment and helped users feel more comfortable sharing sensitive information. Conclusions: Our findings suggest that chatbots have great potential to offer social and psychological support in situations where real-world human interaction, such as connecting to friends or family members or seeking professional support, is not preferred or possible. However, these chatbots must establish several restrictions and limitations appropriate to the level of service they offer. Too much reliance on technology can pose risks, such as isolation and insufficient assistance during times of crisis. Based on these insights, we outline recommendations for customization and balanced persuasion to inform the design of effective chatbots for mental health support.
Article
Full-text available
In supervised learning model development, domain experts are often used to provide the class labels (annotations). Annotation inconsistencies commonly occur even when highly experienced clinical experts annotate the same phenomenon (e.g., medical image, diagnostics, or prognostic status), due to inherent expert bias, judgments, and slips, among other factors. While their existence is relatively well-known, the implications of such inconsistencies are largely understudied in real-world settings where supervised learning is applied to such 'noisy' labelled data. To shed light on these issues, we conducted extensive experiments and analyses on three real-world Intensive Care Unit (ICU) datasets. Specifically, individual models were built from a common dataset annotated independently by 11 ICU consultants at the Queen Elizabeth University Hospital, Glasgow, and model performance estimates were compared through internal validation (Fleiss' κ = 0.383, i.e., fair agreement). Further, broad external validation (on both static and time series datasets) of these 11 classifiers was carried out on the HiRID external dataset, where the models' classifications were found to have low pairwise agreement (average Cohen's κ = 0.255, i.e., minimal agreement). Moreover, they tended to disagree more on making discharge decisions (Fleiss' κ = 0.174) than on predicting mortality (Fleiss' κ = 0.267). Given these inconsistencies, further analyses were conducted to evaluate current best practices for obtaining gold-standard models and determining consensus. The results suggest that: (a) there may not always be a "super expert" in acute clinical settings (using internal and external validation model performances as a proxy); and (b) standard consensus seeking (such as majority vote) consistently leads to suboptimal models. Further analysis, however, suggests that assessing annotation learnability and using only 'learnable' annotated datasets for determining consensus achieves optimal models in most cases.
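The agreement statistics quoted above (Fleiss' κ across all annotators and average pairwise Cohen's κ) are standard inter-rater measures that can be computed with common libraries. The sketch below uses statsmodels and scikit-learn on a small hypothetical label matrix, not the ICU data, simply to show how such figures are typically obtained.

```python
# Minimal sketch of the agreement measures cited in the abstract: Fleiss' kappa
# over all annotators and the average pairwise Cohen's kappa. The label matrix
# below is hypothetical (rows = patients, columns = annotators, 0/1 classes);
# it is not the ICU data used in the study.
from itertools import combinations

import numpy as np
from sklearn.metrics import cohen_kappa_score
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

annotations = np.array([
    [1, 1, 0],
    [0, 0, 0],
    [1, 0, 1],
    [0, 1, 0],
    [1, 1, 1],
])

# Fleiss' kappa: convert raw labels to per-subject category counts first.
counts, _categories = aggregate_raters(annotations)
print("Fleiss' kappa:", round(fleiss_kappa(counts), 3))

# Average pairwise Cohen's kappa between annotators (columns).
pairs = combinations(range(annotations.shape[1]), 2)
pairwise = [cohen_kappa_score(annotations[:, i], annotations[:, j]) for i, j in pairs]
print("Mean pairwise Cohen's kappa:", round(float(np.mean(pairwise)), 3))
```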
Article
Full-text available
https://www.jmir.org/2021/3/e24387 Background: Digital mental health interventions (DMHIs), which deliver mental health support via technologies such as mobile apps, can increase access to mental health support, and many studies have demonstrated their effectiveness in improving symptoms. However, user engagement varies with regard to users’ uptake of and sustained interaction with these interventions. Objective: This systematic review aims to identify common barriers and facilitators that influence user engagement with DMHIs. Methods: A systematic search was conducted in the SCOPUS, PubMed, PsycINFO, Web of Science, and Cochrane Library databases. Empirical studies that report qualitative and/or quantitative data were included. Results: A total of 208 articles met the inclusion criteria. The included articles used a variety of methodologies, including interviews, surveys, focus groups, workshops, field studies, and analysis of user reviews. Factors extracted for coding were related to the end user, the program or content offered by the intervention, and the technology and implementation environment. Common barriers included severe mental health issues that hampered engagement, technical issues, and a lack of personalization. Common facilitators were social connectedness facilitated by the intervention, increased insight into health, and a feeling of being in control of one’s own health. Conclusions: Although previous research suggests that DMHIs can be useful in supporting mental health, contextual factors are important determinants of whether users actually engage with these interventions. The factors identified in this review can provide guidance when evaluating DMHIs to help explain and understand user engagement and can inform the design and development of new digital interventions.
Article
Full-text available
The present review is a comprehensive examination of the therapist's personal attributes and in-session activities that positively influence the therapeutic alliance from a broad range of psychotherapy perspectives. Therapist's personal attributes such as being flexible, honest, respectful, trustworthy, confident, warm, interested, and open were found to contribute positively to the alliance. Therapist techniques such as exploration, reflection, noting past therapy success, accurate interpretation, facilitating the expression of affect, and attending to the patient's experience were also found to contribute positively to the alliance. This review reveals how these therapist personal qualities and techniques have a positive influence on the identification or repair of ruptures in the alliance.
Article
Mental health services across the globe are overburdened due to increased patient need for psychological therapies and a shortage of qualified mental health practitioners. This is unlikely to change in the short-to-medium term. Digital support is urgently needed to facilitate access to mental healthcare while creating efficiencies in service delivery. In this paper, we evaluate the use of a conversational artificial intelligence (AI) solution (Limbic Access) to assist both patients and mental health practitioners with referral, triage, and clinical assessment of mild-to-moderate adult mental illness. Assessing this solution in the context of England’s National Health Service (NHS) Talking Therapies services, we demonstrate in a cohort study design that deploying such an AI solution is associated with improved recovery rates. We find that those NHS Talking Therapies services that introduced the conversational AI solution improved their recovery rates, while comparable NHS Talking Therapies services across the country reported deteriorating recovery rates during the same time period. Further, we provide an economic analysis indicating that the usage of this AI solution can be highly cost-effective relative to other methods of improving recovery rates. Together, these results highlight the potential of AI solutions to support mental health services in the delivery of quality care in the context of worsening workforce supply and system overburdening. For transparency, the authors of this paper declare our conflict of interest as employees and shareholders of Limbic Access, the AI solution referred to in this paper.
Chapter
As of 2023, chatbots and AI are becoming rather ubiquitous in the world of medicine. Gone is the awry advice of Dr. Sbaitso, a doctor in name only (as examined in Chap. 4). In this chapter we’ll explore how modern AI technology is used by the healthcare industry, covering both chatbots and AI in general. Part of the chapter is dedicated to the numerous ethical implications that inevitably arise when AI and medicine come together.