Mental health is a growing concern across demographics, with one in five adults (National Institute of Mental Health, 2022) and one in seven children aged three to seventeen (Centers for Disease Control and Prevention, 2023) in the United States diagnosed with a mental health condition. Despite this prevalence, access to mental health support remains limited for many people, a
challenge exacerbated by the pandemic (Lattie, 2022). In recent years, AI chatbots have emerged as a potential way to overcome these obstacles. As the development and use of mental health support chatbots grows, it has become essential to have evaluation frameworks that ensure these chatbots consistently give users empathetic, safe, and effective responses. To this end, this paper introduces ESHRO, an evaluation framework that analyzes LLM-generated responses on five critical metrics: Empathy, Safety, Helpfulness, Relevance, and Overall Quality. By using multi-dimensional metrics and combining automated and human evaluation, ESHRO overcomes many limitations of existing frameworks. To showcase its application, we developed ELY Chatbot, an AI-driven mental health chatbot designed to deliver emotional support and motivation, and evaluated it with the ESHRO framework. The ESHRO framework shows potential to improve the evaluation of mental health chatbots. The paper concludes by discussing limitations and highlighting opportunities for future research, with the aim of paving the way for safer, more empathetic, and more impactful mental health solutions.