Distribution of meta-evaluation scores assigned by individual meta-evaluators. Text counts are for all generators combined.

Source publication
Preprint · Full-text available
The capabilities of recent large language models (LLMs) to generate high-quality content that humans cannot distinguish from human-written texts raise many concerns regarding their misuse. Previous research has shown that LLMs can be effectively misused for generating disinformation news articles following predefined narratives. Their capabilities to...

Context in source publication

Context 1
... ρ reached 0.76 on the validation subset and 0.83 on the full dataset. Figure 5 shows that Gemma was less likely to assign the highest and lowest scores than Llama and GPT-4o. ...
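
For illustration, below is a minimal sketch of the kind of agreement computation the excerpt refers to, assuming ρ denotes Spearman's rank correlation between the scores a meta-evaluator assigns and a set of reference scores; the variable names and example data are hypothetical, not taken from the source.

from collections import Counter
from scipy.stats import spearmanr

# Hypothetical 1-5 scores from one meta-evaluator and a reference rating
# for the same eight texts (illustrative data only).
meta_scores = [5, 3, 4, 2, 5, 1, 4, 3]
reference_scores = [4, 3, 5, 2, 5, 2, 4, 3]

# Spearman's rank correlation, the usual reading of rho in this setting.
rho, p_value = spearmanr(meta_scores, reference_scores)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")

# Raw count of each score the evaluator assigned; normalizing these
# counts would give a score distribution like the one in the figure.
print(Counter(meta_scores))
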

Similar publications

Preprint · Full-text available
Modern large language models (LLMs) are optimized for human-aligned responses using Reinforcement Learning from Human Feedback (RLHF). However, existing RLHF approaches assume a universal preference model and fail to account for individual user preferences, limiting their effectiveness in personalized applications. We introduce a framework that ext...
Preprint · Full-text available
While Reinforcement Learning from Human Feedback (RLHF) is widely used to align Large Language Models (LLMs) with human preferences, it typically assumes homogeneous preferences across users, overlooking diverse human values and minority viewpoints. Although personalized preference learning addresses this by tailoring separate preferences for indiv...

Citations

Article
Purpose: This study investigates how safe large language model (LLM)-based artificial intelligence (AI) chatbots are for young consumers of Generation Z to use in their purchase decisions. The findings are intended to inform potential security issues with LLM-based AI chatbots that put the well-being of such consumers at risk.

Design/methodology/approach: The study adopted the JAILBREAKHUB framework to evaluate the effectiveness of LLM guardrails against negative prompts for purchase-related decisions. The guardrails of LLM-based AI chatbots such as OpenAI’s ChatGPT, Google’s Gemini, Anthropic’s Claude, X’s Grok, Meta’s Llama and Mistral were evaluated.

Findings: The effectiveness of LLM guardrails against negative purchase-related prompts varies. While existing guardrails are generally effective for some LLM-based AI chatbots, they are not impervious to manipulation by users.

Research limitations/implications: The study covered a limited set of LLM-based AI chatbots and only their existing guardrails.

Practical implications: The landscape of prompt engineering techniques for bypassing LLM guardrails is constantly evolving. Developers of LLMs need to implement more robust and adaptive safety protocols to ensure responsible purchase decisions among Generation Z consumers. Weaknesses in age verification mechanisms are also highlighted.

Social implications: The findings highlight safety concerns with the use of LLM-based chatbots by Generation Z consumers for their purchase decisions. While prompt manipulation techniques may be uncommon among these young consumers, such acts imply that the consumer has already developed a precarious attitude or opinion toward a purchase decision and is using LLMs to guide it. There are factors of social concern that could instigate such precarious use of LLMs.

Originality/value: The current study is among the first to evaluate the use of LLM-based AI chatbots by young consumers of Generation Z for their purchase decisions.