Eujeong Choi’s scientific contributions

What is this page?


This page lists works of an author who doesn't have a ResearchGate profile or hasn't added the works to their profile yet. It is automatically generated from public (personal) data to further our legitimate goal of comprehensive and accurate scientific recordkeeping. If you are this author and want this page removed, please let us know.

Publications (2)


Figure 1: A confusion map of the final label.
RICoTA: Red-teaming of In-the-wild Conversation with Test Attempts
  • Preprint
  • File available

January 2025

·

4 Reads

Eujeong Choi

·

Younghun Jeong

·

·

User interactions with conversational agents (CAs) evolve in the era of heavily guardrailed large language models (LLMs). As users push beyond programmed boundaries to explore and build relationships with these systems, there is a growing concern regarding the potential for unauthorized access or manipulation, commonly referred to as "jailbreaking." Moreover, with CAs that possess highly human-like qualities, users show a tendency toward initiating intimate sexual interactions or attempting to tame their chatbots. To capture and reflect these in-the-wild interactions into chatbot designs, we propose RICoTA, a Korean red teaming dataset that consists of 609 prompts challenging LLMs with in-the-wild user-made dialogues capturing jailbreak attempts. We utilize user-chatbot conversations that were self-posted on a Korean Reddit-like community, containing specific testing and gaming intentions with a social chatbot. With these prompts, we aim to evaluate LLMs' ability to identify the type of conversation and users' testing purposes to derive chatbot design implications for mitigating jailbreaking risks. Our dataset will be made publicly available via GitHub.

Download

Citations (1)


... We had our participants use a state-of-the-art NLP-based disclosure detection model from prior work by Dou et al. [27] on two self-authored posts made on and for Reddit. We chose this model because, unlike other similar models that operate at the post or sentence level [17,88], Dou et al. 's model operates at the text-span level: i.e., it is capable of identifying specific segments of text in a broader post that constitutes a potentially risky self-disclosure (see Fig.1). This higher-level of granularity let us localize user feedback to the specific words in their posts that the model estimated as constituting disclosure risk. ...

Reference:

Measuring, Modeling, and Helping People Account for Privacy Risks in Online Self-Disclosures with AI
Assessing How Users Display Self-Disclosure and Authenticity in Conversation with Human-Like Agents: A Case Study of Luda Lee
  • Citing Conference Paper
  • January 2022