Hyoungjun Park’s scientific contributions


Publications (3)


Figure previews: Figure 3: Quantitative Analysis of Augmented Datasets (i.e., character count, word count); Performance Comparison of Different Augmentation Methods (F1 Score); GPT-3.5-turbo Full Performance Comparison; GPT-4-turbo Full Performance Comparison.
A Persuasion-Based Prompt Learning Approach to Improve Smishing Detection through Data Augmentation
  • Preprint
  • File available

October 2024 · 19 Reads

Ho Sung Shim · Hyoungjun Park · Kyuhan Lee · [...] · Seonhye Kang

Smishing, which aims to illicitly obtain personal information from unsuspecting victims, holds significance due to its negative impacts on our society. In prior studies, as a tool to counteract smishing, machine learning (ML) has been widely adopted, which filters and blocks smishing messages before they reach potential victims. However, a number of challenges remain in ML-based smishing detection, with the scarcity of annotated datasets being one major hurdle. Specifically, given the sensitive nature of smishing-related data, there is a lack of publicly accessible data that can be used for training and evaluating ML models. Additionally, the nuanced similarities between smishing messages and other types of social engineering attacks such as spam messages exacerbate the challenge of smishing classification with limited resources. To tackle this challenge, we introduce a novel data augmentation method utilizing a few-shot prompt learning approach. What sets our approach apart from extant methods is the use of the principles of persuasion, a psychology theory which explains the underlying mechanisms of smishing. By designing prompts grounded in the persuasion principles, our augmented dataset could effectively capture various, important aspects of smishing messages, enabling ML models to be effectively trained. Our evaluation within a real-world context demonstrates that our augmentation approach produces more diverse and higher-quality smishing data instances compared to other cutting-edge approaches, leading to substantial improvements in the ability of ML models to detect the subtle characteristics of smishing messages. Moreover, our additional analyses reveal that the performance improvement provided by our approach is more pronounced when used with ML models that have a larger number of parameters, demonstrating its effectiveness in training large-scale ML models.



Citations (1)


... Notable platforms contributing to these datasets include Twitter [4,12,25,27], Facebook [5,21], Whisper [25], Instagram [5], Yahoo [18] and Reddit [23]. Researchers have also used video sharing platforms such as YouTube comments [19] and social forums such as 4chan and Gab [28]. There are also multilingual datasets to facilitate understanding of hate speech across diverse linguistic contexts and geographical regions [19] [5] [1]. ...

Reference:

ProvocationProbe: Instigating Hate Speech Dataset from Twitter
Uncovering the Root of Hate Speech: A Dataset for Identifying Hate Instigating Speech
  • Citing Conference Paper
  • January 2023