Figure - available from: PLOS One
Inter-annotator agreement on the fine-grained categories related to cyberbullying
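The caption does not say which agreement statistic was used. As a minimal sketch, assuming two annotators and Cohen's kappa as the measure, chance-corrected agreement for one category scheme could be computed as follows. The label names and annotations are hypothetical and are not taken from the source publication.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equally long, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement by chance, from each annotator's label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical annotations for six posts (illustrative labels only).
annotator_1 = ["threat", "insult", "none", "none", "insult", "none"]
annotator_2 = ["threat", "none", "none", "none", "insult", "none"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(round(kappa, 3))
```

Kappa is 1 for perfect agreement and 0 when agreement is no better than chance, which makes it easier to compare across categories with very different frequencies than raw percentage agreement.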

Source publication
Article
While social media offer great communication opportunities, they also increase the vulnerability of young people to threatening situations online. Recent studies report that cyberbullying constitutes a growing problem among youngsters. Successful prevention depends on the adequate detection of potentially harmful messages and the information overlo...

Similar publications

Article
In recent times, online harassment through cyber-bullying has increased significantly with the growth in social media users. Cyber-bullying is a technique to harass users using electronic messages. Many researchers attack this problem using natural language processing. Most of them detect whether a message is bullying or not. In this paper, multiple d...
Article
Cyberbullying activities are increasing day by day with the growth of social media platforms such as Facebook, Twitter, and Instagram. Bullies take advantage of these large, connected online platforms, which has made cyberbullying detection a challenging task in Natural Language Processing (NLP). In this paper, we compare the performance of various wo...
Article
Social Media, in addition to having a positive impact on society, also has a negative effect. Based on statistics, 95 percent of internet users in Indonesia use the internet to access social networks. Especially for young people, Instagram is more widely used than other social media such as Twitter and Facebook. In terms of cyberbullying cases, cas...
Article
As a side effect of increasingly popular social media, cyberbullying has emerged as a serious problem afflicting children and teenagers. Machine learning techniques make automated detection of bullying messages in social media possible, and this could help to construct a healthy and safe social media environment. In this significa...
Article
Cyberbullying is a menace in today’s socially networked world. It can have damaging physical and mental effects on the victims and hence, it needs to be tackled efficiently—several detection approaches are proposed in literature but those are mostly standalone. In this paper, we revisit the distributed and collaborative approach for detecting cyber...

Citations

... However, the comprehensive exchange of personal information raises data security issues and the risk of misuse. On SMPs, individuals can face humiliation, insults, cyber threats, and cyberbullying from anonymous users [5], exacerbated by the constant accessibility and the ability for some users to remain unidentified [6]. Bullying through the use of digital technology is known as cyberbullying. ...
Article
Cyberbullying in social media significantly impacts the mental well-being of individuals and poses noteworthy barriers to creating safe online environments, especially in non-English speaking communities. Addressing cyberbullying challenges requires collaborative efforts from communities, educators, and the developers and designers of technology platforms. The primary concern of this study is to detect cyberbullying in the Bangla language using various machine learning (ML) approaches. The Bangla cyberbullying dataset encompasses a range of texts, including both cyberbullying and non-cyberbullying content. Data is collected by web scraping from different social media platforms and contains five distinct categories: neutral, threat, troll, political and sexual. The dataset undergoes a preprocessing stage that uses diverse techniques, including tokenization, data augmentation, and transformation into sequences, to create appropriate inputs for ML approaches such as XGBoost (XGB), Gradient Boosting (GB), Decision Tree (DT), Random Forest (RF), Artificial Neural Network (ANN), Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM). Experimental results indicated that the proposed cyberbullying detection model achieves an exceptional accuracy of 99.80% with LSTM, surpassing other deep learning based algorithms. Conversely, XGB achieves a commendable accuracy of over 74% with the same dataset, outperforming other traditional ML algorithms. The findings contribute significantly to the development of proactive measures to prevent and mitigate cyberbullying, eventually advancing a safer online environment for individuals communicating in Bangla.
... Because of this continual shift in technology and media, researchers should be informed and stay up to date on the risky behaviors and factors that contribute to bullying trends. Online bullying negatively influences adolescents' mental health, self-esteem, and performance in school [8]. ...
... The programs used were able to detect threats, insults, curse words, defamation, defensive language, sexual phrases, and encouragement to get people to respond. The study was successful in finding that machine learning programs can identify online bullying based on social media posts [8]. Another study among Chinese adolescents found the top seven predictors of in-person bullying included gender, being physically ill, showing signs of depression, being younger, being in a lower grade, being exposed to secondhand smoke, and having few close friends. ...
Article
Background: Bullying, encompassing physical, psychological, social, or educational harm, affects approximately 1 in 20 United States teens aged 12-18. The prevalence and impact of bullying, including online bullying, necessitate a deeper understanding of risk and protective factors to enhance prevention efforts. This study investigated the key risk and protective factors most highly associated with adolescent bullying victimization. Methods: Data from the Student Health and Risk Prevention (SHARP) survey, collected from 345,506 student respondents in Utah from 2009 to 2021, were analyzed using a machine learning approach. The survey included 135 questions assessing demographics, health outcomes, and adolescent risk and protective factors. LightGBM was used to create the model, achieving 70% accuracy, and SHapley Additive exPlanations (SHAP) values were utilized to interpret model predictions and to identify risk and protective predictors most highly associated with bullying victimization. Results: Younger grade levels, feeling left out, and family issues (severity and frequent arguments, family members insulting each other, and family drug use) are strongly associated with increased bullying victimization, whether in person or online. Gender analysis showed that for both males and females, family issues and hating school were most highly predictive. Online bullying victimization was most highly associated with early onset of drinking. Conclusions: This study provides a risk and protective factor profile for adolescent bullying victimization. Key risk and protective factors were identified across demographics with findings underscoring the important role of family relationships, social inclusion, and demographic variables in bullying victimization. These resulting risk and protective factor profiles emphasize the need for prevention programming that addresses family dynamics and social support. Future research should expand to diverse geographical areas and include longitudinal data to better understand causal relationships.
... Prior research has defined the relationships between bullies, victims and bystanders (Van Hee et al., 2018). Specifically, the most widely adopted typology, developed by Salmivalli et al. (1996), examines the interpersonal bullying phenomenon based on the role played by adolescents. ...
Article
Extant literature has identified cyberbullying tactics and consequences as well as school- and community-based anti-bullying strategies and policies. However, research that explains bullying behavior from a communication perspective in a social network via social media platforms is still lacking. This work theorizes cyberbullying as a relational communication behavior by proposing a conceptual framework that integrates the theories and constructs of personality traits, bystander behavior, spiral of silence, relational aggression, uses and gratifications, and communication competency. Based on the analysis, synthesis and theorization, a set of research propositions and empirical study designs is presented to help guide future research. According to the National Center for Education Statistics (Diliberti et al., 2019), 33% and 30% of middle and high school students reported cyberbullying incidents at school or away from school at least once a week, respectively. Cyberbullying is considered a type of bullying, just as verbal, physical, and relational bullying (Olweus & Limber, 2017). Researchers have maintained that while bullying and cyberbullying behaviors are unquestionably related, cyberbullying can exceed traditional bullying in causing social and psychological harm, due to its public nature in an online environment (Englander et al., 2017). Bullying and cyberbullying behaviors usually begin during school years. Common types of peer bullying include appearance-based teasing against overweight or obese children, according to a national sample of students in 6th-10th ...
... These studies explore the creation of NLP models that protect privacy, guarantee responsible data processing, and uphold individual privacy rights within the framework of Bangladeshi cyber threat assessments. Research by Ullah et al. (2021) and Van et al. (2018) examines cooperative efforts in spreading threat intelligence through NLP-powered systems [18,19]. These studies emphasize how crucial it is to strengthen cooperative efforts across government, business, and academic institutions to improve collective cyber defense systems. ...
... Hence, the text data were then categorized into four cyberbullying classes. This was done based on the knowledge acquired from prior studies such as [35][36][37][38]. The identified categories were as follows: ...
Article
Cyberbullying involves the use of social media platforms to harm or humiliate people online. Victims may resort to self-harm due to the abuse they experience on these platforms, where users can remain anonymous and spread malicious content. This highlights an urgent need for efficient systems to identify and classify cyberbullying. Many researchers have approached this problem using various methods such as binary and multi-class classification, focusing on text, image, or multi-modal data. While deep learning has advanced cyberbullying detection and classification, the multi-class classification of cyberbullying using multi-modal data, such as memes, remains underexplored. This paper addresses this gap by proposing several multi-modal hybrid deep learning models, such as LSTM+ResNet, LSTM+CNN, LSTM+ViT, GRU+ResNet, GRU+CNN, GRU+ViT, BERT+ResNet, BERT+CNN, BERT+ViT, DistilBERT+ResNet, DistilBERT+CNN, DistilBERT+ViT, RoBERTa+ResNet, RoBERTa+CNN, and RoBERTa+ViT, for classifying multi-classes of cyberbullying. The proposed model incorporates a late fusion process, combining the LSTM, GRU, BERT, DistilBERT, and RoBERTa models for text extraction and the ResNet, CNN, and ViT models for image extraction. These models are trained on two datasets: a private dataset, collected from various social media platforms, and a public dataset, obtained from previously published research. Our experimental results demonstrate that the RoBERTa+ViT model achieves an accuracy of 99.20% and an F1-score of 0.992 on the public dataset, and an accuracy of 96.10% and an F1-score of 0.959 on the private dataset when compared with other hybrid models.
... The challenges of implementing such techniques include the need for large amounts of labeled data and addressing many scenarios where the roles of cyberbullying can overlap or the context is limited. This study addresses the task of identifying cyberbullying roles in social media interactions and sheds light on the merits and limitations of existing methods and datasets, paving the way for future research in this area. To this end, we examine and process the AMiCA dataset [22] and employ oversampling methods to address the challenge of the imbalanced nature of the dataset. We then develop and evaluate the performance of various machine learning models that are based on four large language models (LLMs): BERT, RoBERTa, T5, and GPT-2. ...
... To the best of our knowledge, the only models that address this problem are the ones by Jacobs et al. [11] and Rathnayake et al. [16], each of which is included in the present performance evaluation. Rathnayake et al. [16] used the AMiCA dataset [22] to develop a DistilBERT-based ensemble model [19] to classify cyberbullying roles. While the authors report that their algorithm (OffensEval) achieves an F1 score of 83%, their evaluation only considered 4 of the 5 roles in the AMiCA dataset. ...
... Data Collection and Processing. The AMiCA dataset [22] is, to our knowledge, the only available dataset of adequate size that is explicitly labeled with cyberbullying roles. This dataset was gathered from ASKfm, a social networking site where users can anonymously ask and answer questions. ...
Preprint
Social media has revolutionized communication, allowing people worldwide to connect and interact instantly. However, it has also led to increases in cyberbullying, which poses a significant threat to children and adolescents globally, affecting their mental health and well-being. It is critical to accurately detect the roles of individuals involved in cyberbullying incidents to effectively address the issue on a large scale. This study explores the use of machine learning models to detect the roles involved in cyberbullying interactions. After examining the AMiCA dataset and addressing class imbalance issues, we evaluate the performance of various models built with four underlying LLMs (i.e., BERT, RoBERTa, T5, and GPT-2) for role detection. Our analysis shows that oversampling techniques help improve model performance. The best model, a fine-tuned RoBERTa using oversampled data, achieved an overall F1 score of 83.5%, increasing to 89.3% after applying a prediction threshold. The top-2 F1 score without thresholding was 95.7%. Our method outperforms previously proposed models. After investigating the per-class model performance and confidence scores, we show that the models perform well in classes with more samples and less contextual confusion (e.g., Bystander Other), but struggle with classes with fewer samples (e.g., Bystander Assistant) and more contextual ambiguity (e.g., Harasser and Victim). This work highlights current strengths and limitations in the development of accurate models with limited data and complex scenarios.
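The abstract above names oversampling and a prediction threshold but does not specify either procedure. The following is a minimal sketch under two assumptions: plain random oversampling (duplicating minority-class samples until every class matches the largest one), and a threshold that simply discards predictions whose confidence falls below it. The function names, sample texts, and confidence values are hypothetical; only the role labels appear in the abstract.

```python
import random
from collections import defaultdict


def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples until all classes are equally large."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)
    target = max(len(items) for items in by_class.values())
    out_samples, out_labels = [], []
    for label, items in by_class.items():
        # Draw extra copies with replacement from the existing class members.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for sample in items + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


def keep_confident(predictions, threshold):
    """Keep only (label, confidence) pairs at or above the threshold."""
    return [(label, conf) for label, conf in predictions if conf >= threshold]


# Hypothetical imbalanced training data.
texts = ["t1", "t2", "t3", "t4", "t5"]
roles = ["Harasser", "Bystander Other", "Bystander Other",
         "Bystander Other", "Victim"]
balanced_texts, balanced_roles = random_oversample(texts, roles)

# Hypothetical model outputs as (predicted role, confidence).
confident = keep_confident([("Victim", 0.9), ("Harasser", 0.4)], 0.5)
```

Oversampling should be applied to the training split only; duplicating samples before the train/test split would leak copies of training items into the evaluation data.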
... Dinakar et al. [89] were among the first in demonstrating the efficacy of machine learning for identifying harmful content, illustrating how algorithms can be trained to detect patterns associated with cyberbullying. Van Hee et al. [90] enhanced this approach by utilizing natural language processing (NLP) and deep learning techniques to improve the accuracy of offensive language detection. Al-Garadi et al. [85] highlighted the significance of these technologies in real-time moderation, proposing that machine learning can proactively notify moderators and avert the escalation of cyberbullying. ...
Article
In today's digital age, cyberbullying has emerged as a pervasive issue that affects individuals across various social media platforms and digital communication channels. This review explores the developmental trajectory of cyberbullying as an interdisciplinary academic field, employing a unique combination of co-word analysis and main path analysis (MPA) across a substantial body of 5183 documents. This integrated methodological approach allows for a nuanced examination of the evolution of themes and influential works within the realm of cyberbullying research. The findings highlight a complex landscape where initial focus areas, such as the behavioral and psychological triggers of cyberbullying, progressively expand towards exploring effective preventive measures and intervention strategies. Key themes identified include the impact of digital literacy, the dual role of social media as both a vector and a tool against cyberbullying, and the potential of technological advancements in detecting and mitigating cyberbullying. This comprehensive mapping and analysis deepens our understanding of cyberbullying and highlights the dynamic nature of this field, suggesting new directions for future research and practical applications to effectively address cyberbullying across various social and technological contexts. This study represents a pioneering effort in synthesizing a broad spectrum of research to offer detailed insights into the changing dynamics of cyberbullying, marking a significant contribution to both academic knowledge and practical approaches to handling cyberbullying.
... This form of bullying significantly affects around 70% of young individuals worldwide and is particularly prevalent in the Philippines [4]. The digital domain, while seemingly innocuous, amplifies the negative consequences of bullying behavior, leading to antisocial behaviors, emotional distress, and academic difficulties for those victimized [5][6][7][8]. The prevalence of cyberbullying has been facilitated significantly by social media platforms. ...
... Cyberbullying has become a pervasive issue with the rise of social media, adversely impacting users' mental health and leading to serious consequences such as depression and suicidal tendencies [5][6][7]. The rapid evolution of Natural Language Processing (NLP) and machine learning offers promising solutions to detect and mitigate cyberbullying by analyzing the vast amounts of textual data generated on social media platforms. ...
... A crucial stage in the data curation process entails the systematic organization and comprehension of derogatory or insulting phrases commonly used in tweets from the Filipino community. Researchers have started to explore automatic procedures for detecting language patterns signaling harmful content, like previous researchers [6,24,36,37]. Based on the available data, the researchers have identified twenty-three phrases that exhibit frequent association with cyberbullying content for the purpose of context-driven analysis. ...
Article
The research addresses the escalating challenge of cyberbullying in the Philippines, a concern magnified by widespread social media use. A dataset of 146,661 tweets is analyzed using a pre-trained natural language processing model tailored to detect derogatory Filipino terms. The methodology is designed to preprocess data for clarity and analyze derogatory phrases, using the 23 key terms to indicate cyberbullying. Through quantitative analysis, specific patterns of derogatory term co-occurrence are uncovered. The research specifically focuses on Filipino digital discourse, uncovering patterns of derogatory language usage, which is unique to this context. Combining data mining and machine learning techniques, including Frequent Pattern (FP)-growth for pattern identification, cosine similarity for phrase correlation, and a classification technique, the research achieves an accuracy rate of 97.91%. To assess the model's reliability and precision, a 10-fold cross-validation is utilized. Moreover, by examining specific tweets, the analysis highlights the alignment between automated classifications and human judgment. The co-occurrence of derogatory terms, identified through methods like FP-growth and cosine similarity, reveals underlying cyberbullying narratives that are not immediately obvious. This approach validates the high accuracy of the models and emphasizes the importance of a comprehensive framework for detecting cyberbullying in a linguistically and culturally specific context. The findings substantiate the effectiveness of the targeted approach, providing essential insights for developing cyberbullying prevention strategies. Furthermore, the research enriches the literature on digital discourse analysis and online harassment prevention by addressing cyberbullying patterns and behaviors. Importantly, the research offers valuable guidance for policymakers in crafting more effective online safety measures in the Philippines.
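The abstract mentions cosine similarity for phrase correlation without giving the vector representation. As a rough sketch, assuming each phrase is represented by a vector of co-occurrence counts with other terms, the similarity between two phrases could be computed as below. The vectors are made-up illustrative counts, not data from the study.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # An all-zero vector has no direction; treat it as unrelated.
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical co-occurrence counts of two phrases with four context terms.
phrase_a = [3, 0, 1, 2]
phrase_b = [6, 0, 2, 4]   # same usage pattern, twice as frequent
phrase_c = [0, 5, 0, 0]   # appears in entirely different contexts

similar = cosine_similarity(phrase_a, phrase_b)
unrelated = cosine_similarity(phrase_a, phrase_c)
```

Because the measure depends only on direction, two phrases used in the same contexts score close to 1 even when one is much more frequent than the other.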
... The SVM model generates the hyperplane in an iterative manner to minimize error. The objective of Support Vector Machines (SVM) is to partition datasets into distinct groups by identifying a hyperplane with the largest margin [9]. ...
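To make the "largest margin" criterion concrete, the sketch below compares the geometric margin of two candidate separating hyperplanes on a tiny, hypothetical 2-D dataset. It only evaluates given hyperplanes; it is not an SVM training procedure, and the points, weights, and function name are assumptions for illustration.

```python
import math


def geometric_margin(w, b, points, labels):
    """Smallest signed distance from any point to the hyperplane w.x + b = 0.

    Labels are +1 or -1; a positive result means every point is on its
    correct side of the hyperplane.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(
        y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        for x, y in zip(points, labels)
    )


# Hypothetical linearly separable data.
points = [(2, 0), (3, 1), (-2, 0), (-3, -1)]
labels = [1, 1, -1, -1]

margin_a = geometric_margin((1, 0), 0, points, labels)  # vertical boundary
margin_b = geometric_margin((1, 1), 0, points, labels)  # diagonal boundary
print(margin_a, margin_b)
```

Both hyperplanes separate the data, but the first leaves more room between the boundary and the nearest points, so it is the one a maximum-margin criterion would prefer.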
Article
This study employed the stacking of three machine learning techniques: Support Vector Machine (SVM), K-Nearest Neighbor (KNN), and Logistic Regression algorithms to develop a model for detecting cyberbullying using a post dataset acquired from the X Platform. The proposed model's task is to extract keywords from the post dataset and then classify them as either 1 ("cyberbullying word") or 0 ("not cyberbullying word"). The model generated an accuracy of 85.52%, and it was deployed using a simple Graphical User Interface (GUI) web application. This study recommends that the model be included on social media platforms to help reduce the growing use of cyberbullying phrases.
... For each issue thread in our dataset, we manually annotated four categories: type of incivility, trigger, target, and consequence. Each category was selected based on existing literature on harmful interactions across various domains such as social media [6,15,36,44], online gaming communities [30,32], and software engineering [13,14,17,19,21,23,25,33,41,42]. Using a combination of deductive and inductive coding methods, we refined the feature set for each category to enhance the quality of our annotations [9,11]. ...