Role graph of a bullying event. Each vertex represents an actor, labeled by their role in the event: bully (B), victim (V), bystander (BY), reinforcer (AB), assistant (BF), defender (S), reporter (S), accuser (A), and friend (VF). Each edge indicates a stream of communication, labeled as positive (+) or negative (−) in nature, with its strength indicating the frequency of interaction. Dotted edges indicate nonparticipation in the event, and dotted vertices indicate roles added by Xu et al. (2012) to account for social media-specific roles

Source publication
Article
Full-text available
The detection of online cyberbullying has seen an increase in societal importance, popularity in research, and available open data. Nevertheless, while computational power and affordability of resources continue to increase, the access restrictions on high-quality data limit the applicability of state-of-the-art techniques. Consequently, much of th...

Context in source publication

Context 1
... there is a commonly made distinction between several actors within a cyberbullying event. A naive role allocation includes a bully B, a victim V, and a bystander BY, the latter of whom may or may not approve of the act of bullying. More nuanced models, such as that of Xu et al. (2012), include additional roles (see Fig. 1 for a role interaction visualization), where different roles can be assigned to one person; for example, being bullied and reporting this. Most importantly, all shown roles can be present in the span of one single thread on social media, as demonstrated in Table 1. While some roles clearly show from frequent interaction with either a ...
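To make the role-graph structure described above concrete, the sketch below builds a small signed, weighted interaction graph with networkx. It is a minimal illustration only: the actors, role sets, edge signs, and weights are invented, and the figure's exact role inventory is not reproduced.

```python
# Minimal sketch of a role graph like Fig. 1, using networkx (illustrative values only).
import networkx as nx

G = nx.DiGraph()

# Vertices are actors labeled by role; one person may hold several roles at once.
G.add_node("alice", roles={"V", "S"})   # victim who also reports the event
G.add_node("bob", roles={"B"})          # bully
G.add_node("carol", roles={"BY"})       # bystander (dotted edge: no participation)

# Edges are communication streams: sign marks positive/negative tone,
# weight marks interaction frequency, participates marks dotted vs. solid edges.
G.add_edge("bob", "alice", sign="-", weight=5, participates=True)
G.add_edge("alice", "bob", sign="-", weight=1, participates=True)
G.add_edge("carol", "alice", sign="+", weight=0, participates=False)

for u, v, data in G.edges(data=True):
    print(u, "->", v, data)
```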

Citations

... Thus, several studies have used synthetic oversampling or undersampling techniques to improve performance [36], [44], [52], [53]. Emmery et al. [55], for instance, proposed a crowdsourcing method, simulating bullying scenarios in a lab setting to generate data that could be used to complement real data. Bassignana et al. [56] developed HurtLex, a multilingual lexicon to identify hate speech. ...
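As a generic illustration of the data-level resampling strategies mentioned in the snippet above, the sketch below randomly oversamples the minority (bullying) class with scikit-learn's `resample` utility. It is not the specific technique of any of the cited studies, and the toy texts and labels are placeholders.

```python
# Sketch of naive random oversampling of the minority (bullying) class,
# as one simple example of the resampling strategies mentioned above.
from sklearn.utils import resample

def oversample_minority(texts, labels, minority_label=1, random_state=42):
    majority = [(t, l) for t, l in zip(texts, labels) if l != minority_label]
    minority = [(t, l) for t, l in zip(texts, labels) if l == minority_label]
    # Duplicate minority examples (with replacement) until the classes are balanced.
    minority_up = resample(minority, replace=True,
                           n_samples=len(majority), random_state=random_state)
    balanced = majority + minority_up
    return [t for t, _ in balanced], [l for _, l in balanced]

texts = ["you are great", "see you later", "nice work today", "go away loser"]
labels = [0, 0, 0, 1]  # heavily imbalanced toy labels
X_bal, y_bal = oversample_minority(texts, labels)
print(len(X_bal), sum(y_bal))  # 6 examples, 3 of them bullying
```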
Article
Offense and hate speech are a source of online conflicts which have become common in social media and, as such, their study is a growing topic of research in machine learning and natural language processing. This article presents two Portuguese language offense-related datasets that deepen the study of the subject: an Aggressiveness dataset and a Conflicts/Attacks dataset. While the former is similar to other offense-detection datasets, the latter constitutes a novelty due to the use of the history of the interaction between users. Several studies were carried out to construct and analyze the data in the datasets. The first study included gathering expressions of verbal aggression witnessed by adolescents to guide data extraction for the datasets. The second study included extracting data from Twitter (in Portuguese) that matched the most frequent expressions/words/sentences that were identified in the previous study. The third study consisted of the development of the Aggressiveness dataset, the Conflicts/Attacks dataset, and classification models. In our fourth study, we proposed to examine whether online aggression and conflicts/attacks revealed any trend changes over time with a sample of 86 adolescents. With this study, we also proposed to investigate whether the number of tweets sent over a period of 273 days was related to online aggression and conflicts/attacks. Lastly, we analyzed the percentage of participants who participated in the aggressions and/or attacks/conflicts.
... Mining social media sites for online bullying presents a number of difficulties and issues. It is difficult to precisely perceive users' motives and connotations in social media based solely on their texts (like posts, tweets, comments), which are usually brief, use colloquialisms, and may contain multimedia such as videos and images (Emmery et al., 2021). Twitter, for instance, restricts its users' texts to 140 characters, which can include text, colloquial language, emoticons, and animations. ...
... Cyberbullying is an intentional, aggressive behavior performed repetitively over time through electronic media, by either a person or a group online, against a victim who cannot easily protect themselves (Emmery et al., 2021). Cyberbullying can take different forms, the most common of which involve sending messages and posting comments: exclusion, in which victims are deliberately removed from social media platforms; harassment, which involves abusive messages posted online; outing, the act of openly insulting someone without consent; cyberstalking, which involves stalkers harassing the victims; and fraping, in which a person logs into someone else's social media account and impersonates them by posting inappropriate content (Emmery et al., 2021). Cyberbullying has the potential to reach a wider audience, with lasting effects, compared to traditional bullying. ...
Conference Paper
Full-text available
Usage of social media has resulted in threats such as cyberbullying or Internet bullying that can lead to emotional and psychological effects like depression, stress, anxiety, and, in more severe cases, suicide attempts. A more effective detection method is essential to detect cyberbullying and provide support to the victims. The majority of cyberbullying detection is done using machine learning on English text, and far less has been done on Turkish text. The objective of this study was to compare the performance of machine learning models such as Multinomial Naive Bayes (MNB), Logistic Regression (LR), and Support Vector Machine (SVM), and deep learning methods like Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), and Bidirectional Long Short-Term Memory (BiLSTM), on Turkish text with traditional feature extraction methods such as Term Frequency-Inverse Document Frequency (TF-IDF) and word embeddings such as Global Vectors for Word Representation (GloVe). We utilized an open-source dataset of offensive Turkish comments (ATC) that contains the original and oversampled ATC variants. The performance of the classifiers was assessed using 5-fold cross-validation and evaluated with the F-score metric. Based on our experimental results, the Convolutional Neural Network performed the overall best, with an average F-score of 0.9211 using GloVe word embeddings on the oversampled ATC. This study shows the impact of deep learning with word embeddings on a balanced dataset, which can improve the performance of Turkish text classification.
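The classical side of the comparison above (TF-IDF features fed to a Multinomial Naive Bayes classifier, scored with an F-measure under 5-fold cross-validation) can be sketched in a few lines with scikit-learn. The toy texts, labels, and hyperparameters below are placeholders, not the ATC corpus or the paper's exact setup.

```python
# Baseline sketch: TF-IDF features + Multinomial Naive Bayes, evaluated with
# 5-fold cross-validation and an F-score, mirroring the classical setup above.
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import cross_val_score

texts = ["toy example of an offensive comment", "a perfectly friendly reply",
         "another hostile insult here", "a neutral remark about the weather"] * 5
labels = [1, 0, 1, 0] * 5  # 1 = offensive/bullying, 0 = not

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), min_df=1)),
    ("clf", MultinomialNB()),
])

scores = cross_val_score(pipeline, texts, labels, cv=5, scoring="f1")
print("mean F1:", scores.mean())
```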
... Moreover, many studies have solely focused on filtering cyberbullying using profanity, which represents only a single aspect of this issue. Profanity may not consistently signify bullying, especially on platforms predominantly used by young individuals [18,19]. Therefore, developers and media managers benefit from having a robust system that comprehends context more effectively to improve cyberbullying detection. ...
Article
Full-text available
Social media platforms and online gaming sites play a pervasive role in facilitating peer interaction and social development for adolescents, but they also pose potential threats to health and safety. It is crucial to tackle cyberbullying issues within these platforms to ensure the healthy social development of adolescents. Cyberbullying has been linked to adverse mental health outcomes among adolescents, including anxiety, depression, academic underperformance, and an increased risk of suicide. While cyberbullying is a concern for all adolescents, those with disabilities are particularly susceptible and face a higher risk of being targets of cyberbullying. Our research addresses these challenges by introducing a personalized online virtual companion guided by artificial intelligence (AI). The web-based virtual companion's interactions aim to assist adolescents in detecting cyberbullying. More specifically, an adolescent with ASD watches a cyberbullying scenario in a virtual environment, and the AI virtual companion then asks the adolescent if he/she detected cyberbullying. So that the virtual companion knows in real time whether the adolescent has learned to detect cyberbullying, we have implemented fast and lightweight cyberbullying detection models employing the T5-small and MobileBERT networks. Our experimental results show that we obtain comparable results to the state-of-the-art methods despite having a compact architecture.
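A minimal inference sketch for the lightweight-model idea above is given below, using the public google/mobilebert-uncased checkpoint via Hugging Face transformers. The classification head is freshly initialized here, so it would need fine-tuning on a labeled cyberbullying corpus before its probabilities mean anything; the example message is invented.

```python
# Sketch of lightweight cyberbullying classification with MobileBERT
# (the head is untrained here; fine-tuning on labeled data is required first).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

checkpoint = "google/mobilebert-uncased"  # public checkpoint; classification head is new
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

message = "nobody wants you on this server, just leave"
inputs = tokenizer(message, return_tensors="pt", truncation=True, max_length=128)
with torch.no_grad():
    logits = model(**inputs).logits
probs = torch.softmax(logits, dim=-1)
print("p(not bullying), p(bullying):", probs.squeeze().tolist())
```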
... The review found that existing studies on cyberbullying text detection used sparse data representation to efficiently store and process datasets, especially those with many missing values [120]. Sparse representations store only relevant values and their indices, enhancing efficiency [121]. ...
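The sparse-representation point in the snippet above (storing only the relevant values and their indices) is what formats such as SciPy's compressed sparse row (CSR) matrix implement; the reviewed studies may use other formats, and the tiny count matrix below is purely illustrative.

```python
# Sparse representation sketch: a CSR matrix keeps only nonzero values and indices,
# which is how large, mostly-empty bag-of-words matrices stay memory-efficient.
import numpy as np
from scipy.sparse import csr_matrix

dense = np.array([
    [0, 0, 3, 0, 0],   # e.g. term counts for one short message
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 2],
])
sparse = csr_matrix(dense)
print(sparse.data)     # [3 1 2]    -> stored values only
print(sparse.indices)  # [2 1 4]    -> column index of each value
print(sparse.indptr)   # [0 1 2 3]  -> row boundaries into data/indices
```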
... The dataset might be viewed as limited due to factors such as incomplete data percentages, inadequate details for comprehensive analytical evaluations, and the use of a small dataset for training learning models [120]. Regarding the use of small datasets for training models to detect cyberbullying on social media platforms, the review identified that only a small volume of data was used to train models for the classification of both cyberbullying and non-cyberbullying text [147,148]. ...
... Access restrictions on high-quality data constrain the utility of cutting-edge techniques, thereby limiting their applicability [123]. Small datasets impede the direct comparison of progress and the assessment of applicability, as they do not adequately capture the necessary complex social dynamics [120]. Using incorrect data may result in decreased classifier performance since it overlooks the nuances and complications present in real-life bullying situations [154]. ...
Preprint
Full-text available
Cyberbullying has been a significant challenge in the digital era, given the huge number of people, especially adolescents, who use social media platforms to communicate and share information. Some individuals exploit these platforms to embarrass others through direct messages, electronic mail, speech, and public posts. This behavior has direct psychological and physical impacts on victims of bullying. While several studies have been conducted in this field and various solutions proposed to detect, prevent, and monitor cyberbullying instances on social media platforms, the problem continues. Therefore, it is necessary to conduct intensive studies and provide effective solutions to address the situation. These solutions should be based on detection, prevention, and prediction methods. This paper presents a comprehensive systematic review of studies conducted on cyberbullying detection. It explores existing studies, proposed solutions, identified gaps, datasets, technologies, approaches, challenges, and recommendations, and then proposes effective solutions to address research gaps in future studies.
... Most of the data related to cyberbullying offences are scarce in terms of public access and often sensitive, as instances of cyberbullying mostly occur in private conversations. In addition, data access from social media is often protected by the user's privacy (Emmery et al., 2021). Due to these limitations, the present study adopted the approach used by Yaneva et al. (2018): data was collected from a variety of online sources, mainly news websites. ...
Article
Full-text available
Introduction: The majority of research conducted into cyberbullying tends to focus on the victims, due to the serious consequences and effects that this crime has on them. However, there is a need to explore, categorize and identify cyberbullies and their characteristics so that inferences and crime links can be made to prevent the crime. The present study aimed to investigate whether the Narrative Action System Model (NASM) could be used to identify and examine the psychological underpinnings of different cyberbully offending styles. Methods: This model proposes four distinct narrative offender styles: the Professional, the Revenger, the Hero and the Victim. A total of 70 cases were analysed using a non-metric multidimensional scaling procedure (Smallest Space Analysis I). Results: Results produced four types of cyberbully styles, which can be related to the differentiation proposed by the NASM, demonstrating an effective application of the model. The thematic structure of each cyberbully style was discussed. Limitations and implications were provided.
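Smallest Space Analysis is a form of nonmetric multidimensional scaling, so a rough stand-in for the SSA-I step above can be sketched with scikit-learn's MDS in nonmetric mode. The dissimilarity matrix below is synthetic and unrelated to the study's 70 cases; it only shows the shape of the computation.

```python
# Rough stand-in for Smallest Space Analysis (SSA-I): nonmetric MDS on a
# dissimilarity matrix of behavioural variables (synthetic values, for shape only).
import numpy as np
from sklearn.manifold import MDS

# Toy dissimilarities between 5 hypothetical offending-style variables.
rng = np.random.default_rng(0)
d = rng.random((5, 5))
dissim = (d + d.T) / 2          # symmetrize
np.fill_diagonal(dissim, 0.0)   # zero self-dissimilarity

mds = MDS(n_components=2, metric=False, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(dissim)
print(coords)  # 2-D configuration; frequently co-occurring variables plot close together
```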
... CB is commonly misinterpreted, leading to flawed systems with little practical use. Additionally, several studies evaluated only the use of swear words to filter CB, which is just one aspect of this topic, and swear words may not always indicate bullying on platforms with a high concentration of youngsters [6,24]. Thus, it is practically useful for developers and media handlers to have a robust system that understands context better, to enhance CB detection. ...
... Unfortunately, there are some obstacles to CB detection. One is the lack of balanced and enriched benchmark datasets [6,23,25]. Class imbalance has been a well-known problem in machine learning (ML) applications, as ML algorithms tend to be biased towards the majority class [26]. ...
... Past studies emphasised the class imbalance problem in the CB context [27]. In most studies, the proportion of bullying posts is in the range of 4-20% of the entire dataset, compared to non-bullying posts [6, 28-30]. This opens the need to create a new, enriched dataset with balanced classes for effective CB detection and to make it publicly available. ...
Article
Full-text available
The dominance of social media has added to the channels of bullying for perpetrators. Unfortunately, cyberbullying (CB) is the most prevalent phenomenon in today's cyber world, and is a severe threat to the mental and physical health of citizens. This opens the need to develop a robust system to prevent bullying content from online forums, blogs, and social media platforms to manage the impact in our society. Several machine learning (ML) algorithms have been proposed for this purpose. However, their performances are not consistent due to high class imbalance and generalisation issues. In recent years, large language models (LLMs) like BERT and RoBERTa have achieved state-of-the-art (SOTA) results in several natural language processing (NLP) tasks. Unfortunately, the LLMs have not been applied extensively for CB detection. In our paper, we explored the use of these models for cyberbullying (CB) detection. We have prepared a new dataset (D2) from existing studies (Formspring and Twitter). Our experimental results for datasets D1 and D2 showed that RoBERTa outperformed other models.
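A compressed sketch of the fine-tuning setup described above (a pre-trained RoBERTa encoder with a binary classification head, trained via the Hugging Face Trainer) follows. The placeholder texts, labels, and hyperparameters are illustrative only and do not reproduce the paper's D1/D2 datasets or training configuration.

```python
# Compressed sketch of fine-tuning RoBERTa for binary cyberbullying detection
# (placeholder data; the paper's D1/D2 splits and hyperparameters are not reproduced).
from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          TrainingArguments, Trainer)

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=2)

data = Dataset.from_dict({
    "text": ["you are worthless", "great game last night",
             "nobody likes you", "see you tomorrow"],
    "label": [1, 0, 1, 0],
})
data = data.map(lambda x: tokenizer(x["text"], truncation=True,
                                    padding="max_length", max_length=64), batched=True)

args = TrainingArguments(output_dir="cb-roberta", num_train_epochs=1,
                         per_device_train_batch_size=2, logging_steps=1)
trainer = Trainer(model=model, args=args, train_dataset=data)
trainer.train()
```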
... This problem of cyberbullying (CB) is a relatively new trend that has recently gained more popularity as a subject. Cyberbullying is the repetitive, aggressive, targeted, and intentional behaviour aimed to hurt an individual's or a group's feelings through an electronic medium (Emmery et al. 2021; Dinakar et al. 2011). CB takes many forms, including flaming, harassment, denigration, impersonation, exclusion, cyberstalking, grooming, outing, and trickery (Emmery et al. 2021), and typically involves an imbalance of power and the victim's perceived differences in race, sexual orientation, gender, socioeconomic level, physical appearance, and mannerism. ...
... Xu et al. (2012) stated that CB participants can play the role of either a bully, victim, bystander, bully assistant, reinforcer, reporter, or accuser. ...
... CB is commonly misinterpreted, leading to flawed systems with little practical use. Additionally, several studies evaluated only the use of swear words to filter CB, which is just one aspect of this topic, and swear words may not always indicate bullying on platforms with a high concentration of youngsters (Emmery et al. 2021; Rosa et al. 2018). Thus, it is practically useful for developers and media handlers to have a robust system that understands context better to enhance CB detection. ...
Preprint
Full-text available
The dominance of social media has added to the channels of bullying for perpetrators. Unfortunately, cyberbullying (CB) is the most prevalent phenomenon in today’s cyber world and is a severe threat to the mental and physical health of citizens. This opens the need to develop a robust system to prevent bullying content from online forums, blogs, and social media platforms to manage the impact in our society. Several machine learning (ML) algorithms have been proposed for this purpose. However, their performances are not consistent due to high class imbalance and generalisation issues. In recent years, large language models (LLMs) like BERT and RoBERTa have achieved state-of-the-art (SOTA) results in several natural language processing (NLP) tasks. Unfortunately, the LLMs have not been applied extensively for CB detection. In our paper, we explored the use of these models for cyberbullying (CB) detection. We have prepared a new dataset (D2) from existing studies (Formspring and Twitter). Our experimental results for datasets D1 and D2 showed that RoBERTa outperformed other models.
... It helps to organize, find and understand data. Using this data together with a classifier trained on LDA (Linear Discriminant Analysis) and weighted TF-IDF features, Nahar et al. were the first to apply this approach to identify bullies and victims in the datasets [15]. This task's use of a graph to visualize prospective bullies and their connections is notable but less popular. ...
Article
Full-text available
The shrinking of the planet by technology is causing new-age difficulties in youth culture. Technology surely has a lot of benefits, but it also has risks, and this is where cyberbullying first started. Thus, there are many different types of cyberbullying. It does not necessarily involve pretending to be someone else or breaking into their online accounts; it also includes criticizing someone or spreading lies about them in an effort to cast doubt on them. Social media is widely used, making it incredibly easy for anyone to misuse this access. Cyberbullying is a serious issue today. It includes actions that harass, mislead, or defame someone. These violent behaviors are incredibly hazardous and can harm anyone quickly and severely. They appear on open discussion forums, social media sites, and other internet chat boards. A cyberbully is not always an anonymous person; they could be someone you know. The detection of online cyberbullying has grown in societal significance, research interest, and accessibility of open data. Even so, despite the continued rise in processing power and resource affordability, access limitations to high-quality data constrain the use of cutting-edge methodologies. As a result, many recent studies use limited, heterogeneous datasets without fully assessing their usefulness. This study discusses effective techniques for detecting online abusive and bullying messages by merging natural language processing and machine learning algorithms with distinct features, and analyzes the accuracy levels of the algorithms.
... A common issue regarding cyberbullying detection concerns evaluation criteria, reproducibility, and model comparison [11]. Since some studies have not provided all the experimental details of their proposed models, this study attempts to fairly compare the proposed approach with the previous benchmarks. ...
Preprint
Full-text available
Cyberbullying is a hurtful phenomenon that spreads widely on social networks and negatively affects the lives of individuals. Detecting this phenomenon is of utmost necessity to make the digital environment safer for youth. This study uses a bilingual classification of cyberbullying on Arabic and English datasets. A four-module approach is proposed. It consists of preprocessing the textual data, generating sentence embeddings, performing the classification, and evaluating the results of the models. The approach relies on two strategies based on transfer learning from pre-trained NLP models. The first uses PLMs (ELMo, Universal Sentence Encoder, BERT, distilBERT, and RoBERTa) to generate sentence embeddings, while the second adopts a fine-tuning procedure of BERT-based PLMs for cyberbullying classification. Due to the frequent class imbalance problem in the research literature, this study used cost-sensitive learning algorithms trained to maximize the Recall/F1 score. The aim is to search for the best classification model that most accurately separates the cyberbullying and non-cyberbullying classes. The models achieve scores of 75-84%.
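One way to realize the "embed, then classify cost-sensitively" strategy described above is sketched below: messages are embedded with a pre-trained sentence encoder, and a class-weighted logistic regression is scored on F1. The specific PLMs, cost-sensitive algorithms, and tuning used in the study are not reproduced; the encoder checkpoint, toy texts, and labels are illustrative choices only.

```python
# Sketch of the "sentence embeddings + cost-sensitive classifier" strategy:
# embed messages with a pre-trained encoder, weight the bullying class more
# heavily, and score on F1 (illustrative data and checkpoint only).
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

texts = ["you are pathetic and everyone knows it", "thanks for the help yesterday",
         "delete your account, nobody wants you here", "looking forward to the weekend"] * 5
labels = [1, 0, 1, 0] * 5

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # small public checkpoint, illustrative
X = encoder.encode(texts)

# class_weight="balanced" penalizes errors on the rarer bullying class more heavily.
clf = LogisticRegression(class_weight="balanced", max_iter=1000)
print("F1:", cross_val_score(clf, X, labels, cv=5, scoring="f1").mean())
```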