Figure 1: Official results of the LT@Helsinki Team
Contexts in source publication
Context 1
... Danish, using Nordic BERT for the final submission, we obtained an accuracy of 92.38% and an F1-score of 81.18% (Figure 1a). The confusion matrix shows that only 9 of the 34 offensive tweets (OFF) were misclassified, while 16 of the 294 non-offensive tweets (NOT) were wrongly classified as offensive. ...
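As a sanity check, the reported scores can be reproduced from the quoted confusion-matrix counts, and the F1 value matches a macro average over the two classes. A minimal scikit-learn sketch (label vectors reconstructed from the cell counts above):

```python
# Rebuild the Danish sub-task A predictions from the counts quoted above
# and verify the published accuracy (92.38%) and F1 (81.18%, macro-averaged).
from sklearn.metrics import accuracy_score, f1_score

# 294 NOT tweets: 278 correct, 16 wrongly flagged as OFF.
# 34 OFF tweets: 25 correct, 9 missed.
y_true = ["NOT"] * 294 + ["OFF"] * 34
y_pred = ["NOT"] * 278 + ["OFF"] * 16 + ["OFF"] * 25 + ["NOT"] * 9

print(f"accuracy: {accuracy_score(y_true, y_pred):.4f}")             # 0.9238
print(f"macro F1: {f1_score(y_true, y_pred, average='macro'):.4f}")  # 0.8118
```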
Context 2
... sub-task C our submission ranked second, with an accuracy of 79.74% and an F1-score of 66.99%. The relatively low F1-score is due to the high number of misclassifications in the minority class: only 33 of 82 OTH instances were correctly classified (see Figure 1b). Despite our efforts to balance the dataset, the skewed class distribution appears to have inevitably biased the model. ...
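The snippet mentions balancing the dataset without detailing how; one generic approach for a skewed multi-class task like sub-task C is random oversampling of the minority classes. A sketch under that assumption (the helper below is illustrative, not the team's documented procedure):

```python
# Randomly duplicate minority-class examples until every class matches the
# majority count. Generic sketch; not necessarily the team's exact method.
import random
from collections import defaultdict

def oversample(texts, labels, seed=42):
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text, label in zip(texts, labels):
        by_label[label].append(text)
    target = max(len(items) for items in by_label.values())
    out_texts, out_labels = [], []
    for label, items in by_label.items():
        # Keep every original, then draw random duplicates up to the target size.
        items = items + [rng.choice(items) for _ in range(target - len(items))]
        out_texts.extend(items)
        out_labels.extend([label] * target)
    return out_texts, out_labels
```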
Similar publications
In this work, we present the largest benchmark to date on linguistic acceptability: Multilingual Evaluation of Linguistic Acceptability---MELA, with 46K samples covering 10 languages from a diverse set of language families. We establish LLM baselines on this benchmark, and investigate cross-lingual transfer in acceptability judgements with XLM-R. I...
Since the majority of audio DeepFake (DF) detection methods are trained on English-centric datasets, their applicability to non-English languages remains largely unexplored. In this work, we present a benchmark for the multilingual audio DF detection challenge by evaluating various adaptation strategies. Our experiments focus on analyzing models tr...
Text-guided image generation models, such as DALL-E 2 and Stable Diffusion, have recently received much attention from academia and the general public. Provided with textual descriptions, these models are capable of generating high-quality images depicting various concepts and styles. However, such models are trained on large amounts of public data...
The study of language teacher cognition (LTC) allows us to better understand language teaching in terms of what teachers know, how they come to know it, and how they draw on their knowledge. Due to recently increasing interest in LTC research but a simultaneous lack of synthesis and overview in Norway and Sweden, a descriptive review of the literatu...
Citations
... These observations align with academic reports on hate speech recognition, which highlight difficulties in separating hate speech from other types of offensive language (e.g., Davidson et al., 2017). Indeed, Pàmies et al. (2020) showed that offensive social media messages included more emotion-associated words than non-offensive messages, both positive and negative. ...
One prominent application of computational methods is the identification of affectivity and emotions in textual data, commonly known as sentiment analysis. In this chapter, we explore the datafication of affective language by focusing on operationalization and translation involved in the analysis processes behind common methods to identify affectivity or specific emotions in text. We draw examples from popular cases and from our own empirical studies that apply and develop sentiment and hate speech analysis. We suggest that sentiment analysis is a fruitful case for discussing the role of and the tensions involved in applying computational techniques in the automated analysis of meaning-laden phenomena. We highlight that any application of sentiment analysis techniques to investigate emotional expression in texts amounts to an effort of constructing sentiment measurements – a process essentially driven by judgments made by researchers in an attempt to reconcile diverging conventions and conceptions of good/proper research practices.
... However, with the advances in natural language processing (NLP) and deep learning, non-English HS detection solutions are steadily increasing. There now exist HS detection models for Arabic (e.g., [20]-[24]), Turkish (e.g., [24], [25]), Greek (e.g., [24], [26], [27]), Danish (e.g., [24], [28]), Hindi (e.g., [29]-[31]), German (e.g., [15], [32]), Malayalam (e.g., [33], [34]), Tamil (e.g., [34], [35]), Chinese (e.g., [36]-[38]), Italian (e.g., [39]), Urdu (e.g., [40]-[42]), Bengali (e.g., [43]-[45]), Korean (e.g., [46]), French (e.g., [47]-[49]), Indonesian and Portuguese (e.g., [50]), Spanish (e.g., [51]), and Polish (e.g., [52]), among others not mentioned due to the scope of this article. ...
The ever-increasing amount of online content and the opportunity for everyone to express their opinions online lead to frequent encounters with social problems: bullying, insults, and hate speech. Some online portals have taken steps to stop this, such as no longer allowing anonymous user-generated comments or removing the possibility to comment under articles altogether, and some employ moderators who identify and eliminate hate speech. Given the large number of comments, however, this work requires a correspondingly large number of people. The rapid development of artificial intelligence in language technology may be the solution to this problem: automated hate speech detection would make the ever-increasing amount of online content manageable. We therefore report on hate speech classification for the Lithuanian language using deep learning.
... Nevertheless, with the advances in multilingual parsers and deep learning technology, together with increasing pressure from policy-makers to handle hate speech at the local level, non-English HS detection toolkits have seen a steady increase. The figure indicates that about 51% of all work in this field is performed on English datasets, with a growing share for other languages: Arabic (13%) [93,60,12,176], Turkish (6%) [176,104], Greek (4%) [176,6,136], Danish (5%) [106,176], Hindi (4%) [121,22,88], German (4%) [73,120], Malayalam (3%) [130,109], Tamil (3%) [130,20], Chinese (1%) [138,139,188], Italian (2%) [116], Urdu (1%) [126,95,7], Russian (1%) [17], Bengali (1%) [63,127,70], Korean (1%) [91], French (1%) [16,102,51], Indonesian (1%) [14], Portuguese (1%) [14], Spanish (1%) [57] and Polish (1%) [118] dominate the remaining languages in this field. ...
... In the third task, the first system was based on an XLM-RoBERTa model [29]. The second used a BERT model with oversampling to compensate for the unbalanced classes [32]. The third combined BERT with surface features of the texts, such as tweet length, misspelled words, and emoji use [33]. ...
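As a rough illustration of the third system's design, a BERT sentence vector can be concatenated with hand-crafted surface features before classification. The model name and the two features below are assumptions for the sketch, not the cited system's exact choices:

```python
# Concatenate a BERT [CLS] representation with simple surface features
# (here: tweet length and a crude emoji count) for a downstream classifier.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
encoder = AutoModel.from_pretrained("bert-base-multilingual-cased")

def featurize(tweet: str) -> torch.Tensor:
    enc = tokenizer(tweet, return_tensors="pt", truncation=True)
    with torch.no_grad():
        cls = encoder(**enc).last_hidden_state[:, 0, :]   # (1, 768) [CLS] vector
    # Hand-crafted features: character length and a rough emoji count.
    extra = torch.tensor([[len(tweet), sum(ord(c) >= 0x1F300 for c in tweet)]],
                         dtype=torch.float)
    return torch.cat([cls, extra], dim=1)                 # (1, 770)
```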
Social networks allow us to communicate with people around the world. However, some users take advantage of anonymity to write offensive comments, which may harm those who receive them or discourage the use of these networks, and it is impossible to check every message manually. This has prompted several proposals for automatic detection systems. Current state-of-the-art systems are based on the Transformer architecture, and most work has focused on English. However, these systems pay little attention to the unbalanced nature of the data, since in a real environment offensive comments are far less frequent than non-offensive ones. Moreover, previous work has not studied how pre-processing or the corpora used for pre-training the models affect the final results. In this work, we propose and evaluate a series of automatic methods for detecting offensive language in Spanish texts that address the unbalanced nature of the data. We test different learning models, from classical Machine Learning algorithms using Bag-of-Words representations to large language models and neural networks such as transformers, paying particular attention to the minority classes and to the corpora used for pre-training the transformer-based models. We show that transformer-based models still obtain the best results, and we improve on previous results by 6.2% by adding new pre-processing steps and using models pre-trained on Spanish social-media data, setting new state-of-the-art results.
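One standard way to pay more attention to the minority classes, as the abstract puts it, is to weight the training loss inversely to class frequency; whether the cited paper uses exactly this scheme is not stated. A minimal PyTorch sketch:

```python
# Build a cross-entropy loss whose per-class weights grow as class
# frequency shrinks, so rare (offensive) examples count for more.
import torch
from collections import Counter

def class_weighted_loss(labels):
    counts = Counter(labels)                       # e.g. {0: 9000, 1: 1000}
    total = len(labels)
    weights = torch.tensor([total / counts[c] for c in sorted(counts)],
                           dtype=torch.float)
    return torch.nn.CrossEntropyLoss(weight=weights / weights.sum())

# 0 = non-offensive (majority), 1 = offensive (minority); counts are made up.
loss_fn = class_weighted_loss([0] * 9000 + [1] * 1000)
```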
... Nevertheless, with the advances in multilingual parsers and deep learning technology, together with increasing pressure from policy-makers to handle hate speech at the local level, non-English HS detection toolkits have seen a steady increase. The figure indicates that about 51% of all work in this field is performed on English datasets, with a growing share for other languages: Arabic (13%) [93,59,12,143], Turkish (6%) [143,104], Greek (4%) [143,6,136], Danish (5%) [106,143], Hindi (4%) [121,22,88], German (4%) [72,120], Malayalam (3%) [130,109], Tamil (3%) [130,20], Chinese (1%) [138,139,155], Italian (2%) [116], Urdu (1%) [126,95,7], Russian (1%) [17], Bengali (1%) [62,127,69], Korean (1%) [91], French (1%) [16,102,50], Indonesian (1%) [14], Portuguese (1%) [14], Spanish (1%) [56] and Polish (1%) [118] dominate the remaining languages in this field. ...
With the multiplication of social media platforms, which offer anonymity, easy access and online community formation, and online debate, the issue of hate speech detection and tracking becomes a growing challenge to society, individual, policy-makers and researchers. Despite efforts for leveraging automatic techniques for automatic detection and monitoring, their performances are still far from satisfactory, which constantly calls for future research on the issue. This paper provides a systematic review of literature in this field, with a focus on natural language processing and deep learning technologies, highlighting the terminology, processing pipeline, core methods employed, with a focal point on deep learning architecture. From a methodological perspective, we adopt PRISMA guideline of systematic review of the last 10 years literature from ACM Digital Library and Google Scholar. In the sequel, existing surveys, limitations, and future research directions are extensively discussed.
THIS ARTICLE USES WORDS OR LANGUAGE THAT IS CONSIDERED PROFANE, VULGAR, OR OFFENSIVE BY SOME READERS.
Different types of abusive content, such as offensive language, hate speech, and aggression, have become prevalent in social media, and many efforts have been dedicated to automatically detecting this phenomenon in resource-rich languages such as English. This focus is mainly due to the comparative lack of annotated offensive-language data in low-resource languages, especially those spoken in Asian countries. To reduce the vulnerability of social media users from these regions, it is crucial to address the problem of offensive language in such low-resource languages. Hence, we present a new corpus of Persian offensive language, consisting of 6,000 micro-blog posts randomly sampled from 520,000 posts on X (Twitter), to support offensive language detection in Persian as a low-resource language in this area. We introduce a method for creating and annotating the corpus following the annotation practices of recent benchmark datasets in other languages, which yields categories for both the offensive language and the target of the offense. We perform extensive experiments with three classifiers at different levels of annotation, covering classical Machine Learning (ML), Deep Learning (DL), and transformer-based neural networks, including monolingual and multilingual pre-trained language models. Furthermore, we propose an ensemble model integrating the aforementioned models to boost the performance of our offensive language detection task. Initial results on single models indicate that SVMs trained on character or word n-grams, together with the monolingual transformer-based pre-trained language model ParsBERT, are the best-performing models for identifying offensive vs. non-offensive content, targeted vs. untargeted offense, and offenses directed at individuals or groups. In addition, the stacking ensemble outperforms the single models by a substantial margin, obtaining a 5% macro F1-score improvement at each of the three levels of annotation.
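The stacking ensemble described here can be sketched with scikit-learn; the transformer members (e.g., ParsBERT) are omitted to keep the example light, and the vectorizer parameters are illustrative rather than the paper's configuration:

```python
# Stack a character-n-gram SVM and a word-n-gram SVM under a logistic
# regression meta-classifier, mirroring the ensemble idea described above.
from sklearn.ensemble import StackingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

char_svm = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 5)), LinearSVC())
word_svm = make_pipeline(
    TfidfVectorizer(analyzer="word", ngram_range=(1, 2)), LinearSVC())

stack = StackingClassifier(
    estimators=[("char", char_svm), ("word", word_svm)],
    final_estimator=LogisticRegression(max_iter=1000))
# stack.fit(train_texts, train_labels); stack.predict(test_texts)
```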
Media analysis (MA) is an evolving area of research in the field of text mining and an important research area for intelligent media analytics. The fundamental purpose of MA is to obtain valuable insights that help to improve many different areas of business, and ultimately the customer experience, through the computational treatment of opinions, sentiments, and subjectivity in mostly highly subjective text types. These texts can come from social media, the internet, and news articles with clearly defined and unique targets. MA-related fields also include emotion, irony, and hate speech detection, which are usually tackled independently of one another without leveraging the contextual similarity between them, mainly due to the lack of annotated datasets. In this paper, we present a unified framework for complete intelligent media analysis, proposing a shared-parameter-layer architecture with a joint learning approach that lets each separate task benefit from the others in the classification of sentiments, emotions, irony, and hate speech in texts. The proposed approach was evaluated on Greek expert-annotated texts from social media posts, news articles, and internet articles such as blog posts and opinion pieces. The results show that this joint classification approach improves the classification effectiveness of each task in terms of the micro-averaged F1-score.
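The shared-parameter-layer idea can be sketched as a common projection feeding one lightweight head per task; the dimensions and task list below are assumptions for illustration, not the paper's exact architecture:

```python
# Shared layer + task-specific heads: every task's loss backpropagates
# through the shared parameters, which is what lets the tasks help each other.
import torch.nn as nn

class JointMediaAnalysisModel(nn.Module):
    def __init__(self, task_classes, encoder_dim=768, shared_dim=256):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(encoder_dim, shared_dim), nn.ReLU())
        self.heads = nn.ModuleDict(
            {task: nn.Linear(shared_dim, n) for task, n in task_classes.items()})

    def forward(self, text_embedding):   # (batch, encoder_dim) from any encoder
        h = self.shared(text_embedding)
        return {task: head(h) for task, head in self.heads.items()}

model = JointMediaAnalysisModel(
    {"sentiment": 3, "emotion": 6, "irony": 2, "hate_speech": 2})
# Joint training sums the per-task cross-entropy losses over these outputs.
```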
Humans are increasingly integrated with devices that enable the collection of vast amounts of unstructured, opinionated data. Accurately analysing subjective information is the task of sentiment analysis, an actively researched area in NLP. Deep learning has surged past other machine learning approaches to become the foremost approach for sentiment analysis, and there is a diverse selection of architectures for modelling its tasks. Recent studies utilising pre-trained language models to transfer knowledge to downstream tasks have been a breakthrough in NLP and have transformed the field.
In this survey, we provide a taxonomy of recent literature applying deep learning architectures to sentiment analysis and survey recent approaches and trends. We categorise studies according to their task focus and cover the theory, design, and implementation of these architectures. To the best of our knowledge, this is the only survey to cover the recent trend towards transformer-based language models and their impact on sentiment analysis. Performance figures and analysis of the top algorithms on commonly used datasets are also provided, along with a discussion of open issues in NLP and sentiment analysis. We covered the five-year period from January 2017 to January 2022.
This paper examines the shift in focus of content policies and user attitudes on the social media platform Reddit. We do this by examining comments from general Reddit users on five posts made by admins (moderators) about updates to the Reddit Content Policy. All five concern what kind of content is allowed to be posted on Reddit and which measures will be taken against content that violates these policies. We use topic modeling to probe how the general discourse among Redditors has changed around limitations on content and, later, limitations on hate speech, i.e., speech that incites violence against a particular group. We show that there is a clear shift in both the content and the user attitudes, which can be linked to contemporary societal upheaval as well as newly passed laws and regulations, and we contribute to the wider discussion on hate speech moderation.
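The snippet does not name the topic model used, so as a stand-in, a minimal LDA pass over the comments might look like this (placeholder data; scikit-learn's implementation assumed):

```python
# Fit LDA on a document-term matrix of Reddit comments and print the top
# words per topic to inspect how the discourse clusters.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Placeholders: the real input would be the scraped comments from the five posts.
comments = ["example comment about the content policy update",
            "another example comment disagreeing with the moderators"]

vectorizer = CountVectorizer(stop_words="english")
doc_term = vectorizer.fit_transform(comments)

lda = LatentDirichletAllocation(n_components=5, random_state=0)
lda.fit(doc_term)

terms = vectorizer.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top = [terms[i] for i in topic.argsort()[-8:][::-1]]
    print(f"topic {k}: {', '.join(top)}")
```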
Offensive content is pervasive in social media and a reason for concern to companies and government organizations. Several studies have recently been published investigating methods to detect the various forms of such content (e.g., hate speech, cyberbullying, and cyberaggression). The clear majority of these studies deal with English, partially because most available annotated datasets contain English data. In this article, we take advantage of available English datasets by applying cross-lingual contextual word embeddings and transfer learning to make predictions in low-resource languages. We project predictions onto comparable data in Arabic, Bengali, Danish, Greek, Hindi, Spanish, and Turkish. We report results of 0.8415 macro F1 for Bengali in the TRAC-2 shared task [23], 0.8532 macro F1 for Danish and 0.8701 macro F1 for Greek in OffensEval 2020 [58], 0.8568 macro F1 for Hindi in the HASOC 2019 shared task [27], and 0.7513 macro F1 for Spanish in SemEval-2019 Task 5 (HatEval) [7], showing that our approach compares favorably to the best systems submitted to recent shared tasks on these languages. Additionally, we report competitive performance on Arabic and Turkish using the training and development sets of the OffensEval 2020 shared task. The results for all languages confirm the robustness of cross-lingual contextual embeddings and transfer learning for this task.
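In outline, the recipe is: fine-tune a multilingual encoder on English offensive-language data, then predict directly on another language with no target-language training. A hedged sketch with Hugging Face Transformers (toy examples stand in for the real datasets, and the base checkpoint is an assumption):

```python
# Cross-lingual transfer: train on English, predict zero-shot on Danish.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "xlm-roberta-base", num_labels=2)  # 0 = NOT, 1 = OFF

# 1) One illustrative English training step (a real run iterates a full dataset).
batch = tokenizer(["you are an idiot", "have a nice day"],
                  return_tensors="pt", padding=True, truncation=True)
loss = model(**batch, labels=torch.tensor([1, 0])).loss
loss.backward()  # an optimizer.step() would follow in a real training loop

# 2) Zero-shot prediction on Danish, with no Danish training data.
danish = tokenizer(["sikke en idiot"], return_tensors="pt",
                   padding=True, truncation=True)
with torch.no_grad():
    print(model(**danish).logits.argmax(dim=-1))
```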