Source publication
The information spread through the Web influences politics, stock markets, public health, people's reputation and brands. For these reasons, it is crucial to filter out false information. In this paper, we compare different automatic approaches for fake news detection based on statistical text analysis on the vaccination fake news dataset provided...
Similar publications
The goal of this work is to design a system that identifies the gender of a speaker. Gender classification is an emerging research area for achieving efficient interaction between humans and machines using speech files. Numerous approaches to gender classification have been proposed in the past. Speech r...
Multilingual, voice-activated human-computer interaction systems are currently in high demand. The Spoken Language Identification Unit (SPLID) is an indispensable front-end unit of such a multilingual system. These systems will be a great boon to a country like India, where around 24 official languages are spoken. Deep learning architectures for s...
Citations
... In the digital realm, where interactions are increasingly mediated by algorithms, discerning humor has become crucial. This imperative extends to various applications, including chatbots, recommender systems, social media reputation management, and the critical task of identifying and combating fake news and hate speech [1,2]. Early efforts in humor detection primarily focused on the intricate dynamics of wordplay. ...
In the context of the JOKER 2024 Task 2 Challenge, this paper presents an approach that leverages the latent representations derived from different Large Language Models (LLMs) to drive a classification mechanism. Our methodology exploits the "knowledge" encoded in LLMs to effectively discriminate humor genres. Experiments demonstrate the effectiveness of our approach, with promising results. However, inherent complexities remain, such as the proximity between certain classes and biases arising from the dataset distributions. These complexities warrant further investigation to refine the classification process and improve overall performance.
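As a rough, hedged illustration of this kind of pipeline (not the authors' exact system), one can mean-pool a pre-trained transformer's hidden states into fixed-size vectors and train a lightweight classifier on top. The model name, pooling choice, and toy labels below are assumptions made for the sketch:

```python
# Sketch: embed texts with a pre-trained transformer's hidden states, then
# train a lightweight classifier on top. Model and labels are placeholders.
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import LogisticRegression

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def embed(texts):
    """Mean-pool the last hidden state into one vector per text."""
    enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc).last_hidden_state        # (batch, seq, dim)
    mask = enc["attention_mask"].unsqueeze(-1)      # ignore padding tokens
    return ((out * mask).sum(1) / mask.sum(1)).numpy()

texts = ["What do you call a fake noodle? An impasta.", "The cat sat on the mat."]
labels = [1, 0]                                     # toy labels: 1 = humorous
clf = LogisticRegression().fit(embed(texts), labels)
print(clf.predict(embed(["I told a chemistry joke; there was no reaction."])))
```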
... Numerous studies have been conducted on detecting fake news in high-resourced languages like English, Arabic, Spanish, French, and German (Ahuja and Kumar, 2023; Mohawesh et al., 2023; Zhou et al., 2023; Al-Yahya et al., 2021; Guibon et al., 2019). But much work remains to be done, especially for low-resourced languages like Malayalam, particularly for code-mixed text (Thara and Poornachandran, 2021). ...
Due to technological advancements, various methods have emerged for disseminating news to the masses. The pervasive reach of news, however, has given rise to a significant concern: the proliferation of fake news. In response to this challenge, a shared task at DravidianLangTech EACL 2024 was initiated to detect fake news and classify its types in the Malayalam language. The shared task consisted of two sub-tasks: Task 1 focused on a binary classification problem, determining whether a piece of news is fake or not, whereas Task 2 addressed a multi-class classification problem, categorizing news into five distinct levels. Our approach explored various machine learning (RF, SVM, XGBoost, Ensemble), deep learning (BiLSTM, CNN), and transformer-based models (MuRIL, Indic-SBERT, m-BERT, XLM-R, Distil-BERT), emphasizing parameter tuning to enhance overall model performance. As a result, we introduce a fine-tuned MuRIL model that achieves an F1-score of 0.86 in Task 1 and 0.5191 in Task 2. This implementation secured our system 3rd place in Task 1 and 1st place in Task 2. The source code is available in the GitHub repository at this link: https://github.com/Salman1804102/DravidianLangTech-EACL-2024-FakeNews
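For context, a hedged sketch of what fine-tuning MuRIL for the binary sub-task might look like with the Hugging Face transformers library; the hyperparameters, file names, and column names below are placeholders, not the authors' configuration:

```python
# Sketch: fine-tune MuRIL for binary fake-news classification.
# Dataset files, columns, and hyperparameters are illustrative assumptions.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("google/muril-base-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "google/muril-base-cased", num_labels=2)   # two classes: fake / not fake

# Hypothetical CSV files with "text" and "label" columns.
ds = load_dataset("csv", data_files={"train": "train.csv", "validation": "dev.csv"})
ds = ds.map(lambda b: tokenizer(b["text"], truncation=True, max_length=256),
            batched=True)

args = TrainingArguments(output_dir="muril-fakenews", num_train_epochs=3,
                         per_device_train_batch_size=16, learning_rate=2e-5)
Trainer(model=model, args=args, train_dataset=ds["train"],
        eval_dataset=ds["validation"], tokenizer=tokenizer).train()
```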
... Reference [27] proposed a multilingual approach to fake news detection that takes into account the presence of satire. The authors used a dataset of news articles in multiple languages and developed a machine learning model to classify them as real or fake. ...
Sarcasm is a mode of expression whereby individuals communicate their positive or negative sentiments through words contrary to their intent. This communication style is prevalent in news headlines and social media platforms, making it increasingly challenging for individuals to detect sarcasm accurately. To address this challenge, it is imperative to develop an intelligent system that can detect sarcasm in headlines and news. This research paper proposes a deep-learning-based model for sarcasm identification in news headlines. The proposed model has three main objectives: (1) to comprehend the original meaning of the text or headlines, (2) to learn the nature of sarcasm, and (3) to detect sarcasm in the text or headlines. Previous studies on sarcasm detection have utilized datasets of tweets and employed hashtags to differentiate between ordinary and sarcastic tweets, relying on limited datasets that were prone to noise in language and tags. In contrast, using multiple datasets in this study provides a comprehensive understanding of sarcasm detection in online communication. By incorporating different types of sarcasm from the Sarcasm Corpus V2 from Baskin Engineering and sarcastic news headlines from The Onion and HuffPost, the study aims to develop a model that generalizes well across different contexts. The proposed model uses an LSTM to capture temporal dependencies and a GlobalMaxPool1D layer for better feature extraction. The model was evaluated on training and test data, with accuracy scores of 0.999 and 0.925, respectively.
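A minimal Keras sketch of the architecture described above (embedding, LSTM with per-timestep outputs, GlobalMaxPool1D); the sequence length, vocabulary size, and layer dimensions are illustrative assumptions, not the paper's settings:

```python
# Sketch: LSTM for temporal dependencies + GlobalMaxPool1D for feature
# extraction, as a binary sarcasm classifier. All sizes are placeholders.
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    tf.keras.Input(shape=(100,)),            # padded sequences of token ids
    layers.Embedding(20_000, 128),           # placeholder vocab size / dim
    layers.LSTM(64, return_sequences=True),  # keep per-timestep outputs
    layers.GlobalMaxPool1D(),                # max over the time dimension
    layers.Dense(32, activation="relu"),
    layers.Dense(1, activation="sigmoid"),   # sarcastic vs. not sarcastic
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
model.summary()
```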
... However, the spread of COVID-19 fake news is a global issue, so it is important to develop machine learning models that can work in multiple languages. Recently, computational approaches have been developed that can deal with fake news in multiple languages [23, 75–77], but multilingual biomedical information extraction remains a challenge. MetaMap and scispaCy can be applied only to English text [52, 78]. ...
The spread of fake news related to COVID-19 is an infodemic that leads to a public health crisis. Therefore, detecting fake news is crucial for effective management of the COVID-19 pandemic response. Studies have shown that machine learning models can detect COVID-19 fake news based on the content of news articles. However, the use of biomedical information, which is often featured in COVID-19 news, has not been explored in the development of these models. We present a novel approach for predicting COVID-19 fake news by leveraging biomedical information extraction (BioIE) in combination with machine learning models. We analyzed 1164 COVID-19 news articles and used advanced BioIE algorithms to extract 158 novel features. These features were then used to train 15 machine learning classifiers to predict COVID-19 fake news. Among the 15 classifiers, the random forest model achieved the best performance, with an area under the ROC curve (AUC) of 0.882, which is 12.36% to 31.05% higher than models trained on traditional features. Furthermore, incorporating BioIE-based features improved the performance of a state-of-the-art multi-modality model (AUC 0.914 vs. 0.887). Our study suggests that incorporating biomedical information into fake news detection models improves their performance, making it a potentially valuable tool in the fight against the COVID-19 infodemic.
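A hedged scikit-learn sketch of this evaluation setup: a random forest trained on a feature matrix and scored by ROC AUC. The synthetic matrix below merely stands in for the paper's 158 BioIE-derived features:

```python
# Sketch: random forest on extracted feature vectors, evaluated by AUC.
# X and y are synthetic stand-ins for the 158 BioIE features and labels.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1164, 158))        # stand-in feature matrix
y = rng.integers(0, 2, size=1164)       # stand-in fake/real labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X_tr, y_tr)
print("AUC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
```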
... We rely on a dataset created from scratch, which contains 2300 articles annotated by two experts. To our knowledge, there are no other French fake news datasets on climate change, as most of the existing studies concerning fake news in French have focused on satire and social media [11–13]. In this study, we classify articles according to three labels by adding a class of biased articles which are neither fake nor real news. ...
The unprecedented scale of disinformation on the Internet for more than a decade represents a serious challenge for democratic societies. When this process focuses on a well-established subject such as climate change, it can subvert measures and policies that various governmental bodies have taken to mitigate the phenomenon. It is therefore essential to effectively identify and counteract fake news on climate change. Our main contribution is a novel dataset of more than 2300 articles written in French, gathered using web scraping from all types of media dealing with climate change. Manual labeling was performed by two annotators with three classes: "fake", "biased", and "true". Machine learning models, ranging from an SVM over bag-of-words representations to Transformer-based architectures built on top of CamemBERT, were trained to automatically classify the articles. Our results, with an F1-score of 84.75% at the article level using the BERT-based model coupled with hand-crafted features specifically tailored for this task, represent a strong baseline. At the same time, we identify such content as text sequences (i.e., fake, biased, and irrelevant text fragments) at the sentence level, with a macro F1 of 45.01% and a micro F1 of 78.11%. Based on these results, our proposed method facilitates the identification of fake news and thus contributes to better education of the public.
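A minimal sketch of the bag-of-words SVM baseline mentioned above, assuming a TF-IDF representation and the three labels; the toy documents are placeholders, not the annotated corpus:

```python
# Sketch: TF-IDF bag-of-words + linear SVM for three-class article labeling.
# The toy French snippets stand in for the scraped climate-change articles.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

docs = ["une étude confirme la hausse des températures",
        "certains chiffres du rapport sont présentés hors contexte",
        "le réchauffement climatique est un canular inventé"]
labels = ["true", "biased", "fake"]

pipe = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
pipe.fit(docs, labels)
print(pipe.predict(["nouvelle étude sur le climat"]))
```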
... Meanwhile, the recent rise of conversational agents and the need to process large volumes of social media content point to the necessity of automatic humour recognition [32]. Humour and irony studies are now crucial for social listening [15,24,25,37], dialogue systems (chatbots), recommender systems, reputation monitoring, and the detection of fake news [18] and hate speech [14]. However, the automatic detection, location, and interpretation of humorous wordplay in particular have so far been limited to punning. ...
... Guibon et al. [24] proposed different approaches for fake news detection and, based on redundant information, tried to find a connection between satire and fake news, achieving an accuracy of 93% across several datasets. ...
This paper proposes a supervised machine learning system to detect fake news in online sources published in Romanian. Additionally, this work compares the results obtained using recurrent neural networks based on long short-term memory and gated recurrent unit cells, a convolutional neural network, and a Bidirectional Encoder Representations from Transformers (BERT) model, namely RoBERT, a pre-trained Romanian BERT model. The deep learning architectures are compared with the results achieved by two classical classification algorithms: Naïve Bayes and Support Vector Machine. The proposed approach is based on a Romanian news corpus containing 25,841 true news items and 13,064 fake news items. The best result, over 98.20%, is achieved by the convolutional neural network, which outperforms both the standard classification methods and the BERT models. Moreover, irony detection and sentiment analysis systems reveal additional details about the irony phenomenon and the sentiment analysis field, which are used to tackle fake news challenges.
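For illustration, a hedged Keras sketch of a 1D convolutional text classifier of the kind compared in the paper; the layer sizes and sequence length are assumptions, not the authors' configuration:

```python
# Sketch: 1D-CNN binary fake-news classifier over token-id sequences.
# All sizes are placeholders; the corpus itself is not reproduced here.
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    tf.keras.Input(shape=(256,)),                          # padded token ids
    layers.Embedding(30_000, 128),                         # placeholder sizes
    layers.Conv1D(128, kernel_size=5, activation="relu"),  # n-gram detectors
    layers.GlobalMaxPooling1D(),
    layers.Dense(64, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),                 # fake vs. true
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
```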
... The style of fake news and how it is written may also differ from country to country, so a dataset from a country speaking that language would be a better contribution than translating existing datasets into other languages. Some research has addressed multilingual satire detection [106], while other work has detected general fake news in English, Spanish, and Portuguese only [107]. ...
Online social networks (OSNs) are one of the major components of societal digitalization. OSNs can expose people to popular trends in various aspects of life and alter people's beliefs, behaviors, decisions, and communication. Social bots and malicious users are significant sources of misinformation on social media and can pose serious cyber threats to society. The user profiles of a cyber bot and of a malicious user spreading fake news are so similar that it is very difficult to differentiate the two based on their attributes. Over the years, researchers have attempted to mitigate this problem; however, the detection of fake news spreaders across OSNs remains a challenge. In this paper, we provide a comprehensive survey of state-of-the-art methods for detecting malicious users and bots based on the different features organized in our novel taxonomy. We also aim to address the crucial problem of fake news detection by discussing several key challenges and potential future research areas, to help researchers who are new to this field.
... One of the promising works in multilingual misinformation detection (Abonizio et al., 2020) explores language-independent fake news detection, successfully differentiating fake, satirical, and legitimate news across three different languages. Another multilingual work, presented in Guibon et al. (2019), uses a convolutional neural network (CNN) to detect fake news with satire on a multilingual dataset. The works presented in (Li et al., 2020b; Glenski et al., 2019) are the major contributions to multilingual multimodal misinformation detection. ...
... However, studies of satire detection from a computational perspective are rare [10]. Apart from its use as an indicator in studies of fake news detection [3], [11], [12], there is only a handful of credible studies on satire detection, and most of these existing works have been done manually [13]. ...
This work discusses the task of automatically detecting instances of satire in short articles. It studies extracting optimal features by using a deep learning architecture combined with carefully handcrafted contextual features. A few feature sets are found to perform well when used independently, while the others do not; however, even the latter sets become very useful once combined with the former, which shows that each of the feature sets is significant. Finally, the combined feature sets undergo classification using well-known machine learning algorithms, of which Logistic Regression is found to perform best for this task. The outcomes of all the experiments are good across all metrics used. A comparison with existing works in the same domain shows that the proposed method is slightly better, with an F1-measure of 0.94, while existing works obtained 0.91 (Yang et al., 2017), 0.90 (Zhang et al., 2016), and 0.88 (Rubin et al., 2016). The performance of each feature set is also reported as additional information. The main purpose of this work is to show that combining features extracted by supervised learning with manually extracted ones can yield good performance, and to encourage other researchers to take into account the contextual meaning behind a figurative language type such as satire.
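A minimal sketch of the combination idea: concatenating learned (deep) features with handcrafted contextual features and classifying with Logistic Regression; both feature matrices below are synthetic stand-ins, not the paper's features:

```python
# Sketch: concatenate deep and handcrafted feature matrices, then classify
# with Logistic Regression and report cross-validated F1. Data is synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
deep_feats = rng.normal(size=(500, 300))   # stand-in for learned features
hand_feats = rng.normal(size=(500, 20))    # stand-in for contextual features
y = rng.integers(0, 2, size=500)           # satire vs. non-satire labels

X = np.hstack([deep_feats, hand_feats])    # simple feature concatenation
clf = LogisticRegression(max_iter=1000)
print("F1 (CV):", cross_val_score(clf, X, y, scoring="f1", cv=5).mean())
```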