Automatic Fake News Detection with Pre-trained Transformer Models

To read the full-text of this research, you can request a copy directly from the authors.


The automatic detection of disinformation and misinformation has gained attention in recent years, since fake news has a critical impact on democracy, society, journalism, and digital literacy. In this paper, we present a binary content-based classification approach for detecting fake news automatically, using several recently published pre-trained language models based on the Transformer architecture. The experiments were conducted on the FakeNewsNet dataset with XLNet, BERT, RoBERTa, DistilBERT, and ALBERT under various combinations of hyperparameters. Different preprocessing configurations were evaluated, using only the body text, only the titles, or a concatenation of both. We conclude that Transformers are a promising approach for detecting fake news, since they achieve notable results even without a large dataset. Our main contribution is the improvement of fake news detection accuracy through different models and parametrizations, together with a reproducible examination of the results through the conducted experiments. The evaluation shows that even short texts are sufficient to attain 85% accuracy on the test set; using the body text or a concatenation of both reaches up to 87% accuracy. Lastly, we show that various preprocessing steps, such as removing outliers, do not have a significant impact on the models' predictions.
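As a hedged illustration of the setup described above, the three input configurations compared in the experiments (title only, body only, concatenation) might be assembled as in the following sketch; the whitespace tokenizer and the 512-token budget are stand-ins for a real subword tokenizer and model limit, not details taken from the paper:

```python
def build_inputs(title: str, body: str, variant: str, max_tokens: int = 512):
    """Assemble one of the three input variants compared in the paper:
    'title', 'body', or 'both' (title concatenated with body).
    A whitespace split stands in for a real subword tokenizer."""
    if variant == "title":
        text = title
    elif variant == "body":
        text = body
    elif variant == "both":
        text = title + " " + body
    else:
        raise ValueError(f"unknown variant: {variant!r}")
    # Truncate to the model's token budget (512 is typical for BERT-style models).
    return text.split()[:max_tokens]

# Example: a short title alone already forms a valid model input,
# which is consistent with the finding that short texts reach 85% accuracy.
tokens = build_inputs("Celebrity quits show", "Full article body ...", "title")
```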




... Three pre-processing pipelines were compared:
- Removing only punctuation
- Removing mentions, hashtags, and links
- Removing mentions, hashtags, links, digits, punctuation, and non-ASCII symbols
Based on related work on sexism detection [8] and hate speech detection with transformer models [7], we decided to test different pre-processing pipelines for both languages. Corresponding approaches have also shown promising results in detecting disinformation with transformer models and various pre-processing pipelines [11]. Of all pre-processing pipelines, the last one yielded the best fine-tuning results for the multilingual approach. ...
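As an illustration, the last (most aggressive) pipeline could be sketched with Python's standard library as follows; this is an assumed implementation for clarity, not the authors' actual code:

```python
import re
import string

def preprocess(text: str) -> str:
    """Apply the most aggressive pipeline described above: strip links,
    mentions, hashtags, digits, punctuation, and non-ASCII symbols,
    then collapse whitespace."""
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # links
    text = re.sub(r"[@#]\w+", " ", text)                # mentions, hashtags
    text = re.sub(r"\d+", " ", text)                    # digits
    text = text.translate(str.maketrans("", "", string.punctuation))
    text = text.encode("ascii", "ignore").decode()      # non-ASCII symbols
    return re.sub(r"\s+", " ", text).strip()

sample = "RT @user: Breaking!! 5 facts about #FakeNews → https://t.co/xyz"
cleaned = preprocess(sample)  # only the plain ASCII words survive
```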
Full-text available
Sexism has become an increasingly serious problem on social networks in recent years. The first shared task on sEXism Identification in Social neTworks (EXIST) at IberLEF 2021 is an international competition in the field of Natural Language Processing (NLP) with the aim of automatically identifying sexism in social media content by applying machine learning methods. Sexism detection is thereby formulated as a coarse (binary) classification problem and as a fine-grained classification task that distinguishes multiple types of sexist content (e.g., dominance, stereotyping, and objectification). This paper presents the contribution of the AIT_FHSTP team to the EXIST 2021 benchmark for both tasks. To solve the tasks we applied two multilingual transformer models, one based on multilingual BERT and one based on XLM-R. Our approach uses two different strategies to adapt the transformers to the detection of sexist content: first, unsupervised pre-training with additional data, and second, supervised fine-tuning with additional and augmented data. For both tasks our best model is XLM-R with unsupervised pre-training on the EXIST data and additional datasets, and fine-tuning on the provided dataset. The best run for the binary classification (task 1) achieves a macro F1-score of 0.7752 and ranks 5th in the benchmark; for the multiclass classification (task 2) our best submission ranks 6th with a macro F1-score of 0.5589.
Full-text available
In recent years, the problem of rumours on online social media (OSM) has attracted a great deal of attention. Researchers have investigated it from two main directions: the descriptive analysis of rumours, and techniques to detect (or classify) rumours. In the descriptive line of work, where researchers analyse rumours using NLP approaches, there is little emphasis on the psycholinguistic analysis of social media text. Such analyses of rumour case studies are vital for drawing meaningful conclusions to mitigate misinformation. For our analysis, we explored the PHEME-9 rumour dataset (consisting of 9 events), including source tweets (both rumour and non-rumour categories) and response tweets. We compared the rumour and non-rumour source tweets, and then their corresponding reply (response) tweets, to understand how they differ linguistically for every incident. Furthermore, we evaluated whether these features can be used to classify rumour vs. non-rumour tweets through machine learning models, employing various classical and ensemble-based approaches. To filter out the most discriminative psycholinguistic features, we used the SHAP explainability tool. In summary, this research contributes an in-depth psycholinguistic analysis of rumours related to various kinds of events.
Full-text available
The explosive growth of fake news and its erosion of democracy, justice, and public trust have increased the demand for fake news detection and intervention. This survey reviews and evaluates methods that can detect fake news from four perspectives: (1) the false knowledge it carries, (2) its writing style, (3) its propagation patterns, and (4) the credibility of its source. The survey also highlights some potential research tasks based on the review. In particular, we identify and detail related fundamental theories across various disciplines to encourage interdisciplinary research on fake news. We hope this survey can facilitate collaborative efforts among experts in computer and information sciences, social sciences, political science, and journalism to research fake news, where such efforts can lead to fake news detection that is not only efficient but, more importantly, explainable.
Full-text available
Social media has become a popular means for people to consume and share news. At the same time, however, it has also enabled the wide dissemination of fake news, that is, news with intentionally false information, causing significant negative effects on society. To mitigate this problem, the research of fake news detection has recently received a lot of attention. Despite several existing computational solutions for the detection of fake news, the lack of comprehensive and community-driven fake news data sets has become one of the major roadblocks. Not only are existing data sets scarce, but they also lack many of the features often required in such studies, such as news content, social context, and spatiotemporal information. Therefore, in this article, to facilitate fake news-related research, we present a fake news data repository, FakeNewsNet, which contains two comprehensive data sets with diverse features in news content, social context, and spatiotemporal information. We present a comprehensive description of FakeNewsNet, demonstrate an exploratory analysis of the two data sets from different perspectives, and discuss the benefits of FakeNewsNet for potential applications of fake news study on social media.
Full-text available
We investigate BERT in an evidence retrieval and claim verification pipeline for the task of evidence-based claim verification. To this end, we propose to use two BERT models: one for retrieving evidence sentences supporting or rejecting claims, and another for verifying claims based on the retrieved evidence sentences. To train the BERT retrieval system, we use pointwise and pairwise loss functions and examine the effect of hard negative mining. Our system achieves a new state-of-the-art recall of 87.1 for retrieving evidence sentences from the FEVER dataset's 50K Wikipedia pages, and scores second on the leaderboard with a FEVER score of 69.7.
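The two training objectives mentioned for the retrieval model can be written down compactly. The sketch below shows a generic pointwise cross-entropy and a pairwise hinge loss over scalar relevance scores; the scores themselves would come from BERT, which is omitted here, so this is an assumed minimal formulation rather than the paper's exact loss:

```python
import math

def pointwise_loss(score: float, label: int) -> float:
    """Binary cross-entropy on a single claim-sentence pair
    (label 1 = evidence, 0 = non-evidence)."""
    p = 1.0 / (1.0 + math.exp(-score))  # sigmoid over the relevance score
    return -(label * math.log(p) + (1 - label) * math.log(1 - p))

def pairwise_hinge_loss(pos_score: float, neg_score: float,
                        margin: float = 1.0) -> float:
    """Push a true evidence sentence to score at least `margin` above
    a negative sentence retrieved for the same claim."""
    return max(0.0, margin - pos_score + neg_score)
```

With hard negative mining, `neg_score` would be taken from the highest-scoring non-evidence sentences, which keeps the pairwise loss from saturating at zero.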
Full-text available
With the ever-increasing rate of information dissemination and absorption, "fake news" has become a real menace. People these days often fall prey to fake news that is in line with their perception. Checking the authenticity of news articles manually is a time-consuming and laborious task, thus giving rise to the need for automated computational tools that can assess the degree of fakeness of news articles. In this paper, a Natural Language Processing (NLP) based mechanism is proposed to combat the challenge of classifying news articles as either fake or real. Transfer learning on the Bidirectional Encoder Representations from Transformers (BERT) language model has been applied for this task. This paper demonstrates how, even with minimal text pre-processing, the fine-tuned BERT model is robust enough to perform significantly well on the downstream task of news article classification. In addition, LSTM and Gradient Boosted Tree models have been built for the task, and comparative results are provided for all three models. The fine-tuned BERT model achieves an accuracy of 97.021% on the NewsFN data and outperforms the other two models by approximately eight percent.
Conference Paper
Full-text available
This paper presents state-of-the-art methods for addressing three important challenges in automated fake news detection: fake news detection, domain identification, and bot identification in tweets. The proposed solutions achieved first place in a recent international competition on fake news. For fake news detection, we present two models. The winning model in the competition combines the similarity between the embedding of each article's title and the embeddings of the top five corresponding Google search results. The new model relies on advances in Natural Language Understanding (NLU) end-to-end deep learning models to identify stylistic differences between legitimate and fake news articles. This second model was developed after the competition and outperforms the winning approach. For news domain detection, the winning model is a hybrid approach composed of named-entity features concatenated with semantic embeddings derived from end-to-end models. For Twitter bot detection, we propose to use the following features: the duration between account creation and tweet date, the presence of a link in the tweet, the presence of the user's location, other tweet features, and the tweets' metadata. The experiments include insights into the importance of the different features, and the results indicate the superior performance of all proposed models.
Full-text available
Increasing model size when pretraining natural language representations often results in improved performance on downstream tasks. However, at some point further model increases become harder due to GPU/TPU memory limitations, longer training times, and unexpected model degradation. To address these problems, we present two parameter-reduction techniques to lower memory consumption and increase the training speed of BERT (Devlin et al., 2019). Comprehensive empirical evidence shows that our proposed methods lead to models that scale much better compared to the original BERT. We also use a self-supervised loss that focuses on modeling inter-sentence coherence, and show it consistently helps downstream tasks with multi-sentence inputs. As a result, our best model establishes new state-of-the-art results on the GLUE, RACE, and SQuAD benchmarks while having fewer parameters compared to BERT-large.
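The first of ALBERT's parameter-reduction techniques, factorized embedding parametrization, is easy to quantify. The vocabulary, hidden, and embedding sizes below are illustrative (roughly BERT-base-like), not figures taken from the paper:

```python
def embedding_params(vocab, hidden, emb=None):
    """Token-embedding parameter count: a full vocab x hidden matrix,
    or, when factorized, a vocab x emb matrix followed by an
    emb x hidden projection (as in ALBERT)."""
    if emb is None:
        return vocab * hidden
    return vocab * emb + emb * hidden

bert_like = embedding_params(30000, 768)             # 30000 * 768
albert_like = embedding_params(30000, 768, emb=128)  # 30000 * 128 + 128 * 768
```

With these sizes the factorized embedding uses roughly a sixth of the parameters of the tied vocab-by-hidden matrix, which is the kind of memory saving the abstract describes.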
Full-text available
News currently spreads rapidly through the internet. Because fake news stories are designed to attract readers, they tend to spread faster. For most readers, detecting fake news can be challenging, and such readers usually end up believing that the fake news story is fact. Because fake news can be socially problematic, a model that automatically detects such fake news is required. In this paper, we focus on data-driven automatic fake news detection methods. We first apply the Bidirectional Encoder Representations from Transformers (BERT) model to detect fake news by analyzing the relationship between the headline and the body text of news articles. To further improve performance, additional news data are gathered and used to pre-train this model. We determine that the deep-contextualizing nature of BERT is best suited for this task and improves the F-score by 0.14 over older state-of-the-art models.
Conference Paper
Full-text available
We now live in “The Information Age”, i.e., the age of data. In this period of human history, many aspects of life have undergone a paradigm shift, including the way news is consumed. Recently, an inevitable dependency on social media for obtaining information or news on a daily basis has emerged. This dependency creates vulnerability, as there are manipulators intentionally willing to exploit it to spread fake news. Adding fuel to the fire, the advent of big data in the form of articles, headlines, videos, tweets, posts, and hashtags has increased this vulnerability manifold. Due to the diversified sources of information and the increased use of social media to consume news, the legitimacy of data has become a serious concern today. It is a matter of critical concern that whatever is asserted as fact or accepted as authentic information should be tested for veracity. This problem has the potential to cause political as well as social harm. This paper presents a comprehensive study of the approaches adopted to combat fake news, laying a better foundation for understanding the issue and identifying further research scope in this area. A comparative analysis of publicly available datasets for research in this area is also presented.
Conference Paper
Full-text available
The proliferation and rapid diffusion of fake news on the Internet highlight the need for automatic hoax detection systems. In the context of social networks, machine learning (ML) methods can be used for this purpose. Fake news detection strategies are traditionally either based on content analysis (i.e. analyzing the content of the news) or - more recently - on social context models, such as mapping the news' diffusion pattern. In this paper, we first propose a novel ML fake news detection method which, by combining news content and social context features, outperforms existing methods in the literature, increasing their already high accuracy by up to 4.8%. Second, we implement our method within a Facebook Messenger chatbot and validate it with a real-world application, obtaining a fake news detection accuracy of 81.7%.
Full-text available
Social media is becoming a popular medium for news consumption. Its low cost, easy access, and rapid information dissemination make it easy for people to seek out news in a timely manner. However, it also enables the widespread dissemination of fake news, i.e., low-quality news pieces that are intentionally fabricated. Fake news has several negative effects on individual consumers, the news ecosystem, and even societal trust. Previous fake news detection methods mainly focus on news contents for deception classification or claim fact-checking. Recent social and psychology studies show the potential importance of utilizing social media data: 1) the confirmation bias effect reveals that consumers prefer to believe information that confirms their existing stances; 2) the echo chamber effect suggests that people tend to follow like-minded users and form segregated communities on social media. Even though users' social engagements with news on social media provide abundant auxiliary information for better detecting fake news, existing work exploiting social engagements is rather limited. In this paper, we explore the correlations of publisher bias, news stance, and relevant user engagements simultaneously, and propose a Tri-Relationship Fake News detection framework (TriFN). We also provide two comprehensive real-world fake news datasets to facilitate fake news research. Experiments on these datasets demonstrate the effectiveness of the proposed approach.
Full-text available
This paper is based on a review of how previous studies have defined and operationalized the term “fake news.” An examination of 34 academic articles that used the term “fake news” between 2003 and 2017 resulted in a typology of types of fake news: news satire, news parody, fabrication, manipulation, advertising, and propaganda. These definitions are based on two dimensions: levels of facticity and deception. Such a typology is offered to clarify what we mean by fake news and to guide future studies.
Full-text available
Social media for news consumption is a double-edged sword. On the one hand, its low cost, easy access, and rapid dissemination of information lead people to seek out and consume news from social media. On the other hand, it enables the wide spread of "fake news", i.e., low quality news with intentionally false information. The extensive spread of fake news has the potential for extremely negative impacts on individuals and society. Therefore, fake news detection on social media has recently become an emerging research area that is attracting tremendous attention. Fake news detection on social media presents unique characteristics and challenges that make existing detection algorithms from traditional news media ineffective or not applicable. First, fake news is intentionally written to mislead readers to believe false information, which makes it difficult and nontrivial to detect based on news content; therefore, we need to include auxiliary information, such as user social engagements on social media, to help make a determination. Second, exploiting this auxiliary information is challenging in and of itself, as users' social engagements with fake news produce data that is big, incomplete, unstructured, and noisy. Because the issue of fake news detection on social media is both challenging and relevant, we conducted this survey to further facilitate research on the problem. In this survey, we present a comprehensive review of detecting fake news on social media, including fake news characterizations based on psychological and social theories, existing algorithms from a data mining perspective, evaluation metrics, and representative datasets. We also discuss related research areas, open problems, and future research directions for fake news detection on social media.
Conference Paper
Full-text available
A fake news detection system aims to assist users in detecting and filtering out varieties of potentially deceptive news. The prediction of the chances that a particular news item is intentionally deceptive is based on the analysis of previously seen truthful and deceptive news. A scarcity of deceptive news, available as corpora for predictive modeling, is a major stumbling block in this field of natural language processing (NLP) and deception detection. This paper discusses three types of fake news, each in contrast to genuine serious reporting, and weighs their pros and cons as a corpus for text analytics and predictive modeling. Filtering, vetting, and verifying online information continues to be essential in library and information science (LIS), as the lines between traditional news and online information are blurring.
Online social media promotes the development of the news industry and makes it easy for everyone to obtain the latest news. Meanwhile, the situation is worsened by fake news, which is flooding these platforms and has become a serious threat that may cause high societal and economic losses, making fake news detection important. Unlike traditional news, news on social media tends to be short and misleading, which makes it more confusing to identify. Moreover, fake news may mix parts of the facts with incorrect content in one statement, which is not simple to classify. Hence, we propose a two-stage model to deal with these difficulties. Our model is built on BERT, a pre-trained model with a more powerful feature extractor, the Transformer, instead of a CNN or RNN. In addition, accessible auxiliary information is used to extend the features and calculate attention weights. Finally, inspired by fine-grained sentiment analysis, we treat fake news detection as a fine-grained multi-class classification task and use two similar sub-models to identify labels of different granularity separately. We evaluate our model on a real-world benchmark dataset. The experimental results demonstrate its effectiveness in fine-grained fake news detection and its superior performance over the baselines and other competitive approaches.
The proliferation of fake news on social media has opened up new directions of research for timely identification and containment of fake news and mitigation of its widespread impact on public opinion. While much of the earlier research was focused on identification of fake news based on its contents or by exploiting users’ engagements with the news on social media, there has been a rising interest in proactive intervention strategies to counter the spread of misinformation and its impact on society. In this survey, we describe the modern-day problem of fake news and, in particular, highlight the technical challenges associated with it. We discuss existing methods and techniques applicable to both identification and mitigation, with a focus on the significant advances in each method and their advantages and limitations. In addition, research has often been limited by the quality of existing datasets and their specific application contexts. To alleviate this problem, we comprehensively compile and summarize characteristic features of available datasets. Furthermore, we outline new directions of research to facilitate future development of effective and interdisciplinary solutions.
The phenomenon of fake news is nothing new. It has been around as long as people have had a vested interest in manipulating opinions and images, dating back to historical times for which written accounts exist and probably far beyond. Referring to the present as post-truth seems futile, as there has probably never been an era of truth when it comes to news. More recently, however, the technical means and the widespread use of social media have propelled the phenomenon onto a new level altogether. Individuals, organizations, and state actors actively engage in propaganda and the use of fake news to create insecurity, confusion, and doubt, and to promote their own agendas, frequently of a financial or political nature. We discuss the history of fake news and some reasons why people are bound to fall for it. We address signs of fake news and ways to detect it, or at least to become more aware of it, and discuss the truthfulness of messages and the perceived information quality of platforms. Some examples from the recent past demonstrate how fake news has played a role in a variety of scenarios. We conclude with remarks on how to tackle the phenomenon - eradicating it will not be possible in the near term, but employing a few sound strategies might mitigate some of its harmful effects.
Conference Paper
The widespread dissemination of fake news on social media has resulted in serious real-world impacts, raising concerns among global internet users in the last few years. This has also drawn interest from researchers around the globe to work on deception detection mechanisms to mitigate the problem. The goal is to realize a mechanism that is automatic, robust, reliable, and efficient, despite the various challenges that might hamper such efforts. In this paper, we present a review of the state of the art of fake news detection mechanisms on social media. We first discuss the background of the problems surrounding fake news and its impact on users. We further describe the definition of fake news and discuss different deception detection approaches, presented in categories such as content-based, social context-based, and hybrid methods. We conclude the paper with four key open research challenges that may guide future research.
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles, by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.0 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.
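The attention mechanism this abstract refers to can be sketched in plain Python. The following is a minimal single-head illustration of scaled dot-product attention, not the paper's multi-head implementation, and the toy matrices are chosen only to make the behaviour visible:

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V,
    with Q, K, V given as lists of row vectors."""
    d_k = len(K[0])
    out = []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in K]
        weights = softmax(scores)
        # Each output row is a weighted sum of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
attn = scaled_dot_product_attention(Q, K, V)
```

Because each query aligns with a different key, the first output row is pulled toward the first value vector and the second toward the second, which is the content-based mixing that replaces recurrence in the Transformer.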
Brown, T.B., et al.: Language models are few-shot learners (2020)
Della Vedova, M.L., Tacchini, E., Moret, S., Ballarin, G., DiPierro, M., de Alfaro, L.: Automatic online fake news detection combining content and social signals. In: 2018 22nd Conference of Open Innovations Association (FRUCT), pp. 272-279 (2018)
Oshikawa, R., Qian, J., Wang, W.Y.: A survey on natural language processing for fake news detection (2018)
Sanh, V., Debut, L., Chaumond, J., Wolf, T.: DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter (2019)
Slovikovskaya, V.: Transfer learning from transformers to fake news challenge stance detection (FNC-1) task (2019)
Uszkoreit, J.: Transformer: a novel neural network architecture for language understanding, August 2017. Accessed 01 Dec 2019
Cruz, J.C.B., Tan, J.A., Cheng, C.: Localization of fake news detection via multitask transfer learning. In: Proceedings of the 12th Language Resources and Evaluation Conference, pp. 2596-2604. European Language Resources Association, Marseille, France, May 2020
Dulhanty, C., Deglint, J.L., Daya, I.B., Wong, A.: Taking a stance on fake news: towards automatic disinformation assessment via deep bidirectional transformer language models for stance detection (2019)
Ghelanie, S.: From word embeddings to pretrained language models - a new age in NLP - part 2 (2019). Accessed 03 Mar 2020
Graves, L.: Understanding the promise and limits of automated fact-checking, February 2018
Horev, R.: BERT explained: state of the art language model for NLP, November 2018. Accessed 05 Nov 2019
Tandoc Jr., E.C., Lim, Z.W., Ling, R.: Defining “fake news”. Digital Journalism (2018)
Rodríguez, À.I., Iglesias, L.L.: Fake news detection using deep learning (2019)
Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pp. 4171-4186. Association for Computational Linguistics, Minneapolis, Minnesota, June 2019
Mao, J., Liu, W.: Factuality classification using the pre-trained language representation model BERT. In: IberLEF@SEPLN (2019)
Rizvi, M.S.Z.: Demystifying BERT: a comprehensive guide to the groundbreaking NLP framework, September 2019. Accessed 05 Nov 2019
Yang, K.C., Niven, T., Kao, H.Y.: Fake news detection as natural language inference (2019)