Article

Editorial: Applying Machine Learning for Combating Fake News and Internet/Media Content Manipulation

Abstract

Nowadays, societies, businesses and citizens are strongly dependent on information, and information has become one of the most crucial (societal and economic) values. People expect both traditional and online media to provide trustworthy and reliable news and content. The right to be informed is one of the fundamental requirements for making the right decisions on a small scale (e.g., when shopping) and on a large scale (e.g., during general or presidential elections). However, information is not always reliable, because digital content may be manipulated, and its spreading can also be used for disinformation. This is true especially with the proliferation of online media, where news travels fast and is often based on User Generated Content (UGC), while there is often little time and there are few resources for the information to be carefully cross-checked. Moreover, disinformation and media manipulation can be part of hybrid warfare and malicious propaganda. Such false content should be detected as soon as possible to avoid its negative influence on readers and, in some cases, on political decisions. Part of these challenging and pressing problems can be addressed by innovative machine learning, artificial intelligence and soft computing methods. Therefore, the main aim of our special issue on Applying Machine Learning for Combating Fake News and Internet/Media Content Manipulation in the Applied Soft Computing journal was to gather a set of high-quality papers presenting new approaches and solutions for detecting media and content manipulation and disinformation. We also encouraged papers concerning the early detection of radicalization and hate speech based on fake information and/or manipulated content. We were very positively surprised by the strong feedback and the considerable number of submissions we received, from which we could finally select 14 for publication.
The papers in this issue can roughly be divided into three categories: survey papers, fake news detection in social media, and image manipulation detection. It is also worth mentioning that the work on this special issue was accompanied by the webinar “Machine Learning to combat Fake News and Media Manipulation”, which took place on April 20, 2021, organized jointly by the Applied Soft Computing journal, the Patterns journal, the Elsevier editorial team and the invited editors of this special issue.


Article
This article analyzes the hazardous environmental impact of tailings storage facilities created by industrial production, using machine learning methods to cluster them. Given the varying sizes, toxicity and other parameters of the tailings facilities, their positions in clusters were determined, which allowed their hazard level to be classified. A comparative analysis was made between machine learning approaches and other analog-search methods for determining the potential hazard of tailings facilities. The DBSCAN method identified facilities whose parameters placed them in the core, border and noise clusters. All facilities assigned to the core and border clusters were then classified by a second clustering method, k-means, and distributed into one of four clusters. The k-means clustering was built on samples of tailings-facility data by comparing the distance metric between adjacency matrices flattened into one-dimensional vectors. The reported evaluation of the algorithm confirms that modelling and locating cluster centers with k-means is a more comprehensive solution to the problem than the agglomerative clustering method against which k-means was compared. The accuracy of the chosen clustering approach thus proved higher than that of the existing classical clustering algorithms. The aim of the work is to cluster tailings storage facilities created by industrial production according to the level of their hazardous impact on the environment using machine learning methods. Methodology. To achieve this aim, machine learning and data analysis methods were used, implemented within the Sklearn framework of the Python programming language, together with the pandas data-handling package, which allows large datasets to be processed with a flexible system of formatting and queries.
Data grouping was carried out using a family of unsupervised learning methods: the DBSCAN clustering algorithm, cluster analysis by hierarchical (dendrogram-based) clustering and by k-means, and the principal component method. Scientific novelty. The scientific novelty lies in the choice of ways to cluster tailings storage facilities using cluster analysis methods. Conclusions. Using machine learning methods, the tailings storage facilities were clustered according to the degree of their danger to the environment.
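The two-stage pipeline described above (DBSCAN to weed out noise points, then k-means on the retained core and border points) can be sketched with scikit-learn. The feature values below are synthetic placeholders, not the paper's tailings-facility data, and two k-means clusters are used instead of the paper's four:

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans

# Hypothetical feature matrix: each row is one tailings storage facility,
# columns are illustrative parameters (volume, toxicity index, dam height).
# The paper's real features are flattened adjacency matrices.
rng = np.random.default_rng(0)
facilities = np.vstack([
    rng.normal(loc=[1.0, 0.2, 5.0], scale=0.1, size=(20, 3)),
    rng.normal(loc=[4.0, 0.9, 9.0], scale=0.1, size=(20, 3)),
    [[10.0, 5.0, 50.0]],              # an outlier -> DBSCAN noise point
])

# Stage 1: DBSCAN separates core/border points from noise (label -1).
db = DBSCAN(eps=0.5, min_samples=3).fit(facilities)
kept = facilities[db.labels_ != -1]   # core + border points only

# Stage 2: k-means assigns the retained facilities to hazard-level clusters.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(kept)
print(sorted(np.bincount(km.labels_).tolist()))
```

The eps and min_samples values are illustrative; in practice they would be tuned to the facility parameter scales.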
Article
Copy-move is a very popular image falsification in which a semantically coherent part of the image, the source area, is copied and pasted at another position within the same image as the so-called target area. The majority of existing copy-move detectors search for matching areas and thus identify the source and target zones without distinction, while only the target really represents a tampered area. To the best of our knowledge, at the time of preparing this paper there was only one published method, called BusterNet, capable of performing source and target disambiguation, using a specifically designed deep neural network. Different from the deep-learning-based BusterNet method, we propose in this paper a source and target disentangling approach based on a local statistical model of image patches. Our proposed method acts as a second-stage detector after a first stage of copy-move detection of duplicated areas. We had the following intuition: even if no manipulation (e.g., scaling or rotation) is applied to the target area, its boundaries should expose a statistical deviation from the pristine area and the source area; further, if the target area is manipulated, the deviation should appear not only on the boundaries but over the full zone. Our method relies on a Gaussian Mixture Model, a classic machine learning tool, to describe the likelihood of image patches. Likelihoods are then compared between the pristine region and the candidate source/target areas identified by the first-stage detector. Experiments and comparisons demonstrate the effectiveness of the proposed method.
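The likelihood-comparison idea can be sketched with scikit-learn's GaussianMixture: fit the model on patches from the presumed-pristine region and flag the candidate area whose patches are less likely under that model. The patch features below are random vectors standing in for real image-patch descriptors:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Toy stand-in for the second-stage check: random 8-dim "patch features".
rng = np.random.default_rng(1)
pristine = rng.normal(0.0, 1.0, size=(300, 8))       # pristine-region patches
source   = rng.normal(0.0, 1.0, size=(50, 8))        # copied-from area: same stats
target   = rng.normal(0.0, 1.0, size=(50, 8)) + 2.0  # pasted area: deviates

# Fit the local statistical model on pristine patches only.
gmm = GaussianMixture(n_components=3, random_state=0).fit(pristine)

# The area whose patches are *less* likely under the pristine model is
# flagged as the tampered (target) zone.
src_ll = gmm.score(source)   # mean log-likelihood of source patches
tgt_ll = gmm.score(target)
tampered = "target" if tgt_ll < src_ll else "source"
print(tampered)
```

The deliberate +2.0 shift makes the synthetic target deviate statistically, mirroring the paper's intuition that the pasted zone departs from the pristine model.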
Article
Identifying the origin of information posted on social media, and how it may have changed over time, can be very helpful to users in determining whether to trust it. This currently requires disproportionate effort for the average social media user, who instead has to rely on fact-checkers or other intermediaries to identify information provenance for them. We show that it is possible to disintermediate this process by providing an automated mechanism for determining the information cascade to which a post belongs. We employ a transformer-based language model, as well as a pre-trained ResNet50 model for image similarity, to decide whether two posts are sufficiently similar to belong to the same cascade. By using semantic similarity, and images in addition to text, we increase accuracy where there is no explicit diffusion of reshares. On a new dataset of 1,200 news items on Twitter, our approach improves clustering performance by more than 7% and 4.5% on the validation and test sets, respectively, over the previous state of the art. Moreover, we employ a probabilistic subsampling mechanism, significantly reducing cascade creation time without affecting the performance of large-scale semantic text analysis or the quality of information cascade generation. We have implemented a prototype that offers this new functionality to the user and have deployed it in our own instance of the social media platform Mastodon.
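The core decision step (are two posts similar enough, in text or image space, to sit in one cascade?) can be sketched without the heavy models. The embeddings below are fixed toy vectors standing in for transformer and ResNet50 outputs, and the thresholds and OR-combination rule are illustrative assumptions, not the paper's actual decision rule:

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def same_cascade(text_a, text_b, img_a, img_b, t_text=0.8, t_img=0.8):
    """Assign two posts to one cascade if either modality is similar
    enough. Thresholds and the OR-rule are illustrative assumptions."""
    return cosine(text_a, text_b) >= t_text or cosine(img_a, img_b) >= t_img

# Toy embeddings standing in for transformer / ResNet50 outputs.
post1_text = np.array([0.9, 0.1, 0.0]); post1_img = np.array([1.0, 0.0])
post2_text = np.array([0.8, 0.2, 0.1]); post2_img = np.array([0.0, 1.0])

# Texts are near-paraphrases, images are unrelated -> text carries the match.
print(same_cascade(post1_text, post2_text, post1_img, post2_img))
```

Using either modality lets a match survive when one channel (here, the image) is missing or dissimilar, which is the point of adding image similarity to text similarity.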
Article
Concerns about image security have increased in recent years. One of the main reasons is the replacement of conventional photography by digital imaging: the development of new image-processing technologies, as much as it has advanced forensic techniques, has also provided tools for image tampering. In this context, many companies and researchers have devoted considerable effort to methods for detecting such tampered images, mostly aided by autonomous intelligent systems. This work therefore introduces a rigorous survey of the state-of-the-art literature on computer-aided tampered image detection using machine learning techniques, including evolutionary computation, neural networks, fuzzy logic and Bayesian reasoning, among others. It also covers anomaly detection methods in the context of images, due to the intrinsic relation between anomalies and tampering. Moreover, it focuses on recent, in-depth research relevant to image tampering detection, surveying more than 100 works spanning the different themes of the subject. Finally, a critical analysis is performed over this comprehensive compilation of literature, yielding research opportunities and discussing challenges, in an attempt to align future efforts of the community with the niches and gaps identified in this exciting field.
Article
Fake news has become a major concern over the Internet. It influences people directly and should be identified. In recent years, various Machine Learning (ML) and Deep Learning (DL) based data-driven approaches have been suggested for fake news classification. Most of the ML based approaches use hand-crafted features extracted from the input textual content. Moreover, in DL based approaches, an efficient word embedding representation of the input data is also a major concern. This paper presents a deep learning framework, BerConvoNet, to classify a given news text as fake or real with minimal error. The presented framework has two main building blocks: a news embedding block (NEB) and a multi-scale feature block (MSFB). The NEB uses Bidirectional Encoder Representations from Transformers (BERT) to extract word embeddings from a news article. These embeddings are then fed as input to the MSFB, which consists of multiple kernels (filters) of varying sizes and extracts various features from the news word embeddings. The output of the MSFB is fed to a fully connected layer for classification. To validate the performance of BerConvoNet, several experiments have been performed on four benchmark datasets, and various performance measures are used to evaluate the results. Furthermore, ablation experiments with respect to news article embedding, kernel size, and batch size have been carried out to ensure the quality of prediction. A comparative analysis shows that BerConvoNet outperforms other state-of-the-art models on various performance metrics.
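The MSFB idea (filters of several widths slid over a sequence of word embeddings, each max-pooled over time) can be sketched in plain NumPy. The random embeddings stand in for BERT outputs, and the kernel sizes and filter count are illustrative choices, not the paper's settings:

```python
import numpy as np

def multi_scale_features(embeddings, kernel_sizes=(2, 3, 4), n_filters=4, seed=0):
    """Toy multi-scale feature block: slide 1-D convolution filters of
    several widths over a sequence of word embeddings and max-pool each
    filter's responses over time (illustrative, untrained filters)."""
    rng = np.random.default_rng(seed)
    seq_len, dim = embeddings.shape
    features = []
    for k in kernel_sizes:
        filters = rng.normal(size=(n_filters, k, dim))
        for w in filters:
            # response at each window position, then max over positions
            responses = [np.sum(embeddings[i:i + k] * w)
                         for i in range(seq_len - k + 1)]
            features.append(max(responses))
    return np.array(features)   # would feed a fully connected classifier

# A fake "sentence" of 10 tokens with 16-dim embeddings (BERT stand-ins).
sent = np.random.default_rng(42).normal(size=(10, 16))
feats = multi_scale_features(sent)
print(feats.shape)   # one pooled value per filter: 3 sizes x 4 filters
```

In the real model the filters are learned and the pooled vector goes to the fully connected classification layer; here they are random, only to show the data flow.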
Article
Social media platforms have radically transformed the creation and dissemination of news. Users can easily access this news in a fast and efficient manner. However, some users might post negative and fraudulent content in the form of comments or posts. Such content can constitute a threat to an individual or an organization. Therefore, the identification of fake news has become a major research field in natural language processing (NLP). The main challenge is to determine whether a news item is real or fake. In this paper, we propose an attention-based convolutional bidirectional long short-term memory (AC-BiLSTM) approach for detecting fake news and classifying it into six categories. The evaluation of our proposed approach on a benchmark dataset shows a significant improvement in accuracy compared with other existing classification models. In particular, this work contributes to progress in the field of detecting fake news and confirms the feasibility of our proposed approach for classifying fake news on social media.
Article
In recent years, with the fast development of the internet and online platforms such as social media feeds, news blogs, and online newspapers, deceptive reports have spread universally online. This manipulated news is a matter of concern due to its potential role in shaping public opinion. The fast spread of fake news therefore creates an urgent need for automatic systems to detect deceitful articles, which motivates many researchers to introduce solutions for the automatic classification of news items. This paper proposes a novel system to detect fake news articles based on content-based features and the WOA-Xgbtree algorithm, applicable in different scenarios for classifying news articles. The proposed approach consists of two main stages: first, the useful features are extracted and analyzed; then an Extreme Gradient Boosting Tree (xgbTree) algorithm, optimized by the Whale Optimization Algorithm (WOA), classifies news articles using the extracted features. In our experiments, we based the investigation on classification accuracy and the F1-measure, and compared the optimized model with several benchmark classification algorithms on a recently compiled dataset of over 40,000 news articles. The results indicate that the proposed approach achieved good classification accuracy and F1-measure, successfully classifying over 91 percent of the articles.
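The wrapper idea (a metaheuristic searching a hyperparameter space to minimize a classifier's error) can be sketched with a heavily reduced WOA. Only the "encircling prey" move toward the best whale is kept, dropping the spiral and exploration branches of the full algorithm, and a toy quadratic stands in for the cross-validated error of xgbTree:

```python
import numpy as np

def woa_minimize(f, lo, hi, n_whales=10, iters=60, seed=0):
    """Heavily reduced Whale Optimization Algorithm sketch: keeps only
    the 'encircling prey' move toward the current best solution."""
    rng = np.random.default_rng(seed)
    dim = lo.size
    pos = rng.uniform(lo, hi, size=(n_whales, dim))
    best = min(pos, key=f).copy()
    for t in range(iters):
        a = 2.0 * (1.0 - t / iters)          # linearly decreasing coefficient
        for i in range(n_whales):
            A = 2.0 * a * rng.uniform(size=dim) - a
            C = 2.0 * rng.uniform(size=dim)
            # move whale i toward the best whale, clipped to the bounds
            pos[i] = np.clip(best - A * np.abs(C * best - pos[i]), lo, hi)
            if f(pos[i]) < f(best):
                best = pos[i].copy()
    return best

# Toy objective standing in for cross-validated error of the classifier;
# the minimum is at (3, 3).
obj = lambda x: float(np.sum((x - 3.0) ** 2))
best = woa_minimize(obj, np.array([0.0, 0.0]), np.array([10.0, 10.0]))
print(best, obj(best))
```

In the paper's setting, f would train xgbTree with the candidate hyperparameters and return a cross-validated error, and the full WOA update rules would replace this single-branch sketch.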
Article
Undoubtedly, social media, such as Facebook and Twitter, constitute a major part of our everyday life due to the incredible possibilities they offer to their users. However, Twitter and online social networks (OSNs) in general are, owing to their immense popularity across a wide range of user categories, increasingly used by automated accounts, widely known as bots. Their main purpose is the dissemination of fake news, the promotion of specific ideas and products, the manipulation of the stock market and even the diffusion of sexually explicit material. Early detection of bots in social media is therefore essential. In this paper, two methods targeting this problem are introduced, mainly based on Natural Language Processing (NLP), to distinguish legitimate users from bots. In the first method, a feature extraction approach is proposed for identifying accounts posting automated messages: after applying feature selection techniques and handling imbalanced datasets, the selected subset of features is fed into machine learning algorithms. In the second method, a deep learning architecture is proposed to identify whether tweets have been posted by real users or generated by bots. To the best of the authors’ knowledge, there is no prior work on using an attention mechanism for identifying bots. The introduced approaches have been evaluated in a series of experiments on two large real Twitter datasets and demonstrate valuable advantages over other existing techniques for identifying malicious users in social media.
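The first method's feature extraction step can be illustrated with a tiny sketch. The particular features below (URL ratio, duplicate-tweet ratio, follower/following ratio) are common in the bot-detection literature and are assumptions for illustration, not necessarily the paper's feature set:

```python
import numpy as np

def account_features(tweets, followers, following):
    """Illustrative feature vector for bot detection: URL ratio,
    duplicate-content ratio, follower/following ratio (hypothetical
    choices, not the paper's exact features)."""
    n = len(tweets)
    url_ratio = sum("http" in t for t in tweets) / n
    dup_ratio = 1.0 - len(set(tweets)) / n       # repeated messages
    ff_ratio = followers / max(following, 1)
    return np.array([url_ratio, dup_ratio, ff_ratio])

# A spam-like account repeating two promotional messages...
bot_like = account_features(
    ["Buy now http://x.co"] * 8 + ["Great deal http://x.co"] * 2,
    followers=3, following=900)
# ...versus an ordinary account with varied, link-free tweets.
human_like = account_features(
    ["morning run", "lunch with Ana", "new paper out!"],
    followers=150, following=180)
print(bot_like[1] > human_like[1])   # bots repeat content more often
```

Vectors like these would then go through feature selection and into the machine learning classifiers, as the abstract describes.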
Article
The rapid progress of sophisticated image editing tools has made it easier to manipulate original face images and create fake media content by transferring one person’s face onto another. In addition to image editing tools, natural-looking fake human faces can easily be created with Generative Adversarial Networks (GANs). However, malicious use of these new media generation technologies can lead to severe problems, such as fake pornography, defamation, or fraud. In this paper, we introduce a novel Handcrafted Facial Manipulation (HFM) image dataset and soft computing neural network models (Shallow-FakeFaceNets) with an efficient facial manipulation detection pipeline. Our neural network classifier model, Shallow-FakeFaceNet (SFFN), shows the ability to focus on the manipulated facial landmarks to detect fake images. The detection pipeline relies only on RGB information to detect fake facial images, not leveraging any metadata, which can be easily manipulated. Our results show that our method achieves the best performance of 72.52% in Area Under the Receiver Operating Characteristic (AUROC), gaining 3.99% F1-score and 2.91% AUROC on detecting handcrafted fake facial images, and 93.99% on detecting small GAN-generated fake images, gaining 1.98% F1-score and 10.44% AUROC compared to the best performing state-of-the-art classifier. This study targets the development of an automated defense mechanism to combat fake images used in different online services and applications, leveraging our handcrafted fake facial dataset (HFM) and the neural network classifier Shallow-FakeFaceNet (SFFN). In addition, our work presents various experimental results that can help guide future applied soft computing research to effectively combat and detect human- and GAN-generated fake face images.
Article
Over recent years, the development of online social media has dramatically changed the way people connect and share information. It is undeniable that social platforms have enabled the fastest spread of fake stories. Almost all current online fact-checking sources and research efforts concentrate on validating political content and context. The system proposed in this paper provides complete visual data analytics methods to assist users in achieving a comprehensive understanding of malicious activities at multiple levels, such as the adversary’s behaviour, the victim’s behaviour, content, and context. We investigate a variety of datasets from different aspects, such as role, vulnerabilities, level of influence, and distribution pattern. The proposed method focuses on automatic fake/hostile activity detection, utilizing a variety of machine learning (ML) techniques, deep learning models, natural language processing (NLP), and social network analysis (SNA) techniques. Different auxiliary models, such as bot detection, user credibility, and text readability, are deployed to generate additional influential features. The classification performance of ten different machine learning algorithms on a variety of well-known datasets is evaluated using 10-fold cross-validation.
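The evaluation protocol, comparing several classifiers via 10-fold cross-validation, can be sketched with scikit-learn. Synthetic data and two models stand in for the paper's real social-media datasets and ten algorithms:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for feature vectors built from content, behaviour and
# auxiliary models (bot score, credibility, readability).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

models = {
    "logreg": LogisticRegression(max_iter=1000),
    "tree": DecisionTreeClassifier(random_state=0),
}

# 10-fold cross-validation: mean accuracy per model.
scores = {name: cross_val_score(m, X, y, cv=10).mean()
          for name, m in models.items()}
print({k: round(v, 3) for k, v in scores.items()})
```

Extending the dictionary to ten algorithms reproduces the comparison structure the abstract describes; only the datasets and feature engineering differ.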
Article
Fake news has now grown into a big problem for societies and a major challenge for people fighting disinformation. This phenomenon plagues democratic elections and the reputations of individual persons or organizations, and has negatively impacted citizens (e.g., during the COVID-19 pandemic in the US or Brazil). Hence, developing effective tools to fight this phenomenon by employing advanced Machine Learning (ML) methods poses a significant challenge. The following paper surveys the present body of knowledge on the application of such intelligent tools in the fight against disinformation. It starts by showing the historical perspective and the current role of fake news in the information war. Proposed solutions based solely on the work of experts are analysed, and the most important directions for applying intelligent systems to the detection of misinformation sources are pointed out. Additionally, the paper presents useful resources (mainly datasets for assessing ML solutions for fake news detection) and provides a short overview of the most important R&D projects related to this subject. The main purpose of this work is to analyse the current state of knowledge in detecting fake news: on the one hand to show possible solutions, and on the other to identify the main challenges and methodological gaps, so as to motivate future research.
Article
Today’s advanced multimedia tools allow us to create photorealistic computer graphics images effortlessly. Computer-generated (CG) images are used widely in fields such as the film industry, virtual reality, and video games, but they can also be misused in many ways. There is therefore a need to distinguish CG images from real photographic (PG) images. This paper proposes a method to distinguish CG images from PG images using a two-stream convolutional neural network (CNN). In the proposed method, the first stream takes advantage of the knowledge learned by the pre-trained VGG-19 network, transferring this knowledge to learn the distinct features of CG and PG images. The second stream preprocesses the images using three high-pass filters, which help the network focus on noise-based distinguishing features of CG and PG images. Finally, we propose an ensemble model that merges the outcomes of the two streams. Comparative and self-analysis experiments demonstrate that the proposed method outperforms the state-of-the-art methods in terms of classification accuracy. The experimental results also show that the proposed method performs satisfactorily under additive white Gaussian noise post-processing of the images.
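The second stream's preprocessing can be illustrated with one high-pass kernel. The Laplacian-like filter below is a common choice for exposing noise residuals and is an assumption for illustration; the paper's three filters may differ:

```python
import numpy as np

# A Laplacian-like high-pass kernel (illustrative; not necessarily one of
# the paper's three filters). It zeroes smooth regions and responds to
# fine structure such as edges and noise.
KERNEL = np.array([[-1, -1, -1],
                   [-1,  8, -1],
                   [-1, -1, -1]], dtype=float)

def high_pass(img, kernel=KERNEL):
    """Valid-mode 2-D correlation (equivalent to convolution here,
    because the kernel is symmetric)."""
    h, w = img.shape
    kh, kw = kernel.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

flat = np.full((8, 8), 100.0)            # smooth region -> zero response
residual = np.abs(high_pass(flat))
edge = flat.copy(); edge[:, 4:] = 200.0  # sharp transition -> strong response
print(residual.max(), np.abs(high_pass(edge)).max())
```

Feeding such residuals to the second stream suppresses scene content and leaves the noise statistics on which CG and PG images differ.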
Article
Fake news has increased dramatically in social media in recent years, prompting the need for effective fake news detection algorithms. Capsule neural networks have been successful in computer vision and are receiving attention for use in Natural Language Processing (NLP). This paper applies capsule neural networks to the fake news detection task. We use different embedding models for news items of different lengths: static word embeddings for short news items, and non-static word embeddings, which allow incremental uptraining and updating during the training phase, for medium-length or long news statements. Moreover, we apply different levels of n-grams for feature extraction. Our proposed models are evaluated on two recent well-known datasets in the field, namely ISOT and LIAR. The results show encouraging performance, outperforming the state-of-the-art methods by 7.8% on ISOT, and by 3.1% on the validation set and 1% on the test set of the LIAR dataset.
Article
Research on compromised-account detection began with email and web services and is now growing in the social network scenario. In this paper, continuous authentication of textual content is performed for ongoing authorship verification to detect compromised accounts in social networks. Four categories of features, namely content-free, content-specific, stylometric and folksonomy, are extracted and evaluated. Experiments are performed on 3057 Twitter users, taking the 4000 latest tweets of each user. The experiments make it evident that the consistency maintained on features differs for each user. Hence, various statistical and similarity-based feature selection techniques are used to rank and select optimal features for each user, which are then combined using a popular rank aggregation technique called BORDA. The performance of various supervised machine learning classifiers is also analyzed on the basis of different evaluation metrics. Experimental results show that, for the undertaken problem, SVM with an RBF kernel outperformed the other classifiers, namely kNN, Random Forest, Gradient Boosted trees and Multi-Layer Perceptron, attaining a maximum F-score of 94.57% under the varied parameter settings.
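Borda rank aggregation itself is simple to sketch: each ranking awards n-1 points to its top item down to 0 for the last, and items are ordered by total points. This is the common textbook formulation; the feature names below are hypothetical stand-ins for the per-user stylometric features the paper ranks:

```python
def borda(rankings):
    """Borda count: each ranking awards n-1 points to its first item
    down to 0 for its last; return items ordered by total points."""
    n = len(rankings[0])
    scores = {}
    for rank in rankings:
        for pos, item in enumerate(rank):
            scores[item] = scores.get(item, 0) + (n - 1 - pos)
    return sorted(scores, key=scores.get, reverse=True)

# Three feature-selection methods ranking four hypothetical features.
r1 = ["avg_word_len", "hashtag_rate", "emoji_rate", "url_rate"]
r2 = ["hashtag_rate", "avg_word_len", "url_rate", "emoji_rate"]
r3 = ["avg_word_len", "url_rate", "hashtag_rate", "emoji_rate"]
print(borda([r1, r2, r3]))
```

Aggregating several selectors' rankings this way smooths out the bias of any single selection technique before the top features are fed to the classifiers.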
Article
Image manipulation plays an important role in spreading fake news, and it may cause ethical, economic, or political problems for people and sometimes for countries. Image integrity verification has become a very important research issue due to the increasing number of forged images on the Internet and social media. The objective of this paper is to present an accurate approach to digital image forgery detection, capable of sensing small image tampering and robust against image manipulation attacks. The first step in the proposed approach converts the RGB image into YCbCr space; Hilbert–Huang Transform (HHT) features are then extracted from the chrominance-red component Cr, and three different classifiers, Support Vector Machines (SVM), K-Nearest Neighbors (KNN), and Artificial Neural Networks (ANN), are tested and compared for classifying images as authentic or forged. The results are verified using Structural Similarity (SSIM) to calculate the forgery detection accuracy. The proposed approach has been tested with seven different image manipulation datasets: CASIA-V1, CASIA-V2, MICC-F2000, MICC-F600, MICC-F220, CoMoFoD, and an additional dataset collected from various Internet websites and social media. Furthermore, the proposed approach has been tested against post-processing attacks such as image compression, added Gaussian noise, and contrast adjustment. The results show that the SVM classifier achieved the highest accuracy compared with the ANN and KNN classifiers. The proposed approach has been compared with other published approaches, and the comparison proved its superiority over them.
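The first step, converting RGB to YCbCr to isolate the Cr channel, follows the standard ITU-R BT.601 formula and can be written in a few lines (the subsequent HHT feature extraction is not sketched here):

```python
import numpy as np

def rgb_to_ycbcr(img):
    """Full-range ITU-R BT.601 RGB -> YCbCr conversion; the paper then
    extracts HHT features from the Cr (chrominance-red) channel."""
    m = np.array([[ 0.299,     0.587,     0.114   ],
                  [-0.168736, -0.331264,  0.5     ],
                  [ 0.5,      -0.418688, -0.081312]])
    ycbcr = img @ m.T
    ycbcr[..., 1:] += 128.0          # center the chroma channels at 128
    return ycbcr

# A single pure-red pixel: Y = 0.299*255, Cr saturates at 255.5.
pixel = np.array([[[255.0, 0.0, 0.0]]])
y, cb, cr = rgb_to_ycbcr(pixel)[0, 0]
print(round(float(y), 2), round(float(cb), 2), round(float(cr), 2))
```

Working in the Cr channel separates chrominance from luminance, where tampering traces are often easier to expose than in raw RGB.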