ConTheModel: Can We Modify Tweets to Confuse Classifier Models?
Aishwarya Ram Vinay, Mohsen Ali Alawami, and Hyoungshick Kim
Department of Electrical and Computer Engineering, Sungkyunkwan University, Suwon, South Korea
{aishwarya,mohsencomm,hyoung}@skku.edu
Abstract. News on social media can significantly influence users, manipulating them for political or economic reasons. Adversarial manipulations of text have been shown to expose vulnerabilities in classifiers, and current research is directed towards finding classifier models that are not susceptible to such manipulations. In this paper, we present a novel technique called ConTheModel, which slightly modifies social media news to confuse machine learning (ML)-based classifiers under a black-box setting. ConTheModel replaces a word in the original tweet with its synonym or antonym to generate tweets that confuse classifiers. We evaluate our technique on three different dataset scenarios and compare five well-known machine learning algorithms, namely Support Vector Machine (SVM), Naive Bayes (NB), Random Forest (RF), eXtreme Gradient Boosting (XGBoost), and Multilayer Perceptron (MLP), to demonstrate the performance of classifiers on the modifications made by ConTheModel. Our results show that the classifiers are confused after modification, with a performance drop of up to 16.36%. We additionally conducted a human study with 25 participants to validate the effectiveness of ConTheModel and found that the majority of participants (65%) found it challenging to classify the tweets correctly. We hope our work will help in finding ML models that are robust against adversarial examples.
Keywords: Machine learning · Social media · Adversarial examples · Tweets
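To make the word-replacement idea described in the abstract concrete, the following is a minimal, illustrative sketch of synonym/antonym substitution using WordNet (via NLTK). This is not the authors' implementation: the function names (candidate_replacements, perturb_tweet), the random choice of which word to replace, and the single-word substitution strategy are assumptions made purely for illustration. Checking whether a modified tweet actually confuses a black-box classifier would be a separate step not shown here.

# Illustrative sketch only; assumes nltk is installed and the WordNet corpus
# has been downloaded (nltk.download('wordnet')).
import random

from nltk.corpus import wordnet


def candidate_replacements(word):
    """Collect WordNet synonyms and antonyms of a word (lower-cased lemmas)."""
    synonyms, antonyms = set(), set()
    for synset in wordnet.synsets(word):
        for lemma in synset.lemmas():
            name = lemma.name().replace("_", " ").lower()
            if name != word.lower():
                synonyms.add(name)
            for ant in lemma.antonyms():
                antonyms.add(ant.name().replace("_", " ").lower())
    return list(synonyms), list(antonyms)


def perturb_tweet(tweet, rng=random):
    """Return a copy of the tweet with one word replaced by a synonym or
    antonym, or the original tweet if no replacement candidate is found."""
    tokens = tweet.split()
    indices = list(range(len(tokens)))
    rng.shuffle(indices)  # try words in random order (an assumption)
    for i in indices:
        synonyms, antonyms = candidate_replacements(tokens[i])
        candidates = synonyms + antonyms
        if candidates:
            perturbed = tokens.copy()
            perturbed[i] = rng.choice(candidates)
            return " ".join(perturbed)
    return tweet


if __name__ == "__main__":
    original = "government passes new law banning false information online"
    print("original:", original)
    print("modified:", perturb_tweet(original))

In a black-box attack such a perturbed tweet would then be submitted to the target classifier, and perturbations that change the predicted label would be kept as adversarial examples; the paper's specific selection criteria are not reproduced here.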
1 Introduction
Various social media platforms are used for communication by a large portion of the population worldwide because they are easily accessible. Statistics show that in 2020, approximately 3.6 billion people were using social media, and this number is projected to grow by almost another billion by 2025 [6]. However, what is passed off as "news" on social media is often disinformation. Contrary to real news, fake news fabricates stories instead of reporting facts. Last October, a new law was passed in Singapore that bans the spreading of false information. The law does so by allowing the government to instruct popular online social