Adversarial machine learning for protecting against online manipulation

Stefano Cresci, Marinella Petrocchi, Angelo Spognardi, and Stefano Tognazzi
Abstract: Adversarial examples are inputs to a machine learning
system that result in an incorrect output from that system. Attacks
launched through this type of input can have severe consequences:
for example, in the field of image recognition, a stop sign can be
misclassified as a speed limit sign. However, adversarial examples
also fuel a flurry of research directions in different domains and
applications. Here, we give an overview of how they can be
profitably exploited as powerful tools to build stronger learning
models, capable of better withstanding attacks, for two crucial
tasks: fake news and social bot detection.
Keywords: I.2.4 Knowledge representation formalisms and methods;
H.2.8.d Data mining; O.8.15 Social science methods or tools
The year was 1950, and in his paper ‘Computing Machinery and
Intelligence’, Alan Turing asked his audience this question: Can a
machine think rationally? The question is partly answered by the
machine learning (ML) paradigm, whose traditional definition is as
follows: “A computer program is said to learn from experience E
with respect to some class of tasks T and performance measure P, if
its performance at tasks in T, as measured by P, improves with
experience E” [1]. If we define the experience E as ‘what data to
collect’, the task T as ‘what decisions the software needs to
make’, and the performance measure P as ‘how we will evaluate its
results’, then it becomes possible to evaluate the capability of
the program to complete the task correctly (that is, to recognize
the type of data) and thus to measure its performance.
S. Cresci is with the Institute of Informatics and Telematics,
IIT-CNR, Pisa, Italy.
M. Petrocchi is with the Institute of Informatics and Telematics,
IIT-CNR, Pisa, Italy and Scuola IMT Alti Studi Lucca, Lucca, Italy.
A. Spognardi is with the Dept. of Computer Science, Università di
Roma ‘La Sapienza’, Roma, Italy.
S. Tognazzi is with the Centre for the Advanced Study of Collective
Behaviour and the Dept. of Computer and Information Science,
Konstanz University, Konstanz, Germany.

Today, ML helps us achieve multiple goals: it provides
recommendations to customers based on their previous purchases, or
rids the inbox of spam based on spam received previously, to name
just two examples. In the image recognition field, such a program
is trained by feeding it different images, so that it learns to
distinguish them. However, in 2014, Google and NYU researchers
showed how it was possible to fool the classifier, specifically a
convolutional neural network (ConvNet), by adding noise to the
image of a panda. The program classified the panda plus the added
noise as a gibbon, with 99% confidence. The modified image is
called an adversarial example [2]. Formally, given a data
distribution $p(x, y)$ over images $x$ and labels $y$ and a
classifier $f$ such that $f(x) = y$, an adversarial example is a
modified input $\tilde{x} = x + \delta$ such that $\delta$ is a
very small (human-imperceptible) perturbation and
$f(\tilde{x}) \neq y$, namely $\tilde{x}$ is misclassified while
$x$ was not. Still in visual recognition, it is possible to
‘perturb’ a road sign, e.g., a stop sign, by placing small stickers
on it, so that the classifier identifies it as a speed limit sign
[3], see Figure 1(a). Another noteworthy attack exploits so-called
‘adversarial patches’: the opponent does not even need to know the
target that the classifier has been trained to recognize. Simply
adding a patch to the input can lead the system to decide that what
it has been given to classify is exactly what the patch represents.
A well-known case is that of a trained model mistaking a banana for
a toaster after an adversarial patch was placed next to the banana
[4]. These few examples demonstrate the risks that the
vulnerability of ML systems to intentional data manipulation poses
in the field of computer vision.
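The formal definition of an adversarial example given above can be made concrete on a toy linear classifier. The sketch below is only an illustration under stated assumptions: the weights, the input, and the budget `eps` are invented values, and the perturbation rule (move each feature by `eps` against the sign of its weight) is a simplified, linear-model version of the gradient-sign idea discussed in [2], not the procedure used on real image classifiers.

```python
# Toy linear classifier: f(x) = 1 if w.x + b > 0 else 0.
# All numeric values below are illustrative assumptions.

def predict(w, b, x):
    """Return the predicted class (0 or 1) of input x."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score > 0 else 0


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def adversarial_example(w, x, y, eps):
    """Build x_tilde = x + delta with |delta_i| <= eps for every feature.

    For a linear model the gradient of the score w.r.t. x is w, so
    shifting each feature by eps against sign(w_i) (for label 1) or
    along it (for label 0) pushes the score towards the wrong class.
    """
    direction = -1 if y == 1 else 1
    return [xi + direction * eps * sign(wi) for xi, wi in zip(x, w)]


w = [0.5, -0.25, 1.0]
b = 0.0
x = [0.2, 0.4, 0.1]   # clean input, true label y = 1
y = 1

x_adv = adversarial_example(w, x, y, eps=0.1)

print("clean prediction:      ", predict(w, b, x))
print("adversarial prediction:", predict(w, b, x_adv))
```

Here every feature changes by at most `eps`, yet the predicted label differs from the label assigned to the clean input, which is exactly the condition $f(\tilde{x}) \neq y$ in the definition.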
In recent years, ML has also been at the core of a plethora of
efforts in other domains. In particular, some of the domains that
have seen massive application of ML and AI are intrinsically
adversarial, that is, they feature the natural and inevitable
presence of adversaries motivated to fool the ML systems. In fact,
adversarial examples are extremely relevant in all
security-sensitive applications, where any misclassification
induced by an attacker represents a security threat. A paramount
example of such applications is the fight against online abuse and
manipulation, which often take the form of fake news and social
bots.
So, how can we defend against attackers who try to deceive the
model through adversarial examples? It turns out that adversarial
examples are not exclusively a threat to the reliability of ML
models. Instead, they can also be leveraged as a very effective
means to strengthen the models themselves. In a ‘brute force’
approach, so-called Adversarial Training, the model designers
pretend to be attackers: they generate several adversarial examples
against their own model and then train the model not to be fooled
by them.
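The Adversarial Training loop described above can be sketched in a few lines. This is a minimal illustration, assuming a linear perceptron, labels in {-1, +1}, a hypothetical two-dimensional dataset, and an L-infinity perturbation budget `eps`; none of these choices comes from the surveyed works. At every step the designer plays the attacker by computing the worst-case perturbation against the current model, and the model is updated whenever either the clean or the perturbed sample is misclassified.

```python
def score(w, b, x):
    """Linear score w.x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def sign(v):
    return (v > 0) - (v < 0)


def worst_case(w, x, y, eps):
    """Most damaging L-infinity perturbation of x for a linear model:
    move every feature by eps in the direction that lowers y * score."""
    return [xi - y * eps * sign(wi) for xi, wi in zip(x, w)]


def adversarial_train(data, eps, max_epochs=1000):
    """Perceptron training on clean samples plus adversarial samples
    generated against the current model (labels are -1 or +1)."""
    dim = len(data[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(max_epochs):
        updates = 0
        for x, y in data:
            # Designer acts as attacker: craft an adversarial copy of x.
            for sample in (x, worst_case(w, x, y, eps)):
                if y * score(w, b, sample) <= 0:  # fooled: update model
                    w = [wi + y * si for wi, si in zip(w, sample)]
                    b += y
                    updates += 1
        if updates == 0:  # neither clean nor adversarial errors remain
            break
    return w, b


# Hypothetical, linearly separable toy data.
data = [
    ([2.0, 1.5], 1), ([1.5, 2.5], 1), ([3.0, 2.0], 1),
    ([-2.0, -1.5], -1), ([-1.5, -2.5], -1), ([-3.0, -2.0], -1),
]
eps = 0.5
w, b = adversarial_train(data, eps)
print("weights:", w, "bias:", b)
```

On this easy dataset the loop stops as soon as every training point and its worst-case perturbation are classified correctly. The sketch shows the structure of the procedure only; it makes no claim about the robustness gains reported in the literature.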
Along these lines, in the remainder we briefly survey relevant
literature on adversarial examples and Adversarial Machine Learning
(AML). AML aims at understanding when, why, and how learning models
can be attacked, and the techniques that can mitigate attacks. We
will consider two phenomena whose detection is polluted by
adversaries and that are bound to play a crucial role in the coming
years for the security of our online ecosystems: fake news and
social bots. Contrary to computer vision, the adoption of AML in
these fields is still in its infancy, as shown in Figure 2, despite
its many advantages. Current endeavors mainly focus on the
identification of adversarial attacks, and only seldom on the
development of solutions that leverage adversarial examples for
improving detection systems. The application of AML in these fields
is still largely untapped, and its study will provide valuable
insights for driving future research efforts and obtaining
practical advantages.

(a) Computer vision. Images can be modified by adding adversarial
patches so as to fool image classification systems (e.g., those
used by autonomous vehicles).
(b) Automatic speech recognition. Adding adversarial noise to a
speech waveform may result in wrong textual translations.
(c) Social bot detection. Similarly to computer vision and
automatic speech recognition, adversarial attacks can alter the
features of social bots, without impacting their activity, thus
allowing them to evade detection.
(d) Fake news detection. Tampering with the textual content of an
article, or even with its comments, may yield wrong article [...]
Fig. 1. Adversarial examples and their consequences, for a few
notable ML tasks.

arXiv:2111.12034v1 [cs.LG] 23 Nov 2021
Fake news is often defined as ‘fabricated information that mimics
news media content in form but not in organizational process or
intent’ [5], [6]. Its presence has been documented in several
contexts, such as politics, vaccinations, food habits and financial
markets.
False stories circulated for centuries before the Internet. One
thinks, for instance, of the manoeuvres carried out by espionage
and counter-espionage to feed wrong information to the enemy. If
fake stories have always existed, why are we so concerned about
them now? The advent of the Internet, while undoubtedly
facilitating access to news, has lowered the editorial standards of
journalism, and its open nature has led to a proliferation of
user-generated content, unscreened by any moderator [7].
Usually, fake news is published on some little-known outlet,
and amplified through social media posts, quite often using
so-called social bots. Those are software algorithms that can
perfectly mimic the behaviour of a genuine account and
maliciously generate artificial hype [8], [9].
In recent years, research has intensified efforts to combat both
the creation and spread of fake news, as well as the use of social
bots. Currently, the most common detection methods for both social
bots (as tools for spreading) and fake news are based on supervised
machine learning algorithms. In many cases, these approaches
achieve very good performance on the test cases considered.
Unfortunately, state-of-the-art detection techniques suffer from
attacks that critically degrade the performance of the learning
algorithms. In a classical adversarial game, social bots evolved
over time [8]: while early bots in the late 2000s were easily
detectable just by looking at static account information or simple
indicators of activity, sophisticated bots are nowadays almost
indistinguishable from genuine accounts. We can observe the same
adversarial game in fake news. Recent studies show that it is
possible to subtly act on the title, content, or source of the
news, to invert the result of a classifier: from true to false
news, and vice versa [10].
Learning algorithms have been adopted with the aim of detecting
false news by, e.g., using textual features, such as the title and
content of the article. It has also been shown that users’ comments
and replies can be valid features to unveil textual content of low
or no repute [11].
Fig. 2. Adversarial machine learning led to a rise of adversarial
approaches for the detection of manipulated multimedia, fake news,
and social bots.

Regrettably, algorithms can be fooled. As an example, TextBugger
[12] is a general attack framework for generating adversarial text
that can trick sentiment analysis classifiers, such as [13], into
erroneous classifications via marginal modifications of the text,
such as adding or removing individual words, or even single
characters. Moreover, a fake news classifier can be fooled not only
by tampering with part of the news, but also by acting on comments
and replies. Figure 1(d) exemplifies the attack: a detector
correctly identifies a real article as indeed real. Unfortunately,
by inserting a fake comment as part of its inputs, the same
detector is misled to predict the article as fake instead. Fooling
fake news detectors via adversarial comment generation has been
demonstrated to be feasible by Le et al. in [14]. Leveraging the
alteration of social responses, such as comments and replies, to
fool the classifier prediction is advantageous because the attacker
does not have to own the published piece (in order to be able to
modify it after publication), and the shift from human-written to
machine-generated text is less likely to be noticed by the naked
eye. In fact, comments and replies are usually accepted even if
written in an informal style and with scarce quality. The work in
[14] also shows how it is possible to generate adversarial comments
of high quality and relevance to the original news, even at the
level of the whole sentence.
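The single-character modifications mentioned above in connection with TextBugger [12] can be illustrated on a deliberately naive classifier. The sketch below is not the TextBugger algorithm: the lexicon-based sentiment classifier, the word lists, and the greedy search are all hypothetical simplifications, chosen only to show how deleting one character from one word can flip a prediction while the text stays readable to a human.

```python
# Hypothetical word lists for a toy lexicon-based sentiment classifier.
POSITIVE = {"great", "good", "enjoyable"}
NEGATIVE = {"terrible", "awful", "boring"}


def classify(text):
    """Label text as 'positive' only if positive words outnumber
    negative ones; otherwise 'negative'."""
    tokens = text.lower().split()
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return "positive" if score > 0 else "negative"


def delete_char(word):
    """Drop one inner character; very short words are left untouched."""
    if len(word) < 3:
        return word
    mid = len(word) // 2
    return word[:mid] + word[mid + 1:]


def attack(text):
    """Try perturbing one word at a time; return the first perturbed
    text whose predicted label differs from the original, else None."""
    original = classify(text)
    tokens = text.split()
    for i, token in enumerate(tokens):
        tokens[i] = delete_char(token)
        candidate = " ".join(tokens)
        if classify(candidate) != original:
            return candidate
        tokens[i] = token  # undo and try the next word
    return None


review = "a great plot but terrible acting"
adversarial_review = attack(review)
print(classify(review), "->", adversarial_review)
```

Real attacks rank words by their influence on the model and use a richer set of edits; the point here is only that a marginal, human-tolerable change can be enough to move an input across a decision boundary.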
Recent advances in text generation even make it possible to
generate coherent paragraphs of text. This is the case, for
example, of GPT-2 [15], a language model trained on a dataset of 8M
Web pages. Being trained on a myriad of different subjects, GPT-2
generates texts of surprisingly high quality, outperforming other
language models with domain-specific training (like news, or
books). Furthermore, in [16], the authors study how to preserve a
topic in synthetic news generation. Contrary to GPT-2, which
selects the most probable word from the vocabulary as the next word
to generate, a reinforcement learning agent tries to select words
that optimize the matching of a given topic.
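The contrast drawn above, between picking the most probable next word and picking a word that also matches a target topic, can be sketched as follows. This is a loose illustration and not the method of [16]: the next-word probabilities, the topic word set, and the fixed bonus `weight` are invented, and a simple re-ranking rule stands in for the learned reinforcement learning policy.

```python
def greedy_next_word(probs):
    """Pick the single most probable next word."""
    return max(probs, key=probs.get)


def topic_aware_next_word(probs, topic_words, weight=0.5):
    """Pick the word maximizing probability plus a topic bonus.

    `weight` is a hypothetical knob trading fluency for topic match;
    a trained agent would learn this trade-off instead.
    """
    def value(word):
        return probs[word] + (weight if word in topic_words else 0.0)
    return max(probs, key=value)


# Hypothetical next-word distribution from a language model.
probs = {"weather": 0.40, "election": 0.35, "recipe": 0.25}
topic = {"election", "vote"}

print(greedy_next_word(probs))
print(topic_aware_next_word(probs, topic))
```

With these assumed numbers the two rules disagree: the greedy rule follows the language model alone, while the topic-aware rule prefers a slightly less probable word that belongs to the target topic.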
Achievements in text generation have positive practical
implications, such as translation. Concerns, however, have arisen
because malicious actors can exploit these generators to produce
false news automatically. While most online disinformation today is
manually written, as progress continues in natural language text
generation, the creation of propaganda and realistic-looking hoaxes
will grow at scale [17]. In [18], for example, the authors present
Grover, a model for controllable text generation, with the aim of
defending against fake news. Given a headline, Grover can generate
the rest of the article, and vice versa. Interestingly, when the
level of credibility is investigated, articles generated with a
propagandist tone are rated as more credible by human readers than
articles with the same tone written by humans. If, on the one hand,
this shows how to exploit text generators to obtain ‘reliable fake
news’, on the other hand, it is the double-edged sword that allows
the reinforcement of the model. Quoting from [18]: ‘the best
defense against Grover turns out to be Grover itself’, as it is
able to achieve 92% accuracy in discriminating between
human-written and auto-generated texts. Grover is just one of
several news generators that obtain noticeable results: a vast
majority of the news generated by Grover and four others can fool
human readers, as well as a neural network classifier specifically
trained to detect fake news [16]. Texts generated by GPT-3, the
recent upgrade of GPT-2, achieve even more impressive results in
resembling human-written stories [19].
Finally, Miller et al. consider discriminating between true and
fake news very challenging [20]: changing a verb from the positive
to the negative form, for example, is enough to completely change
the meaning of a sentence. They therefore see the study of the news
source as a possible way of combating this type of attack.
The roots of adversarial bot detection date back to 2011 and
almost coincide with the initial studies on bots themselves [8].
Between 2011 and 2013 – that is, soon after the first efforts
for detecting automated online accounts – several scholars
became aware of the evolutionary nature of social bots. In fact,
while the first social bots that inhabited our online ecosystems
around 2010 were extremely simple and visibly untrustworthy
accounts, those that emerged in subsequent years featured
increased sophistication. This change was the result of the
development efforts put in place by botmasters and puppeteers
for creating automated accounts capable of evading early
detection techniques [21]. Comparative studies between the
first bots and subsequent ones, such as those in [22], unveiled
the evolutionary nature of social bots and laid the foundations
for adversarial bot detection. Notably, bot evolution still goes
on, fueled by the latest advances in powerful computational
techniques that allow mimicking human behavior better than
ever before [23].
Building on these findings, some initial solutions for detecting
evolving social bots have been proposed since 2011 [22]. These
techniques, however, were still based on traditional approaches to
the task of social bot detection, such as those based on
general-purpose, supervised machine learning algorithms [8].
Regarding the methodological approach, the novelty of this body of
work mainly revolved around the identification of those machine
learning features that seemed capable of allowing the detection of
the sophisticated bots. The test of time, however, proved such
assumptions wrong. In fact, the features that initially seemed
capable of identifying the sophisticated bots started yielding
unsatisfactory performance soon after their proposal [21].
It was not until 2017 that adversarial social bot detection really
ignited. Since then, several approaches have been proposed in rapid
succession for testing the detection capabilities of existing bot
detectors when faced with artfully created adversarial examples.
Among the first adversarial examples of social bots were accounts
that did not exist yet, but whose behaviors and characteristics
were simulated, as done in [24], [25]. There, the authors used
genetic algorithms to ‘optimize’ the sequence of actions of groups
of bots so that they could achieve their malicious goals, while
being largely misclassified as legitimate, human-operated accounts.
Similarly, [26] trained a text-generation deep learning model based
on latent user representations (i.e., embeddings) to create
adversarial fake posts that would allow malicious users to escape
Facebook’s detector TIES [27]. Other adversarial social bot
examples were accounts developed and operated ad hoc for the sake
of evaluating the detection capabilities of existing bot detectors,
as done in [28]. Experimentation with such examples helped scholars
understand the weaknesses of existing bot detection systems, as a
first step for improving them. However, this early body of work on
adversarial social bot examples still suffered from a major
drawback. All such works adopted ad hoc solutions for generating
artificial bots, thus lacking broad applicability. Indeed, some
solutions were tailored for testing specific detectors [24], [25],
while others relied on manual interventions, thus lacking
scalability and generality [28].
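The idea of using genetic algorithms to ‘optimize’ bot action sequences, as described above for [24], [25], can be sketched with a toy example. Everything in it is an assumption made for illustration: the three action types, the rule-based `detector_flags` (which flags accounts dominated by a single action type), the fitness function (which rewards retweets as the assumed malicious goal and penalizes detection), and all hyperparameters. It is not the encoding or the detector used in those works.

```python
import random

ACTIONS = ["tweet", "retweet", "reply"]


def detector_flags(seq):
    """Hypothetical detector: flag a sequence as a bot if more than
    60% of its actions are of the same type."""
    top = max(seq.count(a) for a in ACTIONS)
    return top / len(seq) > 0.6


def fitness(seq):
    """Reward retweets (assumed amplification goal); subtract a large
    penalty when the detector flags the sequence."""
    penalty = len(seq) if detector_flags(seq) else 0
    return seq.count("retweet") - penalty


def crossover(a, b, rng):
    """Single-point crossover of two parent sequences."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def mutate(seq, rng):
    """Replace one randomly chosen action with a random action."""
    child = list(seq)
    child[rng.randrange(len(child))] = rng.choice(ACTIONS)
    return child


def evolve(length=20, pop_size=30, generations=40, seed=0):
    """Evolve action sequences; return the best one and the best
    fitness seen at each generation."""
    rng = random.Random(seed)
    pop = [[rng.choice(ACTIONS) for _ in range(length)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        parents = pop[: pop_size // 2]  # the fittest half survives unchanged
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - len(parents))
        ]
        pop = parents + children
    best = max(pop, key=fitness)
    history.append(fitness(best))
    return best, history


best, history = evolve()
print("best fitness:", history[-1], "flagged:", detector_flags(best))
```

Because the fittest half of each generation is carried over unchanged, the best fitness never decreases from one generation to the next. Sequences found this way are tuned to one specific detector, which mirrors the limited generality noted above for these early approaches.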
With the widespread recognition of AML as an extremely powerful
learning paradigm came new, state-of-the-art approaches for
adversarial social bot detection. A paramount example of this
spillover is the work proposed in [29]. There, the authors leverage
a generative adversarial network (GAN) to artificially generate a
large number of adversarial bot examples, with which they train
downstream bot detectors. Results demonstrate that this approach
augments the training phase of the bot detector, thus significantly
boosting its detection performance. Similarly, a GAN is also used
in [30] to generate latent representations of malicious users
solely based on the representations of benign ones. The
representations of the real benign users are leveraged in
combination with the artificial representations of the malicious
users to train a discriminator for distinguishing between benign
and malicious users.
The success of a learning system is crucial in many scenarios of
our life, be it virtual or not: correctly recognizing a road sign,
or discriminating between genuine and fake news. Here, we focused
on adversarial examples, which are created to fool a trained model,
and on AML, which exploits such examples to strengthen the model.
Remarkably, adversarial examples, originally in the limelight
especially in the field of computer vision, now threaten various
domains. We concentrated on the recognition of false news and false
accounts, and we highlighted how, despite the antagonistic nature
of the examples, scholars are moving proactively to turn attack
patterns into a cure that reinforces the learning machines. Outside
computer vision, these efforts are still few and far between.
Improvements along this direction are thus much needed, especially
in those domains that are naturally polluted by adversaries.
[1] T. Mitchell, Machine Learning. McGraw-Hill, 1997.
[2] I. J. Goodfellow et al., “Explaining and harnessing adversarial examples,” in ICLR, 2015.
[3] K. Eykholt et al., “Robust physical-world attacks on deep learning visual classification,” in CVPR, 2018.
[4] T. B. Brown et al., “Adversarial patch,” in NeurIPS Workshops, 2017.
[5] D. M. J. Lazer et al., “The science of fake news,” Science, vol. 359, no. 6380, pp. 1094–1096, 2018.
[6] T. Quandt et al., “Fake news,” in The International Encyclopedia of Journalism Studies. Wiley, 2019, pp. 1–6.
[7] C. Gangware and W. Nemr, Weapons of Mass Distraction: Foreign State-Sponsored Disinformation in the Digital Age. Park Advisors,
[8] S. Cresci, “A decade of social bot detection,” Communications of the ACM, vol. 63, no. 10, pp. 72–83, 2020.
[9] X. Zhou and R. Zafarani, “A survey of fake news: Fundamental theories, detection methods, and opportunities,” ACM Comput. Surv., vol. 53, no. 5, Sep. 2020.
[10] B. D. Horne et al., “Robust fake news detection over time and attack,” ACM Trans. Intell. Syst. Technol., vol. 11, no. 1, 2019.
[11] K. Shu et al., “dEFEND: Explainable fake news detection,” in ACM KDD, 2019.
[12] J. Li et al., “Textbugger: Generating adversarial text against real-world applications,” in NDSS, 2019.
[13] X. Zhang et al., “Character-level convolutional networks for text classification,” in NeurIPS, 2015.
[14] T. Le et al., “MALCOM: Generating malicious comments to attack neural fake news detection models,” in IEEE ICDM, 2020.
[15] A. Radford et al., “Language models are unsupervised multitask learners,” OpenAI, vol. 1, no. 8, p. 9, 2019.
[16] A. Mosallanezhad et al., “Topic-preserving synthetic news generation: An adversarial deep reinforcement learning approach,” arXiv:2010.16324, 2020.
[17] G. Da San Martino et al., “A survey on computational propaganda detection,” in IJCAI, 2020.
[18] R. Zellers et al., “Defending against neural fake news,” in NeurIPS,
[19] T. B. Brown et al., “Language models are few-shot learners,” in NeurIPS, 2020.
[20] D. J. Miller et al., “Adversarial learning targeting deep neural network classification: A comprehensive review of defenses against attacks,” Proceedings of the IEEE, vol. 108, no. 3, pp. 402–433, 2020.
[21] S. Cresci et al., “The paradigm-shift of social spambots: Evidence, theories, and tools for the arms race,” in ACM WWW, 2017.
[22] C. Yang et al., “Empirical evaluation and new design for fighting evolving Twitter spammers,” IEEE Transactions on Information Forensics and Security, vol. 8, no. 8, pp. 1280–1293, 2013.
[23] D. Boneh et al., “How relevant is the Turing test in the age of sophisbots?” IEEE Security & Privacy, vol. 17, no. 6, pp. 64–71, 2019.
[24] S. Cresci et al., “On the capability of evolved spambots to evade detection via genetic engineering,” Online Social Networks and Media, vol. 9, pp. 1–16, 2019.
[25] S. Cresci et al., “Better safe than sorry: An adversarial approach to improve social bot detection,” in ACM WebSci, 2019.
[26] B. He et al., “PETGEN: Personalized text generation attack on deep sequence embedding-based classification models,” in ACM KDD, 2021.
[27] N. Noorshams et al., “TIES: Temporal interaction embeddings for enhancing social media integrity at Facebook,” in ACM KDD, 2020.
[28] C. Grimme et al., “Social bots: Human-like by means of human control?” Big Data, vol. 5, no. 4, pp. 279–293, 2017.
[29] B. Wu et al., “Using improved conditional generative adversarial networks to detect social bots on Twitter,” IEEE Access, vol. 8, pp. 36664–36680, 2020.
[30] P. Zheng et al., “One-class adversarial nets for fraud detection,” in AAAI,