Adversarial machine learning
for protecting against online manipulation
Stefano Cresci, Marinella Petrocchi, Angelo Spognardi, and Stefano Tognazzi
Abstract—Adversarial examples are inputs to a machine learning system that result in an incorrect output from that system. Attacks launched through this type of input can cause severe consequences: for example, in the field of image recognition, a stop sign can be misclassified as a speed limit indication. However, adversarial examples also represent the fuel for a flurry of research directions in different domains and applications. Here, we give an overview of how they can be profitably exploited as powerful tools to build stronger learning models, capable of better withstanding attacks, for two crucial tasks: fake news and social bot detection.
Keywords—I.2.4 Knowledge representation formalisms and meth-
ods; H.2.8.d Data mining; O.8.15 Social science methods or tools
The year was 1950, and in his paper ‘Computing Machinery and Intelligence’, Alan Turing asked his audience a question: can machines think? A question partly answered by the machine learning (ML) paradigm, whose traditional definition is as follows: “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E”. If we define the experience E as ‘what data to collect’, the task T as ‘what decisions the software needs to make’, and the performance measure P as ‘how we will evaluate its results’, then it becomes possible to evaluate the capability of the program to complete the task correctly – that is, to recognize the type of data – and to measure its performance.
To date, ML helps us achieve multiple goals: it provides recommendations to customers based on their previous purchases, or weeds spam out of the inbox based on spam received previously, just to name a couple of examples. In the image recognition field, a program is trained by feeding it many different images, thus learning to distinguish among them.
However, in 2014, Google and NYU researchers showed how it was possible to fool a classifier, specifically a convolutional neural network (ConvNet), by adding noise to the image of a panda: the program classified the panda plus the added noise as a gibbon, with 99% confidence. The modified image is called an adversarial example. Formally, given a data distribution p(x, y) over images x and labels y, and a classifier f such that f(x) = y, an adversarial example is a modified input x̃ = x + δ such that δ is a very small (human-imperceptible) perturbation and f(x̃) ≠ y; namely, x̃ is misclassified while x was not. Still in visual recognition, it is possible to ‘perturb’ a road sign, e.g., a stop sign, by placing small stickers on it, so that the classifier identifies it as a speed limit sign, see Figure 1(a). Another noteworthy attack exploits so-called ‘adversarial patches’: the opponent does not even need to know the target that the classifier has been trained to recognize. Simply adding a patch to the input can lead the system to decide that what it has been given to classify is exactly what the patch represents. A popular case is that of a trained model mistaking a banana for a toaster, after an adversarial patch was placed next to the banana. These few examples demonstrate the risks that the vulnerability of ML systems to intentional data manipulation poses in the field of computer vision.

S. Cresci is with the Institute of Informatics and Telematics, IIT-CNR, Pisa, Italy.
M. Petrocchi is with the Institute of Informatics and Telematics, IIT-CNR, Pisa, Italy, and Scuola IMT Alti Studi Lucca, Lucca, Italy.
A. Spognardi is with the Dept. of Computer Science, Università di Roma ‘La Sapienza’, Roma, Italy.
S. Tognazzi is with the Centre for the Advanced Study of Collective Behaviour and the Dept. of Computer and Information Science, University of Konstanz, Konstanz, Germany.
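The perturbation δ used in adversarial examples can be computed, for instance, by taking a small step along the sign of the gradient of the model's score with respect to the input (the ‘fast gradient sign’ idea). The following is a minimal, hypothetical sketch on a toy linear classifier, not on the image models discussed above; all numbers are illustrative:

```python
import numpy as np

# Toy linear "classifier": f(x) = sign(w . x), a hypothetical stand-in
# for the image classifiers discussed in the text.
w = np.array([1.0, -1.0, 2.0])

def predict(x):
    return 1 if w @ x > 0 else -1

# A correctly classified input: w @ x = 0.5 > 0, so label +1.
x = np.array([0.5, 0.5, 0.25])
assert predict(x) == 1

# FGSM-style perturbation: for a linear model the gradient of the
# score w.r.t. x is just w, so we step against it along its sign.
eps = 0.3
delta = -eps * np.sign(w)      # push the score w @ x downward
x_adv = x + delta

# Each coordinate moved by at most eps, yet the prediction flips.
print(predict(x), predict(x_adv))   # 1 -1
```

Despite every coordinate changing by at most ε, the label flips, which is exactly the human-imperceptible misclassification described above.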
In recent years, ML has also been at the core of a plethora of efforts in other domains. In particular, some of those that have seen massive application of ML and AI are intrinsically adversarial – that is, they feature the natural and inevitable presence of adversaries motivated to fool the ML systems. In fact, adversarial examples are extremely relevant in all security-sensitive applications, where any misclassification induced by an attacker represents a security threat. A paramount example of such applications is the fight against online abuse and manipulation, which often comes in the form of fake news and social bots.
So, how can we defend against attackers who try to deceive the model through adversarial examples? It turns out that adversarial examples are not exclusively a threat to the reliability of ML models. Instead, they can also be leveraged as a very effective means to strengthen the models themselves. A ‘brute force’ approach, so-called Adversarial Training, sees the model designers pretend to be attackers: they generate several adversarial examples against their own model and then train the model not to be fooled by them.
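As a sketch of this idea, consider a toy perceptron that, at each step, first attacks its own current weights with a small FGSM-style perturbation and then trains on the perturbed points. The data, model, and update rule are hypothetical stand-ins for real detectors and are only meant to show the attack-then-train loop:

```python
import numpy as np

def fgsm(x, y, w, eps):
    # Worst-case small perturbation against a linear score y * (w . x).
    return x - eps * y * np.sign(w)

def adversarial_training(X, y, eps=0.1, epochs=50, lr=0.1):
    # Perceptron-style updates on self-generated adversarial examples:
    # the designers "pretend to be attackers" against their own model.
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            x_adv = fgsm(xi, yi, w, eps)   # attack the current model
            if yi * (w @ x_adv) <= 0:      # train on the attack if it fools us
                w += lr * yi * x_adv
    return w

# Toy, linearly separable data with a margin wider than eps.
X = np.array([[1.0, 1.0], [2.0, 1.5], [-1.0, -1.0], [-2.0, -1.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w = adversarial_training(X, y)
print(np.sign(X @ w))   # matches y on the clean points
```

The resulting model classifies not only the clean points but also their ε-perturbed versions correctly, which is the goal of adversarial training.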
Along these lines, in the remainder we briefly survey relevant literature on adversarial examples and Adversarial Machine Learning (AML). AML aims at understanding when, why, and how learning models can be attacked, and which techniques can mitigate such attacks. We consider two phenomena whose detection is polluted by adversaries and that are bound to play a crucial role in the coming years for the security of our online ecosystems: fake news and social bots. Contrary to computer vision, the adoption of AML in these fields is still in its infancy, as shown in Figure 2, despite its many advantages. Current endeavors mainly focus on the identification of adversarial attacks, and only seldom on the development of solutions that leverage adversarial examples to improve detection systems. The application of AML in these fields is thus still largely untapped, and its study will provide valuable insights for driving future research efforts.

(a) Computer vision. Images can be modified by adding adversarial patches so as to fool image classification systems (e.g., those used by autonomous vehicles).
(b) Automatic speech recognition. Adding adversarial noise to a speech waveform may result in wrong textual translations.
(c) Social bot detection. Similarly to computer vision and automatic speech recognition, adversarial attacks can alter the features of social bots, without impacting their activity, thus allowing them to evade detection.
(d) Fake news detection. Tampering with the textual content of an article, or even with its comments, may yield wrong article classifications.
Fig. 1. Adversarial examples and their consequences, for a few notable ML tasks.

FAKE NEWS AND SOCIAL BOTS
Fake news is often defined as ‘fabricated information that mimics news media content in form but not in organizational process or intent’. Its presence has been documented in several contexts, such as politics, vaccination, food habits, and financial markets.
False stories circulated centuries before the Internet: one thinks, for instance, of the manoeuvres carried out by espionage and counter-espionage to feed wrong information to the enemy. If fake stories have always existed, why are we so concerned about them now? The advent of the Internet, while undoubtedly facilitating access to news, has lowered the editorial standards of journalism, and its open nature has led to a proliferation of user-generated content, unscreened by any moderator.
Usually, fake news is published on some little-known outlet and amplified through social media posts, quite often using so-called social bots: software agents that can perfectly mimic the behaviour of a genuine account and maliciously generate artificial hype.
In recent years, research has intensiﬁed efforts to combat
both the creation and spread of fake news, as well as the use
of social bots. Currently, the most common detection methods
for both social bots (as tools for spreading) and fake news are
based on supervised machine learning algorithms. In many cases, these approaches achieve very good performance on the considered test cases. Unfortunately, state-of-the-art detection techniques suffer from attacks that critically degrade the performance of the learning algorithms. In a classical adversarial game, social bots evolved over time: while early bots in the late 2000s were easily detectable by looking only at static account information or simple indicators of activity, sophisticated bots are nowadays almost indistinguishable from genuine accounts. We can observe the same adversarial game in fake news. Recent studies show that it is possible to subtly act on the title, content, or source of a news item to invert the result of a classifier: from true to false news, and vice versa.
ADVERSARIAL FAKE NEWS DETECTION
Learning algorithms have been adopted with the aim of detecting false news by using, e.g., textual features such as the title and content of the article. It has also been shown that users’ comments and replies can be valid features to unveil textual content of low or no reputability.
Fig. 2. Adversarial machine learning led to a rise of adversarial approaches for the detection of manipulated multimedia, fake news, and social bots.

Regrettably, algorithms can be fooled. As an example, TextBugger is a general attack framework for generating adversarial text that can trick sentiment analysis classifiers into erroneous classifications via marginal modifications of the text, such as adding or removing individual words, or even single characters. Moreover, a fake news classifier can be fooled not only by tampering with part of the news, but also by acting on comments and replies. Figure 1(d) exemplifies the attack: a detector correctly identifies a real article as indeed real. Unfortunately, by inserting a fake comment among its inputs, the same detector is misled into predicting the article as fake instead. Fooling fake news detectors via adversarial comment generation has been demonstrated to be feasible by Le et al. Leveraging the alteration of social responses, such as comments and replies, to fool the classifier prediction is advantageous because the attacker does not need to own the published piece (in order to modify it after publication), and the shift from human-written to machine-generated text is hardly detectable by the naked eye. In fact, comments and replies are usually accepted even if written in an informal style and of scarce quality. Further work shows that it is possible to generate adversarial comments of high quality and relevance to the original news, even at the level of whole sentences.
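To see how fragile text-based detectors can be, consider a deliberately naive, hypothetical lexicon-based classifier and a TextBugger-style character edit that pushes cue words out of its vocabulary while keeping them readable. The lexicon, threshold, and example sentence below are all illustrative, not taken from any real system:

```python
# Hypothetical lexicon-based "detector": counts known fake-news cue words.
CUES = {"hoax", "shocking", "miracle"}

def flags_as_fake(text, threshold=2):
    words = text.lower().split()
    return sum(w in CUES for w in words) >= threshold

def bug(word):
    # TextBugger-style character edit: swap two inner letters so the
    # token falls out of the detector's vocabulary but stays readable.
    return word[0] + word[2] + word[1] + word[3:] if len(word) > 3 else word

article = "shocking miracle cure revealed"
attacked = " ".join(bug(w) if w in CUES else w for w in article.split())

print(flags_as_fake(article), flags_as_fake(attacked))   # True False
```

Two single-character swaps (‘sohcking’, ‘mriacle’) are enough to flip the decision, mirroring the word- and character-level modifications described above.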
Recent advances in text generation even make it possible to generate coherent paragraphs of text. This is the case, for example, of GPT-2, a language model trained on a dataset of 8M Web pages. Being trained on a myriad of different subjects, GPT-2 generates texts of surprisingly high quality, outperforming language models with domain-specific training (on, e.g., news or books). Furthermore, later work studies how to preserve a topic in synthetic news generation: contrary to GPT-2, which selects the most probable word from the vocabulary as the next word to generate, a reinforcement learning agent tries to select words that optimize the matching of a given topic.
Achievements in text generation have positive practical implications, such as translation. Concerns, however, have arisen because malicious actors can exploit these generators to produce false news automatically. While most online disinformation today is manually written, as progress continues in natural language generation, the creation of propaganda and realistic-looking hoaxes will grow in scale. For example, Grover is a model for controllable text generation, built with the aim of defending against fake news. Given a headline, Grover can generate the rest of the article, and vice versa. Interestingly, when investigating credibility, articles generated by Grover in a propagandist tone turn out to be more credible to human readers than articles with the same tone written by humans. If, on the one hand, this shows how to exploit text generators to obtain ‘reliable fake news’, on the other hand, it is the double-edged sword that allows the reinforcement of the model. In the authors’ words, ‘the best defense against Grover turns out to be Grover itself’, as it achieves 92% accuracy in discriminating between human-written and machine-generated texts. Grover is just one of several news generators that obtain noticeable results: the vast majority of the news generated by Grover and four other generators can fool human readers, as well as a neural network classifier specifically trained to detect fake news. Texts generated by GPT-3, the recent upgrade of GPT-2, achieve even more impressive results in resembling hand-written stories.
Finally, Miller et al. consider the discrimination between true and fake news very challenging: it is enough, e.g., to change a verb from the positive to the negative form to completely change the meaning of a sentence. They therefore see the study of the news source as a possible way of combating this type of attack.
ADVERSARIAL SOCIAL BOT DETECTION
The roots of adversarial bot detection date back to 2011 and almost coincide with the initial studies on bots themselves. Between 2011 and 2013 – that is, soon after the first efforts for detecting automated online accounts – several scholars became aware of the evolutionary nature of social bots. In fact, while the first social bots that inhabited our online ecosystems around 2010 were extremely simple and visibly untrustworthy accounts, those that emerged in subsequent years featured increased sophistication. This change was the result of the development efforts put in place by botmasters and puppeteers to create automated accounts capable of evading early detection techniques. Comparative studies between the first bots and subsequent ones unveiled the evolutionary nature of social bots and laid the foundations for adversarial bot detection. Notably, bot evolution still goes on, fueled by the latest advances in powerful computational techniques that allow mimicking human behavior better than ever before.
Based on these initial findings, starting in 2011 some solutions were proposed for detecting evolving social bots. These techniques, however, were still based on traditional approaches to the task of social bot detection, such as those based on general-purpose, supervised machine learning algorithms. Regarding the methodological approach, the novelty of this body of work mainly revolved around the identification of those machine learning features that seemed capable of detecting the sophisticated bots. The test of time, however, proved such assumptions wrong: those features that initially seemed capable of identifying the sophisticated bots started yielding unsatisfactory performance soon after their proposal.
It was not until 2017 that adversarial social bot detection really ignited. Since then, several approaches have been proposed in rapid succession for testing the detection capabilities of existing bot detectors when faced with artfully created adversarial examples. Among the first adversarial examples of social bots were accounts that did not exist yet, but whose behaviors and characteristics were simulated. In those works, the authors used genetic algorithms to ‘optimize’ the sequence of actions of groups of bots so that they could achieve their malicious goals while being largely misclassified as legitimate, human-operated accounts. Similarly, other authors trained a text-generation deep learning model based on latent user representations (i.e., embeddings) to create adversarial fake posts that would allow malicious users to escape Facebook’s detector TIES. Other adversarial social bot examples were accounts developed and operated ad hoc for the sake of evaluating the detection capabilities of existing bot detectors. Experimentation with such examples helped scholars understand the weaknesses of existing bot detection systems, as a first step towards improving them. However, the aforementioned early body of work on adversarial social bot examples still suffered from a major drawback: all such works adopted ad hoc solutions for generating artificial bots, thus lacking broad applicability. Indeed, some solutions were tailored to testing specific detectors, while others relied on manual interventions, thus lacking scalability.
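The genetic-algorithm idea above can be illustrated with a deliberately simplified sketch: candidate action sequences are mutated and selected to stay active while lowering the score of a mock detector that flags repetitive behaviour. The action encoding, detector, and fitness function below are all hypothetical stand-ins, not the cited systems:

```python
import random

random.seed(0)

# Toy action alphabet for a bot: T = tweet, R = retweet, N = no-op.
ACTIONS = "TRN"

def detector_score(seq):
    # Mock detector: flags repetitive behaviour (consecutive repeats).
    return sum(a == b for a, b in zip(seq, seq[1:])) / (len(seq) - 1)

def fitness(seq):
    # Stay active (achieve goals) while looking non-robotic (evade detection).
    activity = sum(c != "N" for c in seq) / len(seq)
    return activity - detector_score(seq)

def evolve(pop_size=30, length=20, generations=40):
    pop = ["".join(random.choices(ACTIONS, k=length)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]          # keep the fittest half
        children = []
        for p in parents:
            i = random.randrange(length)        # single-point mutation
            children.append(p[:i] + random.choice(ACTIONS) + p[i + 1:])
        pop = parents + children
    return max(pop, key=fitness)

best = evolve()
print(best, round(fitness(best), 2))
```

An alternating sequence such as "TRTR…" attains the maximum fitness of 1.0 under this toy objective: fully active, with no repeated actions for the mock detector to flag.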
With the widespread recognition of AML as an extremely powerful learning paradigm also came new, state-of-the-art approaches for adversarial social bot detection. A paramount example of this spillover is work that leverages a generative adversarial network (GAN) to artificially generate a large number of adversarial bot examples with which to train downstream bot detectors. Results demonstrated that this approach augments the training phase of the bot detector, thus significantly boosting its detection performance. Similarly, a GAN has also been used to generate latent representations of malicious users solely based on the representations of benign ones. The representations of the real benign users are leveraged in combination with the artificial representations of the malicious users to train a discriminator that distinguishes between benign and malicious users.
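The GAN-based augmentation pipeline can be sketched as follows. For brevity, the trained generator is replaced here by an untrained, hypothetical linear map from noise to feature space, so the snippet illustrates only the data flow of the approach, not the architectures of the cited works:

```python
import numpy as np

rng = np.random.default_rng(42)

def generator(z, G):
    # Stand-in generator: maps noise vectors to synthetic bot features.
    return z @ G

def augment_training_set(X_real, y_real, G, n_fake):
    # Pool real accounts with generated adversarial bot examples,
    # labeling all generated examples as bots (label 1).
    z = rng.normal(size=(n_fake, G.shape[0]))
    X_fake = generator(z, G)
    X = np.vstack([X_real, X_fake])
    y = np.concatenate([y_real, np.ones(n_fake)])
    return X, y

# Hypothetical data: 10 accounts with 3 features each, random labels.
X_real = rng.normal(size=(10, 3))
y_real = (rng.random(10) > 0.5).astype(float)
G = rng.normal(size=(2, 3))                  # maps 2-d noise to 3-d features
X_aug, y_aug = augment_training_set(X_real, y_real, G, n_fake=5)
print(X_aug.shape, y_aug.shape)              # (15, 3) (15,)
```

Any downstream detector is then trained on `X_aug`, `y_aug` instead of the real data alone, which is the augmentation step credited with the performance boost.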
The success of a learning system is crucial in many scenarios of our life, be it virtual or not: correctly recognizing a road sign, or discriminating between genuine and fake news. Here, we focused on adversarial examples, created to fool a trained model, and on AML, which exploits such examples to strengthen the model.
Remarkably, adversarial examples, originally in the limelight especially in the field of computer vision, now threaten various domains. We concentrated on the recognition of false news and false accounts, and we highlighted how, despite the antagonistic nature of the examples, scholars are proactively turning attack patterns into a cure that reinforces the learning models. Outside computer vision, these efforts are still few and far between. Improvements along this direction are thus much needed, especially in those domains that are naturally polluted by adversaries.
REFERENCES
T. Mitchell, Machine Learning. McGraw-Hill, 1997.
I. J. Goodfellow et al., “Explaining and harnessing adversarial examples,” in ICLR, 2015.
K. Eykholt et al., “Robust physical-world attacks on deep learning visual classification,” in CVPR, 2018.
T. B. Brown et al., “Adversarial patch,” in NeurIPS Workshops, 2017.
D. M. J. Lazer et al., “The science of fake news,” Science, vol. 359, no. 6380, pp. 1094–1096, 2018.
T. Quandt et al., “Fake news,” in The International Encyclopedia of Journalism Studies. Wiley, 2019, pp. 1–6.
C. Gangware and W. Nemr, Weapons of Mass Distraction: Foreign State-Sponsored Disinformation in the Digital Age. Park Advisors, 2019.
S. Cresci, “A decade of social bot detection,” Communications of the ACM, vol. 63, no. 10, pp. 72–83, 2020.
X. Zhou and R. Zafarani, “A survey of fake news: Fundamental theories, detection methods, and opportunities,” ACM Computing Surveys, vol. 53, no. 5, 2020.
B. D. Horne et al., “Robust fake news detection over time and attack,” ACM Transactions on Intelligent Systems and Technology, vol. 11, no. 1, 2019.
K. Shu et al., “dEFEND: Explainable fake news detection,” in ACM KDD, 2019.
J. Li et al., “TextBugger: Generating adversarial text against real-world applications,” in NDSS, 2019.
X. Zhang et al., “Character-level convolutional networks for text classification,” in NeurIPS, 2015.
T. Le et al., “MALCOM: Generating malicious comments to attack neural fake news detection models,” in IEEE ICDM, 2020.
A. Radford et al., “Language models are unsupervised multitask learners,” OpenAI, vol. 1, no. 8, p. 9, 2019.
A. Mosallanezhad et al., “Topic-preserving synthetic news generation: An adversarial deep reinforcement learning approach.”
G. Da San Martino et al., “A survey on computational propaganda detection,” in IJCAI, 2020.
R. Zellers et al., “Defending against neural fake news,” in NeurIPS, 2019.
T. B. Brown et al., “Language models are few-shot learners,” in NeurIPS, 2020.
D. J. Miller et al., “Adversarial learning targeting deep neural network classification: A comprehensive review of defenses against attacks,” Proceedings of the IEEE, vol. 108, no. 3, pp. 402–433, 2020.
S. Cresci et al., “The paradigm-shift of social spambots: Evidence, theories, and tools for the arms race,” in ACM WWW, 2017.
C. Yang et al., “Empirical evaluation and new design for fighting evolving Twitter spammers,” IEEE Transactions on Information Forensics and Security, vol. 8, no. 8, pp. 1280–1293, 2013.
D. Boneh et al., “How relevant is the Turing test in the age of sophisbots?” IEEE Security & Privacy, vol. 17, no. 6, pp. 64–71, 2019.
S. Cresci et al., “On the capability of evolved spambots to evade detection via genetic engineering,” Online Social Networks and Media, vol. 9, pp. 1–16, 2019.
S. Cresci et al., “Better safe than sorry: An adversarial approach to improve social bot detection,” in ACM WebSci, 2019.
B. He et al., “PETGEN: Personalized text generation attack on deep sequence embedding-based classification models,” in ACM KDD, 2021.
N. Noorshams et al., “TIES: Temporal interaction embeddings for enhancing social media integrity at Facebook,” in ACM KDD, 2020.
C. Grimme et al., “Social bots: Human-like by means of human control?” Big Data, vol. 5, no. 4, pp. 279–293, 2017.
B. Wu et al., “Using improved conditional generative adversarial networks to detect social bots on Twitter,” IEEE Access, vol. 8, pp. 36664–36680, 2020.
P. Zheng et al., “One-class adversarial nets for fraud detection,” in AAAI, 2019.