own, while it will have a low proba-
bility of interaction with opinions far
from their own.
Simulations of the algorithmic bias
model show several results suggesting
that online platforms can have an
important effect on opinion formation
and consensus in society. First, the
number of opinion clusters grows as
algorithmic bias grows (see
illustration). This means that online
platforms can favour the fragmentation
of opinions. Second, this also leads to
polarisation, where the distance between
people's opinions is larger than in
the situation without algorithmic bias.
Third, changes in opinion are much
slower when the bias is in operation.
Even when consensus is reached, the
time to reach it becomes very long. In
practice, this means that it could take
years for people to agree on an issue,
and they would remain in a highly
fragmented state in the meantime.
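To make the mechanism concrete, here is a minimal sketch of such a simulation. It is an illustration under stated assumptions, not the authors' implementation: the Deffuant-style pairwise averaging, the exponent `gamma` that controls the strength of the bias, the confidence bound `epsilon`, the convergence rate `mu` and the gap-based cluster count are all choices made for this example; see [1] for the actual model.

```python
import random


def pick_partner(opinions, i, gamma, rng, floor=1e-6):
    """Pick a partner j != i with probability proportional to distance**(-gamma).

    gamma = 0 gives uniform selection (no algorithmic bias); larger values
    favour partners whose opinion is close to that of agent i.
    The floor avoids a division by zero for identical opinions (assumption).
    """
    others = [j for j in range(len(opinions)) if j != i]
    weights = [
        max(abs(opinions[i] - opinions[j]), floor) ** (-gamma) for j in others
    ]
    return rng.choices(others, weights=weights, k=1)[0]


def step(opinions, gamma, epsilon, mu, rng):
    """One interaction: two agents move closer if they are within epsilon."""
    i = rng.randrange(len(opinions))
    j = pick_partner(opinions, i, gamma, rng)
    if abs(opinions[i] - opinions[j]) < epsilon:
        shift = mu * (opinions[j] - opinions[i])
        opinions[i] += shift
        opinions[j] -= shift


def count_clusters(opinions, gap=0.05):
    """Count groups of opinions separated by more than `gap` (assumed heuristic)."""
    ordered = sorted(opinions)
    return 1 + sum(1 for a, b in zip(ordered, ordered[1:]) if b - a > gap)


def simulate(n=100, gamma=1.0, epsilon=0.3, mu=0.5, steps=20000, seed=0):
    """Run the toy model from random initial opinions in [0, 1)."""
    rng = random.Random(seed)
    opinions = [rng.random() for _ in range(n)]
    for _ in range(steps):
        step(opinions, gamma, epsilon, mu, rng)
    return opinions
```

Calling `count_clusters(simulate(gamma=g))` for increasing values of `g` is one way to explore how the bias strength relates to fragmentation in this toy setting.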
These results provide important evidence
that algorithmic bias may affect the
outcomes of public debates and consensus
in society. Thus, we believe measures
are required to at least stop its
effects, if not reverse them. Researchers
are investigating means of promoting
consensus to counteract the effects of
algorithmic bias. In the meantime, users
could be informed of the way platforms
feed them information and of the fact
that this could affect their opinions,
and perhaps the mechanisms implemented
by the platforms could be gradually
withdrawn.
[1] Alina Sîrbu, et al.: “Algorithmic
bias amplifies opinion polarization:
A bounded confidence model”,
arXiv preprint arXiv:1803.02111,
Please contact:
Alina Sîrbu,
University of Pisa, Italy
ERCIM NEWS 116 January 2019
Machine learning and deep learning are
pervading the application space in many
directions. The ability of Deep Neural
Networks (DNNs) to learn an optimised
hierarchy of representations of the input
has been proven in many sophisticated
tasks, such as computer vision, natural
language processing and automatic
speech recognition. As a consequence,
deep learning methodologies are
increasingly tested in security-aware
(e.g. malware detection, content
moderation, biometric access control)
and safety-aware (e.g. autonomous
driving vehicles, medical diagnostics)
applications in which their performance
plays a critical role.
However, one of the main roadblocks to
their adoption in these stringent contexts
is the widespread difficulty of grounding
the decisions the model takes. The
phenomenon of adversarial inputs is a
striking example of this problem.
Adversarial inputs are carefully crafted
samples (generated by an adversary,
hence the name) that look authentic to
human inspection, but cause the
targeted model to misbehave (see Figure
1). Although they resemble legitimate
inputs, the high non-linearity of DNNs
permits maliciously added perturbations
to steer at will the decisions the
model takes without being noticed.
Moreover, the generation of these
malicious samples does not require
complete knowledge of the attacked
system and is often efficient. This
exposes systems that rely on machine
learning technologies to potential
security threats.
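As a rough illustration of how a small, targeted perturbation can flip a decision, the following sketch applies a gradient-sign step to a toy logistic classifier. This is an assumption made for the example: the text does not name a specific attack, and a linear model is a deliberately simple stand-in that does not capture the non-linearity of a DNN. The function names and the parameter `eps` are hypothetical.

```python
import math


def predict(weights, bias, x):
    """Probability of the positive class under a logistic model."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def gradient_sign_perturb(weights, x, label, eps):
    """Shift every feature by eps in the direction that increases the loss.

    For the logistic loss, d(loss)/d(x_i) = (p - label) * w_i, so the sign of
    the gradient is sign(w_i) when label == 0 and -sign(w_i) when label == 1.
    """
    direction = -1 if label == 1 else 1
    return [xi + eps * direction * sign(w) for w, xi in zip(weights, x)]
```

With many input features, the per-feature change `eps` can stay small while the accumulated effect on the model output is large enough to cross the decision boundary.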
Many techniques for increasing the
model’s robustness or removing the
adversarial perturbations have been
developed. Unfortunately, only a few
provide effective countermeasures, and
only against specific attacks; for
stronger attack models, mitigations are
marginal or non-existent. Improving the
explainability of models and getting
deeper insights into their internals are
fundamental steps toward effective
defensive mechanisms for adversarial
inputs and machine learning security in
general.
To this end, in a joint effort between the
AIMIR Research Group of ISTI-CNR
and the CNIT Research Unit at MICC
(University of Florence), we analysed
the internal representations learned by
deep neural networks and their
evolution throughout the network when
adversarial attacks are performed.
Opening the “black box” allowed us
to characterise the trace left in the
activations throughout the layers of the
network and to distinguish adversarial
inputs from authentic ones.
by Fabio Carrara, Fabrizio Falchi, Giuseppe Amato (ISTI-CNR), Rudy Becarelli and Roberto Caldelli
(CNIT Research Unit at MICC, University of Florence)
The astonishing and cryptic effectiveness of Deep Neural Networks comes with a critical
vulnerability to adversarial inputs: samples maliciously crafted to confuse and hinder machine
learning models. Insights into the internal representations learned by deep models can help to
explain their decisions and estimate their confidence, which can enable us to trace, characterise,
and filter out adversarial attacks.
We recently proposed solutions for the
detection of adversarial inputs in the
context of large-scale image recognition
with deep neural networks. The
rationale of our approaches is to attach
to each prediction of the model an
authenticity score estimating how much
the internal representations differ from
the expected ones (represented by the
model’s training set). In [1], such a
score is obtained by analysing the
neighbourhood of the input with a
nearest-neighbour search in the
activation space of a particular layer.
Our experiments on adversarial detection
allowed us to identify the internal
activations that are most influenced
by common adversarial attacks and to
filter out most of the spurious
predictions in the basic zero-knowledge
attack model (see [L1]).
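The idea of a neighbourhood-based authenticity score can be sketched as follows. This is a minimal illustration, not the scoring function of [1]: it assumes the score is the fraction of the k nearest training activations (Euclidean distance, at one chosen layer) whose label agrees with the class predicted by the model. The function name, the parameter `k` and the data layout are hypothetical.

```python
import math


def knn_authenticity_score(activation, predicted_class,
                           train_activations, train_labels, k=5):
    """Fraction of the k nearest training activations that share the
    class predicted by the model (1.0 = fully consistent neighbourhood).

    activation        -- layer activation of the input under test
    train_activations -- activations of training inputs at the same layer
    train_labels      -- class labels of those training inputs
    """
    # Rank training samples by Euclidean distance in the activation space.
    order = sorted(
        range(len(train_activations)),
        key=lambda i: math.dist(activation, train_activations[i]),
    )
    nearest = order[:k]
    agree = sum(1 for i in nearest if train_labels[i] == predicted_class)
    return agree / len(nearest)
```

A prediction whose score falls below a chosen threshold could then be flagged as suspicious; the threshold itself would have to be tuned on held-out data.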
Building on this idea, in [2] we
improved our detection scheme by
considering the entire evolution of
activations throughout the network. An
evolution map is built by tracing the
positions an input occupies in the
feature spaces of each layer with respect
to the most common reference points
(identified by looking at training set
inputs). Experiments showed that
adversarial inputs usually tend to
deviate from reference points, leading to
different activation traces in the
network with respect to authentic
inputs (see Figure 2). Thus,
conditioning our detector on such
information allowed us to obtain
remarkable detection performance under
commonly used attacks.
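A minimal sketch of such an evolution map is given below. It assumes that the reference points are per-class centroids of training activations and that distances are Euclidean; the text only states that reference points are identified from training set inputs, so both choices, as well as the function names, are assumptions made for illustration.

```python
import math


def class_centroids(activations, labels):
    """Mean activation vector per class for a single layer."""
    sums, counts = {}, {}
    for vec, lab in zip(activations, labels):
        acc = sums.setdefault(lab, [0.0] * len(vec))
        for d, v in enumerate(vec):
            acc[d] += v
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}


def evolution_trace(layer_activations, layer_centroids):
    """Distances from one input to every class centroid, layer by layer.

    layer_activations -- one activation vector per layer for a single input
    layer_centroids   -- one {class: centroid} dict per layer
    Returns one row per layer; columns follow the sorted class labels.
    """
    trace = []
    for act, cents in zip(layer_activations, layer_centroids):
        trace.append([math.dist(act, cents[c]) for c in sorted(cents)])
    return trace
```

The resulting layer-by-class matrix of distances is the kind of information a detector could be conditioned on to separate authentic from adversarial inputs.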
We plan to extend our analysis in order
to fully characterise the effect of
adversarial attacks on internal
activations even in stricter attack
models, i.e. when the attacker is aware
of defensive systems and tries to
circumvent them.
Although our experiments focus on
adversarial input detection, both of the
presented approaches actually aim to
address a broader problem: assigning a
confidence to a model’s decision by
explaining it in terms of the observed
training data. We believe this is a
promising direction for reliable and
dependable AI.
[1] Carrara et al.: “Adversarial image
detection in deep neural networks”,
Multimedia Tools and
Applications, 1-21, 2018.
[2] Carrara et al.: “Adversarial
examples detection in features
distance spaces”, ECCV 2018
Workshops, 2018.
Please contact:
Fabio Carrara, ISTI-CNR, Italy
Roberto Caldelli, CNIT Research Unit
at MICC ‒ University of Florence,
Figure 2: Conceptualisation of the evolution of features while traversing the network. Each plane represents a feature space defined by the
activations of a particular layer of the deep neural network. Circles in the feature space represent clusters of features belonging to a specific
class. Blue trajectories represent authentic inputs belonging to three different classes, and the red trajectory represents an adversarial input. We
rely on the distances in the feature space (red dashed lines) between the input and some reference points representative of the classes to encode
the evolution of the activations.
Figure 1: Example of a common adversarial attack on image classifiers. The adversarial
perturbation added (magnified for visualisation purposes) fools the network into predicting a wrong
class with high confidence.