
Tricking Neural Networks, 3 practical examples for tricking Image Recognition using GA and FGSM

Authors:
  • Profil Software

Abstract

Tricking Neural Networks, 3 practical examples for tricking Image Recognition using Genetic Algorithm and FGSM (Fast Gradient Sign Method)
3 practical examples for tricking Neural Networks using GA and FGSM. How can object classification be easily fooled?
Przemysław Przybyt
Feb 26 · 6 min read
Hi! I’m Przemysław from Profil Software, a software house located in Northern Poland where I work as a Python developer. My interest in AI was sparked while studying reinforcement learning and computer vision. I have a strong inner need to see how things are done under the hood, so I wanted to check whether I could mess with some well-known object classification models such as CNNs (Convolutional Neural Networks). They are just a bunch of numbers and mathematical operations, so let’s see if we can play with that!
Image classification
Image classification refers to a process in computer vision that classifies an image according to its visual content. It should not be mistaken for other, similar operations such as localization, object detection or segmentation. The image below shows the difference to make sure that everything is clear:
Image tasks comparison
Experiments’ description
For the purpose of this article I’ve chosen two algorithms to go through. The first one is a genetic algorithm used for the One Pixel Attack which, as its name suggests, changes only a single pixel value to fool the classification model. The second one is FGSM (Fast Gradient Sign Method), which modifies an image with a small amount of noise that is practically invisible to humans but can manipulate the model’s prediction.
One Pixel Attack
When I was searching the net to find ways to fool DNN (Deep Neural Network) models, I ran across the very interesting concept of the One Pixel Attack, and I knew I needed to check it out. My intuition was telling me that changing only one pixel in the original image wouldn’t be enough to break all those filters and convolutional layers used in neural networks that do such a great job when it comes to object classification.
The only information used to manipulate the input image was the probability of classification (percentage values for each label). The way I wanted to achieve that without a brute force method was by using a GA (Genetic Algorithm). The idea was simple:
1. Get the true label for a given image.
2. Draw a base population of changed pixels (encoded as xyrgb), where x and y are the position of a pixel and r, g and b are its color components.
3. Do the GA magic (crossover, mutation, selection), taking the population diversity into account.
4. End the calculations when the true-label probability drops under 20% or after a certain number of steps without appreciable results (a minimal sketch of this loop is shown right after this list).
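To make these steps more concrete, here is a minimal sketch of such a loop. The helper names, population size and mutation ranges are illustrative assumptions rather than the exact code used in the experiment, and it assumes `model` accepts raw 32x32 RGB images with values in 0-255:
# illustrative sketch of the one pixel attack GA loop (assumed helpers, not the original code)
import numpy as np
POP_SIZE, MAX_STEPS, IMG_SIZE = 100, 50, 32
def random_candidate():
    # a candidate is encoded as (x, y, r, g, b)
    return np.array([np.random.randint(IMG_SIZE), np.random.randint(IMG_SIZE),
                     *np.random.randint(0, 256, 3)])
def fitness(candidate, image, model, true_label):
    # apply the single pixel change and read the true-label probability;
    # a lower probability means a better (more adversarial) candidate
    x, y, r, g, b = candidate
    perturbed = image.copy()
    perturbed[y, x] = (r, g, b)
    return model.predict(perturbed[np.newaxis])[0][true_label]
population = [random_candidate() for _ in range(POP_SIZE)]
for _ in range(MAX_STEPS):
    scores = [fitness(c, image, model, true_label) for c in population]
    if min(scores) < 0.2:  # stop once the true-label confidence drops below 20%
        break
    # simplified GA step, standing in for the full crossover/mutation/selection logic:
    # keep the best half and refill the population with mutated copies of the survivors
    order = np.argsort(scores)
    survivors = [population[i] for i in order[:POP_SIZE // 2]]
    children = []
    for s in survivors:
        child = s + np.random.randint(-10, 11, 5)
        child[:2] = np.clip(child[:2], 0, IMG_SIZE - 1)  # keep the pixel inside the image
        child[2:] = np.clip(child[2:], 0, 255)           # keep valid color values
        children.append(child)
    population = survivors + children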
For the experiments I used a model based on the VGG16 architecture, trained on the cifar10 dataset with pretrained weights (https://github.com/geifmany/cifar-vgg). This was done to eliminate the impact of a ‘potentially’ badly trained model. The sample code below gives you a kick-start for training your own models on that dataset:
# cifar10 dataset preparation
from keras.datasets import cifar10
from keras.utils import to_categorical

cifar_10_categories = {
    0: 'airplane',
    1: 'automobile',
    2: 'bird',
    3: 'cat',
    4: 'deer',
    5: 'dog',
    6: 'frog',
    7: 'horse',
    8: 'ship',
    9: 'truck',
}

(X_train, y_train), (X_test, y_test) = cifar10.load_data()
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)

# training and evaluation goes here
...
The results obtained from the attack were really good: for almost 20% of the images, changing only one pixel successfully led to misclassification.
Example of image before and after a one pixel attack.
FGSM
Another method I found is FGSM (Fast Gradient Sign Method), which is extremely simple in concept but also leads to great effects. Without getting too deep into the technical details, this method is based on calculating, for a given image, the gradient between the input and output of the neural net with respect to the true label, i.e. the direction in pixel space in which the true-label loss grows fastest.
For an untargeted attack the next step is simply to add the sign of that gradient (-1, 0 or 1 for each pixel component) to the image so that it moves away from a correct prediction. Some studies also use a parameter called epsilon, which is a multiplier for the sign value, but in this experiment we considered images represented by integer RGB values, so the step size is effectively 1. This step can be repeated a few times to get satisfying results.
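Written as a formula (this is just a compact restatement of the step above, where J is the classification loss, x the input image and y_true the true label):
x_adv = x + epsilon * sign(∇x J(x, y_true))   # epsilon is effectively 1 for integer RGB images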
Another approach is a targeted attack, which differs in the way the gradient is calculated. For this type of attack it is taken between the input image and the target label (not the true label), and the resulting sign is then subtracted from the image to move the classification closer to the aim. Easy, isn’t it? I’ve pasted some sample code below to make it easier to understand.
# sample code that calculates the gradients and updates an image
import keras.backend as K
from keras import losses

sess = K.get_session()
...
# one-hot encoding of the class used for the attack: the chosen target class
# for a targeted attack, or the true (base) class for an untargeted one
target = K.one_hot(target_class if target_class is not None else base_class,
                   num_classes)

def get_image_update_function(target_class):
    # targeted: step against the gradient to move towards the target class
    def targeted(img, delta):
        return img - epsilon * delta
    # untargeted: step along the gradient to move away from the true class
    def untargeted(img, delta):
        return img + epsilon * delta
    if target_class is not None:
        return targeted
    return untargeted

update_fun = get_image_update_function(target_class)

# calculate delta - the sign of the loss gradient with respect to the input image
loss = losses.categorical_crossentropy(target, model.output)
grads = K.gradients(loss, model.input)
delta = K.sign(grads[0])
delta = sess.run(delta, feed_dict={model.input: image})

# update image
image = update_fun(image, delta)
The model used in this experiment is resnet18 with imagenet weights. The sample code that loads it (using image-classifiers==0.2.2) is pasted below:
# loading resnet pretrained models (224x224px, 1000 classes)
from classification_models import Classifiers

ResNet18, preprocess_input = Classifiers.get('resnet18')
resnet_dim = (224, 224)
model = ResNet18(input_shape=(*resnet_dim, 3), weights='imagenet', classes=1000)
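For completeness, feeding an image into this model then looks roughly like the sketch below; the file name is only a placeholder, and the image simply has to be resized to 224x224 and passed through the preprocess_input function loaded above:
# preparing a single image for the resnet18 model (the file path is hypothetical)
import numpy as np
from keras.preprocessing import image as keras_image

img = keras_image.load_img('example.jpg', target_size=resnet_dim)
x = preprocess_input(np.expand_dims(keras_image.img_to_array(img), axis=0))
preds = model.predict(x)  # shape (1, 1000): imagenet class probabilities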
The image below presents an original image, the adversarial example generated using FGSM, and the generated noise after 2 steps of the algorithm:
Original and adversarial image with predicted classes
Noise from the red component (white: +2, gray: 0, black: -2)
Black-box FGSM
The previous method was the easy case where we have full information about the attacked model, but what about when it is not available? Here is a study that estimates the gradient by sending a large number of queries to the target model. I tried to fool the target model using my own model, which had a different architecture but performed a similar task. The modified images were prepared on my model (it took 7 steps to push the true-label prediction under 1%) and then checked against the target model (the vgg16 cifar10 model used in the previous steps). The results of this experiment are shown below:
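A minimal sketch of this transfer setup is shown below. It reuses the FGSM pieces from the earlier snippets; `substitute_model` (my own, differently-architected model), `target_model` (the vgg16 cifar10 model), `image` (with a batch dimension) and `true_class` are assumed to already exist, and raw 0-255 pixel values are assumed:
# black-box transfer attack sketch: the noise is generated on a locally available
# substitute model, and only the finished image is shown to the target model
import numpy as np
import keras.backend as K
from keras import losses

sess = K.get_session()

num_classes = substitute_model.output_shape[-1]
loss = losses.categorical_crossentropy(K.one_hot(true_class, num_classes),
                                       substitute_model.output)
grad_sign = K.sign(K.gradients(loss, substitute_model.input)[0])

adversarial = np.copy(image)
for step in range(7):  # 7 steps were enough in the experiment described above
    delta = sess.run(grad_sign, feed_dict={substitute_model.input: adversarial})
    adversarial = np.clip(adversarial + delta, 0, 255)  # stay in the valid pixel range

# finally, check how the black-box target model reacts to the crafted image
print(target_model.predict(adversarial)[0][true_class])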
Original and fake image obtained during the black-box approach, with probabilities from the target model.
Accumulated (r+g+b) noise generated during 7 steps of the algorithm.
Chart showing how the prediction for the true label changes during the experiment.
These results look promising, but we have to take into account that these are relatively simple tasks (classifying 32x32 pixel images), and the difficulty of fooling other models will probably grow with the complexity of the structures that are used.
Conclusion
The approaches presented here show that we can perturb images in a way that manipulates the classification results. This is easy when we have full information about the model structure; with only limited access to the target model it is much harder to estimate good perturbed samples.
The knowledge that comes from these experiments can help to defend against such attacks, for example by extending the training set with slightly modified images.
Resources
https://arxiv.org/pdf/1710.08864.pdf
https://arxiv.org/pdf/1712.07107.pdf
https://github.com/geifmany/cifar-vgg
https://arxiv.org/pdf/1904.05181.pdf
Thanks to Peter Plesa.