Cycle-GAN-Based Attack on Recaptured Images to Fool Both Human and Machine*

Wei Zhao, Pengpeng Yang, Rongrong Ni, Yao Zhao, and Wenjie Li

Institute of Information Science, Beijing Jiaotong University, Beijing 100044, China
Beijing Key Laboratory of Advanced Information Science and Network Technology, Beijing 100044, China
{rrni, yzhao}@bjtu.edu.cn

* This work was supported in part by the National Key Research and Development Program of China (2016YFB0800404), the National NSF of China (61672090, 61332012), and the Fundamental Research Funds for the Central Universities (2017YJS054). We gratefully acknowledge the support of NVIDIA Corporation with the donation of the GPU used for this research.
Abstract. Recapture is often used to hide the traces left by operations such as JPEG compression, copy-move forgery, etc. However, various detectors have been proposed to detect recaptured images. To counter these techniques, in this paper we propose a method that translates recaptured images into fake "original images" to fool both humans and machines. Our method is based on Cycle-GAN, a classic framework for image translation. To obtain better results, two improvements are proposed: (1) since the difference between original and recaptured images is concentrated in the high-frequency components, high-pass filters are used in the generator and discriminator to improve performance; (2) to guarantee that the image content does not change too much, a penalty term is added to the loss function, namely the L1 norm of the difference between images before and after translation. Experimental results show that the proposed method not only eliminates the visible traces left by recapturing but also changes the statistical characteristics effectively.
Keywords: recaptured images · Cycle-GAN · fooling human and machine
1 Introduction
Nowadays, with the popularity of digital cameras and the rapid development of Internet technology, digital images have become important carriers of information. At the same time, image editing software is widely used thanks to its operability and practicality, which makes it easy to tamper with an image. Tampered images in the fields of politics, the military, and the judiciary can bring great harm to society. Therefore, verifying the authenticity of digital images is of particular importance.
One common type of image tampering is image recapturing. The process of recapture is as follows: first, the original image is projected onto a new medium, such as a computer screen, a mobile phone screen, or printed paper; then a new image is obtained by recapturing the projection. Recaptured images may have a harmful effect on society if used maliciously. For example, since every tampering operation leaves traces on an image, attackers can eliminate these traces by recapturing the forged image. To counter this, the simplest and most convenient defense is to decide in advance whether an image has been recaptured.
To discriminate between recaptured and original images, a number of algorithms have been proposed, falling mainly into two branches: methods based on statistical characteristics [1-3] and methods based on deep learning [4]. In terms of statistical features, Lyu and Farid [1] first proposed a scheme that distinguishes between natural and unnatural images based on high-order wavelet statistical features, where unnatural images include recaptured and computer-generated images. Cao et al. [2] proposed three kinds of statistical features to detect good-quality recaptured images, namely local binary patterns (LBP), multi-scale wavelet statistics (MSWS), and color features (CF). Li et al. [3] proposed new features based on the blocking and blurring effects caused by JPEG compression and on the screen effect described by wavelet decomposition. Deep-learning-based methods have been shown to outperform those based on statistical characteristics: Yang et al. [4] proposed a Laplacian convolutional neural network (L-CNN) that improved detection performance, especially for small recaptured images.
On the other hand, from the point of view of attackers who want to translate a recaptured image into a fake original image, two goals must be achieved: the visual artifacts of the LCD screen must be removed, and the result must withstand various detection schemes. Generative adversarial networks (GANs) have achieved many state-of-the-art results in image translation. A GAN consists of a generator network and a discriminator network: the generator learns the underlying distribution of the real data and generates new data, while the discriminator is a binary classifier that determines whether its input is real or generated. During training, the generator is continuously optimized to improve its generating ability, and the discriminator to improve its discriminating ability; the learning process seeks a Nash equilibrium between the two networks. The conditional GAN (CGAN) [5] adds extra information to the generator and discriminator to guide training. Pix2pix-GAN [6] achieves image-to-image translation with paired images, each pair consisting of an input image and a corresponding target output image. Cycle-GAN [7] uses two generators and two discriminators to learn mapping functions between two domains without paired images.
Considering that it is difficult to obtain paired original and recaptured images, in this work a method based on Cycle-GAN is proposed. Since the difference between original and recaptured images is concentrated in the high frequencies, a generator and discriminator with a high-pass filter are designed to achieve better image translation. Additionally, to guarantee that the content of the images does not change much after translation, a penalty term is added to the loss function, namely the L1 norm of the difference between images before and after translation. Experimental results show that the proposed method can fool not only humans visually but also machines with high probability.

The rest of the paper is organized as follows. In Section 2, the proposed architecture and objective function are introduced. Experiments are reported in Section 3, and conclusions are drawn in Section 4.
2 Proposed method
Our task is to translate recaptured images into target images that resemble original images not only in visual appearance but also in statistical characteristics. It can be formulated as learning a mapping $G$ from recaptured images $X$ to original images $Y$ given training samples $\{x_i\} \in X$ and $\{y_i\} \in Y$, where $i = 1, 2, \ldots, m$. Note that $X$ and $Y$ are not paired one-to-one, because it is difficult to collect recaptured images that are exactly the same as the original images.

The overall framework of the model is shown in Fig. 1; two generators and two discriminators are used. Here $x$ denotes recaptured images and $y$ denotes original images. Generator $G$ learns the distribution of $Y$ and generator $F$ learns the distribution of $X$. Discriminator $D_X$ aims to distinguish between recaptured images $\{x\}$ and fake recaptured images $\{F(y)\}$, while $D_Y$ aims to distinguish between original images $\{y\}$ and fake original images $\{G(x)\}$. To ensure the mapping is meaningful, a cycle network structure is used: as shown by the dotted arrows, the translated images $\{G(x)\}$ and $\{F(y)\}$ are fed into generators $F$ and $G$, respectively. By limiting the difference between $x$ and $F(G(x))$ and between $y$ and $G(F(y))$, the model is further regularized. The training process is based on game theory and aims to reach a Nash equilibrium between the generators and discriminators.
Fig. 1. The overall framework of the model.
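To make the data flow concrete, the following minimal sketch (our own PyTorch-style illustration, not the authors' released code) traces the four passes of Fig. 1, assuming G, F, and the input batches are already defined:

```python
import torch.nn as nn

def cycle_forward(G: nn.Module, F: nn.Module, x, y):
    """One pass through the cycle structure of Fig. 1.

    x: batch of recaptured images, y: batch of original images."""
    fake_y = G(x)        # fake "original" image, judged by D_Y against y
    fake_x = F(y)        # fake "recaptured" image, judged by D_X against x
    rec_x = F(fake_y)    # x -> G(x) -> F(G(x)), should come back to x
    rec_y = G(fake_x)    # y -> F(y) -> G(F(y)), should come back to y
    return fake_y, fake_x, rec_x, rec_y
```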
2.1 Architecture
Generator Two generators, $G$ and $F$, are included: $G$ translates $X$ to $Y$ and $F$ translates $Y$ to $X$. The two generators share the architecture shown in Fig. 2. Considering that the difference between original and recaptured images is concentrated in the high-frequency part, only the high-frequency component is extracted and fed into the generators. The generators are thus responsible only for learning the high-frequency differences, which is easier to train than reconstructing the whole image. In this work, the Laplacian filter is used:

$$LF = \begin{pmatrix} 0 & -1 & 0 \\ -1 & 4 & -1 \\ 0 & -1 & 0 \end{pmatrix} \tag{1}$$
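As a sketch of how the fixed Laplacian kernel of Eq. (1) can be applied as a non-trainable high-pass layer (our illustration; the paper does not publish code), a depthwise convolution applies it per channel:

```python
import torch
import torch.nn.functional as Fn

# Laplacian kernel of Eq. (1)
LAPLACE = torch.tensor([[0., -1., 0.],
                        [-1.,  4., -1.],
                        [0., -1., 0.]]).view(1, 1, 3, 3)

def laplacian_highpass(img: torch.Tensor) -> torch.Tensor:
    """Apply the fixed Laplacian filter to each channel of a (N, C, H, W) batch."""
    c = img.shape[1]
    kernel = LAPLACE.repeat(c, 1, 1, 1).to(img.device, img.dtype)
    return Fn.conv2d(img, kernel, padding=1, groups=c)  # depthwise convolution
```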
Fig. 2. The architecture of the generator.
Fig. 3. The architecture of the residual block.
In addition, six units and five residual blocks are combined. Units 1, 2, 3, and 6 each consist of a convolution layer, batch normalization, and a ReLU function; units 4 and 5 each consist of a deconvolution layer, batch normalization, and a ReLU function. Each residual block consists of two convolution layers and a ReLU function; its structure is shown in Fig. 3. At the end, a tanh activation function is used. The parameters of the generator are given in Table 1.
Fig. 4. The architecture of the discriminator.
Discriminator Two discriminators, $D_X$ and $D_Y$, are included. As shown in Fig. 4, the discriminator is designed to distinguish real images from fake images. Similarly, a Laplacian filter is applied first, followed by four units and one convolution layer. Each unit consists of a convolution layer, batch normalization, and a Leaky-ReLU function. The parameters of the discriminator are given in Table 1.
Table 1. The detailed parameters of the architecture

Generator
  Unit 1   Conv(7×7×32), padding=3, stride=1; batchnorm; ReLU
  Unit 2   Conv(3×3×64), stride=2; batchnorm; ReLU
  Unit 3   Conv(3×3×128), stride=2; batchnorm; ReLU
  Block    Conv(3×3×128), stride=1; Conv(3×3×128), stride=1; ReLU
  Unit 4   Deconv(3×3×128), stride=2; batchnorm; ReLU
  Unit 5   Deconv(3×3×256), stride=2; batchnorm; ReLU
  Unit 6   Conv(7×7×3), stride=1; batchnorm; ReLU

Discriminator
  Unit 1   Conv(4×4×32), stride=2; batchnorm; Leaky-ReLU
  Unit 2   Conv(4×4×64), stride=2; batchnorm; Leaky-ReLU
  Unit 3   Conv(4×4×128), stride=2; batchnorm; Leaky-ReLU
  Unit 4   Conv(4×4×256), stride=1; batchnorm; Leaky-ReLU
  Conv     Conv(4×4×1), stride=1
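As a concrete reading of Table 1, the sketch below (our own PyTorch illustration, not the authors' code) assembles the discriminator from the high-pass layer of the earlier sketch, four Conv-BN-LeakyReLU units, and a final 4×4 convolution. The padding values and the LeakyReLU slope are assumptions, since Table 1 does not specify them:

```python
import torch.nn as nn

def disc_unit(in_c: int, out_c: int, stride: int) -> nn.Sequential:
    # Conv-BN-LeakyReLU unit; padding=1 and slope 0.2 are assumed, not from Table 1
    return nn.Sequential(
        nn.Conv2d(in_c, out_c, kernel_size=4, stride=stride, padding=1),
        nn.BatchNorm2d(out_c),
        nn.LeakyReLU(0.2, inplace=True),
    )

class Discriminator(nn.Module):
    def __init__(self, in_channels: int = 3):
        super().__init__()
        self.body = nn.Sequential(
            disc_unit(in_channels, 32, stride=2),   # Unit 1
            disc_unit(32, 64, stride=2),            # Unit 2
            disc_unit(64, 128, stride=2),           # Unit 3
            disc_unit(128, 256, stride=1),          # Unit 4
            nn.Conv2d(256, 1, kernel_size=4, stride=1, padding=1),  # patch output
        )

    def forward(self, img):
        # High-pass filtering first, as described in the text
        return self.body(laplacian_highpass(img))
```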
2.2 Objective Function
The loss function of the proposed method contains three parts: an adversarial loss, a cycle consistency loss, and a low-frequency consistency loss.
Adversarial Loss The optimization process of a GAN is a game between two competing networks: the generator is responsible for generating data similar to the real data, and the discriminator is responsible for distinguishing the generated data from the real data. Formally, the game between the generator $G$ and the discriminator $D$ has a minimax objective. Denote the distribution of recaptured images by $p_{data}(x)$ and the distribution of original images by $p_{data}(y)$. We need to translate a recaptured image $x$ into a target image $G(x)$ that follows the distribution $p_{data}(y)$. Therefore, for the mapping function $G: X \to Y$ and its discriminator $D_Y$:

$$\mathcal{L}_{adv}(G, D_Y, X, Y) = \mathbb{E}_{y \sim p_{data}(y)}[\log D_Y(y)] + \mathbb{E}_{x \sim p_{data}(x)}[\log(1 - D_Y(G(x)))], \tag{2}$$

where $G$ tries to generate images $G(x)$ that look similar to images from domain $Y$, while $D_Y$ aims to distinguish between translated samples $G(x)$ and real samples $y$. $G$ tries to minimize this objective and $D_Y$ tries to maximize it.

Since it is meaningless to learn the translation from original images to recaptured images, the output of $F: Y \to X$ is not used in our experiments; however, it is an essential part of the framework for cycle consistency. So, for the mapping function $F: Y \to X$ and its discriminator $D_X$, there is another constraint:

$$\mathcal{L}_{adv}(F, D_X, X, Y) = \mathbb{E}_{x \sim p_{data}(x)}[\log D_X(x)] + \mathbb{E}_{y \sim p_{data}(y)}[\log(1 - D_X(F(y)))]. \tag{3}$$
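In code, the two adversarial terms reduce to binary cross-entropy on the discriminator logits. The sketch below is our hedged illustration; the generator side uses the common non-saturating surrogate (maximize log D(fake)) rather than minimizing log(1 - D(fake)) directly:

```python
import torch
import torch.nn.functional as Fn

def d_loss(D, real, fake):
    # Discriminator side of Eqs. (2)-(3): maximize log D(real) + log(1 - D(fake))
    real_logits = D(real)
    fake_logits = D(fake.detach())  # detach: no gradient into the generator here
    return (Fn.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits))
            + Fn.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits)))

def g_adv_loss(D, fake):
    # Generator side: non-saturating form, maximize log D(fake)
    logits = D(fake)
    return Fn.binary_cross_entropy_with_logits(logits, torch.ones_like(logits))
```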
Cycle Consistency Loss Compared with other generative models, the greatest advantage of a GAN is that it does not require specifying a target distribution explicitly, but learns the distribution directly from two groups of images. However, this mechanism also brings a shortcoming: the model is too free and uncontrollable. A generator can map the input images to any random permutation of images in the target domain, which may leave no semantic link between input and output images. Thus, it is difficult to guarantee that the learned function maps an input $X$ to the desired output $Y$. To ensure the mapping is practical, a cycle consistency loss is introduced.

For each image $x$ from domain $X$, the image translation cycle should be able to bring $x$ back to itself: $x \to G(x) \to F(G(x)) \approx x$. Likewise, for each image $y$ from domain $Y$: $y \to F(y) \to G(F(y)) \approx y$:

$$\mathcal{L}_{cyc}(G, F) = \mathbb{E}_{x \sim p_{data}(x)}[\|F(G(x)) - x\|_1] + \mathbb{E}_{y \sim p_{data}(y)}[\|G(F(y)) - y\|_1]. \tag{4}$$
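Eq. (4) translates directly into two L1 penalties on the round-trip reconstructions; a minimal sketch, reusing the generators from the framework sketch above:

```python
import torch.nn.functional as Fn

def cycle_loss(G, F, x, y):
    # ||F(G(x)) - x||_1 + ||G(F(y)) - y||_1, averaged over the batch
    return Fn.l1_loss(F(G(x)), x) + Fn.l1_loss(G(F(y)), y)
```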
Low-Frequency Consistency Loss As noted, the dataset is unpaired, which makes it convenient to collect, but this also brings a disadvantage: there is no ground truth for recaptured images to constrain the model during training. Since deep learning models are data-driven, the generator learns whatever differences exist between the original and recaptured images in the training dataset. If the dataset is not rich enough, the model is likely to overfit: the generator may learn not only the differences left by recapturing, but also other differences between the two groups of data, such as color distribution, image content, and so on. In that case, the translated images may show large chromatic differences from the target images. Considering the characteristics of recaptured images (their content is similar to the original images, and the main difference is concentrated in the high frequencies), an extra term is added to ensure that the low-frequency part does not change. In the proposed method, median filtering is used to extract the low-frequency part:

$$\mathcal{L}_{low}(G, F) = \mathbb{E}_{x \sim p_{data}(x)}[\|f(G(x)) - f(x)\|_1] + \mathbb{E}_{y \sim p_{data}(y)}[\|f(F(y)) - f(y)\|_1], \tag{5}$$

where $f(\cdot)$ is a median filter, which preserves the low-frequency part.
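A sketch of Eq. (5) follows. PyTorch has no built-in 2-D median filter, so the sliding-window median is built from unfold(); the 3×3 window is our assumption, since the paper does not state the kernel size of f(.):

```python
import torch
import torch.nn.functional as Fn

def median_filter(img: torch.Tensor, k: int = 3) -> torch.Tensor:
    # (N, C, H, W) -> same shape; each pixel becomes the median of its k x k
    # neighborhood (reflection padding keeps the output size unchanged)
    pad = k // 2
    x = Fn.pad(img, (pad, pad, pad, pad), mode='reflect')
    patches = x.unfold(2, k, 1).unfold(3, k, 1)          # (N, C, H, W, k, k)
    return patches.reshape(*img.shape, k * k).median(dim=-1).values

def low_freq_loss(G, F, x, y):
    # ||f(G(x)) - f(x)||_1 + ||f(F(y)) - f(y)||_1 with f = median filter
    return (Fn.l1_loss(median_filter(G(x)), median_filter(x))
            + Fn.l1_loss(median_filter(F(y)), median_filter(y)))
```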
In total, the full objective is:

$$\mathcal{L}(G, F, D_X, D_Y) = \mathcal{L}_{adv}(G, D_Y, X, Y) + \mathcal{L}_{adv}(F, D_X, X, Y) + \alpha \mathcal{L}_{cyc}(G, F) + \beta \mathcal{L}_{low}(G, F), \tag{6}$$

where $\alpha$ and $\beta$ are weight coefficients. In the experiments, $\alpha$ is set to 10 and $\beta$ to 5.

Finally, by optimizing the loss function according to Eq. (7), we obtain the well-trained generators and discriminators. For the purpose of this work, only generator $G$ is needed:

$$G^* = \arg\min_{G, F} \max_{D_X, D_Y} \mathcal{L}(G, F, D_X, D_Y), \tag{7}$$

where $G^*$ denotes the well-trained generator $G$.
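Putting the pieces together, one optimization step for Eqs. (6)-(7) might look as follows. This reuses the loss sketches above and the hyper-parameters from Section 3 (Adam, learning rate 1e-4, beta1 = 0.5); the alternating generator/discriminator update is our assumption about the training loop, which the paper does not spell out:

```python
import itertools
import torch

ALPHA, BETA = 10.0, 5.0  # weight coefficients of Eq. (6)

# G, F, D_X, D_Y are the modules from the sketches above
opt_g = torch.optim.Adam(itertools.chain(G.parameters(), F.parameters()),
                         lr=1e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(itertools.chain(D_X.parameters(), D_Y.parameters()),
                         lr=1e-4, betas=(0.5, 0.999))

def train_step(x, y):
    # Generator update: adversarial + cycle + low-frequency terms of Eq. (6)
    opt_g.zero_grad()
    g_total = (g_adv_loss(D_Y, G(x)) + g_adv_loss(D_X, F(y))
               + ALPHA * cycle_loss(G, F, x, y)
               + BETA * low_freq_loss(G, F, x, y))
    g_total.backward()
    opt_g.step()

    # Discriminator update for Eqs. (2)-(3)
    opt_d.zero_grad()
    d_total = d_loss(D_Y, y, G(x)) + d_loss(D_X, x, F(y))
    d_total.backward()
    opt_d.step()
    return g_total.item(), d_total.item()
```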
3 Experimental results and analysis
The image database in the experiments includes 20000 images: 10000 original images and 10000 recaptured images, each of size 256 × 256. The images are derived from the image database provided in [2]. We crop a 1024 × 1024 block from the center of each image, cut it into non-overlapping 512 × 512 blocks, and finally obtain 256 × 256 images by center cropping. The training, validation, and test sets are randomly divided in a 40/10/50 percent split. The hyper-parameter settings are as follows: the learning rate is 0.0001 and the number of training epochs is 15, using the Adam optimizer with $\beta_1 = 0.5$. All the results shown in this section are averaged over 6 random experiments.
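The cropping pipeline described above can be read as follows (a sketch using Pillow; the function name and the use of a center crop at each stage are our interpretation of the text):

```python
from PIL import Image

def extract_patches(path: str):
    """1024x1024 center crop -> four 512x512 blocks -> 256x256 center crops."""
    img = Image.open(path)
    w, h = img.size
    left, top = (w - 1024) // 2, (h - 1024) // 2
    img = img.crop((left, top, left + 1024, top + 1024))  # center 1024x1024
    patches = []
    for by in (0, 512):                 # non-overlapping 512x512 blocks
        for bx in (0, 512):
            block = img.crop((bx, by, bx + 512, by + 512))
            patches.append(block.crop((128, 128, 384, 384)))  # 256x256 center
    return patches
```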
In the experiments, three recaptured-image detection methods are involved: two based on statistical characteristics, the LBP feature [2] and the wavelet statistical feature [2], and one based on deep learning, L-CNN [4]. First, these three methods are trained to obtain their accuracies on different images. Furthermore, to analyze the validity of the model modifications, a contrast experiment is performed in which the model is the original Cycle-GAN without any modification. Finally, to verify the effectiveness of the proposed method, it is trained on the training dataset, and the recaptured images in the test set are fed into the model to be translated into fake images.
Table 2 presents the detection accuracies of the three methods on different images. Note that $IMAGE_{nor}$ denotes the recaptured images in the test dataset without any translation, $IMAGE_{cyc}$ denotes the images translated by the original Cycle-GAN, and $IMAGE_{prop}$ denotes the images translated by the proposed method. The three detection methods can all detect untranslated recaptured images effectively, and the ability of the original Cycle-GAN to attack the detection methods is worse than that of the proposed method. After translation by the proposed method, the classic schemes are fooled with high probability. At the same time, the attack effect differs across the three detection methods: the proposed method attacks L-CNN effectively but does not perform as well against the LBP and wavelet methods. We conjecture this is because L-CNN is more similar to the discriminator of the proposed method.
In terms of visual effects, six groups of images are shown in Fig. 5; in each group, the recaptured image is on the top and the translated image on the bottom. These images show that the proposed method effectively removes the texture traces left by recapturing an LCD screen.

In conclusion, the proposed method not only eliminates the visible traces left by recapturing but also changes the statistical characteristics, defeating the detection methods effectively.
4 Conclusion
In this paper, we proposed a method to translate recaptured images into fake "original images" based on Cycle-GAN. According to the characteristics of recaptured images, a generator and discriminator with a high-pass filter are designed to achieve better image translation. Additionally, to guarantee that the content of the images does not change much after translation, a penalty term is added to the loss function, namely the L1 norm of the difference between images before and after translation. Experimental results show that the proposed method can fool not only humans visually but also machines with high probability.
Fig. 5. The visual effect of recaptured images and the corresponding translated images. In each group, the recaptured image is on the top and the translated image on the bottom.
Table 2. The classification accuracy of the three detection methods on different images

Method            L-CNN     LBP       Wavelet
$IMAGE_{nor}$     99.0%     95.6%     82.0%
$IMAGE_{cyc}$     34.83%    70.96%    53.3%
$IMAGE_{prop}$    9.4%      32.85%    39.44%
References
1. Lyu, S., Farid, H.: How Realistic is Photorealistic? IEEE Transactions on Signal Processing, 845-850 (2005)
2. Cao, H., Kot, A.C.: Identification of recaptured photographs on LCD screens. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 1790-1793 (2010)
3. Li, R., Ni, R., Zhao, Y.: An effective detection method based on physical traits of recaptured images on LCD screens. In: International Workshop on Digital-forensics and Watermarking, pp. 107-116 (2015)
4. Yang, P., Ni, R., Zhao, Y.: Recapture image forensics based on Laplacian convolutional neural networks. In: International Workshop on Digital Watermarking. Springer, Cham (2016)
5. Mirza, M., Osindero, S.: Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784 (2014)
6. Isola, P., Zhu, J.Y., Zhou, T., Efros, A.A.: Image-to-image translation with conditional adversarial networks. arXiv preprint arXiv:1611.07004 (2017)
7. Zhu, J.Y., Park, T., Isola, P., Efros, A.A.: Unpaired image-to-image translation using cycle-consistent adversarial networks. arXiv preprint arXiv:1703.10593 (2017)