Detection of Deepfake Video Manipulation
Marissa Koopman, Andrea Macarulla Rodriguez, Zeno Geradts
University of Amsterdam & Netherlands Forensic Institute
Abstract
The Deepfake algorithm allows a user to replace the face of one actor in a video with the face of a different actor in a photorealistic manner. This poses forensic challenges with regard to the reliability of video evidence. To contribute to a solution, photo response non-uniformity (PRNU) analysis is tested for its effectiveness at detecting Deepfake video manipulation. The PRNU analysis shows a significant difference in mean normalised cross correlation scores between authentic videos and Deepfakes.
Keywords: Video Manipulation, Digital Forensics, PRNU, Neural Network, Deepfake.
1 Introduction
Photographic and video evidence are commonly used in courtrooms and police investigations, and are seen as reliable types of evidence. With the advances in video editing techniques, however, video evidence is becoming potentially unreliable. It is probable that, in the near future, video evidence will need to be examined for traces of tampering before being deemed admissible in court.
A new video manipulation technique known as Deepfake has established itself online over the last few
months. Deepfake manipulation allows a user to replace the face of an actor in a video with the face of a
second actor, provided that enough images (several hundred to thousands) are available of both actors. These
videos are known as ’Deepfakes’. Deepfakes quickly gained notoriety in the media due to their application to
porn videos, where the faces of famous actresses and politicians were ’Deepfaked’ into existing porn videos on
websites such as Reddit and PornHub [Matsakis, 2018].
Two features distinguish Deepfakes from other video manipulation techniques. The first is their potential for photorealistic results: with enough images of both actors, and enough computer training time, the resulting videos can be extremely convincing. The second is the availability of the technique to laypersons. An app named FakeApp, essentially a graphical user interface wrapped around the Deepfake algorithm, was quickly released on Reddit, allowing users with limited knowledge of programming and machine learning to create Deepfakes. Several other versions followed, such as OpenFaceSwap [Anonymous, 2018].
The combination of photorealistic results and ease of use poses a unique forensic challenge. It becomes increasingly feasible that everyday videographic evidence has been manipulated, creating a growing need for verified authentication methods to detect Deepfake manipulation. This is an especially sensitive and urgent problem in the current 'fake news' era, extending beyond law enforcement and becoming relevant also to journalists, video hosting websites, and social media users. Authentication methods that are approachable and usable by a wide, non-expert audience are therefore ideal.
Considering the above, this paper explores the use of photo response non-uniformity (PRNU) analysis applied to Deepfakes, assessing the method's accuracy and ease of use in detecting Deepfake manipulation. The PRNU pattern of a digital image is a noise pattern created by small factory defects in the light-sensitive sensor of a digital camera [Lukas et al., 2006]. This noise pattern is highly individualising, and is often referred to as
the fingerprint of the digital image [Rosenfeld and Sencar, 2009]. PRNU analysis is a method of interest because the manipulation of the facial area is expected to affect the local PRNU pattern in the video frames. Furthermore, the analysis is widely used in image forensics and is therefore familiar to experts working in the field.
2 Current authentication efforts
No academic papers could be found on the detection of Deepfakes, although efforts to detect and remove them
are being made at websites such as Gfycat [Matsakis, 2018]. Gfycat attempts to use artificial intelligence and
facial recognition software to spot inconsistencies in the rendering of the facial area of an uploaded video.
When a video has been flagged as suspicious, a second program masks the facial area and checks whether a
video with the same body and background has been uploaded before. If such a video is found, but the faces of
the original and the newly uploaded video do not match, then the software concludes that the new video has
been manipulated [Matsakis, 2018].
Such a method is uniquely suited to a website such as Gfycat, where millions of videos are uploaded and
there are vast databases of reference videos. The method may not be as applicable to forensic cases, where for
instance some CCTV footage from a store robbery could be Deepfaked. Nor would it detect Deepfakes which
do not have an original version stored in the databases. As such, techniques which do not rely on vast databases
are needed.
3 Methods
3.1 Dataset
The dataset consists of ten authentic, unmanipulated videos between 20 and 40 seconds in length, and of 16 Deepfakes made by the researcher. The videos are captured with a Canon PowerShot SX210 IS and saved in .MOV format at a resolution of 1280×720 pixels. The Deepfakes are made from the ten authentic videos captured with the same camera, using the Deepfake GUI OpenFaceSwap [Anonymous, 2018]. Three different actors are used interchangeably in the authentic videos as well as in the Deepfakes.
3.2 PRNU analysis
Each video is converted into a series of PNG frames with the software 'FFmpeg' [FFmpeg Developers, 2018], named sequentially, and kept in labelled folders. In order to increase the significance of the expected change in the PRNU pattern in the facial area, the frames are also cropped to the face with 'FFmpeg'. Every frame of a video is cropped to the exact same pixels, so that the portion of the PRNU pattern being examined stays consistent between cropped frames. An example of how the frames are cropped can be found in figure 1.
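A minimal sketch of this extraction and cropping step is given below, calling FFmpeg from Python. The video file name, output folder, and crop window are placeholders; the paper does not state the exact crop coordinates, only that every frame of a given video is cropped to the same pixels.

import subprocess
from pathlib import Path

# Hypothetical input video and output folder; the crop window (w:h:x:y)
# is likewise a placeholder and must simply stay identical for every
# frame of a given video.
video = "authentic_01.MOV"
out_dir = Path("frames/authentic_01")
out_dir.mkdir(parents=True, exist_ok=True)

# Extract every frame as a sequentially numbered PNG, cropped to a fixed
# window around the questioned face with FFmpeg's crop filter.
subprocess.run(
    ["ffmpeg", "-i", video,
     "-vf", "crop=256:256:512:128",
     str(out_dir / "frame_%05d.png")],
    check=True,
)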
The frames are then sequentially divided into eight groups of equal size, and an average PRNU pattern is created for each group using the second order (FSTV) method [Baar et al., 2012] with the software 'PRNUCompare' [Ministry of Security and Justice, 2013]. These eight PRNU patterns are then compared to one another, and normalised cross correlation scores are returned. The variation in correlation scores and the average correlation score for each video are calculated. Finally, a Welch's t-test is applied to the results in order to assess the statistical significance of the difference between the results for Deepfakes and for authentic videos [Welch, 1947].
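The grouping, comparison, and significance test can be sketched as follows. This is only an illustration: the paper computes the PRNU patterns with the second order (FSTV) estimator in PRNUCompare, whereas the sketch substitutes a simple denoising residual, and the folder layout and file names are assumptions.

import numpy as np
from pathlib import Path
from imageio.v3 import imread
from scipy import ndimage, stats

def noise_residual(frame):
    # Rough PRNU-style residual: frame minus a denoised version.
    # (Stand-in for the second order FSTV estimator used in the paper.)
    gray = frame[..., :3].mean(axis=2).astype(np.float64)
    return gray - ndimage.gaussian_filter(gray, sigma=2)

def ncc(a, b):
    # Normalised cross correlation between two residual patterns.
    a = (a - a.mean()) / a.std()
    b = (b - b.mean()) / b.std()
    return float((a * b).mean())

def video_scores(frame_dir, n_groups=8):
    # Split the cropped frames sequentially into eight groups, average the
    # residuals per group, and return the 28 pairwise correlation scores.
    frames = sorted(Path(frame_dir).glob("*.png"))
    groups = np.array_split(frames, n_groups)
    patterns = [np.mean([noise_residual(imread(f)) for f in g], axis=0)
                for g in groups]
    return [ncc(patterns[i], patterns[j])
            for i in range(n_groups) for j in range(i + 1, n_groups)]

# Hypothetical layout: one folder of cropped frames per video.
authentic_means = [np.mean(video_scores(d))
                   for d in sorted(Path("frames/authentic").iterdir())]
deepfake_means = [np.mean(video_scores(d))
                  for d in sorted(Path("frames/deepfake").iterdir())]

# Welch's t-test (unequal variances) on the per-video mean scores.
t, p = stats.ttest_ind(authentic_means, deepfake_means, equal_var=False)
print(f"t = {t:.3f}, p = {p:.3g}")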
Figure 1: a) Frames are extracted from the video and cropped to contain the questioned face. The cropped
frames are split evenly and sequentially over eight groups. An average PRNU pattern is calculated for each
group. The PRNU pattern of each group is then compared to the PRNU patterns of the other seven groups.
Normalised cross correlation scores are calculated for each comparison. b) Frames are extracted from the
video, and cropped down to the exact same pixels which contain the questioned face.
4 Results
The mean normalised cross correlation scores per video and the variance in normalised cross correlation scores
per video are calculated. The results are illustrated in figure 2.
The results indicate that there is no correlation between the authenticity of the video and the variance in cor-
relation scores. There does appear to be a correlation between the mean correlation scores and the authenticity
of the video, where on average original videos have higher mean normalised cross correlation scores compared
to the Deepfakes. The difference in the distribution of mean normalised cross correlation scores is statistically
significant, with a p-value of 5.21 × 10⁻⁵.
Figure 2: a) The average variation in correlation scores per authentic and per Deepfake video. b) The average
correlation score per authentic and per Deepfake video.
5 Conclusion
It appears that the mean normalised cross correlation score can be used to distinguish Deepfakes from authentic
videos. The dataset is too small to formulate guidelines for likelihood ratios, as is desired in forensic sciences.
However, a cut-off value of 0.05 results in a 3.8% false positive rate, and a 0% false negative rate within our
dataset. As such, PRNU analysis may be suitable for the detection of Deepfakes. Before such an application can be advised, however, further research must be done with larger datasets in order to confirm the correlation and to formulate reliable likelihood ratios.
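As an illustration, the cut-off could be applied to per-video mean scores as sketched below. The 0.05 threshold is taken from the text, while the decision rule (mean scores below the cut-off are flagged as Deepfakes) and the variable names are our own reading and are hypothetical.

import numpy as np

def error_rates(authentic_means, deepfake_means, cutoff=0.05):
    # Flag a video as a Deepfake when its mean normalised cross correlation
    # score falls below the cut-off, and report the resulting error rates.
    authentic = np.asarray(authentic_means)
    deepfake = np.asarray(deepfake_means)
    fpr = np.mean(authentic < cutoff)   # authentic videos wrongly flagged
    fnr = np.mean(deepfake >= cutoff)   # Deepfakes that pass as authentic
    return fpr, fnr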
Acknowledgments
I would like to thank Hugo Dictus for his continued support and advice throughout the project.
References
[Anonymous, 2018] Anonymous (2018). OpenFaceSwap: a Deepfakes GUI. https://www.deepfakes.club/openfaceswap-deepfakes-software/.

[Baar et al., 2012] Baar, T., van Houten, W., and Geradts, Z. (2012). Camera identification by grouping images from database, based on shared noise patterns. arXiv preprint arXiv:1207.2641.

[FFmpeg Developers, 2018] FFmpeg Developers (2018). ffmpeg tool (version 4.0 "wu"). http://ffmpeg.org/.

[Lukas et al., 2006] Lukas, J., Fridrich, J., and Goljan, M. (2006). Digital camera identification from sensor pattern noise. IEEE Transactions on Information Forensics and Security, 1(2):205–214.

[Matsakis, 2018] Matsakis, L. (2018). Artificial intelligence is now fighting fake porn. https://www.wired.com/story/gfycat-artificial-intelligence-deepfakes/.

[Ministry of Security and Justice, 2013] Ministry of Security and Justice (2013). Finding the link between camera and image: camera individualisation with PRNU Compare Professional. Netherlands Forensic Institute. https://www.forensicinstitute.nl/binaries/forensicinstitute/documents/publications/2017/03/06/brochure-prnu-compare-professional/brochure-nfi-prnu-compare-professional_tcm36-21580.pdf.

[Rosenfeld and Sencar, 2009] Rosenfeld, K. and Sencar, H. T. (2009). A study of the robustness of PRNU-based camera identification. In Media Forensics and Security, volume 7254, page 72540M. International Society for Optics and Photonics.

[Welch, 1947] Welch, B. L. (1947). The generalization of 'Student's' problem when several different population variances are involved. Biometrika, 34(1/2):28–35.