Fake Face Detection Methods: Can They Be
Generalized?
1st Ali Khodabakhsh, 2nd Raghavendra Ramachandra, 3rd Kiran Raja, 4th Pankaj Wasnik, 5th Christoph Busch
Department of Information Security and Communication Technology,
Norwegian University of Science and Technology
Gjøvik, Norway
{ali.khodabakhsh, raghavendra.ramachandra, kiran.raja, pankaj.wasnik, christoph.busch}@ntnu.no
Abstract—With advancements in technology, it is now possible
to create representations of human faces in a seamless manner
for fake media, leveraging the large-scale availability of videos.
These fake faces can be used to conduct personation attacks on
the targeted subjects. Availability of open source software and
a variety of commercial applications provides an opportunity
to generate fake videos of a particular target subject in a
number of ways. In this article, we evaluate the generalizability
of the fake face detection methods through a series of studies
to benchmark the detection accuracy. To this end, we have
collected a new database of more than 53,000 images, from 150
videos, originating from multiple sources of digitally generated
fakes including Computer Graphics Image (CGI) generation and
many tampering based approaches. In addition, we have included more than 3,200 images from the widely used Swap-Face application that is commonly available on smartphones. Extensive experiments are carried out using both texture-
based handcrafted detection methods and deep learning based
detection methods to assess their suitability. Through this set of evaluations, we attempt to answer whether current fake face detection methods generalize.
Index Terms—Fake Face, Presentation Attack Detection,
Dataset, Generalization, Transfer Learning
I. INTRODUCTION
Face biometrics are widely deployed in various applications as they ensure reliable and convenient verification of a data subject. The dominant application of face recognition is logical or physical access control, for instance to restricted security areas. Implicitly, the human visual system also applies face recognition to determine which data subject is the communication partner, be it in a face-to-face conversation or when consuming messages from a media stream (e.g., a news channel). With recent advances in deep learning, it is now
possible to seamlessly generate manipulated images/videos in
real-time using technologies like image morphing, Snap-Chat,
Computer Generated Face Image (CGFI), Generative Adver-
sarial Networks (GAN) and Face2Face [1]. These technologies
enable an attacker to manipulate the face image either by
swapping it with another face or by pixel-wise manipulation
to generate a new face image/video. It is well demonstrated in
the literature that face recognition techniques fail drastically in
detecting generated fake faces [2]. Furthermore, fake face samples can be shared intentionally on social media in order to spread fake news associated with the target subject. The
challenge is not only posed to the biometric systems but also
Fig. 1: Examples of different fake faces in contrast to the bona fide presentation: (a) bona fide, (b) face retargeting1, (c) DeepFake2, (d) CGI3.
to the general perception of content on social media. Thus, it is of paramount importance to detect fake face representations, both to reduce the vulnerability of biometric systems and to reduce the impact of manipulated social media content.
Traditional biometric systems have addressed this problem
of detecting the fake faces using Presentation Attack Detection
(PAD) schemes [3], [4]. Earlier PAD works have investigated and provided remedial measures for both attacks with low-cost artefacts (e.g., print, display, and wrap) and attacks with high-cost artefacts (such as silicone masks). Another kind of attack, based on face morphing, takes face images of two different data subjects to generate a new morphed face
1Pinscreen: http://www.pinscreen.com/
2https://www.fakeapp.org/
3“We the people”: http://www.macinnesscott.com/vr-art-x
image which can practically match both subjects [2]. Yet another, recently introduced method of generating a fake face image/video was presented in [1], which can be used to mount
a personation attack on the target subject. The personation
attack can be constructed by the re-enactment process, trans-
ferring the facial expressions from the source actor to a
target actor, resulting in manipulated images/videos. A facial sample generated through such procedures is referred to as a fake face [5], [6]. The generated content is of high sample quality and is difficult to detect even for trained forensic examiners [6]. There are recent additions for generating fake face images, including the use of GANs, CGI, Face2Face, and other highly realistic approaches. Reliable detection of such fake face images is challenging because the re-enactment process introduces only minute variations in the face images, which defeat conventional forensic methods that rely on edge discontinuities and texture information to spot manipulated images.
To the best of our knowledge, there exists only one work that has attempted to detect fake faces, and it considers only one type of fake, generated by the Face2Face application [6]. In that work, pre-trained deep Convolutional Neural
Network (CNN) based approaches are evaluated on the newly
constructed fake face image database. The results reported in
[6] show good detection performance of the pre-trained Xcep-
tion CNN that can be attributed to the fact that both fake face
generation and detection are carried out on the training and
testing subset of one particular dataset (FaceForensics). While
this is an important first step, we need to anticipate that with
the evolution of computer vision technologies, fake faces can
also be generated using alternative and newer methods. Thus,
it is necessary to provide insight into the generalization of the methods that are used to detect fake faces, in order to measure their reliability.
In this work, we present a comprehensive and exploratory
study on the generalizability of different fake face detection
methods based on both recent deep learning methods and
conventional texture descriptor based methods. To this end, we present a new database created
using diverse methodologies for generating fake faces. Further,
we also propose the protocols to effectively evaluate the
generalizability of both texture based and deep learning based
methods. The main contributions of this paper in fake face
detection are:
A new database, which we hereafter refer to as the Fake Face
in the Wild (FFW) database with more than 53,000
images (from 150 videos) assembled from public sources
(YouTube) is introduced. This database shows the largest
diversity of different fake face generation methods pro-
vided so far.
In view of the limited public databases available for this
key research area, the newly created database will be
made available for the public along with the publication
of this paper.
Comprehensive evaluation of 6 different algorithms that
include various kinds of deep learning methods such as
Category     Type              # of videos
CGI          Full              50
             Head              22
Tampering    Face (FakeApp)    50
             Face (Other)      28
Total                          150

TABLE I: Fake Face in the Wild Dataset (FFW) broad statistics. CGI faces were generated using several different graphics engines. Face (FakeApp) fakes were generated in multiple resolutions and with different settings. The Face (Other) category includes face replacement, part-of-face splicing, and partial CGI faces, some of which were created manually and others automatically (see Figure 3 for examples).
AlexNet [7], VGG19 [8], ResNet50 [9], Xception [10],
GoogLeNet/Inceptionv3 [11], and texture based methods
based on Local Binary Patterns (LBP) with Support
Vector Machine (SVM).
Extensive experiments providing insights on the generalization of the algorithms to unseen fake faces are presented. Specifically, fake faces generated using three different classes of methods, namely CGI, FakeApp, and face swapping, are considered.
II. FAKE FACE IN THE WILD DATASET (FFW)
This section presents the details of the newly constructed
database. To simulate the performance of fake face detection
methods in the wild, a set of videos from a public video
sharing website (YouTube) is collected, with a special focus on digitally created content generated with recently developed technologies. These videos
include a wide array of fake images generated through CGI,
GANs, manual and automatic image tampering techniques,
and their combinations, due to the widespread use of these
methodologies. CGI is considered in this work due to the
wide availability and the ease of creation of high-quality
fake face images that include images of variable sizes. The
key motivation in creating this database is the absence of public databases for either devising detection methods or studying generalizability. This work, therefore, facilitates further research by making the dataset publicly available along with the paper4.
Table I shows a summary of the videos in the FFW dataset.
The dataset is created using videos of variable duration ranging
from 2 seconds (corresponding to 60 frames) up to 74 seconds (corresponding to more than 2,000 frames). The videos are carefully selected to have a resolution of at least 480p and are manually checked to assure quality, excluding images with visible artefacts, extreme face poses, degraded facial illumination, or low resolution. The constructed dataset
consists of 150 videos, of which 85 videos broadly pertain to
face images manipulated via image tampering (e.g., splicing,
replacing, etc.) and 65 correspond to the use of CGI. The
database thus consists of 53,000 images. In order to have bona
4Download information available at http://ali.khodabakhsh.org/ffw/
Fig. 2: Distribution of BRISQUE quality scores for the Fake Faces in the Wild (FFW) dataset (x-axis: average BRISQUE score; curves: bona fide vs. fake faces).
fide samples for the evaluation, we have employed the publicly available FaceForensics database [6], resulting in a total of
78,500 bona fide samples from 150 videos.
To evaluate the performance on the newly created database,
the quality is taken into consideration by processing the database through the same compression algorithm, such that the quality of both fake and bona fide samples is consistent. This further avoids misleading detection error rates that could, for instance, be attributed to compression artefacts biasing the detection methods. Figure 2 shows the distribution of the average BRISQUE quality assessment [12] measured for the FFW database, indicating a high overlap between the distributions and confirming similar quality. A sample set of images from the FFW
dataset can also be seen in Figure 3.
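As an illustration of this preparation step, the following Python sketch extracts frames from a collected video, re-encodes every frame with a common JPEG quality factor, and computes a per-frame BRISQUE score. The toolchain (OpenCV with the contrib "quality" module), the file paths, and the quality factor are illustrative assumptions on our part; the paper does not specify the exact implementation used.

# Hypothetical preprocessing sketch: extract frames from a collected video,
# re-encode every frame with the same JPEG quality factor so that bona fide
# and fake samples share comparable compression artefacts, and record a
# BRISQUE score per frame. Paths, the quality factor, and the use of the
# OpenCV contrib "quality" module are assumptions, not the authors' toolchain.
import cv2

def extract_and_recompress(video_path, out_dir, jpeg_quality=90):
    """Decode a video and save its frames as uniformly compressed JPEGs."""
    cap = cv2.VideoCapture(video_path)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        out_path = f"{out_dir}/frame_{idx:06d}.jpg"
        cv2.imwrite(out_path, frame, [cv2.IMWRITE_JPEG_QUALITY, jpeg_quality])
        idx += 1
    cap.release()
    return idx

def brisque_score(image_path, model="brisque_model_live.yml",
                  rng="brisque_range_live.yml"):
    """No-reference BRISQUE quality score (requires opencv-contrib-python)."""
    img = cv2.imread(image_path)
    # QualityBRISQUE_compute returns a per-channel tuple; take the first value.
    return cv2.quality.QualityBRISQUE_compute(img, model, rng)[0]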
III. FAKE FACE DETECTION TECHNIQUES
With the goal of detecting a wide range of
forged/CG/tampered audiovisual content, many methods origi-
nating from image forensics and biometrics presentation attack
detection can be adapted. In this perspective, the widely used texture-based method of Local Binary Patterns (LBP) and a set of CNN-based systems are considered. The selection of the CNN
architectures AlexNet [7], VGG19 [8], ResNet50 [9], Xception
[10], and GoogLeNet/Inceptionv3 [11] is based on the recent
works demonstrating very high performance for various tasks.
The parameters are optimized, when possible, on the training data, and the details of parameter tuning are presented in Section IV-B.
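As a concrete illustration of how the listed pre-trained CNNs can be repurposed for binary bona fide/fake classification (the exact fine-tuning settings are given in Section IV-B), a minimal sketch is shown below. The use of PyTorch, ResNet-50 as the example backbone, and the base learning rate are assumptions on our part; only the last-layer learning-rate factors of 10 (weights) and 20 (bias) follow the description in Section IV-B.

# Minimal transfer-learning sketch (assumption: PyTorch/torchvision; the paper
# does not state the framework). A pre-trained ResNet-50 is turned into a
# two-class bona fide / fake classifier, and the final layer's learning rate is
# boosted relative to the backbone, mirroring the weight/bias factors of 10/20
# quoted in Section IV-B.
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, 2)  # bona fide vs. fake

base_lr = 1e-4  # illustrative value, not taken from the paper
optimizer = torch.optim.SGD(
    [
        # backbone parameters keep the small base learning rate
        {"params": [p for n, p in model.named_parameters()
                    if not n.startswith("fc.")], "lr": base_lr},
        # boosted learning rates for the replaced classification layer
        {"params": [model.fc.weight], "lr": base_lr * 10},
        {"params": [model.fc.bias], "lr": base_lr * 20},
    ],
    momentum=0.9,
)
criterion = nn.CrossEntropyLoss()

def train_step(images, labels):
    """One fine-tuning step on a mini-batch of face crops."""
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()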
IV. EXPERIMENTAL EVALUATION
This section presents the experimental evaluation of the
FFW dataset. The experimental protocols are designed in accordance with those advised in [6]. We present the evaluation
of detecting known attacks followed by detecting unknown
attacks.
A. Evaluation Metrics
We present the detection error rates in terms of the Equal Error Rate (EER) to report performance in line with earlier work. We further supplement the results with the Attack Presentation Classification Error Rate (APCER) and the Bona fide Presentation Classification Error Rate (BPCER), as defined in ISO/IEC 30107-3 [13].
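For reference, a minimal sketch of these metrics is given below, assuming that higher comparison scores indicate an attack presentation; the score polarity and the use of NumPy are our assumptions, not part of the standard or the paper.

# Sketch of the reported metrics, assuming higher scores indicate "fake/attack".
# Per ISO/IEC 30107-3:
#   APCER = fraction of attack presentations classified as bona fide,
#   BPCER = fraction of bona fide presentations classified as attacks.
import numpy as np

def apcer_bpcer(attack_scores, bonafide_scores, threshold):
    apcer = np.mean(np.asarray(attack_scores) < threshold)     # attacks missed
    bpcer = np.mean(np.asarray(bonafide_scores) >= threshold)  # bona fide rejected
    return apcer, bpcer

def eer(attack_scores, bonafide_scores):
    """Equal Error Rate: threshold where APCER and BPCER (approximately) meet."""
    thresholds = np.sort(np.concatenate([attack_scores, bonafide_scores]))
    best = min(thresholds,
               key=lambda t: abs(np.subtract(*apcer_bpcer(attack_scores,
                                                          bonafide_scores, t))))
    return np.mean(apcer_bpcer(attack_scores, bonafide_scores, best)), best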
B. Experimental Protocol
To effectively evaluate the fake face detection methods, we divide the whole database into three disjoint partitions: a training set, a development set, and a testing set.
The training set is adopted from the FaceForensics database
[6] that has 7,040 bona fide and 7,040 fake face samples.
The training set is used to fine tune the pre-trained deep
CNN networks. To effectively fine-tune the networks and avoid
overfitting, we employ 5 different types of data augmentation on each of the training images, including translation and reflection. The learning rates of the last layer are boosted such that the weights of the earlier layers are not affected and
the weights of the last layer are adapted for the new training
data. Thus, we have used the weight learning rate factor as
10 and bias learning rate factor as 20. For the texture based
Local Binary Patterns (LBP) [14], the histogram is extracted
using (8,1) neighborhoods with a block size of 40 pixels. The
training dataset is used to train the SVM classifier.
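A minimal sketch of this texture baseline is given below, assuming scikit-image for the LBP computation and scikit-learn for the SVM. The uniform LBP variant and the RBF kernel are assumptions on our part; the paper only specifies the (8,1) neighbourhood, the 40-pixel block size, and the use of an SVM.

# Texture baseline sketch (assumption: scikit-image + scikit-learn).
import numpy as np
from skimage.feature import local_binary_pattern
from sklearn.svm import SVC

def lbp_block_histogram(gray, block=40, P=8, R=1):
    """Concatenate per-block LBP histograms into one feature vector."""
    codes = local_binary_pattern(gray, P, R, method="uniform")
    n_bins = P + 2  # number of uniform LBP codes
    feats = []
    h, w = codes.shape
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            patch = codes[y:y + block, x:x + block]
            hist, _ = np.histogram(patch, bins=n_bins, range=(0, n_bins),
                                   density=True)
            feats.append(hist)
    return np.concatenate(feats)

# Training (placeholder names): X_train rows are per-image feature vectors,
# y_train is 0 for bona fide and 1 for fake.
# clf = SVC(kernel="rbf", probability=True).fit(X_train, y_train)
# scores = clf.predict_proba(X_test)[:, 1]   # used for APCER/BPCER/EER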
The development dataset comprises 1,500 bona fide and 1,500 fake face samples that are taken from the validation set of the FaceForensics database [6]. This dataset is used to fix the operating threshold at the Equal Error Rate (EER). The
testing dataset consists of three specific kinds: (1) To evaluate known artefacts - TestSet-I - this corresponds to the test set of the FaceForensics database [6] and comprises 1,500 bona fide and 1,500 fake face samples. This dataset is used to understand the detection performance on known attacks. (2) To evaluate unknown artefacts - TestSet-II - this test set consists of the newly constructed FFW dataset. In order to be in line with the known attacks, this set comprises 1,500 bona fide and 1,500 fake face samples. (3) To evaluate unknown artefacts - TestSet-III - this test set comprises 1,776 bona fide samples and 1,576 fake faces generated using the FaceSwap and SwapMe applications proposed in [15].
While TestSet-I focuses on measuring the performance of
the detection algorithms, TestSet-II and TestSet-III are used to
measure the generalizability of the detection techniques. It has to be noted that none of these sets (TestSet-II and TestSet-III) are used in the training, fine-tuning, or validation process.
V. RESULTS AND DISCUSSION
The detailed results and the obtained performance are
provided in this section.
A. Performance on the Known Fake Face Attacks (TestSet-I)
The performance of the texture- and CNN-based methods on known attacks (TestSet-I) is summarized in Tables II and III. The following are the main observations:
Fig. 3: Examples from Fake Faces in the Wild (FFW) dataset. Top row: CGI full scene. Middle row: Deepfakes. Bottom row
from left to right: Head CGI x2, Face replacement x2, Face CGI x2, Part of face splicing x2.
                          Accuracy ± CI
Texture-based  LBP        96.33% ± 0.69%
CNN-based      AlexNet    95.83% ± 0.73%
               VGG19      98.30% ± 0.47%
               ResNet     98.43% ± 0.45%
               Xception   98.70% ± 0.41%
               Inception  99.60% ± 0.23%

TABLE II: The accuracy of texture- and CNN-based classifiers on the TestSet-I dataset along with their confidence intervals (CI).
            APCER            BPCER            EER
LBP         3.80% ± 0.99%    2.87% ± 0.86%    3.33%
AlexNet     7.80% ± 1.38%    1.73% ± 0.67%    3.73%
VGG19       2.47% ± 0.80%    0.47% ± 0.35%    1.40%
ResNet      2.27% ± 0.77%    0.47% ± 0.35%    1.40%
Xception    2.47% ± 0.80%    0.13% ± 0.19%    1.07%
Inception   0.67% ± 0.42%    0.47% ± 0.35%    0.53%

TABLE III: Performance of the systems on known fake faces from TestSet-I. The threshold is computed on the development database.
CNN-based methods perform well and, except for AlexNet, provide a detection accuracy of over 98%. In contrast, LBP features classified with an SVM achieve an accuracy of 96% on the test data.
In the benchmark of the CNN networks, the Inception
network gives the best performance by a large margin.
The low error rates, together with the low EER, confirm the stability of the selected decision threshold. However, a deviation from the selected operating point towards lower BPCER and higher APCER is visible in the results, suggesting a slight inaccuracy in the EER threshold estimation.
            APCER             BPCER            EER
LBP         89.00% ± 1.62%    2.87% ± 0.86%    48.73%
AlexNet     91.47% ± 1.44%    1.73% ± 0.67%    32.13%
VGG19       90.73% ± 1.50%    0.47% ± 0.35%    29.40%
ResNet      89.53% ± 1.58%    0.47% ± 0.35%    30.33%
Xception    93.20% ± 1.30%    0.13% ± 0.19%    26.87%
Inception   91.93% ± 1.41%    0.47% ± 0.35%    27.47%

TABLE IV: Performance of the systems on unknown attacks from TestSet-II. The threshold is computed on the development database.
B. Performance on the Unknown Fake Face Presentations
(TestSet-II)
Following the good performance of all neural network solutions along with the LBP features, the generalizability of the learned classifiers is examined on the collected dataset of matching size, as shown in Table IV. The observations are:
The performance of all systems in terms of APCER drops significantly, rendering the systems ineffective, as most fake images are classified as bona fide.
A closer look at the EER values shows that the CNN-based models still perform much better than random on the unknown dataset.
It can therefore be concluded that the poor performance of the CNN-based systems is caused by the low performance at the selected operating point rather than a complete lack of discriminative power.
To illustrate this further, the score histograms of the known and unknown attacks are presented in Figures 4 and 5 for the LBP-SVM and Inception networks, respectively. The dotted vertical line indicates the threshold computed on the development database that corresponds to the EER. Figure 4 shows the inability of the LBP-SVM system to distinguish unknown attacks, visible as a significant overlap between the bona fide distribution and the distribution of scores from the unknown attacks. However, a
Fig. 4: LBP-SVM system comparison score distribution on TestSets I and II (x-axis: comparison score; curves: bona fide, Test set I, Test set II; the dotted line marks the EER threshold).
Fig. 5: Inceptionv3 system comparison score distribution on TestSets I and II (x-axis: comparison score, logit scale; curves: bona fide, Test set I, Test set II; the dotted line marks the EER threshold).
close look at Figure 5 shows that even though the network is capable of discriminating between unknown attacks and bona fide samples to some extent, the weak placement of the decision boundary causes the network to fail. By setting the threshold of the system to the EER point on the known attacks, the system shows optimal performance for the known attacks, but it also becomes vulnerable to new types of attacks, where the separability may be lower.
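This effect can be quantified by comparing the error rates at the fixed development-set threshold with the best error rates achievable on the unknown scores themselves; a hypothetical sketch reusing the apcer_bpcer and eer helpers outlined in Section IV-A is shown below (all variable names are placeholders, not quantities reported in the paper).

# Illustration of the operating-point problem. dev_threshold comes from the
# development set; unknown_attack_scores and bonafide_scores are TestSet-II
# comparison scores (placeholder names).
apcer_dev, bpcer_dev = apcer_bpcer(unknown_attack_scores,
                                   bonafide_scores, dev_threshold)
eer_unknown, t_unknown = eer(unknown_attack_scores, bonafide_scores)
print(f"At the dev-set threshold: APCER={apcer_dev:.2%}, BPCER={bpcer_dev:.2%}")
print(f"Best achievable on unknown attacks: EER={eer_unknown:.2%} "
      f"(threshold {t_unknown:.3f})")
# A large gap between the APCER at the fixed threshold and the re-estimated EER
# indicates that the classifier separates the classes to some extent, but the
# decision boundary chosen on known attacks is poorly placed for unknown ones.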
1) Performance on each Sub-Type of Attacks: To take a closer look at the generalization capability of the CNNs, the EER for each fake type is calculated separately and reported in Table V.
From these results, it is visible that the networks perform better at detecting CGI than at detecting content generated by FakeApp or other techniques.
These results indicate that even though the networks
were not trained to detect CGI specifically, they are still
            Full Image    Manipulation
            CGI           FakeApp    Other
AlexNet     32.60%        28.80%     34.37%
VGG19       28.00%        31.20%     28.60%
ResNet      28.80%        28.37%     34.40%
Xception    23.60%        25.20%     31.20%
Inception   23.40%        27.40%     31.40%

TABLE V: CNN performances in terms of EER on subcategories, corresponding to Table I.
            APCER             BPCER             EER
LBP         90.16% ± 1.50%    3.43% ± 0.86%     46.06%
AlexNet     94.04% ± 1.19%    5.01% ± 1.04%     43.02%
VGG19       97.27% ± 0.82%    2.31% ± 0.71%     44.93%
ResNet      89.40% ± 1.55%    8.22% ± 1.30%     43.79%
Xception    93.15% ± 1.27%    3.43% ± 0.86%     40.99%
Inception   71.64% ± 2.27%    22.58% ± 1.98%    46.39%

TABLE VI: Performance of the systems on the FaceSwap/SwapMe dataset from TestSet-III. The threshold is computed on the development database.
somewhat effective at detecting CGI videos.
C. Performance on the FaceSwap/SwapMe Dataset (TestSet-
III)
To investigate the transferability of the networks' generalization ability to unknown data of a widely different type, experiments were conducted on a filtered subset of the FaceSwap/SwapMe dataset, as shown in Table VI.
The APCER and EER scores present a further drop in
performance.
These results indicate the lack of transferability of the
learned classifiers to the general face forgery classifica-
tion cases.
VI. CONCLUSION AND FUTURE WORK
The advancement of image manipulation and image generation techniques has now provided the ability to create seamless and convincing fake face images. The challenging nature of such data, both for visual perception and for algorithmic detection, has been demonstrated in recent works. The key problem that had not been considered until now is the evaluation of the generalizability of existing fake face detection techniques. In order to answer
the question of generalizability, in this work, we have created
a new database which we refer to as Fake Face in the Wild
(FFW) dataset containing 53,000 images from 150 videos
that are publicly available. The key observation from this work highlights the deficiencies of detection algorithms when unknown data is presented. This observation holds for both texture descriptors and deep-learning methods, which cannot yet meet the challenge of detecting fake faces. This analysis
further emphasizes the importance of validation of detectors
across multiple datasets. Proposed detectors that lack such
validation can show misleadingly high performances while
having limited applicability, and provide little contribution to
the ongoing research. As such, advancements in fake face
detection technology call for the incorporation of proper cross-
dataset validation in all future research as a requirement for
publication.
Future work in the direction of fake face detection will involve the development of systematic methods for addressing the generalization problem and the employment of multi-modal cues from fake face data.
REFERENCES
[1] J. Thies, M. Zollhöfer, M. Stamminger, C. Theobalt, and M. Nießner,
“Face2face: Real-time face capture and reenactment of rgb videos,” in
Computer Vision and Pattern Recognition (CVPR), 2016 IEEE Confer-
ence on. IEEE, 2016, pp. 2387–2395.
[2] R. Raghavendra, K. B. Raja, and C. Busch, “Detecting morphed face
images,” in 2016 IEEE 8th International Conference on Biometrics
Theory, Applications and Systems (BTAS), Sept 2016, pp. 1–7.
[3] S. Bhattacharjee and S. Marcel, “What you can’t see can help you -
extended-range imaging for 3d-mask presentation attack detection,” in
2017 International Conference of the Biometrics Special Interest Group
(BIOSIG), Sept 2017, pp. 1–7.
[4] R. Ramachandra and C. Busch, “Presentation attack detection methods
for face recognition systems: A comprehensive survey,” ACM Comput.
Surv., vol. 50, no. 1, pp. 8:1–8:37, Mar. 2017. [Online]. Available:
http://doi.acm.org/10.1145/3038924
[5] A. Khodabakhsh, R. Ramachandra, and C. Busch, “A taxonomy of
audiovisual fake multimedia content creation technology,” in Proceed-
ings of the 1st IEEE International Workshop on Fake MultiMedia
(FakeMM’18), 2018, pp. –.
[6] A. Rössler, D. Cozzolino, L. Verdoliva, C. Riess, J. Thies, and
M. Nießner, “Faceforensics: A large-scale video dataset for forgery
detection in human faces,” arXiv preprint arXiv:1803.09179, 2018.
[7] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification
with deep convolutional neural networks,” in Advances in Neural In-
formation Processing Systems 25. Curran Associates, Inc., 2012, pp.
1097–1105.
[8] K. Simonyan and A. Zisserman, “Very deep convolutional networks
for large-scale image recognition,” CoRR, vol. abs/1409.1556, 2014.
[Online]. Available: http://arxiv.org/abs/1409.1556
[9] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image
recognition,” CoRR, vol. abs/1512.03385, 2015. [Online]. Available:
http://arxiv.org/abs/1512.03385
[10] F. Chollet, “Xception: Deep learning with depthwise separable
convolutions,” CoRR, vol. abs/1610.02357, 2016. [Online]. Available:
http://arxiv.org/abs/1610.02357
[11] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and
Z. Wojna, “Rethinking the inception architecture for computer
vision,” CoRR, vol. abs/1512.00567, 2015. [Online]. Available:
http://arxiv.org/abs/1512.00567
[12] A. Mittal, A. K. Moorthy, and A. C. Bovik, “No-reference image
quality assessment in the spatial domain,” IEEE Transactions on Image
Processing, vol. 21, no. 12, pp. 4695–4708, Dec 2012.
[13] ISO/IEC 30107-3:2017, “Information technology - Biometric presen-
tation attack detection - Part 3: Testing and reporting,” International
Organization for Standardization, Standard, Sep. 2017.
[14] T. Ojala, M. Pietikäinen, and T. Mäenpää, “Multiresolution
gray-scale and rotation invariant texture classification with local
binary patterns,” IEEE Trans. Pattern Anal. Mach. Intell.,
vol. 24, no. 7, pp. 971–987, Jul. 2002. [Online]. Available:
http://dx.doi.org/10.1109/TPAMI.2002.1017623
[15] P. Zhou, X. Han, V. I. Morariu, and L. S. Davis, “Two-stream neural
networks for tampered face detection,” in 2017 IEEE Conference on
Computer Vision and Pattern Recognition Workshops (CVPRW). IEEE,
2017, pp. 1831–1839.