Figure - uploaded by Yiting Wang
Source publication
It has become a research hotspot to detect whether a video is natural or DeepFake. However, almost all existing works focus on detecting inconsistency in either the spatial or the temporal domain alone. In this paper, a dual-branch (spatial branch and temporal branch) neural network is proposed to detect inconsistency in both the spatial and temporal domains for DeepFa...
Contexts in source publication
Context 1
... detecting the input frames with MTCNN, we crop all the regions where faces are detected and resize them to 224 × 224 pixels. Figure 1 shows an example face cropped from a frame. ...
Context 2
... In order to test the performance of the proposed temporal branch, it is also compared with three existing temporal models, i.e., C3D [22], CNN-LSTM [26], and Sharp Multiple Instance Learning (S-MIL) [21]. The input of each compared model is a sequence of consecutive stacked optical-flow frames. The comparative results are presented in Fig. 10. They show that our model is much better than C3D and S-MIL, owing to the use of EfficientNet and Bi-LSTM. The average accuracy of our model reaches 92% on the two datasets; even on the most difficult NT subdataset, the accuracy is still 85%. The results also demonstrate that our model outperforms the CNN-LSTM based on ...
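The two preprocessing steps above (cropping detected faces to 224 × 224 in Context 1, and stacking consecutive optical-flow frames as the temporal-branch input in Context 2) can be sketched as follows. The nearest-neighbour resize, the function names, and the placeholder flow fields are illustrative assumptions, not the paper's actual pipeline code:

```python
import numpy as np

def crop_and_resize(frame, box, size=224):
    """Crop a detected face box (x1, y1, x2, y2) from a frame and
    resize it to size x size with nearest-neighbour sampling."""
    h, w = frame.shape[:2]
    x1, y1, x2, y2 = box
    # Clamp the box to the frame boundaries before cropping.
    x1, y1 = max(0, x1), max(0, y1)
    x2, y2 = min(w, x2), min(h, y2)
    face = frame[y1:y2, x1:x2]
    fh, fw = face.shape[:2]
    # Nearest-neighbour index maps for the resize.
    rows = np.arange(size) * fh // size
    cols = np.arange(size) * fw // size
    return face[rows][:, cols]

def stack_flows(flows):
    """Stack T consecutive optical-flow fields (each H x W x 2)
    into a single (T, 2, H, W) tensor for a temporal model."""
    return np.stack([f.transpose(2, 0, 1) for f in flows])

frame = np.zeros((480, 640, 3), dtype=np.uint8)
face = crop_and_resize(frame, (100, 50, 300, 250))  # -> (224, 224, 3)
clip = stack_flows([np.zeros((224, 224, 2))] * 8)   # -> (8, 2, 224, 224)
```

In practice the boxes would come from an MTCNN-style detector and the flow fields from an optical-flow estimator; both are stubbed here to keep the sketch self-contained.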
Citations
... The rapid development and spread of DeepFake technology has issued an urgent call for the research community to present Face Image Manipulation Detection (FIMD) techniques that can check the authenticity of face images and detect manipulations if they exist. In the last few years, the research community has dedicated considerable effort to developing FIMD techniques that can serve media forensics and security systems in general [5][6][7][8][9][10][11][12][13][14]. ...
DeepFakes and face image manipulation methods have become widely distributed in the last few years, and several techniques have been presented to check the authenticity of a face image and detect manipulation if it exists. Most of the available manipulation detection techniques have been successfully applied to reveal one type of manipulation under specific conditions; however, many limitations and challenges remain in this field. To overcome some of them, this paper presents a new face image authentication (FIA) scheme based on Multi-Task Cascaded Convolutional Neural Networks (MTCNN) and watermarking in the Slantlet transform (SLT) domain. The proposed FIA scheme has three main algorithms: face detection and selection, embedding, and extraction. Different block sizes are used to divide the image into non-overlapping blocks, which are then classified into two groups: blocks from the face area (FA) and blocks from the remaining area (RA) of the image. In the embedding algorithm, the authentication information is generated from the FA blocks and embedded in the RA blocks. In the extraction algorithm, the embedded information is extracted from the RA blocks and compared with the data calculated from the FA blocks to reveal manipulations and localize the manipulated blocks if they exist. Extensive experiments have been conducted to evaluate the performance of the proposed FIA scheme on different face images, including tests of payload, capacity, visual quality, time complexity, and localization of manipulations. The results proved the efficiency of the proposed scheme in detecting and localizing different face image manipulations such as attribute attacks, retouching attacks, expression swap, face swap, and morphing attacks. The proposed scheme overcomes many limitations and is 100% accurate in localizing tampered blocks, which makes it a better candidate for practical applications.
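The FA/RA block-partitioning step described above can be sketched as follows. The function name, the 16-pixel block size, and the intersect-the-face-box criterion are illustrative assumptions, not the paper's actual classification rule:

```python
import numpy as np

def classify_blocks(img_h, img_w, face_box, block=16):
    """Partition an image into non-overlapping block x block tiles and
    label each tile FA (overlaps the face box) or RA (remaining area).
    face_box is (x1, y1, x2, y2); returns two lists of (row, col) tiles."""
    x1, y1, x2, y2 = face_box
    fa, ra = [], []
    for r in range(img_h // block):
        for c in range(img_w // block):
            bx1, by1 = c * block, r * block
            bx2, by2 = bx1 + block, by1 + block
            # A tile counts as FA if it intersects the face rectangle.
            overlaps = bx1 < x2 and bx2 > x1 and by1 < y2 and by2 > y1
            (fa if overlaps else ra).append((r, c))
    return fa, ra

# 64 x 64 image, 4 x 4 grid of 16-pixel tiles, face in the centre.
fa, ra = classify_blocks(64, 64, face_box=(16, 16, 48, 48), block=16)
```

Authentication data generated from the `fa` tiles would then be embedded into the `ra` tiles, so any tampering of the face region breaks the match at extraction time.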
... Moreover, it has also been found that real faces are coherent across different local regions, while forged faces are blended from different face sources and thus produce inconsistent information at certain locations. Therefore, the concept of consistency learning [19] was introduced into forgery detection [20][21][22]; these methods usually measure the local similarity between individual patches of an image to capture the inconsistency between tampered and authentic regions. However, they tend to overlook the importance of global features, which encompass valuable discriminative information such as the colors of artifacts in different facial regions and the contextual links between individual artifacts. ...
... Forgery detection based on consistency learning. Recent works [19][20][21][22] show that manipulation methods typically disrupt the correlation between local regions of a face, and they attempt to use consistency learning to capture the local artifacts. Zhao et al. [20] extracted middle-layer features of a ResNet and constructed patch-similarity features from them, which were used to help locate the forged regions and guide the model to detect local inconsistencies in forged faces. ...
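The patch-level similarity measurement behind these consistency-learning methods can be sketched as a cosine-similarity matrix over patch descriptors. The random features here stand in for the mid-layer CNN features the cited works actually use (an assumption made to keep the sketch self-contained):

```python
import numpy as np

def patch_similarity(features, eps=1e-8):
    """Cosine similarity between every pair of patch feature vectors.
    features: (N, D) array, one D-dimensional descriptor per patch.
    Returns an (N, N) matrix; off-diagonal dips flag patches that are
    inconsistent (potentially tampered) relative to the rest of the face."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, eps)  # L2-normalised rows
    return unit @ unit.T

rng = np.random.default_rng(0)
feats = rng.normal(size=(49, 128))  # e.g. a 7 x 7 grid of patches
sim = patch_similarity(feats)       # (49, 49), diagonal = 1.0
```

A genuine face yields a fairly uniform matrix, whereas a blended face produces blocks of low similarity around the tampered region, which is what the cited methods exploit for localization.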
... Chen et al. [21] constructed a similarity matrix between the frequency-domain stream and the RGB stream to capture local inconsistencies of forged faces in both the spatial and frequency domains. Kuang et al. [22] proposed a dual-branch (spatial branch and temporal branch) neural network to detect inconsistency in both the spatial and temporal domains for DeepFake video detection; the spatial branch detects spatial inconsistency with the effective EfficientNet model. ...
The proliferation of fake images generated by deepfake techniques has significantly threatened the trustworthiness of digital information, leading to a pressing need for face forgery detection. However, due to the similarity between human face images and the subtlety of artifact information, most deep face forgery detection methods face certain challenges, such as incomplete extraction of artifact information, limited performance in detecting low-quality forgeries, and insufficient generalization across different datasets. To address these issues, this paper proposes a novel noise-aware multi-scale deepfake detection model. First, a progressive spatial attention module is introduced, which learns two types of spatial feature weights: a boosting weight and a suppression weight. The boosting weight highlights salient regions, while the suppression weight enables the model to capture more subtle artifact information. Through multiple boosting-suppression stages, the proposed model progressively focuses on different facial regions and extracts multi-scale RGB features. Additionally, a noise-aware two-stream network is introduced, which leverages frequency-domain features and fuses image noise with the multi-scale RGB features. This integration enhances the model's ability to handle image post-processing. Furthermore, the model learns global features from the multi-modal features through multiple convolutional layers, which are combined with local similarity features for deepfake detection, thereby improving the model's robustness. Experimental results on several benchmark databases demonstrate the superiority of the proposed method over state-of-the-art techniques. Our contributions lie in the progressive spatial attention module, which effectively addresses overfitting in CNNs, and in the integration of noise-aware features with multi-scale RGB features. These innovations lead to enhanced accuracy and generalization performance in face forgery detection.
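The boosting/suppression idea in the progressive spatial attention module described above can be sketched as complementary spatial masks applied over successive stages. The exact weighting scheme below (simply inverting the attention map between stages) is an illustrative assumption, not the paper's formulation:

```python
import numpy as np

def boost_suppress(feature_map, attention, stages=3):
    """Apply alternating boosting and suppression weights.
    feature_map: (H, W, C); attention: (H, W) with values in [0, 1].
    The boosting weight re-weights salient regions; the suppression
    weight (1 - attention) forces later stages to attend to subtler
    regions. Returns one re-weighted map per stage (multi-scale cues)."""
    outputs = [feature_map * attention[..., None]]  # stage 1: boost
    for _ in range(stages - 1):
        # Suppress previously attended regions, then re-weight.
        attention = 1.0 - attention
        outputs.append(feature_map * attention[..., None])
    return outputs

fmap = np.ones((14, 14, 64))          # dummy CNN feature map
attn = np.full((14, 14), 0.8)         # dummy attention map
multi_scale = boost_suppress(fmap, attn, stages=3)  # list of 3 maps
```

The point of the alternation is that regions down-weighted by suppression in one stage are exactly the ones a later stage is pushed to examine, which is how subtle artifacts escape a single saliency pass.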
... Deepfake videos can be detected by deploying a dual-branch neural network that exploits spatial and temporal inconsistencies [56], or by implementing a lightweight deep ensemble model that uses visual inputs [62]. Another author attempted to generalize existing deepfake detection techniques into a simplified one-way method [16]. ...
Artificial images and recordings are widespread on the web via different media channels such as blogs and YouTube videos. These manipulated and synthesized images tend to steal the identity of individuals and contribute heavily to societal disruptions such as theft, political errors, social engineering, disinformation attacks, and reputation fraud. These fake visual objects have gradually come to be known as deepfakes. Different deep learning techniques are used to generate deepfake images that go unnoticed by human eyes. It is essential to develop a defense mechanism that can stop common people from being manipulated and exploited. The objective of this work is to develop an ensemble deep learning-based system that can differentiate between fake and real images. Using the recommended optical flow technique, a novel approach is proposed that extracts the apparent motion of image pixels, which gives more accurate results than other state-of-the-art methods. The FaceForensics++ dataset is used to test the extraction algorithms and the ensemble model, which achieved an accuracy of 86.02% on the DeepFake subset and 85.7% on the FaceSwap subset of the dataset. To the best of our knowledge, the ensemble model OptiFake has not previously been applied to optical-flow-derived frames, highlighting a research gap in the field of deepfake detection.
... Moreover, a web interface has been designed to upload the video for the subsequent deepfake prediction. (Kuang et al., 2022) explored a dual-branch approach to capture inconsistencies from the sequence of video frames to detect deepfake manipulation in videos. The method comprises spatial and temporal branches for learning the spatial and temporal information from the input video. ...
Recent advancements in deep learning generative models have raised concerns, as they can create highly convincing counterfeit images and videos. This poses a threat to people's integrity and can lead to social instability. To address this issue, there is a pressing need to develop new computational models that can efficiently detect forged content and alert users to potential image and video manipulations. This paper presents a comprehensive review of recent studies on deepfake content detection using deep learning-based approaches. We aim to broaden the state-of-the-art research by systematically reviewing the different categories of fake content detection. Furthermore, we report the advantages and drawbacks of the examined works and outline several future directions for the issues and shortcomings still unsolved in deepfake detection.
... Kuang et al. [91] explored a dual-branch approach to capture inconsistencies from the sequence of video frames to detect deepfake manipulation in videos. ...
... In Table 3, we show a collection of articles (Reference) and their average scores according to the datasets:

    (dataset truncated)    [25], [26], [27], [28], [29], [30], [31], [32], [33], [34], [35], [36], [37], [38]: 94.2    [39], [40], [41], [42], [43]: 93.55
    Celeb-DF               [25], [29], [33], [35], [36], [41], [45], [47], [52]: 85.79    [27], [43], [49], [50], [53], [54]: 82.23
 7  Celeb-DFv2             [34]: 99.31    [40], [32], [31]: 88.01
 8  DFDC preview           [24], [45], [55], [56], [57]: 91.61    [51]: 84.4
 9  DFDC                   [29], [35], [42], [58]: 83.27    [29], [31], [41], [42], [59]: 89.3
10  DeeperForensics-1.0    [30]: 62.46    --
11  ...
Deepfake video has uses in entertainment and multimedia technology; however, deepfakes pose significant dangers to the social, economic, and political sectors. Specifically, fake news and misleading information can be generated to sway public opinion, and national security may be put at risk by misrepresented statements attributed to political leaders. The creation of such manipulated videos is getting easier day by day, and at the same time it is necessary to detect and prevent them. To that end, researchers are creating challenging fake video databases for artificial intelligence (AI) based detection models to contribute to the research. This paper reviewed the existing deepfake video detection datasets available online and used in previous research articles. We analyzed the literature from two different perspectives: datasets and detection models. The goal of this study is to introduce all publicly available datasets in this field, including a discussion of the techniques used to generate the data. In addition, we present a comparison of results across different deepfake datasets and discuss the findings. This is an open access article under the CC BY-SA license.
... To address the detection problem of forged face images, researchers have proposed various methods for face forgery detection [17,21,23,29,30,37,41,54,56,57]. Although most of these methods achieve good performance when tested within a dataset, model performance degrades dramatically when facing images produced by unknown forgery methods. ...
Face forgery detection has been a widespread issue recently due to the adverse effects of face forgery techniques on social media. State-of-the-art deep learning based methods commonly employ low-level texture features for face forgery detection, since most face forgery methods have difficulty simulating low-level signals in natural images. However, most existing methods only exploit low-level features from either the spatial or the temporal perspective. In this work, we revisit the face forgery detection problem from a spatio-temporal perspective to cover both for better generalization performance. Specifically, we propose a Spatio-Temporal Difference Network (STDN) to mine low-level clues for face forgery detection. The network contains three different but complementary branches: 1) high-frequency channel difference images, 2) inter-frame residual signals, and 3) raw RGB images. It is able to capture face forgery traces through a three-branch collaborative learning framework. Furthermore, we propose a multimodal attention fusion module to effectively fuse the complementary features from the different branches. Through comprehensive experiments on several publicly available datasets, we demonstrate the superior performance of the proposed STDN. The effectiveness of low-level spatio-temporal clues in a collaborative learning framework could potentially guide future work in face forgery detection.
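Two of the STDN input branches described above (inter-frame residual signals and high-frequency difference images) can be approximated with a few lines of numpy. The 3 x 3 Laplacian high-pass kernel is an illustrative choice, not necessarily the filter used in the paper:

```python
import numpy as np

def interframe_residuals(clip):
    """Residual signal between consecutive frames of a clip.
    clip: (T, H, W) grayscale frames; returns (T-1, H, W) differences,
    which expose temporal jitter left behind by frame-wise synthesis."""
    return np.diff(clip.astype(np.float32), axis=0)

def highpass(img):
    """High-frequency component via a 3 x 3 Laplacian high-pass filter
    (valid convolution, so the output shrinks by 2 in each dimension)."""
    k = np.array([[0, -1, 0],
                  [-1, 4, -1],
                  [0, -1, 0]], dtype=np.float32)
    h, w = img.shape
    out = np.zeros((h - 2, w - 2), dtype=np.float32)
    for i in range(3):
        for j in range(3):
            out += k[i, j] * img[i:i + h - 2, j:j + w - 2]
    return out

clip = np.zeros((5, 64, 64), dtype=np.uint8)        # dummy clip
res = interframe_residuals(clip)                    # -> (4, 64, 64)
hf = highpass(clip[0].astype(np.float32))           # -> (62, 62)
```

The RGB branch would consume the raw frames directly; these two derived signals feed the other branches of the collaborative framework.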