Multimedia Tools and Applications
https://doi.org/10.1007/s11042-025-20684-7
Face image authentication scheme based onMTCNN andSLT
RashaThabit2 · MohanadA.Al‑Askari1· DunyaZekiMohammed3·
ElhamAbdulwahabAnaam4· ZainabH.Mahmood5· DinaJamalJabbar6·
ZahraaAqeelSalih7
Received: 20 July 2022 / Revised: 25 October 2023 / Accepted: 4 February 2025
© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2025
Abstract
DeepFakes and face image manipulation methods have spread widely in the last few years, and several techniques have been presented to check the authenticity of a face image and detect manipulations if they exist. Most of the available manipulation detection techniques have been applied successfully to reveal one type of manipulation under specific conditions; however, many limitations and challenges remain in this field. To overcome some of these limitations and challenges, this paper presents a new face image authentication (FIA) scheme based on Multi-Task Cascaded Convolutional Neural Networks (MTCNN) and watermarking in the Slantlet transform (SLT) domain. The proposed FIA scheme has three main algorithms: face detection and selection, embedding, and extraction. Different block sizes have been used to divide the image into non-overlapping blocks, which are then classified into two groups: blocks from the face area (FA) and blocks from the remaining area (RA) of the image. In the embedding algorithm, the authentication information is generated from the FA blocks and embedded in the RA blocks. In the extraction algorithm, the embedded information is extracted from the RA blocks and compared with the data calculated from the FA blocks to reveal manipulations and localize the manipulated blocks if they exist. Extensive experiments have been conducted to evaluate the performance of the proposed FIA scheme for different face images. The experimental work included tests for payload, capacity, visual quality, time complexity, and localization of manipulations. The results proved the efficiency of the proposed scheme in detecting and localizing different face image manipulations such as attribute attacks, retouching attacks, expression swap, face swap, and morphing attacks. The proposed scheme overcomes many limitations, and it is 100% accurate in localizing tampered blocks, which makes it a strong candidate for practical applications.
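The block classification and verification logic described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the block size, the face-box format (assumed to come from a detector such as MTCNN), and the hash-based authentication feature are assumptions, since the abstract does not specify the exact feature or embedding method.

```python
import hashlib
import numpy as np

def classify_blocks(h, w, block, face_box):
    """Split an h x w image grid into face-area (FA) and remaining-area (RA)
    block coordinates. face_box = (x0, y0, x1, y1) is a face bounding box."""
    x0, y0, x1, y1 = face_box
    fa, ra = [], []
    for r in range(0, h - h % block, block):
        for c in range(0, w - w % block, block):
            # A block belongs to FA if it overlaps the face bounding box.
            if c < x1 and c + block > x0 and r < y1 and r + block > y0:
                fa.append((r, c))
            else:
                ra.append((r, c))
    return fa, ra

def authentication_bits(img, fa_blocks, block):
    """Derive authentication data from the FA blocks (here: a hash of their
    pixel values, standing in for the paper's unspecified feature)."""
    h = hashlib.sha256()
    for r, c in fa_blocks:
        h.update(img[r:r + block, c:c + block].tobytes())
    return h.digest()
```

At verification time, the digest recomputed from the received FA blocks is compared with the digest recovered from the RA watermark; any FA-block change alters the digest and flags tampering.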
Keywords DeepFakes detection · Face manipulation detection · Face manipulation localization · Face image security · Multimedia forensics
Extended author information available on the last page of the article
Chapter
Full-text available
Several sophisticated convolutional neural network (CNN) architectures have been devised that have achieved impressive results in various domains. One downside of this success is the advent of attacks using deepfakes, a family of tools that enable anyone to use a personal computer to easily create fake videos of someone from a short video found online. Several detectors have been introduced to deal with such attacks. To achieve state-of-the-art performance, CNN-based detectors have usually been upgraded by increasing their depth and/or their width, adding more internal connections, or fusing several features or predicted probabilities from multiple CNNs. As a result, CNN-based detectors have become bigger, consume more memory and computation power, and require more training data. Moreover, there is concern about their generalizability to deal with unseen manipulation methods. In this chapter, we argue that our forensic-oriented capsule network overcomes these limitations and is more suitable than conventional CNNs to detect deepfakes. The superiority of our “Capsule-Forensics” network is due to the use of a pretrained feature extractor, statistical pooling layers, and a dynamic routing algorithm. This design enables the Capsule-Forensics network to outperform a CNN with a similar design and to be from 5 to 11 times smaller than a CNN with similar performance.
Article
Full-text available
It has become a research hotspot to detect whether a video is natural or a DeepFake. However, almost all existing works focus on detecting inconsistency in either the spatial or the temporal domain. In this paper, a dual-branch (spatial branch and temporal branch) neural network is proposed to detect inconsistency in both the spatial and temporal domains for DeepFake video detection. The spatial branch detects spatial inconsistency with the effective EfficientNet model. The temporal branch focuses on temporal inconsistency detection with a new network model, which takes optical flow as input, uses EfficientNet to extract optical flow features, and utilizes a Bidirectional Long Short-Term Memory (Bi-LSTM) network to capture the temporal inconsistency of the optical flow. Moreover, the optical flow frames are stacked before being input into EfficientNet. Finally, the softmax scores of the two branches are combined with a binary-class linear SVM classifier. Experimental results on the compressed FaceForensics++ dataset and the Celeb-DF dataset show that: (a) the proposed dual-branch network model performs better than some recent spatial and temporal models on the Celeb-DF dataset and on all four manipulation methods in the FaceForensics++ dataset, since the two branches complement each other; (b) ablation experiments show that the use of optical flow inputs, Bi-LSTM, and dual branches greatly improves detection performance.
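The late-fusion step can be illustrated with a toy linear classifier over the two branch scores. A simple perceptron is used here as a lightweight stand-in for the paper's binary-class linear SVM, and the score values in the usage below are made up for illustration.

```python
def train_fusion(samples, labels, epochs=100):
    """Train a linear decision rule over (spatial_score, temporal_score) pairs.
    Perceptron updates stand in for the paper's linear SVM training;
    labels are +1 (fake) / -1 (real)."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        updated = False
        for (s, t), y in zip(samples, labels):
            if y * (w[0] * s + w[1] * t + b) <= 0:   # misclassified sample
                w[0] += y * s
                w[1] += y * t
                b += y
                updated = True
        if not updated:   # converged: every sample on the correct side
            break
    return w, b

def fuse_predict(w, b, s, t):
    """Return +1 (fake) or -1 (real) for one pair of branch softmax scores."""
    return 1 if w[0] * s + w[1] * t + b > 0 else -1
```

For linearly separable branch scores (e.g. fake clips scoring high in both branches), the loop terminates once all training pairs are classified correctly.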
Article
Full-text available
Fingerprint biometric systems can automatically identify individuals based on their fingerprint characteristics. Different image watermarking schemes have been applied to fingerprint images for security and protection purposes; the previous schemes are either robust watermarking schemes or reversible watermarking schemes. In order to ensure the intactness of the minutiae, this work applies a robust reversible watermarking scheme to fingerprint images, which can recover the original image after the watermark extraction process and provides robustness against different kinds of attacks. The proposed scheme embeds the identification number (ID) of the person in the owner's fingerprint image to provide security while saving and exchanging the image. The scheme has two stages of security: in the first stage, the ID of the received image is extracted and compared with the saved ID in the database; if they are identical, the system proceeds to the second stage, in which the matching score between the received fingerprint and the saved fingerprint is calculated to authenticate the received image. The experimental results proved the efficiency of the proposed scheme in terms of visual quality, robustness, reversibility, and intactness of the fingerprint's minutiae.
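The two-stage decision described above can be sketched in a few lines. The threshold value and the return labels are illustrative assumptions; the abstract does not state the matcher or its acceptance threshold.

```python
def verify_fingerprint(extracted_id, stored_id, match_score, threshold=0.8):
    """Two-stage verification sketch.
    Stage 1: the ID extracted from the watermark must match the database record.
    Stage 2: the fingerprint matching score must reach the acceptance threshold.
    The threshold value (0.8) is illustrative, not taken from the paper."""
    if extracted_id != stored_id:
        return "reject: ID mismatch"
    if match_score < threshold:
        return "reject: fingerprint mismatch"
    return "accept"
```

Only images that pass both the watermark-ID check and the biometric match are accepted, so a forged image must defeat both stages.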
Article
Full-text available
Thanks to technological advances, social media has become more popular year by year; it is now common to upload selfies to the Internet, where anyone from anywhere can access them, which leads to privacy issues. More specifically, when a selfie photo is relatively clear and bright, there can be a high probability of revealing a person's location and/or associated information. In this paper, a framework is designed to automatically obtain cornea information. First, the Haar cascade algorithm is applied to the captured eye area, and a fine-tuned YOLO object detector is then used for iris localization. Next, an image calibration is performed to obtain a more accurate identification. Furthermore, image super-resolution and denoising are applied to boost the image quality. Finally, the Google Vision API is used for object detection. Experimental results indicate that certain private information can be obtained from a photo via the aforementioned processes, especially when personal information can be identified. More specifically, for certain phones, when the image capturing distance is around 50 cm, the probability of identifying a person (via the Google Vision API) can be as high as 60%. Although lowering image quality could help reduce the risk of privacy exposure, it would make the photo undesirably blurry. To address these issues, a novel method is proposed to remove sensitive private information while still producing visually appealing images.
Chapter
Full-text available
Digital manipulation has become a thriving topic in the last few years, especially after the popularity of the term DeepFakes. This chapter introduces the prominent digital manipulations, with special emphasis on facial content due to its large number of possible applications. Specifically, we cover the principles of six types of digital face manipulation: (i) entire face synthesis, (ii) identity swap, (iii) face morphing, (iv) attribute manipulation, (v) expression swap (a.k.a. face reenactment or talking faces), and (vi) audio- and text-to-video. These six main types of face manipulation are well established by the research community, having received the most attention in the last few years. In addition, we highlight in this chapter publicly available databases and code for the generation of digital fake content.
Chapter
Full-text available
Recently, digital face manipulation and its detection have sparked large interest in industry and academia around the world. Numerous approaches have been proposed in the literature to create realistic face manipulations, such as DeepFakes and face morphs. To the human eye, manipulated images and videos can be almost indistinguishable from real content. Although impressive progress has been reported in the automatic detection of such face manipulations, this research field is often considered a cat-and-mouse game. This chapter briefly discusses the state of the art of digital face manipulation and detection. Issues and challenges that need to be tackled by the research community are summarized, along with future trends in the field.
Chapter
Full-text available
We hear a lot about accidents in our day-to-day life. In India, a person dies every 4 minutes due to road accidents, the highest rate in the world. Although there are several reasons for accidents, one of the main causes is driver drowsiness; drowsiness and weariness are among the significant causes of road accidents. This paper gives insight into how driver drowsiness can be detected. An alert is sent to the driver as well as the passengers on board if the driver is detected to be drowsy, which can reduce the probability of an accident and increase transport safety. First, eyelid closure is measured over a few frames using the eye aspect ratio (EAR); then yawning is detected by computing the mouth aspect ratio (MAR) in the same way. An alert message is sent when the values cross the chosen thresholds, alerting everyone in the vehicle.
Keywords: OpenCV · Eye aspect ratio (EAR) · Mouth aspect ratio (MAR) · Histogram of oriented gradients (HOG) · Support vector machine (SVM) · Bayesian classifier · Multi-task cascaded convolutional neural network (MTCNN)
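The EAR measure mentioned above is commonly computed from six eye landmarks. A minimal sketch follows; the landmark ordering uses the widely adopted convention (p1 and p4 are the horizontal eye corners), and any drowsiness threshold (typically around 0.2 to 0.25) is a common choice rather than a value stated in this abstract.

```python
import math

def _dist(p, q):
    """Euclidean distance between two 2-D landmark points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def eye_aspect_ratio(eye):
    """EAR over six eye landmarks p1..p6, where p1/p4 are the horizontal
    corners and p2/p3 (upper lid) pair with p6/p5 (lower lid):

        EAR = (|p2 - p6| + |p3 - p5|) / (2 * |p1 - p4|)

    The ratio stays roughly constant while the eye is open and drops toward
    zero as the eyelid closes."""
    p1, p2, p3, p4, p5, p6 = eye
    return (_dist(p2, p6) + _dist(p3, p5)) / (2.0 * _dist(p1, p4))
```

MAR is computed analogously from mouth landmarks, and an alert fires when the ratios stay past their thresholds for several consecutive frames.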
Chapter
Face targets in dynamic-scene video streams unavoidably suffer from external factors such as occlusion, fast movement, blurred backgrounds, and uncontrollable lighting, which can cause the target to be missed and reduce the accuracy and real-time performance of face detection algorithms. In this paper, we propose an improved algorithm combining a Kalman filter with the MTCNN deep learning network: the Kalman filter predicts the center position of the face in the next frame and provides the MTCNN network with a recommended detection area for that frame based on the predicted center point; MTCNN's R-Net and O-Net then detect the area and extract the target. Experiments show that the algorithm improves detection speed while maintaining accuracy. To address the problem that the facial structure of a single face sample is complex, while faces within a group tend to be similar, which makes it difficult to extract features suitable for classification, we propose an improved method based on the MobileFaceNet network. Introducing a style attention mechanism into the MobileFaceNet network enhances its ability to extract local facial features, and the ArcFace loss function is used to train the model. Experiments show that our method effectively improves face recognition accuracy and reduces misrecognition.
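The Kalman prediction step described above can be sketched as a constant-velocity filter on one coordinate of the face center (run separately for x and y). The noise levels, initialization, and time step are illustrative assumptions, not values from the chapter.

```python
import numpy as np

def predict_centers(measurements, dt=1.0, q=1e-2, r=1.0):
    """Constant-velocity Kalman filter over one face-center coordinate.
    For each observed center position, returns the one-step-ahead prediction
    that would seed the MTCNN search region for the next frame.
    Process/measurement noise levels q, r are illustrative."""
    F = np.array([[1.0, dt], [0.0, 1.0]])      # state transition (pos, vel)
    H = np.array([[1.0, 0.0]])                 # we observe position only
    Q = q * np.eye(2)                          # process noise covariance
    R = np.array([[r]])                        # measurement noise covariance
    x = np.array([[measurements[0]], [0.0]])   # init at the first detection
    P = np.eye(2)
    preds = []
    for z in measurements:
        # update with the current frame's detection
        y = np.array([[z]]) - H @ x
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ y
        P = (np.eye(2) - K @ H) @ P
        # predict the next frame's center
        x = F @ x
        P = F @ P @ F.T + Q
        preds.append(float(x[0, 0]))
    return preds
```

On a face moving at roughly constant speed, the predictions converge toward the true next-frame position within a few frames, which is what lets R-Net and O-Net search a small recommended area instead of the full frame.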