Figure 2 - uploaded by Guoqing Wang
The overall diagram of the proposed approach for face presentation attack detection via adversarial domain adaptation. We first pre-train a source model optimized with triplet loss in the source domain. Subsequently, we perform adversarial adaptation by learning a target model such that, in the embedding space, the discriminator cannot reliably predict whether a sample is from the source domain or the target domain. Finally, target images are mapped with the target model to the embedding space and classified with a k-nearest neighbors classifier. Dashed lines indicate fixed network parameters.
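The two ends of the pipeline described in the caption (a triplet loss for training the embedding, and k-nearest-neighbor classification in that embedding space) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the margin value, the squared Euclidean distance, the 8-D toy embeddings, the label convention (1 = live, 0 = spoof), and the function names `triplet_loss` and `knn_predict` are all assumptions made for illustration, and the adversarial adaptation stage is omitted.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.5):
    """Hinge-style triplet loss over batches of embeddings (one row per sample).

    The margin and the squared Euclidean distance are illustrative assumptions.
    """
    d_pos = np.sum((anchor - positive) ** 2, axis=1)  # anchor-to-same-class distance
    d_neg = np.sum((anchor - negative) ** 2, axis=1)  # anchor-to-other-class distance
    return float(np.mean(np.maximum(0.0, d_pos - d_neg + margin)))

def knn_predict(ref_emb, ref_labels, query_emb, k=3):
    """Label each query by majority vote among its k nearest reference embeddings.

    Assumes binary labels (1 = live, 0 = spoof) and an odd k so votes cannot tie.
    """
    # Pairwise squared distances, shape (n_query, n_reference).
    dists = np.sum((query_emb[:, None, :] - ref_emb[None, :, :]) ** 2, axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    votes = ref_labels[nearest]
    return (votes.mean(axis=1) > 0.5).astype(int)

# Toy embeddings standing in for the output of a trained model (hypothetical data).
rng = np.random.default_rng(0)
live = rng.normal(loc=2.0, scale=0.3, size=(20, 8))
spoof = rng.normal(loc=-2.0, scale=0.3, size=(20, 8))
ref_emb = np.vstack([live, spoof])
ref_labels = np.array([1] * 20 + [0] * 20)

queries = np.vstack([np.full((1, 8), 2.0), np.full((1, 8), -2.0)])
predictions = knn_predict(ref_emb, ref_labels, queries, k=3)
loss_separated = triplet_loss(live[:5], live[5:10], spoof[:5])
loss_violated = triplet_loss(live[:5], spoof[:5], live[5:10])
```

In this toy setting the loss is zero when same-class pairs are already closer than cross-class pairs by at least the margin, and positive when the roles of positive and negative are swapped.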
Source publication
Face recognition (FR) is being widely used in many applications from access control to smartphone unlock. As a result, face presentation attack detection (PAD) has drawn increasing attention to secure the FR systems. Traditional approaches for PAD mainly assume that training and testing scenarios are similar in imaging conditions (illumination, s...
Contexts in source publication
Context 1
... shown in Fig. 2, our approach consists of two parts, i.e., source domain model learning and adversarial domain ...
Context 2
... leverage the source domain knowledge and effectively distinguish between live and spoof face images in the target domain. We propose an adversarial domain adaptation (ADA) to build the target domain PAD model f_{θ_T}(·). In order to perform ADA, we divide both the source domain model and target domain model into two parts: encoder and decoder (see Fig. 2). The relationship of encoders, decoders, and the entire model f_θ are as ...
Context 3
... target domain models. The network structure is shown in Fig. 3. There are 4 residual blocks in the encoder of each network and each block has 4 convolutional layers. The decoder of each network is a simple linear model containing 2 FC layers that aim to transform the encoded features into 128-D embedding feature vectors. The discriminator D in Fig. 2 consists of 3 fully connected layers with 256 hidden units, 128 hidden units, and two output nodes, respectively. Each of the first two layers uses a ReLU activation ...
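The discriminator layout described in this context (three fully connected layers of 256, 128, and 2 units, with ReLU after the first two) can be sketched as a plain NumPy forward pass. The input width of 128 is an assumption based on the 128-D embedding mentioned above; the weight initialization, the absence of any output activation, and the names `init_discriminator` and `discriminator_forward` are likewise illustrative rather than taken from the paper.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def init_discriminator(in_dim=128, seed=0):
    """Create weights for a 3-layer MLP: in_dim -> 256 -> 128 -> 2.

    in_dim=128 and the small Gaussian initialization are assumptions.
    """
    rng = np.random.default_rng(seed)
    sizes = [in_dim, 256, 128, 2]
    return [
        (rng.normal(0.0, 0.05, size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]

def discriminator_forward(params, x):
    """Return two logits per sample (source vs. target domain)."""
    (w1, b1), (w2, b2), (w3, b3) = params
    h = relu(x @ w1 + b1)   # first FC layer, 256 hidden units
    h = relu(h @ w2 + b2)   # second FC layer, 128 hidden units
    return h @ w3 + b3      # third FC layer, two output nodes

params = init_discriminator()
logits = discriminator_forward(params, np.ones((4, 128)))
weight_shapes = [w.shape for w, _ in params]
```

Here `logits` has one row per input sample and two columns, one for each domain label.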
Citations
... The traditional DA methods generally judge a fake or real face by using a simple FC layer-based classifier optimized with cross-entropy loss. The DA-based generalized FAS methods are mainly based on Maximum Mean Discrepancy (MMD) and adversarial learning methods (Jia et al., 2021; Wang et al., 2019). The self-designed DA-based deep learning FAS methods are mainly based on some novel deep learning frameworks such as teacher-student learning, generative DA (Zhou et al., 2022), contrastive learning, and disentangled representation learning (Yue et al., 2022). ...
... The best and the second best values are given in bold with only 2D attack types. The methods we compare can be divided into three categories: DG-based FAS methods (Jia et al., 2020; Zhou et al., 2023; Long et al., 2023), source-free DA-based FAS methods (He et al., 2020; Liang et al., 2020; Yang et al., 2021a, b; LV et al., 2021), and unsupervised DA-based FAS methods (Zhou et al., 2022; Wang et al., 2019; Quan et al., 2021). As shown in Tables 4 and 5. ...
Unsupervised domain adaptation-based face anti-spoofing methods have attracted more and more attention due to their promising generalization abilities. To mitigate domain bias, existing methods generally attempt to align the marginal distributions of samples from source and target domains. However, the label and pseudo-label information of the samples from source and target domains are ignored. To solve this problem, this paper proposes a Weighted Joint Distribution Optimal Transport unsupervised multi-source domain adaptation method for cross-scenario face anti-spoofing (WJDOT-FAS). WJDOT-FAS consists of three modules: joint distribution estimation, joint distribution optimal transport, and domain weight optimization. Specifically, the joint distributions of the features and pseudo labels of multi-source and target domains are first estimated based on a pre-trained feature extractor and a randomly initialized classifier. Then, we compute the cost matrices and the optimal transportation mappings from the joint distributions related to each source domain and the target domain by solving Lp-L1 optimal transport problems. Finally, based on the loss functions of different source domains, the target domain, and the optimal transportation losses from each source domain to the target domain, we can estimate the weights of each source domain, and meanwhile, the parameters of the feature extractor and classifier are also updated. All the learnable parameters and the computations of the three modules are updated alternately. Extensive experimental results on four widely used 2D attack datasets and three recently published 3D attack datasets under both single- and multi-source domain adaptation settings (including both close-set and open-set) show the advantages of our proposed method for cross-scenario face anti-spoofing.
... It aims to minimize the distribution discrepancy between the source and target domain by leveraging the unlabeled target data. However, the target data is difficult to collect, or even unknown during training, which limits the utilization of DA methods (Li et al., 2018a; Tu et al., 2019; Wang et al., 2019). (2) Domain Generalization (DG). ...
Although the generalization of face anti-spoofing (FAS) is of increasing concern, solving it with a Vision Transformer (ViT) is still at an early stage. In this paper, we present a cross-domain FAS framework, dubbed the Transformer with dual Cross-Attention and semi-fixed Mixture-of-Expert (CA-MoEiT), for stimulating the generalization of Face Anti-Spoofing (FAS) from three aspects: (1) Feature augmentation. We insert a MixStyle after PatchEmbed layer to synthesize diverse patch embeddings from novel domains and enhance the generalizability of the trained model. (2) Feature alignment. We design a dual cross-attention mechanism which extends the self-attention to align the common representation from multiple domains. (3) Feature complement. We design a semi-fixed MoE (SFMoE) to selectively replace MLP by introducing a fixed super expert. Benefiting from the gate mechanism in SFMoE, professional experts are adaptively activated with independent learning domain-specific information, which is used as a supplement to domain-invariant features learned by the super expert to further improve the generalization. It is important that the above three technologies can be compatible with any variant of ViT as plug-and-play modules. Extensive experiments show that the proposed CA-MoEiT is effective and outperforms the state-of-the-art methods on several public datasets.
... (a) Domain alignment-based approaches In previous work, Li et al. (2018b) minimized the maximum mean discrepancy to align the distributions of the source and target domains. Then, Wang et al. (2019, 2020) introduced domain adversarial training for face anti-spoofing to learn domain-invariant features. Jia et al. (2021b) applied domain adversarial learning to align the marginal distribution between the source and target domains and minimize the distribution discrepancy. ...
Face anti-spoofing is a critical component of face recognition technology. However, it suffers from poor generalizability for cross-scenario target domains due to the simultaneous presence of unseen domains and unknown attack types. In this paper, we first propose a challenging but practical problem for face anti-spoofing, open-set single-domain generalization-based face anti-spoofing, aiming to learn face anti-spoofing models that generalize well to unseen target domains with known and unknown attack types based on a single source domain. To address this problem, we propose a novel unknown-aware causal generalized representation learning framework. Specifically, the proposed network consists of two modules: (1) causality-inspired intervention domain augmentation, which generates out-of-distribution images to eliminate spurious correlations between spoof-irrelevant variant factors and category labels for generalized causal feature learning; and (2) unknown-aware probability calibration, which performs known and unknown attack detection based on the original and generated images to further improve the generalizability for unknown attack types. The results of extensive qualitative and quantitative experiments demonstrate that the proposed method learns well-generalized features for both domain shift and unknown attack types based on a single source domain. Our method achieves state-of-the-art cross-scenario generalizability for both live faces and known attack types and unknown attack types.
... Although these methods have achieved promising results under intra-dataset scenarios, they neglect domain discrepancy across different domains and may encounter performance degradation when adapting to new domains. To mitigate this problem of domain shift, recent studies introduce domain adaptation (DA) (Jia et al. 2021; Wang et al. 2019) and domain generalization (DG) (Zhou et al. 2021; Chen et al. 2021) into the field of PAD. As shown in Figure 1, DA-based methods focus on transferring performance on the unlabeled target domain, but they may affect the performance of the source domain. ...
... These works achieve promising results with intradata but neglect the domain gap across different domains. To achieve better generalized performance in the target data, DA (Jia et al. 2021; Li et al. 2018; Wang et al. 2019, 2020a) and DG (Jia et al. 2020; Liu et al. 2021a,b; Shao et al. 2019) are introduced into the PAD area. SDA (Wang et al. 2021a) designs a domain adaptor to utilize the unlabeled test domain data at inference. ...
Previous face Presentation Attack Detection (PAD) methods aim to improve the effectiveness of cross-domain tasks. However, in real-world scenarios, the original training data of the pre-trained model is not available due to data privacy or other reasons. Under these constraints, general methods for fine-tuning single-target domain data may lose previously learned knowledge, leading to a catastrophic forgetting problem. To address these issues, we propose a multi-domain incremental learning (MDIL) method for PAD, which not only learns knowledge well from the new domain but also maintains the performance of previous domains stably. Specifically, we propose an adaptive domain-specific experts (ADE) framework based on the vision transformer to preserve the discriminability of previous domains. Furthermore, an asymmetric classifier is designed to keep the output distribution of different classifiers consistent, thereby improving the generalization ability. Extensive experiments show that our proposed method achieves state-of-the-art performance compared to prior methods of incremental learning. Excitingly, under more stringent setting conditions, our method approximates or even outperforms the DA/DG-based methods.
... Recently, a variety of domain adaptation (DA) and domain generalization (DG) based PAD methods [22,34,35,44,49] are proposed to boost the PAD generalizability. DA-based PAD methods [23,41] learn a discriminative feature space by accessing the labeled source domains and unlabeled target domains. However, this target data is typically unavailable in real-world scenarios. ...
... DA transfers the knowledge from the source domain to the target domain, where the unlabeled target data is accessed in the training process. DA-based face PAD methods align the feature space between source and target domains by minimizing the maximum mean discrepancy (MMD) [23] and adversarial training [41]. However, collecting unlabeled target domain data is very difficult and laborious. ...
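The maximum mean discrepancy mentioned in this context measures how far apart two feature distributions are, and alignment methods minimize it between source and target features. A minimal NumPy sketch of the standard biased estimate of squared MMD with an RBF kernel follows. The kernel choice, the bandwidth `gamma`, the toy Gaussian features, and the function names are assumptions for illustration and are not taken from the cited works.

```python
import numpy as np

def rbf_kernel(a, b, gamma):
    """RBF kernel matrix k(a_i, b_j) = exp(-gamma * ||a_i - b_j||^2)."""
    sq_dists = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point rounding.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))

def mmd2(source, target, gamma=0.5):
    """Biased estimate of squared MMD between two feature sets (rows are samples)."""
    k_ss = rbf_kernel(source, source, gamma).mean()
    k_tt = rbf_kernel(target, target, gamma).mean()
    k_st = rbf_kernel(source, target, gamma).mean()
    return float(k_ss + k_tt - 2.0 * k_st)

# Hypothetical features: one target set matches the source, one is shifted.
rng = np.random.default_rng(0)
src = rng.normal(0.0, 1.0, size=(200, 4))
tgt_near = rng.normal(0.0, 1.0, size=(200, 4))
tgt_far = rng.normal(1.5, 1.0, size=(200, 4))

mmd_near = mmd2(src, tgt_near)
mmd_far = mmd2(src, tgt_far)
mmd_self = mmd2(src, src)
```

A feature extractor trained to reduce this quantity is pushed toward producing source and target features that the kernel cannot tell apart; the shifted target set above yields a clearly larger value than the matched one.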
... However, these solutions heavily deteriorate in practical scenarios where unseen spoofing attacks appear. To solve this problem, a few generalizable methods are designed to mitigate the image distribution shift by performing domain adaptation (Wang et al. 2019a, 2020a; Shao et al. 2019; Jia et al. 2020). Meanwhile, meta-learning based ones are explored (Chen et al. 2021; Qin et al. 2022; Jia, Zhang, and Shan 2022). ...
Along with the widespread use of face recognition systems, their vulnerability has become highlighted. While existing face anti-spoofing methods can be generalized between attack types, generic solutions are still challenging due to the diversity of spoof characteristics. Recently, the spoof trace disentanglement framework has shown great potential for coping with both seen and unseen spoof scenarios, but the performance is largely restricted by the single-modal input. This paper focuses on this issue and presents a multi-modal disentanglement model which targetedly learns polysemantic spoof traces for more accurate and robust generic attack detection. In particular, based on the adversarial learning mechanism, a two-stream disentangling network is designed to estimate spoof patterns from the RGB and depth inputs, respectively. In this case, it captures complementary spoofing clues inhering in different attacks. Furthermore, a fusion module is exploited, which recalibrates both representations at multiple stages to promote the disentanglement in each individual modality. It then performs cross-modality aggregation to deliver a more comprehensive spoof trace representation for prediction. Extensive evaluations are conducted on multiple benchmarks, demonstrating that learning polysemantic spoof traces favorably contributes to anti-spoofing with more perceptible and interpretable results.
... While the majority of the works utilize cross-entropy loss for supervised learning, some methods use contrastive loss in Siamese networks [30] and triplet loss [31] to achieve the goal of increasing the inter-class distance between genuine and spoof faces in a dataset. There are also unsupervised learning-based implementations such as [32] where triplet loss is utilized for model training. Besides the supervised and unsupervised learning techniques, a self-supervised learning-based face PAD technique has also been presented in [33]. ...
The vulnerability of conventional face recognition systems to face presentation or face spoofing attacks has attracted a great deal of attention from information security, forensic, and biometric communities during the past few years. With the recent advancement and availability of cutting-edge computing technologies, sophisticated and computationally expensive solutions to many problems have been made possible. Accordingly, deep learning-based face presentation attack detection (PAD) methods have gained increasing popularity. In this research, we propose a supervised contrastive learning approach to tackle the face anti-spoofing problem. Essentially, the latent space encoding is achieved through an encoder network using the contrastive loss function infused with the class label information. The proposed robust encoding is followed by a simple classifier to distinguish between a real and a spoof face. To the best of our knowledge, this is the first work that uses fully supervised contrastive learning for the two-dimensional (2D) face PAD task. The performance of the proposed method is evaluated on several face anti-spoofing datasets and the results clearly show the efficacy of the proposed approach compared to other contemporary methods.
... In particular, the research community has amassed several large-scale FAS datasets with rich annotations [28][29][30][31]. Our work on multi-source domain processing shares similarities with domain adaptation (DA) methods [32][33][34][35][36][37][38][39] that require a retrained model to perform well on both source and target domain data. Specifically, Zhang et al. [37] introduced the concept of margin disparity discrepancy to characterize the differences between source and target domains, which has inspired our work. ...
Face anti-spoofing is critical for enhancing the robustness of face recognition systems against presentation attacks. Existing methods predominantly rely on binary classification tasks. Recently, methods based on domain generalization have yielded promising results. However, due to distribution discrepancies between various domains, the differences in the feature space related to the domain considerably hinder the generalization of features from unfamiliar domains. In this work, we propose a multi-domain feature alignment framework (MADG) that addresses poor generalization when multiple source domains are distributed in the scattered feature space. Specifically, an adversarial learning process is designed to narrow the differences between domains, achieving the effect of aligning the features of multiple sources, thus resulting in multi-domain alignment. Moreover, to further improve the effectiveness of our proposed framework, we incorporate multi-directional triplet loss to achieve a higher degree of separation in the feature space between fake and real faces. To evaluate the performance of our method, we conducted extensive experiments on several public datasets. The results demonstrate that our proposed approach outperforms current state-of-the-art methods, thereby validating its effectiveness in face anti-spoofing.
... By considering both temporal and spatial information and limiting a cross-entropy loss and a generalization loss, the author [131] has aided in learning generalized feature representations. To increase the discriminability of the learned feature space, the author combined learning a generalized feature space with a dual-force triplet mining constraint [77]. Finding a compact and generalized feature space for fake faces is challenging due to the high distribution disparities among fake faces in different domains. ...
Biometrics has been evolving as an exciting yet challenging area in the last decade. Though face recognition is one of the most promising biometrics techniques, it is vulnerable to spoofing threats. Many researchers focus on face liveness detection to protect biometric authentication systems from spoofing attacks with printed photos, video replays, etc. As a result, it is critical to investigate the current research concerning face liveness detection, to address whether recent advancements can give solutions to mitigate the rising challenges. This research performed a systematic review using the PRISMA approach by exploring the most relevant electronic databases. The article selection process follows preset inclusion and exclusion criteria. The conceptual analysis examines the data retrieved from the selected papers. To the author, this is one of the foremost systematic literature reviews dedicated to face-liveness detection that evaluates existing academic material published in the last decade. The research discusses face spoofing attacks, various feature extraction strategies, and Artificial Intelligence approaches in face liveness detection. Artificial intelligence-based methods, including Machine Learning and Deep Learning algorithms used for face liveness detection, have been discussed in the research. New research areas such as Explainable Artificial Intelligence, Federated Learning, Transfer learning, and Meta-Learning in face liveness detection, are also considered. A list of datasets, evaluation metrics, challenges, and future directions are discussed. Despite the recent and substantial achievements in this field, the challenges make the research in face liveness detection fascinating.
... The results presented in the table exhibit the robustness and versatility of the proposed technique in a cross-database testing scheme. The overall HTER results show that the proposed method achieves better mean HTER compared to the face PAD techniques presented in [67], [47], and [68] by 34.25%, 19.11%, and 20.08% respectively. ...
... For training on Replay-Attack and testing on CASIA-FASD, the proposed method outperforms [67] and [68] [47]. For training on Replay-Attack and testing on ROSE-Youtu, the proposed approach shows superior performance in terms of cross-database HTER and achieves 33.46%, 17.03%, and 20.21% gain as compared to [47], [67] and [68] respectively. For training on ROSE-Youtu and testing on Replay-Attack, the proposed method outperforms [67], [47], and [68] by an HTER margin of 75.63%, 78.27%, and 72.18% respectively. ...
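The HTER (half total error rate) used in these comparisons is the average of the false acceptance rate and the false rejection rate at a fixed decision threshold. A minimal sketch of the computation follows; the score convention (higher means more likely live), the label convention (1 = live, 0 = attack), the toy scores, and the function name `hter` are assumptions for illustration, and the sketch does not reproduce how the cited works select their thresholds.

```python
import numpy as np

def hter(scores, labels, threshold):
    """Return (HTER, FAR, FRR) at the given threshold.

    Assumed conventions: labels are 1 for live and 0 for attack,
    and a sample is accepted as live when its score >= threshold.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    accepted = scores >= threshold
    far = float(np.mean(accepted[labels == 0]))    # attacks wrongly accepted
    frr = float(np.mean(~accepted[labels == 1]))   # live samples wrongly rejected
    return 0.5 * (far + frr), far, frr

# Hypothetical scores for three live and three attack samples.
scores = [0.9, 0.8, 0.4, 0.7, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
value, far, frr = hter(scores, labels, threshold=0.5)
```

In cross-database testing the threshold is typically fixed on data from the training database and then applied unchanged to the test database, which is why HTER is sensitive to domain shift.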
Face presentation attack detection (PAD) is considered to be an essential and critical step in modern face recognition systems. Face PAD aims at exposing an imposter or an unauthorized person seeking to deceive the authentication system. Presentation attacks are typically made using a fake ID through a digital/printed photograph, video, paper mask, 3D mask, and make-up etc. In this research, we propose a novel face PAD solution using an interpolation-based image diffusion augmented by transfer learning of a MobileNet convolutional neural network. The proposed interpolation-based image diffusion method and face PAD approach, implemented in a single framework, shows promising results on various anti-spoofing databases. The experimental results illustrate that the proposed face PAD method shows superior performance compared to most of the state-of-the-art methods.