International Journal of Computer Vision (2024) 132:5151–5172
https://doi.org/10.1007/s11263-024-02129-0
Open-Set Single-Domain Generalization for Robust Face Anti-Spoofing
Fangling Jiang1 · Qi Li2,3 · Weining Wang4 · Min Ren5 · Wei Shen6 · Bing Liu1 · Zhenan Sun2,3
Received: 14 September 2023 / Accepted: 17 May 2024 / Published online: 3 June 2024
© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024
Abstract
Face anti-spoofing is a critical component of face recognition technology. However, it suffers from poor generalizability to cross-scenario target domains due to the simultaneous presence of unseen domains and unknown attack types. In this paper, we first propose a challenging but practical problem for face anti-spoofing, open-set single-domain generalization-based face anti-spoofing, which aims to learn face anti-spoofing models that generalize well to unseen target domains containing both known and unknown attack types from a single source domain. To address this problem, we propose a novel unknown-aware causal generalized representation learning framework. Specifically, the proposed network consists of two modules: (1) causality-inspired intervention domain augmentation, which generates out-of-distribution images to eliminate spurious correlations between spoof-irrelevant variant factors and category labels for generalized causal feature learning; and (2) unknown-aware probability calibration, which performs known and unknown attack detection based on the original and generated images to further improve generalizability to unknown attack types. Extensive qualitative and quantitative experiments demonstrate that the proposed method learns well-generalized features for both domain shift and unknown attack types from a single source domain, and achieves state-of-the-art cross-scenario generalizability on live faces, known attack types, and unknown attack types.
Keywords: Face anti-spoofing · Generalized feature learning · Unknown-aware face presentation attack detection
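The abstract describes the two modules only at a high level. The following minimal PyTorch-style sketch is an assumption-laden illustration, not the authors' implementation: the module names, the randomly re-initialized shallow network standing in for intervention domain augmentation, and the temperature parameter tau are all hypothetical. It shows one plausible way intervention-style augmentation and an unknown-aware calibration score could be combined in a single training step.

# Minimal, hypothetical sketch (not the authors' implementation): intervention-style
# augmentation plus a calibrated unknown score in one training iteration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RandomAppearanceShift(nn.Module):
    """Hypothetical stand-in for intervention domain augmentation: a randomly
    re-initialized shallow conv layer perturbs spoof-irrelevant appearance
    factors (color, texture) to synthesize out-of-distribution views."""
    def __init__(self, channels=3):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        # Re-sample the perturbation weights on every call so appearance varies
        # independently of the live/spoof label (a causal intervention).
        nn.init.normal_(self.conv.weight, std=0.2)
        nn.init.zeros_(self.conv.bias)
        with torch.no_grad():
            return torch.sigmoid(self.conv(x))

def training_step(encoder, classifier, images, labels, augment, tau=1.5):
    """One sketched iteration: classify original and augmented views, keep their
    predictions consistent, and temperature-scale logits so that low-confidence
    (potentially unknown-attack) samples retain a high unknown score."""
    aug_images = augment(images)
    logits = classifier(encoder(images))
    aug_logits = classifier(encoder(aug_images))
    cls_loss = F.cross_entropy(logits, labels) + F.cross_entropy(aug_logits, labels)
    # Consistency term: the category prediction should not depend on the
    # intervened, spoof-irrelevant appearance factors.
    consistency = F.kl_div(F.log_softmax(aug_logits, dim=1),
                           F.softmax(logits, dim=1), reduction="batchmean")
    # Calibrated probabilities (tau is a hypothetical knob); a sample whose
    # maximum calibrated probability stays low can be flagged as unknown.
    probs = F.softmax(logits / tau, dim=1)
    unknown_score = 1.0 - probs.max(dim=1).values
    return cls_loss + consistency, unknown_score

# Usage with toy shapes:
# encoder = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
#                         nn.AdaptiveAvgPool2d(1), nn.Flatten())
# classifier = nn.Linear(16, 2)
# loss, unk = training_step(encoder, classifier,
#                           torch.rand(4, 3, 64, 64),
#                           torch.randint(0, 2, (4,)),
#                           RandomAppearanceShift())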
Communicated by Sergio Escalera.
Fangling Jiang and Qi Li contributed equally to this study.
Qi Li (corresponding author)
qli@nlpr.ia.ac.cn
Fangling Jiang
jiangfangling66@gmail.com
Weining Wang
weining.wang@nlpr.ia.ac.cn
Min Ren
renmin@bnu.edu.cn
Wei Shen
shenwei12@oppo.com
Bing Liu
bingliu@usc.edu.cn
Zhenan Sun
znsun@nlpr.ia.ac.cn
1 School of Computer Science, University of South China, Hengyang, China
2 New Laboratory of Pattern Recognition (NLPR), MAIS, CASIA, Beijing, China
3 School of Artificial Intelligence, UCAS, Beijing, China
4 The Laboratory of Cognition and Decision Intelligence for Complex Systems, CASIA, Beijing, China
5 School of Artificial Intelligence, Beijing Normal University, Beijing, China
6 OPPO Research Institute, Beijing, China
1 Introduction
In recent years, face recognition technology has been widely used in daily life due to its advantages, such as its non-contact nature, convenience, naturalness, and high accuracy. Nevertheless, with the rapid development of modern techniques, face recognition technology is confronted with significant security threats, such as face presentation attacks. Face recognition systems can be attacked with a variety of high-quality spoof faces, such as printed photos, displayed videos, and masks, in attempts to pass system verification and impersonate target legal users. To protect the security of face recognition systems, face anti-spoofing, which aims to distinguish face presentation attacks from live access attempts