International Journal of Computer Vision (2024) 132:5439–5452
https://doi.org/10.1007/s11263-024-02135-2
CA-MoEiT: Generalizable Face Anti-spoofing via Dual Cross-Attention
and Semi-fixed Mixture-of-Expert
Ajian Liu¹
Received: 31 July 2023 / Accepted: 30 May 2024 / Published online: 15 June 2024
© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024
Abstract
Although the generalization of face anti-spoofing (FAS) has attracted increasing attention, solutions based on the Vision Transformer (ViT) are still in their early stages. In this paper, we present a cross-domain FAS framework, dubbed the Transformer with dual Cross-Attention and semi-fixed Mixture-of-Experts (CA-MoEiT), which improves the generalization of FAS from three aspects: (1) Feature augmentation. We insert MixStyle after the PatchEmbed layer to synthesize diverse patch embeddings from novel domains and enhance the generalizability of the trained model. (2) Feature alignment. We design a dual cross-attention mechanism that extends self-attention to align common representations across multiple domains. (3) Feature complement. We design a semi-fixed MoE (SFMoE) that selectively replaces the MLP by introducing a fixed super expert. Benefiting from the gating mechanism in SFMoE, professional experts are adaptively activated to independently learn domain-specific information, which supplements the domain-invariant features learned by the super expert and further improves generalization. Notably, all three techniques are compatible with any ViT variant as plug-and-play modules. Extensive experiments show that the proposed CA-MoEiT is effective and outperforms state-of-the-art methods on several public datasets.
Keywords Face anti-spoofing · Domain generalization · Vision transformer · Mixture-of-experts
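To make these plug-and-play designs concrete, two minimal PyTorch sketches follow. They are illustrative readings of the abstract rather than the paper's exact implementation: head counts, expert counts, the routing rule, and the fusion of expert outputs are all assumptions.

First, one plausible form of the dual cross-attention is a layer in which tokens from one source domain query the keys and values of another, and vice versa, so that both streams are pulled toward a common representation:

```python
import torch.nn as nn

class DualCrossAttention(nn.Module):
    """Sketch: tokens of domain A attend to domain B and vice versa.
    Sharing one attention module across both directions is an assumption."""

    def __init__(self, dim, num_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, xa, xb):  # xa, xb: (B, N, C) tokens from two domains
        ya, _ = self.attn(query=xa, key=xb, value=xb)  # A attends to B
        yb, _ = self.attn(query=xb, key=xa, value=xa)  # B attends to A
        return xa + ya, xb + yb                        # residual connections
```

Second, a semi-fixed mixture-of-experts of the kind that could replace the MLP of a ViT block: a super expert is always activated and learns shared (domain-invariant) features, while a gate routes each token to one professional expert for domain-specific information; top-1 routing is an assumption:

```python
import torch
import torch.nn as nn

class SemiFixedMoE(nn.Module):
    """Sketch of an SFMoE-style block: fixed super expert + gated experts."""

    def __init__(self, dim, hidden, num_experts=3):
        super().__init__()
        def mlp():
            return nn.Sequential(nn.Linear(dim, hidden), nn.GELU(),
                                 nn.Linear(hidden, dim))
        self.super_expert = mlp()                  # fixed: always active
        self.experts = nn.ModuleList([mlp() for _ in range(num_experts)])
        self.gate = nn.Linear(dim, num_experts)    # token-wise router

    def forward(self, x):                          # x: (B, N, C)
        shared = self.super_expert(x)              # domain-invariant path
        weights, idx = self.gate(x).softmax(dim=-1).max(dim=-1)  # top-1
        specific = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = idx == e                        # tokens routed to expert e
            if mask.any():
                specific[mask] = weights[mask].unsqueeze(-1) * expert(x[mask])
        return shared + specific                   # complement shared features
```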
1 Introduction
Face Anti-Spoofing (FAS) plays a vital role in protecting face recognition systems from malicious Presentation Attacks (PAs), ranging from print attacks (Zhang et al., 2012) and replay attacks (Chingovska et al., 2012) to mask attacks (Erdogmus & Marcel, 2014). Although existing methods (Yang et al., 2014; Patel et al., 2016; Liu et al., 2018; George & Marcel, 2019; Yu et al., 2020c; Zhang et al., 2020a; Liu et al., 2020; Yu et al., 2020b) achieve remarkable performance in intra-dataset experiments, where training and testing data come from the same domain, cross-dataset evaluation remains an unsolved challenge due to the large distribution discrepancies among different domains.
Communicated by Sergio Escalera.
✉ Ajian Liu
ajianliu92@gmail.com
¹ The State Key Laboratory of Multimodal Artificial Intelligence Systems (MAIS), Institute of Automation, Chinese Academy of Sciences (CASIA), Beijing, China
There are two schemes for improving the generalization of Presentation Attack Detection (PAD) technology: (1) Domain Adaptation (DA), which aims to minimize the distribution discrepancy between the source and target domains by leveraging unlabeled target data. However, target data are difficult to collect, or even unknown during training, which limits the applicability of DA methods (Li et al., 2018a; Tu et al., 2019; Wang et al., 2019). (2) Domain Generalization (DG), which overcomes this limitation by taking advantage of multiple source domains without seeing any target data.
A straightforward strategy is to collect diverse source data from multiple relevant domains to train a model with more domain-invariant and generalizable representations. Some methods (Menon, 2019; Yang et al., 2021) directly apply augmentation at the data level to improve the diversity of the training data, while Wang et al. (2022) and Huang et al. (2022) suggest that more effective data can be obtained by extending augmentation from the image level to multiple feature levels, as sketched below.
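As a concrete sketch of such feature-level augmentation, MixStyle-style statistics mixing can be applied directly to patch embeddings, i.e., right after the PatchEmbed layer as described in the abstract. Computing the statistics over the token dimension of (B, N, C) embeddings, and the hyperparameter values below, are assumptions:

```python
import torch
import torch.nn as nn

class MixStyle(nn.Module):
    """Sketch: mix per-instance feature statistics across a batch to
    synthesize embeddings with novel-domain styles (cf. MixStyle)."""

    def __init__(self, p=0.5, alpha=0.1, eps=1e-6):
        super().__init__()
        self.p, self.eps = p, eps
        self.beta = torch.distributions.Beta(alpha, alpha)

    def forward(self, x):  # x: (B, N, C) patch embeddings
        if not self.training or torch.rand(1).item() > self.p:
            return x
        mu = x.mean(dim=1, keepdim=True)                   # mean over tokens
        sig = (x.var(dim=1, keepdim=True) + self.eps).sqrt()
        x_norm = (x - mu) / sig                            # strip instance style
        lam = self.beta.sample((x.size(0), 1, 1)).to(x.device)
        perm = torch.randperm(x.size(0), device=x.device)  # random pairing
        mu_mix = lam * mu + (1 - lam) * mu[perm]
        sig_mix = lam * sig + (1 - lam) * sig[perm]
        return x_norm * sig_mix + mu_mix                   # re-style features
```

During training this would be called as, e.g., `tokens = mixstyle(patch_embed(images))`, so each batch blends the feature statistics of randomly paired samples into new, unseen styles.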
In addition to implicitly synthesizing samples from novel domains, some methods (Shao et al., 2019; Saha et al., 2020; Jia et al., 2020; Kim & Kim, 2021; Wang et al., 2022)