Article

Learning Polysemantic Spoof Trace: A Multi-Modal Disentanglement Network for Face Anti-spoofing

Abstract

With the widespread use of face recognition systems, their vulnerability to presentation attacks has come under increasing scrutiny. While existing face anti-spoofing methods can generalize across attack types, generic solutions remain challenging due to the diversity of spoof characteristics. Recently, the spoof trace disentanglement framework has shown great potential for coping with both seen and unseen spoof scenarios, but its performance is largely restricted by single-modal input. This paper focuses on this issue and presents a multi-modal disentanglement model that explicitly learns polysemantic spoof traces for more accurate and robust generic attack detection. In particular, based on an adversarial learning mechanism, a two-stream disentangling network is designed to estimate spoof patterns from the RGB and depth inputs, respectively, capturing the complementary spoofing clues inherent in different attacks. Furthermore, a fusion module recalibrates both representations at multiple stages to promote disentanglement in each individual modality, and then performs cross-modality aggregation to deliver a more comprehensive spoof trace representation for prediction. Extensive evaluations on multiple benchmarks demonstrate that learning polysemantic spoof traces favorably contributes to anti-spoofing, with more perceptible and interpretable results.
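As a rough illustration of the fusion module described above, here is a minimal numpy sketch of cross-modality recalibration followed by aggregation. The function names, the gating scheme (a sigmoid over pooled channel statistics from both streams), and the feature shapes are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

def recalibrate(rgb_feat, depth_feat):
    """Cross-modality recalibration (illustrative): each modality's
    channels are re-weighted by a gate computed from both modalities."""
    joint = np.concatenate([rgb_feat.mean(axis=(1, 2)),
                            depth_feat.mean(axis=(1, 2))])  # (2C,) pooled stats
    gate = 1.0 / (1.0 + np.exp(-joint))                     # sigmoid gating
    c = rgb_feat.shape[0]
    rgb_out = rgb_feat * gate[:c, None, None]
    depth_out = depth_feat * gate[c:, None, None]
    return rgb_out, depth_out

def aggregate(rgb_feat, depth_feat):
    """Cross-modality aggregation into one spoof-trace representation."""
    return np.concatenate([rgb_feat, depth_feat], axis=0)   # (2C, H, W)

rng = np.random.default_rng(0)
rgb = rng.standard_normal((8, 16, 16))    # C,H,W features from the RGB stream
depth = rng.standard_normal((8, 16, 16))  # C,H,W features from the depth stream
r, d = recalibrate(rgb, depth)
fused = aggregate(r, d)
print(fused.shape)  # (16, 16, 16)
```

In the paper this recalibration happens at multiple stages of the two-stream network; the sketch shows a single stage.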

... Domain Adaptation Methods. Domain adaptation in FAS has been studied by many researchers in recent years [31,51,29,74,37,19,20,35,57,32,73,48]. Wang et al. [54] propose a cross-domain face PA detection method using disentangled representation learning and multi-domain learning. ...
Preprint
Full-text available
Face anti-spoofing (FAS) plays an important role in protecting face recognition systems. State-of-the-art works attempt to employ multi-scale contextual information contained in feature maps to raise FAS performance. These multi-scale based methods 1) fuse feature pyramids at different levels in networks, or 2) merge features extracted from multiple face crops, or 3) combine a number of hand-crafted features as multi-scale features. Previous works struggle to leverage multi-scale benefits effectively, leading to limited performance improvement. In this paper, we propose a new multi-scale based approach for FAS. Different from state-of-the-art methods, we design a multi-scale module, which can extract feature representations at different scales, and embed it at certain positions in FAS networks. In our multi-scale module, we employ atrous convolution as well as average pooling followed by upsampling, both of which can extract multi-scale features. We conduct extensive experiments on four benchmark databases, i.e., OULU-NPU, SiW, CASIA-MFSD and Replay-Attack. Results show that our proposed multi-scale based method 1) not only achieves state-of-the-art performance in intra-database testing (notably 2.5% ACER in protocol 4 of the OULU-NPU database and 0.65% ACER in protocol 3 of the SiW database), 2) but also generalizes well in cross-database testing (particularly 19.6% HTER from Replay-Attack to CASIA-MFSD).
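The two multi-scale operations named above can be sketched in plain numpy: a dilated (atrous) 3x3 convolution, and average pooling followed by nearest-neighbour upsampling back to the input size. The averaging kernel, dilation rates, and pooling factors below are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

def dilated_conv2d(x, k, rate):
    """'Same'-padded 2D convolution with a dilated 3x3 kernel (atrous)."""
    pad = rate  # effective kernel extent is 2*rate + 1 for a 3x3 kernel
    xp = np.pad(x, pad)
    out = np.zeros_like(x)
    for i in range(3):
        for j in range(3):
            di, dj = i * rate, j * rate
            out += k[i, j] * xp[di:di + x.shape[0], dj:dj + x.shape[1]]
    return out

def pool_upsample(x, s):
    """Average-pool by factor s, then nearest-neighbour upsample back."""
    h, w = x.shape
    pooled = x.reshape(h // s, s, w // s, s).mean(axis=(1, 3))
    return pooled.repeat(s, axis=0).repeat(s, axis=1)

def multi_scale_module(x, rates=(1, 2, 4), pools=(2, 4)):
    """Stack features extracted at several scales along a channel axis."""
    k = np.full((3, 3), 1.0 / 9.0)            # toy averaging kernel
    branches = [dilated_conv2d(x, k, r) for r in rates]
    branches += [pool_upsample(x, s) for s in pools]
    return np.stack(branches)                  # (num_branches, H, W)

x = np.arange(64, dtype=float).reshape(8, 8)
feats = multi_scale_module(x)
print(feats.shape)  # (5, 8, 8)
```

In a real network the branches would use learned kernels and be concatenated with the trunk features; here all branches keep the spatial size so they stack cleanly.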
Article
Full-text available
Face presentation attack detection (PAD) is essential for securing the widely used face recognition systems. Most of the existing PAD methods do not generalize well to unseen scenarios because labeled training data of the new domain is usually not available. In light of this, we propose an unsupervised domain adaptation with disentangled representation (DR-UDA) approach to improve the generalization capability of PAD into new scenarios. DR-UDA consists of three modules, i.e., ML-Net, UDA-Net and DR-Net. ML-Net aims to learn a discriminative feature representation using the labeled source domain face images via metric learning. UDA-Net performs unsupervised adversarial domain adaptation in order to optimize the source domain and target domain encoders jointly, and obtain a common feature space shared by both domains. As a result, the source domain PAD model can be effectively transferred to the unlabeled target domain for PAD. DR-Net further disentangles the features irrelevant to specific domains by reconstructing the source and target domain face images from the common feature space. Therefore, DR-UDA can learn a disentangled representation space which is generative for face images in both domains and discriminative for live vs. spoof classification. The proposed approach shows promising generalization capability in several public-domain face PAD databases.
Conference Paper
Full-text available
Face anti-spoofing is critical to the security of face recognition systems. Depth supervised learning has been proven one of the most effective methods for face anti-spoofing. Despite the great success, most previous works still formulate the problem as a single-frame multi-task one by simply augmenting the loss with depth, while neglecting detailed fine-grained information and the interplay between facial depths and moving patterns. In contrast, we design a new approach to detect presentation attacks from multiple frames based on two insights: 1) detailed discriminative clues (e.g., spatial gradient magnitude) between living and spoofing faces may be discarded through stacked vanilla convolutions, and 2) the dynamics of 3D moving faces provide important clues for detecting spoofing faces. The proposed method captures discriminative details via a Residual Spatial Gradient Block (RSGB) and efficiently encodes spatio-temporal information with a Spatio-Temporal Propagation Module (STPM). Moreover, a novel Contrastive Depth Loss is presented for more accurate depth supervision. To assess the efficacy of our method, we also collect a Double-modal Anti-spoofing Dataset (DMAD) which provides actual depth for each sample. The experiments demonstrate that the proposed approach achieves state-of-the-art results on five benchmark datasets including OULU-NPU, SiW, CASIA-MFSD, Replay-Attack, and the new DMAD. Code will be available at https://github.com/clks-wzz/FAS-SGTD.
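The first insight above — that spatial gradient magnitude carries fine-grained discriminative detail — can be illustrated with a small numpy Sobel sketch. The RSGB itself is a learned residual block; this only shows the hand-crafted clue it re-injects so that stacked convolutions do not discard it.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def conv2d_same(x, k):
    """'Same'-padded 3x3 correlation."""
    xp = np.pad(x, 1)
    out = np.zeros_like(x)
    for i in range(3):
        for j in range(3):
            out += k[i, j] * xp[i:i + x.shape[0], j:j + x.shape[1]]
    return out

def gradient_magnitude(x):
    """Spatial gradient magnitude: the fine-grained clue preserved by RSGB."""
    gx, gy = conv2d_same(x, SOBEL_X), conv2d_same(x, SOBEL_Y)
    return np.sqrt(gx ** 2 + gy ** 2)

# A vertical edge yields a strong response exactly at the edge location.
img = np.zeros((6, 6))
img[:, 3:] = 1.0
mag = gradient_magnitude(img)
print(mag[3, 3] > mag[3, 0])  # True: response concentrates at the edge
```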
Conference Paper
Full-text available
Face anti-spoofing detection is a crucial procedure in biometric face recognition systems. State-of-the-art approaches based on Convolutional Neural Networks (CNNs) present good results in this field. However, previous works focus on a single data modality with a limited number of subjects. The recently published CASIA-SURF dataset is the largest dataset to date, consisting of 1,000 subjects and 21,000 video clips with 3 modalities (RGB, Depth and IR). In this paper, we propose a multi-stream CNN architecture called FaceBagNet to make full use of this data. The input of FaceBagNet is patch-level images, which helps extract spoof-specific discriminative information. In addition, to prevent overfitting and to better learn the fusion features, we design a Modal Feature Erasing (MFE) operation on the multi-modal features, which erases the features from one randomly selected modality during training. As a result, our approach won second place in the CVPR 2019 ChaLearn Face Anti-spoofing attack detection challenge. Our final submission achieved a score of 99.8052% (TPR@FPR = 10e-4) on the test set.
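The Modal Feature Erasing (MFE) operation is simple to sketch: during training, the features of one randomly chosen modality are zeroed out, forcing the fusion layers not to over-rely on any single stream. The dictionary layout and modality names below are illustrative, not the paper's implementation.

```python
import numpy as np

def modal_feature_erasing(features, rng):
    """Zero out the features of one randomly chosen modality (training only).
    `features` maps modality name -> feature vector; names are illustrative."""
    erased = dict(features)
    victim = rng.choice(sorted(erased))        # pick a modality at random
    erased[victim] = np.zeros_like(erased[victim])
    return erased, victim

rng = np.random.default_rng(42)
feats = {"rgb": np.ones(4), "depth": np.ones(4), "ir": np.ones(4)}
out, victim = modal_feature_erasing(feats, rng)
print(sum(v.sum() for v in out.values()))  # 8.0: exactly one modality erased
```

At inference time the operation is disabled and all modalities pass through unchanged.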
Conference Paper
Full-text available
Face anti-spoofing is essential to prevent face recognition systems from a security breach. Much of the progress has been made possible by the availability of face anti-spoofing benchmark datasets in recent years. However, existing face anti-spoofing benchmarks have a limited number of subjects (≤ 170) and modalities (≤ 2), which hinders further development of the academic community. To facilitate face anti-spoofing research, we introduce a large-scale multi-modal dataset, namely CASIA-SURF, which is the largest publicly available dataset for face anti-spoofing in terms of both subjects and visual modalities. Specifically, it consists of 1,000 subjects with 21,000 videos, and each sample has 3 modalities (i.e., RGB, Depth and IR). We also provide a measurement set, an evaluation protocol and training/validation/testing subsets, establishing a new benchmark for face anti-spoofing. Moreover, we present a new multi-modal fusion method as a baseline, which performs feature re-weighting to select the more informative channel features while suppressing the less useful ones for each modality. Extensive experiments have been conducted on the proposed dataset to verify its significance and generalization capability. The dataset is available at https://sites.google.com/qq.com/chalearnfacespoofingattackdete/.
Conference Paper
Full-text available
Face recognition (FR) is being widely used in many applications from access control to smartphone unlock. As a result, face presentation attack detection (PAD) has drawn increasing attention as a means to secure FR systems. Traditional approaches for PAD mainly assume that training and testing scenarios are similar in imaging conditions (illumination, scene, camera sensor, etc.), and thus may lack good generalization capability to new application scenarios. In this work, we propose an end-to-end learning approach to improve PAD generalization capability by utilizing prior knowledge from the source domain via adversarial domain adaptation. We first build a source domain PAD model optimized with a triplet loss. Subsequently, we perform adversarial domain adaptation w.r.t. the target domain to learn an embedding space shared by the source and target domain models, in which the discriminator cannot reliably predict whether a sample comes from the source or target domain. Finally, PAD in the target domain is performed with a k-nearest neighbors (k-NN) classifier in the embedding space. The proposed approach shows promising generalization capability on a number of public-domain face PAD databases.
Conference Paper
Full-text available
Face recognition has evolved into a prominent biometric authentication modality. However, vulnerability to presentation attacks curtails its reliable deployment. Automatic detection of presentation attacks is essential for the secure use of face recognition technology in unattended scenarios. In this work, we introduce a Convolutional Neural Network (CNN) based framework for presentation attack detection with deep pixel-wise supervision. The framework uses only frame-level information, making it suitable for deployment in smart devices with minimal computational and time overhead. We demonstrate the effectiveness of the proposed approach on public datasets in both intra- as well as cross-dataset experiments. The proposed approach achieves an HTER of 0% on the Replay Mobile dataset and an ACER of 0.42% in Protocol-1 of the OULU dataset, outperforming state-of-the-art methods.
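Deep pixel-wise supervision amounts to applying a binary loss at every location of a low-resolution score map during training, and averaging the map into a single frame-level score at inference. A hedged numpy sketch of both steps; the map size and threshold are assumptions, not the paper's settings.

```python
import numpy as np

def pixel_wise_bce(score_map, label):
    """Binary cross-entropy applied at every location of the score map;
    the frame-level label is broadcast to all pixels (deep supervision)."""
    eps = 1e-7
    p = np.clip(score_map, eps, 1 - eps)
    return float(-(label * np.log(p) + (1 - label) * np.log(1 - p)).mean())

def frame_decision(score_map, threshold=0.5):
    """Inference: average the pixel scores into one frame-level score."""
    return float(score_map.mean()) > threshold

live_map = np.full((14, 14), 0.9)   # network output for a live face
print(pixel_wise_bce(live_map, 1))   # low loss for the correct label
print(frame_decision(live_map))      # True
```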
Conference Paper
Full-text available
Face presentation attack detection (PAD) has drawn increasing attention as a means to secure face recognition (FR) systems, which are widely used in many applications from access control to smartphone unlock. Traditional approaches for PAD may lack good generalization capability to new application scenarios due to the limited number of subjects and data modalities. In this work, we propose an end-to-end multi-modal fusion approach via spatial and channel attention to improve PAD performance on CASIA-SURF. Specifically, we first build four branches integrated with spatial and channel attention modules to obtain uniform features for the different modalities, i.e., RGB, Depth, IR, and a fused 9-channel modality obtained by concatenating the three. Subsequently, the features extracted from the four branches are concatenated and fed into shared layers to learn more discriminative features from the fusion perspective. Finally, we obtain classification confidence scores indicating whether a presentation attack is present. The entire network is optimized jointly with the center loss and the softmax loss, with the SGRD solver updating the parameters. The proposed approach shows promising results on the CASIA-SURF dataset.
Article
Full-text available
Face anti-spoofing is a crucial step in preventing face recognition systems from a security breach. Previous deep learning approaches formulate face anti-spoofing as a binary classification problem; many of them struggle to grasp adequate spoofing cues and generalize poorly. In this paper, we argue the importance of auxiliary supervision to guide the learning toward discriminative and generalizable cues. A CNN-RNN model is learned to estimate the face depth with pixel-wise supervision and to estimate rPPG signals with sequence-wise supervision. We then fuse the estimated depth and rPPG to distinguish live vs. spoof faces. In addition, we introduce a new face anti-spoofing database that covers a large range of illumination, subject, and pose variations. Experimental results show that our model achieves state-of-the-art performance on both intra-database and cross-database testing.
Conference Paper
Full-text available
The face image is the most accessible biometric modality and is used in highly accurate face recognition systems, yet it is vulnerable to many different types of presentation attacks. Face anti-spoofing is therefore a critical step before feeding the face image to biometric systems. In this paper, we propose a novel two-stream CNN-based approach for face anti-spoofing that extracts local features and holistic depth maps from face images. The local features allow the CNN to discriminate spoof patches independent of the spatial face area, while the holistic depth map examines whether the input image has a face-like depth. Extensive experiments are conducted on challenging databases (CASIA-FASD, MSU-USSA, and Replay-Attack), with comparison to the state of the art.
Article
Full-text available
We investigate conditional adversarial networks as a general-purpose solution to image-to-image translation problems. These networks not only learn the mapping from input image to output image, but also learn a loss function to train this mapping. This makes it possible to apply the same generic approach to problems that traditionally would require very different loss formulations. We demonstrate that this approach is effective at synthesizing photos from label maps, reconstructing objects from edge maps, and colorizing images, among other tasks. As a community, we no longer hand-engineer our mapping functions, and this work suggests we can achieve reasonable results without hand-engineering our loss functions either.
Article
Full-text available
Research on non-intrusive software-based face spoofing detection schemes has been mainly focused on the analysis of the luminance information of the face images, hence discarding the chroma component, which can be very useful for discriminating fake faces from genuine ones. This paper introduces a novel and appealing approach for detecting face spoofing using a colour texture analysis. We exploit the joint colour-texture information from the luminance and the chrominance channels by extracting complementary low-level feature descriptions from different colour spaces. More specifically, the feature histograms are computed over each image band separately. Extensive experiments on the three most challenging benchmark data sets, namely, the CASIA face anti-spoofing database, the replay-attack database, and the MSU mobile face spoof database, showed excellent results compared with the state of the art. More importantly, unlike most of the methods proposed in the literature, our proposed approach is able to achieve stable performance across all the three benchmark data sets. The promising results of our cross-database evaluation suggest that the facial colour texture representation is more stable in unknown conditions compared with its gray-scale counterparts.
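The colour-texture idea above — per-channel low-level texture histograms in a chroma-preserving colour space — can be sketched with a basic 8-neighbour LBP in numpy. The BT.601 YCbCr conversion and the plain LBP below stand in for the paper's richer set of descriptors and colour spaces; they are illustrative, not the authors' exact features.

```python
import numpy as np

def rgb_to_ycbcr(img):
    """BT.601 RGB -> YCbCr conversion (img is float HxWx3 in [0, 255])."""
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return np.stack([y, cb, cr], axis=-1)

def lbp(channel):
    """Basic 8-neighbour LBP code for each interior pixel."""
    c = channel[1:-1, 1:-1]
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros_like(c, dtype=np.int32)
    for bit, (dy, dx) in enumerate(shifts):
        nb = channel[1 + dy:channel.shape[0] - 1 + dy,
                     1 + dx:channel.shape[1] - 1 + dx]
        codes |= ((nb >= c).astype(np.int32) << bit)
    return codes

def color_texture_descriptor(img):
    """Concatenate per-channel LBP histograms: the joint colour-texture cue."""
    ycbcr = rgb_to_ycbcr(img)
    hists = [np.bincount(lbp(ycbcr[..., i]).ravel(), minlength=256)
             for i in range(3)]
    return np.concatenate(hists)  # 3 x 256 bins

rng = np.random.default_rng(1)
face = rng.uniform(0, 255, size=(32, 32, 3))
desc = color_texture_descriptor(face)
print(desc.shape)  # (768,): one 256-bin histogram per Y, Cb, Cr channel
```

A classifier (e.g., an SVM) would then be trained on such descriptors; the point of the paper is that the Cb/Cr histograms carry spoof cues the luminance channel misses.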
Conference Paper
Full-text available
Spoofing with a photograph or video is one of the most common ways to circumvent a face recognition system. In this paper, we present a real-time and non-intrusive method to address this, based on individual images from a generic webcamera. The task is formulated as a binary classification problem in which, however, the distributions of positive and negative samples largely overlap in the input space, so a suitable representation space is of particular importance. Using the Lambertian model, we propose two strategies to extract the essential information about the different surface properties of a live human face or a photograph, in terms of latent samples. Based on these, we develop two new extensions to the sparse logistic regression model which allow quick and accurate spoof detection. Preliminary experiments on a large photo-imposter database show that the proposed method gives preferable detection performance compared to others.
Article
Face anti-spoofing approaches based on domain generalization (DG) have drawn growing attention due to their robustness in unseen scenarios. Existing DG methods assume that the domain label is known. However, in real-world applications the collected dataset always contains mixture domains, where the domain label is unknown. In this case, most existing methods may not work. Further, even if we can obtain the domain label as existing methods do, we argue that this is only a sub-optimal partition. To overcome this limitation, we propose domain dynamic adjustment meta-learning (D²AM) without using domain labels, which iteratively divides mixture domains via discriminative domain representations and trains a generalizable face anti-spoofing model with meta-learning. Specifically, we design a domain feature based on Instance Normalization (IN) and propose a domain representation learning module (DRLM) to extract discriminative domain features for clustering. Moreover, to reduce the side effect of outliers on clustering performance, we additionally utilize maximum mean discrepancy (MMD) to align the distribution of sample features to a prior distribution, which improves the reliability of clustering. Extensive experiments show that the proposed method outperforms conventional DG-based face anti-spoofing methods, including those utilizing domain labels. Furthermore, we enhance interpretability through visualization.
Article
Local features contain crucial clues for face anti-spoofing. Convolutional neural networks (CNNs) are powerful in extracting local features, but the intrinsic inductive bias of CNNs limits their ability to capture long-range dependencies. This paper aims to develop a simple yet effective framework that is versatile in extracting both local information and long-range dependencies for face anti-spoofing. To this end, we propose a novel architecture, namely Conv-MLP, which incorporates local patch convolution with global multi-layer perceptrons (MLP). Conv-MLP breaks the inductive bias limitation of traditional full CNNs and can be expected to better exploit long-range dependencies. Furthermore, we design a new loss specifically for the face anti-spoofing task, namely the moat loss. The moat loss benefits discriminative representation learning and can improve the generalization capability on unseen presentation attacks. In this work, multi-modal data are directly fused at the signal level to extract complementary features. Extensive experiments on single and multi-modal datasets demonstrate that Conv-MLP outperforms existing state-of-the-art methods while being more computationally efficient. The code is available at https://github.com/WeihangWANG/Conv-MLP.
Article
Face anti-spoofing (FAS) is essential for securing face recognition systems. Despite the decent performance, few existing works fully leverage temporal information. This would inevitably lead to inferior performance because real and fake faces tend to share highly similar spatial appearances, while important temporal features between consecutive frames are neglected. In this work, we propose a temporal transformer network (TTN) to learn multi-granularity temporal characteristics for FAS. It mainly consists of temporal difference attentions (TDA), a pyramid temporal aggregation (PTA), and a temporal depth difference loss (TDL). Firstly, the vision transformer (ViT) is used as the backbone where comprehensive local patches are utilized to provide subtle differences between live and spoof faces. Then, instead of learning temporal features on global faces which may miss some important local cues, the TDA is developed to extract motion-sensitive cues on each of the comprehensive local patches. Moreover, the TDA is inserted into different layers of the ViT, learning multi-scale motion-sensitive local cues to improve the FAS performance. Secondly, it is observed that different subjects may have different visual tempos in some actions, making it necessary to model different temporal speeds. Our PTA aggregates temporal features at various tempos, which could build short-range and long-range relations among multiple frames. Thirdly, depth maps for real parts may change continuously, while they remain zeros for spoof regions. In order to locate motion features on facial parts, the TDL is proposed to guide the network to locate spoof facial parts where motion patterns between neighboring frames are set as the ground truth. To the best of our knowledge, this work is the first attempt to learn temporal characteristics via transformers. Both qualitative and quantitative results on several challenging tasks demonstrate the usefulness and effectiveness of our proposed methods.
Article
Existing face anti-spoofing (FAS) methods fail to generalize well to unseen domains whose data distributions differ from the training domains, due to the distribution discrepancies between various domains. To extract domain-invariant features for unseen domains, this work proposes a Dual-Branch Meta-learning Network (DBMNet) with distribution alignment for face anti-spoofing. Specifically, DBMNet consists of a feature embedding (FE) branch and a depth estimating (DE) branch for real and fake face discrimination. Each branch acts as a meta-learner and is optimized by step-adjusted meta-learning that can adaptively select the best number of meta-train steps. In order to mitigate distribution discrepancies between domains, we introduce two distribution alignment losses to directly regularize the two meta-learners, i.e., the triplet loss for the FE branch and the depth loss for the DE branch, respectively. Both are designed as part of the meta-train and meta-test objectives, which contributes higher-order derivatives on the parameters during meta-optimization for further seeking domain-invariant features. Extensive ablation studies and comparisons with state-of-the-art methods show the effectiveness of our method for better generalization.
Article
Face anti-spoofing (FAS) plays a vital role in preventing face recognition systems from presentation attacks. Existing face anti-spoofing datasets lack diversity due to the insufficient number of identities and insignificant variance, which limits the generalization ability of FAS models. In this paper, we propose the Dual Spoof Disentanglement Generation (DSDG) framework to tackle this challenge by "anti-spoofing via generation". Building on the interpretable factorized latent disentanglement of the Variational Autoencoder (VAE), DSDG learns a joint distribution of the identity representation and the spoofing pattern representation in the latent space. Large-scale paired live and spoofing images can then be generated from random noise to boost the diversity of the training set. However, some generated face images are partially distorted due to an inherent defect of the VAE. Precise depth values are hard to predict for such noisy samples, which may obstruct the widely used depth-supervised optimization. To tackle this issue, we further introduce a lightweight Depth Uncertainty Module (DUM), which alleviates the adverse effects of noisy samples through depth uncertainty learning. DUM is developed without extra dependencies and can thus be flexibly integrated with any depth-supervised network for face anti-spoofing. We evaluate the effectiveness of the proposed method on five popular benchmarks and achieve state-of-the-art results under both intra- and inter-test settings. The code is available at https://github.com/JDAI-CV/FaceX-Zoo/tree/main/addition_module/DSDG.
Article
Face presentation attack detection (PAD) has become a key component in face-based application systems. Typical face de-spoofing algorithms estimate the noise pattern of a spoof image to detect presentation attacks. These algorithms are device-independent and have good generalization ability. However, the noise modeling is not very effective because there is no ground truth (GT) with identity information for training the noise modeling network. To address this issue, we propose using the bona fide image of the corresponding subject in the training set as a type of GT called appr-GT with the identity information of the spoof image. A metric learning module is proposed to constrain the generated bona fide images from the spoof images so that they are near the appr-GT and far from the input images. This can reduce the influence of imaging environment differences between the appr-GT and GT of a spoof image. Extensive experimental results demonstrate that the reconstructed bona fide image and noise with high discriminative quality can be clearly separated from a spoof image. The proposed algorithm achieves competitive performance.
Chapter
Prior studies show that the key to face anti-spoofing lies in subtle image patterns, termed "spoof traces", e.g., color distortion, 3D mask edges, Moiré patterns, and many others. Designing a generic anti-spoofing model to estimate these spoof traces can improve both generalization and interpretability. Yet this is a challenging task due to the diversity of spoof types and the lack of ground truth. This work designs a novel adversarial learning framework to disentangle the spoof traces from input faces as a hierarchical combination of patterns. With the disentangled spoof traces, we unveil the live counterpart from the spoof face and synthesize realistic new spoof faces after a proper geometric correction. Our method demonstrates superior spoof detection performance in both seen and unseen spoof scenarios while providing a visually convincing estimation of spoof traces. Code is available at https://github.com/yaojieliu/ECCV20-STDN.
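The arithmetic behind spoof trace disentanglement can be illustrated with a toy additive model. Note that STDN actually models traces as a hierarchical combination of patterns with geometric correction; this numpy sketch only shows the additive core idea of removing an estimated trace to unveil the live face and adding it to another face to synthesize a new spoof.

```python
import numpy as np

# Toy additive spoof-trace model (illustrative; not STDN's hierarchy).
rng = np.random.default_rng(7)
live_a = rng.uniform(0, 1, size=(8, 8, 3))       # hidden live face A
trace = 0.1 * rng.standard_normal((8, 8, 3))     # additive spoof trace
spoof_a = np.clip(live_a + trace, 0, 1)          # observed spoof input

estimated_trace = spoof_a - live_a               # what disentanglement learns
reconstructed_live = spoof_a - estimated_trace   # unveil the live counterpart
live_b = rng.uniform(0, 1, size=(8, 8, 3))
synthesised_spoof = np.clip(live_b + estimated_trace, 0, 1)  # new spoof face

print(np.allclose(reconstructed_live, live_a))   # True
```

In STDN the trace estimator is adversarially trained, since no per-pixel ground-truth trace exists; the synthesized spoofs then serve as extra training data.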
Chapter
Depth information has proven to be a useful cue in the semantic segmentation of RGB-D images, providing a geometric counterpart to the RGB representation. Most existing works simply assume that depth measurements are accurate and well-aligned with the RGB pixels, and model the problem as cross-modal feature fusion to obtain better feature representations and more accurate segmentation. This, however, may not lead to satisfactory results, as actual depth data are generally noisy, which may worsen accuracy as the networks go deeper.
Chapter
Face anti-spoofing is crucial to the security of face recognition systems. Previous approaches focus on developing discriminative models based on features extracted from images, which may still be entangled between spoof patterns and real persons. In this paper, motivated by disentangled representation learning, we propose a novel perspective on face anti-spoofing that disentangles liveness features and content features from images, with the liveness features further used for classification. We also put forward a Convolutional Neural Network (CNN) architecture with a disentanglement process and a combination of low-level and high-level supervision to improve generalization capability. We evaluate our method on public benchmark datasets, and extensive experimental results demonstrate its effectiveness against state-of-the-art competitors. Finally, we visualize some results to help understand the effect and advantage of disentanglement.
Chapter
Face anti-spoofing (FAS) plays a vital role in securing face recognition systems from presentation attacks. Most existing FAS methods capture various cues (e.g., texture, depth and reflection) to distinguish live faces from spoofing faces. All these cues are based on the discrepancy among physical materials (e.g., skin, glass, paper and silicone). In this paper we rephrase face anti-spoofing as a material recognition problem and combine it with classical human material perception, intending to extract discriminative and robust features for FAS. To this end, we propose the Bilateral Convolutional Networks (BCN), which is able to capture intrinsic material-based patterns via aggregating multi-level bilateral macro- and micro-information. Furthermore, a Multi-level Feature Refinement Module (MFRM) and multi-head supervision are utilized to learn more robust features. Comprehensive experiments are performed on six benchmark datasets, and the proposed method achieves superior performance in both intra- and cross-dataset testing. One highlight is that we achieve an overall 11.3 ± 9.5% EER for cross-type testing on the SiW-M dataset, which significantly outperforms previous results. We hope this work will facilitate future cooperation between the FAS and material communities.
Article
Face recognition has evolved as a widely used biometric modality. However, its vulnerability to presentation attacks poses a significant security threat. Though presentation attack detection (PAD) methods try to address this issue, they often fail to generalize to unseen attacks. In this work, we propose a new framework for PAD using a one-class classifier, where the representation used is learned with a Multi-Channel Convolutional Neural Network (MCCNN). A novel loss function is introduced, which forces the network to learn a compact embedding for the bonafide class that is far from the representation of attacks. A one-class Gaussian Mixture Model is used on top of these embeddings for the PAD task. The proposed framework introduces a novel approach to learning a robust PAD system from bonafide and available (known) attack classes. This is particularly important, as collecting bonafide data and simpler attacks is much easier than collecting a wide variety of expensive attacks. The proposed system is evaluated on the publicly available WMCA multi-channel face PAD database, which contains a wide variety of 2D and 3D attacks. Further, we have performed experiments with the MLFP and SiW-M datasets using RGB channels only. Superior performance in unseen attack protocols shows the effectiveness of the proposed approach. Software, data, and protocols to reproduce the results are made publicly available.
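The one-class idea above — model only the bonafide embedding distribution and score test samples by likelihood — can be sketched with a single Gaussian in numpy. This is a 1-component stand-in for the paper's one-class GMM, and the embeddings here are synthetic.

```python
import numpy as np

def fit_gaussian(embeddings):
    """Fit one Gaussian to bonafide embeddings (a 1-component stand-in
    for a one-class GMM); small ridge keeps the covariance invertible."""
    mu = embeddings.mean(axis=0)
    cov = np.cov(embeddings, rowvar=False) + 1e-6 * np.eye(embeddings.shape[1])
    return mu, cov

def log_likelihood(x, mu, cov):
    """Log-density of x under N(mu, cov); low values flag attacks."""
    d = x - mu
    inv = np.linalg.inv(cov)
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d @ inv @ d + logdet + len(mu) * np.log(2 * np.pi))

rng = np.random.default_rng(3)
bonafide = rng.normal(0.0, 1.0, size=(500, 4))  # embeddings of live faces
attack = rng.normal(5.0, 1.0, size=4)            # far-away attack embedding
mu, cov = fit_gaussian(bonafide)
print(log_likelihood(bonafide[0], mu, cov) > log_likelihood(attack, mu, cov))
# True: the attack scores much lower under the bonafide model
```

In practice a likelihood threshold tuned on validation data converts this score into an accept/reject decision.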
Article
This paper addresses the problem of face presentation attack detection using different image modalities. In particular, the use of short wave infrared (SWIR) imaging is considered. Face presentation attack detection is performed using recent models based on Convolutional Neural Networks, using only carefully selected SWIR image differences as input. Conducted experiments show superior performance over similar models acting on either color images or on a combination of different modalities (visible, NIR, thermal and depth), as well as over an SVM-based classifier acting on SWIR image differences. Experiments have been carried out on a new public and freely available database containing a wide variety of attacks. Video sequences have been recorded with several sensors, resulting in 14 different streams in the visible, NIR, SWIR and thermal spectra, as well as depth data. The best proposed approach is able to almost perfectly detect all impersonation attacks while ensuring low classification errors. On the other hand, the obtained results show that obfuscation attacks are more difficult to detect. We hope that the proposed database will foster research on this challenging problem. Finally, all the code and instructions to reproduce the presented experiments are made available to the research community.
Article
Face verification systems are prone to spoofing attacks on photos, videos, and 3D masks. Face spoofing detection, i.e., face anti-spoofing, face liveness detection, or face presentation attack detection, is an important task for securing face verification systems in practice and presents many challenges. In this paper, a state-of-the-art face spoofing detection method based on a depth-based Fully Convolutional Network (FCN) is revisited. Different supervision schemes, including global and local label supervisions, are comprehensively investigated. A generic theoretical analysis and associated simulation are provided to demonstrate that local label supervision is more suitable than global label supervision for local tasks with insufficient training samples, such as the face spoofing detection task. Based on the analysis, the Spatial Aggregation of Pixel-level Local Classifiers (SAPLC), which is composed of an FCN part and an aggregation part, is proposed. The FCN part predicts the pixel-level ternary labels, which include the genuine foreground, the spoofed foreground, and the undetermined background. Then, these labels are aggregated together to yield an accurate image-level decision. Furthermore, to quantitatively evaluate the proposed SAPLC, experiments are carried out on the CASIA-FASD, Replay-Attack, OULU-NPU, and SiW datasets. The experiments show that the proposed SAPLC outperforms the representative deep networks, including two globally supervised CNNs, one depth-based FCN, two FCNs with binary labels, and two FCNs with ternary labels, and achieves performance competitive with state-of-the-art methods under various common protocols. Overall, the results empirically verify the advantage of the proposed pixel-level local label supervision scheme.
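The aggregation step can be sketched as follows: pixel-level ternary scores are pooled into one image-level decision (a minimal sketch with a random stand-in score map; the masking rule and pooling below are illustrative assumptions, not the exact SAPLC aggregation):

```python
import numpy as np

# Toy pixel-level output of an FCN head: per-pixel probabilities over
# {genuine foreground, spoofed foreground, undetermined background}.
rng = np.random.default_rng(1)
probs = rng.dirichlet(alpha=[1.0, 1.0, 1.0], size=(8, 8))  # shape (8, 8, 3)

def aggregate(probs, bg_index=2):
    """Pool pixel-level ternary scores into one image-level genuine score."""
    fg_mask = probs[..., bg_index] < 0.5       # drop background-dominated pixels
    if not fg_mask.any():
        return 0.5                             # undecidable frame
    genuine = probs[..., 0][fg_mask].mean()
    spoofed = probs[..., 1][fg_mask].mean()
    return genuine / (genuine + spoofed)       # in (0, 1); threshold at 0.5

score = aggregate(probs)
print(0.0 < score < 1.0)  # True
```

The point of the ternary labels is that background pixels, which carry no liveness evidence, are explicitly excluded before the image-level decision is made.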
Article
Face recognition is a mainstream biometric authentication method. However, vulnerability to presentation attacks (a.k.a. spoofing) limits its usability in unsupervised applications. Even though there are many methods available for tackling presentation attacks (PA), most of them fail to detect sophisticated attacks such as silicone masks. As the quality of presentation attack instruments improves over time, achieving reliable PA detection with the visible spectrum alone remains very challenging. We argue that analysis in multiple channels might help to address this issue. In this context, we propose a multi-channel Convolutional Neural Network based approach for presentation attack detection (PAD). We also introduce the new Wide Multi-Channel presentation Attack (WMCA) database for face PAD, which contains a wide variety of 2D and 3D presentation attacks for both impersonation and obfuscation. Data from different channels such as color, depth, near-infrared and thermal are available to advance the research in face PAD. The proposed method was compared with feature-based approaches and found to outperform the baselines, achieving an ACER of 0.3% on the introduced dataset. The database and the software to reproduce the results are made available publicly.
Conference Paper
Biometrics emerged as a robust solution for security systems. However, given the dissemination of biometric applications, criminals are developing techniques to circumvent them by simulating physical or behavioral traits of legal users (spoofing attacks). Despite face being a promising characteristic due to its universality, acceptability and the presence of cameras almost everywhere, face recognition systems are extremely vulnerable to such frauds since they can be easily fooled with common printed facial photographs. State-of-the-art approaches, based on Convolutional Neural Networks (CNNs), present good results in face spoofing detection. However, these methods do not consider the importance of learning deep local features from each facial region, even though it is known from face recognition that each facial region presents different visual aspects, which can also be exploited for face spoofing detection. In this work, we propose a novel CNN architecture trained in two steps for such a task. Initially, each part of the neural network learns features from a given facial region. Afterwards, the whole model is fine-tuned on the whole facial images. Results show that such a pre-training step allows the CNN to learn different local spoofing cues, improving the performance and the convergence speed of the final model, outperforming the state-of-the-art approaches.
Conference Paper
Convolutional neural networks (CNNs) have been widely used in the computer vision community, significantly improving the state-of-the-art. In most of the available CNNs, the softmax loss function is used as the supervision signal to train the deep model. In order to enhance the discriminative power of the deeply learned features, this paper proposes a new supervision signal, called center loss, for the face recognition task. Specifically, the center loss simultaneously learns a center for the deep features of each class and penalizes the distances between the deep features and their corresponding class centers. More importantly, we prove that the proposed center loss function is trainable and easy to optimize in CNNs. With the joint supervision of softmax loss and center loss, we can train robust CNNs to obtain deep features with the two key learning objectives, inter-class dispersion and intra-class compactness, as much as possible, which are very essential to face recognition. It is encouraging to see that our CNNs (with such joint supervision) achieve state-of-the-art accuracy on several important face recognition benchmarks: Labeled Faces in the Wild (LFW), YouTube Faces (YTF), and the MegaFace Challenge. Especially, our new approach achieves the best results on MegaFace (the largest public domain face benchmark) under the protocol of the small training set (fewer than 500,000 images and fewer than 20,000 persons), significantly improving the previous results and setting a new state-of-the-art for both face recognition and face verification tasks.
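The center loss itself is simple to state: L_C = ½ Σ_i ‖x_i − c_{y_i}‖², with each class center moved toward the mean of its batch features. A minimal NumPy sketch (random stand-in features; the moving-average update follows the paper's idea, but the step size alpha here is an assumed value):

```python
import numpy as np

def center_loss(features, labels, centers):
    """L_C = 1/2 * ||x_i - c_{y_i}||^2, averaged over the batch."""
    diffs = features - centers[labels]
    return 0.5 * np.mean(np.sum(diffs ** 2, axis=1))

def update_centers(features, labels, centers, alpha=0.5):
    """Move each class center a fraction alpha toward its batch feature mean."""
    new_centers = centers.copy()
    for c in np.unique(labels):
        new_centers[c] += alpha * (features[labels == c].mean(axis=0) - centers[c])
    return new_centers

rng = np.random.default_rng(0)
features = rng.normal(size=(16, 4))            # stand-in deep features
labels = rng.integers(0, 3, size=16)           # 3 classes
centers = np.zeros((3, 4))

before = center_loss(features, labels, centers)
centers = update_centers(features, labels, centers)
after = center_loss(features, labels, centers)
print(after < before)  # True: centers moved toward the features
```

In training, this term is added to the softmax loss with a weighting factor, so the network jointly separates classes and compacts each class around its center.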
Conference Paper
Temporal features are important for face anti-spoofing. Unfortunately, existing methods are limited in their ability to exploit such temporal features. In this work, we propose a deep neural network architecture combining Long Short-Term Memory (LSTM) units with Convolutional Neural Networks (CNN). Our architecture works well for face anti-spoofing by utilizing the LSTM units' ability to find long-range relations in input sequences as well as extracting local and dense features through convolution operations. Our best model shows significant performance improvement over a general CNN architecture (5.93% vs. 7.34%) and hand-crafted features (5.93% vs. 10.00%) on the CASIA dataset.
Article
With the wide deployment of face recognition systems in applications from de-duplication to mobile device unlocking, security against face spoofing attacks requires increased attention; such attacks can be easily launched via printed photos, video replays and 3D masks of a face. We address the problem of face spoof detection against print (photo) and replay (photo or video) attacks based on the analysis of image distortion (e.g., surface reflection, moiré pattern, color distortion, and shape deformation) in spoof face images (or video frames). The application domain of interest is smartphone unlock, given that a growing number of smartphones have face unlock and mobile payment capabilities. We build an unconstrained smartphone spoof attack database (MSU USSA) containing more than 1,000 subjects. Both print and replay attacks are captured using the front and rear cameras of a Nexus 5 smartphone. We analyze the image distortion of print and replay attacks using different (i) intensity channels (R, G, B and grayscale), (ii) image regions (entire image, detected face, and facial component between the nose and chin), and (iii) feature descriptors. We develop an efficient face spoof detection system on an Android smartphone. Experimental results on the public-domain Idiap Replay-Attack, CASIA FASD, and MSU-MFSD databases, and the MSU USSA database show that the proposed approach is effective in face spoof detection for both cross-database and intra-database testing scenarios. User studies of our Android face spoof detection system involving 20 participants show that the proposed approach works very well in real application scenarios.
Chapter
With the wide applications of face recognition, spoofing attacks are becoming a serious threat to their security. Conventional face recognition systems usually adopt behavioral challenge-response or texture analysis methods to resist spoofing attacks; however, these methods require high user cooperation and are sensitive to the imaging quality and environments. In this chapter, we present a multi-spectral face recognition system working in the VIS (Visible) and NIR (Near Infrared) spectrums, which is robust to various spoofing attacks and requires no user cooperation. First, we introduce the structure of the system from several aspects, including the imaging device, face landmarking, feature extraction, matching, and the VIS and NIR sub-systems. Then the performance of the multi-spectral system and each subsystem is evaluated and analyzed. Finally, we describe the multi-spectral image-based anti-spoofing module, and report its performance under photo attacks. Experiments on a spoofing database show the excellent performance of the proposed system both in recognition rate and anti-spoofing ability. Compared with a conventional VIS face recognition system, the multi-spectral system has two advantages: (1) By combining the VIS and NIR spectrums, the system can resist VIS photo and NIR photo attacks easily, and users' cooperation is no longer needed, making the system user-friendly and fast. (2) Due to the precise key-point localization, Gabor feature extraction and unsupervised learning, the system is robust to pose, illumination and expression variations. Generally, its recognition rate is higher than that of the VIS subsystem.
Conference Paper
User authentication is an important step to protect information, and in this field face biometrics is advantageous. Face biometrics is natural, easy to use and less human-invasive. Unfortunately, recent work has revealed that face biometrics is vulnerable to spoofing attacks using cheap, low-tech equipment. This article presents a countermeasure against such attacks based on the LBP-TOP operator, which combines both space and time information into a single multiresolution texture descriptor. Experiments carried out with the Replay-Attack database show a Half Total Error Rate (HTER) improvement from 15.16% to 7.60%.
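The LBP-TOP idea, applying LBP to the XY, XT and YT planes of a video volume and concatenating the resulting histograms, can be sketched as follows (a simplified version that samples a single central plane per orientation rather than all of them, on random stand-in video data):

```python
import numpy as np

def lbp_plane(plane):
    """Basic 8-neighbour LBP codes for one 2-D plane (borders skipped)."""
    c = plane[1:-1, 1:-1]
    codes = np.zeros_like(c, dtype=np.uint8)
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(shifts):
        neigh = plane[1 + dy:plane.shape[0] - 1 + dy,
                      1 + dx:plane.shape[1] - 1 + dx]
        codes |= (neigh >= c).astype(np.uint8) << bit
    return codes

def lbp_top(volume):
    """Concatenate LBP histograms from the XY, XT and YT planes of a video."""
    t, h, w = volume.shape
    planes = [volume[t // 2],         # XY: middle frame (spatial texture)
              volume[:, h // 2, :],   # XT: one row over time
              volume[:, :, w // 2]]   # YT: one column over time
    hists = [np.bincount(lbp_plane(p).ravel(), minlength=256) for p in planes]
    return np.concatenate(hists)      # 768-dim space-time texture descriptor

video = np.random.default_rng(0).integers(0, 256, size=(10, 16, 16))
desc = lbp_top(video)
print(desc.shape)  # (768,)
```

The XT and YT histograms are what give the descriptor its temporal sensitivity: replayed or printed faces produce different space-time micro-texture than live skin.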
Conference Paper
For a robust face biometric system, a reliable anti-spoofing approach must be deployed to circumvent print and replay attacks. Several techniques have been proposed to counter face spoofing; however, a robust solution that is computationally efficient is still unavailable. This paper presents a new approach for spoofing detection in face videos using motion magnification. The Eulerian motion magnification approach is used to enhance the facial expressions commonly exhibited by subjects in a captured video. Next, two types of feature extraction algorithms are proposed: (i) a configuration of LBP that provides improved performance compared to other computationally expensive texture-based approaches and (ii) a motion estimation approach using the HOOF descriptor. On the Print Attack and Replay Attack spoofing datasets, the proposed framework improves on the state-of-the-art performance, with the HOOF descriptor yielding near-perfect half total error rates of 0% and 1.25%, respectively.
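The HOOF descriptor mentioned above bins optical-flow vectors by direction, weights each by its magnitude, and L1-normalises the histogram. A minimal sketch on random stand-in flow fields (the original HOOF additionally mirrors angles for left-right symmetry, which is omitted here):

```python
import numpy as np

def hoof(flow_x, flow_y, n_bins=8):
    """Histogram of Oriented Optical Flow: bin flow vectors by direction,
    weight by magnitude, then L1-normalise so the descriptor is scale-free."""
    angles = np.arctan2(flow_y, flow_x)          # in [-pi, pi]
    mags = np.hypot(flow_x, flow_y)
    bins = ((angles + np.pi) / (2 * np.pi) * n_bins).astype(int) % n_bins
    hist = np.bincount(bins.ravel(), weights=mags.ravel(), minlength=n_bins)
    total = hist.sum()
    return hist / total if total > 0 else hist

rng = np.random.default_rng(0)
fx, fy = rng.normal(size=(2, 32, 32))            # stand-in flow components
desc = hoof(fx, fy)
print(desc.shape)  # (8,)
```

Magnified facial micro-motion yields a distinctive flow-direction profile for live faces, which a classifier can then separate from the near-rigid motion of prints and replays.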
Article
A technique for evaluating liveness in face image sequences is presented. Ensuring the actual presence of a live face, in contrast to a photograph (playback attack), is a significant problem in face authentication, to the extent that anti-spoofing measures are highly desirable. The purpose of the proposed system is to assist in a biometric authentication framework by adding liveness awareness in a non-intrusive manner. Analyzing the trajectories of certain parts of a live face reveals valuable information for discriminating it from a spoofed one. The proposed system uses a lightweight, novel optical flow method, especially applicable to face motion estimation, based on the structure tensor and a few input frames. For reliable face part detection, the system utilizes a model-based local Gabor decomposition and SVM experts, where selected points from a retinotopic grid are used to form regional face models. The estimated optical flow is also exploited to detect a face part. The whole procedure, starting with three images as input and finishing with a liveness score, is executed in near real-time without special-purpose hardware. Experimental results for the proposed system are presented on both a public database and spoofing attack simulations.
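A structure-tensor flow estimate at a point reduces to solving a 2x2 linear system accumulated over a local window: the classic Lucas-Kanade normal equations. The sketch below uses a synthetic translating pattern, not the paper's Gabor-based face models:

```python
import numpy as np

def structure_tensor_flow(prev, curr, y, x, win=7):
    """Solve the 2x2 system at (y, x): the structure tensor
    [sum(Ix^2), sum(IxIy); sum(IxIy), sum(Iy^2)] times v = -[sum(IxIt); sum(IyIt)]."""
    Iy, Ix = np.gradient(prev)                  # spatial gradients (axis 0 = y)
    It = curr - prev                            # temporal derivative
    h = win // 2
    sl = np.s_[y - h:y + h + 1, x - h:x + h + 1]
    ix, iy, it = Ix[sl].ravel(), Iy[sl].ravel(), It[sl].ravel()
    A = np.array([[ix @ ix, ix @ iy],
                  [ix @ iy, iy @ iy]])
    b = -np.array([ix @ it, iy @ it])
    return np.linalg.solve(A, b)                # (vx, vy) at the tracked point

# Synthetic check: a textured pattern shifted one pixel to the right.
yy, xx = np.mgrid[0:32, 0:32].astype(float)
prev = np.sin(xx / 4.0) + np.cos(yy / 5.0)
curr = np.sin((xx - 1.0) / 4.0) + np.cos(yy / 5.0)
vx, vy = structure_tensor_flow(prev, curr, 16, 16)
print(vx > 0.5 and abs(vy) < 0.5)  # True: recovers rightward motion
```

Tracking such flow vectors at several face parts over a few frames yields the part trajectories from which the liveness score is computed.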