MFR 2021: Masked Face Recognition Competition
Fadi Boutros1,2,∗, Naser Damer1,2,∗, Jan Niklas Kolf1,∗, Kiran Raja3,∗,
Florian Kirchbuchner1,∗, Raghavendra Ramachandra3,∗, Arjan Kuijper1,2,∗,
Pengcheng Fang4,+, Chao Zhang4,+, Fei Wang4,+, David Montero5,6,+, Naiara Aginako6,+,
Basilio Sierra6,+, Marcos Nieto5,+, Mustafa Ekrem Erakın7,+, Uğur Demir7,+, Hazım Kemal Ekenel7,+,
Asaki Kataoka8,+, Kohei Ichikawa8,+, Shizuma Kubo8,+, Jie Zhang9,10,+, Mingjie He9,10,+,
Dan Han9,10,+, Shiguang Shan9,10,+, Klemen Grm11,+, Vitomir Štruc11,+, Sachith Seneviratne12,+,
Nuran Kasthuriarachchi13,+, Sanka Rasnayaka14,+, Pedro C. Neto15,16,+, Ana F. Sequeira16,+,
João Ribeiro Pinto15,16,+, Mohsen Saffari15,16,+, Jaime S. Cardoso15,16,+
1Fraunhofer Institute for Computer Graphics Research IGD, Germany - 2TU Darmstadt, Germany
3Norwegian University of Science and Technology, Norway - 4TYAI, China
5VICOMTECH, Spain - 6University of the Basque Country, Spain
7SiMiT Lab, Istanbul Technical University, Turkey - 8ACES, Inc, Japan
9Institute of Computing Technology, Chinese Academy of Sciences, China
10University of Chinese Academy of Sciences, China
11University of Ljubljana, Slovenia - 12University of Melbourne, Australia
13University of Moratuwa, Sri Lanka - 14National University of Singapore, Singapore
15INESC TEC, Portugal - 16 University of Porto, Portugal
∗Competition organizer. +Competition participant.
This paper presents a summary of the Masked Face Recognition Competition (MFR) held within the 2021 International Joint Conference on Biometrics (IJCB 2021). The competition attracted a total of 10 participating teams with valid submissions. The affiliations of these teams are diverse and associated with academia and industry in nine different countries. These teams successfully submitted 18 valid solutions. The competition is designed to motivate solutions aiming at enhancing the face recognition accuracy of masked faces. Moreover, the competition considered the deployability of the proposed solutions by taking the compactness of the face recognition models into account. A private dataset representing a collaborative, multi-session, real-masked capture scenario is used to evaluate the submitted solutions. In comparison to one of the top-performing academic face recognition solutions, 10 out of the 18 submitted solutions scored higher masked face verification accuracy.
1. Introduction
Given the current COVID-19 pandemic, it is essential to enable contactless and smooth-running operations, especially in contact-sensitive facilities like airports. With the ever-enhancing performance of face recognition, the technology has been preferred as a contactless means of verifying identities in applications ranging from border control to logical access control on consumer electronics. However, wearing masks is now essential to prevent the spread of contagious diseases and is currently enforced in public places in many countries. The performance of, and thus the trust in, contactless identity verification through face recognition can be impacted by the presence of a mask [19]. The effect of wearing a mask on face recognition in a collaborative environment is currently a sensitive issue. This competition is the first to attract and present technical solutions that enhance the accuracy of masked face recognition on real face masks and in a collaborative verification scenario.
In a recent study, the National Institute of Standards and Technology (NIST), as part of the ongoing Face Recognition Vendor Test (FRVT), published a specific report (FRVT Part 6A) on the effect of face masks on the performance of face recognition systems provided by vendors [31]. The NIST study concluded that algorithm accuracy with masked faces declined substantially. One of the main limitations of that study is the use of simulated masked images under the questioned assumption that their effect represents that of real face masks. The Department of Homeland Security has conducted an evaluation with similar goals, however on more realistic data [4]. They also concluded that wearing masks has a significant negative effect on the accuracy of automatic face recognition solutions. A study by Damer et al. [12] evaluated the verification performance drop in three face biometric systems when verifying masked vs. not-masked faces, in comparison to verifying not-masked faces to each other. The authors presented limited data (24 subjects), however with real masks and multiple capture sessions. They concluded by noting the bigger effect of masks on genuine pair decisions, in comparison to imposter pair decisions. This study has been extended [11] with a larger database and an evaluation on both synthetic and real masks, pointing out the questionable use of simulated masks to represent the real mask effect on face recognition. Recent work has evaluated human performance in recognizing masked faces, in comparison to automatic face recognition solutions [10]. The study concluded with a set of take-home messages that pointed to the correlated effect of wearing masks on both human recognizers and automatic face recognition. Beyond recognition, facial masks were shown to affect both the vulnerability of face recognition to presentation attacks and the detectability of these attacks [18].
There were only a few works that address enhancing the recognition performance of masked faces. Li et al. [26] proposed to use an attention-based method to train a face recognition model on the periocular area of masked faces. This presented an improvement in masked face recognition performance, however in a limited evaluation. Moreover, the proposed approach essentially only maps the problem into a periocular recognition problem. A recent preprint by [3] presented a relatively small dataset of 53 identities crawled from the internet. The work proposed to fine-tune the FaceNet model [34] using simulated masked face images to improve the recognition accuracy. Wang et al. [35] presented three datasets crawled from the internet for face recognition, detection, and simulated masked faces. The authors claim to improve the verification accuracy from 50% to 95% on masked faces. However, they did not provide any information about the evaluation protocol, proposed solution, or implementation details. Moreover, the published part of the dataset does not contain pairs of not-masked vs. masked images for the majority of identities. A work by Montero et al. [30] proposed to combine the ArcFace loss with a specially designed mask-usage classification loss to enhance masked face recognition performance. Boutros et al. [7] proposed a template unmasking approach that can be adapted on top of any face recognition network. This approach aims to create unmasked-like templates from masked faces. This goal was achieved on top of multiple networks by the proposed self-restrained triplet loss [7]. On a related matter, a rapidly growing number of works have been published to address the detection of wearing a face mask [5, 28, 36, 33]. These studies did not address the effect of wearing a mask on the performance of face recognition or present solutions to improve masked face recognition.
Besides the exclusive interest in face recognition accuracy, there is a growing interest in compact face recognition models [29]. This interest is driven by the demand for face recognition deployment on consumer devices and the need to enhance the throughput of face recognition processes. A major challenge was organized at ICCV 2019 to motivate researchers to build lightweight face recognition models [16]. MobileFaceNets are an example of such face recognition models [9]. MixFaceNets [6] are a recent example where mixed depthwise convolutional kernels, with a tailored head and embedding design and a shuffle operation, are utilized to achieve high recognition accuracies with extremely light models.
Motivated by (a) the hygiene-driven wide use of facial masks, (b) the proven performance decay of existing face recognition solutions when processing masked faces, (c) the need to motivate novel research in the direction of enhancing masked face recognition accuracy, and (d) the requirement of lightweight models by various applications, we conducted the IJCB Masked Face Recognition Competition 2021 (IJCB-MFR-2021). The competition attracted submissions from academic and industry teams with a wide international representation. The final participation tally was 10 teams with valid submissions. These teams submitted 18 valid solutions. The solutions were evaluated on a database collected to represent a collaborative face verification scenario with individuals wearing real face masks. This paper summarises this competition with a detailed presentation of the submitted solutions and the achieved results in terms of masked vs. masked face verification accuracy, masked vs. not-masked face verification accuracy, and the compactness of the recognition models.
In the next sections, we start by introducing the competition evaluation database, the evaluation criteria, and the participating teams. Then, in Section 3, short descriptions of the submitted solutions are listed. In Section 4, we present and discuss the achieved results along with listing the winning submissions. We end the paper in Section 5 with a final general conclusion.
2. Database, evaluation criteria, and participants
2.1. Database
The evaluation data, the masked face recognition competition data (MFRC-21), simulates a collaborative, yet varying, scenario, such as the situation at automatic border control gates or when unlocking personal devices with face recognition, where the mask, illumination, and background can change. The database is collected by the hosting institute and is not publicly available. The data is collected on three different, not necessarily consecutive, days. We consider each of these days as one session. On each day, the subjects collected three videos, each of a minimum length of 5 seconds (used as single image frames). The videos are collected from static webcams (not handheld), while the subjects are requested to look at the camera, simulating a login scenario. The data is collected by the subjects at their residences during the pandemic-induced home-office period. The first session is considered a reference session, while the other two are considered probe sessions. Each day contained three types of captures: no mask, masked with natural illumination, and masked with additional illumination. The database participants were asked to remove eyeglasses only when the frame is considered very thick. No other restrictions were imposed, such as on the background or the mask type and its consistency over days, to simulate realistic scenarios. The first second of each video was neglected to avoid possible biases related to the subject's interaction with the capture device. After the neglected one second, three seconds were considered. From these three seconds, 10 frames are extracted with a gap of 9 frames between each consecutive frame, knowing that all videos are captured at a frame rate of 30 frames per second.

Data split             BLR         MR          MP
Session                Session 1   Session 1   Sessions 2 and 3
Number of Captures     470         940         1880

Table 1: An overview of the MFRC-21 database structure.

The final considered portions of the database in the competition are (a) the not-masked baseline references from the first session (noted as BLR), (b) the masked references from the first session (noted as MR), and (c) the masked face probes from the second and third sessions under both illumination scenarios (noted as MP). A summary of the used database is presented in Table 1 and samples of the database are presented in Figure 1. The database contained 47 subjects, all of whom participated in all the sessions. All the subjects provided their informed consent to use the data for research purposes.
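A minimal sketch of this frame-sampling protocol. The exact frame-indexing convention and the reading of "gap of 9 frames" as a stride of 9 between extracted frames are assumptions:

```python
FPS = 30          # all videos are captured at 30 frames per second
SKIP_SECONDS = 1  # the first second is neglected
USED_SECONDS = 3  # the following three seconds are considered
GAP = 9           # stride between consecutive extracted frames (assumed reading)

def extracted_frame_indices():
    """Return the 10 frame indices sampled from one capture video (0-based)."""
    start = SKIP_SECONDS * FPS            # first usable frame, index 30
    end = start + USED_SECONDS * FPS      # exclusive end of the usable window, index 120
    return list(range(start, end, GAP))[:10]
```

Under these assumptions the sampled indices are 30, 39, ..., 111, all inside the three-second window.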
Two evaluation setups are considered: (a) not-masked vs. masked, where all images in BLR are compared to all images in MP (noted as BLR-MP), and (b) masked vs. masked, where all images in MR are compared to all images in MP (noted as MR-MP).
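Both setups amount to exhaustive cross-comparisons between the respective splits; a sketch with placeholder image identifiers:

```python
from itertools import product

def comparison_pairs(blr, mr, mp):
    """Build the BLR-MP and MR-MP pair lists from lists of image identifiers."""
    blr_mp = list(product(blr, mp))  # not-masked references vs. masked probes
    mr_mp = list(product(mr, mp))    # masked references vs. masked probes
    return blr_mp, mr_mp
```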
2.2. Evaluation criteria
The solutions are evaluated based on both the verification performance and the compactness of the used model/models. The verification evaluation is based on the verification performance of masked vs. not-masked verification pairs, as this is the common scenario, where the reference is not-masked while the probe is masked, e.g. at the entry to a secure access area. This scenario is noted as BLR-MP. However, the performance of masked vs. masked verification pairs is also reported in this paper. This scenario is noted as MR-MP.
The verification performance is evaluated and reported as the false non-match rate (FNMR) at different operation points, FMR100 and FMR1000, which are the lowest FNMR for a false match rate (FMR) <1.0% and <0.1%, respectively. The verification performance evaluation of the submitted solutions is based on FMR100. To get an indication of generalizability, we also report a separability measure between the genuine and imposter comparison scores. This is measured by the Fisher Discriminant Ratio (FDR) as formulated in [32].
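The organizers' evaluation script is not published; the following is a straightforward implementation of the stated definitions, assuming similarity scores (higher means more likely genuine) and using the common FDR form (mean difference squared over the sum of variances):

```python
import numpy as np

def fnmr_at_fmr(genuine, imposter, fmr_target):
    """Lowest FNMR reachable while keeping FMR below fmr_target
    (fmr_target=0.01 gives FMR100, 0.001 gives FMR1000)."""
    genuine = np.asarray(genuine, dtype=float)
    imposter = np.asarray(imposter, dtype=float)
    best_fnmr = 1.0
    # sweep candidate thresholds over all observed scores
    for thr in np.unique(np.concatenate([genuine, imposter])):
        fmr = np.mean(imposter >= thr)       # imposters wrongly accepted
        if fmr < fmr_target:
            fnmr = np.mean(genuine < thr)    # genuines wrongly rejected
            best_fnmr = min(best_fnmr, fnmr)
    return best_fnmr

def fdr(genuine, imposter):
    """Fisher Discriminant Ratio between genuine and imposter score distributions."""
    g = np.asarray(genuine, dtype=float)
    i = np.asarray(imposter, dtype=float)
    return (g.mean() - i.mean()) ** 2 / (g.var() + i.var())
```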
To consider the deployability of the participating solutions, we also consider the compactness of the model (represented by the number of trainable parameters [17]) in the final ranking. The participants are asked to report the number of trainable parameters and can be asked to provide their solutions to validate this number.
The final team ranking is based on a weighted Borda count, where the participants are ranked by (a) the verification metric mentioned above (noted as Rank-a), and (b) the number of trainable parameters in their model/models (noted as Rank-b). For Rank-a, the solutions with lower FMR100 are ranked first, and for Rank-b, the solutions with the lower number of trainable parameters are ranked first. In the final ranking, Rank-a has a 75% weight and Rank-b has a 25% weight. Each participant is given a Borda count (BC) for each ranking criterion (BC-a and BC-b), where BC = total number of solutions − rank. For example, if solution X is ranked first out of 10 participants in the verification performance Rank-a (BC-a = 9) and third out of 10 solutions in model compactness Rank-b (BC-b = 7), then the weighted Borda count is w-BC = 0.75×9 + 0.25×7 = 8.5. Therefore, the final score of solution X is 8.5, and a higher score indicates a better solution. The solutions are ranked from the highest w-BC to the lowest w-BC.
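The ranking rule above can be sketched directly from its definition:

```python
def weighted_borda(rank_a, rank_b, n_solutions, w_a=0.75, w_b=0.25):
    """Weighted Borda count: BC = n_solutions - rank for each criterion,
    combined with a 75%/25% weighting of Rank-a and Rank-b."""
    bc_a = n_solutions - rank_a
    bc_b = n_solutions - rank_b
    return w_a * bc_a + w_b * bc_b

# the worked example from the text: first and third out of 10
print(weighted_borda(1, 3, 10))  # → 8.5
```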
2.3. Submission and evaluation process
(a) Not-masked baseline faces (BLR) (b) Masked faces (MR/MP)
Figure 1: Samples of the MFRC-21 database from the two capture types (BLR and MR/MP). MR and MP have similar capture settings; MR is from the first session and MP from the second and third sessions.

Each of the teams was requested to submit their solutions as Win32 or Linux console applications. These applications should be able to accept three parameters: an evaluation-list (text file), landmarks (text file), and an output path. The evaluation-list contains pairs of the paths to the reference and probe images and a label for each of the compared images, indicating if the image is masked or not. The landmarks file provides a bounding box and five landmark locations for the images, as detected by the MTCNN solution [38]. Only the pairs of images with valid detected faces are provided to the solutions in the evaluation-list. For part of the initially considered data, the face detector [38] did not provide valid face detections. For the BLR-MP pairs, 4.42% of the pairs contained invalid face detections and were thus not considered in the evaluation. For the MR-MP pairs, 4.75% of the pairs contained invalid face detections and were thus not considered in the evaluation. The output of the solution application script is a text file containing comparison scores for each pair in the evaluation-list.
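The required application behavior can be sketched as follows. The exact file formats are not specified in the paper, so the assumed line layout (`ref_path probe_path ref_label probe_label`) and the function names are illustrative only:

```python
def run(evaluation_list, landmarks_file, output_path, score_fn):
    """Skeleton of a submission console application: read the pair list,
    compare each reference/probe pair with the team's score_fn, and write
    one comparison score per line. Landmark parsing is omitted here."""
    with open(evaluation_list) as f:
        pairs = [line.split() for line in f if line.strip()]
    with open(output_path, "w") as out:
        for ref, probe, *mask_labels in pairs:
            out.write(f"{score_fn(ref, probe)}\n")
```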
2.4. Competition participants
The competition aimed at attracting participants with a
high geographic and activity variation. The call for par-
ticipation was shared on the International Joint Confer-
ence on Biometrics (IJCB 2021) website, on the competi-
tion own website 1, on public computer vision mailing lists
(e.g. CVML e-Mailing List), and through private e-Mailing
lists. The call for participation has attracted 12 registered
teams. Out of these, 10 teams have submitted valid solu-
tions. These 10 teams have affiliations based in nine differ-
ent countries. Seven of the 10 teams are affiliated with aca-
demic institutions, two are affiliated with the industry, and
one team has both academic and industry affiliations. Only
one of the participating teams has chosen to be anonymous.
Each team was allowed to submit up to two solutions. The
total number of validly submitted solutions is 18. A sum-
mary of the participating teams is presented in Table 2.
3. Submitted solutions
Ten teams have been registered for MFR 2021 compe-
tition and submitted 18 valid solutions. Table 2 presents a
summary of the registered team members and their affili-
ation, submitted solutions, and type of institution of each
registered team (Academic, Industry, or mix of both aca-
demic and industry). In the following, we provide a brief
description of the valid submitted solutions:
A1 Simple employed ArcFace [15] to train a ResNet model. A1 Simple applied the MaskTheFace [3] method to synthetically generate masked face images in the training dataset (MS1MV2). A1 Simple is trained with cosine annealing LR scheduling to adjust the learning rate. In the evaluation phase, A1 Simple used the facial landmark points and bounding box provided in MFRC-21 to align and crop the face image to 112×112 pixels. The feature embedding of the presented solution is of size 512-D. The model is trained with the ArcFace loss. During the training phase, three data augmentation methods are used: random resized crops, random horizontal flips, and color jittering.
The TYAI solution uses Sub-center ArcFace [13] and an ir-ResNet152 model to train a masked face recognition model on the Glint360K dataset [2]. The proposed solution randomly augmented half of the training dataset with synthetically generated masks using five types of transparent masks. The input image size of the proposed model is 112×112 and the size of the output feature embedding is 512-D. During training, four additional data augmentation methods are used: random crop (resizing the image to 128×128 and then randomly cropping it to 112×112), random horizontal flip, random rotation, and random affine transformation. The solution is trained with the Sub-center ArcFace loss.
Solution | Team members | Affiliations | Type of institution
A1 Simple | Asaki Kataoka, Kohei Ichikawa, Shizuma Kubo | ACES, Inc, Japan | Industry
TYAI | Pengcheng Fang, Chao Zhang, Fei Wang | TYAI, China | Industry
MaskedArcFace | David Montero, Naiara Aginako, Basilio Sierra, Marcos Nieto | Vicomtech, Spain - University of the Basque Country, Spain | Academic
MFR-NMRE-F | Klemen Grm, Vitomir Štruc | University of Ljubljana, Slovenia | Academic
MUFM Net | Sachith Seneviratne, Nuran Kasthuriarachchi, Sanka Rasnayaka | University of Melbourne, Australia - National University of Singapore, Singapore - University of Moratuwa, Sri Lanka | Academic
VIPLFACE-M | Jie Zhang, Mingjie He, Dan Han, Shiguang Shan | Institute of Computing Technology, Chinese Academy of Sciences, China - University of Chinese Academy of Sciences, China | Academic
SMT-MFR-1 | Mustafa Ekrem Erakın, Uğur Demir, Hazım Kemal Ekenel | Smart Interaction and Machine Intelligence Lab (SiMiT Lab), Istanbul Technical University, Turkey | Academic
LMI-SMT-MFR-1 | Mustafa Ekrem Erakın, Uğur Demir, Hazım Kemal Ekenel, Klemen Grm, Vitomir Štruc | Istanbul Technical University, Turkey - University of Ljubljana, Slovenia | Academic
IM-MFR | Pedro C. Neto, Ana F. Sequeira, João Ribeiro Pinto, Mohsen Saffari, Jaime S. Cardoso | INESC TEC, Portugal - University of Porto, Faculty of Engineering (FEUP), Portugal | Academic
Anonymous-1 | Anonymous | Anonymous | mix

Table 2: A summary of the submitted solutions, participating team members, affiliations, and type of institution (Industry, Academic, or mix). The table lists the abbreviations of each submitted solution. Details of the submitted algorithms are in Section 3.

Solution | FMR100 | FMR1000 | FDR | Rank-a | BC-a | Number of parameters | Rank-b | BC-b | w-BC | Rank
Baseline | 0.06009 | 0.07154 | 8.6899 | - | - | 65155648 | - | - | - | -
TYAI | 0.05095 | 0.05503 | 11.2005 | 1 | 17 | 70737600 | 14 | 4 | 13.75 | 1
MaskedArcFace | 0.05687 | 0.05963 | 10.4484 | 5 | 13 | 43589824 | 6 | 12 | 12.75 | 2
SMT-MFR-2 | 0.05584 | 0.06268 | 11.2025 | 3 | 15 | 65131000 | 12 | 6 | 12.75 | 2
A1 Simple | 0.05538 | 0.06113 | 8.5147 | 2 | 16 | 87389138 | 16 | 2 | 12.5 | 3
VIPLFACE-M | 0.05681 | 0.06279 | 8.2371 | 4 | 14 | 65128768 | 10 | 8 | 12.5 | 3
MTArcFace | 0.05699 | 0.05860 | 10.7497 | 6 | 12 | 43640002 | 7 | 11 | 11.75 | 4
SMT-MFR-1 | 0.05704 | 0.06003 | 10.6824 | 7 | 11 | 65131000 | 12 | 6 | 9.75 | 5
VIPLFACE-G | 0.05750 | 0.07269 | 8.1693 | 9 | 9 | 65128768 | 10 | 8 | 8.75 | 6
MFR-NMRE-B | 0.05819 | 0.08344 | 7.9504 | 10 | 8 | 43723943 | 8 | 10 | 8.5 | 7
LMI-SMT-MFR-1 | 0.05722 | 0.06205 | 9.7384 | 8 | 10 | 108854000 | 17 | 1 | 7.75 | 8
MFR-NMRE-F | 0.08125 | 0.17660 | 5.3876 | 12 | 6 | 43723943 | 8 | 10 | 7 | 9
MUFM Net | 0.17579 | 0.40489 | 4.4640 | 14 | 4 | 25636712 | 3 | 15 | 6.75 | 10
IM-AMFR | 0.28252 | 0.47608 | 3.7414 | 15 | 3 | 36898792 | 4 | 14 | 5.75 | 11
LMI-SMT-MFR-2 | 0.05848 | 0.07096 | 8.5278 | 11 | 7 | 108854000 | 17 | 1 | 5.5 | 12
Anonymous-1 | 0.92536 | 0.96596 | 0.1011 | 17 | 1 | 23777281 | 1 | 17 | 5 | 13
IM-MFR | 0.28447 | 0.47430 | 3.7369 | 16 | 2 | 36898792 | 4 | 14 | 5 | 13
EMUFM Net | 0.16239 | 0.35681 | 4.5445 | 13 | 5 | 76910136 | 15 | 3 | 4.5 | 14
Anonymous-2 | 0.97125 | 0.99517 | 0.0426 | 18 | 0 | 23777281 | 1 | 17 | 4.25 | 15

Table 3: The comparative evaluation of the submitted solutions on the MFRC-21 dataset. The results are presented in terms of verification performance, including FMR100, FMR1000, and FDR, and the model compactness in terms of the number of trainable parameters. The FMR100 and FMR1000 are given as absolute values. The rank of the verification performance (Rank-a) is based on FMR100 and the rank of the solution compactness (Rank-b) is based on the number of parameters. Rank-a has 75% weight and Rank-b has 25% weight. The results are ordered based on the weighted Borda count (w-BC).

Mask-aware ArcFace (MaskedArcFace) opts to generate a masked twin dataset from the MS1MV2 [20, 15] dataset and to combine them during the training process. Both datasets are shuffled separately using the same seed and, for every new face image selected for the input batch, MaskedArcFace decides whether the image is taken from the original (not-masked) or the masked dataset with a probability of 50%. MaskedArcFace uses ArcFace [15] as the baseline work. MaskedArcFace selects the dataset recommended by ArcFace (MS1MV2) [20, 15] as the training dataset, which contains 5.8M images of 85,000 identities. MaskedArcFace uses IResNet-50 as the backbone among all the network architectures tested in the ArcFace repository, as it offers a good trade-off between the accuracy and the number of parameters. For the generation of the masked version of the dataset, MaskedArcFace uses MaskTheFace [3]. The types of masks considered are surgical, surgical green, surgical blue, N95, cloth, and KN95. The mask type is selected randomly, with a 50% probability of applying a random color and a 50% probability of applying a random texture. During the evaluation phase, MaskTheFace uses the landmark points and the bounding box provided by the competition to align and crop face images. The feature embedding produced by the MaskedArcFace solution is of size 512-D and the input face image is of size 112×112 pixels.
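The 50/50 original/masked sampling that MaskedArcFace applies per batch slot can be sketched as follows; the dataset contents and function name are placeholders:

```python
import random

def pick_training_image(idx, original, masked, rng):
    """Draw the image for one batch slot from the not-masked dataset or its
    synthetically masked twin with equal probability. Both datasets are
    assumed to be shuffled with the same seed, so index idx points at the
    same identity in either copy."""
    source = masked if rng.random() < 0.5 else original
    return source[idx]
```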
Multi-task ArcFace (MTArcFace) utilized the same training dataset, loss function, backbone, and mask generation method as MaskedArcFace. MTArcFace adds another dense layer in parallel to the one used to generate the feature vector by IResNet-50, just after the dropout layer. The new dense layer generates an output with two floats, which correspond to the scores related to the probability that the face is masked or not, respectively. This way, MTArcFace aims to force the network to learn when a face is wearing a mask. This information is also used by the layer that generates the feature vector. The data preprocessing steps and the size of the feature embedding are identical to those of MaskedArcFace.
Solution  FMR100  FMR1000  FDR  Rank
Baseline 0.05925 0.06504 9.68640 -
TYAI 0.04489 0.05961 12.36306 1
VIPLFACE-M 0.05759 0.06788 8.98593 2
A1 Simple 0.05771 0.06368 10.48611 3
SMT-MFR-2 0.05792 0.06172 11.30901 4
MaskedArcFace 0.05825 0.06245 10.57307 5
SMT-MFR-1 0.05825 0.06012 11.03444 6
VIPLFACE-G 0.05843 0.06359 9.41466 7
MTArcFace 0.0585 0.06390 10.16996 8
LMI-SMT-MFR-1 0.05856 0.06061 9.90914 9
LMI-SMT-MFR-2 0.05916 0.06586 8.87424 10
MFR-NMRE-B 0.05970 0.12903 8.11963 11
MFR-NMRE-F 0.09630 0.1989 4.73224 12
EMUFM Net 0.15045 0.31945 4.45317 13
MUFM Net 0.16354 0.37607 4.43278 14
IM-AMFR 0.23507 0.40265 3.94744 15
IM-MFR 0.23661 0.40373 3.94905 16
Anonymous-1 0.89481 0.97584 0.19968 17
Anonymous-2 0.9114 0.98102 0.16569 18
Table 4: The comparative evaluation results of the submitted solutions for masked vs. masked verification pairs (MR-MP), where both references and probes are masked. The performance is reported in terms of FMR100, FMR1000, and FDR. The FMR100 and FMR1000 are given as absolute values. The reported results are ordered based on FMR100.
Masked face recognition using non-masked region extraction and fine-tuned recognition model (MFR-NMRE-F): based on the 5-point face landmark detections, the proposed approach identifies a crop that corresponds to the upper facial region where masks are not visible. Then, MFR-NMRE-F fine-tuned a VGG2-SE-ResNet-50 face recognition model for the classification task on these crops using the VGGFace2 [8] training dataset processed with the RetinaFace [14] detector. For the evaluation, MFR-NMRE-F uses the face landmarks provided by MFRC-21, since they correspond closely to the RetinaFace results obtained on the training dataset. Using the landmark coordinates, the MFR-NMRE-F solution extracts the upper face region, extracts feature vectors using the fine-tuned VGG2-SE-ResNet-50 model, and compares features using the cosine similarity measure. The proposed method is trained using the cross-entropy (CE) loss. The input size of the proposed model is 96×192 and the feature embedding size is 2048-D.
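The cosine similarity comparison used by this and several other submitted solutions can be sketched as:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors, e.g. the 2048-D
    embeddings extracted from the upper-face crops."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```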
Solution  Input size  FM  Loss function  RM  SM
A1 Simple  112 x 112  512  ArcFace  No  Yes
TYAI  112 x 112  512  Sub-center ArcFace  No  Yes
MaskedArcFace  112 x 112  512  ArcFace  No  Yes
MTArcFace  112 x 112  512  ArcFace  No  Yes
MFR-NMRE-F  96 x 192  2048  CE  No  No
MFR-NMRE-B  112 x 224  2048  CE  No  No
MUFM Net  224 x 224  2048  CE  No  Yes
EMUFM Net  224 x 224  2048  CE  No  Yes
VIPLFACE-M  112 x 112  512  ArcFace  No  Yes
VIPLFACE-G  112 x 112  512  ArcFace  No  No
SMT-MFR-1  112 x 112  512  ArcFace  Yes  No
SMT-MFR-2  112 x 112  512  ArcFace  Yes  No
LMI-SMT-MFR-1  96 x 192  2048  CE  No  No
(second model)  112 x 112  512  ArcFace  Yes  No
LMI-SMT-MFR-2  112 x 224  2048  CE  No  No
(second model)  112 x 112  512  ArcFace  Yes  No
IM-MFR  224 x 224  512  CE, triplet loss and MSE  No  Yes
IM-AMFR  224 x 224  512  CE, triplet loss and MSE  No  Yes
Anonymous-1  160 x 160  512  CE  No  Yes
Anonymous-2  160 x 160  512  CE  No  Yes

Table 5: Basic details of the submitted solutions, including the input image size, the feature embedding size (FM), the loss function used for training, and the use of real masked faces (RM) and simulated masked faces (SM) in the training process. The solutions in bold are the ones ranked top in the competition. Note that all the top-ranked solutions used a version of the ArcFace loss [15, 13].

Masked face recognition using non-masked region extraction and pre-trained recognition model (MFR-NMRE-B) identifies a crop that corresponds to the upper facial region where masks are not visible, based on the 5-point face landmarks. MFR-NMRE-B utilized a VGG2-SE-ResNet-50 model pre-trained for the classification task on the VGGFace2 [8] training dataset. Different from MFR-NMRE-F, the MFR-NMRE-B solution did not fine-tune the feature extraction model with cropped images. For the evaluation, the proposed method uses the face landmarks provided by MFRC-21. Using the landmark coordinates, the proposed method crops the upper face region, extracts feature vectors using the VGG2-SE-ResNet-50 model, and compares features using the cosine similarity measure. MFR-NMRE-B is trained using the softmax cross-entropy loss. The input size of the proposed model is 112×224 and the feature embedding size is 2048-D.
Masked-Unmasked Face Matching Net (MUFM Net) utilizes Momentum Contrast (MoCo) [21] to create an initial embedding using a ResNet-50 model trained on the CelebA dataset [27]. Then, synthetic masked versions of CelebA, Specs on Faces [1], YouTube Faces [37], and LFW [25] are created as defined in [31]. The initial model is fine-tuned using these datasets. For fine-tuning, MUFM Net uses a siamese network with shared weights, with absolute differences taken at the last bottleneck layer. This difference is fed into a 512-node fully connected layer followed by a single softmax node. The model is fine-tuned with the binary cross-entropy loss with 50% of the layers frozen. The input size of the presented model is 224×224 pixels.
Ensemble MUFM Net (EMUFM Net) builds upon MUFM Net to create an ensemble. First, the best-performing MUFM models are selected based on the validation accuracy. The selected models are M1 (obtained after 695K iterations) and M2 (obtained after 885K iterations). These models are fine-tuned on hard examples drawn from the training set. Three models are fine-tuned: E1 and E2 build on M1, with 90% and 80% of the layers frozen, respectively, and E3 builds on M2 with 50% of the layers frozen. All these models have an input of size 224×224 and an output embedding of size 2048-D. During the testing phase, the similarity scores of these three models (E1-E3) are averaged to provide the final similarity score.
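The fusion step is a plain average of the per-model similarity scores for each comparison pair:

```python
def ensemble_score(model_scores):
    """Average the similarity scores of the ensemble members (E1-E3)
    for one comparison pair."""
    return sum(model_scores) / len(model_scores)
```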
VIPLFACE-M adopted ResNet-100 [22] and ArcFace
loss [15] for face recognition. The proposed solution uses a
refined version of MS1M dataset [20] for training the pro-
posed solution. The number of face images in the training
dataset is 3.8M of 50K identities. VIPLFACE-M uses the
synthetic mask creation method defined in 2to add synthetic
masks on part of the training dataset. The number of syn-
thetically masked face images used in the training is 500K
and the number of synthetically masked identities is 50K.
During the training phase, the proposed solution uses ran-
dom flipping as a data augmentation method. The input size
of the presented solution is 112 ×112 and the output feature
embedding size is 512-D.
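Several submissions, including VIPLFACE-M, train with the ArcFace loss [15]. A minimal numpy sketch of the additive angular margin logit computation; the scale s=64 and margin m=0.5 are the values reported in the ArcFace paper, while the toy dimensions and random weights are assumptions for illustration:

```python
import numpy as np

def arcface_logits(embedding, class_centres, label, s=64.0, m=0.5):
    """Cosine logits between the L2-normalised embedding and the
    L2-normalised class centres, with the additive angular margin m
    applied to the target-class angle and the result scaled by s."""
    e = embedding / np.linalg.norm(embedding)
    w = class_centres / np.linalg.norm(class_centres, axis=1, keepdims=True)
    cos = w @ e                                # cosine to each class centre
    theta = np.arccos(np.clip(cos, -1.0, 1.0))
    cos[label] = np.cos(theta[label] + m)      # penalise the target class
    return s * cos

def softmax_cross_entropy(logits, label):
    z = logits - logits.max()
    probs = np.exp(z) / np.exp(z).sum()
    return float(-np.log(probs[label]))

rng = np.random.default_rng(2)
emb = rng.normal(size=512)
centres = rng.normal(size=(10, 512))   # 10 toy identities

loss_margin = softmax_cross_entropy(arcface_logits(emb, centres, 3), 3)
loss_plain = softmax_cross_entropy(arcface_logits(emb, centres, 3, m=0.0), 3)
```

The margin shrinks the target-class logit, so the loss with the margin is strictly harder to minimise than plain softmax cross-entropy; this is the penalty that enforces inter-class separability.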
VIPLFACE-G is based on training ResNet-100 model
[22] with ArcFace loss [15]. The input size of the presented
solution is 112×112 and the feature embedding size is 512-
D. The model is trained on a clean version of MS1M [20]
that contains 5.8M images of 80K identities. The presented solution
uses random flip to augment the dataset during training.
SiMiT Lab – Masked Face Recognition–1 (SMT-MFR-
1) employs the LResNet100E-IR model [22] trained with
the ArcFace loss function [15]. The model is originally trained
on the MS1MV2 dataset [20, 15]. The SMT-MFR-1 solution depends on fine-tuning LResNet100E-IR using two real-world masked face datasets: Real World Occluded Faces (ROF) and the MFR2 dataset [3]. MFR2 contains 296 images of 53 identities. The ROF dataset is crawled from the internet and contains 678 masked face images and 1853 not-masked face images of 123 identities. The proposed solution is fine-tuned using the ROF dataset and a part of the MFR2 dataset (35 identities). The model processes input images of size 112 × 112 to produce a feature embedding of size 512-D.
During training, the dataset is augmented using horizontal flipping.
SiMiT Lab – Masked Face Recognition–2 (SMT-MFR-
2) is conceptually identical to SMT-MFR-1. Different
from SMT-MFR-1, the SMT-MFR-2 model is fine-tuned us-
ing the ROF dataset and the entire MFR2 dataset.
LMI - SiMiT Lab - Masked Face Recognition - 1 (LMI-SMT-MFR-1) is a combination of two solutions: MFR-NMRE-F and SMT-MFR-1. First, the features are extracted separately by each of the two solutions. Then, the comparison scores are calculated for each solution. To combine the scores, the cosine similarity measures of MFR-NMRE-F are converted to Euclidean distances; the output of SMT-MFR-1 is already a Euclidean distance. After this, the scores are normalized separately for each solution. Then, both scores are multiplied to generate the ensemble score.
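The fusion steps above can be sketched as follows. The min-max normalisation is an assumption, as the exact normalisation method is not specified; the cosine-to-Euclidean conversion uses the identity ||a - b||^2 = 2 - 2 cos(a, b), which holds for L2-normalised embeddings:

```python
import numpy as np

def cosine_to_euclidean(cos_sim):
    """For L2-normalised embeddings: ||a - b||^2 = 2 - 2 * cos(a, b)."""
    return np.sqrt(np.clip(2.0 - 2.0 * cos_sim, 0.0, None))

def min_max_normalise(scores):
    return (scores - scores.min()) / (scores.max() - scores.min())

# Hypothetical comparison scores of both systems over the same 100 pairs.
rng = np.random.default_rng(3)
cos_scores = rng.uniform(-1.0, 1.0, size=100)  # cosine similarities (system 1)
euc_scores = rng.uniform(0.0, 2.0, size=100)   # Euclidean distances (system 2)

dist_1 = min_max_normalise(cosine_to_euclidean(cos_scores))
dist_2 = min_max_normalise(euc_scores)
fused = dist_1 * dist_2   # product fusion of the two normalised distances
```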
LMI - SiMiT Lab - Masked Face Recognition - 2 (LMI-SMT-MFR-2) is also a combination of two solutions and follows the same score fusion method described in the LMI-SMT-MFR-1 solution.
Ignoring masks for accurate masked face recognition
(IM-MFR) approach consists of two different training
processes. The first, which aims to build a classification
model, uses 6000 training identities from the VGGFace2
dataset [8] to minimize the cross-entropy while classifying
these images. Each image has a 65% probability of being masked. All training images are randomly resized and cropped to 224 × 224 pixels. In this solution, the mask creation method [31] is applied using the open implementation by Boutros et al. [7]. After achieving above 96% classification accuracy on the validation set, the last fully-connected layer is replaced with a fully-connected layer with 512 output units. All the layers, except the newest one, are then frozen. The last layer is trained with a joint Triplet Loss and MSE objective for metric learning. The backbone network is a ResNet-50 [23]. The model is trained for 65k iterations.
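One plausible form of the joint Triplet + MSE objective described above can be sketched as follows: the triplet term separates identities, while the MSE term pulls the masked embedding towards the unmasked embedding of the same identity. The equal weighting (alpha), the margin value, and the toy embeddings are assumptions, as the paper does not give the exact formulation:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    d_ap = float(np.sum((anchor - positive) ** 2))
    d_an = float(np.sum((anchor - negative) ** 2))
    return max(d_ap - d_an + margin, 0.0)

def joint_loss(masked, unmasked, negative, alpha=1.0):
    """Triplet term separates identities; the MSE term pulls the masked
    embedding towards the unmasked embedding of the same identity."""
    mse = float(np.mean((masked - unmasked) ** 2))
    return triplet_loss(masked, unmasked, negative) + alpha * mse

rng = np.random.default_rng(4)
unmasked = rng.normal(size=512)                      # same identity, no mask
masked = unmasked + rng.normal(scale=0.1, size=512)  # same identity, masked
negative = rng.normal(size=512)                      # different identity

loss = joint_loss(masked, unmasked, negative)
```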
Ignoring masks for accurate masked face recognition
(IM-AMFR) follows the same training procedure, archi-
tecture, and loss function as IM-MFR. The only difference is the number of training iterations: the IM-AMFR model is trained for 32k iterations.
anonymous-1 and anonymous-2 employ FaceNet
[34] as the base architecture, pre-trained on VGGFace2 [8].
MaskTheFace [3] is used to augment the LFW [25, 24]
dataset and create a masked-face dataset. A masked ver-
sion of each image in LFW is created. The FaceNet model
is then fine-tuned using the augmented dataset. In the
anonymous-1 solution, the model is fine-tuned using only
masked face images. In the anonymous-2 solution, the
model is fine-tuned using pairs of unmasked and masked
images. For inference, FaceNet outputs a
512-dimensional embedding, while the input size for both
solutions is 160 × 160 pixels. One must note that while the presented approach is reasonable, the verification accuracy presented in Section 4 is extremely low, which might indicate an implementation error in the submission.
Baseline The baseline is chosen to put the submitted approaches in perspective of state-of-the-art face recognition performance. The considered baseline is ArcFace, which scored state-of-the-art performance on several face recognition evaluation benchmarks, such as LFW (99.83%) and YTF (98.02%), by using an Additive Angular Margin loss (ArcFace) to improve the discriminative ability of the face recognition model. We consider ArcFace based on the ResNet-100 [22] architecture, pretrained on a refined version of the MS-Celeb-1M dataset [20] (MS1MV2).
Figure 2: The ROC curve scored by the top 10 solutions in
the BLR-MP experimental setting.
4. Results and analyses
This section presents the comparative evaluation results of the submitted solutions. We first present the achieved results
on the BLR-MP evaluation setting and the model compact-
ness. Then, we present the achieved results on the MR-MP
evaluation setting.
4.1. Not-masked vs. masked (BLR-MP)
Table 3 presents the comparative evaluation results achieved by the submitted solutions for the BLR-MP evaluation setting, together with the model compactness. The results are reported and
ranked based on the evaluation criteria described in Section
2.2. From the reported results in Table 3, we make the following observations:
Based on the defined evaluation criteria in Section 2.2, the
top-ranked solution based on the weighted Borda count is
TYAI (rank 1), followed by MaskedArcFace and SMT-
MFR-2 (rank 2) and then A1 Simple and VIPLFACE-M
(rank 3).
Most of the presented solutions achieved a competitive
verification performance, in comparison to the baseline.
Ten out of 18 solutions achieved higher verification per-
formance than the baseline solution for the BLR-MP eval-
uation setting, as reported in Table 3 and Figure 2. Figure 2 presents the achieved verification performance in terms of Receiver Operating Characteristic (ROC) curves for the top 10 solutions in the BLR-MP experimental setting. The best verification performance in terms of FMR100 is achieved by the TYAI solution, with an FMR100 of 0.05095 (Table 3 and Figure 3a).
By comparing the verification performances reported in
Table 3 and the loss function utilized by each of the solutions reported in Table 5, it is noted that the models trained
with margin-based softmax loss (ArcFace or Sub-center
ArcFace loss) achieved higher verification performance
than the models trained with other loss functions includ-
ing cross-entropy and triplet loss. This points out the gen-
eralizability brought by the nature of the marginal penalty
that forces a better separability between classes (identi-
ties) and better compactness within classes.
The solutions that achieved an FMR100 competitive with the baseline solution have relatively higher separability between genuine and imposter scores (FDR) than the solutions that achieved relatively lower verification performance. Regarding model compactness, all solutions contain between 23M and 108M parameters, as shown in Table 3
and Figure 3c. The top 3 ranked solutions have less than
87M parameters. This indicates that utilizing a larger and deeper model does not, by itself, lead to higher verification performance.
The common strategy used by the submitted solutions to improve masked face recognition performance is to augment the training dataset with simulated masks. All submitted solutions depended on training or fine-tuning a face recognition model with masked face images (real or simulated). However, none of the presented solutions proposes an approach that could be applied on top of an existing face recognition model, as in [7]. Furthermore, none of the presented solutions has clearly benefited from the mask labels included in the evaluation list. Four of the five top-ranked solutions utilized synthetically
Figure 3: (a) The FMR100 scored by the top 14 solutions in the BLR-MP experimental setting. (b) The FMR100 scored by the top 14 solutions in the MR-MP experimental setting. (c) The number of trainable parameters in the top 16 solutions.
generated masks to augment the training dataset with simulated masked images. Utilizing such a method is usually easier than alternatives, such as collecting a real masked training dataset: gathering a large-scale training dataset with pairs of not-masked/masked face images is not a trivial task.
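For reference, the FMR100 values reported throughout this section denote the false non-match rate at the decision threshold where the false match rate reaches 1%. A minimal sketch of its computation from similarity scores; the toy score distributions are assumptions for illustration:

```python
import numpy as np

def fmr100(genuine, impostor):
    """False non-match rate at the decision threshold where the false
    match rate is 1% (higher score = better match)."""
    impostor = np.sort(impostor)
    # Threshold accepting at most 1% of impostor comparisons.
    thr = impostor[int(np.ceil(0.99 * len(impostor))) - 1]
    return float(np.mean(genuine <= thr))

rng = np.random.default_rng(5)
genuine = rng.normal(0.7, 0.1, size=10_000)   # toy genuine similarity scores
impostor = rng.normal(0.2, 0.1, size=10_000)  # toy impostor similarity scores

err = fmr100(genuine, impostor)   # small for well-separated distributions
```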
4.2. Masked vs. masked (MR-MP)
The verification performance of the experimental setting
MR-MP for all submitted solutions is presented in Table 4.
The achieved verification performance is reported in terms
of FMR100, FMR1000, and FDR. The presented results are
ordered and ranked based on the achieved FMR100. It can
be noted from the reported verification performance in Ta-
ble 4 that ten out of 18 solutions achieved better verification
performance than the baseline solution when comparing
masked references to masked probes (MR-MP). The TYAI solution achieved the best verification performance, followed by VIPLFACE-M and A1 Simple. By comparing the reported
verification performance of BLR-MP evaluation setting (Ta-
ble 3) and the reported one of MR-MP (Table 4), we can
observe the following: a) Most of the solutions have higher
separability between genuine and imposter scores (higher
FDR) when both reference and probe are masked (MR-MP)
than in the case where only the probe is masked (BLR-MP).
b) The top-ranked solutions in the MR-MP evaluation set-
ting are also ranked among the top solutions in the BLR-MP
evaluation setting.
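The FDR (Fisher Discriminant Ratio) used above as a separability measure can be computed directly from the genuine and impostor score distributions as the squared difference of their means over the sum of their variances; the toy distributions below are assumptions:

```python
import numpy as np

def fisher_discriminant_ratio(genuine, impostor):
    """FDR between the genuine and impostor score distributions;
    larger values indicate better separability."""
    num = (genuine.mean() - impostor.mean()) ** 2
    den = genuine.var() + impostor.var()
    return float(num / den)

rng = np.random.default_rng(6)
genuine = rng.normal(0.7, 0.1, size=5_000)

well_separated = fisher_discriminant_ratio(
    genuine, rng.normal(0.2, 0.1, size=5_000))  # distant impostor scores
overlapping = fisher_discriminant_ratio(
    genuine, rng.normal(0.6, 0.1, size=5_000))  # heavily overlapping scores
```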
5. Conclusion
Motivated by the pandemic-driven use of facial masks, the Masked Face Recognition Competition (MFR 2021) was organized to encourage and evaluate face recognition solutions designed to perform well on masked faces.
A total of 10 teams from 11 affiliations participated in the
competition and contributed 18 solutions for the evaluation.
The evaluation focused on not-masked vs. masked face ver-
ification accuracy, the masked vs. masked face verification
accuracy, and the face recognition model compactness. Out
of the 18 submitted solutions, 10 achieved lower verifica-
tion error (FMR100) than the considered baseline. Most of
the top-performing solutions used variations of the ArcFace
loss and either real or simulated masked face databases in
their training process. The lowest achieved FMR100 for the
not-masked vs. masked evaluation was 5.1%, in compari-
son to an FMR100 of 6.0% scored by the baseline.
Acknowledgments: This research work has been funded
by the German Federal Ministry of Education and Research
and the Hessen State Ministry for Higher Education, Re-
search and the Arts within their joint support of the National
Research Center for Applied Cybersecurity ATHENE.
References
[1] M. Afifi and A. Abdelhamed. AFIF4: Deep gender classification based on AdaBoost-based fusion of isolated facial features and foggy faces. J. Vis. Commun. Image Represent., 62:77–86, 2019.
[2] X. An, X. Zhu, Y. Xiao, L. Wu, M. Zhang, Y. Gao, B. Qin,
D. Zhang, and Y. Fu. Partial FC: training 10 million identities
on a single machine. CoRR, abs/2010.05222, 2020.
[3] A. Anwar and A. Raychowdhury. Masked face recognition for secure authentication. arXiv preprint arXiv:2008.11104, 2020.
[4] A. Vemury, J. Hasselgren, J. Howard, and Y. Sirotin. 2020 biometric rally results - face masks and face recognition performance. Rally2020/Results2020, 2020. Last accessed: June 30, 2021.
[5] B. Batagelj, P. Peer, V. Štruc, and S. Dobrišek. How to correctly detect face-masks for covid-19 from visual information? Applied Sciences, 11(5), 2021.
[6] F. Boutros, N. Damer, M. Fang, F. Kirchbuchner, and A. Kui-
jper. MixFaceNets: Extremely efficient face recognition net-
works. In 2021 IEEE IJCB, IJCB 2021, Shenzhen, China ,
August 4 - 7, 2021, pages 1–8. IEEE, 2021.
[7] F. Boutros, N. Damer, F. Kirchbuchner, and A. Kuijper. Unmasking face embeddings by self-restrained triplet loss for accurate masked face recognition. CoRR, abs/2103.01716, 2021.
[8] Q. Cao, L. Shen, W. Xie, O. M. Parkhi, and A. Zisserman. Vggface2: A dataset for recognising faces across pose and age. In 2018 13th IEEE FG (FG 2018), pages 67–74. IEEE, 2018.
[9] S. Chen, Y. Liu, X. Gao, and Z. Han. Mobilefacenets: Effi-
cient cnns for accurate real-time face verification on mobile
devices. In CCBR, volume 10996 of Lecture Notes in Com-
puter Science, pages 428–438. Springer, 2018.
[10] N. Damer, F. Boutros, M. Süßmilch, M. Fang, F. Kirchbuchner, and A. Kuijper. Masked face recognition: Human vs. machine. CoRR, abs/2103.01924, 2021.
[11] N. Damer, F. Boutros, M. Süßmilch, F. Kirchbuchner, and A. Kuijper. An extended evaluation of the effect of real and simulated masks on face recognition performance. IET Biometrics, 2021.
[12] N. Damer, J. H. Grebe, C. Chen, F. Boutros, F. Kirchbuchner, and A. Kuijper. The effect of wearing a mask on face recognition performance: an exploratory study. In BIOSIG, volume P-306 of LNI. Gesellschaft für Informatik e.V., 2020.
[13] J. Deng, J. Guo, T. Liu, M. Gong, and S. Zafeiriou. Sub-
center arcface: Boosting face recognition by large-scale
noisy web faces. In ECCV (11), volume 12356 of Lecture
Notes in Computer Science, pages 741–757. Springer, 2020.
[14] J. Deng, J. Guo, E. Ververas, I. Kotsia, and S. Zafeiriou. Reti-
naface: Single-shot multi-level face localisation in the wild.
In CVPR, pages 5202–5211. IEEE, 2020.
[15] J. Deng, J. Guo, N. Xue, and S. Zafeiriou. Arcface: Ad-
ditive angular margin loss for deep face recognition. In
IEEE CVPR, CVPR 2019, Long Beach, CA, USA, June 16-
20, 2019, pages 4690–4699, 2019.
[16] J. Deng, J. Guo, D. Zhang, Y. Deng, X. Lu, and S. Shi.
Lightweight face recognition challenge. In 2019 IEEE/CVF
ICCV, ICCV Workshops 2019, Seoul, Korea (South), October
27-28, 2019, pages 2638–2646. IEEE, 2019.
[17] P. Dollár, M. Singh, and R. B. Girshick. Fast and accurate model scaling. CoRR, abs/2103.06877, 2021.
[18] M. Fang, N. Damer, F. Kirchbuchner, and A. Kuijper. Real
masks and fake faces: On the masked face presentation at-
tack detection. CoRR, abs/2103.01546, 2021.
[19] M. Gomez-Barrero, P. Drozdowski, C. Rathgeb, J. Patino,
M. Todisco, A. Nautsch, N. Damer, J. Priesnitz, N. W. D.
Evans, and C. Busch. Biometrics in the era of COVID-19:
challenges and opportunities. CoRR, abs/2102.09258, 2021.
[20] Y. Guo, L. Zhang, Y. Hu, X. He, and J. Gao. Ms-celeb-1m:
A dataset and benchmark for large-scale face recognition.
In ECCV (3), volume 9907 of Lecture Notes in Computer
Science, pages 87–102. Springer, 2016.
[21] K. He, H. Fan, Y. Wu, S. Xie, and R. B. Girshick. Momentum
contrast for unsupervised visual representation learning. In
2020 IEEE/CVF CVPR, CVPR 2020, Seattle, WA, USA, June
13-19, 2020, pages 9726–9735. IEEE, 2020.
[22] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning
for image recognition. In 2016 IEEE CVPR, CVPR 2016,
Las Vegas, NV, USA, June 27-30, 2016, pages 770–778. IEEE
Computer Society, 2016.
[23] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning
for image recognition. In Proceedings of the IEEE CVPR,
pages 770–778, 2016.
[24] G. B. Huang, M. A. Mattar, H. Lee, and E. G. Learned-
Miller. Learning to align from scratch. In NIPS, pages 773–
781, 2012.
[25] G. B. Huang, M. Ramesh, T. Berg, and E. Learned-Miller.
Labeled faces in the wild: A database for studying face
recognition in unconstrained environments. Technical Re-
port 07-49, Uni. of Massachusetts, Amherst, October 2007.
[26] Y. Li, K. Guo, Y. Lu, and L. Liu. Cropping and attention
based approach for masked face recognition. Applied Intel-
ligence, pages 1–14, 2021.
[27] Z. Liu, P. Luo, X. Wang, and X. Tang. Deep learning face at-
tributes in the wild. In ICCV, pages 3730–3738. IEEE Com-
puter Society, 2015.
[28] M. Loey, G. Manogaran, M. H. N. Taha, and N. E. M. Khal-
ifa. A hybrid deep transfer learning model with machine
learning methods for face mask detection in the era of the
covid-19 pandemic. Measurement, 167:108288, 2021.
[29] Y. Martínez-Díaz, M. Nicolás-Díaz, H. Méndez-Vázquez, L. S. Luevano, L. Chang, M. Gonzalez-Mendoza, and L. E. Sucar. Benchmarking lightweight face architectures on specific face recognition scenarios. Artificial Intelligence Review, pages 1–44, 2021.
[30] D. Montero, M. Nieto, P. Leskovský, and N. Aginako. Boosting masked face recognition with multi-task arcface. CoRR, abs/2104.09874, 2021.
[31] M. L. Ngan, P. J. Grother, and K. K. Hanaoka. Ongoing face
recognition vendor test (frvt) part 6b: Face recognition accu-
racy with face masks using post-covid-19 algorithms. 2020.
[32] N. Poh and S. Bengio. A study of the effects of score nor-
malisation prior to fusion in biometric authentication tasks.
Technical report, IDIAP, 2004.
[33] B. Qin and D. Li. Identifying facemask-wearing condition
using image super-resolution with classification network to
prevent covid-19. Sensors, 20(18):5236, 2020.
[34] F. Schroff, D. Kalenichenko, and J. Philbin. Facenet: A uni-
fied embedding for face recognition and clustering. In CVPR,
pages 815–823. IEEE Computer Society, 2015.
[35] Z. Wang, G. Wang, B. Huang, Z. Xiong, Q. Hong, H. Wu,
P. Yi, K. Jiang, N. Wang, Y. Pei, H. Chen, Y. Miao, Z. Huang,
and J. Liang. Masked face recognition dataset and applica-
tion, 2020.
[36] Z. Wang, P. Wang, P. C. Louis, L. E. Wheless, and
Y. Huo. Wearmask: Fast in-browser face mask detection
with serverless edge computing for covid-19. arXiv preprint
arXiv:2101.00784, 2021.
[37] L. Wolf, T. Hassner, and I. Maoz. Face recognition in un-
constrained videos with matched background similarity. In
CVPR 2011, pages 529–534. IEEE, 2011.
[38] K. Zhang, Z. Zhang, Z. Li, and Y. Qiao. Joint face detec-
tion and alignment using multitask cascaded convolutional
networks. IEEE Signal Pro. Lett., 23(10):1499–1503, 2016.
... Follow-up solutions trained FR models in a way that would promote producing similar face templates for masked and unmasked faces [18][19][20][21]. This interest in enhancing FR performance on masked faces led to two competitions that attracted a diverse set of academic and industrial participants [22,23]. Further studies by Fang et al. [24] have unravelled the vulnerabilities of face presentation attack detection solutions to presentation attacks of maskedfaces or attacks with real masks placed on them and proposed possible technical solutions [25]. ...
... -15 important clues for the development of FR solutions that are robust to masked faces. Such solutions recently focussed on training FR models that can process both masked and unmasked faces [18,22] or on reducing the effect of the mask on the face embedding by learning to transfer it into an embedding that behaves similarly to that of an unmasked face [17]. ...
Full-text available
The recent COVID‐19 pandemic has increased the focus on hygienic and contactless identity verification methods. However, the pandemic led to the wide use of face masks, essential to keep the pandemic under control. The effect of wearing a mask on face recognition (FR) in a collaborative environment is a currently sensitive yet understudied issue. Recent reports have tackled this by evaluating the masked probe effect on the performance of automatic FR solutions. However, such solutions can fail in certain processes, leading to the verification task being performed by a human expert. This work provides a joint evaluation and in‐depth analyses of the face verification performance of human experts in comparison to state‐of‐the‐art automatic FR solutions. This involves an extensive evaluation by human experts and 4 automatic recognition solutions. The study concludes with a set of take‐home messages on different aspects of the correlation between the verification behaviour of humans and machines.
... These studies [9][10][11] concluded that the verification performance of face-recognition solutions significantly degraded when the subject was wearing a face mask, compared to the case where their face was unmasked. This was followed by several efforts to enhance masked face recognition [13][14][15], including competitions where some of the submitted solutions proposed the use of the periocular recognition [16,17]. Periocular biometrics have a distinct advantage over facial biometrics when the face is largely occluded or when capturing a full face is less convenient than capturing the periocular region (e.g., a selfie on a smartphone or masked face recognition [11], while maintaining the touchless nature of face capture. ...
... The number of trainable parameters in ResNet-34 and ResNet-18 is 21.3 million (m) and 11.8 m, respectively. Additionally, ResNet proposed the use of a small filter size of (32,16,8) with three groups of residual blocks to create a compact version of the ResNet model (ResNet-110) designed for the CIFAR-10 database with fewer trainable parameters. The used ResNet-110 in this work contains 1.8 m trainable parameters. ...
Full-text available
This work addresses the challenge of building an accurate and generalizable periocular recognition model with a small number of learnable parameters. Deeper (larger) models are typically more capable of learning complex information. For this reason, knowledge distillation (kd) was previously proposed to carry this knowledge from a large model (teacher) into a small model (student). Conventional KD optimizes the student output to be similar to the teacher output (commonly classification output). In biometrics, comparison (verification) and storage operations are conducted on biometric templates, extracted from pre-classification layers. In this work, we propose a novel template-driven KD approach that optimizes the distillation process so that the student model learns to produce templates similar to those produced by the teacher model. We demonstrate our approach on intra- and cross-device periocular verification. Our results demonstrate the superiority of our proposed approach over a network trained without KD and networks trained with conventional (vanilla) KD. For example, the targeted small model achieved an equal error rate (EER) value of 22.2% on cross-device verification without KD. The same model achieved an EER of 21.9% with the conventional KD, and only 14.7% EER when using our proposed template-driven KD.
Full-text available
Over the years, the evolution of face recognition (FR) algorithms has been steep and accelerated by a myriad of factors. Motivated by the unexpected elements found in real-world scenarios, researchers have investigated and developed a number of methods for occluded face recognition (OFR). However, due to the SarS-Cov2 pandemic, masked face recognition (MFR) research branched from OFR and became a hot and urgent research challenge. Due to time and data constraints, these models followed different and novel approaches to handle lower face occlusions, i.e., face masks. Hence, this study aims to evaluate the different approaches followed for both MFR and OFR, find linked details about the two conceptually similar research directions and understand future directions for both topics. For this analysis, several occluded and face recognition algorithms from the literature are studied. First, they are evaluated in the task that they were trained on, but also on the other. These methods were picked accordingly to the novelty of their approach, proven state-of-the-art results, and publicly available source code. We present quantitative results on 4 occluded and 5 masked FR datasets, and a qualitative analysis of several MFR and OFR models on the Occ-LFW dataset. The analysis presented, sustain the interoperable deployability of MFR methods on OFR datasets, when the occlusions are of a reasonable size. Thus, solutions proposed for MFR can be effectively deployed for general OFR.
Full-text available
Health organizations advise social distancing, wearing face mask, and avoiding touching face to prevent the spread of coronavirus. Based on these protective measures, we developed a computer vision system to help prevent the transmission of COVID-19. Specifically, the developed system performs face mask detection, face-hand interaction detection, and measures social distance. To train and evaluate the developed system, we collected and annotated images that represent face mask usage and face-hand interaction in the real world. Besides assessing the performance of the developed system on our own datasets, we also tested it on existing datasets in the literature without performing any adaptation on them. In addition, we proposed a module to track social distance between people. Experimental results indicate that our datasets represent the real-world’s diversity well. The proposed system achieved very high performance and generalization capacity for face mask usage detection, face-hand interaction detection, and measuring social distance in a real-world scenario on unseen data. The datasets are available at .
Full-text available
Due to the global spread of the Covid-19 virus and its variants, new needs and problems have emerged during the pandemic that deeply affects our lives. Wearing masks as the most effective measure to prevent the spread and transmission of the virus has brought various security vulnerabilities. Today we are going through times when wearing a mask is part of our lives, thus, it is very important to identify individuals who violate this rule. Besides, this pandemic makes the traditional biometric authentication systems less effective in many cases such as facial security checks, gated community access control, and facial attendance. So far, in the area of masked face recognition, a small number of contributions have been accomplished. It is definitely imperative to enhance the recognition performance of the traditional face recognition methods on masked faces. Existing masked face recognition approaches are mostly performed based on deep learning models that require plenty of samples. Nevertheless, there are not enough image datasets containing a masked face. As such, the main objective of this study is to identify individuals who do not use masks or use them incorrectly and to verify their identity by building a masked face dataset. On this basis, a novel real-time masked detection service and face recognition mobile application was developed based on an ensemble of fine-tuned lightweight deep Convolutional Neural Networks (CNN). The proposed model achieves 90.40% validation accuracy using 12 individuals’ 1849 face samples. Experiments on the five datasets built in this research demonstrate that the proposed system notably enhances the performance of masked face recognition compared to the other state-of-the-art approaches.
Full-text available
Since early 2020, the COVID-19 pandemic has had a considerable impact on many aspects of daily life. A range of different measures have been implemented worldwide to reduce the rate of new infections and to manage the pressure on national health services. A primary strategy has been to reduce gatherings and the potential for transmission through the prioritisation of remote working and education. Enhanced hand hygiene and the use of facial masks have decreased the spread of pathogens when gatherings are unavoidable. These particular measures present challenges for reliable biometric recognition, e.g. for facial-, voice-and hand-based biometrics. At the same time, new challenges create new opportunities and research directions, e.g. renewed interest in non-constrained iris or periocular recognition, touch-less fingerprint-and vein-based authentication and the use of biometric characteristics for disease detection. This article presents an overview of the research carried out to address those challenges and emerging opportunities.
Full-text available
The recent COVID‐19 pandemic has increased the focus on hygienic and contactless identity verification methods. However, the pandemic led to the wide use of face masks, essential to keep the pandemic under control. The effect of wearing a mask on face recognition (FR) in a collaborative environment is a currently sensitive yet understudied issue. Recent reports have tackled this by evaluating the masked probe effect on the performance of automatic FR solutions. However, such solutions can fail in certain processes, leading to the verification task being performed by a human expert. This work provides a joint evaluation and in‐depth analyses of the face verification performance of human experts in comparison to state‐of‐the‐art automatic FR solutions. This involves an extensive evaluation by human experts and 4 automatic recognition solutions. The study concludes with a set of take‐home messages on different aspects of the correlation between the verification behaviour of humans and machines.
Full-text available
Face recognition is an essential technology in our daily lives as a contactless and convenient method of accurate identity verification. Processes such as secure login to electronic devices or identity verification at automatic border control gates are increasingly dependent on such technologies. The recent COVID-19 pandemic has increased the focus on hygienic and contactless identity verification methods. The pandemic has led to the wide use of face masks, essential to keep the pandemic under control. The effect of mask-wearing on face recognition in a collaborative environment is currently a sensitive yet understudied issue. Recent reports have tackled this by using face images with synthetic mask-like face occlusions without exclusively assessing how representative they are of real face masks. These issues are addressed by presenting a specifically collected database containing three sessions, each with three different capture instructions, to simulate real use cases. The data are augmented to include previously used synthetic mask occlusions. Further studied is the effect of masked face probes on the behaviour of four face recognition systems-three academic and one commercial. This study evaluates both masked-to-non-masked and masked-to-masked face comparisons. In addition, real masks in the database are compared with simulated masks to determine their comparative effects on face recognition performance.
Full-text available
The new Coronavirus disease (COVID-19) has seriously affected the world. By the end of November 2020, the global number of new coronavirus cases had already exceeded 60 million and the number of deaths 1,410,378 according to information from the World Health Organization (WHO). To limit the spread of the disease, mandatory face-mask rules are now becoming common in public settings around the world. Additionally, many public service providers require customers to wear face-masks in accordance with predefined rules (e.g., covering both mouth and nose) when using public services. These developments inspired research into automatic (computer-vision-based) techniques for face-mask detection that can help monitor public behavior and contribute towards constraining the COVID-19 pandemic. Although existing research in this area resulted in efficient techniques for face-mask detection, these usually operate under the assumption that modern face detectors provide perfect detection performance (even for masked faces) and that the main goal of the techniques is to detect the presence of face-masks only. In this study, we revisit these common assumptions and explore the following research questions: (i) How well do existing face detectors perform with masked-face images? (ii) Is it possible to detect a proper (regulation-compliant) placement of facial masks? and (iii) How useful are existing face-mask detection techniques for monitoring applications during the COVID-19 pandemic? To answer these and related questions we conduct a comprehensive experimental evaluation of several recent face detectors for their performance with masked-face images. Furthermore, we investigate the usefulness of multiple off-the-shelf deep-learning models for recognizing correct face-mask placement. 
Finally, we design a complete pipeline for recognizing whether face masks are worn correctly or not and compare the performance of the pipeline with standard face-mask detection models from the literature. To facilitate the study, we compile a large dataset of facial images from the publicly available MAFA and Wider Face datasets and annotate it with compliant and non-compliant labels. The annotated dataset, called the Face-Mask-Label Dataset (FMLD), is made publicly available to the research community.
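The pipeline described in this abstract is a two-stage process: detect faces, then classify each detected face as compliant or non-compliant. The sketch below illustrates that structure only; `detect_faces` and `classify_mask_placement` are hypothetical stubs standing in for the trained detector and classifier evaluated in the study.

```python
# Minimal sketch of a two-stage face-mask compliance pipeline.
# `detect_faces` and `classify_mask_placement` are hypothetical stubs;
# a real system would use a trained face detector and a deep-learning
# classifier trained on FMLD-style compliant/non-compliant labels.

def detect_faces(image):
    """Return a list of face bounding boxes (x, y, w, h).
    Stub: a real detector must also handle masked faces here."""
    return [(10, 10, 80, 80)]

def classify_mask_placement(face_crop):
    """Return 'compliant' if mouth and nose are covered, else 'non-compliant'.
    Stub: a real system applies a trained placement classifier."""
    return "compliant"

def mask_compliance_pipeline(image):
    results = []
    for (x, y, w, h) in detect_faces(image):
        crop = [row[x:x + w] for row in image[y:y + h]]  # crop face region
        results.append(((x, y, w, h), classify_mask_placement(crop)))
    return results

# Toy "image": a list of pixel rows.
image = [[0] * 100 for _ in range(100)]
print(mask_compliance_pipeline(image))
```

The design choice of chaining a detector with a dedicated placement classifier, rather than a single mask/no-mask detector, is what allows the pipeline to distinguish correctly worn masks from masks that leave the nose or mouth exposed.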
This paper studies the impact of lightweight face models on real applications. Lightweight architectures proposed for face recognition are analyzed and evaluated in different scenarios. In particular, we evaluate the performance of five recent lightweight architectures on five face recognition scenarios: image- and video-based face recognition, cross-factor and heterogeneous face recognition, as well as active authentication on mobile devices. In addition, we show the shortcomings of using common lightweight models unchanged for specific face recognition tasks by assessing the performance of the original versions of the lightweight face models considered in our study. We also show that the inference times on different devices and the computational requirements of the lightweight architectures allow their use in real-time applications or on computationally limited platforms. In summary, this paper can serve as a baseline for selecting lightweight face architectures depending on the practical application at hand. It also provides some insights into the remaining challenges and possible future research topics.
Using the face as a biometric identity trait is motivated by the contactless nature of the capture process and the high accuracy of the recognition algorithms. Since the onset of the COVID-19 pandemic, wearing a face mask has been required in public places to keep the pandemic under control. However, face occlusion due to mask-wearing presents an emerging challenge for face recognition systems. In this paper, we present a solution to improve masked face recognition performance. Specifically, we propose the Embedding Unmasking Model (EUM), which operates on top of existing face recognition models. We also propose a novel loss function, the Self-restrained Triplet (SRT) loss, which enables the EUM to produce embeddings similar to those of unmasked faces of the same identities. Evaluation results on three face recognition models, two real masked datasets, and two synthetically generated masked face datasets show that our proposed approach significantly improves performance in most experimental settings.
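The training objective described in this abstract pulls a masked-face embedding toward the unmasked embedding of the same identity and pushes it away from other identities. The sketch below shows a standard triplet margin loss to illustrate that idea only; the actual Self-restrained Triplet (SRT) loss is a modified formulation defined in the cited paper.

```python
import numpy as np

# Illustrative triplet-style loss in the spirit of the setting above:
# anchor  = masked-face embedding,
# positive = unmasked embedding of the same identity,
# negative = embedding of a different identity.
# This is the standard triplet margin loss, NOT the actual SRT loss.

def triplet_loss(anchor, positive, negative, margin=0.5):
    d_pos = np.linalg.norm(anchor - positive)  # same identity, unmasked
    d_neg = np.linalg.norm(anchor - negative)  # different identity
    return max(0.0, d_pos - d_neg + margin)

anchor = np.array([0.1, 0.9])    # toy masked-face embedding
positive = np.array([0.2, 0.8])  # unmasked, same identity
negative = np.array([0.9, 0.1])  # different identity
print(triplet_loss(anchor, positive, negative))  # → 0.0 (triplet satisfied)
```

When the positive is already closer to the anchor than the negative by at least the margin, the loss is zero; violated triplets incur a positive penalty that drives the embedding model to close the masked-to-unmasked gap.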
Face masks have become one of the main measures for reducing the transmission of COVID-19. This makes face recognition (FR) challenging, as masks hide several discriminative features of faces. Moreover, face presentation attack detection (PAD) is crucial to ensure the security of FR systems. In contrast to the growing number of masked FR studies, the impact of masked attacks on PAD has not been explored. Therefore, we present novel attacks with real face masks placed on presentations, as well as attacks with subjects wearing masks, to reflect the current real-world situation. Furthermore, this study investigates the effect of masked attacks on PAD performance using seven state-of-the-art PAD algorithms under different experimental settings. We also evaluate the vulnerability of FR systems to masked attacks. The experiments show that real masked attacks pose a serious threat to the operation and security of FR systems.