Towards Fingerprint Presentation Attack Detection
Based on Convolutional Neural Networks and Short
Wave Infrared Imaging
Ruben Tolosana∗, Marta Gomez-Barrero†, Jascha Kolberg†, Aythami Morales∗,
Christoph Busch†and Javier Ortega-Garcia∗
∗BiDA Lab - Biometrics and Data Pattern Analytics, Universidad Autonoma de Madrid, Spain
Email: (ruben.tolosana, aythami.morales, javier.ortega)@uam.es
†da/sec - Biometrics and Internet Security Research Group, Hochschule Darmstadt, Germany
Email: (marta.gomez-barrero, jascha.kolberg, christoph.busch)@h-da.de
Abstract—Biometric recognition offers many advantages over
traditional authentication methods, but it is also vulnerable
to, for instance, presentation attacks. These refer to the presentation
of artifacts, such as facial pictures or gummy fingers, to the
biometric capture device, with the aim of impersonating another
person or avoiding recognition. As such, they challenge
the security of biometric systems and must be prevented. In
this paper, we present a new fingerprint presentation attack
detection method based on convolutional neural networks and
multi-spectral images extracted from the finger in the short wave
infrared spectrum. The experimental evaluation, carried out on
an initial small database but comprising different materials for
the fabrication of the artifacts and including unknown attacks
for testing, shows promising results: all samples were correctly
classified.
Keywords—Presentation attack detection; biometrics; finger-
print; SWIR; CNN
I. INTRODUCTION
Deep Learning (DL) has become a thriving topic in recent
years [1], allowing computers to learn from experience
and understand the world in terms of a hierarchy of simpler
units. This way, DL has enabled significant advances in
complex domains such as natural language processing [2] and
computer vision [3], among many others. The main reason
for its widespread deployment lies in the increasing amount
of available data, which allows the successful training
of deep architectures. These can in turn outperform other
traditional machine learning techniques. However, the belief
that DL architectures can only be used for those tasks with
massive amounts of available data is changing thanks to the
development of, for instance, pre-trained models. This concept
refers to network models that are trained for a given task with
large available databases, and then are retrained (a.k.a. fine-
tuned, adapted) for a different task for which data are usually
scarce. All these advances have allowed the deployment of
DL architectures in many different fields, such as biometric
recognition [4], [5].
Biometrics refers to the automated recognition of individuals
based on their biological (e.g., iris or fingerprint) or
behavioural (e.g., signature or voice) characteristics. Even
if biometric recognition systems offer numerous advantages
over traditional authentication methods (e.g., they provide a
stronger link between subject and identity, and they cannot be
lost or forgotten), they are also vulnerable to external attacks.
Among all possible attack points [6], the biometric capture
device is probably the most exposed one: no further knowledge
about the inner functioning of the system is required to launch
an attack. Such attacks are known in the ISO/IEC IS 30107 [6]
as presentation attacks (PA), and refer to the presentation to
the capture device of a presentation attack instrument (PAI),
such as a fingerprint overlay, in order to interfere with its
intended behaviour.
To be able to prevent PAs, techniques able to automatically
distinguish between bona fide (i.e., real or live) presentations
and access attempts carried out by means of PAIs must be
developed [7]. They are known as presentation attack detection
(PAD) methods. Considerable attention has been directed
to the development of efficient PAD approaches within the
last decade for several biometric characteristics, including iris
[8], fingerprint [9], or face [10]. In particular, within the
DL community, Convolutional Neural Networks (CNNs) have
been used for fingerprint PAD purposes, based either on the
complete fingerprint samples [11], [12] or on a patch-wise
manner [13], [14]. In addition, a Deep Belief Network (DBN)
system with multiple layers of Restricted Boltzmann Machines
(RBMs) was used in [15] also for fingerprint PAD. A more
general approach was tested on face, iris, and fingerprint data
in [16].
All the aforementioned DL PAD approaches achieve detection
accuracy rates of over 90% on the freely available LivDet
[17] and ATVS-FFp databases [18]. Such high accuracy rates
indicate not only the valuable contributions of the authors
but also that other databases, comprising a larger number of
materials for the fabrication of the PAIs, should be explored.
However, one question remains unanswered: will unknown
attacks also be detected? From the aforementioned works,
only Chugh et al. considered a wider database comprising
over twelve different PAI fabrication materials in [14]. Part
of the materials were used as unknown attacks, showing that
©2018 Gesellschaft für Informatik e.V., Bonn, Germany
Authorized licensed use limited to: Universidad Autonoma de Madrid. Downloaded on April 04,2022 at 14:20:01 UTC from IEEE Xplore. Restrictions apply.
Fig. 1: (a) Bona fide sample captured at 1200 nm, and cropped ROIs corresponding to: (b) to (e) a bona fide sample, and (f)
to (i) a silicone PAI, for the selected wavelengths.
the error rates increased up to six-fold with respect to
the evaluation carried out on known attacks. Therefore, further
research is needed in this area.
To further tackle these issues with unknown attacks, some
researchers have started considering sources of information
different from the traditional fingerprint capture devices [8],
[9]. More specifically, the use of multi-spectral infrared tech-
nologies has been studied for face [19] and fingerprint [20],
[21]. More recently, the characteristic remission properties of
the human skin for multi-spectral Short Wave Infrared (SWIR)
wavelengths was exploited in [19] for facial PAD, achieving
a 99% detection accuracy.
In this context, we propose a fingerprint PAD method based
on CNNs and multi-spectral SWIR finger samples captured in
the range 1200 nm – 1550 nm. To the best of our knowledge,
this is the first study exploring the potential of SWIR imaging
and CNNs for fingerprint PAD, on a small database in terms
of samples but considering a wide variety of PAI species
(i.e., both complete thick gummy fingers and more challenging
overlays). We successfully detected all of them. It should also
be highlighted that only six PAIs were used for training, thus
being able to test the remaining six PAIs as unknown attacks
(i.e., attacks not seen previously by the classifier, thereby
representing a bigger challenge and a better representation of
a real-world scenario).
The rest of the article is organised as follows. The SWIR
sensor and fingerprint PAD method proposed are described in
Sect. II. Sect. III presents the experimental protocol and the
results obtained in this work. Final conclusions are drawn in
Sect. IV.
II. PROPOSED PRESENTATION ATTACK DETECTION
METHOD
A. Short Wave Infrared (SWIR) Imaging
The capture device comprises two sensors for SWIR and
visible spectrum (VIS) wavelengths, which are placed next
to each other inside a closed box. Next to them, the LEDs
for the corresponding wavelengths illuminate the finger. The
box includes an open slot on the top where the user places
the finger during the acquisition. When the finger is placed
there, all ambient light is blocked and thus only the desired
wavelengths are used for the acquisition. In particular, we have
used a Hamamatsu InGaAs SWIR sensor array, which captures
images of 64×64 pixels, with a 25 mm fixed focal length lens
optimised for wavelengths within 900 – 1700 nm. We have
considered the following SWIR wavelengths: 1200 nm, 1300
nm, 1450 nm, and 1550 nm, similar to the ones considered
in [19] for the skin vs. non-skin facial classification. Fig. 1a
shows the acquired image of a bona fide sample for the 1200
nm wavelength. In addition, and although it is out of the scope
of this work, fingerprint verification can be carried out with
contactless finger photos acquired in the visible spectrum with
a 1.3 MP camera and a 35 mm VIS-NIR lens.
In order to utilise only foreground finger information, a
preprocessing stage is first applied to the original image (Fig.
1a) so as to select the region of interest (ROI), corresponding
to the open slot where the finger is placed. The ROIs of the
bona fide sample for all SWIR wavelengths, with a size of
18 × 58 px, are depicted from Fig. 1b to 1e.
Finally, Figs. 1f to 1i show a silicone PAI. Some
differences may be observed if we compare these images to
those captured from a bona fide presentation: whereas for the
bona fide, the images show a decrease in the intensity values
for longer wavelengths, this is not the case for the PAI. This
trend is hence exploited by the PAD method.
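This preprocessing and intensity trend can be sketched in a few lines of NumPy. Note that the slot position within the 64×64 frame is not given in the text, so the crop offsets below are purely illustrative, as is the synthetic intensity decay used to mimic the bona fide behaviour:

```python
import numpy as np

def crop_roi(img, top=23, left=3, height=18, width=58):
    """Crop the region of interest (open slot); the offsets are illustrative,
    since the slot position within the 64x64 frame is not given in the text."""
    return img[top:top + height, left:left + width]

def intensity_decreases(rois):
    """True if the mean ROI intensity drops monotonically with wavelength,
    as observed for bona fide skin across 1200-1550 nm (Fig. 1b-1e)."""
    means = [roi.mean() for roi in rois]  # one ROI per wavelength, ascending
    return all(a > b for a, b in zip(means, means[1:]))

# Synthetic bona fide-like stack: intensity decays for longer wavelengths.
rng = np.random.default_rng(0)
base = rng.uniform(0.6, 0.8, size=(64, 64))
stack = [crop_roi(base * f) for f in (1.0, 0.8, 0.5, 0.3)]  # 1200-1550 nm
```

A PAI such as the silicone sample in Figs. 1f to 1i would fail this monotonicity check, which is the cue the CNN learns to exploit.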
B. Convolutional Neural Networks (CNNs)
CNNs have been one of the most successful network
architectures in recent years. Some of their key design
principles were drawn from the findings of the neurophysiologists
and Nobel laureates David Hubel and Torsten Wiesel in the
field of human vision [1]. CNN-based systems are mainly
composed of convolutional and pooling layers. The former
extract patterns from the images through the application
of several convolutions in parallel to local regions of the
images. These convolutional operations are carried out by
means of different kernels (adapted by the learning algorithm)
that assign a weight to each pixel of the local region of
the image depending on the type of patterns to be extracted.
Therefore, each kernel of one convolutional layer is focused
on extracting different patterns such as horizontal or vertical
edges. The output of these operations produces a set of linear
activations (a.k.a. feature map) that serve as input to nonlinear
activations such as the rectified linear activation function
(ReLU). Finally, it is common to use pooling layers to make
the representation invariant to small translations of the input.
The pooling function replaces the output of the net at a certain
Fig. 2: Architecture of our proposed CNN-based system for fingerprint PAD. FS denotes the filter size of the kernels.
region with a statistical summary of the nearby outputs. For
instance, the max-pooling function selects the maximum value
of the region.
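As a minimal illustration of these building blocks, the following sketch applies a hand-crafted edge kernel (in a real CNN the kernels are learned), a ReLU, and 2×2 max pooling with plain NumPy:

```python
import numpy as np

def conv2d_valid(img, kernel):
    """Naive 'valid' 2-D convolution (cross-correlation, as in CNN layers)."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool2x2(x):
    """2x2 max pooling: summarise each region by its maximum value."""
    h, w = x.shape
    x = x[:h - h % 2, :w - w % 2]  # drop odd rows/columns
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

# An image with a vertical step edge, and a kernel that responds to it.
img = np.zeros((6, 6))
img[:, 3:] = 1.0
kernel = np.array([[-1.0, 1.0]])                   # hand-crafted; CNNs learn these
fmap = np.maximum(conv2d_valid(img, kernel), 0.0)  # linear activations + ReLU
pooled = max_pool2x2(fmap)                         # robust to small translations
```

The feature map responds only at the edge position, and the pooled output keeps that response even if the edge shifts by a pixel, which is exactly the translation invariance described above.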
Since, to the best of our knowledge, there are no public
databases of SWIR finger images, the available data is not
enough to train the entire CNN from scratch. Therefore,
we propose a combination of CNN pre-trained models and
fine-tuning. Fine-tuning techniques have a two-fold objective,
namely: i) replace and retrain the classifier (i.e., the fully-
connected layers) of the pre-trained model for our specific task,
and ii) adapt the weights of all or some of the convolutional
layers. In particular, we have used the VGG19 pre-trained
model [22], which achieved second place in the classification
task of the ImageNet 2014 challenge with a total of 1,000
classes such as animals, vehicles, etc. This model comprises a
total of 16 convolutional layers and 3 fully-connected layers,
and has been modified for the specific task of fingerprint PAD.
Fig. 2 shows the final architecture of the proposed system.
The input of the network is a three-channel image where each
channel consists of a SWIR image captured at a different
wavelength or a combination of them. In order to optimise
the input, an exhaustive analysis was carried out using a
development dataset to minimise the intra-class variability of
the bona fide class, and at the same time maximise the inter-
class variability between bona fide and PA samples. The best
combination found was: i) 1550 nm, ii) 1450 nm, and iii) a
combination of both wavelengths.
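The three-channel input construction can be sketched as follows. Since the text does not specify how the two wavelengths are combined for the third channel, the per-pixel average is used here purely as an illustrative choice:

```python
import numpy as np

def build_input(roi_1550, roi_1450):
    """Stack the two selected SWIR ROIs plus their combination into a
    three-channel input. The paper does not specify the combination, so the
    per-pixel average is used here purely for illustration."""
    combined = (roi_1550 + roi_1450) / 2.0
    return np.stack([roi_1550, roi_1450, combined], axis=-1)

rng = np.random.default_rng(1)
x = build_input(rng.random((18, 58)), rng.random((18, 58)))  # shape (18, 58, 3)
```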
Then, given the data scarcity and the small input size,
we have reduced the complexity of the original VGG19 pre-
trained model by eliminating one of the 3 fully-connected
layers. In addition, the number of neurons of the first fully-
connected layer is reduced to 32 instead of 512. Regarding
the retraining of the VGG19 model, as depicted in Fig. 2, the
first 4 convolutional blocks of the CNN network are frozen,
whereas the weights of the last convolutional block and fully-
connected layers are adapted to the fingerprint PAD task (see
vertical line separating the two groups of layers). The reason
behind this fine-tuning technique lies in the fact that the
first layers of the CNN extract more general features related
to directional edges and colours, whereas the last layers of the
network are in charge of extracting more abstract features
related to the specific task.
Finally, the softmax classification layer of the original
VGG19 pre-trained model is replaced with a sigmoid activation
layer in order to provide output scores between 0 (bona fide)
and 100 (PAs), as required in the ISO/IEC 30107-3 on PAD
evaluation and reporting [23].
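A sketch of this modified VGG19 in Keras is given below. Several details are assumptions: the paper does not state how the 18×58 ROIs are fitted to VGG19 (whose Keras implementation requires inputs of at least 32×32), so a 64×64 input is assumed, and `weights=None` stands in for the ImageNet weights (`weights='imagenet'`) that the actual fine-tuning starts from:

```python
import tensorflow as tf

def build_pad_model(input_shape=(64, 64, 3)):
    # VGG19 backbone without its original classifier; weights=None keeps the
    # sketch offline (the paper fine-tunes from ImageNet weights instead).
    base = tf.keras.applications.VGG19(weights=None, include_top=False,
                                       input_shape=input_shape)
    for layer in base.layers:
        # Freeze blocks 1-4; only the fifth convolutional block is fine-tuned.
        layer.trainable = layer.name.startswith('block5')
    x = tf.keras.layers.Flatten()(base.output)
    # One fully-connected layer is dropped; the first remaining one is
    # reduced to 32 neurons, as described in the text.
    x = tf.keras.layers.Dense(32, activation='relu')(x)
    # Sigmoid instead of softmax: a single bona fide (0) vs. PA (1) score,
    # scaled to [0, 100] for ISO/IEC 30107-3 reporting.
    out = tf.keras.layers.Dense(1, activation='sigmoid')(x)
    model = tf.keras.Model(base.input, out)
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
                  loss='binary_crossentropy')
    return model
```

The compile step already reflects the training configuration of Sect. III (Adam, learning rate 0.001, binary cross-entropy).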
III. EXPERIMENTAL EVALUATION
The proposed CNN-based system is implemented in the
Keras framework using TensorFlow as back-end, on an
NVIDIA GeForce GTX 1080 GPU. For the fine-tuning of the
layers we consider the Adam optimizer with a learning rate
of 0.001 and a binary cross-entropy loss function.
In the next sections, we describe the experimental protocol
followed and discuss the results obtained.
A. Database and Experimental Protocol
The selection of the PAI fabrication materials is based on
the requirements of the IARPA ODIN program evaluation,
covering the most challenging PAIs. In particular, the following
twelve different PAIs are considered in the experiments: 3D
printed fingerprint and 3D printed fingerprint coated with
silver paint to mimic the conductive properties of the skin;
fingers fabricated with blue and green wax, gelatine, playdoh,
silly putty, and silicone; overlays fabricated with dragon skin
and urethane; and fingerprints printed on regular matte paper
and on transparency paper. The bona fide samples have been
captured from seven out of ten fingers in order to increase
the variability. For each bona fide and PAI, between one and
four samples have been acquired. This database was captured
by our BATL project partners at the University of Southern
California.
Fig. 3: Evolution of the loss function (y-axis) with the number
of epochs (x-axis) for the training and validation datasets. Our
selected CNN model is indicated using a small green vertical line.
In order to perform a clear analysis of our proposed ap-
proach, the database has been divided into development and
test datasets. Moreover, the development dataset is divided into
training and validation datasets that are used for selecting the
best configuration of our proposed CNN-based system and
adjusting the weights of the final layers of the network. The
training dataset comprises 6 bona fide and 6 PAI samples.
For the PAIs, we have chosen one sample of dragon skin,
blue wax, gelatine, playdoh, silicone, and printed fingerprint.
Regarding the validation dataset, a total of 3 bona fide and
3 PAI samples are considered. In this case, only the blue wax,
playdoh and printed fingerprint PAIs are selected. Finally, the
test dataset considered for the final evaluation of the system
includes the samples not seen by the network during the
development stage (4 bona fide and 38 PA samples). It is important to
remark that only six out of twelve available PAIs are used for
the development of the proposed method in order to evaluate
the robustness of our system to new types of PAIs that can
arise (i.e., unknown attacks).
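The known/unknown partition of the twelve PAI species can be expressed directly. The label names below are made up for illustration, and "printed fingerprint" in the training set is assumed to be the matte-paper variant, which the text does not specify:

```python
# Twelve PAI species from Sect. III-A (label names made up for illustration).
all_pais = {'3d_print', '3d_print_silver', 'blue_wax', 'green_wax',
            'gelatine', 'playdoh', 'silly_putty', 'silicone',
            'dragon_skin_overlay', 'urethane_overlay',
            'printed_matte', 'printed_transparency'}
# Six species seen during development ('printed fingerprint' assumed matte).
known_pais = {'dragon_skin_overlay', 'blue_wax', 'gelatine', 'playdoh',
              'silicone', 'printed_matte'}
# The remaining six are unknown attacks: only ever presented at test time.
unknown_pais = all_pais - known_pais
```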
Finally, in compliance with the ISO/IEC IS 30107-3 on
Biometric presentation attack detection - Part 3: Testing and
Reporting [23], the following metrics are used to evaluate
the performance of the PAD method: i) Attack Presentation
Classification Error Rate (APCER), or percentage of attack
presentations wrongly classified as bona fide presentations;
and ii) Bona Fide Presentation Classification Error Rate
(BPCER), or percentage of bona fide presentations wrongly
classified as presentation attacks.
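These two error rates can be computed from the PAD scores as follows; the 0.5 decision threshold is an illustrative choice, not one stated in the paper:

```python
import numpy as np

def apcer_bpcer(attack_scores, bona_fide_scores, threshold=0.5):
    """ISO/IEC 30107-3 error rates (in %) at a fixed decision threshold;
    scores above the threshold are classified as attacks. The 0.5 threshold
    is an illustrative choice."""
    attack_scores = np.asarray(attack_scores)
    bona_fide_scores = np.asarray(bona_fide_scores)
    apcer = 100.0 * np.mean(attack_scores <= threshold)    # attacks accepted
    bpcer = 100.0 * np.mean(bona_fide_scores > threshold)  # bona fides rejected
    return apcer, bpcer

# Perfectly separated scores, as reported in Sect. III-B, give 0% for both.
apcer, bpcer = apcer_bpcer([0.9, 0.8, 1.0], [0.1, 0.0, 0.2, 0.05])
```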
B. Results
Fig. 3 shows the evolution of the loss function with the
number of epochs for the development datasets. It is important
to remark that very similar loss values are obtained for
both the training and validation datasets across all epochs,
thus showing the robustness of the features extracted by the
CNN. The proposed CNN model is selected after 16 epochs,
providing a final loss value of 0 for both training and validation
datasets.
Then, the fingerprint PAD method is evaluated using new
samples from the test dataset. It is worth noting that these
samples have not been used during the development of the
system, thus yielding unbiased results. The proposed approach
achieves final values of 0% APCER and BPCER, proving the
success of applying fine-tuning techniques to a pre-trained
model even with small amounts of data.
Finally, it should be highlighted that all new types of PAIs
(e.g., green wax, urethane, or silly putty), which were not
considered during the development of our proposed CNN-
based system, are correctly detected as PAs. This means
that the classifier is robust to all previously unseen attacks
considered, thereby proving the soundness of the approach.
We may thus conclude that the proposed SWIR sensor and
fingerprint PAD method are able to detect unknown attacks.
IV. CONCLUSIONS
We have presented a novel fingerprint presentation attack
detection approach based on CNNs and SWIR multi-spectral
images. Based on an exhaustive analysis of the intra- and inter-
class variability, two SWIR wavelengths and their combination
were selected as input for the network.
The experimental evaluation yields a BPCER = 0% (i.e.,
a highly convenient system) and at the same time an APCER = 0%
(i.e., a highly secure one). In fact, even unknown attacks are correctly
detected, thereby showing the promising performance of the
proposed method, in spite of the small training set (six bona
fides and six PAIs). This is partly due to the use of pre-trained
CNN models.
As future work lines, we will acquire a bigger database,
comprising more PAIs and more bona fide samples, in order
to further test the performance of the algorithm for both known
and unknown attacks.
ACKNOWLEDGMENTS
This research is based upon work supported in part by
the Office of the Director of National Intelligence (ODNI),
Intelligence Advanced Research Projects Activity (IARPA)
under contract number 2017-17020200005. The views and
conclusions contained herein are those of the authors and
should not be interpreted as necessarily representing the offi-
cial policies, either expressed or implied, of ODNI, IARPA,
or the U.S. Government. The U.S. Government is authorized
to reproduce and distribute reprints for governmental purposes
notwithstanding any copyright annotation therein. This work
was carried out during an internship of R. Tolosana at da/sec.
R. Tolosana is supported by a FPU Fellowship from Spanish
MECD.
REFERENCES
[1] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. MIT Press,
2016.
[2] I. Sutskever, O. Vinyals, and Q.-V. Le, “Sequence to sequence learning
with neural networks,” in Proc. NIPS, 2014.
[3] B. Zhou, A. Khosla et al., “Learning deep features for discriminative
localization,” in Proc. CVPR, 2016.
[4] A. Rattani and R. Derakhshani, “On fine-tuning convolutional neural
networks for smartphone based ocular recognition,” in Proc. IJCB, 2017.
[5] R. Tolosana, R. Vera-Rodriguez et al., “Exploring recurrent neural
networks for on-line handwritten signature biometrics,” IEEE Access,
pp. 1 – 11, 2018.
[6] ISO/IEC JTC1 SC37 Biometrics, ISO/IEC 30107-1. Information Tech-
nology - Biometric presentation attack detection - Part 1: Framework,
ISO, 2016.
[7] S. Marcel, M. Nixon, and S.-Z. Li, Eds., Handbook of Biometric Anti-
Spoofing. Springer, 2014.
[8] J. Galbally and M. Gomez-Barrero, “Presentation attack detection in
iris recognition,” in Iris and Periocular Biometrics, C. Busch and
C. Rathgeb, Eds. IET, Aug. 2017.
[9] C. Sousedik and C. Busch, “Presentation attack detection methods for
fingerprint recognition systems: a survey,” IET Biometrics, vol. 3, no. 4,
pp. 219–233, 2014.
[10] J. Galbally, S. Marcel, and J. Fierrez, “Biometric antispoofing methods:
A survey in face recognition,” IEEE Access, vol. 2, pp. 1530–1552,
2014.
[11] R.-F. Nogueira, R. de Alencar Lotufo, and R. C. Machado, “Fingerprint
liveness detection using convolutional neural networks,” IEEE TIFS,
vol. 11, no. 6, pp. 1206–1213, 2016.
[12] H.-U. Jang, H.-Y. Choi et al., “Fingerprint spoof detection using contrast
enhancement and convolutional neural networks,” in Proc. ICISA, 2017,
pp. 331–338.
[13] A. Toosi, S. Cumani, and A. Bottino, “CNN patch-based voting for
fingerprint liveness detection,” in Proc. IJCCI, 2017.
[14] T. Chugh, K. Cao, and A.-K. Jain, “Fingerprint spoof buster: Use of
minutiae-centered patches,” IEEE TIFS, vol. 13, no. 9, pp. 2190–2202,
2018.
[15] S. Kim, B. Park et al., “Deep belief network based statistical feature
learning for fingerprint liveness detection,” Pattern Recognition Letters,
vol. 77, pp. 58–65, 2016.
[16] D. Menotti, G. Chiachia et al., “Deep representations for iris, face, and
fingerprint spoofing detection,” IEEE TIFS, vol. 10, no. 4, pp. 864–879,
2015.
[17] “LivDet – Liveness Detection Competitions,” 2009–2017. [Online].
Available: http://livdet.org/
[18] J. Galbally, J. Fierrez et al., “Evaluation of direct attacks to fingerprint
verification systems,” Telecommunication Systems, vol. 47, no. 3-4, pp.
243–254, 2011.
[19] H. Steiner, S. Sporrer et al., “Design of an active multispectral SWIR
camera system for skin detection and face verification,” Journal of
Sensors, vol. 2016, 2016.
[20] R.-K. Rowe, K.-A. Nixon, and P.-W. Butler, Multispectral Fingerprint
Image Acquisition. Springer London, 2008, pp. 3–23.
[21] S. Chang, K. Larin et al., “Fingerprint spoof detection by NIR optical
analysis,” in State of the Art in Biometrics. InTech, 2011, pp. 57–84.
[22] K. Simonyan and A. Zisserman, “Very deep convolutional networks for
large-scale image recognition,” in Proc. ICLR, 2015.
[23] ISO/IEC JTC1 SC37 Biometrics, ISO/IEC IS 30107-3. Information
Technology - Biometric presentation attack detection - Part 3: Testing
and Reporting, ISO, 2017.