Evolution and evaluation of biometric systems*

Dmitry O. Gorodnichy
Abstract— Biometric systems have evolved significantly over
the past years: from single-sample fully-controlled verification
matchers to a wide range of multi-sample multi-modal fully-
automated person recognition systems working in a diverse
range of unconstrained environments and behaviors. The
methodology for biometric system evaluation however has
remained practically unchanged, still being largely limited to
reporting false match and non-match rates only and the trade-
off curves based thereon. Such methodology may no longer be
sufficient and appropriate for investigating the performance of
state-of-the-art systems. This paper addresses this gap by
establishing a taxonomy of biometric systems and proposing a
baseline methodology that can be applied to the majority of
contemporary biometric systems to obtain an all-inclusive
description of their performance. In doing that, a novel concept
of multi-order performance analysis is introduced and the
results obtained from a large-scale iris biometric system
examination are presented.
I. INTRODUCTION
An organization that deploys or plans to deploy a
biometric system needs to know how well the system
performs and what factors affect its performance so that
proper system selection or setup adjustments can be made.
The only way to acquire such knowledge is through
evaluation, which is the procedure that involves testing of a
system on a database and/or in a specific setup for the
purpose of obtaining measurable statistics that can be used
to compare systems or setups to one another.
Biometric systems have evolved significantly over the
years and are now applied in a wide variety of applications
and scenarios. It is therefore understood that what is good
for one application or scenario may not be as good for
another, and, as a consequence, the evaluation procedure
may have to be different for different applications and
scenarios. In this paper, such differences are examined
through establishing a taxonomy of biometric systems,
including the definition of key concepts related to biometric
system performance (Section II) and tracing the evolution of
biometric systems (Section III). The limitations of the
conventional biometrics evaluation methodologies are then
examined (Section IV) and a new all-inclusive evaluation
framework is proposed (Section V), followed by the presentation of a novel multi-order performance analysis approach, which is the main contribution of the proposed framework (Section VI).

Dmitry O. Gorodnichy is Senior Research Scientist with the Video Surveillance and Biometric Technologies group of the Laboratory and Scientific Services Directorate of the Canada Border Services Agency, 79 Bentley, Ottawa, ON, K2E 6T7, Canada (Tel: (613) 954-3785, e-mail: dmitry.gorodnichy@cbsa-asfc.gc.ca).

* Appeared in Proceedings of the Second IEEE Symposium on Computational Intelligence for Security and Defense Applications, Ottawa, Canada, 9-10 July 2009.
II. TERMINOLOGY AND CRITICAL CONCEPTS
Although many books and, more recently, several standards have been written defining key biometrics concepts, below we cite and redefine those that are most important in the context of the current presentation. Several new definitions are also introduced.
A. Biometrics as Image Recognition
We start from a definition of a biometric system that will help us define the taxonomy quantifiers for biometric systems and appreciate a fact that should always be kept in mind while conducting an evaluation of a biometric system: biometric solutions are derived from two main research areas, 1) Image Processing (IP), the part of computer science that deals with extracting numerical measurements from imagery data, and 2) Pattern Recognition (PR), the part of statistical machine learning theory concerned with matching such measurements to one another.
Definition: Biometrics is an automated technique of
measuring a physical characteristic (biometric data) of a
person for the purpose of recognizing him/her.
Emphasis is placed on the word "automated", which
implies that all steps involved in the recognition process are
done by a computer, and to the fact that the word
"recognition" is used in general terms here.
This definition also identifies the two components that make up a
biometric system: Capture component, where "measuring"
of a trait is done through an image/video/signal capture
device, and Recognition component, which is a recognition
software that performs analysis and matching of
measurements.
In order to not confuse biometric raw data, which in the
case of image-based biometrics are raw images, with
biometric templates derived from the raw images by means
of IP techniques, and to highlight the two stages of biometric deployment, we also use the following definitions:
Definition: Enrolled data are biometric images that are
stored in the system at the Enrollment stage for the purpose
of being matched upon later. Passage (or Test) data are
new biometric images that are presented to the system at the
Recognition stage for the purpose of being recognized. A single piece of data is referred to as a sample or image.
Note that Enrolled data are often of better quality than
Passage data, due to the fact that enrollment happens only
once and is therefore a well-guided and controlled process.
B. Operational Biometric Recognition tasks
From the operational point of view, we can see that there
are five operational recognition tasks for which a biometric
system can be applied within an organization. These tasks
vary significantly in their biometric data acquisition
procedures, error costs and error mitigation strategies, as
summarized below:
1. Verification, also referred to as authentication or 1-to-1 recognition, as when verifying ATM clients or Restricted Access Area officers using a bank or access card.
2. Identification, or 1-to-N recognition (N is often large or can grow), also called positive (or "White list") identification, as when identifying a pre-registered individual from a watch list, where a test sample is compared against all individuals in a database and the best match (or the best k matches) is selected to identify a person.
3. Screening, or negative (or "Black list") identification, which is a special case of 1-to-M recognition (M is normally not large and is fixed), as when monitoring the traffic of people for the purpose of identifying criminals in it.
4. Classification, or categorization, which is a special case of 1-to-K recognition (K is small and fixed), where a person is recognized as belonging to a) one of a limited number of classes, such as a person's gender, race, or various medical or genetic conditions, which can be used as soft biometrics, or b) one of a limited number of identities, as used in automated annotation (tagging) of people in teleconferences or video streams.
5. Similarity quantifier, which is a special case of verification used in Forensic document investigation, in which both (or more) images to be compared are presented to the system at the same time, and/or in which the biometric system is used to provide comparative measurements rather than a final recognition decision, so that a human analyst makes the final recognition decision.
A match obtained in verification and positive identification tasks is normally not questioned further. On the other hand, the
match result obtained in Screening or Classification would
normally be further processed or investigated and, in many
cases, also combined with other recognition data available
about the person.
It is also understood that for verification and positive
identification tasks a false non-match has much less negative
impact/cost (“inconvenience”) than a false match (“security
breach”), whereas for negative identification tasks this is the
opposite.
C. Operational Biometric modality characteristics
Based on the type of the operational recognition task, an
organization may impose certain requirements on the
biometric modality used by a biometric system, in particular
with respect to the following modality characteristics [5]:
1. Universality: each person should have the trait.
2. Uniqueness: how well biometrics separates individuals
from one another.
3. Permanence: how well a biometric resists aging, fatigue, etc.
4. Performance: accuracy, speed, and robustness of
technology used.
5. Collectability: ease of acquisition for measurement.
6. Acceptability: degree of public approval of technology.
A trade-off between Performance and Acceptability is normally observed, as illustrated in Figure 1 (from [6]). Well-performing biometric systems use biometric data that are very personal and may therefore be less accepted by the public or harder to collect. Such biometrics may require
person’s permission and/or cooperation, which is the case
with verification or “White list” identification. On the other
hand, “Black list” identification will likely rely on biometric
data that can be easily collectable from people without their
cooperation, but which, as a result, is less discriminating.
Fig. 1. Performance of different image-based biometric modalities with
respect to different operational modality characteristics.
D. Operational conditions
Based on the recognition task and scenario, several
operational conditions may need to be imposed and/or
expected, such as
1. Overt vs. Covert image capture
2. Cooperative vs. Non-cooperative participant
3. Structured vs. Non-structured (constrained vs. non-constrained) environment, environment-wise (e.g. lighting conditions)
4. Structured vs. Non-structured (constrained vs. non-constrained) environment, procedure-wise.
5. Size of the database: Large vs. small
6. Local vs. Centralized data storage
7. Relative Impact (Cost) of a False Match vs. that of a False Non-Match
Note that Conditions 1-4 for the Enrollment stage may be
different from those observed at the Recognition stage.
E. Recognition steps and bottlenecks
In order to understand why biometric recognition may fail
and how to conduct the evaluation, one needs to know how
biometric recognition works. Figure 2 illustrates the
processing steps performed in face recognition (from [6])
and iris recognition systems, which are applicable to most
image-based systems. These steps are:
1. Capture of image(s)
2. Best image(s) selection and enhancement (preprocessing)
3. Biometric region extraction (segmentation)
4. Feature detection and selection: minutia, colour, edges…
5. Computation of template: set of L numbers (0<Xi<MAXi,
i=1...L) corresponding to feature attributes (angles, RGB
values, wavelet coefficients …)
6. Computation of match scores (similarity distances): Sk
7. Recognition decision: based on a statistical rule, the
simplest and most commonly used of which is binary
comparison to a fixed threshold, optionally followed by
its integration / fusion with other data (post-processing).
Error in any of these steps may drastically affect the final
recognition decision of the system. The examples shown in
Figure 2, taken from face and iris recognition systems,
illustrate these steps and some of problems that could occur.
It should be appreciated that solutions to these problems rely
on the techniques from both Image Processing (Steps 1-5)
and Pattern Recognition (Steps 5-7) research.
Fig. 2. Recognition steps performed in face biometrics (a) and iris biometrics (b), and the associated problems that may occur at each step.
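To make Steps 5-7 more concrete, the following minimal Python sketch (an illustration only, not tied to any particular product) represents a template as a vector of L bounded numbers, computes a similarity distance between two templates, and applies the simplest decision rule, a binary comparison to a fixed threshold; the normalized L1 distance, the value L=256 and the threshold of 0.32 are arbitrary assumptions.

```python
import numpy as np

def compute_template(feature_values, L=256, max_val=1.0):
    """Step 5: template as a set of L numbers with 0 < X_i < MAX_i."""
    x = np.asarray(feature_values, dtype=float)[:L]
    return np.clip(x, 0.0, max_val)

def match_score(template_a, template_b):
    """Step 6: similarity distance S between two templates
    (here a normalized L1 distance, an assumption made for illustration)."""
    return float(np.mean(np.abs(template_a - template_b)))

def recognition_decision(score, threshold=0.32):
    """Step 7: simplest statistical rule, a binary comparison to a fixed threshold."""
    return score < threshold  # True means "match"

# Illustrative usage with random feature attributes standing in for Step 4 output.
rng = np.random.default_rng(0)
enrolled_template = compute_template(rng.random(256))
passage_template = compute_template(rng.random(256))
S = match_score(enrolled_template, passage_template)
print(f"score={S:.3f}, match={recognition_decision(S)}")
```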
III. BIOMETRICS EVOLUTION
A. Evolution towards surveillance
As one examines the evolution of biometrics, one can see that over the years, as computers have become faster and more intelligent automated processing has become possible, biometric systems are increasingly applied to less intrusive, less constrained, free-flow surveillance-like environments, where biometric data can be acquired at a distance and possibly in an inconspicuous (covert) manner. As a result, for such systems
to achieve reliable performance, the recognition results may
need to be integrated or fused over time and/or with results
obtained from other biometric systems.
Of particular interest is the phenomenon of the merging of Biometrics and Video Surveillance, illustrated in Figure 3,
and the arrival of such biometric technologies as Biometric
Surveillance, Soft Biometrics and Stand-off Biometrics, also
identified as Biometrics at a Distance, Remote Biometrics,
Biometrics on the Move or Biometrics on the Go, and an
increased demand for Face Recognition from Video, which is
where Biometrics meets Video Surveillance and which is
seen as a golden solution to many operational needs.
Fig. 3. Evolution of Biometric and Video Surveillance systems: towards
each other, with overlap in Face Recognition.
B. Special Interest: Face Recognition
While for humans recognizing a face in a photograph or in
video is natural and easy, computerized face recognition is
very challenging. In fact, automated recognition of faces is
more difficult than recognition of other imagery data such as
iris, vein, or fingerprint images due to the fact that the human
face is a non-rigid 3D object which can be observed at
different angles and which may also be partially occluded. It
is important therefore for an organization interested in using
face recognition systems to know what is possible and what
is not in the area of automated facial recognition as well as to
know how to evaluate such systems.
Based on prior work [6-8], we summarize in Table I the readiness level of those face recognition technologies that are closest to deployment, and also highlight the fact that we are still far from general face recognition as performed by humans.
TABLE I. READINESS LEVEL OF FACE RECOGNITION TECHNOLOGIES
(5 - ready for deployment, 4 - needs minor R&D, 3 - needs some R&D, ..., 0 - not ready at all)

RL=5: Human-assisted Recognition From Video (not biometrics per se), where the face is automatically extracted from video, e.g. to be linked with a boarding pass or vehicle plate number, or matched with a passport photo.
RL=4: Face image and geometry automatically extracted from video, used together with recognition in another modality (e.g. iris).
RL=3: Automated Recognition from ICAO-conformant passport photographs - as good as finger or iris recognition.
RL=3: Automated Recognition From Video only - possible if procedural constraints are imposed (to make video snapshot image quality closer to that of a passport image).
RL=3: Identification in a small-size database, as in monitoring access-restricted areas.
RL=0.1: General unconstrained automated face recognition.
In order to know how to conduct an evaluation of Face Recognition systems, one needs to know what makes such
stand-off biometrics so different from other biometrics.
C. Special Interest: Stand-off biometrics
As opposed to other biometrics, in which a person
intentionally comes in contact with a biometric sensor,
stand-off biometrics is applied to a person without his/her
direct engagement with the sensor. In many cases, a person
would not even know where a capture device is located or
whether his/her biometric trait is being captured. As a result,
a single biometric measurement or output of a stand-off biometric system is normally much less identifying than that of other biometric systems. This means two things.
First, it is common for a stand-off biometric system to
have more than one match below the matching threshold, or
to have two or more matches having very close matching
scores.
Second, the final recognition decision of a stand-off
biometric system is not based on a single measurement or
output, but rather on a number of biometric measurements
taken from the same or different sensor, combined together
using some data fusion technique.
This leads us to reconsidering the way the performance
evaluation of biometric systems is done.
IV. BIOMETRICS EVALUATION
It is well accepted nowadays that biometrics, especially
image-based, will never produce error-free recognition
results. However, and most importantly, it is also appreciated now that, with proper system tuning and setup adjustment, critical errors of a biometric system can be minimized to a level acceptable for operational use.
The insights on system tuning and setup adjustment, as
well as on the selection of the system and risk mitigation
procedures that best suit the operational needs, can only be
obtained through system performance evaluation. However,
the performance evaluation protocols and metrics should be
appropriate for the task and scenario to which the systems
are applied.
A. From Door opening to Intelligence gathering
Fostered by end-users’ perception of access control
biometric systems and by the way biometric systems are
marketed by industry, there has been a widespread stereotype
created about biometric systems that they are tools to open a
"door" - either a physical door (to enter a plane or restricted
access area) or a virtual "door" (as in a laptop or a cell
phone). This stereotype creates a simplistic understanding of how biometric system results are obtained, used and judged. In particular (see Table II), it draws parallels between two very different technologies: an intelligence-gathering device, which a Biometric System is, and a Proximity Sensor that is used to open a door (or valve) in the presence of a person.
The most striking similarity between the two technologies
is seen in the way both technologies are evaluated. Indeed, as
one examines current biometric evaluation standards [1,2]
and evaluation reports of various biometric technologies
[3,4], one can find that the way biometric recognition
performance is evaluated and reported is still primarily based
on counting the number of times a "door" has opened
correctly and incorrectly, i.e. using the False Match and
False Non-Match Rates (FMR and FNMR) and the trade-off
curves built thereon.
In the light of the evolution of biometrics and its current applications, highlighted in the previous sections, such an evaluation framework may no longer be sufficient and/or appropriate. Instead, a new evaluation framework needs to be developed that allows one to obtain an all-inclusive description of the performance of a biometric system based on its place in the biometric taxonomy and on all data measured during the run of the system.
TABLE II. BIOMETRIC SYSTEMS VS. PROXIMITY SENSORS

                      Biometric systems (for access control)     Proximity sensors
Application task      Open the "door" for the person             Open the "door" for a person
Measurement taken     Similarity distance (match score): S       Distance to a person: D
Task achieved         when S < T                                 when D < T
Calibration done by   computing similarity distances of          measuring distances at different ranges
                      genuine and imposter data
Performance metric    FMR, FNMR (ROC / DET curves)               FMR, FNMR (ROC / DET curves)
B. Conventional performance evaluation metrics
According to conventional methodology, the following
two binary errors that a system can exhibit are counted:
False Match (FM), also known as False Accept (FA),
False Hit, False Positive or Type I error; and
False Non-Match (FNM), also known as False Reject (FR), False Miss, False Negative or Type II error.
By applying a biometric system to a sufficiently large data set, the total numbers of FA and FR are counted to compute the cumulative measurements:
False Accept Rate (FAR), and
False Reject Rate (FRR) or True Acceptance Rate (TAR = 1 - FRR), also known as Hit Rate,
reported at fixed rates of one another or as functions of the match threshold.
The trade-off curves, also called Figures of Merit, are also
computed such as:
Detection Error Trade-off (DET) curve, which is the
graph of FAR vs. FRR, obtained by varying the system
match threshold, or
Receiver Operator Characteristic (ROC) curve, which is similar to the DET curve, but plots TAR against FAR (a computation sketch is given below).
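As an illustration of how these conventional rates and curves can be computed, the sketch below derives FMR and FNMR as functions of the match threshold from arrays of genuine and imposter scores (lower score = better match, consistent with the S < T convention used above); the score distributions are placeholders, and the resulting (FMR, FNMR) points are what a DET or ROC curve is drawn from.

```python
import numpy as np

def order1_curves(genuine_scores, imposter_scores, thresholds):
    """For each threshold T (match declared when score < T), return (T, FMR, FNMR)."""
    genuine = np.asarray(genuine_scores, dtype=float)
    imposter = np.asarray(imposter_scores, dtype=float)
    points = []
    for t in thresholds:
        fmr = float(np.mean(imposter < t))   # imposter accepted -> false match
        fnmr = float(np.mean(genuine >= t))  # genuine rejected  -> false non-match
        points.append((t, fmr, fnmr))
    return points

# Placeholder score distributions (genuine scores lower, i.e. better, on average).
rng = np.random.default_rng(1)
genuine = rng.normal(0.25, 0.05, 2000)
imposter = rng.normal(0.45, 0.05, 20000)
# The DET curve plots FMR vs. FNMR; the ROC curve plots TAR = 1 - FNMR against FMR.
for t, fmr, fnmr in order1_curves(genuine, imposter, np.arange(0.20, 0.50, 0.05)):
    print(f"T={t:.2f}  FMR={fmr:.4f}  FNMR={fnmr:.4f}")
```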
It is important to note that when counting the number of
matches and non-matches, verification match and
identification match are defined differently. In verification,
an image is matched if its matching score is less (or larger)
than a threshold, whereas in identification an image is
matched if its score is the smallest (or largest).
Two additional metrics/curves have been specifically proposed for Identification systems to address this issue:
Rank-k identification rate (Rk) - the number of times the
correct identity is in the top k most likely candidates.
Cumulative Match Characteristic (CMC) curve, which plots the rank-k identification rate against k (see the sketch below).
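The sketch below shows one way to compute rank-k identification rates (and hence the CMC curve) from an all-to-all score matrix, with rows corresponding to probe samples, columns to enrolled identities, and lower scores meaning better matches; the score matrix and labels are placeholders.

```python
import numpy as np

def cmc_curve(score_matrix, true_columns, max_rank=10):
    """Rank-k identification rate: fraction of probes whose genuine identity is
    among the k best (lowest) scores, reported for k = 1..max_rank."""
    scores = np.asarray(score_matrix, dtype=float)
    order = np.argsort(scores, axis=1)  # per-probe column indices, best score first
    ranks = np.argmax(order == np.asarray(true_columns)[:, None], axis=1) + 1
    return [(k, float(np.mean(ranks <= k))) for k in range(1, max_rank + 1)]

# Placeholder: 1000 probes against 1000 enrolled identities.
rng = np.random.default_rng(2)
n = 1000
scores = 0.2 + 0.8 * rng.random((n, n))             # imposter scores in [0.2, 1.0)
true_cols = np.arange(n)
scores[true_cols, true_cols] = 0.3 * rng.random(n)  # genuine scores in [0.0, 0.3)
for k, rate in cmc_curve(scores, true_cols, max_rank=5):
    print(f"Rank-{k} identification rate: {rate:.3f}")
```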
These rates/curves still do not offer a complete picture of the system performance, as they do not provide any metric to estimate the confidence of the system in its recognition decision (Step 7 in Figure 2, Section II.E). Nor can they be used to distinguish the False Reject Rate from the true "not-in-the-list" detection rate when applied to an open dataset.
Additionally, besides recognition measurements, other
system usability factors also have to be evaluated, in order to
see if the conditions/requirements imposed on the system's operation (Section II.C) are met and to ensure that it can be further customized and upgraded.
We therefore propose a new all-inclusive evaluation
methodology that would allow one to investigate most of the
issues related to the performance of a state-of-the-art system.
V. TOWARDS ALL-INCLUSIVE EVALUATION
A. Hierarchy for generic biometrics evaluation
Table III shows the hierarchy of steps for a general all-
inclusive evaluation of a biometric system, which takes into
account modality suitability, cost, factors and the
performance criteria. Normally, the suitability of the
modality should be evaluated first and prior to making the
decision on a particular biometric solution or product.
TABLE III. ALL-INCLUSIVE BIOMETRICS EVALUATION
1. Determine suitability of modality (-ies)
2. Determine costs/impact of FM and FNM
3. Determine all factors affecting performance
4. Evaluate performance of market solutions *:
I. wrt all factors that affect the performance
a. On large-scale database (>1000)
b. On Pilot project (in real environment)
II. wrt capability to be integrated / customized
c. Wrt input parameters (pre-processing)
d. Wrt output parameters (post-processing)
B. Factor-driven datasets
There are several datasets publicly available for many
image-based biometrics. Such datasets would be of great value for the evaluation of any biometric system. It is recommended, however, that the data presented in those datasets be first analyzed for the variability of factors in them that may affect the recognition performance. In many cases such factors are listed along with the dataset description, as is the case for face databases. In particular, a summary of facial datasets sorted according to the factors that affect face recognition performance is provided in [10].
If information on the dataset image factors is not available, it can be obtained by preprocessing the images with image quality analysis tools, which are often supplied with biometric systems.
C. Matching vs. Capture evaluation
A single provider may not be the best in the market in both the capture component of a biometric system and its matching component. It is therefore recommended that evaluation be done independently for the capture and matching components, and that an organization impose an open-architecture constraint on systems to be deployed, in order to ensure that they provide access to as many parameters as possible and allow their integration with other sensors or system components.
D. Evaluation criteria types (for Matching and Capture)
Evaluation criteria for Matching components are divided
into three types:
Type M0: General questions. These questions, usually
graded Yes/No or Unsure, relate to the abilities and
functionality of the program, rather than to evaluating its
recognition performance.
Type M1: Recognition performance tested on large-
scale production factor-agnostic dataset(s).
Type M2: Recognition performance tested on factor-
specific dataset(s).
Evaluation criteria for Capture components are divided
into two types:
Type C0: General questions, related to functionality,
convenience and ease of use of the Capture module, and
Type C1: Capture performance tested on factor-specific
dataset(s).
An example of C1 criteria questions that identify factors affecting iris recognition performance is given in Table IV.
TABLE IV. CAPTURE CRITERIA C1: ROBUSTNESS TO FACTORS (FOR IRIS RECOGNITION)
ID # Performance with respect to the following factors:
C1.1a Orientation – Iris
C1.1b Orientation – Camera
C1.2a Iris resolution – in pixels
C1.2b Iris resolution – distance to camera
C1.3 Occlusion
C1.4 Image quality: focus, motion blur
C1.5a Illumination: Light source location (Front, back, side)
C1.5b Illumination: specular reflection (from LED or Lamps)
C1.5c Illumination: brightness / contrast
E. Data preparation, collection and analysis
The core of any matching evaluation is obtaining and
analyzing the recognition matching scores produced by the
system. For comprehensive performance evaluation, the
procedure described in Table V is proposed. This procedure
employs a novel multi-order performance analysis approach,
which is described in more detail in the next section.
The procedure commences with a small-size dataset, with the goal of obtaining a "bird's-eye view" of the system's functionality and estimates of the speed and level of programming effort required for each of the steps defined in the protocol.
The most time-consuming step in the procedure is the
computation of all-to-all match scores (Step 2). If for a
given dataset size (N) a system permits computing such
scores within a reasonable amount of time, then the multi-
order analysis of the system performance for this size is
performed. For reference, Table VI shows the estimated time needed to perform the Encoding and Matching steps for
different dataset sizes, based on testing several iris biometric
systems.
TABLE V. PROTOCOL FOR COMPREHENSIVE PERFORMANCE EVALUATION OF A BIOMETRIC SYSTEM
Step 0. Data preparation
Select Enrolled and Passage (possibly of lower quality) datasets:
- of several sizes (N), e.g. 100, 500, 1000, 5000,
- with K passage images per each enrolled image,
- (if possible) corresponding to different factors that affect the performance.
Apply one set at a time for each system (or parameter, or factor), starting from a smaller set, and measure the time needed for each of the following steps. Do not proceed to a larger set if the estimated time is over the limit.
Step 1. Encoding (of all images in a Enrolled and Passage sets)
Measure:
Failure to Acquire for Enrolled images (FTA.E)
Failure to Acquire for Passage images (FTA.P)
Image quality numbers
Step 2. Matching (Obtaining scores for ALL available data):
i) using default settings/threshold,
ii) using other possible settings/thresholds
Step 2a. Get match scores for Enrolled set - Imposter tests only
Measure: FAR = #FalseAccepts/(N-FTA.E)
Step 2b.1. Get match scores for Passage set – Genuine tests only
Measure: FRR = #FalseRejects/(N-FTA.P)
Step 2b.2. Get match scores for Passage set – Imposter tests only
Measure: FAR = #FalseAccepts/(N-FTA.P)
Step 3. Multi-order analysis (of ALL obtained scores)
Step 3.a. Order-0 (no Analysis, Visualization only):
Plot Probability Distribution Functions PDF(S) of genuine and
imposter scores (at different increments to highlight trade-off zone)
Step 3.b. Order-1 (conventional) analysis:
Compute/Plot verification rates and curves, where match is defined
when a score is below a threshold:
- FMR, FNMR, DET
Step 3.c. Order-2 analysis:
Compute/Plot Rank-1 identification rates, where match is defined
when it is a Minimal score:
- FMR, FNMR, DET
- distribution of best scores values (optional)
Step 3.d. Order-3 analysis:
Compute/Plot Rank-k (k=2,3,4,>5) identification rates and distribution of Confidences, defined as below:
1: PDF(S2-S1) of second best score minus best score
2: PDF(N(S<T)) of number of scores less than a threshold
3: PDF(Rk) of identification rank
(Steps 3.c and 3.d can be performed in a single procedure).
Trade-off curves obtained on sets of different sizes are plotted on the same
graph to highlight the tolerance to scalability, with all output dots visible.
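As a small illustration of how the rates defined in Step 2 of Table V can be computed once the encoding and matching outputs are available, the sketch below applies the FTA-corrected formulas of the protocol to hypothetical counts; all numbers are placeholders.

```python
def table_v_rates(n, fta_e, fta_p, false_accepts_enrolled,
                  false_rejects_passage, false_accepts_passage):
    """Apply the FTA-corrected rate formulas of Table V to counts from Steps 1-2."""
    far_enrolled = false_accepts_enrolled / (n - fta_e)  # Step 2a: imposter tests, Enrolled set
    frr_passage = false_rejects_passage / (n - fta_p)    # Step 2b.1: genuine tests, Passage set
    far_passage = false_accepts_passage / (n - fta_p)    # Step 2b.2: imposter tests, Passage set
    return far_enrolled, frr_passage, far_passage

# Hypothetical counts for a dataset of N = 1000 enrolled identities (placeholders).
far_e, frr_p, far_p = table_v_rates(n=1000, fta_e=12, fta_p=35,
                                    false_accepts_enrolled=3,
                                    false_rejects_passage=21,
                                    false_accepts_passage=5)
print(f"FAR (Enrolled) = {far_e:.4f}, FRR (Passage) = {frr_p:.4f}, FAR (Passage) = {far_p:.4f}")
```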
TABLE VI. TIME REQUIRED TO ENCODE AND MATCH DATASETS OF DIFFERENT SIZES
N 100 500 1000 5000 10K 20K 50K
Step 1 5’ 30’ 1h 6h 12h 1d 3d
Step2a .5’-20’ 1’-6h 5’-1d 2h-1w 4h-1m 8h/4m 3d-1y+
Step2b 10’-3h 30’-3d 1h-2w 8h-50w 17h-4y 1.5d-16y 5d-100y
VI. MULTI-ORDER PERFORMANCE ANALYSIS
The multi-order terminology for the proposed innovative
analysis comes from the analogy with multi-order statistics
terminology, in which order-0 statistics signifies using the
value itself, order-1 statistics signifies computing the average
of several values, and order-2 and order-3 statistics signify
computing the deviation (variance) and high-order statistical
moments.
Similarly, the multi-order biometric performance analysis
framework is defined as an approach that examines the
evaluation of the system at several levels (or orders) of detail. This framework defines the conventionally used
performance metrics, such as summarized in Section IV, as
the Order-1 analysis and introduces the concepts of Order-2
and Order-3 analysis defined as follows.
Definition: The Order-1 analysis of the biometric system
performance is the analysis that is based on a single number
output (score) of the system, as when computing verification
match/non-match rates and the error trade-off curves
based on a binary comparison of a single 1-to-1 match score
to a threshold.
Definition: The Order-2 analysis of performance is the
analysis that is based on all scores that can be obtained by
the system for a sample, as when finding the best match
score in 1-to-N identification.
Definition: The Order-3 analysis of the biometric system
performance is based on the relationship between the match
scores obtained by the system for a sample, as when finding
the difference between the best and second-best match
scores or all scores that are lower than a threshold.
Additionally, all statistics and graphical visualizations related to the score distributions obtained by the system are referred to as the Order-0 analysis. Such analysis does not
produce a metric that can be used to quantify the quality of
the system performance. Nevertheless, as demonstrated in
Figure 4, it provides very important insights on how a system
performs and where the performance bottlenecks could be.
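An Order-0 visualization requires only the raw genuine and imposter score distributions; the sketch below (placeholder data, with simple histograms standing in for the PDFs) illustrates the kind of output behind Figure 4, where the overlap region between the two distributions marks the error trade-off zone.

```python
import numpy as np

rng = np.random.default_rng(4)
genuine = rng.normal(0.25, 0.05, 2000)    # placeholder genuine (mated) scores
imposter = rng.normal(0.45, 0.05, 20000)  # placeholder imposter (non-mated) scores

bins = np.linspace(0.0, 0.7, 36)
g_hist, _ = np.histogram(genuine, bins=bins, density=True)
i_hist, _ = np.histogram(imposter, bins=bins, density=True)
for left, g, i in zip(bins[:-1], g_hist, i_hist):
    # Bins where both densities are non-zero form the error trade-off zone.
    print(f"bin {left:.2f}: genuine={g:6.2f}  imposter={i:6.2f}")
```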
The results obtained from the Order-1 analysis are shown
in Figure 5. These are the results that would normally be
found in evaluation reports published to date or that would
be obtained for a product with existing evaluation standards.
It should be emphasized that when plotting the Order-1 trade-off
curves, it is important that points that are used to extrapolate
the curves be shown. The reason is that a system may never
attain certain low levels of FMR or FNMR that are shown on
the curve. This is why it is also very useful to report the
FMR and FNMR curves (as functions of threshold) in
addition to the DET or ROC curves.
(Strictly speaking, to follow the analogy with statistics, we should have called the conventional single-number-based evaluation the Order-0 analysis, with Order-1 and Order-2 analysis corresponding to Order-1 and Order-2 statistics. The shift in numbering is due to the introduction of the Order-0 analysis, which, strictly speaking, is not an analysis but a visualization of the inner properties of a biometric system.)
By highlighting the area of error trade-off and plotting the
curves obtained for different dataset sizes on one graph, one
can investigate the issues related to the scalability of the
system such as an increased number of false rejects and/or
the necessity to modify the match threshold (see Figure 5.b).
However useful and informative the Order-1 curves may be, they still do not provide a complete answer as to which system is the best. In particular, a system that has a higher FNMR (for
a fixed FMR) can still be preferable to a system that has a
lower FNMR, if it has better mechanisms to report and deal
with non-confident recognition decisions.
Fig.4. Order-0 analysis visualizes Probability Distribution Functions for
genuine and imposter scores and allows one to spot some problems with the
data or the matching algorithm.
As for Order-2 analysis, it is by definition routinely
performed for identification applications, which require
examination of scores for everyone in a database. It is, however, rarely performed for verification applications, where it could be found very useful too, for example to ensure that a 1-to-1 match is indeed the best and only match in the entire dataset.
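As a simple example of applying Order-2 analysis to a verification task, the sketch below checks whether the claimed identity yields the best score over the entire enrolled set and whether any other identity also falls below the threshold; the data are placeholders and lower scores mean better matches.

```python
import numpy as np

def is_unique_best_match(scores_vs_all, claimed_index, threshold):
    """Order-2 check for verification: the claimed identity should give the best
    score, and no other enrolled identity should also fall below the threshold."""
    s = np.asarray(scores_vs_all, dtype=float)
    best = int(np.argmin(s))
    others_below = int(np.sum(np.delete(s, claimed_index) < threshold))
    return best == claimed_index and others_below == 0

rng = np.random.default_rng(5)
scores = 0.35 + 0.65 * rng.random(500)  # placeholder scores of one probe vs. 500 enrolled identities
scores[7] = 0.22                        # the claimed identity (index 7) scores well
print(is_unique_best_match(scores, claimed_index=7, threshold=0.32))
```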
A. Order-3 analysis and Recognition Decision Confidence
The limitations of the Order-1 analysis and the need for higher-order analysis are best demonstrated by Figure 2.b
(Step 6). The figure shows the best five matching scores
obtained for two test images presented to a biometric system
for the purpose of identification. As seen, the scores obtained
for the test image in the left column provide a very confident
winner - the minimal score, whereas the scores obtained for
the test image in the right column are much less identifying,
as there are several scores that are close to the minimum.
Additionally, depending on where the match threshold is, there could be more than one score below the threshold.
More comprehensive statistics on this phenomenon are shown in Figure 6, which presents the Order-3 performance
analysis results obtained from several state-of-the-art
systems.
The experiments were run following the evaluation
protocol described in Section V (Table V), with datasets
containing iris images from 100, 500, 1000, and 4000
individuals, each individual having one enrolled image and
six passage images of the same (right) eye. The results
reported in Figure 6 are from the 1000-identities passage dataset, i.e. containing 6000 iris images.
Figure 6.a shows the number of instances when there were 0 (i.e. false rejects), 1, 2, 3, and so on scores below a default threshold.
Figure 6.b shows how close the second best score was to
the best score, by showing the number of instances when
the second best score was within 0.01, 0.02, and so on
distance from the best score.
Figure 6.c shows how many times the genuine person
scored the best (Rank-1), second best (Rank-2), third best
(Rank-3) and so on, of which the portion of scores that
were above the default threshold is marked in dark red.
Fig. 5. Order-1 analysis is the current performance evaluation standard and is based on computing verification-based (1-to-1) False Match and False Non-Match Rates (a, b) and the associated error trade-off curves (c).
As can be seen, the information obtained with Order-3
analysis provides a sense of the reliability of the biometric
recognition results for both verification and identification, and
can therefore be used as biometric recognition confidence
metrics.
What is also important to indicate is that, as the presented
results show, there are many instances when there is more
than one match below the matching threshold, or when there
are two or more matches having very close matching scores.
With traditional status-quo evaluation methodologies this
important information is lost. However, with the proposed
multi-order methodology this information is not lost and can
be used to fine-tune the system, as well as to develop the
procedures to mitigate the risks associated with having non-
confident recognition results.
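The sketch below shows how the three Order-3 confidence statistics reported in Figure 6 can be computed from the list of scores obtained for a single probe against all enrolled identities; the scores and threshold are placeholders, and lower scores mean better matches.

```python
import numpy as np

def order3_confidence(scores, true_index, threshold):
    """Order-3 confidence statistics for one probe, given its scores against all enrolled identities."""
    s = np.asarray(scores, dtype=float)
    order = np.argsort(s)                                # best (lowest) score first
    n_below = int(np.sum(s < threshold))                 # Fig. 6.a: number of scores below the threshold
    gap = float(s[order[1]] - s[order[0]])               # Fig. 6.b: second-best minus best score
    rank = int(np.where(order == true_index)[0][0]) + 1  # Fig. 6.c: rank of the genuine identity
    return n_below, gap, rank

# Placeholder scores of one probe against 1000 enrolled identities; the genuine one is index 42.
rng = np.random.default_rng(3)
scores = 0.3 + 0.7 * rng.random(1000)
scores[42] = 0.21
print(order3_confidence(scores, true_index=42, threshold=0.32))
```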
Fig. 6. Order-3 analysis involves computing the rates of recognition confidences, computed as: a) the number of matches below a threshold, b) distance from best score to second best score, and c) recognition rank itself.
VII. CONCLUSIONS
Performance evaluation plays a critical role in biometric
system deployment, due to the fact that biometric systems
can produce errors. In-house technical evaluation allows one
to ensure that the quality of the software delivered by the
vendor meets the operational requirements. It also allows one
to build an operational and efficient system tailored to a
specific need, by ensuring that a biometric system provides
access to as many parameters of the system as possible and
allows its integration with other sensors or system
components.
It takes a good understanding of all technical problems
and stages underlying the biometric process to conduct a
comprehensive evaluation. All factors and system taxonomy
differentiators have to be taken into account when evaluating
a biometric system. The recognition performance needs to be
understood, and all performance changes that are due to a change of the system or its parameters, and not only the match/non-match errors, have to be analyzed.
Even though no biometric modality, except DNA, is error-
free, critical errors can be minimized to a level acceptable for operational use, given a proper performance evaluation and optimization strategy. Despite the fact that performance
may also deteriorate over time, as the number of stored
people increases and spoofing techniques become more
sophisticated, there are also many ways to improve biometric
system performance, by using more samples and modalities, and by adding environmental and/or procedural
constraints. For an organization that intensively relies on
biometric technology for its day-to-day activities, it is
therefore recommended that continuous performance
monitoring, tuning and upgrading of its biometric systems be
carried out, accompanied with a regular all-inclusive system
performance evaluation. To conduct such an evaluation, the
biometrics taxonomy accompanied by the multi-order
performance analysis framework proposed in this paper can
be used.
Disclaimer: The data and results presented in this paper are not associated
with any production system or vendor product. They are obtained from lab
environment experiments performed on a variety of state-of-the-art iris and
face recognition biometric systems using real anonymized biometric data.
They are chosen to be representative of many cases observed throughout the
experiments and are used here solely for the purpose of illustrating the
concepts presented in the paper.
ACKNOWLEDGEMENT
This work has been done in part for a CBSA Iris Biometric
Technology Examination, and in part for the PSTP projects
on Stand-off Biometrics Evaluation (PSTP08-0109BIO) and
Biometric Border Security Evaluation Framework (PSTP08-
0110BIO). The author gratefully acknowledges the work of
many CBSA colleagues who prepared the iris data and
conducted the experiments reported in this paper.
REFERENCES
[1] ISO/IEC 19795-1:2005 Biometric performance testing and reporting.
Part 1: Principles and framework.
[2] ISO/IEC 19795-2:2007 Biometric performance testing and reporting.
Part 2: Testing methodologies for technology and scenario evaluation.
[3] In Face Recognition Vendor Test website, http://www.frvt.org.
[4] International Biometric Group. Biometric Performance Certification
and test plan - www.biometricgroup.com/testing_and_evaluation.html
[5] A. K. Jain, A. Ross, and S. Prabhakar. An Introduction to Biometric
Recognition. IEEE Transactions on Circuits and Systems for Video
Technology, Special Issue on Image- and Video-Based Biometrics,
14(1):4–20, January 2004.
[6] D. O. Gorodnichy. Video-based framework for face recognition in
video. In Second Intern. Workshop on Face Processing in Video
(FPiV’05), Proc. of Second Canadian Conference on Computer and
Robot Vision (CRV’05), pp. 330-338, Victoria, BC, online http://iit-
iti.nrc-cnrc.gc.ca/iit-publications-iti/docs/NRC-48216.pdf, 2005.
[7] D. O. Gorodnichy. Seeing faces in video by computers (Editorial).
Image and Video Computing, Special Issue on Face Processing in
Video Sequences. (online at http://iit-iti.nrc-cnrc.gc.ca/iit-
publications-iti/docs/NRC-48295.pdf), 24(6):1–6, 2006.
[8] D. O. Gorodnichy. “Face databases and evaluation” chapter in
Encyclopedia of Biometrics (Editor: Stan Li), 2009, Elsevier
Publisher (on-line at http://www.videorecognition.com/doc)