All content in this area was uploaded by Isaac Griswold-Steiner on Oct 20, 2018
Morph-a-Dope: Using Pupil Manipulation to Spoof
Eye Movement Biometrics
Isaac Griswold-Steiner, Zakery Fyke, Mushfique Ahmed, and Abdul Serwadda
Department of Computer Science
Texas Tech University, Lubbock, TX 79409
{isaac.griswold-steiner, zakery.fyke, mushfique.ahmed, abdul.serwadda}@ttu.edu
Abstract—Eye Tracking Authentication, a mechanism in which eye movement patterns are used to verify a user's identity, is increasingly being explored as a layer of security in computing systems. Although these systems have been widely studied, there is barely any research investigating how they could be attacked by a determined adversary. In particular, the relationship between pupil characteristics and lighting could lead to vulnerabilities in improperly secured systems.
This paper presents Morph-a-Dope, an attack that leverages
lighting manipulations to defeat eye tracking authentication
systems that heavily rely on features derived from pupil sizes.
Across 20 attacker-victim pairs, the attack increased the EER
by an average of over 50% as compared to the zero-effort
attack by the overall population, and as much as 500% for
individual victims. Our research calls for a greater emphasis
on manipulation-resistant pupil size features or system designs
that otherwise avoid such vulnerabilities.
Index Terms—eye tracking biometrics, behavioral biometrics,
spoof attack, continuous authentication, machine learning
I. INTRODUCTION
Eye tracking authentication systems are increasingly being put forward as an avenue for continuous authentication (e.g., see Rigas et al. [19], Darwish et al. [7] and De Luca et al. [8]).
This line of research is promising because it utilizes innate
biological differences between users for authentication in a non-
intrusive way. Despite the interest in these methods, continuous
eye tracking-based authentication schemes have never been
rigorously evaluated under the assumption of a sophisticated
adversary.
Take the case of pupil size characteristics. Because pupil size reflects a physical trait of the user, features derived from it offer significantly more discriminative power than many other features used in eye movement-based biometric systems (e.g., see [7]). As a result, pupil size features are extremely important in several authentication and identification systems in the literature (e.g., see [7], [17], and [3]). However, because pupil size is influenced by changes in light, it is plausible that an adversary could craft an attack centered on pupil dynamics under manipulated lighting. Although researchers continue to propose eye tracking authentication systems that rely heavily on these features for enhanced discriminative power, surprisingly, no study has methodically examined whether or how these features could be exploited.
In this paper, we bridge this research gap. In particular, we design and evaluate Morph-a-Dope¹, an attack that leverages lighting manipulations to defeat eye tracking-based authentication systems that are built to rely heavily on features extracted from pupil size. The term dope in this case represents the pupils, which, like a disoriented person, are easily fooled into adjusting back and forth (i.e., increasing and decreasing in size) to accommodate the malicious light changes. In the end, the attacker's pupil size signature is morphed into that of the victim, consequently defeating the authentication system.
To design Morph-a-Dope, we first rigorously analyzed the dynamics of pupil size variation under carefully controlled lighting. Based on the findings from this analysis, we then designed, implemented, and evaluated Morph-a-Dope. Relative to the zero-effort attacks traditionally used to evaluate eye tracking authentication systems, we show that our attack increases the Equal Error Rate (EER) by an average of over 50%.
The contributions of this paper are summarized below:
• Targeted Evaluation of Potential Attack Vectors Based on Past Research: We examine several high-performing eye tracking authentication systems and divide them into two categories based on their use of pupil size features. We find that a subset of authentication systems is heavily reliant on pupil size features, and that removing these features causes a significant degradation in performance. Based on these initial findings, we replicate one of these systems and verify that pupil size features are by far the most powerful features in this class of systems. These findings motivated our decision to develop an impostor attack against eye tracking authentication systems that are centered on pupil size features.
• Examining the Impact of Light Manipulation on Pupil Behavior: With the high discriminative power of pupil-centric features making them an inviting target for an adversary, we investigated whether they could be systematically manipulated using carefully controlled lighting. We examined a number of variables that would likely interest an attacker (e.g., the evolution, stability, and rate of change of pupil size features under different lighting conditions). Our analysis revealed that these features can be precisely steered toward an intended target, indicating their vulnerability to impersonation attacks.
¹ The attack name is adapted from Rope-a-Dope, a term coined for the technique Muhammad Ali used to defeat George Foreman in their famous 1974 boxing match. Foreman was the dope who unknowingly had most of his punches absorbed by the ropes [9].
978-1-5386-7693-6/18/$31.00 ©2018 IEEE
• Designing and Evaluating Morph-a-Dope: Leveraging the observed pupil patterns, we designed Morph-a-Dope and implemented it against a state-of-the-art eye tracking authentication system. We rigorously evaluated the impact of this attack from a wide range of perspectives, such as the aggregate vulnerability of each victim against all attackers, an impact analysis for attacker-victim pairs, and the improvement in attack performance that Morph-a-Dope gives our attackers. Our findings confirm the lethality of Morph-a-Dope, as it significantly degrades all performance indicators studied.
Road-map: In the following sections we describe previous research in this area (Section II), our data collection process and experimental design (Section III), an exploration of attack vectors based on past research (Section IV), an analysis of pupil-light dynamics (Section V), Morph-a-Dope's threat model, design, and experimental results (Section VI), and our conclusions (Section VII).
II. RELATED WORK
Eye Tracking Authentication Systems are a subset of biometric authentication systems that use measurements of the human eye and its movements to authenticate the user. Early work by Kasprowski et al. [14] in 2004 used a custom-made head-mounted eye tracker to collect data from subjects observing flashing points on the screen.
In 2010, Kinnunen et al. [15] used a Tobii X120 gaze tracker to gather data on short-term gaze direction while subjects watched a video, and built feature vectors from these data. Modeling these, they achieved an EER of roughly 30 percent. While not sufficient for a realistic application, this work was distinct from that of [14] and [3] in that it was task-independent. Holland and Komogortsev [12] used a similar method in 2011, having subjects read excerpts from “The Hunting of the Snark” while measuring 15 different eye movement patterns, and obtained an EER of 27 percent.
Rigas et al. [19] used random dots, text, and video to
authenticate (and identify) users with eye movement biometrics.
Their work used fixations and saccades, with EERs of around
10-15% prior to applying various fusion schemes. Their
rigorous work demonstrated the potential of eye movement
biometrics with over 300 users.
In [10], the authors presented a novel eye tracking authentication system built on 20 eye focus features. The researchers collected data from 30 participants while they interacted with dots on a screen. Another set of experiments involved 10 participants reading, writing, browsing the web, and watching videos. After training a classifier with these data, the authors achieved EERs between 1% and 30%, depending on the experimental conditions and the use of majority voting.
As far as we are aware, there is no published research that successfully aids impostors in attacking a continuous eye tracking authentication system. All previous research has used zero-effort attacks to evaluate the viability of such systems. Our research aims to address this gap.
Fig. 1: Our data collection setup included a room where the only lighting sources were the monitors, overhead lighting, and a desktop light. The Tobii Eye Tracker can be seen at the bottom of the monitor. Light conditions are adjusted with the overhead lights or the settings on the base of the lamp.
III. DATA COLLECTION PROCEDURES
A. Experimental Overview
After receiving IRB approval, we gathered eye tracking data in several different experiments: (1) We gathered data from users for core eye tracking authentication, using experimental scenarios in which participants browsed Wikipedia and watched videos. (2) We raised the lighting in the room from almost zero to a relatively high level using commonly available tools. (3) Finally, we conducted a set of experiments to see whether a determined attacker with the right strategy could penetrate an eye tracking authentication system that uses the features described in Section IV-B. The procedures for the first two experiments are explained in the following sections, and the attack process is explained in greater detail in Section VI-B. For the sake of clarity, we have intentionally chosen to describe the attack procedures later in the paper alongside the rest of the attack-related content.
Participant Recruitment: We recruited participants from across campus (primarily undergraduate and graduate students). In total, we had 15 users for our replication of eye tracking authentication research and 10 participants for the pupil size and light experiment. Each participant was informed that they would be reading and viewing videos during the experiments. After reading and signing our consent agreement, each participant was given a unique and anonymous ID.
Equipment: To track participants' eyes, we used the Tobii Pro X2-60, which has a sampling rate of around 60 Hz. To manipulate a participant's pupil size, we used a regular desk lamp costing under $30: an 8 W LED lamp with seven brightness levels, made by LE [1].
TABLE I: Overview of authentication system designs and performance metrics. Error rates are in percent.
Paper | Including Pupil Size Features | Excluding Pupil Size Features | Feature Overview | Users
[7] | 11.35 (HTER) | 33.45 (HTER) | Spatial, temporal, and pupil size | 22
[10] | ∼18 (EER) | ∼31 (EER) | Spatial, temporal, and pupil size | 30
[22] | N/A | 6.3 (EER) | Spatial and temporal features from saccades and fixations | 30
[19] | N/A | 10-15 (EER) | CEM-B, saccadic vigor, and acceleration features | 322
[20] | N/A | 12.1 (EER) | Fixation density map-based features | 200
B. Data Collection For Core Eye Tracking Authentication
To measure the performance of our eye tracking authentication system, we used several real-world scenarios. We use browsing and video watching activities because they resemble scenarios in which we can imagine an attacker trying to imitate a target. For example, an unauthorized person might search documents or security footage for confidential information.
The browsing activity involved two sessions of the “Wikipedia Game”. In this game, participants are given a random starting page and told to find their way to a target page by clicking only links within each Wikipedia article. If they reach the target page before the time limit expires, they click “random article” and start again. Participants searched for the name of our university in the first session and “University of Oxford” in the second session.
In the video watching activity, participants watched “Big
Buck Bunny” and “The Problems with First Past the Post
Voting Explained”. We reserved “Big Buck Bunny” for our
training data and “The Problems with First Past the Post Voting
Explained” for the testing data. Video watching and browsing activities (such as searching Wikipedia) are common forms of stimulus in eye movement biometrics (e.g., see [10]).
C. Data Collection for Pupil Analytics
In this section we describe the data collection experiment
that was used to drive our pupil-light dynamics analysis.
In these experiments, we kept the room completely dark for all sessions except for light coming from the monitor and the lamp (see Section III-A for details on the lamp and Fig. 1 for its placement). The first experiment was conducted as follows. During each session, participants were told to read “Animal Farm” at whatever pace was comfortable for them. We started by having them read while they grew accustomed to the darkness (aside from the monitor, which was kept constant). Participants spent approximately 2-3 minutes at each lighting stage before we increased the lighting to the next level.
During the sessions exploring the relationship between pupil size and lighting, we used the rig shown in Fig. 2 to gather illuminance data over the course of the experiment. While the measured light was extremely steady during the experiments, there were differences between users in how much light was detected at each stage. This is because slight differences in the way a user sat could affect how much light hit their chest. The full analysis of the results of this experiment is presented in Section V.
Fig. 2: For collecting room brightness data for the pupil analytics experiment (see Sections III-C and V), we rigged a phone so that it could be worn around the neck. The phone was running the Android app “Physics Toolbox Sensor Suite”. This was used to measure illuminance² from the perspective of the participant.
IV. EXPLORING ATTACK VECTORS THROUGH AN ANALYSIS OF PAST RESEARCH
In this section, we examine five previously designed systems from the literature and their usage of pupil size and other feature types. Based on their performance and the types of features they use, we select a subset of the feature paradigms to target with our attack. We then replicate one of the systems to serve both as a baseline and as the target of the attack. Next, we conduct a root cause analysis of the change in performance when pupil size features are dropped, and we show how our authentication system performs compared to past work before developing the attack.
A. Performance of Pupil Dependent Systems Compared to the
State-of-the-Art
Several previously developed eye tracking authentication systems and their performance are summarized in Table I. The second and third columns of the table compare system performance with and without pupil size features. The fourth column briefly describes the types of features used by each system (please see the individual papers for details). Two of the systems (see [7] and [10]) achieved error rates (EER or HTER) of 11-18% with pupil size features but performed poorly when these features were removed (error rates of 30-34%). In these two papers, the features used by the eye tracking systems were generally based on simple statistical properties of the raw data (e.g., mean, max, standard deviation, etc.). The drop in performance without pupil size suggests that the predictive power of pupil size features far outweighs that of the other features in these two systems.
The last three systems (see [19], [20], and [22]) did not use
pupil size features and yet achieved state-of-the-art performance.
In [20], the authors used Fixation Density Maps to map eye
movement and focus. This was a very different strategy as
compared to [19] and [22], both of which used temporal and
spatial features (extracted from saccades and fixations but not
from pupil sizes) for the authentication process. In particular,
Rigas et al. [19] formulated a set of features that characterize a
wide range of behavioral dynamics contained within saccades
and fixations. For example, they developed features which
capture saccadic vigor (see definition of saccadic vigor in [6]).
All three systems achieved EERs of less than 13%.
Motivated by these results (summarized in Table I), we chose to target the system in [10], one of the systems whose performance depends heavily on pupil size features. Our hypothesis is that if an eye tracking authentication system performs dramatically worse without pupil size features, then imitating pupil size is likely to help an attacker authenticate as their target.
For the remainder of this section, we replicate and evaluate the authentication system developed in [10] as closely as the available descriptions of that work allow. We emphasize that this attack is designed to target the family of eye tracking authentication systems in which pupil size features represent a significant portion of the model's discriminative power (see also [7], [24], [10], or [3]). We replicate [10] as a representative system that captures the general phenomenon of pupil features accounting for the lion's share of the information contained in the feature set. The system in [10] was selected because it is described in enough detail for us to replicate it closely. The one part of the system with missing details was the final fusion stage; however, the authors provided enough results before this fusion stage to serve as a baseline for our analysis and benchmarking.
B. Replication and Performance Analysis for Target Eye
Tracking Authentication System
This subsection is devoted to replicating the target authentication system (see [10]) and analyzing its performance compared to related research. We first briefly describe the feature extraction methodology, then conduct a root cause analysis of the decrease in performance when pupil size features are removed. Finally, we share our aggregate baseline authentication scores from before the attack to show that the replicated system performs similarly to related research.
Fig. 3: Classifier-independent analysis of feature quality. We computed the sum of the KS values per feature type for both our data and that of [10]. Our KS values are the average of separate analyses of the browsing and video data. For both our work and [10], the figure shows that pupil size features have more discriminative power than temporal and spatial features.
1) Feature Specifications: Following the design in [10], we used 5 pupil, 7 spatial, and 8 temporal features. These features are described in detail in [10]. All features come either directly from within a fixation or from the relationship between fixations. To identify fixations, we used the Dispersion Threshold algorithm from Salvucci et al. [21] with a minimum duration of 50 ms and a maximum fixation diameter of 60 pixels.
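The fixation identification step can be illustrated with a minimal sketch of the dispersion-threshold (I-DT) algorithm as described by Salvucci et al. [21]; it is not the implementation used in this work. It assumes timestamps in milliseconds and gaze coordinates in pixels, and it treats the 60-pixel maximum fixation diameter as the I-DT dispersion threshold, with dispersion computed as (max x − min x) + (max y − min y). The names `Fixation`, `dispersion`, and `idt_fixations` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Fixation:
    start: int  # index of the first sample in the fixation
    end: int    # index one past the last sample
    cx: float   # centroid x (pixels)
    cy: float   # centroid y (pixels)


def dispersion(xs: Sequence[float], ys: Sequence[float]) -> float:
    # I-DT dispersion: horizontal extent plus vertical extent of the window
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def idt_fixations(t_ms: Sequence[float], x: Sequence[float], y: Sequence[float],
                  max_dispersion: float = 60.0,
                  min_duration_ms: float = 50.0) -> List[Fixation]:
    fixations: List[Fixation] = []
    i, n = 0, len(t_ms)
    while i < n:
        # Grow the window from sample i until it spans at least min_duration_ms
        j = i
        while j < n and t_ms[j] - t_ms[i] < min_duration_ms:
            j += 1
        if j >= n:
            break  # not enough samples left for a minimum-duration window
        if dispersion(x[i:j + 1], y[i:j + 1]) <= max_dispersion:
            # Extend the fixation while the dispersion stays under the threshold
            while j + 1 < n and dispersion(x[i:j + 2], y[i:j + 2]) <= max_dispersion:
                j += 1
            xs, ys = x[i:j + 1], y[i:j + 1]
            fixations.append(Fixation(i, j + 1, sum(xs) / len(xs), sum(ys) / len(ys)))
            i = j + 1  # resume after the fixation
        else:
            i += 1  # drop the first sample and try again
    return fixations
```

At the roughly 60 Hz rate of the Tobii Pro X2-60, the 50 ms minimum duration corresponds to a window of about four consecutive samples.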
Pupil Size Features: Using the data in each fixation, we find the mean, max, min, range, and standard deviation of pupil size.
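A minimal sketch of the five pupil size features for a single fixation, assuming the pupil diameters (in mm) of the samples belonging to that fixation are already available as a list. The function and key names are hypothetical, and the choice of the population standard deviation is an assumption, since the text does not specify the estimator.

```python
import statistics
from typing import Dict, Sequence


def pupil_size_features(pupil_mm: Sequence[float]) -> Dict[str, float]:
    """Per-fixation pupil features: mean, max, min, range, standard deviation."""
    lo, hi = min(pupil_mm), max(pupil_mm)
    return {
        "pupil_mean": statistics.fmean(pupil_mm),
        "pupil_max": hi,
        "pupil_min": lo,
        "pupil_range": hi - lo,
        # Population standard deviation (assumed; sample std is equally plausible)
        "pupil_std": statistics.pstdev(pupil_mm),
    }
```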
Spatial Features: The spatial features measure properties of relative location, both between fixations and for the points of focus within them. To find the distance between fixations, we save the previous fixation location at each iteration of our algorithm and then measure the distance between it and the current fixation.
Four features are based on the distance from the center of the fixation to each individual data point. Once this distance is calculated for all points of focus, we take the max, mean, min, and standard deviation. Three features in this group are based on the max pairwise distance, defined as the maximum distance over all pairs of data points in a fixation. This set of features measures the relative distance between different points of focus, which is useful because a user's points of focus may be more or less concentrated than those of other users.
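The spatial measurements can be sketched as follows, under two labeled assumptions: the fixation center is taken to be the centroid of its gaze points, and only the max pairwise distance itself is computed, because the text does not say how the three features derived from it are formed. The function name `spatial_features` is hypothetical; it returns the fixation center so the caller can pass it in as `prev_center` for the next fixation, mirroring the "save the previous fixation location" step.

```python
import itertools
import math
import statistics
from typing import Dict, Optional, Sequence, Tuple


def spatial_features(xs: Sequence[float], ys: Sequence[float],
                     prev_center: Optional[Tuple[float, float]] = None
                     ) -> Tuple[Dict[str, float], Tuple[float, float]]:
    # Fixation center assumed to be the centroid of its gaze points
    cx, cy = statistics.fmean(xs), statistics.fmean(ys)
    d = [math.hypot(x - cx, y - cy) for x, y in zip(xs, ys)]
    pts = list(zip(xs, ys))
    feats = {
        "center_dist_max": max(d),
        "center_dist_mean": statistics.fmean(d),
        "center_dist_min": min(d),
        "center_dist_std": statistics.pstdev(d),
        # Largest distance over all pairs of gaze points in the fixation
        "max_pairwise_dist": max(
            (math.dist(p, q) for p, q in itertools.combinations(pts, 2)), default=0.0),
    }
    if prev_center is not None:
        # Distance from the previous fixation's center to this one
        feats["inter_fixation_dist"] = math.hypot(cx - prev_center[0], cy - prev_center[1])
    return feats, (cx, cy)
```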
Temporal Features: After each fixation, we record the starting time of the current saccade. Combined with the most recent fixation, this allows us to calculate the duration of the saccade. From this, we calculate the max, mean, and standard deviation of the speed values for a fixation. Acceleration and related features then follow from the speed values, in terms of change in speed over change in time.
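The exact eight temporal features are defined in [10] rather than here, so the following is only an illustrative sketch of the underlying computations: saccade duration as the gap between consecutive fixations, point-to-point speed, and acceleration as change in speed over change in time. The function names are hypothetical, speeds are in pixels per second, and `speed_accel_features` assumes at least three samples.

```python
import math
import statistics
from typing import Dict, Sequence


def saccade_duration_ms(prev_fixation_end_ms: float, next_fixation_start_ms: float) -> float:
    # The saccade runs from the end of one fixation to the start of the next
    return next_fixation_start_ms - prev_fixation_end_ms


def speed_accel_features(t_ms: Sequence[float], x: Sequence[float],
                         y: Sequence[float]) -> Dict[str, float]:
    speeds, mids = [], []
    for k in range(1, len(t_ms)):
        dt_s = (t_ms[k] - t_ms[k - 1]) / 1000.0
        speeds.append(math.hypot(x[k] - x[k - 1], y[k] - y[k - 1]) / dt_s)  # px/s
        mids.append((t_ms[k] + t_ms[k - 1]) / 2000.0)  # interval midpoint in s
    # Acceleration: change in speed over change in time (px/s^2)
    accels = [(speeds[k] - speeds[k - 1]) / (mids[k] - mids[k - 1])
              for k in range(1, len(speeds))]
    return {
        "speed_max": max(speeds),
        "speed_mean": statistics.fmean(speeds),
        "speed_std": statistics.pstdev(speeds),
        "accel_max": max(accels),
        "accel_mean": statistics.fmean(accels),
    }
```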
2) Root Cause Analysis for Performance Degradation in Target Authentication System when Pupil Size Features are Removed: The goal of this subsection is to help explain why the target authentication system performs poorly when pupil size features are removed. To determine whether this change in performance is due to the underlying characteristics of the features (rather than the classifier), we measure relative feature importance using a classifier-independent approach. We analyze our features in terms of the Kolmogorov-Smirnov (KS) statistic [16] to show the extent to which pupil size features dominate. The KS test assesses whether data from two different users come from the same distribution; a feature has high discriminative power if the test reveals that data from different users follow different distributions (i.e., a high KS value means a strong feature).
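A minimal NumPy sketch of this classifier-independent analysis. `ks_statistic` computes the two-sample KS statistic (the largest gap between the two empirical CDFs, the same quantity `scipy.stats.ks_2samp` reports as its statistic). Averaging over all user pairs to score one feature, and then summing per feature type as in Fig. 3, is an assumed aggregation, and all function names are hypothetical.

```python
import itertools
from typing import Dict, Sequence

import numpy as np


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample KS statistic: the maximum gap between the empirical CDFs."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    grid = np.concatenate([a, b])  # the supremum is attained at a sample point
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))


def mean_pairwise_ks(values_by_user: Dict[str, Sequence[float]]) -> float:
    """Average KS over all user pairs for one feature (higher = more discriminative)."""
    scores = [ks_statistic(a, b)
              for a, b in itertools.combinations(values_by_user.values(), 2)]
    return float(np.mean(scores))


def total_ks_by_type(ks_by_feature: Dict[str, float],
                     type_of: Dict[str, str]) -> Dict[str, float]:
    """Sum per-feature KS values within each feature type (pupil, spatial, temporal)."""
    totals: Dict[str, float] = {}
    for feature, score in ks_by_feature.items():
        totals[type_of[feature]] = totals.get(type_of[feature], 0.0) + score
    return totals
```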
Fig. 3 shows that pupil size features are the most important, with a total KS value greater than that of the other two feature types combined. Based on our analysis, several pupil size features are an entire order of magnitude more predictive than many of the other features. The original system [10] showed a similar result (Fig. 3), indicating that our extracted features are similar in predictive power to those of the original system. This suggests that any model using this same set of features is likely to be highly dependent on pupil size features for accurate predictions.
3) Overview of Baseline Authentication Performance: After demonstrating the importance of pupil size features in our data using the classifier-independent KS statistic, we verified that removing pupil size features affects authentication performance. We compare these results with related research to show that our system performs similarly to past systems.
We used a Support Vector Machine (SVM) as the core machine learning algorithm of the authentication system (similar to [10], for comparison). The SVM used an RBF kernel with C = 10,000 and gamma = 0.001. During training, we used a 10 to 1 ratio of impostors to authentic users, giving the algorithm a larger representation of impostors to learn from. To account for this imbalance, we used the balanced class weight configuration, which adjusts the penalty parameter of the error term according to the class proportions in the training data. Our training data came from the first session of every experiment and our test data from the second. To obtain an intra-session dataset, we randomly selected 30% of our training and testing data. We performed 10 iterations of this randomized split with model training and testing, and averaged the results to obtain our final scores.
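The imbalance handling can be sketched in plain Python. The helper names are hypothetical, and treating the 10 to 1 ratio as ten impostor samples per genuine sample is an assumption. `balanced_class_weights` uses the formula behind scikit-learn's `class_weight="balanced"` (n_samples / (n_classes × class_count)), so the comment at the end shows how the described SVM configuration would look in scikit-learn.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def build_training_set(genuine: List[Sequence[float]], impostors: List[Sequence[float]],
                       ratio: int = 10, seed: int = 0
                       ) -> Tuple[List[Sequence[float]], List[int]]:
    """Draw `ratio` impostor samples per genuine sample (label 1 = genuine, 0 = impostor)."""
    rng = random.Random(seed)
    k = min(len(impostors), ratio * len(genuine))
    sampled = rng.sample(impostors, k)
    return genuine + sampled, [1] * len(genuine) + [0] * len(sampled)


def balanced_class_weights(labels: Sequence[int]) -> Dict[int, float]:
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, n_classes = len(labels), len(counts)
    return {c: n / (n_classes * m) for c, m in counts.items()}


# With scikit-learn, the configuration described above would read:
#   SVC(kernel="rbf", C=10_000, gamma=0.001, class_weight="balanced")
# where "balanced" yields the same weights as balanced_class_weights(y).
```

With 10 genuine and 100 impostor samples, the genuine class receives a weight of 5.5 and the impostor class 0.55, so errors on the rarer genuine class are penalized ten times more heavily.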
To gauge how our system performs compared to past work, we compare our results with those of [7] and [10]. We use the mean Equal Error Rate (EER), the error rate at the threshold where the False Accept Rate equals the False Reject Rate, to measure performance³. The authors in [7] used the Half Total Error Rate (HTER), the mean of the False Accept Rate and the False Reject Rate. The HTER is close enough to the EER that we can compare it with our results in terms of general trends. Additionally, the comparison with [7] is based on the results from the scenario where users viewed an image, since their other scenarios did not reflect a continuous authentication setting (e.g., scenarios in which users execute a predefined visual pattern are better aligned with static authentication than with continuous authentication).
³ Note that we express our EER on a scale running from 0 to 100%.
Fig. 4: Pupil Size and Illuminance
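The two metrics can be made concrete with a small sketch. It assumes classifier scores where higher means "more likely genuine" and a user is accepted when the score is at least the threshold; the function names are hypothetical. The EER here is approximated as the mean of FAR and FRR at the candidate threshold where the two are closest, which is one common discrete approximation rather than necessarily the one used in this work.

```python
from typing import Sequence, Tuple


def far_frr(genuine: Sequence[float], impostor: Sequence[float],
            threshold: float) -> Tuple[float, float]:
    """Accept when score >= threshold; higher scores mean 'more likely genuine'."""
    far = sum(s >= threshold for s in impostor) / len(impostor)
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr


def hter(genuine: Sequence[float], impostor: Sequence[float], threshold: float) -> float:
    """Half Total Error Rate: mean of FAR and FRR at a fixed threshold."""
    far, frr = far_frr(genuine, impostor, threshold)
    return (far + frr) / 2


def eer(genuine: Sequence[float], impostor: Sequence[float]) -> float:
    """Approximate EER at the candidate threshold where FAR and FRR are closest."""
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far, frr = far_frr(genuine, impostor, t)
        if best is None or abs(far - frr) < best[0]:
            best = (abs(far - frr), (far + frr) / 2)
    return best[1]
```

Multiplying the returned fraction by 100 gives the 0 to 100% scale used for the EERs reported in this paper.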
In [10] and [7], when using pupil size features, the authors report an EER of about 18% and an HTER of 11.35%, respectively (shown in Table I). Our authentication system achieved an EER of 16.26% with pupil size features. In all three authentication systems, eliminating the pupil size features from the feature set significantly degraded performance. Our system's EER increased to 39.69%, while the error rates of the other two increased to 33.45% (HTER, for [7]) and about 31% (EER, for [10]). Overall, the findings in Fig. 3 and Table I reveal the importance of pupil size features for the types of eye tracking authentication systems that we investigated. Next, we describe how we explored the fundamentals of light and pupil size dynamics before using these concepts to design our attack.
V. EXAMINING THE IMPACT OF LIGHT ON PUPIL BEHAVIOR
This section analyses the behavior of pupil size under the
influence of light, given the situational limitations that someone
attacking an eye tracking authentication system might have to
contend with. We did not use all components of this analysis
in the development of Morph-a-Dope, as our primary aim
was to discover fundamental patterns that might help with the
development of related attacks or defenses focused on pupil
size features.
Variation of pupil size with light intensity: A key factor influencing the design of a pupil manipulation attack is how pupil size varies with changes in light intensity. If the pupil size varies drastically with small changes in lighting, the attacker might require specialized equipment for fine-grained control over pupil size. On the other hand, if the pupil changes slowly with variations in light (e.g., a linear trend with a gentle slope), then coarse-grained light control and measurement equipment (e.g., off-the-shelf lights; see [1]) could be sufficient to support the attack. This would also suggest that the attack is simple enough to be a realistic threat even from an unsophisticated attacker.
To better understand the relationship between light and pupil diameter, we recorded the pupil size over a range of light intensities between 0 and 50 lux for each of the users in our study. A light intensity of near zero corresponds to the scenario where all lights in the room were switched off and only the computer screen lit the room. A light intensity of 40+ lux on the hand corresponds to the highest setting on the lamp. Fig. 4 summarizes the observed pattern. To derive the figure, we computed a mean pupil diameter for each user and light intensity over windows of one second. The figure shows a relatively consistent logarithmic trend, with users whose pupil diameters are initially spread over a wide range converging to around 3 mm. Overall, the data suggest that the pupil manipulation attack should be feasible using only basic equipment, given the predictable pupil diameter behavior exhibited across the bulk of the plot.
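The per-second averaging and the logarithmic trend can be sketched with NumPy. The 60 Hz default follows the eye tracker's approximate sampling rate; the model form diameter ≈ a + b·ln(1 + lux) is a hypothetical choice for illustration (the 1 + lux term keeps the "lights off" readings near 0 lux finite), not a fit reported in this work. Both function names are hypothetical.

```python
import numpy as np


def one_second_means(pupil_mm, rate_hz=60):
    """Mean pupil diameter over consecutive 1-second windows (partial tail dropped)."""
    x = np.asarray(pupil_mm, dtype=float)
    n = int(rate_hz)
    k = x.size // n
    return x[: k * n].reshape(k, n).mean(axis=1)


def fit_log_trend(lux, diameter_mm):
    """Least-squares fit of diameter ~ a + b * ln(1 + lux); returns (a, b)."""
    b, a = np.polyfit(np.log1p(np.asarray(lux, dtype=float)),
                      np.asarray(diameter_mm, dtype=float), 1)
    return float(a), float(b)
```

A negative fitted `b` corresponds to pupils constricting as illuminance rises, and a shallow slope at higher lux would be consistent with the convergence toward about 3 mm seen in Fig. 4.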
Morphing the pupil size distribution with light changes: While the previous analysis gives an intuitive view of how the pupil can be manipulated by light changes, we also find it informative to understand how light can affect the overall distribution of pupil size. The benefit of studying the distribution is that, in practice, eye tracking systems use a wide range of features characterizing the full pupil size distribution. Studying the behavior of the distribution therefore gives insight into how a wide range of features might be affected by morphing a user's pupil size.
For analyzing pupil size distributions, we use the Bhattacharyya Coefficient (BC) [4]. The BC measures the overlap between two distributions of data and varies between 0 and 1: it is 1 when two distributions overlap completely and 0 when they do not overlap at all.
Let pd_A and pd_U be probability distributions for two separate pupil size time series. To calculate the BC, we discretized the pupil size (bin width of 0.5 mm) and then used the following standard formula, which sums over the common set of bins D:
$$BC = \sum_{x \in D} \sqrt{pd_A(x)\, pd_U(x)} \qquad (1)$$
To show how the pupil size distribution can be manipulated, we simulated a scenario where one user is the target and another user (the “attacker”) is trying to transform their pupil size distribution to overlap with that of the target user. We do this by taking all user pairs in our dataset and doing the following: (1) data for pd_A (the target) is taken from the “Off” (0-10 lx) lighting scenario, and (2) for pd_U (the “attacker”), we use data from the “Off”, “Low” (10-20 lx), and “High” (40+ lx) scenarios. We selected “Off” for pd_A because we wanted a lighting setting where users are most distant from one another in terms of pupil size distribution. We then used the “Off”, “Low”, and “High” levels for comparison because we wanted to show how a progressive increase in lighting changes the distribution of the attacker relative to the target.
Fig. 5: Illustration of how the Bhattacharyya Coefficient stays consistent once the lighting in the room has changed. Although there is some variation, it is not enough to risk having an attacker “disengage” from their target once they have set the appropriate light intensity.
Based on the above analysis, we found several interesting patterns in the data. (1) Higher lighting levels increased the BC values for user pairs with an initial value under 0.4. These pairs, with low initial pupil distribution overlap, started with an average BC value of 0.06 for the “Off” setting, which climbed to 0.29 and 0.63 for “Low” and “High”, respectively. (2) In about 85% of cases, the BC value steadily increased from the “Off” to the “High” setting, while for around 15% of user pairs the BC value rose at “Low” but then fell at “High”. (3) The attackers whose BC value decreased after the “Low” light level had overshot their target: as their pupil size decreased, it reached the target's and then moved past it, causing the pupil size distributions to separate again.
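The pairwise simulation and the two trend categories above can be sketched as follows. The data layout (`data[user][level]` holding pupil diameters in mm), the use of ordered target-attacker pairs, and the trend labels are assumptions for illustration, and the BC helper repeats Equation (1) with 0.5 mm bins so that the block runs on its own. All names are hypothetical.

```python
import itertools

import numpy as np

LEVELS = ("Off", "Low", "High")


def bhattacharyya(a_mm, b_mm, bin_mm=0.5):
    # Equation (1) on a shared grid of 0.5 mm bins
    a, b = np.asarray(a_mm, dtype=float), np.asarray(b_mm, dtype=float)
    first = np.floor(min(a.min(), b.min()) / bin_mm)
    n_bins = int(np.floor(max(a.max(), b.max()) / bin_mm) - first) + 1
    edges = (first + np.arange(n_bins + 1)) * bin_mm
    p = np.histogram(a, bins=edges)[0] / a.size
    q = np.histogram(b, bins=edges)[0] / b.size
    return float(np.sum(np.sqrt(p * q)))


def trend(bc_off, bc_low, bc_high):
    if bc_off <= bc_low <= bc_high:
        return "steady increase"
    if bc_low > bc_off and bc_high < bc_low:
        return "overshoot"  # the attacker's pupils moved past the target's
    return "other"


def simulate_pairs(data):
    """Target fixed at "Off"; the attacker's lighting level varies."""
    out = {}
    for target, attacker in itertools.permutations(data, 2):
        ref = data[target]["Off"]
        bcs = tuple(bhattacharyya(ref, data[attacker][lvl]) for lvl in LEVELS)
        out[(target, attacker)] = (bcs, trend(*bcs))
    return out
```

Pairs with low initial overlap, such as those with an "Off" BC under 0.4, can then be selected by filtering on the first element of each BC tuple.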
Stability of the distance between two pupil size distributions at a given lighting level: To shed further light on the trend depicted above, it is instructive to determine how stable the BC value between an attacker and a target is for a fixed lighting setting. In other words, if two distributions have been driven to overlap, how stable will they remain if light is held constant? A highly unstable BC value would imply that the attack might be operationally challenging, requiring constant tweaking of the lighting to keep the distributions from drifting apart even after the theoretically required lighting has been set. On the other hand, if the pupil size distribution reasonably stays locked once the target is met, an attacker could set the lighting a few minutes before the attack, let their pupils stabilize, and then begin to work on the target computer without having to reconfigure the lighting again.
Fig. 5 shows the results of our analysis of the stability of the BC values. For three of our lighting scenarios, namely "Off", "Low", and "High" light, we computed the BC between a target's pupil size distribution and a given pupil size distribution recorded at that light level. Using windows 15 seconds long, we computed the difference between the initial BC at a light level and the BC of each subsequent window of data. The figure shows that for all three light levels, the majority of cases had a BC difference of between 0 and 0.1, and the standard deviation of these differences was less than 0.15. To put these variations into context, the BC is 1 for two completely overlapping distributions and 0 for two distributions with no overlap, so a complete loss of overlap corresponds to a BC shift of 1. Changes of between 0 and 0.1, or between 0 and 0.2 in the more extreme cases, therefore do not substantially disturb the overlap between the distributions.
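The windowed BC comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a histogram estimate of the Bhattacharyya coefficient [4] with an arbitrary 50 bins, and non-overlapping windows given as a sample count (at the nominal 60 Hz of the Tobii Pro X2-60, a 15-second window would be roughly 900 samples). The function names are hypothetical.

```python
import math


def bhattacharyya_coefficient(x, y, bins=50, value_range=None):
    """Histogram estimate of BC = sum_i sqrt(p_i * q_i).

    Returns 1.0 for identical histograms and 0.0 for non-overlapping ones.
    """
    if value_range is None:
        value_range = (min(min(x), min(y)), max(max(x), max(y)))
    lo, hi = value_range
    width = (hi - lo) / bins or 1.0  # avoid dividing by zero if all samples are equal

    def histogram(values):
        counts = [0] * bins
        for v in values:
            idx = min(max(int((v - lo) / width), 0), bins - 1)  # clamp into [0, bins - 1]
            counts[idx] += 1
        return [c / len(values) for c in counts]

    p, q = histogram(x), histogram(y)
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


def bc_drift(attacker, target, window_size, bins=50):
    """BC of each later window minus the BC of the first window.

    `attacker` holds pupil sizes recorded at one fixed light level (assumed);
    `target` holds the victim's template pupil sizes. Windows do not overlap.
    """
    # One shared value range so that every window is binned identically.
    value_range = (min(min(attacker), min(target)), max(max(attacker), max(target)))
    windows = [attacker[i:i + window_size]
               for i in range(0, len(attacker) - window_size + 1, window_size)]
    initial = bhattacharyya_coefficient(windows[0], target, bins, value_range)
    return [bhattacharyya_coefficient(w, target, bins, value_range) - initial
            for w in windows[1:]]
```

Sharing one histogram range across all windows means that differences between windows reflect changes in the attacker's pupil sizes rather than changes in binning.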
How long does it take the pupil size to stabilize at a target? Another variable of interest to the attacker is how long it takes the pupil size to stabilize at a given target; in particular, how long after a change in light level does the pupil size stop fluctuating? If this time is short (say, on the order of seconds), then the attack is easy to launch, as the adversary can set the light and have their pupils adjust almost on the fly.
We use the following procedure to determine whether the pupil size has stabilized. We split the pupil size data into consecutive 1-second windows and compute the mean of each window. We then iterate across these window means, keeping a group of the 30 most recent means (a 30-second period). For this group we calculate the pairwise distances among its members (denoted as P), and for each new value we also calculate its distances to the members of the group (denoted as N). If N is within two standard deviations of P, we consider the new value stable; otherwise it is not. We found that new values were flagged as unstable whenever the lighting changed from one level to the next (each light setting was held for 3 minutes). We then computed the time between when the light level was set and when the pupil size was flagged as stable. Our analysis shows that in the vast majority of cases, stability was attained within just two or three seconds. This indicates how quickly and easily an attacker can morph the pupil size features.
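A minimal sketch of this stability test follows. The comparison between N and P is not fully specified, so the code assumes one reading: distances are absolute differences, and a new value is stable if the mean of N does not exceed the mean of P plus two standard deviations of P. It also assumes the group is not reset when the light changes. All function names are hypothetical.

```python
from collections import deque
from itertools import combinations
from statistics import mean, pstdev


def is_stable(group, new_value, k=2.0):
    """Assumed test: mean(N) <= mean(P) + k * std(P).

    P: pairwise absolute differences among the group's window means.
    N: absolute differences between new_value and each group member.
    """
    p = [abs(a - b) for a, b in combinations(group, 2)]
    n = [abs(new_value - g) for g in group]
    return mean(n) <= mean(p) + k * pstdev(p)


def stability_flags(window_means, history=30, k=2.0):
    """Flag each 1-second window mean as stable (True) or unstable (False);
    None while fewer than two earlier means are available."""
    group = deque(maxlen=history)  # the `history` most recent window means
    flags = []
    for value in window_means:
        flags.append(is_stable(group, value, k) if len(group) >= 2 else None)
        group.append(value)  # the new mean joins the group after being tested
    return flags


def seconds_until_stable(flags, change_index, window_seconds=1.0):
    """Time from a light-level change until the first window flagged stable."""
    for i in range(change_index, len(flags)):
        if flags[i]:
            return (i - change_index) * window_seconds
    return None  # never stabilized within the recording
```

Under this reading, an idealized noise-free step (40 window means of 4.0 followed by means of 3.0) is flagged unstable for the first three post-change windows and stable from the fourth, i.e., a stabilization time of 3 seconds.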
Based on these analyses, we were confident that an attacker could systematically target a user, determine whether that user is a good target, and practice staying locked onto that target. Our next step was to develop a threat model and design the attack in greater detail.
VI. ATTACK DESIGN AND PERFORMANCE ANALYSIS
A. Threat Model
In our threat model we examine the scenario in which an
adversary sneaks into a private space and gains access to a
computer after the fully logged-in legitimate user has stepped
away briefly (e.g., at the office). This attack vector, frequently
referred to as a lunchtime attack, is widely used to motivate
continuous eye tracking authentication research (e.g., see [10]).
Like the majority of past research on spoof attacks (e.g.,
see [23], [18], and [11]), we also assume that the attacker has
access to the victim’s biometric template (or raw data). It is this
biometric template that guides the attacker when setting a target
for their pupil size. The attacker could use social engineering
(e.g., threats, flattery, or deception) to acquire the template by
convincing their target to use a compromised computer which
records eye biometric data through a webcam or eye tracker.
An attacker might also be able to purchase biometric templates on a black market, such as via the dark web (e.g., see the example of an insider selling access to a biometric database [5]).
The attacker then leverages our findings regarding light and pupil size (see Section V) to methodically attack the eye tracking authentication system. Although we executed the attack manually, an attacker could automate it. To do so, they would use their own eye tracker, a programmatically controlled lamp to dynamically adjust the brightness of the room, and a phone for background analysis of real-time data. The phone could control the lamp over Bluetooth based on the difference between the attacker's pupil size and the pupil size in their target's biometric template. The attacker thus only has to come into the room, quickly set up the lamp and eye tracker, and begin the attack. Our pupil analytics (see Section V) indicate that the limited time should not be a major stumbling block, given the rapid and highly predictable behavior of the pupil. This scenario requires that the attacker has previously done trial runs to find the lighting configuration required to hit the victim's pupil size distribution, and thus does not have to undertake any lighting configuration during the attack itself.
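The automated lamp control described above could be as simple as a proportional controller. The sketch below is hypothetical: the gain, tolerance, and 0 to 100% brightness scale are illustrative assumptions, and the Bluetooth transport and lamp API are deliberately not shown. It relies only on the relationship established in Section V, namely that brighter light shrinks the pupil.

```python
def next_brightness(current, attacker_pupil_mm, target_pupil_mm,
                    gain=20.0, tolerance=0.05, lo=0.0, hi=100.0):
    """One step of a hypothetical proportional controller for a dimmable lamp.

    Brighter light constricts the pupil, so if the attacker's recent mean pupil
    size is larger than the target's template mean, brightness goes up, and
    vice versa. Returns a brightness percentage clamped to [lo, hi].
    """
    error = attacker_pupil_mm - target_pupil_mm  # > 0: attacker's pupils too large
    if abs(error) <= tolerance:                  # close enough: leave the lamp alone
        return current
    return max(lo, min(hi, current + gain * error))
```

Called once per update of the attacker's mean pupil size (e.g., once per 1-second window), the returned value would be sent to the lamp; the tolerance band keeps the lamp from being adjusted constantly once the attacker has locked onto the target.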
While our threat model has mostly discussed the lunchtime attack scenario, our attack also applies when the attacker has access to the victim's password. Assuming that the attacker has acquired the user's password in some manner is standard when evaluating biometric systems that are meant to act as a second line of defense (e.g., see [2] or [13]). This allows an evaluation of the value added by the biometric authentication system under the worst-case scenario in which the user's password has been compromised.
B. Designing Morph-a-Dope
How the Pupil Dynamics Guided the Design of Morph-a-Dope: In Section V we demonstrated several key characteristics of pupil behavior under changes in lighting; these form the basis of Morph-a-Dope's design. We showed that increasing the light directed at a user with a regular desk lamp can induce changes in pupil size that stabilize within a few seconds. This means the attacker can react quickly if environmental or content-related conditions change. Our data also showed that by changing the environmental lighting with only a desk lamp, an attacker could shift their pupil size distribution closer to that of their target. Finally, once an attacker sets the lighting and "locks onto" a target, we found it unlikely that their pupil size distribution would change enough to disrupt the attack (see Fig. 5). These insights allowed us to see past the noise and develop an attack that is relatively simple to set up and use. The remainder of this section explains the attack process and how we tested the attack.
Target Acquisition: In a real attack, an attacker will likely have several targets available; workplace settings, for example, offer numerous opportunities to access other employees' computers. To simulate this type of scenario we selected 6 targets with varying levels of authentication performance. Specifically, we selected the 2 users with the lowest EERs, 2 users with intermediate EERs, and 2 with relatively high EERs, on the assumption that any real-world environment will have a wide variety of users and authentication templates. We then asked four of our targets to also act as attackers. Since no attacker targeted themselves, this gave us 20 attacker-victim pairs (4 attackers × 5 other targets) for verifying the attack performance.
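The target selection and pairing can be sketched as follows. This is an illustrative assumption rather than the authors' procedure: "intermediate" is taken to mean the users ranked around the median EER, and the function names and user ids are hypothetical.

```python
from itertools import product


def select_targets(user_eers, per_group=2):
    """Pick users with the lowest, intermediate, and highest EERs.

    `user_eers` maps a user id to that user's EER. "Intermediate" is assumed
    to mean the users ranked around the median. Assumes at least
    3 * per_group users so that the three groups do not overlap.
    """
    ranked = sorted(user_eers, key=user_eers.get)  # user ids, lowest EER first
    start = len(ranked) // 2 - per_group // 2      # block centred on the median
    return ranked[:per_group] + ranked[start:start + per_group] + ranked[-per_group:]


def attacker_victim_pairs(attackers, victims):
    """Every (attacker, victim) combination except users attacking themselves."""
    return [(a, v) for a, v in product(attackers, victims) if a != v]
```

With 6 targets of which 4 also act as attackers, `attacker_victim_pairs` yields 4 × 6 - 4 = 20 pairs.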
Attack Preparation: In developing the attack preparation procedures, we considered our threat model (see Section VI-A). Given that an attacker will understand the properties of pupil size in relation to light, we assume they would take the opportunity to prepare and configure themselves for the attack ahead of time. In our experiment, this meant having the attackers watch videos or browse Wikipedia while the lighting was adjusted to try to make their pupil size distribution overlap with that of the target.
For calibration, our system had a real-time component that let the attacker see their pupil size characteristics (mean, standard deviation, minimum, and maximum) during the configuration process. We used the Tobii Research API to stream live data from the Tobii Pro X2-60 eye tracker to our system, and a thread pool to distribute processing tasks for the incoming chunks of data. We found that this allowed the system to run in near real time by removing blocking actions from the data collection process.
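The non-blocking design above can be sketched with Python's standard thread pool. This is a minimal sketch, not the system used in the study: the Tobii Research API callback that would feed `on_chunk` is not shown, each chunk is assumed to be a list of pupil diameters with invalid samples encoded as None or NaN, and the names `summarize_chunk` and `PupilMonitor` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean, pstdev


def summarize_chunk(pupil_sizes):
    """Mean, standard deviation, min, and max of one chunk of pupil samples."""
    # `p > 0` is False for NaN, so this drops None, NaN, and non-positive values.
    valid = [p for p in pupil_sizes if p is not None and p > 0]
    if not valid:
        return None
    return {"mean": mean(valid), "std": pstdev(valid), "min": min(valid), "max": max(valid)}


class PupilMonitor:
    """Hands each incoming chunk to a worker thread so data collection never blocks."""

    def __init__(self, workers=4):
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._futures = []

    def on_chunk(self, chunk):
        # Called from the data-collection path: submit the work and return immediately.
        self._futures.append(self._pool.submit(summarize_chunk, list(chunk)))

    def results(self):
        # Waits for any pending chunks and returns their summaries in arrival order.
        return [f.result() for f in self._futures]

    def close(self):
        self._pool.shutdown(wait=True)
```

Because `on_chunk` only submits work, a slow summary never delays the next batch of samples arriving from the eye tracker.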
During the configuration process we focused on comparing pupil size between the attacker and the target, because this made it easy to see in which "direction" to adjust the lighting in the room, and we found it an adequate method for estimating vulnerability. To configure the attacker, we raised or lowered the room lighting (using the overhead lights and the lamp) until the attacker's pupil size was as close to their target's as possible.
Conducting the Attack: Once the attacker was configured, we had them take part in the two real-world scenarios of browsing and video watching. For both the browsing and video attack sessions, participants watched videos and found Wikipedia articles different from those used in any previous session. The goal was to demonstrate that the authentication system is vulnerable even when the content differs from that used when the system was originally trained or configured. Each browsing or video attack session lasted 5 minutes.
Validation Process: When attacking the system, we trained it as normal, i.e., with the same procedures and hyperparameters as the core eye tracking authentication experiment (see Section IV-B2). During testing, all impostor samples were drawn from the attacker whose pupils were manipulated using Morph-a-Dope. This allowed us to accurately measure the capabilities of the attacker in comparison to the general population of users. We repeated this process for all attacker-victim pairs.
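Since all results below are reported as EERs, a minimal sketch of how an EER can be obtained from classifier scores may help. It assumes higher scores mean "more likely the genuine user" and approximates the EER by sweeping the decision threshold over the observed scores; the classifier and its scoring are not shown, and the function name is hypothetical.

```python
def equal_error_rate(genuine_scores, impostor_scores):
    """Approximate the EER by sweeping a decision threshold over all observed scores.

    Assumes higher scores mean "more likely the genuine user"; a sample is
    accepted when its score is >= the threshold. Returns a fraction in [0, 1].
    """
    best_gap, best_eer = None, None
    for t in sorted(set(genuine_scores) | set(impostor_scores)):
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)  # impostors accepted
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)     # genuine samples rejected
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer
```

In the validation process described above, `impostor_scores` would hold only the scores of the manipulated attacker's samples; multiplying by 100 gives the percentages used in Table II.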
TABLE II: Pre-Attack and Post-Attack EERs (%) Per Target.
Scenario               Target 1  Target 2  Target 3  Target 4  Target 5  Target 6
Pre-Attack Browsing       3.6      30.4      34.7      16.8       6.6      34
Post-Attack Browsing     11.1      14.2      62.5      38.7       1.6      48.9
Pre-Attack Video          8.8      32.6      38.9      24.4       4.8      25.5
Post-Attack Video        17.9      28.2      49.0      32.0      14.2      34.2
C. Attack Performance Analysis
1) Aggregate Impact Analysis Per Victim: Table II shows the mean pre- and post-attack EERs for our targeted victims. The pre-attack values are the mean EERs for each user before the attack, and the post-attack values are the mean EERs obtained for each user after the attack. For each victim, the post-attack EER was calculated as follows: (1) the EER due to a specific attacker for the given user is calculated; (2) this process is repeated for all attackers targeting the given user; (3) all EERs calculated in (2) are averaged to give the final post-attack EER tabulated in Table II under that victim and activity (i.e., browsing or video). For example, to calculate the post-attack value for the video scenario of target #6, we would average the post-attack EERs of all four attackers on target #6.
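The averaging in steps (1) to (3), and the percentage changes quoted below, can be sketched as follows. The function names and the attacker-id keys are hypothetical; the per-attacker EERs would come from a computation such as the threshold sweep in the validation process.

```python
from statistics import mean


def post_attack_eer(eer_by_attacker):
    """Average a victim's EERs over all attackers (steps 1 to 3).

    `eer_by_attacker` maps an attacker id to the EER measured for that
    attacker against one victim in one activity (browsing or video).
    """
    return mean(eer_by_attacker.values())


def percent_change(pre_eer, post_eer):
    """Relative change from the pre-attack to the post-attack EER, in percent."""
    return 100.0 * (post_eer - pre_eer) / pre_eer
```

Applied to Table II, `percent_change(3.6, 11.1)` gives about 208% for User #1 in the browsing scenario, and `percent_change(30.4, 14.2)` about -53% for User #2, matching the figures discussed below.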
Based on Table II, we see that for most users (i.e., victims) the attack increased the average EER compared to the performance before the attack. For example, in the browsing scenario User #1 had an increase of over 200% (from an EER of 3.6% to 11.1%). For the video scenario with the same user, there was an increase in EER of over 100% (from an EER of 8.8% to 17.9%). A similar pattern is seen for the majority of users (namely Users #1, #3, #4, and #6), albeit with varying levels of error rate increase.
However, the table also shows some counterintuitive results, where the average EER for a given user decreased as a result of the attack. For example, User #2's average EER decreased after the attack in both the browsing and video scenarios. In the browsing scenario the EER went from 30.4% to 14.2% (a 53% reduction), while for video watching it decreased from 32.6% to 28.2% (a 13% reduction).
To understand the reason behind this counterintuitive behavior, it is important to note that the EERs tabulated in Table II are averages across all attackers. For a user such as #2, further analysis revealed that most attackers failed to impersonate them. This significantly lowered the mean EER for this user, offsetting the impact of the subset of attackers who were successful.
In a single case, the victim was not vulnerable to any attacker (victim #5 for browsing). However, this same victim was highly vulnerable during their video session, which suggests that the performance of the attack also depends on the content. Because video content is more dynamic in terms of lighting, it is possible that the Wikipedia pages did not stimulate the eyes in the same manner, leading to differences in pupil size between the two attack scenarios for this user.
(a) Video Content (b) Browsing Content
Fig. 6: Illustration of attack impact for each attacker-victim pair. The numbers in the grids are the final EER after the attack during the browsing and video scenarios, while the color of each cell is based on the percentage increase of the EER compared to the participant's original authentication EER. Note that although some attackers had percentage increases greater than 300%, we made a 300% increase the maximum of the colorbar. This helps differentiate between results in the -100% to 300% range.
To further explore the root cause of differences in attack performance between users (e.g., User #1 is badly affected overall while User #2 is not), we go beyond the aggregate analysis of attack performance (such as that in Table II) by conducting a fine-grained analysis of the performance of individual attacker-victim pairs. This analysis is described next.
2) Impact Analysis for Attacker-Victim Pairs: Fig. 6 shows the results of our analysis of individual attacker-victim pairs. The numbers in this figure represent the final EER of the target after the attack for the browsing and video sessions, while the colors represent the percentage increase compared to the EER of the target before the attack. For example, in the video scenario, when User #1 was the victim of Attacker #4, their post-attack EER was 33.2% and the percentage increase compared to the pre-attack EER was over 250% (indicated by the light yellow color). Overall, the figure shows that all victims are vulnerable to at least one attacker and that some attackers are better impersonators than others (e.g., Attacker #3 is better than Attacker #1).
When we average only over the attacker-victim pairs in which the post-attack EER exceeded the pre-attack EER, we get an average EER increase of 137.18% and 136.78% for the browsing and video activities, respectively. We also want to emphasize that two users with an EER below 5% for the browsing or video sessions saw an increase in EER of greater than 400%. These results show that Morph-a-Dope has a significant impact on the user population and poses a real threat, even to the users with the lowest EERs.
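The filtered average described above can be sketched as follows, assuming each attacker-victim pair is summarized by its pre-attack and post-attack EER. The function name is hypothetical.

```python
def mean_increase_over_successful_pairs(pairs):
    """Average percentage EER increase over pairs where the attack raised the EER.

    `pairs` is an iterable of (pre_attack_eer, post_attack_eer) tuples, one per
    attacker-victim pair; pairs whose post-attack EER does not exceed the
    pre-attack EER are skipped. Returns None if no pair qualifies.
    """
    increases = [100.0 * (post - pre) / pre for pre, post in pairs if post > pre]
    return sum(increases) / len(increases) if increases else None
```

Excluding unsuccessful pairs is what separates these per-pair averages from the per-victim means in Table II, where unsuccessful attackers pull the average down.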
3) Understanding the Performance Improvement of Morph-a-Dope for Attackers: Up to this point in the paper, the reference point against which we have measured the impact of our attack has been at the population level. Specifically, we have: (1) used data from randomly selected impostors to compute the EER, (2) used data generated through Morph-a-Dope to compute the new EER, and (3) compared (1) and (2) to assess the impact of our attack.
Fig. 7: Illustration of the impact of Morph-a-Dope on the EERs of targets. In the majority of cases Morph-a-Dope significantly increased the EER of the targeted users. For pairs such as 5 and 9, where the post-attack EER was lower than the baseline, we found that the attackers were already closely overlapping with the victims before the attack.
These steps are standard when assessing attack impact in biometrics. However, replacing the random impostor data used in (1) with impostor data obtained only from the specific adversary, recorded while they used the eye tracking authentication system normally, provides additional insight: it measures how much Morph-a-Dope improves each particular attacker's probability of breaching the system. We use the term Baseline EER for the EER obtained when the attacker's regular authentication samples are used to authenticate as the target user, and the term Attack EER for the EER obtained when the attacker uses Morph-a-Dope to authenticate as the target user.
Fig. 7 shows the Baseline and Attack EERs for several attacker-victim pairs. Overall, these data show that the attack contributed significantly to the performance of our attackers. The average increase in EER from the baseline (see Fig. 7 for examples) is over 1,000% for the browsing scenario and 275% for video watching. Due to space limitations, we only show 15 of the attacker-victim pairs in Fig. 7; the figure focuses on the results of effective attackers and includes pairs from both the browsing and the video watching scenarios.
In about 75% of cases, our attack improved upon the Baseline EER. Notably, in several cases attackers dramatically improved their results: in one case, an attacker had a Baseline EER of 1.24% and an Attack EER of around 65%. This shows that even if an attacker is not naturally capable of impersonating their target, it is quite possible that with the help of our system they would be capable of doing so.
VII. CONCLUSION
The primary objective of this research has been to demon-
strate the effectiveness of Morph-a-Dope as an attack against
continuous eye tracking authentication systems which heavily
rely on pupil size features.
To conceptualize this attack we first examined existing
research in eye tracking authentication. We found that a subset
of research is highly dependent on pupil size features, to
the extent that removing them causes dramatic decreases in
performance. This motivated us to replicate one of the systems,
verify that pupil size features far outweigh other features
in discriminative power, and conduct baseline authentication
experiments with that system. Next, we conducted a series of
experiments to explore how lighting impacts pupil size. This systematic evaluation yielded several results that guided the development of our attack: namely, that pupil size adjusts within two or three seconds of a lighting change, and that such changes can shift the distribution of pupil size features towards that of other users. This demonstrated the potential of our attack methodology to facilitate imitation by an attacker. We then conducted an attack based on these concepts and verified that it can increase the EER by an average of over 50%, with some targeted users experiencing increases of up to 500%.
This research has demonstrated significant weaknesses in
eye tracking authentication systems which are heavily reliant
on features derived from pupil size. Our paper calls for a
rigorous review of the designs for state-of-the-art eye tracking
authentication systems which use pupil size features, with an
eye to defending against attacks like Morph-a-Dope while
guaranteeing reasonable authentication system performance.
VIII. ACKNOWLEDGMENT
This research was supported by National Science Foundation
Award Number: 1527795. We would also like to thank Dr.
Cummins and the Texas Tech Center for Communication
Research for giving us access to an eye tracker for our research.
REFERENCES
[1] Le dimmable led desk lamp. Last accessed in Jan, 2018.
[2] L. Ballard, S. Kamara, F. Monrose, and M. K. Reiter. Towards practical biometric key generation with randomized biometric templates. In Proceedings of the 15th ACM conference on Computer and communications security, pages 235–244. ACM, 2008.
[3] R. Bednarik, T. Kinnunen, A. Mihaila, and P. Fränti. Eye-movements as a biometric. Image analysis, pages 16–26, 2005.
[4] A. Bhattacharyya. On a measure of divergence between two statistical populations defined by their probability distributions. Bull. Calcutta Math. Soc., 35:99–109, 1943.
[5] D. Cameron. Full access to India’s national biometric database reportedly sold over WhatsApp for about $8. Last accessed in May, 2018.
[6] J. E. Choi, P. A. Vaswani, and R. Shadmehr. Vigor of movements and the cost of time in decision making. Journal of neuroscience, 34(4):1212–1223, 2014.
[7] A. Darwish and M. Pasquier. Biometric identification using the dynamic features of the eyes. In Biometrics: Theory, Applications and Systems (BTAS), 2013 IEEE Sixth International Conference on, pages 1–6. IEEE, 2013.
[8] A. De Luca, M. Denzel, and H. Hussmann. Look into my eyes!: Can you guess my password? In Proceedings of the 5th Symposium on Usable Privacy and Security, page 7. ACM, 2009.
[9] A. Dundee and B. R. Sugar. My view from the corner: A life in boxing. McGraw-Hill Professional, 2007.
[10] S. Eberz, K. B. Rasmussen, V. Lenders, and I. Martinovic. Looks like Eve: Exposing insider threats using eye movement biometrics. ACM Trans. Priv. Secur., 19(1):1:1–1:31, June 2016.
[11] J. Galbally, R. Cappelli, A. Lumini, G. Gonzalez-de Rivera, D. Maltoni, J. Fierrez, J. Ortega-Garcia, and D. Maio. An evaluation of direct attacks using fake fingers generated from ISO templates. Pattern Recognition Letters, 31(8):725–732, 2010.
[12] C. Holland and O. V. Komogortsev. Biometric identification via eye movement scanpaths in reading. In Biometrics (IJCB), 2011 International Joint Conference on, pages 1–8. IEEE, 2011.
[13] A. K. Jain, A. Ross, and S. Pankanti. Biometrics: a tool for information security. IEEE transactions on information forensics and security, 1(2):125–143, 2006.
[14] P. Kasprowski and J. Ober. Eye movements in biometrics. In International Workshop on Biometric Authentication, pages 248–258. Springer, 2004.
[15] T. Kinnunen, F. Sedlak, and R. Bednarik. Towards task-independent person authentication using eye movement signals. In Proceedings of the 2010 Symposium on Eye-Tracking Research & Applications, pages 187–190. ACM, 2010.
[16] H. W. Lilliefors. On the Kolmogorov-Smirnov test for normality with mean and variance unknown. Journal of the American statistical Association, 62(318):399–402, 1967.
[17] N. Nugrahaningsih and M. Porta. Pupil size as a biometric trait. In International Workshop on Biometric Authentication, pages 222–233. Springer, 2014.
[18] K. A. Rahman, K. S. Balagani, and V. V. Phoha. Snoop-forge-replay attacks on continuous verification with keystrokes. IEEE Transactions on information forensics and security, 8(3):528–541, 2013.
[19] I. Rigas, O. Komogortsev, and R. Shadmehr. Biometric recognition via eye movements: Saccadic vigor and acceleration cues. ACM Transactions on Applied Perception (TAP), 13(2):6, 2016.
[20] I. Rigas and O. V. Komogortsev. Biometric recognition via probabilistic spatial projection of eye movement trajectories in dynamic visual environments. IEEE Transactions on Information Forensics and Security, 9(10):1743–1754, 2014.
[21] D. D. Salvucci and J. H. Goldberg. Identifying fixations and saccades in eye-tracking protocols. In Proceedings of the 2000 symposium on Eye tracking research & applications, pages 71–78. ACM, 2000.
[22] I. Sluganovic, M. Roeschlin, K. B. Rasmussen, and I. Martinovic. Using reflexive eye movements for fast challenge-response authentication. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pages 1056–1067. ACM, 2016.
[23] C. M. Tey, P. Gupta, and D. Gao. I can be you: Questioning the use of keystroke dynamics as biometrics. Annual Network and Distributed System Security Symposium 20th NDSS 2013, 24-27 February, pages 1–16, 2013.
[24] Y. Zhang, W. Hu, W. Xu, C. T. Chou, and J. Hu. Continuous authentication using eye movement response of implicit visual stimuli. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 1(4):177, 2018.