Measuring Vigilance Decrement using Computer Vision Assisted Eye Tracking in Dynamic Naturalistic Environments
Indu P. Bodala1, Student Member, IEEE, Nida I. Abbasi2, Student Member, IEEE, Yu Sun3, Member, IEEE,
Anastasios Bezerianos4, Senior Member, IEEE, Hasan Al-Nashash5, Senior Member, IEEE and Nitish V. Thakor6, Fellow, IEEE
Abstract— Eye tracking offers a practical solution for monitoring cognitive performance in real-world tasks. However, eye tracking in dynamic environments is difficult because of the high spatial and temporal variation of the stimuli, and it requires further, thorough investigation. In this paper, we investigate a novel computer vision assisted eye tracking analysis based on fixations. Eye movement data were obtained from a long-duration naturalistic driving experiment. The scale-invariant feature transform (SIFT) algorithm, implemented with the VLFeat toolbox, was used to identify multiple areas of interest (AOIs). A new measure called the 'fixation score' was defined to capture the dynamics of fixation position between the target AOI and the non-target AOIs. The fixation score is highest when the subjects focus on the target AOI and decreases when they gaze at the non-target AOIs. A statistically significant negative correlation was found between fixation score and reaction time (r = −0.2253, p < 0.05). This suggests that, with vigilance decrement, the fixation score decreases as visual attention shifts away from the target objects, resulting in an increase in reaction time.
Measuring vigilance decrement over time has long been a challenge , . The physiological or behavioural variables used for this purpose must be sensitive enough to capture changes in an individual subject's behaviour over time and robust enough to be applied across various tasks and applications . Eye tracking provides a practical way to measure changes in vigilance in natural environments. Eye movement data can provide rich information about the dynamics of perception and attention in both the temporal and the spatial domains. For example, the duration of fixations or saccades can reflect the amount of attention directed towards target stimuli , , . Similarly, the frequency of eye blinks has been used as a measure of the workload experienced in a task , . More complex measures such as gaze paths, fixation heat maps and analysis of eye movement clusters have been developed to provide intuitive insights into the viewing patterns of participants , .

*This work was supported by the National University of Singapore for the Cognitive Engineering Group at the Singapore Institute for Neurotechnology (SINAPSE) under Grant R-719-001-102-232.
1 Indu P. Bodala is a Ph.D. student at the NUS Graduate School of Integrative Science and Engineering (NGS), National University of Singapore, Singapore. email@example.com
2 Nida I. Abbasi is a Master's student in the Department of Biomedical Engineering, National University of Singapore, Singapore.
3 Yu Sun is a senior research fellow in the Cognitive Engineering Group at the Singapore Institute for Neurotechnology (SINAPSE), National University of Singapore, Singapore. firstname.lastname@example.org
4 Anastasios Bezerianos is the head of the Cognitive Engineering Group at the Singapore Institute for Neurotechnology (SINAPSE), National University of Singapore (NUS), Singapore. email@example.com
5 Hasan Al-Nashash is a professor in the Department of Electrical Engineering, American University of Sharjah, UAE. firstname.lastname@example.org
6 Nitish V. Thakor is the director of the Singapore Institute for Neurotechnology (SINAPSE), National University of Singapore (NUS), Singapore.
However, analysing eye movement data collected over a long duration in a naturalistic environment is a difficult problem. Tasks conducted in real-world scenarios such as driving or surveillance always include many task-relevant and task-irrelevant stimuli . Unlike the static stimuli usually used in laboratory settings to study cognitive performance, real-world tasks involve dynamic stimuli whose areas of interest change constantly. Hence, it is important to develop methods for analysing data collected in such environments, so that interventions can be designed to measure and enhance performance in everyday tasks.
In this study, we designed a driving experiment in which the subjects performed a specific braking task in an immersive virtual environment. The task was designed to resemble naturalistic driving conditions as closely as possible. In the braking task, subjects were asked to follow a 'lead car' on a country road. In addition to the lead car, which acted as the target stimulus, there were several non-target stimuli such as oncoming traffic in the next lane, curvy roads and rich scenery with mountains, grass and houses beside the road. Some of these non-target stimuli may act as distractors (for example, the scenery), while others remain relevant to the task objective (for example, the oncoming traffic). Eye movement data of the participants were collected throughout the task, which lasted about 25 minutes.
To analyse the eye movement data, we adopted an areas-of-interest (AOI) analysis approach. AOI analysis is very useful for understanding the dynamics of visual attention in environments where the stimuli are spatially distributed and varying. Metrics such as dwell time, AOI hit and first return have been used to study visual perception, attention shifts and their relation to performance , . Since the stimulus in our experiment was a dynamic scene, the spatial locations of the AOIs varied over time. Hence, we used a computer vision algorithm to obtain object locations in each frame of the video stimulus. These AOIs were also defined according to the semantic characteristics of the objects with respect to the task objectives. Fixations were then analyzed based on their spatial location and the AOIs to which they belonged.

Figure 1. (a) An arbitrary frame with the corresponding fixation overlaid. (b) Binary mask for the lead car (mask value m1), (c) binary mask for the oncoming traffic (mask value m2) and (d) binary mask for the control deck and the mirror (mask value m3), each obtained from the frame and shown with the corresponding fixation overlaid. In this example, m1 = 1 and m2 = m3 = 0.
The experimental task, data collection, identification of dynamic AOIs and analysis of fixations are described in the following section. The results of the data analysis are presented in Section 3. The implications of the findings and future directions are discussed in Section 4, followed by concluding remarks in Section 5.
Six healthy subjects aged 20-35 years, with normal or corrected-to-normal vision and no history of neurological or psychiatric disorders, were recruited from the National University of Singapore (NUS) to participate in the study. All subjects held a valid driving license. The study was approved by the Institutional Review Board (IRB) of NUS. Written informed consent was obtained from each participant before the experiment began. All participants received monetary compensation for their time upon completion of the experiment.
B. Experimental task
The driving experiment was designed using an immersive, virtual-environment-based driving simulator in which the subjects performed a specific braking task. All experiments were conducted at the same time of day, in the afternoon. Subjects were instructed to drive continuously for 25 min on a country road in a sunny-day setting, following a lead car without overtaking it (Figure 1(a)). The lead car braked at infrequent intervals, and the subjects responded by braking to avoid a crash. A trial was defined as the interval from the start of braking by the lead car to the start of braking by the subject; the reaction time (RT) of the trial is the duration of this interval. The inter-trial interval varied randomly between 45 and 75 s. Speed was controlled automatically by the car to minimize variability arising from the subjects' driving habits; however, the subjects steered the car to keep it in the given lane. The subjects were strictly instructed to avoid crashing into the lead car or the oncoming traffic.
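Since each trial runs from a lead-car braking onset to the subject's braking onset, the RT is a difference of two marker timestamps. The sketch below is illustrative only: it assumes the two onset streams are available as lists of times in seconds (the names `lead_brake_s` and `subject_brake_s` are hypothetical) and pairs each lead-car braking with the first subject braking at or after it.

```python
from bisect import bisect_left


def reaction_times(lead_brake_s, subject_brake_s):
    """Return one reaction time (s) per lead-car braking event.

    Each trial runs from a lead-car braking onset to the first subject
    braking onset at or after it; a trial with no later response gets None.
    Both inputs are hypothetical lists of onset times in seconds.
    """
    responses = sorted(subject_brake_s)
    rts = []
    for onset in sorted(lead_brake_s):
        k = bisect_left(responses, onset)  # index of first response not before onset
        rts.append(responses[k] - onset if k < len(responses) else None)
    return rts


# Example: two lead-car braking events about 60 s apart
print(reaction_times([50.0, 110.0], [51.0, 111.5]))  # [1.0, 1.5]
```

With inter-trial intervals of 45-75 s, a response normally falls well before the next lead-car braking, so this simple pairing rule is adequate for the sketch; spurious braking between trials would need extra filtering.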
C. Eye tracker data acquisition and preprocessing
While the subjects performed the driving task, eye movement data were collected using a Tobii TX300 eye tracker in a standalone external video setup at a sampling rate of 300 Hz. The eye tracker was calibrated for each subject before the experiment started. Tobii Studio software was used to capture the stimulus video and the eye movements simultaneously for further analysis. The raw data were classified into fixations and saccades using the I-VT fixation filter available in Tobii Studio; the algorithm is described in . The I-VT fixation filter classifies eye movement samples based on the angular velocity of the eye: if the velocity of a sample is above a threshold, the sample is classified as a saccade; otherwise, it is classified as a fixation. The velocity threshold was set to 30°/s . Before classification, the I-VT filter also applies a gap fill-in function that interpolates missing data samples, together with noise reduction. After classification, it merges broken fixations and/or discards short fixations. Markers indicating the start of each trial (braking of the lead car) and the response (braking by the subject) were inserted into the recordings, and the gaze data and video segments corresponding to each trial were extracted using these markers.
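The velocity-threshold rule can be sketched in a few lines of NumPy. This is not Tobii's implementation: it assumes gaze has already been converted to degrees of visual angle (hypothetical arrays `az_deg`, `el_deg`), uses the 300 Hz rate and 30°/s threshold from the text, and uses a hypothetical 60 ms minimum duration for discarding short fixations. Gap fill-in, noise reduction and merging of broken fixations are omitted.

```python
import numpy as np


def ivt_labels(az_deg, el_deg, fs=300.0, threshold=30.0):
    """Return a boolean array: True = fixation sample, False = saccade sample.

    az_deg, el_deg: gaze direction in degrees of visual angle (assumed input).
    fs: sampling rate in Hz; threshold: velocity threshold in deg/s.
    """
    az = np.asarray(az_deg, dtype=float)
    el = np.asarray(el_deg, dtype=float)
    # Angular speed between consecutive samples (small-angle approximation)
    speed = np.hypot(np.diff(az), np.diff(el)) * fs
    # Give the first sample the speed of the first interval so lengths match
    speed = np.concatenate([speed[:1], speed])
    return speed <= threshold  # above threshold -> saccade


def fixation_events(is_fix, fs=300.0, min_dur=0.060):
    """Group consecutive fixation samples into [start, end) index pairs and
    drop fixations shorter than min_dur seconds (hypothetical default)."""
    events, start = [], None
    for i, f in enumerate(is_fix):
        if f and start is None:
            start = i
        elif not f and start is not None:
            events.append((start, i))
            start = None
    if start is not None:
        events.append((start, len(is_fix)))
    min_len = int(round(min_dur * fs))
    return [(s, e) for s, e in events if e - s >= min_len]


# Synthetic trace: 30 still samples, a fast 3-sample jump, 30 still samples
az = [0.0] * 30 + [2.0, 5.0, 8.0] + [10.0] * 30
el = [0.0] * len(az)
print(fixation_events(ivt_labels(az, el)))  # [(0, 30), (34, 63)]
```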
D. Obtaining AOIs and fixation scores
1) Object detection using SIFT algorithm: The video segments corresponding to each trial were divided into frames at a rate of 5 Hz. Each frame was then processed to identify AOIs belonging to predefined semantic categories. The objects in the experiment were broadly divided into four semantic categories: lead car, traffic, scenery, and mirror and control deck. We used the scale-invariant feature transform (SIFT) algorithm from the VLFeat toolbox for MATLAB to identify the lead car and the oncoming traffic in each frame . The SIFT algorithm comprises a feature detector and a feature descriptor. The SIFT detector extracts a number of keypoints (which VLFeat calls frames, i.e. attributed regions) from an image in a way that is robust to variations in illumination, viewpoint and other viewing conditions, so that regions extracted from a template of the object to be identified can be found again in each video frame. The SIFT descriptor associates with each region a signature that identifies its appearance compactly and robustly. We manually checked each frame for correct object identification and manually annotated the frames in which objects had been wrongly identified. The algorithm identified the objects correctly in 94% of all frames.
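Locating a template in a frame requires matching its SIFT descriptors to those extracted from the frame. The paper does not state the matching criterion, so the sketch below assumes a nearest-neighbour ratio test modelled on the rule documented for VLFeat's `vl_ubcmatch` (a match is kept only if the nearest distance times a threshold, 1.5 by default there, does not exceed the distance to the second-nearest descriptor). It is a plain NumPy re-implementation for illustration, not VLFeat code; rows are descriptors (128-D for SIFT, 2-D in the toy example).

```python
import numpy as np


def ratio_match(desc_template, desc_frame, thresh=1.5):
    """Match template descriptors to frame descriptors with a ratio test.

    Template descriptor i is matched to its nearest frame descriptor j only
    if thresh * d(i, j) <= distance to the second-nearest frame descriptor.
    Returns a list of (i, j) index pairs.
    """
    a = np.asarray(desc_template, dtype=float)
    b = np.asarray(desc_frame, dtype=float)
    if len(b) < 2:
        return []  # the ratio test needs at least two candidates
    # Pairwise Euclidean distances, shape (len(a), len(b))
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    order = np.argsort(d, axis=1)
    matches = []
    for i in range(len(a)):
        j1, j2 = order[i, 0], order[i, 1]
        if thresh * d[i, j1] <= d[i, j2]:
            matches.append((i, int(j1)))
    return matches


template = [[0.0, 0.0], [10.0, 10.0], [5.0, 0.0]]  # toy 2-D "descriptors"
frame = [[0.1, 0.0], [10.0, 10.2], [5.0, 5.0]]
print(ratio_match(template, frame))  # [(0, 0), (1, 1)]
```

The third template descriptor is rejected because its two nearest candidates are almost equally close, which is exactly the ambiguity the ratio test is meant to filter out.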
Binary masks, with 1s assigned to the identified object and 0s to the background, were generated to denote the identified objects (Figure 1(b), (c)). Since the coordinates of the car's mirror and control deck were constant across frames, a single binary mask was used to represent them in all frames (Figure 1(d)). The rest of the scene was labelled as scenery. The mask values m1, m2 and m3 (as explained in Figure 1) are defined as the pixel values (0 or 1) of the corresponding masks at the fixation position.
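As an illustration of reading a mask value at a fixation, the sketch below builds a rectangular binary mask from the keypoint locations of a detected object (the bounding-box shape is an assumption; the paper does not state how masks were drawn), maps a fixation time to the frame sampled at 5 Hz, and returns the mask value at the fixation pixel. The helper names `box_mask`, `frame_index` and `mask_value` are hypothetical.

```python
import numpy as np


def box_mask(shape, points):
    """Binary mask (1 = object, 0 = background) covering the bounding box of
    the (x, y) keypoint locations of a detected object. Assumed mask shape."""
    mask = np.zeros(shape, dtype=np.uint8)
    pts = np.asarray(points, dtype=float)
    x0, y0 = np.floor(pts.min(axis=0)).astype(int)
    x1, y1 = np.ceil(pts.max(axis=0)).astype(int)
    mask[max(y0, 0):y1 + 1, max(x0, 0):x1 + 1] = 1  # rows are y, columns are x
    return mask


def frame_index(t_s, frame_rate=5.0):
    """Index of the 5 Hz frame shown at time t_s (seconds from trial start)."""
    return int(t_s * frame_rate)


def mask_value(mask, fix_xy):
    """Pixel value of the mask at the fixation position; 0 if off-screen."""
    x, y = int(round(fix_xy[0])), int(round(fix_xy[1]))
    h, w = mask.shape
    return int(mask[y, x]) if 0 <= x < w and 0 <= y < h else 0


m1 = box_mask((480, 640), [(300, 200), (340, 230)])  # hypothetical lead-car keypoints
print(frame_index(1.37), mask_value(m1, (320.4, 215.0)), mask_value(m1, (50, 50)))
# 6 1 0
```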
2) Fixation scores: Among the semantic categories defined above, the highest weight was given to the lead car (s1 = 1), since it is the target, followed by the traffic (s2 = 0.5) and the mirror and control deck (s3 = 0.25). Each fixation position obtained from the preprocessed gaze data was compared against the binary masks to identify the AOIs that contained it. The fix score of a fixation is defined as the sum of the products of the semantic weights and the mask values of the AOIs, as given in (e1). If the fixation belongs to the scenery, the fix score is 0, since all mask values are 0. The average of the fix scores of all fixations in a trial is taken as the 'trial fixation score', as given in (e2).

$$\text{Fix score} = \sum_{i=1}^{3} s_i \times m_i \qquad \text{(e1)}$$

$$\text{Trial Fixation Score} = \frac{1}{n}\sum_{j=1}^{n} \text{Fix score}_j \qquad \text{(e2)}$$

where j = 1, ..., n and n is the number of fixations in that trial.
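Equations (e1) and (e2) translate directly into code. The sketch below assumes each fixation has already been reduced to its three mask values (m1, m2, m3), uses the semantic weights given in the text and follows the weighted sum written in (e1); it is illustrative, not the authors' analysis script.

```python
# Semantic weights from the text: lead car, traffic, mirror/control deck
WEIGHTS = (1.0, 0.5, 0.25)


def fix_score(mask_values, weights=WEIGHTS):
    """(e1): sum of s_i * m_i over the AOIs; 0 for a fixation on the scenery."""
    return sum(s * m for s, m in zip(weights, mask_values))


def trial_fixation_score(fixations, weights=WEIGHTS):
    """(e2): mean fix score over the n fixations of one trial."""
    scores = [fix_score(m, weights) for m in fixations]
    return sum(scores) / len(scores)


# Three fixations: lead car, oncoming traffic, scenery
print(trial_fixation_score([(1, 0, 0), (0, 1, 0), (0, 0, 0)]))  # 0.5
```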
In this section, we present the results of the analysis of the reaction time and eye tracking data. The 25 min of reaction time data were divided into two parts. The first 10 min were defined as the 'baseline', during which vigilance decrement is assumed to be minimal; the last 15 min are assumed to reflect greater vigilance decrement. An ANOVA was performed on the average reaction time of all subjects for the baseline versus the vigilance decrement data. Figure 2 shows a statistically significant (p < 0.05) increase in reaction time from the baseline (mean = 0.994 s, SD = 0.122 s) to the last 15 min (mean = 1.325 s, SD = 0.24 s), indicating that the driving task successfully induced vigilance decrement in the subjects.
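The baseline versus vigilance decrement comparison can be sketched as follows. The code assumes each trial is tagged with its onset time in seconds (hypothetical input), splits trials at the 10 min mark and computes the one-way ANOVA F statistic by hand; the p-value would then come from the F distribution (for example via `scipy.stats.f_oneway`, which returns both F and p). Whether the authors ran the ANOVA on per-trial or per-subject averages is not stated, so the grouping here is an assumption.

```python
def split_baseline(onsets_s, rts, cutoff_s=600.0):
    """Split reaction times into baseline (onset < 10 min) and later trials."""
    base = [rt for t, rt in zip(onsets_s, rts) if t < cutoff_s]
    late = [rt for t, rt in zip(onsets_s, rts) if t >= cutoff_s]
    return base, late


def one_way_f(*groups):
    """One-way ANOVA F statistic: between-group over within-group mean square."""
    values = [x for g in groups for x in g]
    grand = sum(values) / len(values)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


print(one_way_f([1, 2, 3], [4, 5, 6]))  # 13.5
```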
Figure 2. ANOVA of the average reaction time of all subjects for the first 10 min (baseline) and the last 15 min (vigilance decrement); p < 0.05.
Fixation scores are higher when the subjects focus more on the lead car (target) than on the non-targets. A correlation analysis between fixation score and reaction time was conducted across all trials of all subjects. Figure 3 shows a statistically significant negative correlation between fixation score and reaction time (r = −0.2253, p < 0.05). This is consistent with our hypothesis that the increase in reaction time is due to the subjects gazing at AOIs that are not significant for the task objectives, and it suggests that, with vigilance decrement, visual attention wanders away from the target objects towards non-target objects.
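The reported r is most likely a Pearson coefficient, although the text does not name the method, so that is an assumption here. The minimal sketch below takes paired per-trial trial fixation scores and reaction times (hypothetical values, not the study's data) and also returns the t statistic with n − 2 degrees of freedom, from which a two-sided p-value can be obtained (for example via `scipy.stats.pearsonr`, which reports r and p directly).

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r_to_t(r, n):
    """t statistic for testing r != 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))


scores = [0.9, 0.7, 0.6, 0.4, 0.3]    # hypothetical trial fixation scores
rts = [0.95, 1.00, 1.20, 1.10, 1.40]  # hypothetical reaction times (s)
r = pearson_r(scores, rts)
print(round(r, 3), round(r_to_t(r, len(scores)), 3))
```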
A. Computer vision assisted eye tracking
In this paper, we discussed the importance of developing eye tracking metrics that can accommodate dynamically varying visual environments for assessing the performance of subjects during naturalistic tasks. Eye trackers can be easily integrated into natural environments to assess the cognitive performance of individuals. However, very few studies have investigated dynamic AOIs in natural stimuli . Although such techniques are incorporated into modern commercially available software, their accuracy is limited for stimuli such as those in this study, owing to confounding factors such as low contrast between object and background, multiple objects of interest, and sudden shifts in the positions of the cars caused by sharp curves in the road. Hence, more sophisticated and robust computer vision algorithms are needed that can identify objects in dynamic environments with higher accuracy and can learn their semantic importance according to the task objectives.

Figure 3. Correlation analysis between trial fixation score and reaction time. Correlation coefficient (r) = −0.2253 and p < 0.05.
B. Measuring vigilance decrement in dynamic environments
With vigilance decrement, subjects' focus wanders away from the task objectives . This results in gaze shifting from target objects to distractors and in a decrease in the efficiency of task-relevant visual trajectories. Hence, the decrease in fixation score is accompanied by an increase in reaction time. Although the correlation is significant, its relatively low magnitude may be due to the small sample size or possible outliers. This approach allowed us to examine the dynamics of attention across multiple dynamic AOIs, a common scenario in naturalistic tasks. Other AOI-related features, such as dwell time, transitions between AOIs or time to return to the target AOI, can be explored further to develop a real-time monitoring system. The sample size will also be increased in the future.
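The AOI features proposed above for future work were not computed in this study. As a hedged illustration of what they could look like, the sketch below takes a hypothetical time-ordered list of fixations labelled with their AOI, start time and duration, and computes the dwell time per AOI, the number of transitions between different AOIs, and the time from leaving the target AOI until returning to it. These definitions are common ones, assumed here rather than taken from the paper.

```python
from collections import defaultdict


def aoi_features(fixations, target="lead car"):
    """Compute simple AOI features from a time-ordered fixation sequence.

    fixations: list of (aoi_label, start_s, duration_s) tuples (hypothetical
    format). Returns (dwell time per AOI in s, number of transitions between
    different AOIs, list of times in s taken to return to the target AOI).
    """
    dwell = defaultdict(float)
    transitions = 0
    returns = []
    left_at = None  # end time of the last target fixation before leaving it
    prev, prev_end = None, None
    for aoi, start, dur in fixations:
        dwell[aoi] += dur
        if prev is not None and aoi != prev:
            transitions += 1
            if prev == target:
                left_at = prev_end
            elif aoi == target and left_at is not None:
                returns.append(start - left_at)
                left_at = None
        prev, prev_end = aoi, start + dur
    return dict(dwell), transitions, returns


example = [("lead car", 0.0, 0.5), ("lead car", 0.6, 0.4), ("scenery", 1.1, 0.3),
           ("traffic", 1.5, 0.2), ("lead car", 1.8, 0.6)]
print(aoi_features(example))
```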
In this paper, we developed a novel approach to studying eye movements in naturalistic environments. Computer vision algorithms were used to detect dynamically changing AOIs. A fixation score metric was developed to account for gaze shifts between target and non-target AOIs. A statistically significant negative correlation was found between fixation score and reaction time, suggesting that the fixation score decreases with vigilance decrement. This research can support the development of computer vision assisted eye tracking systems that monitor cognitive performance in dynamic environments.
We would like to express our sincere gratitude to Dr. Quin, Asst. Professor, Singapore University of Technology and Design (SUTD), for allowing us to borrow the eye tracker and data analysis software. We also thank Rohan Ghosh, graduate student at SINAPSE, NUS, for his suggestions regarding the SIFT algorithm, and Mohammed Sharif, FYP student, ECE, NUS, for helping with the data analysis.
[1] Martel, A., Dähne, S., & Blankertz, B. (2014). EEG predictors of covert vigilant attention. Journal of Neural Engineering, 11(3), 035009.
[2] Dong, Y., Hu, Z., Uchimura, K., & Murayama, N. (2011). Driver inattention monitoring system for intelligent vehicles: A review. IEEE Transactions on Intelligent Transportation Systems, 12(2), 596-614.
[3] Berka, C., Levendowski, D. J., Lumicao, M. N., Yau, A., Davis, G., Zivkovic, V. T., Olmstead, R. E., Tremoulet, P. D., & Craven, P. L. (2007). EEG correlates of task engagement and mental workload in vigilance, learning, and memory tasks. Aviation, Space, and Environmental Medicine, 78(5), B231-B244.
[4] Di Stasi, L. L., Catena, A., Canas, J. J., Macknik, S. L., & Martinez-Conde, S. (2013). Saccadic velocity as an arousal index in naturalistic tasks. Neuroscience and Biobehavioral Reviews, 37(5), 968-975.
[5] Martinez-Conde, S., Macknik, S. L., & Hubel, D. H. (2004). The role of fixational eye movements in visual perception. Nature Reviews Neuroscience, 5(3), 229-240.
[6] Bodala, I. P., Ke, Y., Mir, H., Thakor, N. V., & Al-Nashash, H. (2014). Cognitive workload estimation due to vague visual stimuli using saccadic eye movements. In Engineering in Medicine and Biology Society (EMBC), 2014 36th Annual International Conference of the IEEE, pp. 2993-2996. IEEE.
[7] McIntire, L. K., McKinley, R. A., Goodyear, C., & McIntire, J. P. (2014). Detection of vigilance performance using eye blinks. Applied Ergonomics, 45(2), 354-362.
[8] Bodala, I. P., Li, J., Thakor, N. V., & Al-Nashash, H. (2016). EEG and eye tracking demonstrate vigilance enhancement with challenge integration. Frontiers in Human Neuroscience, 10.
[9] Caldara, R., & Miellet, S. (2011). iMap: A novel method for statistical fixation mapping of eye movement data. Behavior Research Methods.
[10] Istance, H., & Hyrskykari, A. (2011). Gaze-aware systems and attentive applications. Gaze Interaction and Applications of Eye Tracking: Advances in Assistive Technologies, 175.
[11] Goldberg, J. H., Stimson, M. J., Lewenstein, M., Scott, N., & Wichansky, A. M. (2002). Eye tracking in web search tasks: Design implications. In Proceedings of the 2002 Symposium on Eye Tracking Research and Applications, pp. 51-58. ACM.
[12] Riby, D., & Hancock, P. J. (2009). Looking at movies and cartoons: Eye tracking evidence from Williams syndrome and autism. Journal of Intellectual Disability Research, 53(2), 169-181.
[13] Olsen, A. (2012). The Tobii I-VT fixation filter. Tobii Technology.
[14] Olsen, A., & Matos, R. (2012). Identifying parameter values for an I-VT fixation filter suitable for handling data sampled with various sampling frequencies. In Proceedings of the Symposium on Eye Tracking Research and Applications, pp. 317-320. ACM.
[15] Vedaldi, A., & Fulkerson, B. (2010). VLFeat: An open and portable library of computer vision algorithms. In Proceedings of the 18th ACM International Conference on Multimedia, pp. 1469-1472. ACM.
[16] Papenmeier, F., & Huff, M. (2010). DynAOI: A tool for matching eye-movement data with dynamic areas of interest in animations and movies. Behavior Research Methods, 42(1), 179-187.
[17] Thomson, D. R., Besner, D., & Smilek, D. (2015). A resource-control account of sustained attention: Evidence from mind-wandering and vigilance paradigms. Perspectives on Psychological Science, 10(1), 82-