Using Dissimilarity Matrix for Eye Movement
Biometrics with a Jumping Point Experiment
Pawel Kasprowski and Katarzyna Harezlak
Silesian University of Technology, ul. Akademicka 16, 44-100 Gliwice, Poland
pawel.kasprowski@polsl.pl
Abstract. The paper presents studies on the application of a dissimilarity
matrix-based method to eye movement analysis. The method was utilized
in a biometric identification task. To assess its efficiency, four different
datasets based on a similar scenario (a 'jumping point' stimulus) yet recorded
with different eye trackers, recording frequencies and time intervals were used.
This provided a common platform for the research and allowed some interesting
comparisons to be drawn. The dissimilarity matrix, which had never before been
used for identifying people on the basis of their eye movements, was constructed
using several distance measures. Additionally, different signal transforms and
metrics were checked and their performance on the various datasets was compared.
It is worth mentioning that the paper presents the algorithm that was used during
the BioEye 2015 competition and was ranked as one of the top three methods.
Keywords: eye movement biometrics, dissimilarity matrix, fusion, dynamic time warping
1 Introduction
Eye movement biometrics has been investigated for over 10 years; however, there
are still no commercial applications utilizing this modality. The main problem is
the lack of established and well understood methods that can be used to distinguish
the eye movement characteristics of different people.
Some effort has already been made to solve this problem. There are eye movement
datasets available for download, and biometric contests have been organized, such
as EMVIC 2012 [6], EMVIC 2014 [5] and BioEye 2015 [13]. However, the methods used
by contest participants are not always published and are therefore sometimes not
reproducible. Moreover, because multiple submissions are possible during such a
competition and some training data is typically available in advance, the methods
are optimized for the competition's dataset and suffer from poor generalization to
other datasets. Such generalization requires applying elaborated methods to datasets
collected using different setups and for different users, which may help to find a
solution that serves well for many eye movement collections.
This is a pre-print. The final version of the paper was published in Springer, Smart Innovation,
Systems and Technologies, Vol. 57, 2016 as part of Proceedings of the 8th KES International
Conference on Intelligent Decision Technologies (KES-IDT 2016) – Part II and is available in
Springer Link Library: https://link.springer.com/chapter/10.1007/978-3-319-39627-9_8
This was one of the motivating factors to apply the new feature extraction
method – which was developed by the authors for the BioEye 2015 competition and
ranked as one of the top three methods – to various eye movement datasets. This
method, based on the dissimilarity matrix [2], had not previously been used for
eye movement biometrics. We checked its performance using various signals,
transforms and divisions of samples.
To overcome the lack of generalization, the usefulness of the applied solutions
was tested on four different datasets recorded using three different eye trackers.
This enabled us to draw meaningful conclusions about the efficiency of various
combinations of approaches and experiment scenarios.
2 Eye Movement Biometrics Using a 'Jumping Point' Stimulus
All datasets used were recorded with a jumping point, one of the most popular
stimuli. During such an experiment a subject is instructed to follow with the eyes
a point displayed on a screen. The point's position changes periodically, which is
why it is called a 'jumping point'. The advantage of such a stimulus is that the
eye movements are more or less predictable and comparable between trials. On the
other hand, such a stimulus forces a specific behavior, so it measures physiological
patterns of a person rather than behavioral ones.
The first usage of this kind of stimulus was reported in [7], where cepstral
coefficients were used as features for a classifier. In [8] the idea was extended
by using Principal Component Analysis (PCA) to reduce the number of attributes.
Another notable work was [10], whose authors extracted saccades and used training
samples to create an Oculomotor Plant Mathematical Model (OPMM) [12]. The idea was
extended in [11], where nine oculomotor plant characteristics (OPC) were empirically
chosen. The OPC biometrics calculated for different subjects were compared using a
voting version of Student's t-test and Hotelling's T-square test. The results were
fused using logical AND or OR techniques.
In 2012 the first Eye Movement Verification and Identification Competition (EMVIC)
was organized, and it resulted in several publications [6]. Four datasets were
presented, all created using a 'jumping point' stimulus. According to [6], the
winner of the competition divided samples into parts and calculated 2D histograms
of speed and direction. The second-place holder extracted velocity and acceleration
and compared their distributions using the multivariate Wald-Wolfowitz test [14].
In 2013 Holland and Komogortsev [3] compared results for different stimuli and
devices using the same set of 14 features (named CEM features). The results were
calculated for every feature separately and for the fusion of all features.
Finally, in 2015 the BioEye competition was announced with four datasets [13]. Two
of them were based on a jumping point stimulus and were used in the presented research.
3 Feature Extraction and Classification
Before any feature extraction method was applied to the eye movement signal, each
sample from the dataset was divided into events. An event was a part of a sample
during which the stimulus point remained in the same place. Every event was described
by a starting position - the location of the point just before the event - and an
ending position - the location of the point during the event. The direction of an
event was defined as the direction of the vector from its start position to its
end position.
Signal Extraction. On the basis of the raw eye positions, the first, second and
third derivatives were extracted for every event independently. Velocity (v),
acceleration (a) and jerk (j) were calculated both as absolute values and separately
for the horizontal and vertical directions. This resulted in 9 signals for every
event (Table 1).
Table 1. Set of signals extracted from eye movement

signal  | formula                                  | description
vx, vy  | Vx = ∂x/∂t,   Vy = ∂y/∂t                 | the first derivative of x and y
        |                                          | (i.e. horizontal and vertical velocities)
vxy     | V = sqrt(Vx^2 + Vy^2)                    | the first derivative for absolute velocity
ax, ay  | V'x = ∂Vx/∂t,  V'y = ∂Vy/∂t              | the second derivative of x and y
        |                                          | (i.e. horizontal and vertical accelerations)
axy     | V' = sqrt(V'x^2 + V'y^2)                 | the derivative of vxy
jx, jy  | V''x = ∂V'x/∂t,  V''y = ∂V'y/∂t          | the third derivative of x and y (jerk)
jxy     | V'' = sqrt(V''x^2 + V''y^2)              | the derivative of axy

where x, y are the raw coordinates.
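For illustration, the per-event derivative signals of Table 1 can be obtained with
simple finite differences, as in the minimal sketch below; the scaling by the
recording frequency, the magnitude formulas for the combined signals and the
function name event_signals are assumptions of this sketch, not the authors'
exact numerical procedure.

```python
import numpy as np

def event_signals(x, y, freq_hz):
    """Compute the nine Table 1 signals for one event given raw gaze coordinates."""
    dt = 1.0 / freq_hz
    vx, vy = np.diff(x) / dt, np.diff(y) / dt      # first derivatives (velocities)
    vxy = np.hypot(vx, vy)                         # absolute velocity
    ax, ay = np.diff(vx) / dt, np.diff(vy) / dt    # second derivatives (accelerations)
    axy = np.hypot(ax, ay)                         # combined acceleration, per Table 1 formula
    jx, jy = np.diff(ax) / dt, np.diff(ay) / dt    # third derivatives (jerk)
    jxy = np.hypot(jx, jy)                         # combined jerk, per Table 1 formula
    return {"vx": vx, "vy": vy, "vxy": vxy,
            "ax": ax, "ay": ay, "axy": axy,
            "jx": jx, "jy": jy, "jxy": jxy}
```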
Signal Transformation. The next phase was the calculation of different transforms
from each of the nine signals separately. Four transformations were used: the
Fourier transform (F) [8], the cepstrum transform (C) [7], the Daubechies (Daub)
wavelet transform (W) and signal normalization to the 0-1 range (N). Together with
the untransformed signal (S) this gave 5 different transforms and, altogether,
9 × 5 = 45 different signals were extracted.
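A rough sketch of these five variants is given below; the magnitude spectrum, the
real-cepstrum formula and the use of the PyWavelets library with a db4 wavelet are
assumptions made for the sketch, since the paper does not specify these details.

```python
import numpy as np
import pywt  # PyWavelets, assumed here for the wavelet transform

def transforms(signal):
    """Return the five variants (S, N, F, C, W) of a 1-D event signal."""
    s = np.asarray(signal, float)
    norm = (s - s.min()) / (s.max() - s.min() + 1e-12)            # N: 0-1 normalization
    spectrum = np.abs(np.fft.rfft(s))                             # F: magnitude spectrum
    cepstrum = np.abs(np.fft.irfft(np.log(spectrum + 1e-12)))     # C: real cepstrum
    wavelet = np.concatenate(pywt.wavedec(s, "db4"))              # W: Daubechies coefficients
    return {"S": s, "N": norm, "F": spectrum, "C": cepstrum, "W": wavelet}
```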
Feature Preparation. Signals obtained in the previous pre-processing phase were
subsequently used to build feature sets using the dissimilarity matrix-based
method [2]. A similar method has already been used for behavioral biometrics [16],
but using it for the eye movement signal is our original contribution. While
preparing feature sets, the eye movement dataset was first divided in half into
training and testing events. Because each of the datasets used to evaluate the
proposed method was built from samples collected during two sessions, there were
always two samples for each subject. Events from the first user's sample were
treated as training events, while events from the second sample were treated as
testing ones. Then, for every training event, its distances to all other training
events were calculated. These distances formed a feature set consisting of N
features, where N is the number of training events. This feature set was used as
the input to a classification model building algorithm. The same procedure was
used for testing events: first, distances to all training events were calculated
and formed a set of attributes, and then this set was used to classify the given event.
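To make the dissimilarity representation concrete, the following is a minimal
sketch of how per-event feature vectors can be built from pairwise distances to
the training events. The function names and the plain Euclidean distance used as
the default are illustrative assumptions, not the authors' exact implementation.

```python
import numpy as np

def euclidean(a, b):
    """Euclidean distance between two 1-D signals (crudely truncated to equal length)."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    n = min(len(a), len(b))
    return float(np.linalg.norm(a[:n] - b[:n]))

def build_train_features(train_events, dist=euclidean):
    """Each training event is described by its distances to all training events."""
    n = len(train_events)
    feats = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            feats[i, j] = dist(train_events[i], train_events[j])
    return feats  # row i is the dissimilarity-based feature vector of event i

def build_test_features(test_events, train_events, dist=euclidean):
    """Each testing event is described by its distances to the training events."""
    return np.array([[dist(te, tr) for tr in train_events] for te in test_events])
```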
Given two signals Sa and Sb, the distance between them may be calculated using
different measures. In this research three of them were taken into account:
Dynamic Time Warping (DTW) [1], the Euclidean distance (EUC) and the Earth
Mover's Distance (EMD) [15]. As a result, a separate feature set was prepared
for every combination of signal and distance measure (45 × 3 = 135 feature sets).
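Since DTW turns out to be the most effective distance in this study, a minimal
dynamic-programming sketch of it is included here for reference; this is the
textbook formulation, not necessarily the exact variant or window settings used
in the experiments.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic O(len(a)*len(b)) dynamic time warping distance between two 1-D signals."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return float(cost[n, m])
```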
Classification. As stated above, every dataset used consisted of two sessions for
each subject. During classification, events from the first session of a subject
were always used as training data and events from the second session of the same
subject as testing data. The K Nearest Neighbors algorithm with K equal to 1 was
used as the classifier, and every testing event was classified separately.
The final result for each testing sample was determined using a simple score
fusion. For every event e, the classification model returned a probability p(e, c)
that the event belongs to class c. For the 1NN classifier this value was equal to 1
for one class and 0 for all other classes. The score for class c in every sample s
was calculated as

score(s, c) = \sum_{e=1}^{E} p(e, c),

where E is the number of events belonging to the sample s. The final label for the
sample was calculated as

label(s) = \arg\max_c score(s, c).
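The per-event 1NN classification and the within-sample score fusion described
above can be sketched as follows; the bookkeeping of event labels and the function
names are illustrative assumptions.

```python
import numpy as np

def predict_event_1nn(test_feat, train_feats, train_labels):
    """1NN: a testing event inherits the label of the nearest training event."""
    dists = np.linalg.norm(train_feats - test_feat, axis=1)
    return train_labels[int(np.argmin(dists))]

def classify_sample(test_feats, train_feats, train_labels, classes):
    """score(s, c) = sum over events of p(e, c); for 1NN p(e, c) is either 0 or 1."""
    scores = {c: 0.0 for c in classes}
    for feat in test_feats:
        scores[predict_event_1nn(feat, train_feats, train_labels)] += 1.0
    return max(scores, key=scores.get)  # label(s) = argmax_c score(s, c)
```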
4 Datasets
The studies discussed in the paper were conducted using four datasets called
JAZZ, VOG, RAN30 and RAN1Y. As it was mentioned above all of them were
based on a jumping point stimulus, however the time of presentation and a num-
ber of point’s positions displayed differed for given sets. Other differences re-
garded a type of an eye tracker used to record eye movements, numbers of users
taking part in experiments and a time interval between sessions of an experiment.
The detailed information for each set is provided below.
VOG dataset - The VOG dataset was obtained using a self-developed head-mounted
video-oculography (VOG) eye tracker with a single CMOS camera with a USB 2.0
interface (Logitech QuickCam Express), a 352x288 sensor and a lens with an IR-pass
filter. The camera was mounted on an arm attached to the head and pointed at the
right eye. The system generated 20-25 measurements of the pupil center per second.
The dataset consisted of recordings collected for 26 participants during two
sessions separated by a three-week interval. One eye movement recording covered
30 points displayed on a screen, each for 3 seconds. There were 52 recordings in
this dataset, each including 1400-1500 samples.
JAZZ dataset - The second dataset was obtained using the head-mounted Jazz-Novo
eye tracker (a product of Ober Consulting) that records eye positions with a
frequency of 1000 Hz. It uses direct infrared oculography (IROG) and utilizes
pairs of IR emitters and sensors; the optoelectronic transducers are located
between the eyes. This set included 48 recordings from two sessions for 24
participants. The between-session interval and the number and duration of the
stimuli displayed were the same as for the VOG dataset. Each recording consisted
of between 99,000 and 100,000 samples.
RAN30 and RAN1Y datasets - Both the RAN30 and RAN1Y datasets were part of the
BioEye competition and were recorded using an EyeLink eye tracker working at
1000 Hz. The raw eye movement signals were subsampled to 250 Hz using an
anti-aliasing filter.
The RAN30 dataset was built on the basis of recordings of 153 subjects collected
during two sessions separated by 30 minutes (306 recordings altogether). During
each session the user's task was to follow with the eyes 100 points, each of which
was shown for one second, which gave 25,000 samples per recording.
The RAN1Y dataset consisted of recordings of 37 subjects. The only difference
between the RAN30 and RAN1Y experiments was the interval between sessions - one
year in the latter case.
5 Comparison of Results
Results obtained from the classification process were studied in terms of the
influence of the pre-processing phases on the final classification accuracy. For
this analysis the ANOVA test was used to check for significant differences among
the groups of feature creating methods described above. When such a difference was
found, Tukey's HSD test was applied to determine which groups differ from each
other. Comparing these outcomes with the classification accuracies allowed us to
point out the methods yielding the best results.
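This testing procedure can be expressed along the following lines; the grouping of
accuracies by transform type, the illustrative numbers and the use of SciPy and
statsmodels for the ANOVA and Tukey HSD tests are assumptions of the sketch, not
the exact analysis pipeline of the paper.

```python
import numpy as np
from scipy.stats import f_oneway
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Hypothetical per-feature-set accuracies grouped by transform type (illustrative only).
groups = {"S": [0.81, 0.78, 0.80], "N": [0.79, 0.77, 0.78],
          "F": [0.75, 0.74, 0.76], "C": [0.55, 0.52, 0.58], "W": [0.50, 0.49, 0.53]}

# One-way ANOVA: do the transform groups have identical mean accuracy?
f_stat, p_value = f_oneway(*groups.values())
print(f"ANOVA F={f_stat:.2f}, p={p_value:.4f}")

if p_value < 0.05:
    # Tukey's HSD: which pairs of groups actually differ?
    values = np.concatenate(list(groups.values()))
    labels = np.repeat(list(groups.keys()), [len(v) for v in groups.values()])
    print(pairwise_tukeyhsd(values, labels, alpha=0.05))
```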
In the first step of the analysis the transform type applied to each kind of
signal was taken into account. For all four analyzed datasets the ANOVA test
rejected the null hypothesis that all groups had identical means. Deeper studies
of the differences using Tukey's HSD test and the classification results revealed
that for all sets the wavelet and cepstrum transforms gave significantly worse
results than the three other types: normalization (N), Fourier (F) and the
original signal (S). The latter group (S, N, F) provided better accuracy; however,
the differences within this group were not statistically significant, with one
exception - the VOG set.
Subsequently, our attention turned to the measures used for calculating the
dissimilarity matrix. As mentioned above, three different distance measures were
taken into account: Dynamic Time Warping, Euclidean distance and Earth Mover's
Distance (denoted in the tests by D, E and M respectively). The comparison of
these methods using the ANOVA test, in conjunction with the classification
accuracies, revealed that for all four datasets DTW provided the best
classification results. The statistical significance was confirmed for RAN30 and
RAN1Y, and with respect to the EMD method for the VOG dataset. Only in the case of
the JAZZ set were statistically significant differences not found. All these
results are collected in Figure 1.
It can be seen that for video-based eye trackers the results are correlated with
the sampling frequency, as the results for the RAN30 and RAN1Y datasets are
significantly better than for the VOG dataset, which was recorded with a much
lower frequency. On the other hand, the results of the same method for the
Jazz-Novo eye tracker - which works differently - are worse, despite its very high
recording frequency (1000 Hz). This shows that the method presented in this paper
is not sufficient for this kind of data.
[Figure 1: bar chart of mean classification accuracy (vertical axis, -0.05 to 0.55)
for the VOG, JAZZ, RAN30 and RAN1Y datasets and the three distances M, E, D.]
Fig. 1. Mean results of classification for all four sets and every distance (D - DTW
distance, E - Euclidean distance, M - EMD distance). Significant differences were
marked with horizontal lines.
6 Fusion of Feature Sets and Final Results
To check the real strength of the method for performing user identification, it
was decided to combine the feature sets' results in a score-level voting fusion.
For every feature set, a score for each sample s and class c was first determined,
and then the fused score was calculated as

score_{fus}(s, c) = \sum_{k=1}^{K} score_k(s, c),

where K is the number of feature sets taken into account. The final label for a
sample s was determined as

label_{fus}(s) = \arg\max_c score_{fus}(s, c).
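A minimal sketch of this score-level fusion across feature sets follows; it
assumes that the per-feature-set scores for a sample (e.g. the 1NN vote counts
from Section 3) are already available as dictionaries keyed by class, which is an
illustrative data layout rather than the authors' exact one.

```python
def fuse_scores(per_featureset_scores):
    """Sum score_k(s, c) over all K feature sets and return the winning class."""
    fused = {}
    for scores_k in per_featureset_scores:       # one dict {class: score} per feature set
        for c, v in scores_k.items():
            fused[c] = fused.get(c, 0.0) + v
    return max(fused, key=fused.get)             # label_fus(s) = argmax_c score_fus(s, c)

# Example: three feature sets voting on one sample (hypothetical scores)
print(fuse_scores([{"u1": 12, "u2": 3}, {"u1": 5, "u2": 9}, {"u1": 7, "u2": 2}]))  # -> "u1"
```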
The aim of this analysis was to find a combination of features which provides the
best classification results for all datasets. Because the wavelet and cepstrum
transforms gave in most cases results significantly worse than the other
transforms, and the Earth Mover's Distance was the worst among the distances,
feature sets prepared using these pre-processing methods were omitted from the
subsequent analysis. Additionally, because Dynamic Time Warping gave the best
results for every dataset, it was decided to use feature sets based on this metric
in every analyzed combination. As a result, only three transforms (S, N and F) and
two distance measures (DTW and EUC) were taken into account. Different
combinations of feature sets were checked, with the number of feature sets ranging
from 27 to 162. The results were also compared with the combination of all feature
sets (see Table 2).
Table 2. Accuracies and Equal Error Rates obtained for different combinations of
transforms and distance functions.

transf./distance |            Accuracy             |              EER
                 | RAN30   RAN1Y   VOG     JAZZ    | RAN30   RAN1Y   VOG     JAZZ
SFNCW/DEM        | 81.1%   56.8%   34.6%   12.5%   |  6.5%   18.1%   30.8%   41.7%
S/D              | 81.1%   51.4%   38.5%    8.3%   |  8.1%   24.3%   35.1%   39.6%
N/D              | 78.4%   54.1%   15.4%    8.3%   |  9.4%   17.8%   30.8%   45.4%
F/D              | 70.3%   51.4%   15.4%   16.7%   |  9.3%   18.9%   38.5%   42.2%
SN/D             | 83.8%   54.1%   34.6%    8.3%   |  6.4%   19.8%   33.2%   38.7%
SF/D             | 78.4%   56.8%   30.8%   12.5%   |  8.1%   21.6%   38.5%   41.7%
FN/D             | 78.4%   59.5%   23.1%   12.5%   |  8.1%   16.8%   36.4%   41.9%
SFN/D            | 81.1%   62.2%   34.6%    8.3%   |  6.1%   19.6%   35.6%   42.8%
S/DE             | 89.2%   54.1%   15.4%    8.3%   |  8.1%   20.9%   31.5%   41.7%
N/DE             | 73.0%   54.1%   19.2%   12.5%   |  9.8%   16.2%   33.5%   41.7%
F/DE             | 75.7%   48.6%   15.4%   12.5%   |  9.5%   20.1%   40.0%   37.5%
SN/DE            | 83.8%   59.5%   26.9%    8.3%   |  5.8%   18.5%   31.2%   39.8%
SF/DE            | 83.8%   59.5%   34.6%    8.3%   |  8.1%   18.9%   35.1%   39.6%
FN/DE            | 83.8%   54.1%   26.9%    8.3%   |  6.2%   17.2%   35.7%   45.2%
SFN/DE           | 89.2%   59.5%   26.9%    8.3%   |  5.4%   18.2%   33.4%   40.7%

S - not transformed signal, F - Fourier, N - Normalization, C - Cepstrum,
W - Wavelet, D - DTW distance, E - Euclidean distance, M - EMD distance
The obtained results were examined for Pearson correlation between datasets. These
studies confirmed it for the RAN30 and RAN1Y results (0.42) and for RAN1Y and VOG
(0.43). The correlation between the VOG and RAN30 results is lower but still
visible (0.37). Interestingly, the results for the JAZZ dataset are negatively
correlated with VOG (-0.49). The main reason for this is that, contrary to VOG,
the JAZZ dataset gave quite good results for the Fourier-based transform.
The best combination of feature sets was SFN/D - three transforms: (S) the
untransformed signal, (F) Fourier and (N) normalization, with (D) the DTW distance
measure - with 46.55% accuracy on average over all datasets. However, it is visible
that the differences in classification accuracy among datasets are substantial and
the results are reasonable only for the RAN30 and RAN1Y datasets.
Additionally, the false rejection and false acceptance rates for R tested
recordings and different acceptance thresholds th were calculated using equations
(1) and (2):

FRR(th) = \frac{R - \sum_{s=1}^{R} a_{s,c(s)}}{R}    (1)

FAR(th) = \frac{\sum_{s=1}^{R} \sum_{j=1, j \neq c(s)}^{C} a_{s,j}}{(C - 1) R}    (2)

where c(s) denotes the class identifier of the sample s, C is the number of
classes, and a_{s,j} is given by Eq. (3):

a_{s,j}(th) = \begin{cases} 1 & \text{if } score_{fus}(s, j) > th \\ 0 & \text{otherwise} \end{cases}    (3)
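As an illustration, FRR, FAR and the EER can be estimated from a matrix of fused
scores by sweeping the threshold, for example as below; the uniform threshold grid
and the simple nearest-crossing search for the EER are assumptions made for
brevity, not the exact procedure used in the paper.

```python
import numpy as np

def frr_far(scores, true_class, th):
    """scores: (R x C) fused score matrix; true_class: length-R array of class indices."""
    R, C = scores.shape
    accepted = scores > th                                     # a_{s,j}(th)
    genuine = accepted[np.arange(R), true_class].sum()
    frr = (R - genuine) / R                                    # Eq. (1)
    far = (accepted.sum() - genuine) / ((C - 1) * R)           # Eq. (2)
    return frr, far

def equal_error_rate(scores, true_class, n_steps=1000):
    """Sweep the acceptance threshold and return the error where FRR and FAR cross."""
    ths = np.linspace(scores.min(), scores.max(), n_steps)
    best = min(ths, key=lambda t: abs(np.subtract(*frr_far(scores, true_class, t))))
    frr, far = frr_far(scores, true_class, best)
    return (frr + far) / 2.0
```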
By changing the acceptance threshold, the Equal Error Rate (EER) - the error value
for the threshold at which FAR and FRR are equal - was calculated for each set.
The results are presented in Table 2. The best combination is the same as for
accuracy only for the RAN30 dataset; there are differences in all other datasets.
While the accuracy measure deals only with one - the best - result, the EER
calculation takes all results into account, so it may be treated as a better
description of the model's performance. The results obtained for the RAN30 and
RAN1Y datasets are acceptable for eye movement biometrics - in fact, an EER equal
to 5.4% is one of the best results published so far. However, the same method used
for the VOG and JAZZ datasets achieved significantly higher error rates.
7 Discussion
The primary aim of the experiments presented in this paper was to examine whether
using exactly the same method for various eye movement datasets would ensure
similar classification efficiency in all considered cases. It turned out that the
results obtained for each of the four datasets used differed substantially in
accuracy; however, some common patterns were visible when comparing the
performance of different transforms and distance measures. A detailed discussion
of these outcomes is provided below.
Analyzing the results concerning distance measures, it turns out that Dynamic Time
Warping proved to be the best choice for every dataset, while the Earth Mover's
Distance was the worst one. Additionally, the usage of the wavelet and cepstrum
transforms did not offer any improvement to the results in any dataset.
The differences in accuracy and EER for the same method and different datasets
show that every new method to be applied for eye movement based identification -
despite achieving good results for some available dataset - should always be
checked against other data collections before any general conclusions about its
performance may be presented.
Another issue to explore was which properties of the datasets influenced the
results. All datasets used were built using a very similar scenario (a jumping
point stimulus); therefore, it was possible to compare the results directly. Two
of the datasets (RAN30 and RAN1Y) were collected using the same equipment and the
correlation of results for these two datasets is visible. The results are also
correlated with the VOG dataset, which was created with a similar technique using
an infrared camera (however with a much lower frequency). Interestingly, it turned
out that the results for the last of the datasets (JAZZ) are completely different
and the correlation is even negative. As the latter dataset was gathered using a
device utilizing a completely different technique (IROG), it may be supposed that
the type of device used to record the data has a significant influence on the
classification results.
Another interesting conclusion is the finding that the pool of participants did
not influence the results significantly. Different pools were used for the RAN30
and RAN1Y datasets and the results were similar, while almost the same pool of
participants was used for both the VOG and JAZZ datasets - and in this case the
results were different.
Comparing the RAN30 and RAN1Y datasets, it is visible that, despite the similar
distribution of the results, the results for the RAN30 dataset are significantly
better. The only reason for this may be the different time interval between
sessions, which was 30 minutes for RAN30 and one year for RAN1Y. This shows that
the short-term repeatability of eye movements, caused by the current attitude or
mood of a person, may significantly (and artificially) improve classification
results. This is in line with the conclusions drawn in [4] and [9].
8 Summary
The studies conducted in this research were inspired by the awareness that the
assessment of methods used for eye movement data processing and analysis should be
done by comparing them with other studies conducted in the same field.
Having different collections of data, it is possible to explore the influence of
data pre-processing methods on the final classification result, which was
presented in the paper. The research confirmed the existence of both some
differences and some patterns when various methods and the results obtained for
them are taken into account. This allows us to suppose that continuing this type
of study will enable some general conclusions to be reached in the field of eye
movement biometrics. Because the results presented in this work are far from
perfect, there is still a lot of work to be done to lower the error rates.
Acknowledgement. The authors would like to thank the organizers of the BioEye
2015 competition for publishing the eye movement datasets that were used in this
research. We also acknowledge the support of Silesian University of Technology
grant BK/263/RAu2/2016.
References
1. Berndt, D.J., Clifford, J.: Using dynamic time warping to find patterns in time
series. In: KDD Workshop. vol. 10, pp. 359–370. Seattle, WA (1994)
2. Duin, R.P., Pekalska, E.: The dissimilarity space: Bridging structural and statistical
pattern recognition. Pattern Recognition Letters 33(7), 826–832 (2012)
3. Holland, C.D., Komogortsev, O.V.: Complex eye movement pattern biometrics:
Analyzing fixations and saccades. In: 2013 International Conference on Biometrics
(ICB). pp. 1–8. IEEE (2013)
4. Kasprowski, P.: The impact of temporal proximity between samples on eye move-
ment biometric identification. In: Computer Information Systems and Industrial
Management, pp. 77–87. Springer (2013)
5. Kasprowski, P., Harezlak, K.: The second eye movements verification and identi-
fication competition. In: 2014 IEEE International Joint Conference on Biometrics
(IJCB). pp. 1–6. IEEE (2014)
6. Kasprowski, P., Komogortsev, O.V., Karpov, A.: First eye movement verification
and identification competition at BTAS 2012. In: 2012 IEEE Fifth International
Conference on Biometrics: Theory, Applications and Systems (BTAS). pp. 195–
202. IEEE (2012)
7. Kasprowski, P., Ober, J.: Eye movements in biometrics. In: Biometric Authentica-
tion, pp. 248–258. Springer (2004)
8. Kasprowski, P., Ober, J.: Enhancing eye-movement-based biometric identification
method by using voting classifiers. In: Defense and Security. pp. 314–323. Interna-
tional Society for Optics and Photonics (2005)
9. Kasprowski, P., Rigas, I.: The influence of dataset quality on the results of behav-
ioral biometric experiments. In: 2013 International Conference of the Biometrics
Special Interest Group (BIOSIG). pp. 1–8. IEEE (2013)
10. Komogortsev, O.V., Jayarathna, S., Aragon, C.R., Mahmoud, M.: Biometric iden-
tification via an oculomotor plant mathematical model. In: Proceedings of the 2010
Symposium on Eye-Tracking Research & Applications. pp. 57–60. ACM (2010)
11. Komogortsev, O.V., Karpov, A., Price, L.R., Aragon, C.: Biometric authentication
via oculomotor plant characteristics. In: 2012 5th IAPR International Conference
on Biometrics (ICB). pp. 413–420. IEEE (2012)
12. Komogortsev, O.V., Khan, J.I.: Eye movement prediction by Kalman filter with
integrated linear horizontal oculomotor plant mechanical model. In: Proceedings
of the 2008 Symposium on Eye Tracking Research & Applications. pp. 229–236.
ACM (2008)
13. Komogortsev, O.V., Rigas, I.: BioEye 2015: Competition on biometrics via eye
movements. In: 2015 IEEE 7th International Conference on Biometrics Theory,
Applications and Systems (BTAS). pp. 1–8. IEEE (2015)
14. Rigas, I., Economou, G., Fotopoulos, S.: Biometric identification based on the eye
movements and graph matching techniques. Pattern Recognition Letters 33(6),
786–792 (2012)
15. Rubner, Y., Tomasi, C., Guibas, L.J.: The earth mover’s distance as a metric for
image retrieval. International Journal of Computer Vision 40(2), 99–121 (2000)
16. Shen, C., Cai, Z., Guan, X., Du, Y., Maxion, R.A.: User authentication through
mouse dynamics. IEEE Transactions on Information Forensics and Security 8(1),
16–30 (2013)