This is a pre-print. The final version of the paper was published in Springer Communications in Computer and Information Science, Volume 424, 2014, and is available in the SpringerLink library via http://dx.doi.org/10.1007/978-3-319-06932-6_34
Mining of Eye Movement Data to Discover People
Intentions
Pawel Kasprowski
Institute of Informatics
Silesian University of Technology
Gliwice, Poland
kasprowski@polsl.pl
Abstract. The process of face recognition is the subject of the research in this paper. 1430 recordings of participants' eye movements while they were observing faces were analyzed statistically, and various data mining techniques were used to extract information from the eye movement signal. One of the findings is that the process of face recognition differs between subjects, and therefore formulating general rules for the face recognition process may be difficult. The hypothesis was that it is possible to analyze the eye movement signal to predict whether the subject observing a face recognizes it. A model that automatically differentiates observations of recognized and unrecognized faces was built, and the results are encouraging. One of the contributions of the paper is the conclusion that the optimal set of eye movement signal attributes for such classification is individually specific and differs from person to person.
Keywords: eye movement, face recognition, data mining
1 Introduction
Face recognition is one of the first abilities of a newborn child and a basic ability of every human being. Therefore, it is no surprise that the process of face recognition is an intensively studied subject. However – to the author's knowledge – there is no published information regarding the existence of a successful algorithm that automatically recognizes whether a person observing a face knows this face. This paper describes an attempt to use advanced data mining techniques to recognize a subject's familiarity with the observed face, based on the eye movements of that subject recorded during the observation. The other contribution of the paper is the identification of personal differences in the face recognition process across subjects.
2 Related research
There are many studies concerning the way humans recognize faces. They may be divided into two main categories: those analyzing the neural aspects of face recognition, i.e. which parts of the brain are responsible for it [Pitcher et al 2011], and those analyzing which elements of a face are taken into account in the recognition process [Barton et al 2006]. Eye movement information has proved very useful for the latter. Face observation research may be divided into several fields, briefly described below.
Models. People are able to recognize faces from the earliest stages of their lives. However, there are still doubts about the way the "algorithm" for face recognition works. There are two different theories: holistic and analytical [Van Belle et al. 2010]. According to the holistic theory, people recognize faces by acquiring an image of the face and matching the pattern seen with a pattern remembered previously. A face is perceived as a whole – a pattern is a combination of all its specific features [Tanaka and Gordon 2011]. The analytical theory concentrates on people's ability to decompose a face into different features. It suggests that the recognition process uses information about specific facial properties such as the shape and color of the eyes, nose and mouth. According to this theory, people separately compare patterns of specific parts of a face with patterns recorded in their brain. Evidence for this theory is that eye tracking data recorded during face recognition concentrate on specific parts of the face (such as the eyes and nose) [Barton et al 2006; Itier et al 2007]. Many interesting experiments have been conducted, such as showing only a part of a face [Van Belle et al 2010] or automatically removing the face after a specific number of fixations [Hsiao and Cottrell 2008].
Regions of interest. Eye movement studies show that the first fixation is usually placed in the middle of the face, near its upper part. The second fixation typically goes to the left side (in most cases near the right eye of the presented face) [Guo et al 2012]. People concentrate on the upper part of the face because it contains more personally distinctive features.
Familiarity. There are also some studies searching for the differences in how people observe familiar and unfamiliar faces. These studies are particularly interesting for our work. According to Ryan et al [2007], the length of the first fixation differs when observing familiar and unfamiliar faces. Van Belle et al [2010] analyzed these differences and concluded that they are most significant towards the end of the observation and that the last fixation should be taken into account. Barton et al [2006] concluded that known faces are analyzed in a more simplified way – only to confirm familiarity. In contrast, unknown faces are always scanned for all interesting features.
Differences among people. Studies concerning differences in how people observe faces are less common. Rozhkova and Ognivov [2009] suggested, as one of their conclusions, that the fixation position when recognizing faces demonstrates significant inter-individual variability. Blais et al [2008] showed that scan patterns differ between races by comparing Western Caucasian and East Asian observers. Rigas et al [2012] were able to identify people based on the way they observed faces on a computer screen.
Eye movement data mining. The latter paper is one example of using data mining techniques to extract information from an eye movement signal. Other interesting examples mostly concern people identification [Kasprowski and Ober 2005; Komogortsev et al 2010], but also people's intentions [Bednarik et al 2012]. To the best of the author's knowledge, there have been no published attempts to automatically recognize an observer's familiarity with the face being observed.
3 Experiment and dataset
The main hypothesis of this research was that it is possible to predict whether people recognize presented faces based on their eye movement characteristics. To check this hypothesis, a dataset of eye movement samples recorded during face observations was built. A head-mounted Jazz-Novo eye tracker recording eye positions at a 1000 Hz sampling rate was used. 34 participants took part in the experiment.
One session consisted of an initial nine-point calibration and subsequent face presentations. Between presentations the system was always recalibrated with a simplified three-point calibration. The participants' task was to look at a face on the screen and assess, by pressing one of two buttons, whether they recognized the face or not. After the button press the face disappeared. This simple task is called an 'observation' in the subsequent text. Every face appearing on the screen was cropped so that the eyes were in the same place in every picture. No further processing was applied; the faces were simply photographs of different people.
Every person took part in at least one 'session' – a sequence of face observations. There were 56 sessions overall. 22 participants took part in two sessions with a one-week interval between them. The first session consisted of 24 observations (24 different faces) and the second session of 27 observations (27 faces different from those in the first session). The total number of separate observations was 1430. The number of observations for which the participant's decision was positive (i.e. the face was recognized) was 418, and the number of observations with a 'not recognized' decision (referred to later as negative observations) was 1012.
It is worth mentioning that familiarity with a face is not a binary property [Van Belle et al 2010]. Familiar faces may be divided into: famous faces, personally familiar faces (i.e. known from real life), familiarized faces (e.g. faces learned during previous sessions), and artificial faces (painted, drawn or produced by a computer). There are also a number of factors that are not measurable or are very difficult to measure. An unknown face may be similar to some known face; a known person may be photographed in unusual circumstances (such as a strange haircut or a strange facial expression). That is why we should rather speak of a 'level of familiarity'. This level is personally dependent and may differ between people.
It is also important to note that the experiment did not check whether the subject really knew the observed face; it only checked whether the subject recognized the face. It was assumed that eye movement characteristics differ when subjects decide that they know the face. However, the subject could easily make a mistake. For instance, only 85% of subjects recognized the face of Arnold Schwarzenegger, and the faces of the people organizing the experiment were recognized by only 63% of the subjects. Therefore, in the subsequent analyses, the eye movement recordings were used to predict the tested subject's decision about the face's familiarity, not the ground truth about the familiarity.
Fig. 1. Example of a typical scan-path recorded during a face observation
4 Data preparation
After gathering the data, some initial analyses based on previously published findings were performed to see whether there were features of the registered observations that might be useful. First, the observation time was examined, in the expectation that unfamiliar faces would be observed longer. It turned out that there was no significant difference between recognized and unrecognized faces (2.31 s vs. 2.48 s, p=0.1). The shortest observation time was recorded for the best known face, that of Barack Obama. However, the second shortest observation time was recorded for "face_11" – known to nobody. The observation time also depends on the practically immeasurable similarity of the face to the best known pictures of the same person. This was the case for the actress Sandra Bullock, whose picture was presented with an unusual facial expression, which resulted in a longer observation time. The most important finding concerned significant differences in observation time between participants, with average values between 1.26 and 5.93 s. An ANOVA test for different subjects and observation lengths showed (F(33,1396)=20.24, p < 10^-10) that the hypothesis that the length of observation is independent of the participant must be rejected. Interestingly, there was a strong negative correlation (-0.62) between the observation time and the position of the presented face in the sequence during the session. This means that subsequent observations during the same session were shorter – the observer was able to make decisions faster, probably because the observers got used to their task and became more focused on the observation. This shows that the length of observation is personally specific and depends on many other factors, such as the length of a session, the similarity of a face to other known faces, and so on.
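For concreteness, checks of this kind can be reproduced with a standard one-way ANOVA and a Pearson correlation. The sketch below is illustrative only: it runs on synthetic data (the study's recordings are not included here), and all variable names and distribution parameters are assumptions, chosen to mimic the reported ranges.

```python
# Illustrative sketch only: synthetic stand-in data, not the study's recordings.
import numpy as np
from scipy.stats import f_oneway, pearsonr

rng = np.random.default_rng(0)

# Hypothetical per-subject observation lengths (seconds) for 34 subjects,
# with subject-specific means mimicking the reported 1.26-5.93 s range.
lengths_by_subject = [rng.normal(loc=m, scale=0.5, size=40)
                      for m in rng.uniform(1.3, 5.9, size=34)]

# One-way ANOVA: does mean observation length differ between subjects?
f_stat, p_value = f_oneway(*lengths_by_subject)
print(f"F = {f_stat:.2f}, p = {p_value:.1e}")

# Correlation between a face's position in the session sequence and the
# observation length (the paper reports r = -0.62 on the real data).
order = np.arange(1, 25)                      # 24 faces in the first session
session_lengths = 4.0 - 0.08 * order + rng.normal(0, 0.3, size=24)
r, _ = pearsonr(order, session_lengths)
print(f"r = {r:.2f}")
```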
Several additional features were compared for positive and negative observations. For instance, the difference in fixation number was checked: 7.63 fixations for negative (unrecognized face) and 7.25 fixations for positive (recognized face) observations. For most of the features used, the differences were not significant (p>0.05). But it turned out that – similarly to observation length – for nearly all the features it was impossible to prove independence of the participant. This important finding was used in further experiments.
Fig. 2. Histogram of the number of fixations per observation
4.1 Feature extraction
Because it turned out to be difficult to find a single feature of the eye movement signal that indicates whether a subject will choose the 'known' or 'unknown' option, a fusion of different features was used to classify observations. The hypothesis was that a combination of a set of weak classifiers may produce satisfactory results.
Because the focus was on differences between positive and negative observations, the most interesting part of the eye movement signal was the part recorded after the moment when a person had already made her/his decision. That is why mostly the eye movement recorded at the end of the observation was used, on the assumption that it would be more meaningful [Van Belle et al 2010]. Surprisingly, it turns out that the decision is made relatively fast. According to Hsiao and Cottrell [2008], people are able to decide about the familiarity of a face after just two fixations; all subsequent fixations serve only to confirm this primary – in most cases final – assessment. According to Ryan et al [2007], even the length of the first fixation may differ for known and unknown faces.
It was decided to use different feature sets to check how well the subject's decision could be predicted. Both fixation-related and signal-related features were used. Seven groups of features were built with different parameters:
• Observation length
• Histogram of velocities during the last X ms (with X = 500, 1000 and 1500 ms) – 8 values
• Histogram of movement directions during the last X ms (with X = 500, 1000 and 1500 ms) – 8 values
• Number of fixations during the last X ms (with X = 500, 1000, 1500, 2000 and 2500 ms)
• Length of the first N fixations (with N = 2, 3, 4, 5)
• Length of the last N fixations (with N = 1, 2, 3, 4, 5)
• X and Y position of the last N fixations (with N = 1 and 2)
Next, feature sets were created for every combination of the groups with different parameter values. This resulted in 212 feature sets.
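As an illustration, the sketch below shows how two of the feature groups above – the 8-bin histograms of velocities and of movement directions over the last X ms – could be computed from a 1000 Hz gaze signal. The paper does not specify the bin edges, so the binning here is an assumption, and all names are illustrative.

```python
import numpy as np

def tail_histograms(gaze_xy, window_ms=1000, sample_rate_hz=1000, bins=8):
    """Histograms of velocity magnitudes and movement directions over the
    last window_ms of an observation (bin edges assumed, not from the paper)."""
    n = int(window_ms * sample_rate_hz / 1000)
    tail = gaze_xy[-n:]                              # last X ms of the signal
    dxy = np.diff(tail, axis=0)                      # per-sample displacement
    speed = np.hypot(dxy[:, 0], dxy[:, 1])           # velocity magnitude
    angle = np.arctan2(dxy[:, 1], dxy[:, 0])         # direction in [-pi, pi]

    # Velocity histogram: 8 bins from 0 up to the 99th percentile (assumed).
    v_edges = np.linspace(0.0, np.percentile(speed, 99) + 1e-9, bins + 1)
    v_hist, _ = np.histogram(speed, bins=v_edges)

    # Direction histogram: 8 equal 45-degree sectors.
    d_edges = np.linspace(-np.pi, np.pi, bins + 1)
    d_hist, _ = np.histogram(angle, bins=d_edges)

    # Normalize so feature values are comparable across observations.
    return v_hist / len(speed), d_hist / len(angle)

# Usage on a synthetic 2.5 s recording (2500 samples at 1000 Hz):
rng = np.random.default_rng(1)
signal = np.cumsum(rng.normal(0, 0.5, size=(2500, 2)), axis=0)
velocity_features, direction_features = tail_histograms(signal, window_ms=1500)
```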
4.2 Building a classification model
The main intention was to use the data described in the previous sections to build a model that automatically classifies observations as positive (recognized face) or negative. There are plenty of possible classification algorithms that could be used. Several of the most popular ones were applied and their performance was checked against the available data. The following algorithms, as implemented in the WEKA library [Hall et al 2009], were used:
• Naïve Bayes
• Random Forest
• J48
• K Nearest Neighbors (with k = 1, 3 and 7)
• SVM (with kernels poly(1), poly(2) and RBF)
Every feature set combination was paired with every classification algorithm, which resulted in 1908 different combinations. For every combination it was possible to build a classification model using some training data (face observations with known/unknown labels) and check its performance on some testing data. Because the number of negative observations was higher than the number of positive observations, the positive observations were weighted accordingly.
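A common way to realize such weighting – assumed here, since the paper does not detail the exact scheme – is to give each positive sample a weight equal to the negative-to-positive ratio, so that both classes contribute equally to training:

```python
# Hypothetical weighting scheme: balance the 1012 negative vs. 418 positive
# observations by up-weighting positives (the exact scheme used in the paper
# is not specified).
n_pos, n_neg = 418, 1012
w_pos = n_neg / n_pos           # ~2.42 per positive observation
class_weights = {1: w_pos, 0: 1.0}
print(class_weights)
```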
The process of building a model for a given dataset consisted of two nested cross-validation steps. First, the dataset was divided into 10 folds. Then, for every possible combination of feature sets and classifiers, a model was built using every possible set of 9 folds as examples and evaluated against the samples in the remaining fold. The result of each such step was a collection of models with information about their performance; the area under the ROC curve was used to evaluate model performance [Fawcett 2006]. The next step was to choose the 10 combinations that gave the best results and to create new models using all samples from the training set. These models were then used to classify samples from the testing set. For every testing sample, the results of the models were summed and normalized to build the system's answer in the range <0,1>. Because the previous findings showed that feature values were significantly different for different subjects, it was additionally decided to repeat this procedure separately for every subject.
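The paper's pipeline uses WEKA. As a rough analogue under stated assumptions, the scikit-learn sketch below performs the same two-stage procedure: rank every (feature set, classifier) combination by cross-validated AUC on the training part, refit the ten best on the full training set, and average their scores on the test set. The data, the toy feature sets, and the reduced classifier list are placeholders, not the study's 212 feature sets and nine classifiers.

```python
import numpy as np
from itertools import product
from sklearn.base import clone
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(2)
y = rng.integers(0, 2, size=200)                        # toy labels
X_sets = {f"set{i}": rng.normal(size=(200, 5)) for i in range(6)}  # toy feature sets

classifiers = {
    "nb": GaussianNB(),
    "rf": RandomForestClassifier(n_estimators=100, random_state=0),
    "3nn": KNeighborsClassifier(n_neighbors=3),
}

train, test = np.arange(150), np.arange(150, 200)

# Inner cross-validation: score every feature-set/classifier combination by AUC.
# NOTE: the paper additionally up-weights positive observations; omitted here.
scored = []
for (fs_name, X), (clf_name, clf) in product(X_sets.items(), classifiers.items()):
    auc = cross_val_score(clone(clf), X[train], y[train],
                          cv=10, scoring="roc_auc").mean()
    scored.append((auc, fs_name, clf_name))

# Refit the ten best combinations on all training data and average their
# probability scores on the test set, yielding an answer in <0,1>.
scored.sort(reverse=True)
votes = np.zeros(len(test))
for _, fs_name, clf_name in scored[:10]:
    model = clone(classifiers[clf_name]).fit(X_sets[fs_name][train], y[train])
    votes += model.predict_proba(X_sets[fs_name][test])[:, 1]
votes /= 10
```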
5 Results
The same procedure was executed for the whole dataset and independently for the observations of the 22 subjects that took part in both sessions. The results of the classifications are presented in Table 1. Although accuracy seems to be the most obvious result of classification, it should be interpreted with care, because it does not say much about the results of two-class experiments when the distribution of samples between classes is uneven [Provost et al 1998]. That is why two other metrics were used that give a more reliable estimate of the real 'power' of the model: AUC, which stands for Area Under the ROC Curve, and EER, which stands for Equal Error Rate.
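For reference, the sketch below shows a minimal way to compute the two metrics from the system's scores. The EER estimation – taking the ROC point where the false positive rate and false negative rate cross – is a common convention, assumed here rather than taken from the paper.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

def auc_and_eer(y_true, scores):
    """AUC and EER for binary labels and scores in <0,1>."""
    auc = roc_auc_score(y_true, scores)
    fpr, tpr, _ = roc_curve(y_true, scores)
    fnr = 1 - tpr
    # EER: point on the ROC curve where FPR and FNR are (closest to) equal.
    idx = np.argmin(np.abs(fpr - fnr))
    eer = (fpr[idx] + fnr[idx]) / 2
    return auc, eer

# Usage on toy labels and scores:
y = np.array([0, 0, 1, 1, 0, 1])
s = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7])
print(auc_and_eer(y, s))
```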
Table 1. Classification results (AUC, EER and accuracy) for the whole dataset and for each individual subject
Sid AUC EER Accuracy
All 0.56 0.46 0.70
s9 0.92 0.13 0.88
s4 0.91 0.20 0.90
s33 0.89 0.22 0.84
s10 0.89 0.22 0.88
s30 0.87 0.25 0.88
s13 0.86 0.14 0.94
s17 0.86 0.22 0.78
s14 0.85 0.17 0.84
s12 0.85 0.26 0.82
s27 0.84 0.26 0.84
s31 0.83 0.25 0.84
s21 0.82 0.29 0.88
s7 0.78 0.23 0.82
s32 0.76 0.28 0.82
s2 0.75 0.34 0.75
s20 0.74 0.35 0.73
s5 0.73 0.34 0.73
s15 0.73 0.24 0.76
s3 0.71 0.35 0.73
s18 0.71 0.36 0.80
s28 0.70 0.40 0.73
s29 0.65 0.29 0.84
As is visible in Table 1, the results for the whole dataset are not encouraging: an AUC equal to 0.56 is only a fraction better than random guessing, so in this respect the experiment failed. But when the results achieved independently for different subjects are analyzed, it turns out that the method works. The average AUC is 0.8, which seems to be quite a good value for a first attempt. Similarly, the average EER is 26%, which is quite encouraging. Examples of ROC curves for four of the subjects are presented in Fig. 4.
Fig. 4. Examples of ROC curves for four different subjects.
6 Conclusion
The possibility of predicting a subject's decision about the familiarity of an observed face was analyzed in this paper. The prediction was made based on eye movement recordings. It turned out that the task is not trivial and that it may be difficult to propose one universal classification model that works for every observer. Traditional features such as observation length, last fixation length and fixation position were not enough to reliably estimate the decision for each person examined. However, it turned out that there were significant differences between people in the way they observe faces. This observation is in agreement with Rigas et al [2012]. With this in mind, it was possible to optimize the classification models to work independently on each subject's observations. Encouraging results were achieved for the examined observers.
There are multiple possible extensions of this work. Of course, the results should be checked on a larger population (because of possible model over-fitting). As mostly famous faces were studied in this work, personally familiar faces could be more interesting for future work. Additional feature sets could also be proposed, which may work better for the less successful cases, such as features derived from heatmaps or scan-paths.
References
1. ALTHOFF, Robert R.; COHEN, Neal J. 1999. Eye-movement-based memory effect: a reprocessing effect in face perception. Journal of Experimental Psychology: Learning, Memory, and Cognition, 25.4: 997.
2. BARTON, Jason JS, et al. 2006. Information processing during face recognition: The effects of familiarity, inversion, and morphing on scanning fixations. Perception, 35: 1089-1105.
3. BEDNARIK, Roman; VRZAKOVA, Hana; HRADIS, Michal. 2012. What do you want to do next: a novel approach for intent prediction in gaze-based interaction. In: Proceedings of the Symposium on Eye Tracking Research and Applications. ACM, p. 83-90.
4. BLAIS, Caroline, et al. 2008. Culture shapes how we look at faces. PLoS One, 3.8: e3022.
5. FAWCETT, Tom. 2006. An introduction to ROC analysis. Pattern Recognition Letters, 27.8: 861-874.
6. GUO, Kun, et al. 2012. Consistent left gaze bias in processing different facial cues. Psychological Research, 76.3: 263-269.
7. HALL, Mark, et al. 2009. The WEKA data mining software: an update. ACM SIGKDD Explorations Newsletter, 11.1: 10-18.
8. HSIAO, Janet Hui-wen; COTTRELL, Garrison. 2008. Two fixations suffice in face recognition. Psychological Science, 19.10: 998-1006.
9. ITIER, Roxane J., et al. 2007. Early face processing specificity: It's in the eyes! Journal of Cognitive Neuroscience, 19.11: 1815-1826.
10. KASPROWSKI, Pawel; OBER, Jozef. 2005. Enhancing eye-movement-based biometric identification method by using voting classifiers. In: Defense and Security. International Society for Optics and Photonics, p. 314-323.
11. KOHAVI, Ron, et al. 1995. A study of cross-validation and bootstrap for accuracy estimation and model selection. In: IJCAI. p. 1137-1145.
12. KOMOGORTSEV, Oleg V., et al. 2010. Biometric identification via an oculomotor plant mathematical model. In: Proceedings of the 2010 Symposium on Eye-Tracking Research & Applications. ACM, p. 57-60.
13. PITCHER, David, et al. 2011. Differential selectivity for dynamic versus static information in face-selective cortical regions. NeuroImage, 56.4: 2356-2363.
14. PROVOST, Foster J.; FAWCETT, Tom; KOHAVI, Ron. 1998. The case against accuracy estimation for comparing induction algorithms. In: ICML. p. 445-453.
15. RIGAS, Ioannis; ECONOMOU, George; FOTOPOULOS, Spiros. 2012. Biometric identification based on the eye movements and graph matching techniques. Pattern Recognition Letters, 33.6: 786-792.
16. ROZHKOVA, G. I.; OGNIVOV, V. V. 2009. Face recognition and eye movements: landing on the nose is not always necessary. In: Perception. Pion Ltd, p. 77.
17. TANAKA, James W.; GORDON, Iris. 2011. Features, configuration and holistic face processing. The Oxford Handbook of Face Perception, 177-194.
18. VAN BELLE, Goedele, et al. 2010. Fixation patterns during recognition of personally familiar and unfamiliar faces. Frontiers in Psychology, 1.
19. VAN BELLE, Goedele, et al. 2010. Whole not hole: expert face recognition requires holistic perception. Neuropsychologia, 48.9: 2620-2629.