EMIP Toolkit: A Python Library for Customized Post-processing
of the Eye Movements in Programming Dataset
Naser Al Madi
Colby College
Waterville, Maine, USA
nmadi@kent.edu
Drew T. Guarnera
The College of Wooster
Wooster, Ohio, USA
dguarnera@wooster.edu
Bonita Sharif
University of Nebraska–Lincoln
Lincoln, Nebraska, USA
bsharif@unl.edu
Jonathan I. Maletic
Kent State University
Kent, Ohio, USA
jmaletic@kent.edu
ABSTRACT
The use of eye tracking in the study of program comprehension in
software engineering allows researchers to gain a better understanding
of the strategies and processes applied by programmers. Despite
the large number of eye tracking studies in software engineering,
very few datasets are publicly available. The existence of the large
Eye Movements in Programming Dataset (EMIP) opens the door for
new studies and makes reproducibility of existing research easier.
In this paper, a Python library (the EMIP Toolkit) for customized
post-processing of the EMIP dataset is presented. The toolkit is
specifically designed to make using the EMIP dataset easier and
more accessible. It implements features for fixation detection and
correction, trial visualization, source code lexical data enrichment,
and mapping fixation data over areas of interest. In addition to the
toolkit, a filtered token-level dataset with scored recording quality
is presented for all Java trials (accounting for 95.8% of the data) in
the EMIP dataset.
CCS CONCEPTS
• Software and its engineering → Software organization and
properties; • Human-centered computing → Human computer
interaction (HCI).
KEYWORDS
eye-movement, programming, source code, eye tracking, toolkit,
library
ACM Reference Format:
Naser Al Madi, Drew T. Guarnera, Bonita Sharif, and Jonathan I. Maletic.
2021. EMIP Toolkit: A Python Library for Customized Post-processing of
the Eye Movements in Programming Dataset. In ETRA ’21: 2021 Symposium
on Eye Tracking Research and Applications (ETRA ’21 Short Papers), May
25–27, 2021, Virtual Event, Germany. ACM, New York, NY, USA, 6 pages.
https://doi.org/10.1145/3448018.3457425
Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation
on the first page. Copyrights for components of this work owned by others than ACM
must be honored. Abstracting with credit is permitted. To copy otherwise, or republish,
to post on servers or to redistribute to lists, requires prior specific permission and/or a
fee. Request permissions from permissions@acm.org.
ETRA ’21 Short Papers, May 25–27, 2021, Virtual Event, Germany
©2021 Association for Computing Machinery.
ACM ISBN 978-1-4503-8345-5/21/05...$15.00
https://doi.org/10.1145/3448018.3457425
1 INTRODUCTION
Eye tracking is gaining popularity as a tool in human-oriented
software engineering, providing evidence on attention and the
cognitive processes of programmers [Obaidellah et al. 2018]. This
popularity is evident in literature surveys of the field, which
covered 31 papers in 2015 [Sharafi et al. 2015] and 63 papers in
2018 [Obaidellah et al. 2018]. A practical guide on how to conduct
eye tracking studies in software engineering was also published in
2020 [Sharafi et al. 2020].
One of the first papers to use eye tracking in software engineering
research was published in 1990 [Crosby and Stelovsky 1990], yet the
use of eye tracking did not become a well-established research method
in software engineering until 2010-2012 [Lai et al. 2013]. Today,
the use of eye tracking in software engineering research can be
categorized into five areas: program comprehension [Aschwanden
and Crosby 2006; Bednarik and Tukiainen 2006; Binkley et al. 2013;
Busjahn et al. 2011; Crosby and Stelovsky 1990; Duru et al. 2013;
Maalej et al. 2014; Sharif and Maletic 2010; Turner et al. 2014],
debugging [Bednarik and Tukiainen 2007; Hejmady and Narayanan
2012; Romero et al. 2002], model comprehension [De Smet et al.
2014; Guéhéneuc 2006; Jeanmart et al. 2009; Porras and Guéhéneuc
2010; Sharif and Maletic 2010; Soh et al. 2012; Yusuf et al. 2007],
collaborative programming [Sharma et al. 2015; Stein and Brennan
2004], and traceability [Ali et al. 2012; Sharif et al. 2017; Walters
et al. 2014]. The use of eye tracking in the study of program
comprehension in software engineering allows researchers to gain a
better understanding of the strategies and processes applied by
programmers [Madi et al. 2020, 2021; Obaidellah et al. 2018; Sharafi
et al. 2015]. With this understanding of the experience and needs
of software developers, better tools and support can be provided to
enhance productivity and the quality of software.
Despite the large number of eye tracking studies in software
engineering [Obaidellah et al. 2018], very few datasets are publicly
available. The existence of a large dataset of eye movements in
programming enables a) reproducibility of prior studies and b)
accessibility of eye movement data to people who might not necessarily
have an eye tracker but are interested in analyzing or visualizing eye
tracking data in unique ways. This opens the door to whole new
avenues of research. A community effort was undertaken to produce
one such dataset, the Eye Movements In Programming Dataset
(EMIP) [Bednarik et al. 2020]. The EMIP dataset was an international
and multi-institutional effort that involved eleven research
teams across eight countries on four continents. The raw data of
the large dataset (N=216) is freely available for download under the
Creative Commons CC-BY-NC-SA license [Bednarik et al. 2020]. In
order to start answering specific comprehension questions about
how the gaze moves over the code, the EMIP dataset needs to be
processed first.
In this paper, we present a Python library for customized
post-processing of the EMIP dataset that we call the EMIP Toolkit. The
toolkit is specifically designed to make the EMIP dataset easier and
more accessible to directly address a researcher's comprehension
questions. It implements features for fixation detection and
correction, trial visualization, source code lexical data enrichment, and
mapping fixation data over areas of interest in the source code. This
paper makes the following contributions:
• A toolkit for customized post-processing of the EMIP dataset
  enabling researchers to directly query the data to answer
  specific comprehension questions.
• A processed version of the EMIP dataset (for all Java trials)
  that is filtered, corrected, and inspected with fixation data,
  source code token data, and lexical information tags.
Both the toolkit and the processed subset of the data are publicly
available under a Creative Commons license to support future
research and replication. The rest of the paper is organized to
provide context to the importance and need for this toolkit and act
as a description of the toolkit features and its potential use.
2 BACKGROUND AND MOTIVATION
The Eye Movements In Programming Dataset (EMIP) is the only
large programming eye movements dataset that we know of that
is publicly available [Bednarik et al. 2020]. EMIP was collected
through an international effort that consisted of eleven research
teams across eight countries on four continents. This allowed for
collecting data from a large number of participants (n=216) from
diverse backgrounds and native languages. The same eye tracking
apparatus and software were used for the data collection with a
consistent experimental setup.
The advantages of the EMIP dataset are that it a) is publicly
available in raw unfiltered form, b) consists of a large number of
participants, and c) includes data from participants with diverse
levels of programming experience and native languages. These
characteristics allow for new studies on themes such as the differences
between eye movements of novices and experts and debugging
strategies, among others. At the same time, the dataset is provided
in a raw data format, and the software provided by the eye tracker
manufacturer is discontinued, limited in what features it provides,
and not free. We direct the reader to Bednarik et al. [Bednarik et al.
2020] for a complete description of the dataset.
3 EMIP TOOLKIT
In this section we provide the details of our Python library for
customized post-processing of the EMIP dataset. The toolkit is
specifically designed to make using the EMIP dataset easier and
more accessible by providing the following functions:
• Parsing raw data files from the EMIP dataset into Experiment,
  Trial, and Fixation containers.
• Customizable dispersion-based fixation detection algorithm
  implementation according to the manual of the SMI eye
  tracker used in the data collection.
• Raw data and filtered data visualizations for each trial.
• Performing hit testing between fixations and AOIs to
  determine the fixations over each AOI.
• Customizable offset-based fixation correction implementation
  for each trial.
• Customizable Areas Of Interest (AOIs) mapping implementation
  at the line level or token level in source code for each
  trial.
• Visualizing AOIs before and after fixations overlay on the
  code stimulus.
• Mapping source code tokens to generated AOIs and eye
  movement data.
• Adding source code lexical category tags to eye movement
  data using srcML [Collard et al. 2011]. srcML is a static
  analysis tool and data format that provides very accurate syntactic
  categories (method signatures, parameters, function names,
  method calls, declarations and so on) for source code. We
  use it to enhance the eye movements dataset to enable better
  querying capabilities.
The complete code for the toolkit, a Jupyter Notebook
examples/tutorial highlighting the main features of the toolkit, and the
processed EMIP dataset are available at https://osf.io/djn9s/
3.1 Data Containers
The three main containers of the EMIP Toolkit are:
• Experiment: Represents all the trials and associated data for
  a single participant.
• Trial: Represents the samples and fixations in a single trial
  run.
• Fixation: Represents the data of a single filtered fixation
  consisting of multiple samples.
The experiment container implements a parser for raw eye tracking
data files. The parser splits raw data into trials and adds the eye
tracking samples from each trial to its container. The trial container
stores data including trial number, participant ID, and the stimulus
that is used in the trial. The trial container also implements a
dispersion-based fixation detection algorithm that converts raw
samples into fixations. In addition, the container implements
fixation correction by offset, and trial data visualization. The fixation
container stores the fixations generated by the detection algorithm
including trial ID, participant ID, fixation timestamp, fixation
duration, fixation coordinates, and the code token the fixation overlays.
The three containers are able to parse all raw data files and
provide a structured view of the data that makes processing data
easier. All other features of the EMIP Toolkit are implemented as
free functions to make using them independent from the specific
structure of the EMIP dataset.
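The relationship between the three containers can be sketched as plain Python dataclasses. This is only an illustration of the description above, not the toolkit's actual classes: the field names, the (t, x, y) sample layout, the add_sample helper, and the stimulus file names are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Fixation:
    """One filtered fixation, built from several raw samples."""
    trial_id: int
    participant_id: str
    timestamp: float              # start of the fixation
    duration: float               # milliseconds
    x: float
    y: float
    token: Optional[str] = None   # code token under the fixation, once mapped


@dataclass
class Trial:
    """Raw samples and detected fixations of a single trial run."""
    trial_id: int
    participant_id: str
    image: str                    # stimulus shown in the trial
    samples: List[Tuple[float, float, float]] = field(default_factory=list)
    fixations: List[Fixation] = field(default_factory=list)


@dataclass
class Experiment:
    """All trials recorded for one participant."""
    participant_id: str
    trials: List[Trial] = field(default_factory=list)

    def add_sample(self, trial_id: int, image: str,
                   t: float, x: float, y: float) -> None:
        """Append a (t, x, y) sample, opening a new Trial when trial_id changes."""
        if not self.trials or self.trials[-1].trial_id != trial_id:
            self.trials.append(Trial(trial_id, self.participant_id, image))
        self.trials[-1].samples.append((t, x, y))


# Hypothetical rows standing in for parsed raw data: (trial, image, t, x, y).
experiment = Experiment("P12")
for row in [(1, "a.jpg", 0.0, 10, 10),
            (1, "a.jpg", 4.0, 11, 10),
            (2, "b.jpg", 8.0, 50, 60)]:
    experiment.add_sample(*row)

summary = [(trial.trial_id, len(trial.samples)) for trial in experiment.trials]
print(summary)
```

Keeping the containers as thin records and the processing steps as separate functions mirrors the design choice described above, where most features are free functions that do not depend on the dataset's structure.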
3.2 Fixation Detection
The EMIP Toolkit implements a dispersion-based fixation detection
algorithm. This algorithm distinguishes fixations from saccades,
blinks, and errors based on the temporal and spatial dispersion of
eye tracking samples [Nyström and Holmqvist 2010]. The main
measure that is often studied is fixation duration, since fixation
duration is considered an indicator of information processing in human
cognition [Rayner 1998]. Therefore, the most important pieces of
information that a fixation detection algorithm produces are fixation
location (coordinates) on the screen, and fixation duration.
Among the many types of fixation detection algorithms, the
dispersion-based algorithms are most commonly used to detect
fixation events. Dispersion-based algorithms identify raw eye
tracker samples as belonging to a single fixation when the samples
are dispersed tightly within a limited region on the screen for a
minimum period of time (in our implementation this is the parameter
minimum_duration and it is set by default to 50 milliseconds). Under
this type of algorithm, spatial and temporal information about the
samples is taken into consideration to detect fixations, while
saccades are detected implicitly from the time and jumps between
fixations [Nyström and Holmqvist 2010].
The most prominent dispersion-based fixation detection algorithm
is Dispersion-Threshold Identification (I-DT) [Karthik et al.
2019; Nyström and Holmqvist 2010]. This is the fixation detection
algorithm preferred by the manufacturer of the eye tracker according
to their extended manual [noa 2011]. The algorithm starts with
a window equal to the minimum_duration value (50 milliseconds by
default), which results in excluding any fixations shorter than 50
milliseconds and considering them as noise. This window is expanded
one sample at a time if the sample is within a specific dispersion
radius (in our implementation it is called maximum_dispersion and
it takes a value of 25 pixels by default). Once a sample is identified
outside of the allowed dispersion radius, the samples in the window
are considered a fixation and a new window starts. The dispersion
calculation takes into account vertical and horizontal distances and
it is calculated as shown in Equation 1. The potential sample is
added to the window, and if the dispersion value is greater than the
threshold the most recent sample is removed. Equation 1 expects
window_x and window_y to be lists of the x and y coordinates of the
samples in the window, including the most recent sample that is
being evaluated. The equation sums the horizontal and vertical
differences between the furthest samples in the window, and that
dispersion value is then compared to the maximum_dispersion
threshold. When all of the samples belonging to a fixation are
detected in the window, the fixation coordinates are registered at
the centroid of the window points.
\[
\mathit{Dispersion} = \bigl(\max(\mathit{window}_x) - \min(\mathit{window}_x)\bigr)
  + \bigl(\max(\mathit{window}_y) - \min(\mathit{window}_y)\bigr) \tag{1}
\]
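The window procedure and Equation 1 can be sketched in a few lines of Python. This is a minimal illustration of the I-DT idea as described in this section, not the toolkit's code. The parameter names minimum_duration and maximum_dispersion come from the text; the function names, the sample layout, and sample_duration (4 ms per sample, which assumes a 250 Hz recording) are assumptions for this example.

```python
def dispersion(window_x, window_y):
    """Equation 1: horizontal extent plus vertical extent of the window."""
    return (max(window_x) - min(window_x)) + (max(window_y) - min(window_y))


def idt(samples, minimum_duration=50, maximum_dispersion=25, sample_duration=4):
    """Detect fixations in a list of (x, y) samples.

    Returns tuples (start_index, duration_ms, centroid_x, centroid_y).
    sample_duration is the assumed time covered by one sample, in ms.
    """
    window_size = max(1, minimum_duration // sample_duration)
    fixations = []
    start = 0
    while start + window_size <= len(samples):
        end = start + window_size
        xs = [x for x, _ in samples[start:end]]
        ys = [y for _, y in samples[start:end]]
        if dispersion(xs, ys) > maximum_dispersion:
            start += 1          # initial window too spread out: slide by one
            continue
        # Grow the window one sample at a time while it stays compact.
        while end < len(samples):
            xs.append(samples[end][0])
            ys.append(samples[end][1])
            if dispersion(xs, ys) > maximum_dispersion:
                xs.pop()        # the newest sample broke the threshold
                ys.pop()
                break
            end += 1
        fixations.append((start, len(xs) * sample_duration,
                          sum(xs) / len(xs), sum(ys) / len(ys)))
        start = end             # a new window starts after the fixation
    return fixations


# Synthetic gaze: 20 stable samples, a 5-sample burst elsewhere, 15 stable samples.
samples = [(100, 100)] * 20 + [(300, 300)] * 5 + [(500, 500)] * 15
fixations = idt(samples)
print(fixations)
```

With these synthetic samples, the 5-sample burst lasts only 20 ms under the assumed sampling rate, so it never fills a minimum-duration window and is discarded as noise, while the two stable runs are reported as fixations at their centroids.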
3.3 Generating Areas of Interest
An important factor in analyzing eye tracking data is deciding which
areas of interest on the stimuli to focus the analysis on. The
two predominant kinds of areas of interest in eye tracking studies
with source code are line-level and token-level. In line-level
studies the eye movement behaviour is studied with each line of code
as a single unit, and such studies can compare the duration of time
a programmer spends on a function prototype in comparison to
other source code lines. Similarly, token-level studies focus on the
eye movement behaviour on each code token, that is, any set of
characters that is surrounded by spaces. Token-level studies can
compare eye movement behaviour on high frequency code tokens in
comparison to low frequency tokens [Al Madi 2020].
(a) Stimulus before adding areas of interest.
(b) Stimulus after adding areas of interest.
Figure 1: EMIP stimulus "Java Rectangle" visualization before
and after adding Areas of Interest (AOIs).
In the EMIP Toolkit we implement a function for drawing Areas
of Interest (AOIs) around stimuli from the EMIP dataset on the
token level and line level. Figure 1 shows a sample AOI allocation
for the stimulus "Java Rectangle" where AOIs are allocated at the
token level. The function find_rectangles in the toolkit provides
the following information for each area of interest:
• kind: Analysis level, which takes one of two values: "sub-line"
  (AOIs calculated at the token level) or "line" (AOIs calculated
  at the line level).
• name: Line and part number for each token in the code. For
  example, the first token in the first line would be named "line
  1 part 1."
• x: X-coordinate of the upper left corner of the AOI.
• y: Y-coordinate of the upper left corner of the AOI.
• width: Width of the AOI in pixels.
• height: Height of the AOI in pixels.
• image: Name of the stimulus file used in the trial.
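To make the AOI record and the two analysis levels concrete, the sketch below stores the seven attributes listed above and merges token-level AOIs into line-level ones by taking the bounding box of all parts on a line. The attribute names follow the list; the AOI class, the merge_to_lines helper, the "line N" naming of merged AOIs, and the file name are assumptions for illustration, and the toolkit itself derives AOIs from the stimulus image rather than by merging.

```python
from dataclasses import dataclass


@dataclass
class AOI:
    kind: str      # "sub-line" (token level) or "line"
    name: str      # e.g. "line 1 part 1"
    x: int         # upper left corner
    y: int
    width: int     # pixels
    height: int    # pixels
    image: str     # stimulus file name


def merge_to_lines(token_aois):
    """Merge 'sub-line' AOIs that share a line number into one 'line' AOI each."""
    by_line = {}
    for aoi in token_aois:
        line_no = int(aoi.name.split()[1])     # "line 3 part 2" -> 3
        by_line.setdefault(line_no, []).append(aoi)

    line_aois = []
    for line_no in sorted(by_line):
        parts = by_line[line_no]
        left = min(p.x for p in parts)
        top = min(p.y for p in parts)
        right = max(p.x + p.width for p in parts)
        bottom = max(p.y + p.height for p in parts)
        line_aois.append(AOI("line", f"line {line_no}", left, top,
                             right - left, bottom - top, parts[0].image))
    return line_aois


# Hypothetical token-level AOIs for a two-line stimulus.
tokens = [
    AOI("sub-line", "line 1 part 1", 10, 20, 30, 15, "stimulus.jpg"),
    AOI("sub-line", "line 1 part 2", 50, 20, 40, 15, "stimulus.jpg"),
    AOI("sub-line", "line 2 part 1", 10, 45, 25, 15, "stimulus.jpg"),
]
lines = merge_to_lines(tokens)
print(lines)
```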
3.4 Trial Data Visualization
Trial data visualization is important for inspecting the validity of the
data collected in each trial. The EMIP Toolkit provides a single function
that allows for customized visualization of any trial from any subject
in the EMIP dataset. Figure 2 shows a sample visualization of trial
5 by subject 12 showing raw samples in red and detected fixations
from the I-DT algorithm described earlier in green.
Figure 2: Sample visualization of trial 5 by subject 12 showing
raw samples (in red) and detected fixations (in green).
The EMIP Toolkit function draw_trial generates an image of
the trial visualization with the same resolution as the trial stimulus.
The function takes the following parameters:
• images_path: Path for the trial stimulus image.
• draw_raw_data: Boolean indicating whether raw samples
  are drawn.
• draw_filtered_fixations: Boolean indicating whether detected
  fixations from the I-DT algorithm are drawn.
• save_image: Boolean indicating whether the resulting
  visualization should be saved as an image.
3.5 Fixation Correction Through Offset
There are many types of systematic errors that could affect an eye
tracking study, as described by Al Madi [Al Madi 2020]. Systematic
error of this kind is caused by an invalidation of the calibration,
which happens when the subject moves or when the setup differs
between calibration and the experiment. In many of these cases, the
recorded fixations shift from their actual position on a line of
text/code, which results in an erroneous recording of fixation position
in a trial. Erroneous trials that fall below a specific quality
threshold are often filtered out and discarded by researchers,
depending on the research goals. In some cases, applying an offset
to all fixations can restore the erroneous fixations to
their original correct position.
(a) Error: example shows shifting down and to the right.
(b) Corrected: using offset (X: -100, Y: -100).
Figure 3: An example of a fixation correction by applying an
offset.
The EMIP Toolkit provides a fixation correction function that
applies an offset to any trial. An expert can use it together with
the trial visualization function to salvage erroneous eye tracking
trials. The function sample_offset takes an x-offset and a y-offset
as arguments and updates the positions of all trial samples. It is
designed to be called any number of times, and it keeps a history
of applied offsets that is recorded with the trial visualization and
can be reset (undone) at a later time.
Figure 3-a shows a trial with a shifting error, where all the samples
are shifted to the right and down. Given the large number
of trials in the EMIP dataset, this pattern repeats in many trials.
Figure 3-b shows the corrected trial after applying an offset (x:-100,
y:-100) that salvaged this trial from exclusion. In a later section
of this paper we present a filtered and corrected dataset where we apply
offsets to the EMIP trials to correct them and score each trial with
a numerical value representing eye tracking recording quality.
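The offset mechanism with its history can be sketched as follows. Only the function name sample_offset and its two arguments come from the text; the OffsetTrial class, the reset_offset method, and the (x, y) sample layout are assumptions made for this illustration.

```python
class OffsetTrial:
    """Holds (x, y) samples and remembers every offset applied to them."""

    def __init__(self, samples):
        self.samples = list(samples)
        self.offset_history = []

    def sample_offset(self, x_offset, y_offset):
        """Shift all samples and record the offset so it can be undone."""
        self.offset_history.append((x_offset, y_offset))
        self.samples = [(x + x_offset, y + y_offset) for x, y in self.samples]

    def reset_offset(self):
        """Undo every recorded offset and clear the history."""
        total_x = sum(dx for dx, _ in self.offset_history)
        total_y = sum(dy for _, dy in self.offset_history)
        self.samples = [(x - total_x, y - total_y) for x, y in self.samples]
        self.offset_history = []


trial = OffsetTrial([(400, 300), (420, 305)])
trial.sample_offset(-100, -100)   # coarse correction, as in Figure 3
trial.sample_offset(10, 0)        # small follow-up adjustment
shifted = list(trial.samples)
history = list(trial.offset_history)
trial.reset_offset()
print(shifted, history, trial.samples)
```

Because each call appends to the history, an expert can apply several small corrections while inspecting the visualization and still return to the recorded positions.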
3.6 Mapping Source Code Tokens and srcML
Tags
Bounding boxes are generated for all tokens in the image stimulus.
In this context, a token is defined as a collection of consecutive
characters delineated by whitespace. Each bounding box is associated
with a line of the source code image stimulus and the order in
which the token appears on the line. The textual content of each
bounding box is derived from a text version of the source code
using the line and order of the provided bounding box data. This
involves using whitespace on each line of the text version to split
the contents into a one-to-one mapping with the bounding boxes.
In addition to supplying the source code text contained within the
bounding boxes, detailed syntactic information for each token is
provided through a secondary processing phase using srcML [Collard
et al. 2011].
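The one-to-one mapping between whitespace-separated tokens and bounding boxes can be sketched as below. The "line N part M" naming follows the AOI names described earlier; the function name, the sample code string, and the choice to skip blank lines when numbering (so that numbers match the lines visible in the image) are assumptions for this example.

```python
def map_tokens_to_aois(source_text, aoi_names):
    """Return {aoi_name: token} by splitting each visible line on whitespace."""
    tokens = {}
    # Assumption: blank lines produce no bounding boxes, so they are not numbered.
    visible_lines = [line for line in source_text.splitlines() if line.strip()]
    for line_no, line in enumerate(visible_lines, start=1):
        for part_no, token in enumerate(line.split(), start=1):
            tokens[f"line {line_no} part {part_no}"] = token
    # AOIs without a matching token map to None instead of raising.
    return {name: tokens.get(name) for name in aoi_names}


code = "public class Rectangle {\n\n    private int x1 , y1 ;\n}"
names = ["line 1 part 3", "line 2 part 2", "line 3 part 1", "line 9 part 1"]
mapping = map_tokens_to_aois(code, names)
print(mapping)
```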
srcML (srcML.org) is both an XML markup format for source
code and an application capable of generating the aforementioned
markup document. The srcML infrastructure supports the C, C++,
C#, and Java programming languages. The srcML markup format
uses XML to represent the hierarchy of the source code and tag
information to identify the syntactic context for all textual tokens
within an input source document. The srcML format ensures that
all original source code content is preserved, including comments
and whitespace, to prevent any content loss during conversion to
srcML and back to the source code format.
When converting source code to the srcML format, using the
--position option adds attributes to each tag indicating the start
and end of the line and column where each element resides within
the source code file. Since the bounding box information from
the EMIP image stimulus indicates which source line contains a
given token, a traversal of the XML DOM can be used to find all
tags that represent a line of source content. Using the subset of
tags for a given line, the textual tokens contained within the XML
tags can be examined to determine matches with the text version
of the source code. Once a token is identified, all the tag names
that encompass that token are stored to represent the complete
hierarchy of syntactic context for a given source code element.
Each syntactic context collection is presented as a list of the
srcML tag names separated by -> to indicate the direction of the
hierarchy. This representation allows for analysis at varying levels
of granularity when considering the role of syntactic elements
in program comprehension and provides additional value to the
existing EMIP dataset. To further simplify this process, the syntactic
context provided by srcML is pre-computed and provided along
with the EMIP Toolkit and dataset to minimize run-time overhead
and reduce development efforts of potential users.
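How a tag hierarchy joined by -> can be collected from an XML tree is sketched below with Python's standard xml.etree.ElementTree module. The XML snippet is hand-written to resemble srcML markup and is not actual srcML output; it omits namespaces and position attributes. Listing the outermost tag first is also an assumption, as is the function name.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified stand-in for srcML markup (not real srcML output).
SNIPPET = (
    "<unit><class>public class <name>Rectangle</name> "
    "<block>{ <decl_stmt><decl><type><name>int</name></type> "
    "<name>x1</name></decl>;</decl_stmt> }</block></class></unit>"
)


def local_name(tag):
    """Drop an XML namespace prefix such as '{uri}name' if one is present."""
    return tag.split("}")[-1]


def syntactic_contexts(element, path=()):
    """Return (token, 'outer->...->inner') pairs for all text in the tree."""
    path = path + (local_name(element.tag),)
    pairs = [(token, "->".join(path)) for token in (element.text or "").split()]
    for child in element:
        pairs.extend(syntactic_contexts(child, path))
        # Text after a child's closing tag belongs to the enclosing element.
        pairs.extend((token, "->".join(path))
                     for token in (child.tail or "").split())
    return pairs


contexts = syntactic_contexts(ET.fromstring(SNIPPET))
for token, context in contexts:
    print(f"{token:10s} {context}")
```

Keeping the full chain rather than only the innermost tag is what allows analysis at different granularities: the same fixation can be counted as being on a name, on a declaration, or anywhere inside a class body.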
3.7 Mapping Fixations to Areas of Interest
Most eye tracking research revolves around the idea of measuring
fixation duration over areas of interest. We have described fixation
detection, correction, source code lexical data enrichment, and
the visualization features of the EMIP Toolkit in prior sections. In this
section, we describe how fixations are mapped over areas of interest
to generate the complete fixation data from the EMIP dataset. The
EMIP Toolkit function hit_test takes as input the fixation data
and the generated AOIs of a specific trial to calculate the fixation
duration over each AOI. The function considers a fixation to be over
an AOI if the fixation is within a 25 pixel radius of the AOI; this
number is customizable in the fixation detection algorithm. Each
fixation consists of raw samples within a 25 pixel dispersion radius,
therefore we consider fixations up to 25 pixels away from an area of
interest to be within that area of interest.
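A simplified hit test in the spirit of this description is sketched below. It is not the toolkit's hit_test: the dictionary-based inputs, the function name hit_test_sketch, treating the 25 pixel radius as a margin added to each side of the AOI rectangle, and assigning a fixation to the first matching AOI are all assumptions for illustration.

```python
def hit_test_sketch(fixations, aois, radius=25):
    """Sum fixation durations per AOI, allowing a margin of `radius` pixels.

    fixations: dicts with keys x, y, duration (ms).
    aois: dicts with keys name, x, y, width, height (upper left corner, pixels).
    """
    durations = {aoi["name"]: 0 for aoi in aois}
    for fixation in fixations:
        for aoi in aois:
            inside_x = (aoi["x"] - radius
                        <= fixation["x"]
                        <= aoi["x"] + aoi["width"] + radius)
            inside_y = (aoi["y"] - radius
                        <= fixation["y"]
                        <= aoi["y"] + aoi["height"] + radius)
            if inside_x and inside_y:
                durations[aoi["name"]] += fixation["duration"]
                break           # assign each fixation to at most one AOI
    return durations


aois = [
    {"name": "line 1 part 1", "x": 100, "y": 100, "width": 50, "height": 20},
    {"name": "line 1 part 2", "x": 200, "y": 100, "width": 50, "height": 20},
]
fixations = [
    {"x": 110, "y": 105, "duration": 200},   # inside the first AOI
    {"x": 170, "y": 110, "duration": 100},   # 20 px right of the first AOI
    {"x": 190, "y": 125, "duration": 80},    # within the margin of the second
    {"x": 400, "y": 400, "duration": 300},   # far from every AOI
]
totals = hit_test_sketch(fixations, aois)
print(totals)
```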
The resulting comma-separated file consists of fixation data over
each AOI. Each row in the resulting file corresponds to a single
fixation with the following attributes of the fixation and the AOI it
overlays:
• trial: Trial number.
• participant: Participant number.
• code_file: Stimulus filename.
• code_language: Programming language of the trial.
• timestamp: Fixation timestamp.
• duration: Fixation duration in milliseconds.
• x_cord: X-coordinate of the fixation.
• y_cord: Y-coordinate of the fixation.
• aoi_x: X-coordinate of the area of interest under the fixation.
• aoi_y: Y-coordinate of the area of interest under the fixation.
• aoi_width: Width of the area of interest under the fixation
  in pixels.
• aoi_height: Height of the area of interest under the fixation
  in pixels.
• token: Source code token under the fixation.
• length: Length in character spaces of the source code token
  under the fixation.
• srcML: srcML tag of the source code token under the fixation.
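A file with these columns can be queried with Python's standard csv module. The sketch below computes total fixation duration per token. The column names are taken from the list above and their order is assumed; the three rows, including the file name and srcML values, are made-up values used only to make the example runnable.

```python
import csv
import io
from collections import defaultdict

# Made-up rows for illustration only; real files come from the processed dataset.
CSV_TEXT = """\
trial,participant,code_file,code_language,timestamp,duration,x_cord,y_cord,aoi_x,aoi_y,aoi_width,aoi_height,token,length,srcML
2,12,stimulus.jpg,Java,1000,180,120.5,88.0,100,80,60,20,Rectangle,9,class->name
2,12,stimulus.jpg,Java,1250,220,125.0,90.0,100,80,60,20,Rectangle,9,class->name
2,12,stimulus.jpg,Java,1500,150,210.0,130.0,200,120,30,20,int,3,decl->type->name
"""


def total_duration_per_token(csv_file):
    """Sum the duration column (milliseconds) for every distinct token."""
    totals = defaultdict(float)
    for row in csv.DictReader(csv_file):
        totals[row["token"]] += float(row["duration"])
    return dict(totals)


token_totals = total_duration_per_token(io.StringIO(CSV_TEXT))
print(token_totals)
```

The same loop can group by the srcML column instead of token to compare, for example, time spent on names against time spent on type declarations.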
4 CORRECTED DATASET
The EMIP dataset contains data for Java (207 trials), Python (5 trials)
and Scala (4 trials) code. More than 95% of the trials were in Java.
We construct a filtered, corrected, and scored dataset that is a subset
of the EMIP dataset that focuses on the Java trials. The EMIP Toolkit
fixation detection, visualization, adding tokens, adding srcML tags,
and applying fixation overlay features were used on the EMIP
dataset to generate a cleaned, processed version of the dataset for
all the Java trials.
In some instances, the EMIP trial fixation data presented a clear
shifting pattern with respect to the stimulus. Two of the authors
split the dataset and applied a general offset correction to the x and
y coordinates of the gaze data points with respect to the underlying
stimulus in each trial. This process was repeated until the location of
the data points in the visualization overlaid the stimulus in a majority
of locations during a visual inspection. Some trials showed an error
pattern that cannot be fixed with a general offset, as described by
Al Madi [Al Madi 2020]; such trials were not included in the
corrected dataset.
Following the offset correction, each of the two authors reviewed the
other author's corrections and rated each data correction with a pass
or fail vote. If a particular sample received two pass votes, the data
was accepted into the corrected dataset. If a sample received two fail
votes, the data was deemed to be invalid and dropped from the new
dataset. For any sample receiving one pass and one fail vote, the
authors met to review the data sample in question and discussed
any issues to arrive at a consensus regarding the adjustments to the
sample or its presence in the dataset. The final corrected dataset is
available for download in our artifact presented in Section 3.
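The review rule described above amounts to a small decision function. The sketch below only restates that rule; the function name and the "accept", "drop", and "discuss" labels are assumptions for illustration.

```python
def resolve_votes(vote_a, vote_b):
    """Map two reviewer votes ('pass' or 'fail') to an outcome for a sample."""
    votes = (vote_a, vote_b)
    if any(vote not in ("pass", "fail") for vote in votes):
        raise ValueError("votes must be 'pass' or 'fail'")
    if votes == ("pass", "pass"):
        return "accept"      # goes into the corrected dataset
    if votes == ("fail", "fail"):
        return "drop"        # deemed invalid
    return "discuss"         # split vote: reviewers meet to reach consensus


outcomes = [resolve_votes(a, b)
            for a, b in [("pass", "pass"), ("fail", "fail"), ("pass", "fail")]]
print(outcomes)
```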
5 CONCLUSIONS AND FUTURE WORK
We present the EMIP Toolkit, a Python library for customized
post-processing of the EMIP dataset [Bednarik et al. 2020]. The toolkit is
intended to make using the EMIP dataset easier and it implements
many of the most common and needed features and algorithms
for eye tracking in program comprehension studies. In addition,
we present a corrected, filtered, and scored subset of the EMIP
dataset that is generated using the EMIP Toolkit. This subset is
ready for use by researchers interested in eye movements over
source code and serves as an example of what the EMIP Toolkit is
able to address. In the future, we aim to extend the EMIP Toolkit
to other datasets and make it a generic toolkit for processing eye
movements over source code. This goal includes adding more
fixation detection algorithms and algorithms that are specific to source
code, animated visualizations, and extending source code lexical
data enrichment to languages other than Java. Finally, we hope
that this work will contribute to the open data initiative set by the
EMIP dataset, and inspire future research and replication in eye
movements in programming.
REFERENCES
2011. iView X System Manual. https://psychologie.unibas.ch/fileadmin/user_upload/
psychologie/Forschung/N-Lab/SMI_iView_X_Manual.pdf
Naser S Al Madi. 2020. Modeling Eye Movement for the Assessment of Programming
Proficiency. Ph.D. Dissertation. Kent State University.
Nasir Ali, Zohreh Sharafi, Yann-Gaël Guéhéneuc, and Giuliano Antoniol. 2012. An
empirical study on requirements traceability using eye-tracking. In 2012 28th IEEE
International Conference on Software Maintenance (ICSM). IEEE, 191–200.
Christoph Aschwanden and Martha Crosby. 2006. Code scanning patterns in program
comprehension. In Proceedings of the 39th hawaii international conference on system
sciences.
Roman Bednarik, Teresa Busjahn, Agostino Gibaldi, Alireza Ahadi, Maria Bielikova,
Martha Crosby, Kai Essig, Fabian Fagerholm, Ahmad Jbara, Raymond Lister, et al.
2020. EMIP: The eye movements in programming dataset. Science of Computer
Programming 198 (2020), 102520.
Roman Bednarik and Markku Tukiainen. 2006. An eye-tracking methodology for char-
acterizing program comprehension processes. In Proceedings of the 2006 symposium
on Eye tracking research & applications. ACM, 125–132.
Roman Bednarik and Markku Tukiainen. 2007. Analysing and Interpreting Quantitative
Eye-Tracking Data in Studies of Programming: Phases of Debugging with Multiple
Representations.. In PPIG. Citeseer, 13.
Dave Binkley, Marcia Davis, Dawn Lawrie, Jonathan I Maletic, Christopher Morrell,
and Bonita Sharif. 2013. The impact of identifier style on effort and comprehension.
Empirical Software Engineering 18, 2 (2013), 219–276.
Teresa Busjahn, Carsten Schulte, and Andreas Busjahn. 2011. Analysis of code reading
to gain more insight in program comprehension. In Proceedings of the 11th Koli
Calling International Conference on Computing Education Research. 1–9.
Michael L Collard, Michael J Decker, and Jonathan I Maletic. 2011. Lightweight trans-
formation and fact extraction with the srcML toolkit. In 2011 IEEE 11th international
working conference on source code analysis and manipulation. IEEE, 173–184.
Martha E Crosby and Jan Stelovsky. 1990. How do we read algorithms? A case study.
Computer 23, 1 (1990), 25–35.
Benoít De Smet, Lorent Lempereur, Zohreh Sharafi, Yann-Gaël Guéhéneuc, Giuliano
Antoniol, and Naji Habra. 2014. Taupe: Visualizing and analyzing eye-tracking
data. Science of Computer Programming 79 (2014), 260–278.
Hacı Ali Duru, Murat Perit Çakır, and Veysi İşler. 2013. How does software visualization
contribute to software comprehension? A grounded theory approach. International
Journal of Human-Computer Interaction 29, 11 (2013), 743–763.
Yann-Gaël Guéhéneuc. 2006. TAUPE: towards understanding program comprehen-
sion. In Proceedings of the 2006 conference of the Center for Advanced Studies on
Collaborative research. 1–es.
Prateek Hejmady and N Hari Narayanan. 2012. Visual attention patterns during
program debugging with an IDE. In Proceedings of the Symposium on Eye Tracking
Research and Applications. 197–200.
Sebastien Jeanmart, Yann-Gael Gueheneuc, Houari Sahraoui, and Naji Habra. 2009.
Impact of the visitor pattern on program comprehension and maintenance. In 2009
3rd International Symposium on Empirical Software Engineering and Measurement.
IEEE, 69–78.
G Karthik, J Amudha, and C Jyotsna. 2019. A Custom Implementation of the Velocity
Threshold Algorithm for Fixation Identification. In 2019 International Conference
on Smart Systems and Inventive Technology (ICSSIT). IEEE, 488–492.
Meng-Lung Lai, Meng-Jung Tsai, Fang-Ying Yang, Chung-Yuan Hsu, Tzu-Chien Liu,
Silvia Wen-Yu Lee, Min-Hsien Lee, Guo-Li Chiou, Jyh-Chong Liang, and Chin-
Chung Tsai. 2013. A review of using eye-tracking technology in exploring learning
from 2000 to 2012. Educational research review 10 (2013), 90–115.
Walid Maalej, Rebecca Tiarks, Tobias Roehm, and Rainer Koschke. 2014. On the com-
prehension of program comprehension. ACM Transactions on Software Engineering
and Methodology (TOSEM) 23, 4 (2014), 1–37.
Naser Al Madi, Cole S Peterson, Bonita Sharif, and Jonathan Maletic. 2020. Can the E-Z
Reader model predict eye movements over code? Towards a model of eye movements
over source code. In ACM Symposium on Eye Tracking Research and Applications.
1–4.
Naser Al Madi, Cole S Peterson, Bonita Sharif, and Jonathan Maletic. 2021. From
Novice to Expert: Analysis of Token Level Effects in a Longitudinal Eye Tracking
Study. In 29th IEEE/ACM International Conference on Program Comprehension.
Marcus Nyström and Kenneth Holmqvist. 2010. An adaptive algorithm for fixation,
saccade, and glissade detection in eyetracking data. Behavior research methods 42,
1 (2010), 188–204.
Unaizah Obaidellah, Mohammed Al Haek, and Peter C-H Cheng. 2018. A survey on
the usage of eye-tracking in computer programming. ACM Computing Surveys
(CSUR) 51, 1 (2018), 5.
Gerardo Cepeda Porras and Yann-Gaël Guéhéneuc. 2010. An empirical study on
the eciency of dierent design pattern representations in UML class diagrams.
Empirical Software Engineering 15, 5 (2010), 493–522.
Keith Rayner. 1998. Eye movements in reading and information processing: 20 years
of research. Psychological bulletin 124, 3 (1998), 372.
Pablo Romero, Richard Cox, Benedict du Boulay, and Rudi Lutz. 2002. Visual attention
and representation switching during java program debugging: A study using the
restricted focus viewer. In International Conference on Theory and Application of
Diagrams. Springer, 221–235.
Zohreh Shara, Bonita Sharif, Yann-Gaël Guéhéneuc, Andrew Begel, Roman Bednarik,
and Martha Crosby. 2020. A practical guide on conducting eye tracking studies in
software engineering. Empirical Software Engineering 25, 5 (2020), 3128–3174.
Zohreh Shara, Zéphyrin Soh, and Yann-Gaël Guéhéneuc. 2015. A systematic literature
review on the usage of eye-tracking in software engineering. Information and
Software Technology 67 (2015), 79–107.
Bonita Sharif and Jonathan I Maletic. 2010. An eye tracking study on camelcase and
under_score identier styles. In 2010 IEEE 18th International Conference on Program
Comprehension. IEEE, 196–205.
Bonita Sharif, John Meinken, Timothy Shaer, and Huzefa Kagdi. 2017. Eye movements
in software traceability link recovery. Empirical Software Engineering 22, 3 (2017),
1063–1102.
Kshitij Sharma, Daniela Caballero, Himanshu Verma, Patrick Jermann, and Pierre
Dillenbourg. 2015. Looking AT versus looking THROUGH: A dual eye-tracking
study in MOOC context. International Society of the Learning Sciences, Inc.[ISLS].
Zéphyrin Soh, Zohreh Shara, Bertrand Van den Plas, Gerardo Cepeda Porras, Yann-
Gaël Guéhéneuc, and Giuliano Antoniol. 2012. Professional status and expertise
for UML class diagram comprehension: An empirical study. In 2012 20th IEEE
International Conference on Program Comprehension (ICPC). IEEE, 163–172.
Randy Stein and Susan E Brennan. 2004. Another person’s eye gaze as a cue in
solving programming problems. In Proceedings of the 6th international conference
on Multimodal interfaces. 9–15.
Rachel Turner, Michael Falcone, Bonita Sharif, and Alina Lazar. 2014. An eye-tracking
study assessing the comprehension of C++ and Python source code. In Proceedings
of the Symposium on Eye Tracking Research and Applications. 231–234.
Braden Walters, Timothy Shaer, Bonita Sharif, and Huzefa Kagdi. 2014. Capturing
software traceability links from developers’ eye gazes. In Proceedings of the 22nd
International Conference on Program Comprehension. 201–204.
Shehnaaz Yusuf, Huzefa Kagdi, and Jonathan I Maletic. 2007. Assessing the comprehen-
sion of UML class diagrams via eye tracking. In 15th IEEE International Conference
on Program Comprehension (ICPC’07). IEEE, 113–122.