Tracking Eye Movement for Controlling Real-Time Image-Abstraction Techniques


Maximilian Söchting and Matthias Trapp [ORCID 0000-0003-3861-5759]
Hasso Plattner Institute, Faculty of Digital Engineering,
University of Potsdam, Germany
Abstract. Acquisition and consumption of visual media such as digital images
and videos is becoming one of the most important forms of modern communication.
However, since the creation and sharing of images is increasing exponentially,
images as a media form suffer from being devalued: the quality of single images
is becoming less and less important, and the frequency of the shared content
becomes the focus. In this work, an interactive system is presented that allows
users to interact with volatile and diverting artwork based only on their eye
movement. The system uses real-time image-abstraction techniques to create an
artwork unique to each situation. It supports multiple, distinct interaction
modes, which share common design principles, enabling users to experience
game-like interactions focusing on eye movement and the diverting image content
itself. This approach hints at possible future research in the field of
relaxation exercises and casual art consumption and creation.
Keywords: Eye-Tracking, Image Abstraction, Image Processing, Artistic Image
Stylization, Interactive Media
1 Introduction
1.1 Temporary Aesthetic Experiences
Providing users with a temporary visual and aesthetic experience through an
interactive system is not an unprecedented idea. In this work, eye tracking
technology and image-abstraction techniques are combined with a novel
interaction paradigm (Figure 1). This approach aims to increase the value of the
visual content through volatility and instability, and by taking away the
decision power of the user. Contradicting the assisting nature of traditional
technology, the power dynamic that usually exists between a user and a technical
device is reversed. The presented system takes charge and nudges the user to
follow a certain, pre-determined behavior by rewarding the user in case of
success and punishing deviating behavior by setting the user back. Such volatile
visual content also allows for deeper appreciation of the evolving artwork. The
interaction makes the art more intriguing and more enjoyable for observers than
still images.
1.2 Challenges for Eye Tracking based Interaction Techniques
Fig. 1. The proposed system in a user interaction scenario. Mounted at the
bottom side of the monitor, the eye tracker records the eye movement of the
user. The application displays the computer-generated artwork that is controlled
by the processed sensor inputs. (Figure labels: Eye Tracker, Current Gaze
Position, Interactive Artwork.)
The presented approach creates a unique interaction experience between the
system and the user, by allowing the user to take part in the creation of the
computer-transformed visual content that is presented to them. The goal is to
create volatile artwork, unique to each situation. The interaction aims to be
“frictionless” and immersive by using real-time sensors, i.e., especially eye
tracking. If the chosen interaction technique requires it, the user may be
constrained to certain behavior patterns, e.g., not blinking for a set duration,
as a game-like [...]
In this work, artistic stylization through image processing gives the
interactive system a medium to interact with the user. By processing and
interpreting user input, the system applies aesthetic changes to the artwork
through local and global changes of abstraction parameters, the choice of
interaction technique, and the choice of image. This necessitates strong
customizability and high granularity of the image-abstraction techniques to
implement the proposed interaction modes successfully. For the proposed
approach, a set of different image-abstraction techniques that imitate real
media, such as cartoon or watercolor filtering [13], have been implemented.
Apart from the mainstream usage of eye trackers for statistical purposes, e.g.,
for market research, their application as a direct input device is uncommon.
Zhai et al. observed [24] that eye tracking-based input methods put considerable
strain on the user, and that they are less straining and offer more convenient
interaction when combined with other input methods. Because the presented work
aims to create an intense environment and a particularly inconvenient
interaction, using only eye-tracker input benefits the fundamental approach of
the system: the resulting strain has the user “work” for their rewards.
Allowing real-time sensor data streams to influence the choice of
image-abstraction techniques and their parameters poses unique technical
challenges. Implementing and utilizing the output of sensor data feeds in a
real-time application requires solving the resulting concurrency problems, since
both the main application and the sensor threads need to synchronize data
access. Additionally, a flexible data model and a dynamic system that adapts to
these changes during application run-time are required in order to allow system
behavior to be influenced by the provided sensor data (interaction techniques)
in a dynamic and easily exchangeable manner.
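The synchronization problem between a sensor thread and the main application can be illustrated with a small sketch. It is not the implementation of the presented system (which is written in C++); it only assumes that the sensor thread publishes samples and that the application loop polls for the most recent one. The names `GazeSample` and `LatestSampleBuffer` and all field names are hypothetical.

```python
import threading
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GazeSample:
    """One gaze reading; field names are illustrative, not taken from an SDK."""
    x: float
    y: float
    valid: bool
    timestamp_ms: int


class LatestSampleBuffer:
    """Holds only the most recent sample and guards it with a lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._sample: Optional[GazeSample] = None
        self._unread = False

    def publish(self, sample: GazeSample) -> None:
        # Called from the sensor thread for every incoming callback.
        with self._lock:
            self._sample = sample
            self._unread = True

    def take(self) -> Optional[GazeSample]:
        # Called from the application loop; returns None if nothing new arrived.
        with self._lock:
            if not self._unread:
                return None
            self._unread = False
            return self._sample
```

Keeping only the latest sample means the application loop never blocks on a backlog; a technique that needs every sample would use a queue instead.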
1.3 Combining Eye Tracking and Image Abstraction
The image-abstraction techniques utilized by our approach are based on earlier
work on a platform-independent format for image processing effects with
parametrization on multiple Levels-of-Control (LOC) [6]. Based on this format,
a system from previous work by M. Söchting and M. Trapp [21] was extended; it
captures and processes sensor data and allows interaction techniques to be
implemented as an interface between sensor data and image abstraction. This
paper develops the approach of that work further with detailed performance
measurements and analysis, a more in-depth description of gaze data and its
processing, a discussion of the platform-independent image-abstraction effect
format, further conclusions, and additional related work and future research
directions. It details the concept for an interactive system that manipulates
artwork based on Graphics Processing Unit (GPU)-accelerated image-abstraction
techniques using eye tracking. For this purpose, a system architecture is
described that processes real-time sensor data with exchangeable interaction and
image-abstraction techniques. Based on this, observed effects on users using a
set of interaction techniques are discussed.
The remainder of this work is structured as follows. Section 2 reviews and
discusses related work with respect to fundamentals of eye tracking-based
interaction techniques and interactive image processing. Section 3 describes the
concept of mapping sensor data to interactions for image-abstraction, the
required technical conditions and the interaction techniques themselves.
Section 4 presents the parametrization of image and video abstraction techniques
(Section 4.1) and the mapping of sensor data to their respective behavior.
Section 5 highlights aspects of the prototypical implementation of the
application and the image-abstraction techniques. Section 6 discusses
application and results of the presented approach. Finally, Section 7 concludes
this paper and presents ideas for future research directions.
2 Background and Related Work
The background of this work mainly concerns interactive image-abstraction
techniques with implementations suitable for GPUs (Section 2.1) as well as
interaction techniques that are based on eye tracking technologies (Section 2.2)
and approaches that apply eye tracking for understanding art (Section 2.3).
2.1 Interactive Image-Abstraction Techniques
To give the proposed system a medium to interact with the user [19], image
processing for the purpose of artistic stylization is used. The system applies
aesthetic changes in the artwork, based on user input, through mask-based and
global changes of effect parameters, choice of image and choice of
image-abstraction effect [9]. This necessitates a high degree of customization
of the image-abstraction effects in order to enable the proposed interaction.
Furthermore, the system is required to be highly interactive to prevent user
frustration and maintain the immersion of a fluid interaction, which proves
particularly difficult since the system includes high-frequency eye tracking
[22]. Therefore, the described complex image stylization techniques are
implemented in the system using different caching and pre-loading mechanisms.
For this work, a set of complex image-abstraction techniques that imitate real
media, such as cartoon or watercolor filtering [5], have been selected and
implemented in combination with multiple interaction techniques (Section 4).
More use cases for image processing include medical analysis and enterprise
software.
2.2 Interaction Techniques based on Eye Tracking
Eye tracking has been used prominently in medical and psychological research
for recording and studying human visual behavior [14]. Eye tracking devices have
also been deployed in experimental and business environments in various domains.
The field of human-computer interaction is a prominent example of eye tracking
applications, where eye trackers mainly serve two different purposes:
statistical analysis and immediate interface interaction.
In interface interaction, projects in human-computer interaction interpret the
user’s eye movement as an input component for navigating and interacting with
interactive systems [12]. The work of Alonso et al. on air traffic controllers
is one such example [1]; they show that eye tracking can improve interaction
convenience even in the complex use case of air traffic control. However,
Kasprowski et al. conclude that touchpad or mouse input is still superior in
interaction accuracy and reaction speed [10] for workloads that necessitate
quick and precise interaction. Statistical analysis is a more widespread purpose
of eye tracking in human-computer interaction projects, for which users and
their eye movement are observed while they fulfill given tasks, e.g., using a
web site. The gathered gaze data is then analyzed and conclusions can be drawn,
e.g., which parts of the web site attract the most visual attention and which
areas the user did not perceive at all. This analysis has the goal of
understanding eye movements and human visual behavior, such as in the work by
Kiefer et al. [11], in which the researchers investigate how maps are perceived
by the test persons.
The statistical data collected by eye trackers can not only be used for drawing
immediate conclusions, but also for creating visualizations. Blascheck et al. [4]
summarized different approaches to visualizing such eye tracking data. In the
visualizations, patterns such as fixations or other repeating eye movements can
be identified. From these findings, researchers can draw conclusions about which
visual content relates to which eye gaze pattern.
Mishra [16] found that eye tracking as a direct input device lacks precision.
As a solution, patents involving prediction systems for improving the precision of
interface selections are highlighted. With these improvements, the mainstream
usage of eye tracking for interface navigation may be more easily reachable in
the future.
Fig. 2. Overview of the software system’s processing stages and control flow
(red) as well as data flow (blue) and user interaction (green). (Figure labels:
Sensor Data Acquisition, Sensor Data Mapping, Image Abstraction, Sensor Data,
Image Abstraction State, Application State, Image.)
2.3 Eye Tracking in Understanding and Creating Digital Art
Eye tracking has also found applications in the domain of understanding and
creating digital art. For instance, eye tracking as an interaction technique has
been used in multiple art exhibitions, allowing visitors to influence the
artwork through eye movement. In “Molecular Informatics” by Mikami [15], users
can discover a Virtual Reality (VR) environment of 3D molecule structures which
are dynamically generated based on the eye movements of the user. Furthermore,
statistical analysis of eye tracking data has been utilized for research on how
humans process and perceive art. For example, the work of Quiroga and Pedreira
[17] has shown that eye movement and visual attention of human observers can be
easily manipulated by changing the presented pictures in small areas.
The British artist Graham Fink created many drawings using eye tracking [7].
Using his gaze and an on/off switch, he draws the lines of his portraits on a
digital canvas. He also gave live performances in museums, creating portraits of
museum guests. He described using his software as “incredibly difficult” and
requiring intense concentration [2].
3 Eye Tracking and Image-Abstraction Techniques
Starting with the hardware system setup (Section 3.1), this section further
describes the software system (Section 3.2), the required low-level sensor data
acquisition (Section 3.3), and its mapping to the different levels-of-control
for image-abstraction techniques (Section 4.1).
3.1 System Setup and Hardware
Fig. 3. Overview of system components and the setup environment. (Figure label:
Eye Tracker.)
In the proposed approach, a glasses-free consumer eye tracker (Tobii Gaming Eye
Tracker 4C) is used to create a natural interaction with the system. Based on
the intention of the currently chosen interaction technique, it allows the user
to influence the system and the artwork. However, our system does not rely on
specific hardware and can be used with devices from different manufacturers. In
the following, the requirements and constraints on the environment that need to
be met in order for the eye tracker to work in a stable manner are described
(Figure 3).
Since the eye tracker can only track one pair of eyes at a time, a well-lit
environment with a single user is required. Furthermore, the manufacturer of the
eye tracker recommends monitors with a maximum diagonal of 27 inches at a
display ratio of 16:9 and 30 inches at a display ratio of 21:9. Additionally,
the distance from the sensor is advised to be kept between 50 cm and 95 cm in
order to deliver consistent results. However, in the presented approach, a
43-inch 16:9 display was used with a sensor distance of approx. 1 m, and
tracking was found to be stable and precise enough for the presented system.
Extension cords for the Universal Serial Bus (USB) connection of the eye tracker
are advised against, and the native cable has a length of 80 cm, which
constrains the arrangement of the components in the build of the proposed
system.
Furthermore, the eye tracker requires a calibration step that tracks and
measures the visual tracking response for varying head position, rotation, and
distance to allow for more precise gaze tracking. Ideally, this calibration is
repeated on a per-user basis for long-term use, while a more casual scenario
with frequently changing users should aim for a calibration-less setup to keep
the interaction frictionless. In the latter case, a standard calibration for a
certain average user position should therefore be used. The correct user
position could then be suggested through ground markers or other physical
constraints to guarantee a certain tracking precision. Furthermore, visual
feedback on the status of eye detection can guide the user to assume the right
position.
3.2 Overview of Software System Components
Figure 2 displays a conceptual overview of the software system used for our
approach. It comprises three fundamental stages:
Sensor Data Acquisition: This stage acquires, filters, and manages the
respective sensor data from the eye tracking device and possibly other real-time
sensors (Section 3.3). Its main functionality is the conversion of low-level
hardware events to high-level software events required by the subsequent
stages.
Sensor Data Mapping: Using the derived high-level events, this stage
manipulates the respective image-abstraction state and global application state
according to the different interaction mappings and modes.
Rendering: In this stage, the behavior of the active interaction modes is
executed and the respective rendering and composition steps are performed to
generate the complete image that is then displayed to the user.
3.3 Sensor Data Processing
The Tobii Gaming 4C Eye Tracker can be utilized using different Software
Development Kits (SDKs), such as Stream Engine, a low-level C++ binding, Core
SDK, a high-level C# binding, and a Universal Windows Platform (UWP) preview of
the Core SDK. Because the presented approach shares core functionality with
related projects written in C++ and the low-level binding is presented as the
most flexible choice, the Stream Engine SDK was chosen. When connected to an eye
tracker device, the SDK allows registering different callbacks, each
corresponding to one data point (Table 1). Almost all sensor data feeds are
updated and generate an event every 11 ms, calling every corresponding callback
handler registered by the developer. Since the update interval of 11 ms cannot
be changed in the SDK, the only options to reduce the frequency are skipping
callbacks or merging them, which results in tick intervals of 22 ms, 33 ms, etc.
Even the low-level Stream Engine Application Programming Interface (API) seems
to apply interpolation to the sensor values: around 5% of user blinks do not
seem to invalidate the gaze point data.
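Reducing the callback frequency by merging can be sketched as follows. This is only an illustration of the idea, not code from the presented system or the Tobii SDK: it assumes gaze points arrive as `(x, y)` tuples and averages every `factor` consecutive samples into one. The class name `MergingDecimator` and its methods are hypothetical.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float]
SENSOR_INTERVAL_MS = 11  # update interval reported in the text


class MergingDecimator:
    """Forwards one averaged gaze point for every `factor` raw callbacks."""

    def __init__(self, factor: int, handler: Callable[[Point], None]) -> None:
        if factor < 1:
            raise ValueError("factor must be >= 1")
        self.factor = factor
        self.handler = handler
        self._pending: List[Point] = []

    @property
    def tick_ms(self) -> int:
        # Effective interval between forwarded events, e.g. 22 ms for factor 2.
        return self.factor * SENSOR_INTERVAL_MS

    def on_raw_sample(self, point: Point) -> None:
        # Collect raw samples until `factor` of them are available.
        self._pending.append(point)
        if len(self._pending) == self.factor:
            mean_x = sum(p[0] for p in self._pending) / self.factor
            mean_y = sum(p[1] for p in self._pending) / self.factor
            self._pending.clear()
            self.handler((mean_x, mean_y))
```

Skipping instead of merging would forward only the last collected sample; averaging trades a little latency for smoother gaze positions.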
Table 1. Overview of low-level and high-level eye tracking events. Each data
point can be observed by registering a callback which is called on every sensor
capture, i.e., every 11 ms for most points.

Event          Description
Gaze Point     Normalized screen coordinates $\mathrm{NDC} = [-1, 1]^2$ of the user’s gaze
Gaze Origin    3D position of eyes, from the screen center in absolute millimeters
Eye Position   Points in 3D space for both eyes, normalized within the tracking area
Head Pose      3D position and rotation, from the screen center in millimeters
User Presence  Boolean that indicates if a pair of eyes is currently being recognized
Notifications  Various events, e.g., the calibration status or tracking area has changed
Fig. 4. Overview diagram of the interaction between the application and the
interaction technique (Section 4.2). (Figure labels: Interaction Technique,
Application Components, Shared Resources, Eye Tracking Device, Eye Tracking
Processing, Application Loop, Texture Cache, Gaze-Point Mask Texture, Image
Processing Technique, User Gaze-Movement, User Blink, Image Preprocessing,
Draw-to-User-Interface, Raw Callbacks, Filtered Callbacks, Call when required.)
4 Eye Tracking Interaction for Image-Abstraction
This section describes the parametrization of the image abstraction techniques
(Section 4.1) and how they are influenced by the input data of the eye tracking
sensor. Furthermore, details w.r.t. the context of the interaction techniques
(Section 4.2) and the implementation itself (Section 4.3) are discussed.
4.1 Mapping Sensor Data to Levels-of-Control
Before discussing mappings between sensor data events and state changes of
abstraction techniques and the application itself, we briefly review the
different LOCs offered by modern, real-time implementations of image and video
abstraction techniques [20]. In particular, these are:
Pipeline Manipulation. An image-abstraction pipeline is made up of one or more
image-abstraction effects, each of which has local and global parameters and
presets representing certain sets of parameter values. Pipeline presets and
pipeline parameters can be used to describe a set of effects and their parameter
configuration. In this work, however, only single effects are used, which
eliminates the need for pipeline manipulation.
Effect Selection. An image-abstraction effect applies a certain aesthetic change
to the processed image. The effect selection for each interaction technique can
be either constant, sequentially changing (e.g., on every blink or periodically)
or based on randomness.
Preset Selection. Within each effect, different presets represent individual
sets of parameters which create certain aesthetic changes. They correspond to
specific parameter value configurations that produce visually distinct results.
Presets can be selected in the same way as effects: by constant, sequential, or
random selection. If no preset is selected, a default preset is used for the
effect.
Adjusting Global and Local Parameters. Each effect possesses different
parameters that influence the processing and therefore the visual change caused
by the effect. These parameters are typically numerical or enumeration values.
They can be changed on a global basis, i.e., for every pixel in the image, or
locally, i.e., for a subset of pixels in the image, usually by mask painting [20].
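The levels of control above can be made concrete with a minimal data model. The sketch below is an assumption for illustration and does not reproduce the effect format or API of the presented system: an effect has default parameters, named presets, global values, and optional mask-based local overrides. All names (`Effect`, `LocalOverride`, `value_at`) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

Mask = List[List[float]]  # per-pixel weights in [0, 1], indexed as mask[y][x]


@dataclass
class LocalOverride:
    mask: Mask
    target: float  # parameter value reached where the mask weight is 1


@dataclass
class Effect:
    name: str
    defaults: Dict[str, float]
    presets: Dict[str, Dict[str, float]] = field(default_factory=dict)
    values: Dict[str, float] = field(default_factory=dict)
    overrides: Dict[str, LocalOverride] = field(default_factory=dict)

    def __post_init__(self) -> None:
        self.select_preset(None)

    def select_preset(self, preset: Optional[str]) -> None:
        # Preset selection: start from the defaults, then overlay the preset.
        self.values = dict(self.defaults)
        if preset is not None:
            self.values.update(self.presets[preset])

    def set_global(self, param: str, value: float) -> None:
        # Global adjustment: the value applies to every pixel.
        if param not in self.defaults:
            raise KeyError(param)
        self.values[param] = value

    def set_local(self, param: str, mask: Mask, target: float) -> None:
        # Local adjustment: the mask decides where the target value applies.
        self.overrides[param] = LocalOverride(mask, target)

    def value_at(self, param: str, x: int, y: int) -> float:
        base = self.values[param]
        override = self.overrides.get(param)
        if override is None:
            return base
        weight = override.mask[y][x]
        return (1.0 - weight) * base + weight * override.target
```

Effect selection would sit one level above this class, for example as a dictionary of `Effect` objects plus the key of the active one.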
4.2 Interaction Context
The interaction context is given by an input image and an image-abstraction
effect. The effect generates the artwork and is influenced by its parameter
values, which can be controlled globally or locally using masks. In order to
record (1) the eye position, (2) the gaze movement, and (3) blinks over time, a
glasses-free eye tracker is mounted to the display.
Fundamentals for Interaction Techniques. The active interaction technique uses
the parameter values to achieve the desired visual change (Section 4.3) by
implementing its own behavior at four different points in time (Figure 4). The
interaction techniques are therefore implemented using a Strategy pattern [8],
as described in the following.
Gaze Movement Trajectories. In order to abstract from the raw callbacks
delivered by the eye tracker API (Section 3.3), the gaze tracking values are
first interpolated and adjusted to the application window before the interaction
technique handles them. These adjusted eye tracking values are then processed by
the interaction technique at regular intervals of two to four times the sensor
interval, i.e., every 22 ms to 44 ms. By default, the eye gaze position paints a
circle shape at its position in the Gaze Movement Texture. This behavior can be
extended and changed by implementing custom behavior based on the gaze position.
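The default behavior of painting a circle into the gaze mask can be sketched on the CPU as follows. The presented system keeps this mask in a GPU texture; the list-of-lists mask, the function names, and the assumption that gaze coordinates are already window-relative values in [0, 1] are illustrative only.

```python
import math
from typing import List

Mask = List[List[float]]  # gray-scale mask, indexed as mask[y][x]


def new_mask(width: int, height: int) -> Mask:
    """Create an empty (all-zero) gaze mask."""
    return [[0.0] * width for _ in range(height)]


def paint_circle(mask: Mask, gaze_x: float, gaze_y: float, radius_px: float) -> None:
    """Set every mask pixel within radius_px of the gaze point to 1.0.

    gaze_x and gaze_y are assumed to be window-relative coordinates in [0, 1].
    """
    height, width = len(mask), len(mask[0])
    cx = gaze_x * (width - 1)
    cy = gaze_y * (height - 1)
    # Restrict the loops to the bounding box of the circle, clamped to the mask.
    x0 = max(0, math.floor(cx - radius_px))
    x1 = min(width - 1, math.ceil(cx + radius_px))
    y0 = max(0, math.floor(cy - radius_px))
    y1 = min(height - 1, math.ceil(cy + radius_px))
    for y in range(y0, y1 + 1):
        for x in range(x0, x1 + 1):
            if (x - cx) ** 2 + (y - cy) ** 2 <= radius_px ** 2:
                mask[y][x] = 1.0
```

A soft-edged brush could be obtained by writing a distance-dependent weight instead of a constant 1.0.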
Blink Events. A blink event is generated by the application whenever the user
blinks. For this, the raw gaze movement callbacks and their respective validity
are analyzed and processed through a confidence model (Section 5.3). By default,
this event causes the Gaze Movement Texture to reset; however, this behavior can
be overridden and altered similar to the other events. The default behavior is
common to the design of many implemented interaction techniques, which allows
users to understand the fundamental paradigm quickly (“blink = advance, loss”,
Section 6).
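How a blink event might be derived from per-sample validity flags can be sketched with a simple run-length heuristic. This is a stand-in chosen for illustration and is not the confidence model of Section 5.3: it reports a blink when the gaze data was invalid for a plausible number of consecutive samples and then became valid again. The class name and the threshold values are assumptions.

```python
class BlinkDetector:
    """Reports a blink when gaze validity drops for a plausible blink duration."""

    def __init__(self, min_invalid: int = 3, max_invalid: int = 45) -> None:
        # Thresholds are counted in sensor samples and are illustrative only.
        self.min_invalid = min_invalid
        self.max_invalid = max_invalid
        self._invalid_run = 0

    def update(self, valid: bool) -> bool:
        """Feed one validity flag per sample; returns True when a blink ended."""
        if not valid:
            self._invalid_run += 1
            return False
        # The gaze is valid again: judge the length of the invalid run before it.
        run, self._invalid_run = self._invalid_run, 0
        return self.min_invalid <= run <= self.max_invalid
```

Very short invalid runs are treated as sensor noise and very long ones as the user looking away, so neither triggers the reset.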
Image Loading and Preprocessing. In this stage, the abstracted image(s) are
rendered into the two provided Cache Textures, effectively preparing for the
upcoming user interface update. By pre-processing these images, the need to
render them multiple times is eliminated, increasing application performance
significantly. In order to achieve the desired behavior (Section 4.3), the
interaction technique applies the necessary changes to the Processing Effect
configuration on different LOCs (Section 4.1), e.g., applying effect presets or
changing the active effect or image, followed by rendering the effect image(s)
to the Cache Textures.
Fig. 5. Successive frames (a) to (c) from a Spotlight interaction mode session.
Draw-to-User-Interface. Once the main window/the user interface is prompted
for an update, i.e., the user interface and the user-side representation of the
artwork have to be redrawn, this last stage is called. Usually, the two Cache
Textures or the Gaze Movement Texture are drawn in a certain order with specific
composition modes that achieve the desired effect. This stage, however, can also
be restricted to just drawing one of the Cache Textures to the interface.
Global Application State. To enable the desired interaction technique and
application logic, state information between the different events is stored in the
following global resources which are provided to each callback:
Cache Textures: Two textures are provided by the application for rendering and
caching the abstracted images. They are stored on the GPU to reduce transfer
between GPU and Central Processing Unit (CPU) in the Draw-to-User-Interface
phase and are also resized to the rendering resolution automatically. With these
two textures, techniques that blend between different parameter configurations
within the same effect or between different effects can be achieved. The
application framework can easily be extended to allow for more than two
textures; however, more were not required for the implemented, exemplary
abstraction techniques.
Gaze-Movement Texture: By default, the Gaze Movement Texture is managed by the
application and contains a gray-scale mask of the gaze movement of the user
since their last blink. It is used during the image preprocessing phase to
create compositions together with the abstracted images and is reset
automatically once the user blinks.
Processing Effect: The Processing Effect represents the interface to the
rendering core. It allows modifying parameters of the active effect, selecting
the preset, and changing the active effect. It can also be used to combine
effects in a pipeline; however, this functionality is not used in this work.
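The four callbacks and the shared resources described in this section can be summarized as a Strategy-style interface. The sketch below is a simplified assumption and not the C++ interface of the presented system: strings stand in for GPU textures, a list of points stands in for the Gaze Movement Texture, and every class and method name is hypothetical.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SharedState:
    """Stand-in for the shared resources handed to every callback."""
    cache_textures: List[str] = field(default_factory=lambda: ["", ""])
    gaze_points: List[Tuple[float, float]] = field(default_factory=list)
    active_preset: str = "default"


class InteractionTechnique(ABC):
    """Strategy interface with one hook per point in time."""

    def on_gaze(self, state: SharedState, point: Tuple[float, float]) -> None:
        # Default behavior: record the gaze position in the mask stand-in.
        state.gaze_points.append(point)

    def on_blink(self, state: SharedState) -> None:
        # Default behavior: a blink resets the recorded gaze movement.
        state.gaze_points.clear()

    @abstractmethod
    def preprocess(self, state: SharedState) -> None:
        """Render the abstracted image(s) into the cache textures."""

    @abstractmethod
    def draw(self, state: SharedState) -> str:
        """Compose the cached textures into the final user-facing image."""


class Spotlight(InteractionTechnique):
    def preprocess(self, state: SharedState) -> None:
        state.cache_textures[0] = "black canvas"
        state.cache_textures[1] = f"abstracted image ({state.active_preset})"

    def draw(self, state: SharedState) -> str:
        spots = len(state.gaze_points)
        return f"{state.cache_textures[0]} + {spots} revealed spots of {state.cache_textures[1]}"
```

Because the application only talks to the base class, a different technique can be swapped in at run-time without touching the application loop.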
4.3 Exemplary Interaction Modes
In the following, different interaction modes are presented that each follow
similar design principles but represent distinct experiences (Figures 5 to 7).
Fig. 6. Successive frames (a) to (c) from a Shift interaction mode session.
Spotlight Interaction Mode. Starting from a black canvas or a highly blurred
input photo, the artwork is revealed successively using gaze movement (Figure 5).
For this, an alpha mask, which blends the gaze canvas with the abstracted image,
is manipulated by painting a circle or similar shape at the position where the
user’s gaze is detected. When the user blinks (e.g., a certain number of times
within a certain period), the mask is cleared and the revelation process must be
performed from the beginning.
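The mask-based reveal amounts to a per-pixel alpha blend. The following sketch assumes images as nested lists of RGB tuples with components in [0, 1] and a mask of the same size; the presented system performs this composition on the GPU, so the functions `composite` and `clear_mask` are illustrative only.

```python
from typing import List, Tuple

Pixel = Tuple[float, ...]  # RGB components in [0, 1]
Image = List[List[Pixel]]
Mask = List[List[float]]


def composite(canvas: Image, abstracted: Image, mask: Mask) -> Image:
    """Per-pixel blend: mask 0 shows the canvas, mask 1 shows the abstracted image."""
    return [
        [
            tuple((1.0 - a) * c + a * f for c, f in zip(bg, fg))
            for bg, fg, a in zip(bg_row, fg_row, mask_row)
        ]
        for bg_row, fg_row, mask_row in zip(canvas, abstracted, mask)
    ]


def clear_mask(mask: Mask) -> None:
    """Reset the mask in place, as happens when the user blinks."""
    for row in mask:
        for x in range(len(row)):
            row[x] = 0.0
```

Replacing `canvas` with a second abstracted version of the same image would turn the same blend into a transition between two levels of abstraction.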
Shift Interaction Mode. Using an alpha mask similar to that of the
Spotlight-Mode, this mode blends between different levels of abstraction of the
same effect (Figure 6). This means that two sets of parameter combinations or
levels of abstraction are used to create two versions of the image, which are
blended to create the effect of transforming one image into the other. For
instance, low abstraction (or the original image) could be displayed at the gaze
focus area and high abstraction in the remaining area of the image. Through
this, the artwork becomes uniquely dynamic and the user never sees the complete
image. Furthermore, the system is capable of reconstructing the creation process
and can generate a video for sharing.
Coloring Interaction Mode. In this mode, gaze movement blends from a light
pencil sketch of an image to a colored version of the same image at the gaze
location (Figure 7). With this, the Coloring-Mode is a special case of the
Shift-Mode, for which two presets of an image-processing technique that imitates
watercolor painting are chosen. One preset produces color-less pencil sketches
while the other produces softly colored watercolor paintings of the original
image. Therefore, blending these two presets results in the desired effect of
“coloring-in” the image.
Fig. 7. Successive frames (a) to (c) from a Coloring interaction mode session.
Fig. 8. Overview of the system architecture, external libraries and hardware.
The main components of the application collaborate with external libraries and
the respective hardware to achieve the desired behavior of the proposed
approach. (Figure labels: Interaction Technique, Application Components,
Application Libraries, Hardware, Image Preprocessing, User Gaze-Movement, User
Blink, Main Window, Canvas Widget, Eye Tracking, UI Framework, Real-time Image
Processor, Eye-Tracking Engine, Graphics & Central Processing Unit, Eye Tracking
Device; legend: Communication, Utilization.)
5 Implementation Aspects
In this section, the implementation of the proposed concept is discussed. First,
an overview of the system architecture is given (Section 5.1) and the
implementation of the image-abstraction techniques is presented (Section 5.2).
Finally, the analysis and processing of the eye tracking data is explained
(Section 5.3).
5.1 Overview of System Architecture
In Figure 8, an overview of the system architecture is displayed. The
MainWindow utilizes a User Interface (UI) Framework and relies on the respective
widgets and rendering management. CanvasWidget implements most of the
application logic as described in the previous section. Fundamentally, it
handles the creation and management of the respective instance of an Interaction
Technique by providing the respective events and resources (Figure 4). It
achieves this by collaborating with the Real-time Image Processor and handling
the Processing Effect object that is mainly accessed by the interaction
technique. The Eye Tracking Component performs the batching, analysis, and
processing of the eye tracking sensor data (Section 5.3) and communicates with
the connected Tobii eye tracker using the Tobii Stream Engine library.
5.2 Integration of Interactive Image-Abstraction Techniques
The image-abstraction techniques are implemented using a platform indepen-
dent effect format, which allows parametrization on local and global LOCs.
urschmid et al. stated the following requirements on document formats [6]:
Tracking Eye Movement for Controlling Image-Abstraction Techniques 13
Implementation Set
(e.g. Vulkan)
(e.g. OpenGL ES)
Common asset
(e.g. Textures)
Common asset
(e.g. Icons)
Implementation Set
(e.g. WebGL)
(e.g. OpenGL)
Effect 2
Effect 1
Depends on
Presets UI Presets UI
Fig. 9. Structure of an image-abstraction technique in an XML-based document format
(after [6]).
first, it should enable platform-independent effect specifications to allow their
cross-platform provisioning and sharing. Secondly, it should have a modular
structure to allow the reuse of common building blocks. Thereby rapid proto-
typing of new effects and modification of existing ones even by inexperienced
users should be possible. Further, it should be easily parsed and serialized by
different clients.
To address these challenges, Semmo et al. introduced a document format
that allows effects to be decomposed into several components [20]. In the proposed
format, implementation-specific files are complemented by human-readable eXtensible
Markup Language (XML) description files, which describe an effect in
an abstract way. For the processing of such effects, a C++ processor that supports
this format is utilized as a library for this project. The structure of these
XML description files is defined by a domain-specific XML schema that separates
platform-independent parts from platform- and implementation-specific
parts (Figure 9). It uses the following components:
Effect Definition. A definition component consists of a single definition XML file,
which represents the interface to an effect. The definition file references other
XML files and components that are part of the effect description. Additionally, it
specifies the inputs, outputs, parameters, brushes, and painting tools. Parameter
definitions consist of the parameter name, the data type of the parameter, its
value range, and a default value. Brush definitions reference a brush preset and
can include specifications of brush strength and stroke width. Specifications of
painting tools include a reference to a painting brush and a masking texture.
Further, it links to an implementation set, presets, as well as user interface data.
Implementation Set. An implementation set component includes a single implementation
set XML file that lists the target platforms and graphics APIs for
which implementations of the effect are available. To allow clients to choose an
appropriate implementation, the implementation set file provides information
regarding the performance of each implementation and lists the graphics API
extensions that are required to execute it.
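The selection step described above can be sketched as follows. This is a minimal illustration and not the format's actual processor: the field names, the numeric performance score, and the rule "pick the fastest runnable implementation" are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Implementation:
    # All field names are hypothetical; the real XML attributes are not given in the text.
    platform: str                      # e.g. "windows" or "android"
    graphics_api: str                  # e.g. "OpenGL" or "Vulkan"
    required_extensions: frozenset = frozenset()
    performance_score: float = 0.0     # assumed convention: higher means faster


def choose_implementation(candidates, platform, api, available_extensions) -> Optional[Implementation]:
    """Return the fastest implementation the client can run, or None if there is none."""
    available = set(available_extensions)
    usable = [
        impl for impl in candidates
        if impl.platform == platform
        and impl.graphics_api == api
        and impl.required_extensions <= available  # every required extension is present
    ]
    return max(usable, key=lambda impl: impl.performance_score, default=None)


implementations = [
    Implementation("windows", "OpenGL", frozenset({"GL_ARB_compute_shader"}), 2.0),
    Implementation("windows", "OpenGL", frozenset(), 1.0),
    Implementation("android", "OpenGL ES", frozenset(), 1.0),
]
fallback = choose_implementation(implementations, "windows", "OpenGL", [])
preferred = choose_implementation(implementations, "windows", "OpenGL", ["GL_ARB_compute_shader"])
```

Without the extension, the client falls back to the slower implementation; with it, the faster one is chosen.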
Implementation. The implementation describes how the effect is executed in a
specific environment, e.g., following an operating system or graphics API con-
straint. Implementation components encapsulate the platform-specific parts of
an effect description. They include implementation XML files describing the
implementation of an effect for one specific target platform. To support different
target platforms and graphics APIs, there can be several implementation files for
an effect.
In each implementation file, shader programs and textures are specified that
are required to execute the respective implementation. Subsequently several ren-
dering passes are defined, each referencing a shader program and defining its in-
puts and outputs. The effect parameters defined in the corresponding definition
XML file are mapped on inputs of these rendering passes. Finally, the control
flow of the processing algorithm is defined by specifying an execution order of
the rendering passes.
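How effect parameters are mapped onto rendering passes that run in a specified order can be illustrated with a small sketch. Plain Python functions stand in for shader programs and lists of floats stand in for textures; the pass names, buffer names, and the tuple layout are hypothetical and not part of the described format.

```python
def run_passes(passes, order, parameters, source):
    """Execute rendering passes in the given order and return the last pass's output.

    passes maps a pass name to (function, input buffer names, output buffer name,
    names of the effect parameters that are fed into the pass).
    """
    buffers = {"source": source}
    for name in order:
        function, inputs, output, parameter_names = passes[name]
        arguments = [buffers[buffer_name] for buffer_name in inputs]
        keyword_arguments = {p: parameters[p] for p in parameter_names}
        buffers[output] = function(*arguments, **keyword_arguments)
    return buffers[passes[order[-1]][2]]


def brighten(image, gain):
    # Stand-in for a shader program: scales every pixel value.
    return [value * gain for value in image]


def clamp(image, limit):
    # Stand-in for a second shader program: caps every pixel value.
    return [min(value, limit) for value in image]


passes = {
    "brighten": (brighten, ["source"], "intermediate", ["gain"]),
    "clamp": (clamp, ["intermediate"], "result", ["limit"]),
}
result = run_passes(passes, ["brighten", "clamp"], {"gain": 2.0, "limit": 1.0}, [0.2, 0.4, 0.8])
```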
Preset and UI Data. A UI file defines how an effect should be presented in user
interfaces. It defines a display name and optionally a description for the effect.
Furthermore, it contains descriptions of the parameters, presets, brushes, and
painting tools. It references icon files that can be used as thumbnails for the effect
and its parameters, presets, brushes, etc. Presets files can be used to represent
frequently used, well-working parameter settings. Such pre-configurations should
allow inexperienced users to create suitable results when using the effects.
Common Assets. Common components can basically include two types of XML
description files, UI files and preset files. Besides these XML files, common com-
ponents can contain arbitrary files such as icons, textures, geometry, or shader
programs. Common components can be shared by several effects.
Pipelines. A Pipeline component can be used to combine multiple effects into a
complex one. It includes a pipeline definition XML file and a pipeline presets
file. The pipeline definition file specifies a sequence of effects that make up the
composite by referencing the respective definition file of each effect. The pipeline
presets file is used to store global presets for the composite effect. These global
presets consist of a set of presets for the effects that form the foundation the
5.3 Analysis and Processing of Eye Tracking Data
To implement the interaction mode for controlling the respective image abstrac-
tion techniques, we distinguish between processing of high-level and low-level eye
tracking events for controlling the parameter values of image-processing tech-
niques as follows (Figure 10).
[Figure 10 labels: Sensor 0 ... Sensor n; Data Stream 0 ... Data Stream n; Processing
and Aggregation of Low-Level Sensor Data; Stream of Low-level + High-level Events;
Mapping of Low-Level + High-level Events to Parameter; Local + Global Processing
Parameters and State; Image Processing and Rendering]
Fig. 10. Overview of the processing pipeline supporting the mapping of low-level and
high-level events of multiple sensor streams to parameter values of image-processing
techniques.
Low-level Eye Tracking Events. Before being used to influence the image-abstraction
technique, the low-level eye tracking data is analyzed and converted. At first, the
raw callbacks of the Stream Engine API are collected and saved in batches. Then,
after a certain time frame, these batches are collectively processed. This way, the
raw callbacks are essentially underclocked to an interval set by the application.
This allows for more consistent and predictable behavior when implementing an
interaction technique.
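The batching of raw callbacks can be sketched as follows. The class and its method names are hypothetical and do not reflect the Stream Engine API; timestamps are assumed to be given in milliseconds, and the 50 ms interval is an arbitrary example value.

```python
class GazeBatcher:
    """Collects raw gaze callbacks and releases them in fixed-interval batches."""

    def __init__(self, interval_ms):
        self.interval_ms = interval_ms
        self._pending = []
        self._last_flush_ms = None

    def on_sample(self, timestamp_ms, x, y, valid):
        """Called once per raw callback; returns a batch when one is due, else None."""
        if self._last_flush_ms is None:
            self._last_flush_ms = timestamp_ms
        self._pending.append((timestamp_ms, x, y, valid))
        if timestamp_ms - self._last_flush_ms >= self.interval_ms:
            batch, self._pending = self._pending, []
            self._last_flush_ms = timestamp_ms
            return batch
        return None


batcher = GazeBatcher(interval_ms=50)
batch_sizes = []
for timestamp_ms in range(0, 101, 10):  # one simulated callback every 10 ms
    batch = batcher.on_sample(timestamp_ms, 0.5, 0.5, True)
    if batch is not None:
        batch_sizes.append(len(batch))
```

An interaction technique then only sees one batch per interval, regardless of how often the sensor reports.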
By default, a circle shape is painted onto the provided Gaze Movement Texture at
every valid detected gaze point (Section 4.2). The input gaze data is smoothed to
achieve a pleasant and non-erratic interaction. If the interaction technique desires
such behavior, the gaze point can also be utilized for custom processing. Each
interaction technique utilizes the Gaze Movement Texture in a different way, e.g.,
as a clipping mask. By default, the texture mask is reset once the user blinks in
all interaction techniques.
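The default handling of the gaze mask can be illustrated with a short sketch. The text does not state which smoothing filter is used, so exponential smoothing is an assumption here, as are the function names, the smoothing factor, and the representation of the texture as a nested list of floats.

```python
def smooth(previous, current, alpha=0.3):
    """Exponentially smooth a gaze point; alpha is an assumed tuning value."""
    if previous is None:
        return current
    return tuple(p + alpha * (c - p) for p, c in zip(previous, current))


def paint_circle(mask, cx, cy, radius):
    """Set all mask cells within `radius` of (cx, cy) to 1.0."""
    height, width = len(mask), len(mask[0])
    for y in range(max(0, int(cy - radius)), min(height, int(cy + radius) + 1)):
        for x in range(max(0, int(cx - radius)), min(width, int(cx + radius) + 1)):
            if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2:
                mask[y][x] = 1.0


def reset(mask):
    """Clear the mask, as is done by default when a blink is detected."""
    for row in mask:
        for x in range(len(row)):
            row[x] = 0.0


mask = [[0.0] * 5 for _ in range(5)]
gaze = smooth(None, (2.0, 2.0))        # the first sample is used unchanged
paint_circle(mask, gaze[0], gaze[1], radius=1)
painted_cells = sum(sum(row) for row in mask)
```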
High-level Eye Tracking Events. To detect whether the user has blinked or
not, a basic confidence model is used. For this, the number of gaze point callbacks
that judged their respective measurement as invalid is analyzed, since a high share
of invalid callbacks hints at the absence of a valid pair of eyes. Through this, the
stability of the blink detection is improved significantly, since even during stable
measurement periods, some raw invalid callbacks occur, which would produce erroneous
blink events without this confidence model.
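A confidence model of this kind can be sketched as a ratio test over one batch of callbacks. The function name and the threshold of 0.8 are assumptions for illustration; the text does not specify how the share of invalid callbacks is evaluated.

```python
def is_blink(batch_validity, invalid_ratio_threshold=0.8):
    """Report a blink when most callbacks in a batch were flagged invalid.

    batch_validity holds one boolean per raw callback in the batch.
    The default threshold is an assumed value.
    """
    if not batch_validity:
        return False
    invalid = sum(1 for valid in batch_validity if not valid)
    return invalid / len(batch_validity) >= invalid_ratio_threshold


stable_measurement = [True] * 9 + [False]   # a single stray invalid callback
closed_eyes = [False] * 9 + [True]
```

With these inputs, the stray invalid callback in `stable_measurement` does not trigger a blink, while `closed_eyes` does.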
Sustained gazes, i.e., fixations of the same point, are detected during gaze
point processing. For this, the Manhattan distance between the last n gaze points
is computed. A fixation is assumed if it does not exceed a certain threshold. As
soon as the current gaze point moves a sufficient amount and exceeds the distance
threshold, the fixation is finished and regular gaze point processing commences.
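The fixation test can be sketched as below. The text leaves open how the distance over the last n points is aggregated; this sketch assumes the sum of Manhattan distances between consecutive points. The class name, the window size, and the threshold are likewise hypothetical, with gaze coordinates assumed to be normalized to [0, 1].

```python
from collections import deque


class FixationDetector:
    def __init__(self, n=10, threshold=0.05):
        self.points = deque(maxlen=n)   # keeps only the last n gaze points
        self.threshold = threshold

    def update(self, x, y):
        """Add a gaze point; return True while the last n points form a fixation."""
        self.points.append((x, y))
        if len(self.points) < self.points.maxlen:
            return False
        pts = list(self.points)
        travelled = sum(abs(bx - ax) + abs(by - ay)
                        for (ax, ay), (bx, by) in zip(pts, pts[1:]))
        return travelled <= self.threshold


detector = FixationDetector(n=3, threshold=0.05)
states = [detector.update(x, y)
          for x, y in [(0.50, 0.50), (0.51, 0.50), (0.50, 0.51), (0.90, 0.90)]]
```

The third point completes a window of small movements and starts a fixation; the jump to (0.90, 0.90) ends it.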
It is possible to summarize high-level eye tracking events into patterns. Such
patterns can be defined by sequences of certain eye movements or blinks. With
these patterns, interaction techniques can be even more customized by adding
special behavior to intricate gaze patterns.
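One simple way to express such patterns is to compare the most recent high-level events against a fixed sequence. The event names and the function below are hypothetical; timing constraints between events, which a practical implementation would likely need, are omitted.

```python
def matches_pattern(events, pattern):
    """Return True if the most recent high-level events end with the given pattern."""
    if not pattern or len(events) < len(pattern):
        return False
    return list(events[-len(pattern):]) == list(pattern)


DOUBLE_BLINK = ("blink", "blink")                # hypothetical pattern definition
event_log = ["fixation", "blink", "blink"]
double_blink_detected = matches_pattern(event_log, DOUBLE_BLINK)
```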
6 Results and Discussion
This section describes exemplary applications (Section 6.1) of the proposed system,
evaluates the runtime performance of its prototypical implementation (Section 6.2),
discusses conceptual and technical challenges (Section 6.3), and outlines
future research ideas (Section 6.4).
6.1 Fundamental Application Scenarios
With the digital age and the widespread usage of smart phones capable of taking
high-quality pictures, creating, reproducing, and sharing digital visual media
such as images or videos has become incredibly frictionless. As low-cost mobile
data infrastructure has been developed in the majority of western countries,
sharing images and videos is becoming one of the most dominant components
of global bandwidth usage. In this mobile age, social networking apps such as
Instagram or Snapchat are centered around communication using images and
short videos; such digital visual content has become a major part of modern
digital life.
With this development, however, the consumption of visual media seems to
have become arbitrary. Creating and sharing such media has become so common
and casual that frequently expressing oneself this way is an almost mandatory
task in this social age, while older content is rarely looked at. This yields the
hypothesis that the frequency at which visual media content is created seems
more important than the represented content itself.
Therefore, to counter the loss of appreciation for visual media, the idea
arose of a system that provides users with a temporary visual and aesthetic
experience. Instead of having the users fully control the system, the
proposed approach aims to increase the value of the visual content through
instability and volatility. The power dynamic that usually exists between a user and
a technical device, in which the user can do almost everything with the tools they
are provided with, is reversed: the interactive
system takes charge and prompts the user to follow a certain behavior and, if
the user does not succeed, takes away from the “reward”, thereby nudging the
user into the pre-determined behavior. Furthermore, the volatile visual content
can increase the appreciation for the evolving artwork. In contrast to still im-
ages, the interaction allows the art to become more intriguing and provide more
enjoyment to observers.
Creating a unique interaction experience between the user and the system
that enables the user to take part in the creation of the presented, computer-
transformed visual content is the main goal. The approach aims to create a
volatile artwork. The interaction tries to be as frictionless and immersive as
possible through the use of sensors, such as eye tracking, while also having the
option to constrain the user to a certain behavior, e.g., not blinking for a set
amount of time, as a game-like mechanic in case it is desired by the chosen
interaction technique.
6.2 Runtime Performance
Test System. We tested the rendering performance of our implementation using an
NVIDIA GeForce GTX 1070 GPU with 4096 MB Video Random Access Memory
(VRAM) on an Intel Core i7-8700 CPU with 3.2 GHz and 32 GB RAM
running Windows 10.
Test Setup. The rendering is performed at a viewport resolution of 1280 × 720
pixels. The application runs in windowed mode with vertical synchronization turned
on. The spotlight mode comprises a Watercolor effect using a gaze mask and
a single input image. The shift-mode comprises a Pencil-hatching effect using
a gaze mask and two input images. The coloring mode comprises a Watercolor
effect using a gaze mask and two input images as well.
We measured three stages of our system: (i) image abstraction using the
gaze mask and the input images, (ii) compositing, and (iii) gaze point processing.
Stage (i) comprises (1) transferring the source image(s) to the
GPU, (2) running the image processing using one (spotlight mode) or two images (other
modes), and (3) transferring the processing result back to the CPU as well as filling
the source image buffer (i.e., loading the image from disk into memory). Stage (ii)
comprises the compositing of two or three images using differing
blending modes to achieve the desired effect. Stage (iii) draws new gaze points
into the respective mask. All input images are scaled in a preprocessing step to a
resolution of 1280 × 720 pixels.
Test Results. Table 2 shows the performance results in milliseconds. The sample
size is 10 for stage (i), 200 to 350 for stage (ii), and 700 to 1000 for stage (iii).
The run-time performance mainly depends on the number of processed
images that are used. While the Spotlight-Mode uses only one processed image
in addition to the gaze point mask, the Shift- and the Coloring-Mode use two
images. This difference can be observed in the image abstraction performance
(i): the processing of the abstract image(s) takes considerably longer in the Shift-
and Coloring-Modes than in the Spotlight-Mode. Most of the time can be
attributed to the transfer of data between GPU and CPU, while only a small amount
of time is used for the actual image abstraction on the GPU. The difference between the
modes also applies to the compositing stage (ii): it takes roughly 9 milliseconds
in the Shift- and the Coloring-Mode while it takes about 4.5 milliseconds in
the Spotlight-Mode. The gaze handling takes around 0.6 to 0.8 milliseconds on
average, independent of the interaction technique. Since the combined time of (ii)
compositing and (iii) gaze handling is almost always less than 16 milliseconds,
the proposed application is consistently interactive and achieves a frame rate of
more than 60 frames per second.
Table 2. Rendering performance results in milliseconds.

            Spotlight           Shift               Coloring            Gaze Handling
            (i)       (ii)      (i)       (ii)      (i)       (ii)      (iii)
Average     480.727   4.472     555.910   9.232     514.919   9.231     0.649
Median      478.693   4.426     560.967   9.198     517.438   9.197     0.886
Std. Dev.     8.267   0.088      12.621   0.130      12.137   0.134     0.364
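The interactivity claim can be checked with a short calculation over the averages in Table 2. Following the text, only compositing (ii) and gaze handling (iii) are counted per frame; treating 1000/60 ms as the frame budget is an assumption based on the stated 60 frames per second.

```python
# Average timings in milliseconds, copied from Table 2.
compositing_ms = {"Spotlight": 4.472, "Shift": 9.232, "Coloring": 9.231}
gaze_handling_ms = 0.649

frame_budget_ms = 1000.0 / 60.0   # about 16.67 ms per frame at 60 frames per second

per_frame_ms = {mode: t + gaze_handling_ms for mode, t in compositing_ms.items()}
within_budget = {mode: t < frame_budget_ms for mode, t in per_frame_ms.items()}
```

The combined averages are about 5.12 ms (Spotlight), 9.88 ms (Shift), and 9.88 ms (Coloring), all below the budget. Stage (i) is far above it, which is consistent with the lag reported during blink events.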
6.3 Observed Effects on Users
Using the proposed system architecture, it was possible to implement the pre-
sented concept. With the presented system components, i.e., the Tobii 4C Eye
Tracker, all interaction techniques are executed with an interactive frame and
response rate. During application run-time, minor lag (<1 s) can occur during
blink events, as described in Section 6.2, since images are pre-loaded from the
hard drive and the effect pipeline may be adapted to change the currently active
effect. However, this lag is less noticeable, as the user is blinking at that time.
During the development of the Spotlight-Mode (Section 4.3), an interesting
observation on the necessity of highly conscious eye movements occurred. In stan-
dard human visual behavior, the eye movement is determined by visual interest
(interesting patterns or objects) while a lot of information is already gathered
from peripheral vision. However, the proposed Spotlight-Mode requires users to
repeatedly look at different parts of a black canvas, going strongly against the
normal behavior of human eye movement. Even when the user has adapted to
the system behavior, the eye movements still feel unnatural. These repeated
conscious movements may even be perceived as exhausting when presented with
this mode for longer periods of time. Interestingly, conscious eye movements like
these are also used in relaxation exercises and even have been trialed for the
therapy of mental health conditions [23].
The experienced discomfort is significantly smaller for the other interaction
techniques since there is no necessity to look at a black canvas. Yet, triggering
direct effects in the presented interface based on eye-tracker input still feels
unfamiliar. This is most likely the case because everyday digital devices such
as consumer Personal Computers (PCs) and mobile devices operate with touch
or mouse/keyboard input, allowing the eyes to look at arbitrary points in the
interface without triggering any direct effects.
As most interaction techniques reuse certain interaction patterns, the user
forms a common understanding of the interaction principles of the system. With
time, the user associates interaction sequences with their respective meaning. For
example, blinking usually causes a change in the style of the displayed artwork
and a reset in progress. Additionally, the fundamental interaction of causing
direct change to the picture wherever the eye movement is directed towards is
highly intuitive and quickly learned, while also partially infuriating, since no
point in the artwork can be looked at without it transforming into something
6.4 Potential Future Research Directions
For future work, complementing the current sensor inputs with additional sensors
in order to facilitate interaction techniques which make use of various inputs
could be possible. For example, microphones measuring acoustic pressure or
reacting to voice commands, ambient light sensors that influence the colors in
the artwork, and wearables such as smart watches that transmit the heart rate of
the user could be implemented as complementary sensors. In addition, tracking
the head position could yield supplementary data that can be used to improve
the existing interaction techniques. For example, it could be used to reveal parts
of the image while the eye gaze transforms the picture even further, therefore
allowing a parallel interaction with two sensory inputs.
Instead of forcing the user into only one certain behavior that is triggered
by their gaze, it is possible to give them more control through the selection of
a “digital brush”. These brushes could influence the artwork in different ways,
acting as different manipulators regarding the current interaction technique.
Traditional paint brush strokes could also be implemented, giving the user the
real feel of painting a set picture. Brush attributes such as color, size, opacity or
even texture should be correlated to the sensor input. For this purpose, adding
additional sensors may allow for a fine-grained control. The brush selection may
pose a design dilemma since the current system only includes an eye tracker, for
which interface navigation is typically slow and inefficient. Additionally, undo-
and redo-functionality may also have to be considered when implementing these
brushes and in the interface design.
Finally, to allow for more dynamic shared resources the interaction technique
framework could be extended to enable more intricate interaction modes that
could even use 3rd-party APIs or libraries for additional sensor or miscellaneous
data. Additionally, productive use-cases in the medical domain, such as concen-
tration training and relaxation exercises, could be approached by extensions of
the presented approach. Specific interaction techniques that make the user fol-
low certain patterns with their eyes could imitate existing exercises and vision
therapies. The approach of gamification can be extended further for a museum
showcase. For this, the system could allow users to “color-in” artworks displayed
in a museum collection and, after completion, offer detailed information on the
artwork and its author. This approach could be implemented by museums to
allow for virtual museum tours or as an interactive exhibit.
7 Conclusions
This work reports on techniques for interactive control of image-abstraction techniques
using a suitable mapping of eye tracking sensor data. To this end, eye movement
and blinking events are mapped to global and local LOCs, which enables different
interaction techniques rooted in similar design principles. We found that
some interaction techniques, e.g., the spotlight-mode, put significant strain on
the user, while others, e.g., the coloring-mode, proved to be more relaxing. Over-
all, users liked the visual qualities and the temporary visual sensation that is
presented by the system. The system has proven to be interactive and usable
in real-time without significant processing delays. Such a system can represent
a basis for more advanced research w.r.t. recreational or medical applications.
References
1. Alonso, R., Causse, M., Vachon, F., Parise, R., Dehais, F., Terrier, P.: Evaluation
of head-free eye tracking as an input device for air traffic control. Ergonomics 56,
246–255 (02 2013).
2. Anapur, E., Fink, G.: Drawing with his eyes - graham fink in an interview (2017),
3. Besançon, L., Semmo, A., Biau, D., Frachet, B., Pineau, V., Ariali, E.H., Taouachi,
R., Isenberg, T., Dragicevic, P.: Reducing affective responses to surgical images
through color manipulation and stylization. In: Proceedings of the Joint Sympo-
sium on Computational Aesthetics, Sketch-Based Interfaces and Modeling, and
Non-Photorealistic Animation and Rendering (Expressive). pp. 11:1–11:13. ACM,
New York (2018).
4. Blascheck, T., Kurzhals, K., Raschke, M., Burch, M., Weiskopf, D., Ertl, T.: State-
of-the-art of visualization for eye tracking data. In: EuroVis (2014)
5. DiVerdi, S., Krishnaswamy, A., Mech, R., Ito, D.: Painting with polygons: A proce-
dural watercolor engine. IEEE Transactions on Visualization and Computer Graph-
ics 19, 723–735 (2013)
6. Dürschmid, T., Söchting, M., Semmo, A., Trapp, M., Döllner, J.: ProsumerFX:
Mobile design of image stylization components. In: SIGGRAPH Asia 2017 Mobile
Graphics & Interactive Applications. pp. 1:1–1:8. SA ’17, ACM, New York, NY,
USA (2017).,
7. Fink, G.: Drawing with my eyes (2015),
8. Gamma, E., Helm, R., Johnson, R., Vlissides, J.: Design Patterns: Elements of
Reusable Object-oriented Software. Addison-Wesley Longman Publishing Co., Inc.,
Boston, MA, USA (1995)
9. Isenberg, T.: Interactive npar: What type of tools should we create? In: Proceedings
of the Joint Symposium on Computational Aesthetics and Sketch Based Interfaces
and Modeling and Non-Photorealistic Animation and Rendering. pp. 89–96. Expre-
sive ’16, Eurographics Association, Aire-la-Ville, Switzerland, Switzerland (2016),
10. Kasprowski, P., Harezlak, K., Niezabitowski, M.: Eye movement tracking as
a new promising modality for human computer interaction. In: 2016 17th In-
ternational Carpathian Control Conference (ICCC). pp. 314–318 (May 2016).
11. Kiefer, P., Giannopoulos, I., Raubal, M., Duchowski, A.: Eye tracking for spatial
research: Cognition, computation, challenges. Spatial Cognition & Computation
17(1-2), 1–19 (2017).
12. Kiili, K., Ketamo, H., Kickmeier-Rust, M.: Evaluating the usefulness of
eye tracking in game-based learning. International Journal of Serious
Games 1(2) (Jun 2014)., http://journal.
13. Kyprianidis, J.E., Collomosse, J., Wang, T., Isenberg, T.: State of the ‘art’:
A taxonomy of artistic stylization techniques for images and video. IEEE
Transactions on Visualization and Computer Graphics 19(5), 866–885 (2013).
14. Majaranta, P., Bulling, A.: Eye Tracking and Eye-Based Human–Computer Inter-
action, pp. 39–65. Springer London, London (2014).
1-4471-6392-3 3
15. Mikami, S.: Molecular informatics. morphogenic substance via eye tracking. https:
// (1996)
16. Mishra, U.: Inventions on GUI for eye cursor controls systems. CoRR
abs/1404.6765 (2014),
17. Quian Quiroga, R., Pedreira, C.: How do we see art: An eye-tracker study. Frontiers
in Human Neuroscience 5, 98 (2011).
18. Quian Quiroga, R., Pedreira, C.: How do we see art: An eye-tracker study. Frontiers
in Human Neuroscience 5, 98 (2011).
19. Schwarz, M., Isenberg, T., Mason, K., Carpendale, S.: Modeling with ren-
dering primitives: An interactive non-photorealistic canvas. In: Proceed-
ings of the 5th International Symposium on Non-photorealistic Anima-
tion and Rendering. pp. 15–22. NPAR ’07, ACM, New York, NY, USA
20. Semmo, A., Dürschmid, T., Trapp, M., Klingbeil, M., Döllner, J., Pasewaldt, S.:
Interactive image filtering with multiple levels-of-control on mobile devices.
In: Proceedings SIGGRAPH ASIA Mobile Graphics and Interactive
Applications (MGIA). pp. 2:1–2:8. ACM, New York (2016).
21. Söchting, M., Trapp, M.: Controlling image-stylization techniques using eye
tracking. In: Chessa, M., Paljic, A., Braz, J. (eds.) Proceedings of the
15th International Joint Conference on Computer Vision, Imaging and Com-
puter Graphics Theory and Applications, VISIGRAPP 2020, Volume 2: HU-
CAPP, Valletta, Malta, February 27-29, 2020. pp. 25–34. SCITEPRESS (2020).
22. Vandoren, P., Van Laerhoven, T., Claesen, L., Taelman, J., Raymaek-
ers, C., Van Reeth, F.: Intupaint: Bridging the gap between physi-
cal and digital painting. In: 2008 3rd IEEE International Workshop on
Horizontal Interactive Human Computer Systems. pp. 65–72 (Oct 2008).
23. Vaughan, K., Armstrong, M.S., Gold, R., O’Connor, N., Jenneke, W., Tarrier,
N.: A trial of eye movement desensitization compared to image habituation
training and applied muscle relaxation in post-traumatic stress disorder.
Journal of Behavior Therapy and Experimental Psychiatry 25(4), 283–291 (1994).
24. Zhai, S., Morimoto, C., Ihde, S.: Manual and gaze input cascaded (magic)
pointing. In: Proceedings of the SIGCHI Conference on Human Factors in
Computing Systems. pp. 246–253. CHI ’99, ACM, New York, NY, USA
ResearchGate has not been able to resolve any citations for this publication.
Conference Paper
Full-text available
With the spread of smart phones capable of taking high-resolution photos and the development of high-speed mobile data infrastructure, digital visual media is becoming one of the most important forms of modern communication. With this development, however, also comes a devaluation of images as a media form with the focus becoming the frequency at which visual content is generated instead of the quality of the content. In this work, an interactive system using image-abstraction techniques and an eye tracking sensor is presented, which allows users to experience diverting and dynamic artworks that react to their eye movement. The underlying modular architecture enables a variety of different interaction techniques that share common design principles, making the interface as intuitive as possible. The resulting experience allows users to experience a game-like interaction in which they aim for a reward, the artwork, while being held under constraints, e.g., not blinking. The conscious eye movements that are required by some interaction techniques hint an interesting, possible future extension for this work into the field of relaxation exercises and concentration training.
Full-text available
Presentation of Research Paper "ProsumerFX : Mobile Design of Image Stylization Components"
Conference Paper
Full-text available
With the continuous advances of mobile graphics hardware, high-quality image stylization—e.g., based on image filtering, stroke-based rendering, and neural style transfer—is becoming feasible and increasingly used in casual creativity apps. The creative expression facilitated by these mobile apps, however, is typically limited with respect to the usage and application of pre-defined visual styles, which ultimately does not include their design and composition—an inherent requirement of prosumers. We present ProsumerFX, a GPU-based app that enables to interactively design parameterizable image stylization components on-device by reusing building blocks of image processing effects and pipelines. Furthermore, the presentation of the effects can be customized by modifying the icons, names, and order of parameters and presets. Thereby, the customized visual styles are defined as platform-independent effects and can be shared with other users via a web-based platform and database. Together with the presented mobile app, this system approach supports collaborative works for designing visual styles, including their rapid prototyping, A/B testing, publishing, and distribution. Thus, it satisfies the needs for creative expression of both professionals as well as the general public.
Conference Paper
Full-text available
With the continuous development of mobile graphics hardware, interactive high-quality image stylization based on nonlinear filtering is becoming feasible and increasingly used in casual creativity apps. However, these apps often only serve high-level controls to parameterize image filters and generally lack support for low-level (artistic) control, thus automating art creation rather than assisting it. This work presents a GPU-based framework that enables to parameterize image filters at three levels of control: (1) presets followed by (2) global parameter adjustments can be interactively refined by (3) complementary on-screen painting that operates within the filters' parameter spaces for local adjustments. The framework provides a modular XML-based effect scheme to effectively build complex image processing chains-using these interactive filters as building blocks-that can be efficiently processed on mobile devices. Thereby, global and local parameterizations are directed with higher-level algorithmic support to ease the interactive editing process, which is demonstrated by state-of-the-art stylization effects, such as oil paint filtering and watercolor rendering.
Full-text available
The challenge of educational game design is to develop solutions that please as many players as possible, but are still educationally effective. How learning happens in games is methodologically very challenging to point out and thus it is usually avoided. In this paper we tackle this challenge with eye tracking method. The aim of this research is to study the meaning of cognitive feedback in educational games and evaluate the usefulness of eye tracking method in game based learning research and game design. Based on perceptual data we evaluated the playing behavior of 43 Finnish and Austrian children aged from 7 to 16. Four different games were used as test-beds. The results indicated that players’ perception patterns varied a lot and some players even missed relevant information during playing. The results showed that extraneous elements should be eliminated from the game world in order to avoid incidental processing in crucial moments. Animated content easily grasps player’s attention, which may disturb learning activities. Especially low performers and inattentive players have difficulties in distinguishing important and irrelevant content and tend to stick to salient elements no matter of their importance for a task. However, it is not reasonable to exclude all extraneous elements because it decreases engagement and immersion. Thus, balancing of extraneous and crucial elements is essential. Overall, the results showed that eye tracking can provide important information from game based learning process and game designs. However, we have to be careful when interpreting the perceptual data, because we cannot be sure if the player understands everything that he or she is paying attention to. Thus, eye tracking should be complemented with offline methods like retrospective interview that was successfully used in this research.
Spatial information acquisition happens in large part through the visual sense. Studying visual attention and its connection to cognitive processes has been the interest of many research efforts in spatial cognition over the years. Recent technological developments have led to an increasing popularity of eye-tracking methodology for investigating research questions related to spatial cognition, geographic information science (GIScience) and cartography. At the same time, eye trackers can nowadays be used as an input device for (cognitively engineered) user interfaces to geographic information. We provide an overview of the most recent literature advancing and utilizing eye-tracking methodology in these fields, introduce the research articles in this Special Issue, and discuss challenges and opportunities for future research.
Conference Paper
It is well known that eye movement tracking may reveal information about human intentions. Therefore, it seems that it would be easy to use gaze pointing as a replacement for other traditional human computer interaction modalities like e.g. mouse or trackball, especially when there are more and more affordable eye trackers available. However, it occurs that gaze contingent interfaces are often experienced as difficult and tedious by users. There are multiple reasons of these difficulties. First of all eye tracking requires prior calibration, which is unnatural for users. Secondly, gaze continent interfaces suffer from a so called Midas Touch problem, because it is difficult to detect a moment when a user wants to click a button or any other object on a screen. Eye pointing is also not as precise and accurate as e.g. mouse pointing. The paper presents problems concerned with gaze contingent interfaces and compares the usage of gaze, mouse and touchpad during a very simple shooting game.
Eye tracking has a long history in medical and psychological research as a tool for recording and studying human visual behavior. Real-time gaze-based text entry can also be a powerful means of communication and control for people with physical disabilities. Following recent technological advances and the advent of affordable eye trackers, there is a growing interest in pervasive attention-aware systems and interfaces that have the potential to revolutionize mainstream human-technology interaction. In this chapter, we provide an introduction to the state of the art in eye-tracking technology and gaze estimation. We discuss the challenges involved in using a perceptual organ, the eye, as an input modality. Examples of real-life applications are reviewed, together with design solutions derived from research results. We also discuss how to match user requirements with the key features of different eye-tracking systems to find the best system for each task and application.
Eye tracking has become a valuable approach for evaluating visualization techniques in a user-centered design process. Rather than relying only on task accuracies and completion times, eye movements can additionally be recorded to later study the visual task-solution strategies and the cognitive workload of study participants. During an eye-tracking experiment, many data sets are recorded. Standard techniques for analyzing this eye-tracking data are heat map and scan path visualizations. However, it still takes considerable effort to analyze scan path trajectory data to find common task-solution strategies among the study participants. In this chapter we discuss three existing methodologies for analyzing the vast amount of eye-tracking data from a visualization and visual analytics perspective. These three approaches are a classical static visualization, visual analytics techniques, and finally a software prototype that helps the user manage, view, and analyze the recorded data in a simple, interactive way.
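The heat map analysis mentioned in the abstract above can be illustrated with a minimal sketch. It is not taken from the cited chapter: the function name `gaze_heatmap`, the `(x, y, duration)` fixation format, and the grid parameters are all assumptions chosen for illustration. The sketch bins fixations into a coarse grid and accumulates fixation duration per cell, which is one simple way such a map could be computed.

```python
def gaze_heatmap(fixations, width, height, cols, rows):
    """Accumulate fixation durations into a rows x cols grid.

    fixations: iterable of (x, y, duration) tuples in screen pixels and
               seconds (hypothetical format, assumed for this sketch).
    width, height: screen size in pixels.
    Returns a list of rows, each a list of summed durations per cell.
    """
    grid = [[0.0] * cols for _ in range(rows)]
    for x, y, duration in fixations:
        # Ignore samples that fall outside the screen area.
        if not (0 <= x < width and 0 <= y < height):
            continue
        # Map pixel coordinates to a grid cell; clamp to guard
        # against floating-point rounding at the far edge.
        col = min(int(x * cols / width), cols - 1)
        row = min(int(y * rows / height), rows - 1)
        grid[row][col] += duration
    return grid


# Example call with made-up fixation data on an 800 x 600 screen.
example = gaze_heatmap(
    [(100, 100, 0.2), (120, 90, 0.3), (700, 500, 0.5)],
    width=800, height=600, cols=4, rows=3,
)
```

Weighting each cell by duration rather than by fixation count emphasizes regions where the viewer lingered. A production tool would typically also smooth the grid before rendering it, which this sketch omits.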
Operating a GUI through eye movement is a complex mechanism and is not used as often as a mouse or trackball. But there are situations where eye-mouse devices can play a major role, especially where the user's hands are unavailable or busy with other activities. The difficulties of implementing an eye-cursor control system are many. The article illustrates some inventions in eye-cursor control systems that attempt to eliminate the difficulties of prior-art mechanisms.