Gaze-Guided Narratives: Adapting Audio Guide
Content to Gaze in Virtual and Real Environments
Tiany C.K. Kwok
Institute of Cartography and
Geoinformation, ETH Zürich
Zürich, Switzerland
ckwok@ethz.ch
Peter Kiefer
Institute of Cartography and
Geoinformation, ETH Zürich
Zürich, Switzerland
pekiefer@ethz.ch
Victor R. Schinazi
Chair of Cognitive Science,
ETH Zürich
Zürich, Switzerland
scvictor@ethz.ch
Benjamin Adams
Department of Geography,
University of Canterbury
Christchurch, New Zealand
benjamin.adams@canterbury.ac.nz
Martin Raubal
Institute of Cartography and
Geoinformation, ETH Zürich
Zürich, Switzerland
mraubal@ethz.ch
Figure 1: A tourist using the Gaze-Guided Narrative system from a vantage point (red: example fixation sequence with fixation locations (fixations 1–9)). The system provides gaze guidance when the user does not find the next object (fixation 3), and adapts the content to what has previously been looked at (fixation 6).
ABSTRACT
Exploring a city panorama from a vantage point is a popular tourist activity. Typical audio guides that support this activity are limited by their lack of responsiveness to user behavior and by the difficulty of matching audio descriptions to the panorama. These limitations can inhibit the acquisition of information and negatively affect user experience. This paper proposes Gaze-Guided Narratives as a novel interaction concept that helps tourists find specific features in the panorama (gaze guidance) while adapting the audio content to what has been previously looked at (content adaptation). Results from a controlled study in a virtual environment (n=60) revealed that a system featuring both gaze guidance and content adaptation obtained better user experience, lower cognitive load, and led to better performance in a mapping task compared to a classic audio guide. A second study with tourists situated at a vantage point (n=16) further demonstrated the feasibility of this approach in the real world.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
CHI 2019, May 4–9, 2019, Glasgow, Scotland, UK
© 2019 Association for Computing Machinery.
ACM ISBN 978-1-4503-5970-2/19/05...$15.00
https://doi.org/10.1145/3290605.3300721
CCS CONCEPTS
• Human-centered computing → HCI theory, concepts and models; Usability testing; Field studies;

KEYWORDS
Gaze-Guided Narratives; Outdoor Eye Tracking; Tourist Guide

ACM Reference Format:
Tiffany C.K. Kwok, Peter Kiefer, Victor R. Schinazi, Benjamin Adams, and Martin Raubal. 2019. Gaze-Guided Narratives: Adapting Audio Guide Content to Gaze in Virtual and Real Environments. In CHI Conference on Human Factors in Computing Systems Proceedings (CHI 2019), May 4–9, 2019, Glasgow, Scotland, UK. ACM, New York, NY, USA, 12 pages. https://doi.org/10.1145/3290605.3300721
CHI 2019 Paper
CHI 2019, May 4–9, 2019, Glasgow, Scotland, UK
Paper 491
Page 1
This is a pre-print of an article published in the ACM Digital Library. The final authenticated version is available online at: https://doi.org/10.1145/3290605.3300721
1 INTRODUCTION
Tourist destinations around the world have fascinating stories to tell and interesting facts to be revealed. Not surprisingly, tourism has always been among the early-adopting domains for novel HCI technologies [1, 5, 12, 18].
Audio guides are often used for telling stories about tourist destinations. Unlike visual displays, audio guides allow for an unobtrusive user experience by not distracting the tourist's visual attention from the real world (see also [17]). In addition, audio can also be enriched to create immersive experiences including interactive multi-narrative soundscapes [42], 3D audio [6], or audio augmented reality [22].
A particular challenge of using audio is that tourists must be able to map descriptions (e.g., "the tower on your left") to the real world. This type of information mapping can lead to misunderstandings (due to ambiguity, cultural biases, or different conceptualizations of spatial descriptions) and to tourists getting lost while exploring a city's panorama. Moreover, classic audio guides can only play at a specific pace and do not take into account the comprehension speed of their users. This lack of customization often generates confusion and can lead to cumbersome interactions that require stopping and rewinding the audio.
In this paper, we address these challenges with a gaze-based interaction concept called Gaze-Guided Narratives. Gaze-Guided Narratives adapt the audio content played to tourists exploring a panorama from a vantage point based on their real-time gaze. This technology is facilitated by recent progress in pervasive outdoor eye tracking [8, 19] that enables gaze-based interactions with objects in the wild [2] and developments in the smart glasses consumer market (e.g., the Pupil Labs eye tracking add-on for HoloLens [32]).
We believe that Gaze-Guided Narratives will help users find the reference objects (e.g., buildings) in a city panorama (gaze guidance), while adapting the audio content to objects that have been previously looked at (content adaptation). In a controlled lab study (n=60), we investigated the influence of gaze guidance and content adaptation on user experience, cognitive task load, content learning, and spatial learning for three different panoramas projected in a CAVE (Cave Automatic Virtual Environment). In a second study (n=16), we tested the Gaze-Guided Narratives interaction concept in the real world with tourists visiting a popular vantage point in Zürich, Switzerland.
Our contributions are:
- An implicit gaze-based interaction concept for audio guides, applied to touristic panorama views,
- An empirical evaluation of this interaction concept through a study in a controlled lab environment with 60 participants,
- A report of a real-world study demonstrating the feasibility of Gaze-Guided Narratives with 16 tourists in Zürich, Switzerland.
The paper is structured as follows. Section 2 positions our paper with regard to related work. In Section 3, we introduce the Gaze-Guided Narratives concept and its implementation. Sections 4 and 5 describe and present the results of the lab and real-world studies. In Section 6, we conclude the paper and provide an outlook on future work.
2 RELATED WORK
Tourist Guides
Researchers often use tourism as an opportunity to investigate context-aware pervasive or ubiquitous computing [1, 11]. Much of the early work in this field focused on location-based services for tourism including the creation of personalized city tours [40, 52]. Instead of guiding users through positioning technology, we investigate the interaction between users and a city panorama from a fixed location. Augmented reality (AR) applications have become a popular option for providing information about unknown locations in a city with the help of hand-held displays [35] or head-mounted displays [16]. However, researchers have highlighted the importance of unobtrusive interaction with nature [17]. Indeed, interacting with a display may separate users from the panorama and diminish their user experience (UX) by disconnecting them from the real world. Here, we focus on human-environment interactions without a display that allow users to stay connected with the real world as they engage with a task or activity.
Pervasive Eye Tracking and Eye-Based Interaction
Eye-based interaction is commonly used in HCI because the user's gaze can provide fast, natural, and intuitive ways to interact with different stimuli [36]. New developments in eye tracking technology have motivated researchers to work on pervasive eye tracking [8] in indoor [49] and outdoor [19] environments. Pervasive eye tracking has also become part of everyday life, found in gaze-controlled computer games [53] and fatigue detection systems in cars [15]. Pervasive eye tracking can be conducted with web cams [41], smartphone cameras [24], head-mounted eye trackers [26], with cameras installed in the environment [49], or below public displays [27]. Mobile eye trackers (such as the one used here) also enable hands-free interaction that allows users to focus on their surroundings. For example, Museum Guide 2.0 [55] enabled explicit gaze-based interaction with real 3D objects in an indoor space. Using Museum Guide 2.0, users could trigger events which introduced the objects they fixated on. Mobile eye tracking has also been used for studies on the way tourists explore (non-interactively) a city panorama [28]
as well as on the way they interact with a tourist map [44]. However, these systems did not allow users to interact directly with real 3D objects. To the best of our knowledge, there is no prior research on gaze-based interaction with a city panorama in the real world.
Dierent types of eye events, such as saccades, xations
[
31
], or smooth pursuits [
50
] can be used for eye-based inter-
action. Here, we use xation data in order to determine the
interaction. An interaction can be designed as either explicit
or implicit [
51
]. With explicit interaction, users intentionally
trigger the interaction. During implicit interaction, the sys-
tem interprets the user’s regular interaction behavior on a
higher level (e.g., in terms of activities [
9
]). We propose an
implicit gaze-based interaction concept that adapts content
based on previous visual attention and interprets user’s gaze
in a wrong position as a failing visual search.
Gaze Guidance
The system we propose takes the role of a guide that helps the user find objects in the environment. Different approaches for gaze guidance have been suggested. Krejtz et al. [29] demonstrated that verbal audio descriptions can be used to guide a person's gaze to a target. Gaze guidance has also been combined with vibrotactile feedback [25], visual feedback in AR [45], and non-verbal auditory feedback [34]. In addition, subconscious approaches to gaze guidance have been suggested that use image modulations in the user's periphery [4, 37]. In this research, we use verbal audio descriptions, but make them dependent on the user's current gaze position. Since interpretations of verbal descriptions of spatial scenes often depend on the context [13], we conducted a pre-study (see Section 3) to determine the parameters for our verbal gaze guidance.
Interactive Narratives and Spatial Narratives
The field of narratology has a history of investigating the structure of literary stories. Over the last decade, narratives and stories have increasingly been told through a variety of media. Indeed, the introduction of interactive narratives has led to the development of formal computational models of narrative [10]. Azaryahu and Foote [3] examined the way historical spaces can operate as a medium for telling spatial narratives. Looking at existing spatial narratives at historical touristic sites, these authors discuss the way the structure of narratives can vary from linear sequences in time and space to more complex non-linear configurations and configurations based on themes and sub-themes that operate over space and time. Although it is easier to construct linear (chronological) stories that maintain a dramatic arc, location-aware narrative systems can introduce non-linear structures depending on the amount of agency that users have to direct the storyline [14].
3 GAZE-GUIDED NARRATIVES
Concept
Traditional audio guides follow an information push paradigm. While many tourists appreciate the positive side of getting carefully selected content in a well-designed narrative sequence, traditional audio guides are somewhat inflexible. This can be problematic since the sequence of described objects is fixed and the audio speed often matches neither the speed of comprehension nor the activities taking place in the real world.
With the Gaze-Guided Narratives concept, we aim at diminishing some of the negative aspects of traditional audio guides. Gaze-Guided Narratives is based on two features:
1) Gaze Guidance: The system interactively helps the user find the objects of interest and only starts playing the content when the object has been found. Here, gaze guidance is provided together with directional cues (see Section 2) in the form of verbal audio descriptions such as "building A is far left of what you are looking at" (refer to Figure 1, fixation 3). By providing gaze guidance along with directional cues, we prevent a break between the content parts and the guidance parts of the system while limiting the engagement of additional modalities (e.g., haptics) during interaction.
2) Content Adaptation: The system ensures flexibility regarding the sequence of objects described while maintaining the qualities of a good narrative by adapting the content to previously inspected objects (measured with eye tracking; refer to Figure 1, fixation 6). Content adaptations are made based on previously modeled relations between objects. In the context of tourism, we consider temporal, spatial, and thematic relations. For instance, if building B has been looked at at some point before building A, the system might tell the user that A was built before B (temporal), that A is located left of B (spatial), or that A and B have the same architectural style (thematic). These kinds of adaptations should improve learning about the content and about the position of objects.
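The adaptation step can be pictured as a lookup over the modeled object relations. The following Python snippet is a minimal illustrative sketch only; the object names, relation data, and function are hypothetical and not taken from the paper's implementation:

```python
# Hypothetical relation store: (object looked at later, object seen earlier)
# mapped to candidate transition sentences by relation type.
RELATIONS = {
    ("building_a", "building_b"): [
        ("temporal", "Building A was built before Building B."),
        ("spatial", "Building A is located left of Building B."),
        ("thematic", "Buildings A and B share the same architectural style."),
    ],
}

def pick_intro(target, visited):
    """Return an adapted intro sentence if the target relates to a
    previously inspected object; otherwise fall back to a generic intro."""
    for prev in reversed(visited):  # prefer the most recently seen object
        for pair in ((target, prev), (prev, target)):
            if pair in RELATIONS:
                kind, sentence = RELATIONS[pair][0]  # take the first modeled relation
                return sentence
    return "Let me tell you about this building."

# Example: building B was inspected before building A.
print(pick_intro("building_a", ["building_b"]))
# -> Building A was built before Building B.
```

In the actual system, the choice between pre-recorded audio files with and without adaptation would replace the returned strings.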
Implementation
Mapping Gaze to Objects in the Panorama. The system must be able to determine which object (e.g., building) in the panorama is being looked at. Our implementation is based on the approach suggested by [2]. A reference image is manually annotated with Areas of Interest (AOIs) around buildings that are relevant for the tourist guide. The geometric information of those AOIs is stored in the JSON format [2]. Objects are detected by matching the front-facing video of a head-mounted eye tracker to the annotated reference image using an ORB [46] feature detector. Fixations are calculated with the I-DT method [48] (thresholds: dispersion 1°, minimum duration 200 ms). When a fixation in an AOI occurs, an event is triggered and processed by the interaction logics. Apart from being robust
and fast for detection, this approach has been designed for
stationary observers, such as tourists in our scenario.
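The I-DT step can be illustrated with a short sketch. This is a generic implementation of the dispersion-threshold algorithm [48] using the thresholds stated above (1° dispersion, 200 ms minimum duration), not the authors' code; gaze samples are assumed to arrive as (timestamp in seconds, x, y) in degrees of visual angle:

```python
def idt_fixations(samples, max_dispersion=1.0, min_duration=0.2):
    """I-DT fixation detection: a window covering min_duration is kept as a
    fixation while its dispersion (max-min in x plus max-min in y) stays
    below max_dispersion. Returns (t_start, t_end, center_x, center_y)."""
    fixations, i, n = [], 0, len(samples)
    while i < n:
        j = i
        # Grow the window until it spans at least the minimum duration.
        while j < n - 1 and samples[j][0] - samples[i][0] < min_duration:
            j += 1
        if samples[j][0] - samples[i][0] < min_duration:
            break  # not enough remaining samples for a fixation
        win = samples[i:j + 1]
        xs, ys = [s[1] for s in win], [s[2] for s in win]
        if (max(xs) - min(xs)) + (max(ys) - min(ys)) <= max_dispersion:
            # Extend the window while dispersion stays under the threshold.
            while j < n - 1:
                xs.append(samples[j + 1][1]); ys.append(samples[j + 1][2])
                if (max(xs) - min(xs)) + (max(ys) - min(ys)) > max_dispersion:
                    xs.pop(); ys.pop()
                    break
                j += 1
            fixations.append((samples[i][0], samples[j][0],
                              sum(xs) / len(xs), sum(ys) / len(ys)))
            i = j + 1
        else:
            i += 1  # slide the window start past this sample
    return fixations
```

A fixation event inside an annotated AOI polygon would then be forwarded to the interaction logics.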
Interaction Logics. Verbal directional instructions are provided by the system to help users identify buildings. Instructions are relative to the user's current gaze (i.e., left, far left, right, far right, in front of, behind of) and are based on the thresholds determined by a pre-study (described in the next sub-section). A minimum time interval between two consecutive instructions is needed in order to provide regular feedback to the user while ensuring that the user has sufficient time for interpreting and reacting to each instruction. We chose 4 seconds based on previous research on tactile navigation [43]. When the user successfully locates the object in the panorama, the system starts to play the audio information. Based on the history of interaction, the system then decides which audio file (i.e., information with or without content adaptation) to play. While the audio content is playing, the user can explore the view without interruption.
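A minimal sketch of this guidance loop, assuming the center-start horizontal thresholds from Table 2 (negative offsets meaning the target lies left of the current gaze, which is our reading of the sign convention) and the 4-second instruction interval; class and function names are our own, not from the paper's system:

```python
import time

# Illustrative thresholds in degrees of visual angle (Table 2, center start).
FAR_LEFT, FAR_RIGHT = -44.65, 45.11
MIN_INSTRUCTION_INTERVAL = 4.0  # seconds between consecutive instructions

def direction_word(offset_deg):
    """Map the horizontal gaze-to-target offset onto an instruction word."""
    if offset_deg < FAR_LEFT:
        return "far left"
    if offset_deg < 0:
        return "left"
    if offset_deg <= FAR_RIGHT:
        return "right"
    return "far right"

class GazeGuide:
    """Emits at most one verbal instruction every 4 seconds until a
    fixation lands inside the target AOI."""
    def __init__(self):
        self._last = float("-inf")

    def instruction(self, offset_deg, in_aoi, now=None):
        now = time.monotonic() if now is None else now
        if in_aoi:
            return "[sound effect: target found]"
        if now - self._last < MIN_INSTRUCTION_INTERVAL:
            return None  # rate-limit the feedback
        self._last = now
        return f"It is on the {direction_word(offset_deg)} of what you are looking at."
```

In use, the fixation events from the eye tracker would drive `instruction`, and the target-found branch would trigger playback of the (possibly adapted) audio file.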
Pre-study: Determining Parameters for Gaze Guidance. We performed a pre-study with 8 subjects in the CAVE to determine how participants interpret the verbal descriptions of directions used for gaze guidance when viewing a city panorama. Using the Zürich panorama as stimulus, participants were asked to fixate on a given starting location and then to fixate (head movements were allowed) on an ending location of their choice based on their interpretation of a directional instruction provided by the experimenter. Participants were tested on four directional instructions (left, far left, right, far right) with three starting locations (the center, the left, and the right of the city view). The vertical dimension (in front of, behind of) was not included in the pre-study. Table 1 presents the results for the saccade distances, which were then used to calculate the thresholds (mean plus one standard deviation) for the directional instructions used in the studies (see Table 2).
4 STUDY 1: CONTROLLED LAB STUDY
The goal of the first study was to evaluate the two main features (i.e., gaze guidance and content adaptation) of the Gaze-Guided Narratives system in relation to traditional audio guides. We designed four conditions in order to test the individual and interaction effects of these features (see Table 3). In condition A, the traditional audio guide serves as baseline. Conditions B and C include either the gaze guidance or the content adaptation feature. Condition D simultaneously provides both gaze guidance and content adaptation.
This study focuses on the following research questions:
RQ1: Does gaze guidance help participants identify and locate buildings?
RQ2: Do Gaze-Guided Narratives improve the acquisition of information?
RQ3: Do Gaze-Guided Narratives enhance system usability and reduce cognitive load?
RQ4: Do Gaze-Guided Narratives enhance user experience (UX)?
Methodology
Participants. Sixty participants (28 females) were recruited for the study. The age of the participants ranged from 18 to 55 years (M = 23.3, SD = 6.4). All participants were native German speakers and were not tourists at the time of the experiment. Participants were recruited via the DeSciL (Decision Science Lab) participant recruitment platform of ETH Zürich and were required to have normal or corrected-to-normal vision (with contact lenses) to participate.
Ethics statement. Written informed consent was obtained from all participants prior to starting the experiment. Participants were paid 30 CHF per hour and were told that they were allowed to abort the experiment at any time.
Materials. The experiment was conducted in a controlled CAVE environment with three large projection walls. A seat was placed at the center of the room at a distance of 1.8 meters from the front-facing projection wall. The lights were turned off during the experiment. Participants wore SMI Eye Tracking Glasses (120 Hz) and Sony MDR-ZX770BN Bluetooth headphones during the experiment. The software modules provided by the eye tracking vendor were used for calibration and for recording the front-facing video image frames and raw gaze data.
Table 1: Pre-study for gaze guidance: means and standard deviations of saccade distances in visual angle (degrees; negative values denote leftward saccades) for each directional instruction.

                     Directional Instruction
Starting Location    Far Left         Left             Right           Far Right
Left                 Not tested       −24.51 (6.95)    42.66 (25.22)   62.43 (21.56)
Center               −55.44 (14.64)   −39.45 (5.20)    38.76 (6.35)    58.47 (18.82)
Right                −61.53 (13.85)   −44.92 (17.00)   33.66 (7.57)    Not tested

Table 2: Thresholds (degrees of visual angle) used for directional instructions in gaze guidance.

                     Directional Instruction
Starting Location    Far Left    Left           Right         Far Right
Left                 Not used    < 0            0 to 67.88    > 67.88
Center               < −44.65    −44.65 to 0    0 to 45.11    > 45.11
Right                < −61.92    −61.92 to 0    > 0           Not used
Table 3: The four conditions considered in the studies (Study 1: all conditions; Study 2: A and D).

                                   Content Adaptation
Gaze Guidance    Without                           With
Without          Condition A:                      Condition B:
                 Classic audio guide (baseline)    Adaptive narrative
With             Condition C:                      Condition D:
                 Gaze-guiding audio guide          Gaze-Guided Narrative

Figure 2: Number of correctly answered content-related questions in study 1 (± standard error).
We used panoramas from three different cities in order to vary the spatial structure and the types of buildings. The panoramas included a view from the "Tokyo Tower" in Tokyo, the "Top of the Rock" in New York, and the "Lindenhof" in Zürich. In order to account for differences in preferences and experience, we selected different themes for the audio content of the Tokyo (leisure and religion), New York (filming locations of famous movies) and Zürich (architecture) panoramas.
For each panorama, we selected four different buildings as target tourist attractions. The AOIs around these buildings were defined by polygons bounding the buildings without buffer. To ensure better control of the experiment, the sequence of AOIs was fixed and randomly generated before the creation of the audio content. The content for the different AOIs was created based on the official web pages of each building (if available) and Wikipedia [57]. The content was chosen such that the audios for different AOIs were similar in length (approximately 1 minute). The scripts were originally written in English and translated into German. Audio files were generated by an online text-to-speech reader [38] and played through the headphones.
Participants completed the Santa Barbara Sense of Direction Scale (SBSODS) [21]. The SBSODS is a self-report measure of sense of direction ranging from zero (low) to seven (high). In order to measure cognitive load and system usability, participants completed the NASA Task Load Index (NASA-TLX) [20] and the System Usability Scale (SUS) [7], respectively. The NASA-TLX and SUS scores were calculated on a scale ranging from zero to one hundred. Participants were also asked to complete a 6-factor User Experience Questionnaire (UEQ, [33]). This questionnaire measured attractiveness (i.e., overall impression of the system), perspicuity (i.e., how easy it is to get familiar with the system), efficiency, dependability (i.e., whether users feel in control during the interaction), stimulation (i.e., whether the user is excited and motivated to use the product) and novelty (i.e., the level of creativeness). Items in the UEQ range from −3 to +3, with negative ratings representing a negative user experience.
At the end of each trial, participants completed a questionnaire with one multiple-choice (e.g., "According to the audio, which of the following is the newest building?") and six "true/false or unknown" (e.g., "Does Roppongi Hills Mori Tower have a museum in it?") content-related questions specific to each panorama. Participants also completed a questionnaire (7-point Likert scale each) about their familiarity with the panorama and overall system helpfulness. Specifically, participants were asked whether the system was helpful in identifying the buildings, about their familiarity with the cities presented in the experiment, their familiarity with the panorama they interacted with during the experiment, and their familiarity with the content presented by the audio guide. All questionnaires were completed with paper and pencil.
Procedure. Upon arrival, participants were given an information sheet that described the experiment and were asked to sign the consent form. Participants then completed a short demographic questionnaire and the SBSODS before being asked to sit on the chair at the center of the CAVE. At this stage, the experimenter helped participants to adjust the eye tracker and headphones to their head. The eye tracker was 3-point calibrated at the beginning of each trial.
On each trial, the system started by familiarizing the participant with the sound effects for the audio transitions and the target buildings. For all conditions, the system first directed the participant's gaze to the target building with one initial instruction (see row "a" in Table 4). When transitioning from one building to another, exactly one description of the location relative to the previous building was provided (see row "d" in Table 4). Only the system with gaze guidance provided additional verbal feedback until the participant successfully located the building (refer to rows "b" and "e" in Table 4).
At the end of each trial, participants completed the UEQ, the content-related questions, and the questionnaire about familiarity and system helpfulness. In addition, participants were asked to draw a sketch map within an empty drawing canvas on a tablet. The canvas had the same resolution as the image of the target environment. Subjects had to draw rectangles based on their memory for building locations. A
Table 4: Example of the audio provided (with gaze guidance) and user actions for the Tokyo panorama.

a. System: "Now you can spend some time to look around. In the following part, you will hear information about the panorama of Tokyo... [...]... Now please look straight ahead. Zojoji Temple is on your right hand side."
   User: Explores the panorama freely and then tries to locate Zojoji Temple.
b. System: "It is on the right of what you are looking at." ... [more gaze guidance until AOI found]
   User: Searches and finds Zojoji Temple.
c. System: [sound effect for target found] "Zojoji Temple..." [content]
   User: Listens to the content.
d. System: "We have now finished talking about Zojoji Temple. Now please look at Reiyukai Temple. You will find it to the far left of the Zojoji Temple."
   User: Follows the instruction and searches for the next building.
e. System: "It is on the right of what you are looking at." ... [more gaze guidance until AOI found]
   User: Searches and finds Reiyukai Temple.
f. System: ...
   User: Continues to use the system.
g. System: "This is the end of this part, please fill in the questionnaire provided."
   User: Fills in the post-session questionnaire.
list with building names was provided during the sketch map task. After the last trial, participants filled in the NASA-TLX and the SUS questionnaires.
Study Design and Analysis. We adopted a between-subjects design with participants randomly assigned to one of the four experimental conditions. Each participant completed three trials (one for each city panorama) in a pseudo-random order, leading to 180 trials in total. The experiment required approximately 45 minutes to complete. All statistical analyses were performed with SPSS.
Measurements. We were interested in how long it takes for participants to identify a building after the audio of the previous building has finished. The commonly used eye tracking measure time to first fixation [23] is not suitable in this case because the participant might have a random fixation on the building while still performing the search. Instead, we attempted to find the first fixation on the target AOI which falls into a phase of focal attention. We distinguished phases of focal and ambient attention using coefficient K [30], which is calculated by subtracting the standardized (z-score) amplitude of the subsequent saccade from the standardized fixation duration. The Time to Object Identification (TTOI) was defined as the time between the end of the last instruction ("d" in Table 4) and the first fixation on the target AOI in a phase of focal attention.
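Coefficient K can be computed per fixation as follows. This is a minimal sketch of Krejtz et al.'s definition (z-scored fixation duration minus z-scored amplitude of the following saccade), using population z-scores for brevity:

```python
def coefficient_k(durations, amplitudes):
    """Coefficient K per fixation: z(fixation duration) - z(amplitude of
    the subsequent saccade). K > 0 suggests focal attention (long fixations,
    short saccades); K < 0 suggests ambient viewing. `durations[i]` pairs
    with `amplitudes[i]`, the saccade following fixation i."""
    def z(values):
        m = sum(values) / len(values)
        sd = (sum((v - m) ** 2 for v in values) / len(values)) ** 0.5
        return [(v - m) / sd if sd else 0.0 for v in values]
    zd, za = z(durations), z(amplitudes)
    return [d - a for d, a in zip(zd, za)]
```

In the TTOI analysis, the first fixation on the target AOI with a positive K value would count as the moment of object identification.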
Sketch maps were analyzed using bidimensional regression (BDR) [54]. BDR quantifies the relationship between two sets of coordinates. The returned squared regression coefficient R² can be used as a similarity measure between the 2D configuration on the sketch map and that of the panorama image.
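A minimal sketch of Euclidean bidimensional regression, assuming the sketch-map and panorama building locations are given as equally ordered (x, y) lists; it fits a translation plus scaled rotation and returns the squared regression coefficient R² (the Euclidean variant of Tobler's BDR, which we assume here; the paper does not state which variant was used):

```python
def bdr_r2(sketch, truth):
    """Euclidean bidimensional regression: fit truth ~ translation +
    scaled rotation of sketch, then R^2 = 1 - SSE/SST over both
    coordinates jointly."""
    n = len(sketch)
    mx = sum(p[0] for p in sketch) / n; my = sum(p[1] for p in sketch) / n
    mu = sum(p[0] for p in truth) / n;  mv = sum(p[1] for p in truth) / n
    # Center both configurations, then solve the closed-form least squares
    # for b1, b2 in: u = b1*x - b2*y, v = b2*x + b1*y.
    sxx = sum((p[0] - mx) ** 2 + (p[1] - my) ** 2 for p in sketch)
    b1 = sum((p[0] - mx) * (q[0] - mu) + (p[1] - my) * (q[1] - mv)
             for p, q in zip(sketch, truth)) / sxx
    b2 = sum((p[0] - mx) * (q[1] - mv) - (p[1] - my) * (q[0] - mu)
             for p, q in zip(sketch, truth)) / sxx
    sse = sum((b1 * (p[0] - mx) - b2 * (p[1] - my) - (q[0] - mu)) ** 2 +
              (b2 * (p[0] - mx) + b1 * (p[1] - my) - (q[1] - mv)) ** 2
              for p, q in zip(sketch, truth))
    sst = sum((q[0] - mu) ** 2 + (q[1] - mv) ** 2 for q in truth)
    return 1 - sse / sst
```

A sketch map that is merely shifted, rotated, and uniformly scaled relative to the panorama yields R² = 1; distortions of the configuration reduce R².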
Results
Familiarity and Sense of Direction. Participants were more familiar with the Zürich panorama and its contents played through the audio guide compared to New York and Tokyo. A one-way repeated-measures ANOVA revealed significant differences in the level of familiarity with the three panoramas (F(2,112) = 232.630, p < .001). Post-hoc tests with Bonferroni correction further indicated that participants were more familiar with the Zürich panorama (6.9 ± 0.8) compared to the New York (3.4 ± 1.9, p < .001) and the Tokyo (1.6 ± 1.4, p < .001) panoramas. Participants were also more familiar with the New York panorama compared to Tokyo (p < .001). A one-way repeated-measures ANOVA also revealed significant differences in familiarity with the audio contents of the different panoramas (F(2,112) = 55.839, p < .001). Here again, post-hoc tests with Bonferroni correction showed that participants were more familiar with Zürich (3.3 ± 1.5) than New York (2.6 ± 1.5, p = .007) and Tokyo (1.4 ± 0.9, p < .001). Participants were also more familiar with the audio contents of New York compared to the contents of Tokyo (p < .001).
With regard to sense of direction, the mean SBSODS score for all participants was 4.81 (SD = 0.83). Results from a one-way ANOVA revealed no significant differences in the self-reported sense of direction for participants in the four experimental conditions (F(3,56) = .492, p = .689).
Content-Related estions. After each trial, participants
answered seven content-related questions. The total number
of correctly answered questions in each trial was counted
(see Figure 2). We conducted a 3 (city) x 4 (condition) mixed
factorial ANOVA on the number of total correct answers
across trials. Results revealed a signicant main eect for
city
(F2,112 =
18
.
26
,p< .
001
)
. Pairwise comparisons with
Bonferroni correction further revealed that the average num-
ber of correct answers in Zürich (3.1
±
1.4) and New York
(3.6
±
1.6) was lower compared to Tokyo (4.6
±
1.6). There
were no other main eects or interactions (p> .59).
Sketch Maps. A 3 (city) x 4 (condition) mixed factorial ANOVA on the results of the BDR (see an example of a sketch map in Figure 3) revealed a significant main effect of city (F(2,112) = 28.793, p < .001) and condition (F(3,56) = 3.852, p < .014). In addition, there was a significant interaction between city and condition (F(6,112) = 3.633, p = .015). Additional pairwise contrasts with Bonferroni correction revealed that these differences were driven by the interaction between condition A (0.438 ± 0.416) and D (0.829 ± 0.310) for the Tokyo panorama (p = .036).
User Experience Ratings. The UEQ results are illustrated in Figure 4. In general, a trend can be observed in which the conditions with gaze guidance (C and D) were rated higher compared to the conditions without gaze guidance (A and B). We conducted a 3 (city) x 4 (condition) mixed ANOVA for each of the six factors in the UEQ questionnaire. Results revealed a significant main effect for city in terms of attractiveness (F(2,112) = 4.185, p = .018), perspicuity (F(1.81,101.47) = 4.185, p < .001), efficiency (F(2,112) = 7.331, p = .001), dependability (F(2,112) = 17.240, p < .001) and novelty (F(1.70,95.34) = 6.876, p = .002). The between-subjects test revealed a significant main effect of condition for perspicuity (F(3,56) = 5.013, p = .004) and dependability (F(3,56) = 4.070, p = .011). Post-hoc tests with Bonferroni correction indicated that condition D had higher scores in perspicuity compared to conditions A (p = .033) and B (p = .037). Condition D also had higher scores in controllability when compared to condition B (p = .024). Results of the ANOVA also revealed a significant condition by city interaction for efficiency (F(6,112) = 2.208, p = .047) and novelty (F(5.11,95.34) = 2.364, p = .035). No significant differences were found in the post-hoc test analysis with Bonferroni correction.
System Usability and Cognitive Load. A one-way ANOVA comparing the four conditions in terms of system usability (SUS) revealed no significant differences between conditions (see Figure 5). With regards to cognitive load (NASA-TLX), results of a one-way ANOVA revealed a significant difference between conditions (F3,56 = 3.752, p = .016). Post-hoc tests with Bonferroni correction revealed that condition A had higher scores in cognitive load compared to condition D (p = .035). No significant differences were found between the other pairs of conditions.
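The SUS scores reported here and in study 2 follow Brooke's standard scoring scheme [7]: ten 1-5 Likert items, with positively worded odd items and negatively worded even items, mapped to a 0-100 scale. A brief illustration (function name ours):

```python
def sus_score(responses):
    """Standard SUS scoring (Brooke, 1996): ten 1-5 Likert responses
    converted to a 0-100 usability score.  Odd-numbered items are
    positively worded (contribution = response - 1); even-numbered
    items are negatively worded (contribution = 5 - response)."""
    assert len(responses) == 10 and all(1 <= r <= 5 for r in responses)
    total = sum((r - 1) if i % 2 == 0 else (5 - r)
                for i, r in enumerate(responses))
    return total * 2.5

# Best possible answers (5 on odd items, 1 on even items) score 100
print(sus_score([5, 1, 5, 1, 5, 1, 5, 1, 5, 1]))  # 100.0
```

This mapping explains how a single dissatisfied participant (see the SUS outlier discussed in study 2) can pull a condition's mean down substantially.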
Helpfulness in Identifying Building. Participants were asked in the post-trial questionnaire whether the system helped them to identify the buildings they were looking for. Results (see Figure 6) from a 3 (city) x 4 (condition) mixed ANOVA revealed a main effect of city (F2,112 = 35.049, p < .001) and condition (F3,56 = 9.481, p < .001). Results also revealed a significant condition by city interaction (F6,112 = 5.310, p < .001). Post-hoc tests with Bonferroni correction further indicated significant differences in the Tokyo panorama between conditions A and C (p < .001), conditions A and D (p < .001), conditions B and C (p < .001), and conditions B and D (p = .023). Significant pairwise differences were also found for the New York panorama between conditions A and C (p = .014), and conditions A and D (p = .045). No significant differences between conditions were found for the Zürich panorama.
Figure 3: A sketch map drawn by a participant in study 1 (condition D, Zürich).
Average Time to Object Identification. Figure 7 presents the average TTOI for the different conditions and cities. In general, conditions with gaze guidance tended to have longer TTOI than conditions without gaze guidance (i.e., condition C vs. A and D vs. B). Likewise, conditions with content adaptation tended to have longer TTOI than those without content adaptation (i.e., condition B vs. A and D vs. C).
A 3 (city) x 4 (condition) mixed ANOVA revealed a main effect of city (F2,112 = 11.799, p < .001) and condition (F3,56 = 3.572, p = .020). Additional pairwise contrasts with Bonferroni correction revealed significant differences between New York (13.94 s ± 1.232 s) and Zürich (9.163 s ± 0.879 s, p < .001), and between New York and Tokyo (9.97 s ± 0.95 s, p = .002). Significant pairwise differences were also found between conditions D (14.94 s ± 1.67 s) and A (7.40 s ± 1.67 s, p = .004).
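Computing TTOI from raw gaze data presupposes that the gaze samples have first been aggregated into fixations. A common approach, cited in this paper's references [48], is dispersion-threshold identification (I-DT). The following is a minimal sketch of that idea; the thresholds and data are illustrative and are not the values or pipeline used in the study:

```python
def _dispersion(win):
    # horizontal plus vertical spread of the gaze points in a window
    xs = [s[1] for s in win]
    ys = [s[2] for s in win]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def idt_fixations(samples, dispersion_px=35, min_duration_ms=100):
    """Dispersion-threshold (I-DT) fixation detection, sketched in
    pure Python.  `samples` is a list of (t_ms, x, y) gaze samples.
    Returns fixations as (t_start, t_end, centroid_x, centroid_y)."""
    fixations, i, n = [], 0, len(samples)
    while i < n:
        # initial window covering the minimum fixation duration
        j = i
        while j < n and samples[j][0] - samples[i][0] < min_duration_ms:
            j += 1
        if j >= n:
            break
        if _dispersion(samples[i:j + 1]) <= dispersion_px:
            # grow the window until the dispersion threshold is exceeded
            while j + 1 < n and _dispersion(samples[i:j + 2]) <= dispersion_px:
                j += 1
            win = samples[i:j + 1]
            cx = sum(s[1] for s in win) / len(win)
            cy = sum(s[2] for s in win) / len(win)
            fixations.append((win[0][0], win[-1][0], cx, cy))
            i = j + 1
        else:
            i += 1
    return fixations

# Synthetic 100 Hz gaze: a fixation, a saccade, a second fixation
gaze = [(t, 100.0, 100.0) for t in range(0, 310, 10)]
gaze += [(310, 200.0, 130.0), (320, 250.0, 160.0),
         (330, 300.0, 180.0), (340, 350.0, 190.0)]
gaze += [(t, 400.0, 200.0) for t in range(350, 660, 10)]
print(len(idt_fixations(gaze)))  # 2
```

Once fixations are available, TTOI can be read off as the time from the onset of an audio description to the first fixation landing inside the target object's area of interest.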
Discussion
Results from the experiment provide interesting insights into the interaction with Gaze-Guided Narratives. Critically, we found that:
RQ1: Does gaze guidance help participants identify and locate buildings? As expected, the conditions with gaze guidance (C and D) were rated as more helpful than conditions without (A and B). This result indicates that our approach to gaze guidance was accepted by users. Surprisingly, these effects were not apparent for the Zürich panorama.
Results of the eye tracking/TTOI analysis provide a more comprehensive understanding of the helpfulness of gaze guidance. Given that conditions with gaze guidance (C and D) require additional time for providing directional instructions, we expected longer TTOI when compared to conditions without gaze guidance (A and B). Findings were consistent with this expectation and revealed that it took approximately 7 seconds longer to identify the object in condition D compared to condition A.
Overall, gaze guidance appears to be more helpful in guiding users to identify buildings, albeit slower. In the context of tourism, we believe that this delay is acceptable, as tourists exploring a panorama are typically not under time pressure.
Figure 4: Comparison of UEQ factors between different conditions. Error bars indicate standard error, * significance.
RQ2: Do Gaze-Guided Narratives improve the acquisition of information? Results from this study show that the number of correctly answered content-related questions did not significantly differ between conditions. Interestingly, we found differences in correct answers between cities. Here, participants were better at answering these questions for Tokyo compared to New York and Zürich. We believe that these differences may be related to participants paying more attention to panoramas they were the least familiar with.
Results for the sketch map task revealed that participants were more accurate in drawing a sketch map of Zürich, followed by New York and then by Tokyo. This result was expected given their familiarity with these cities. However, sketch map accuracy was also significantly higher for condition D than condition A for the Tokyo panorama. This result is particularly interesting since it shows that Gaze-Guided Narratives is capable of assisting participants in the acquisition of spatial knowledge in unfamiliar cities.
Taken together, the results for the content-related questions and sketch maps suggest that Gaze-Guided Narratives can be particularly helpful in unfamiliar panoramas, which is a common case in tourism.
RQ3: Do Gaze-Guided Narratives enhance system usability and reduce cognitive load? With gaze guidance, users were expected to have a lower workload during search and more attention available for listening to the audio guides. System usability was thus expected to improve by adding gaze guidance. Surprisingly, results revealed no significant differences in system usability between the four conditions. One reason for this may be that most of the participants complained that the text-to-speech voice used in the audio was unnatural. Since all conditions adopted the same text-to-speech voice, we suspect that negative effects caused by this limitation may counteract the potential positive effects of gaze guidance and content adaptation. In order to further investigate this issue, we used a native speaker to record the audio for the study in the real world (study 2, see section 5).
Although gaze guidance and content adaptation did not bring significant improvements in terms of system usability, they were capable of reducing cognitive load when both features were present (condition D). This is an important finding since low cognitive load leaves free capacity that can be allocated to other tasks, including the acquisition of information (see above, RQ2). Moreover, low cognitive workload means less fatigue and may allow tourists to stay attentive for longer periods of time.
RQ4: Do Gaze-Guided Narratives enhance user experience (UX)? Results revealed that condition D was easier to become familiar with than conditions A and B (perspicuity), and that it provided more controllability than condition B (dependability). The result on dependability was expected because gaze guidance does not overwhelm the user by pushing information at an uncontrolled speed.
A general trend can be observed in which condition D obtained higher scores compared to the baseline (condition A) in all six factors of the UEQ. However, no significant differences were found between the conditions in terms of attractiveness, efficiency, stimulation and novelty. Here again, this may be related to the quality of the text-to-speech voice diminishing the positive effects brought on by the gaze guidance and content adaptation features of the system.
5 STUDY 2: REAL WORLD STUDY
We performed a study with tourists in order to investigate
the feasibility of Gaze-Guided Narratives while exploring a
city panorama in the real world (RQ5). In this experiment
we focused solely on the classic audio guide (condition A)
and the full Gaze-Guided Narratives system (condition D).
Methodology
Participants. Sixteen participants (8 females, M = 30.1, SD = 6.5, range 22-44) with normal or corrected-to-normal vision (with contact lenses) were recruited. All participants were native Chinese-speaking tourists who were visiting Zürich for the first time and were not familiar with the testing site prior to the experiment. The average time spent by participants in Zürich prior to testing was 2.1 days (SD = 1.4; range 1-6).
Figure 5: Average SUS and NASA-TLX scores in different conditions (± standard error).
Figure 6: Average scores of helpfulness in identifying building in study 1 (± standard error), * significance.
Figure 7: Average TTOI in different conditions and cities.
The decision to switch languages between studies 1 and 2 was based on a 5-day observation at the experiment location, which revealed that native Chinese speakers made up a large proportion of tourists arriving in small groups without a tourist guide.
Study Setup. The experiment was conducted at the Lindenhof in Zürich, which provided participants with the same view as used in the Zürich stimulus (panorama) in study 1. Study 2 also used the same hardware and software for calibration and data collection as study 1.
Materials. All materials from study 1 relevant for the Zürich panorama were translated into Chinese. Based on the feedback we received in study 1, the audio guides were recorded by a Chinese native speaker.
Procedure. Participants were recruited on site by the experimenter before they had a chance to inspect the vantage point. Potential participants were greeted and given a short explanation regarding the aim and procedure of the experiment. All participants completed the informed consent form prior to the start of the experiment. They were paid 20 CHF and were allowed to abort the experiment at any time.
The experiment procedure was similar to study 1 (see section 4) except that participants completed only one trial. In addition, the pre-study questionnaire included a question about the length of stay in Zürich, and the post-study questionnaire contained an additional question on whether the participants were interested in the content they listened to.
Study Design, Analysis and Measurements. We adopted a between-subjects design with participants randomly assigned to either condition A or D. Participants in both groups were exposed to the real-world panorama and required approximately 25 minutes to complete the experiment. All statistical analyses were performed with SPSS. Similar to study 1, participants completed the SBSODS, SUS and NASA-TLX. Participants also answered content questions and familiarity questions, and were asked to draw a sketch map. Sketch maps were analyzed using bidimensional regression, and the average TTOI was calculated from the gaze data.
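Bidimensional regression [54] compares two 2-D point configurations (here, landmark positions in a sketch map versus their true positions) while discounting translation, rotation, and uniform scaling. A minimal sketch of the Euclidean variant, written by us for illustration (it treats points as complex numbers; the study's actual analysis tooling is not specified here):

```python
import cmath

def bidimensional_r2(true_pts, sketch_pts):
    """Euclidean bidimensional regression R^2 (after Tobler, 1994).
    Fits the similarity transform w = alpha + beta * z by least
    squares over complex numbers; R^2 = 1 means the sketch matches
    the truth up to translation, rotation and uniform scaling."""
    z = [complex(x, y) for x, y in true_pts]
    w = [complex(u, v) for u, v in sketch_pts]
    n = len(z)
    zm, wm = sum(z) / n, sum(w) / n
    zc = [a - zm for a in z]          # centered configurations
    wc = [b - wm for b in w]
    # squared correlation between the two centered configurations
    num = abs(sum(a.conjugate() * b for a, b in zip(zc, wc))) ** 2
    den = sum(abs(a) ** 2 for a in zc) * sum(abs(b) ** 2 for b in wc)
    return num / den

# A sketch map that is a rotated, scaled, shifted copy of the truth
# scores R^2 = 1; distorting a landmark lowers the score.
true_pts = [(0, 0), (4, 0), (4, 3), (0, 3), (2, 5)]
rot = cmath.rect(1.7, 0.6)            # scale 1.7, rotate 0.6 rad
perfect = [(c.real, c.imag) for c in
           (rot * complex(x, y) + complex(10, -2) for x, y in true_pts)]
print(round(bidimensional_r2(true_pts, perfect), 3))  # 1.0
```

The R² values reported in the results (e.g., Table 6) can be read on this scale: values near 1 indicate that the drawn configuration preserves the spatial layout of the panorama.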
Results
Familiarity and Sense of Direction. Separate one-way ANOVAs were conducted to investigate differences between participants in conditions A and D in terms of their familiarity with the city of Zürich, their familiarity with the contents of the view, their level of interest in the contents of the view, and their self-reported sense of direction (SBSODS). Results revealed that participants in these two conditions did not differ in terms of familiarity with the city of Zürich (F1,14 = 1.960, p = .183), familiarity with the contents of the view (F1,14 = 2.483, p = .137), interest in the contents of the view (F1,14 = .360, p = .558), or self-reported sense of direction (F1,14 = .381, p = .547).
User Experience Ratings. Table 5 presents results of the
UEQ questionnaire. In general, a trend can be observed in
which condition D obtains higher scores than condition A
in all six user experience factors.
System Usability and Cognitive Load. The means and standard deviations of the SUS and NASA-TLX scores are presented in Table 6. The average SUS score for condition D (M = 82.81, SD = 12.06) is more than 15 points higher than that of condition A (M = 67.19, SD = 30.46). The large standard deviation in the average SUS scores for condition A is related to an outlier who rated the system with a SUS score of 10.
Content-Related Questions and Sketch Map. Similar to study 1, participants answered seven content-related questions and had to draw a sketch map without looking at the panorama. The number of correctly answered content-related questions and the R² values returned by the BDR are presented in Table 6. Participants in condition D answered more questions correctly (4.13) and drew more accurate sketch maps (R²: 0.96) compared to participants in condition A (2.86 correct questions; R²: 0.88).

Table 5: Mean scores and standard deviations of UEQ factors in study 2 (-3/+3 = neg./pos. experience)

                Condition A    Condition D
                M (SD)         M (SD)
Attractiveness  1.46 (0.77)    2.17 (0.87)
Perspicuity     1.72 (1.08)    2.19 (1.16)
Efficiency      1.03 (0.69)    1.88 (0.76)
Dependability   1.41 (1.03)    1.88 (1.16)
Stimulation     0.84 (0.74)    1.91 (0.95)
Novelty         0.94 (1.18)    2.19 (0.97)

Table 6: Means and standard deviations of SUS (0-100), NASA-TLX (0-100), number of correct answers (0-7), BDR (R², 0-1) and TTOI (in sec) in study 2

                           Condition A     Condition D
                           M (SD)          M (SD)
SUS                        67.19 (30.46)   82.81 (12.06)
NASA-TLX                   40.83 (15.77)   25.42 (12.92)
Number of Correct Answers  2.86 (1.36)     4.13 (2.10)
BDR (R²)                   0.88 (0.16)     0.96 (0.04)
Average TTOI               13.02 (17.87)   19.85 (14.37)
Average Time to Object Identification. Results show that
participants in condition A were faster (13.02 sec) compared
to participants in condition D (19.85 sec) (see Table 6).
Open Comments from Participants. After the experiment, some participants expressed their positive attitude towards Gaze-Guided Narratives by asking when it would be publicly available. Some of them suggested that a slower audio speed might have helped them capture important information.
Discussion
RQ5: Is it feasible to use Gaze-Guided Narratives in the real world? Comparing the results on system usability between the two studies, we find that the system with Gaze-Guided Narratives (condition D) obtained an average SUS score of 82.81 (SD = 12.06) in the real world and 69.33 (SD = 12.31) in the CAVE (see section 4). An average SUS score above 80 has been suggested to be considered "pretty good", while a score of 66 could be considered average [56]. One possible explanation for this difference could be the higher degree of immersion in the real world. It is also possible that replacing text-to-speech with a voice recording had an influence on system usability. Overall, the SUS score results indicate that participants were capable of using the system in the real world without a loss in system usability.
Inferential statistics were not conducted in study 2 because of the limited number of participants (16). However, results for the NASA-TLX, UEQ and TTOI are consistent with those obtained in study 1. Together, these findings suggest that applications of Gaze-Guided Narratives are not limited to virtual environments. Interestingly, study 2 obtained higher scores for all six factors in the UEQ, for both conditions A and D. This general trend may also be caused by the introduction of a more natural voice in the audio.
Missing Gaze Data. The challenge of achieving a high tracking quality has always been considered a core research question in pervasive eye tracking [8] and eye tracking "in the wild" [19]. Although tracking quality was not our main research focus and is largely determined by the commercial eye tracking hardware, the feasibility of using our system in the real world (RQ5) may nevertheless be influenced by it. Sunlight can disturb the reflective properties of the infrared light used by the eye tracker. In the CAVE environment with controlled lighting, the average percentage of missing gaze data across 180 trials was 2.14%, which is 9.66 percentage points less than in the outdoor environment across 16 trials (11.8%). Although the percentage of missing gaze data in the real world was larger than in the controlled environment, this did not seem to affect the SUS scores, and no participant complained about interruptions or unresponsive system behavior.
6 CONCLUSION AND OUTLOOK
We have proposed Gaze-Guided Narratives as an implicit gaze-based interaction concept for audio guides, which is particularly suited for touristic panorama views. The concept was evaluated through a controlled empirical lab study with 60 participants. Results revealed that the Gaze-Guided Narratives system obtained better UX, lower cognitive load, and better performance in a mapping task compared to a classic audio guide. We demonstrated the feasibility of our approach "in the wild" in a real-world study with 16 tourists.
Although the results of our study are promising, they represent only the start of gaze-based interaction for tourist assistance. Next steps include gaze-based tourist recommendations, interest detection, and support for touristic activities beyond panorama exploration, including wayfinding and shopping. These directions will be accelerated by further developments in pervasive eye tracking technology. Finally, Gaze-Guided Narratives could be applied beyond tourism scenarios, such as for mobile learning [47] of place-related content or for the creation of dynamic story lines in location-based games [39].
ACKNOWLEDGMENTS
This work is supported by an ETH Zürich Research Grant
[ETH-38 14-2].
REFERENCES
[1]
Gregory D Abowd, Christopher G Atkeson, Jason Hong, Sue Long,
Rob Kooper, and Mike Pinkerton. 1997. Cyberguide: A mobile context-
aware tour guide. Wireless networks 3, 5 (1997), 421–433.
[2]
Vasileios-Athanasios Anagnostopoulos, Michal Havlena, Peter Kiefer,
Ioannis Giannopoulos, Konrad Schindler, and Martin Raubal. 2017.
Gaze-Informed Location Based Services. International Journal of
Geographical Information Science 31, 9 (2017), 1770–1797. https:
//doi.org/10.1080/13658816.2017.1334896
[3]
Maoz Azaryahu and Kenneth E. Foote. 2008. Historical space as narrative medium: on the configuration of spatial narratives of time at historical sites. GeoJournal 73, 3 (Nov 2008), 179–194. https://doi.org/10.1007/s10708-008-9202-4
[4]
Reynold Bailey, Ann McNamara, Nisha Sudarsanam, and Cindy Grimm.
2009. Subtle gaze direction. ACM Transactions on Graphics (TOG) 28,
4 (2009), 100.
[5]
Rafael Ballagas, André Kuntze, and Steffen P. Walz. 2008. Gaming tourism: Lessons from evaluating REXplorer, a pervasive game for tourists. In International Conference on Pervasive Computing. Springer, 244–261.
[6]
Stephen Brewster, Joanna Lumsden, Marek Bell, Malcolm Hall, and
Stuart Tasker. 2003. Multimodal ’eyes-free’ interaction techniques for
wearable devices. In Proceedings of the SIGCHI conference on Human
factors in computing systems. ACM, 473–480.
[7]
John Brooke. 1996. SUS - A quick and dirty usability scale. Usability evaluation in industry 189, 194 (1996), 4–7.
[8]
Andreas Bulling, Andrew T Duchowski, and Päivi Majaranta. 2011.
PETMEI 2011: the 1st international workshop on pervasive eye track-
ing and mobile eye-based interaction. In Proceedings of the 13th inter-
national conference on Ubiquitous computing. ACM, 627–628.
[9]
Andreas Bulling, Jamie A. Ward, Hans Gellersen, and Gerhard Tröster. 2011. Eye Movement Analysis for Activity Recognition Using Electrooculography. IEEE Trans. Pattern Anal. Mach. Intell. 33, 4 (April 2011), 741–753. https://doi.org/10.1109/TPAMI.2010.86
[10]
Marc Cavazza and David Pizzi. 2006. Narratology for interactive sto-
rytelling: A critical introduction. In International Conference on Tech-
nologies for Interactive Digital Storytelling and Entertainment. Springer,
72–83.
[11]
Keith Cheverst, Nigel Davies, Keith Mitchell, and Adrian Friday. 2000.
Experiences of developing and deploying a context-aware tourist guide:
the GUIDE project. In Proceedings of the 6th annual international con-
ference on Mobile computing and networking. ACM, 20–31.
[12]
Keith William John Cheverst, Ian Norman Gregory, and Helen Turner. 2016. Encouraging Visitor Engagement and Reflection with the Landscape of the English Lake District: Exploring the potential of Locative Media. In International Workshop on 'Unobtrusive User Experiences with Technology in Nature'. 1–5.
[13]
Kenny R Coventry, Thora Tenbrink, and John Bateman. 2009. Spatial
language and dialogue. Vol. 3. OUP Oxford.
[14]
Steven Dow, Jaemin Lee, Christopher Oezbek, Blair Maclntyre,
Jay David Bolter, and Maribeth Gandy. 2005. Exploring spatial narra-
tives and mixed reality experiences in Oakland Cemetery. In Proceed-
ings of the 2005 ACM SIGCHI International Conference on Advances in
computer entertainment technology. ACM, 51–60.
[15]
Lex Fridman, Heishiro Toyoda, Sean Seaman, Bobbie Seppelt, Linda
Angell, Joonbum Lee, Bruce Mehler, and Bryan Reimer. 2017. What
Can Be Predicted from Six Seconds of Driver Glances?. In Proceedings
of the 2017 CHI Conference on Human Factors in Computing Systems
(CHI ’17). ACM, New York, NY, USA, 2805–2813. https://doi.org/10.
1145/3025453.3025929
[16]
Brian F. Goldiez, Ali M. Ahmad, and Peter A. Hancock. 2007. Effects of Augmented Reality Display Settings on Human Wayfinding Performance. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 37, 5 (Sept 2007), 839–845. https://doi.org/10.1109/TSMCC.2007.900665
[17]
Jonna Häkkilä, Keith Cheverst, Johannes Schöning, Nicola J. Bidwell,
Simon Robinson, and Ashley Colley. 2016. NatureCHI: Unobtrusive
User Experiences with Technology in Nature. In Proceedings of the 2016
CHI Conference Extended Abstracts on Human Factors in Computing
Systems (CHI EA ’16). ACM, New York, NY, USA, 3574–3580. https:
//doi.org/10.1145/2851581.2856495
[18]
Dai-In Han, Timothy Jung, and Alex Gibson. 2013. Dublin AR: imple-
menting augmented reality in tourism. In Information and communi-
cation technologies in tourism 2014. Springer, 511–523.
[19]
Dan Witzner Hansen and Arthur E.C. Pece. 2005. Eye tracking in the
wild. Computer Vision and Image Understanding 98, 1 (2005), 155 –
181. https://doi.org/10.1016/j.cviu.2004.07.013 Special Issue on Eye
Detection and Tracking.
[20]
Sandra G. Hart and Lowell E. Staveland. 1988. Development of NASA-
TLX (Task Load Index): Results of Empirical and Theoretical Research.
In Human Mental Workload, Peter A. Hancock and Najmedin Meshkati
(Eds.). Advances in Psychology, Vol. 52. North-Holland, 139 – 183.
https://doi.org/10.1016/S0166-4115(08)62386- 9
[21]
Mary Hegarty. 2002. Development of a self-report measure of envi-
ronmental spatial ability. Intelligence 30, 5 (2002), 425–447. https:
//doi.org/10.1016/s0160-2896(02)00116- 2
[22]
Florian Heller, Aaron Krämer, and Jan Borchers. 2014. Simplifying
Orientation Measurement for Mobile Audio Augmented Reality Ap-
plications. In Proceedings of the SIGCHI Conference on Human Factors
in Computing Systems (CHI ’14). ACM, New York, NY, USA, 615–624.
https://doi.org/10.1145/2556288.2557021
[23]
Kenneth Holmqvist, Marcus Nyström, Richard Andersson, Richard De-
whurst, Halszka Jarodzka, and Joost Van de Weijer. 2011. Eye tracking:
A comprehensive guide to methods and measures. OUP Oxford.
[24]
Michael Xuelin Huang, Jiajia Li, Grace Ngai, and Hong Va Leong. 2017.
ScreenGlint: Practical, In-situ Gaze Estimation on Smartphones. In
Proceedings of the 2017 CHI Conference on Human Factors in Computing
Systems (CHI ’17). ACM, New York, NY, USA, 2546–2557. https://doi.
org/10.1145/3025453.3025794
[25]
Jari Kangas, Deepak Akkil, Jussi Rantala, Poika Isokoski, Päivi Ma-
jaranta, and Roope Raisamo. 2014. Gaze Gestures and Haptic Feedback
in Mobile Devices. In Proceedings of the SIGCHI Conference on Human
Factors in Computing Systems (CHI ’14). ACM, New York, NY, USA,
435–438. https://doi.org/10.1145/2556288.2557040
[26]
Moritz Kassner, William Patera, and Andreas Bulling. 2014. Pupil: An
Open Source Platform for Pervasive Eye Tracking and Mobile Gaze-
based Interaction. In Proceedings of the 2014 ACM International Joint
Conference on Pervasive and Ubiquitous Computing: Adjunct Publication
(UbiComp ’14 Adjunct). ACM, New York, NY, USA, 1151–1160. https:
//doi.org/10.1145/2638728.2641695
[27]
Mohamed Khamis, Axel Hoesl, Alexander Klimczak, Martin Reiss,
Florian Alt, and Andreas Bulling. 2017. EyeScout: Active Eye Tracking
for Position and Movement Independent Gaze Interaction with Large
Public Displays. In Proceedings of the 30th Annual ACM Symposium
on User Interface Software and Technology (UIST ’17). ACM, New York,
NY, USA, 155–166. https://doi.org/10.1145/3126594.3126630
[28]
Peter Kiefer, Ioannis Giannopoulos, Dominik Kremer, Christoph
Schlieder, and Martin Raubal. 2014. Starting to get bored: An out-
door eye tracking study of tourists exploring a city panorama. In
Proceedings of the Symposium on Eye Tracking Research and Appli-
cations (ETRA ’14). ACM, New York, NY, USA, 315–318. https:
//doi.org/10.1145/2578153.2578216
[29]
Izabela Krejtz, Agnieszka Szarkowska, Krzysztof Krejtz, Agnieszka
Walczak, and Andrew Duchowski. 2012. Audio Description As an Aural
Guide of Children’s Visual Attention: Evidence from an Eye-tracking
Study. In Proceedings of the Symposium on Eye Tracking Research and
Applications (ETRA ’12). ACM, New York, NY, USA, 99–106. https:
//doi.org/10.1145/2168556.2168572
[30]
Krzysztof Krejtz, Andrew Duchowski, Izabela Krejtz, Agnieszka Szarkowska, and Agata Kopacz. 2016. Discerning Ambient/Focal Attention with Coefficient K. ACM Trans. Appl. Percept. 13, 3, Article 11 (May 2016), 20 pages. https://doi.org/10.1145/2896452
[31]
Kuno Kurzhals, Emine Cetinkaya, Yongtao Hu, Wenping Wang, and
Daniel Weiskopf. 2017. Close to the Action: Eye-Tracking Evaluation of
Speaker-Following Subtitles. In Proceedings of the 2017 CHI Conference
on Human Factors in Computing Systems (CHI ’17). ACM, New York,
NY, USA, 6559–6568. https://doi.org/10.1145/3025453.3025772
[32]
Pupil Labs. 2017. Hololens and BT300 eye tracking add-ons. https://pupil-labs.com/blog/2017-03/hololens-and-bt300-eye-tracking-add-ons/. Accessed: 2018-12-25.
[33]
Bettina Laugwitz, Theo Held, and Martin Schrepp. 2008. Construction
and Evaluation of a User Experience Questionnaire. In HCI and Usabil-
ity for Education and Work, Andreas Holzinger (Ed.). Springer Berlin
Heidelberg, Berlin, Heidelberg, 63–76.
[34]
Viktor Losing, Lukas Rottkamp, Michael Zeunert, and Thies Pfeiffer. 2014. Guiding visual search tasks using gaze-contingent auditory feedback. In Proceedings of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing: Adjunct Publication. ACM, 1093–1102.
[35]
Patrick Luley, Roland Perko, Johannes Weinzerl, Lucas Paletta, and
Alexander Almer. 2012. Mobile Augmented Reality for Tourists -
MARFT. Advances in Location-Based Services Lecture Notes in Geoin-
formation and Cartography (2012), 21–36. https://doi.org/10.1007/
978-3- 642-24198- 7_2
[36]
Päivi Majaranta and Andreas Bulling. 2014. Eye Tracking and Eye-
Based Human–Computer Interaction. Springer London, London, 39–65.
https://doi.org/10.1007/978-1- 4471-6392- 3_3
[37]
Ann McNamara, Thomas Booth, Srinivas Sridharan, Stephen Caffey, Cindy Grimm, and Reynold Bailey. 2012. Directing gaze in narrative art. In Proceedings of the ACM Symposium on Applied Perception. ACM, 63–70.
[38]
NaturalSoft Ltd. 2018. NaturalReader Commercial. https://www.
naturalreaders.com/commercial.html [Online].
[39]
Marcel Neuenhaus and Maha Aly. 2017. Empathy Up. In Proceedings
of the 2017 CHI Conference Extended Abstracts on Human Factors in
Computing Systems (CHI EA ’17). ACM, New York, NY, USA, 86–92.
https://doi.org/10.1145/3027063.3049276
[40]
Valeria Orso, Alessandra Varotto, Stefano Rodaro, Anna Spagnolli,
Giulio Jacucci, Salvatore Andolina, Jukka Leino, and Luciano Gam-
berini. 2017. A Two-step, User-centered Approach to Personalized
Tourist Recommendations. In Proceedings of the 12th Biannual Con-
ference on Italian SIGCHI Chapter (CHItaly ’17). ACM, New York, NY,
USA, Article 7, 5 pages. https://doi.org/10.1145/3125571.3125594
[41]
Seonwook Park, Xucong Zhang, Andreas Bulling, and Otmar Hilliges.
2018. Learning to Find Eye Region Landmarks for Remote Gaze Es-
timation in Unconstrained Settings. In Proceedings of the 2018 ACM
Symposium on Eye Tracking Research & Applications (ETRA ’18). ACM,
New York, NY, USA, Article 21, 10 pages. https://doi.org/10.1145/
3204493.3204545
[42]
Daniela Petrelli, Nick Dulake, Mark T. Marshall, Anna Pisetti, and Elena
Not. 2016. Voices from the War: Design As a Means of Understanding
the Experience of Visiting Heritage. In Proceedings of the 2016 CHI Con-
ference on Human Factors in Computing Systems (CHI ’16). ACM, New
York, NY, USA, 1033–1044. https://doi.org/10.1145/2858036.2858287
[43]
Martin Pielot, Benjamin Poppinga, Wilko Heuten, and Susanne Boll.
2012. PocketNavigator: Studying Tactile Navigation Systems In-situ. In
Proceedings of the SIGCHI Conference on Human Factors in Computing
Systems (CHI ’12). ACM, New York, NY, USA, 3131–3140. https://doi.
org/10.1145/2207676.2208728
[44]
Pernilla Qvarfordt, David Beymer, and Shumin Zhai. 2005. RealTourist
– A Study of Augmenting Human-Human and Human-Computer Di-
alogue with Eye-Gaze Overlay. In Human-Computer Interaction - IN-
TERACT 2005, Maria Francesca Costabile and Fabio Paternò (Eds.).
Springer Berlin Heidelberg, Berlin, Heidelberg, 767–780.
[45]
Patrick Renner and Thies Pfeiffer. 2017. Attention guiding techniques using peripheral vision and eye tracking for feedback in augmented-reality-based assistance systems. In 2017 IEEE Symposium on 3D User Interfaces (3DUI). 186–194. https://doi.org/10.1109/3DUI.2017.7893338
[46]
Ethan Rublee, Vincent Rabaud, Kurt Konolige, and Gary Bradski. 2011. ORB: An efficient alternative to SIFT or SURF. In 2011 International Conference on Computer Vision. 2564–2571. https://doi.org/10.1109/ICCV.2011.6126544
[47]
Christian Sailer, Peter Kiefer, and Martin Raubal. 2015. An integrated learning management system for location-based mobile learning. Proceedings of the 11th International Conference on Mobile Learning 2015, 118–122.
[48]
Dario D. Salvucci and Joseph H. Goldberg. 2000. Identifying Fixations
and Saccades in Eye-tracking Protocols. In Proceedings of the 2000
Symposium on Eye Tracking Research & Applications (ETRA ’00). ACM,
New York, NY, USA, 71–78. https://doi.org/10.1145/355017.355028
[49]
Thiago Santini, Hanna Brinkmann, Luise Reitstätter, Helmut Leder,
Raphael Rosenberg, Wolfgang Rosenstiel, and Enkelejda Kasneci. 2018.
The art of pervasive eye tracking: unconstrained eye tracking in the
Austrian Gallery Belvedere. In Proceedings of the 7th Workshop on
Pervasive Eye Tracking and Mobile Eye-Based Interaction. ACM, 5.
[50]
Simon Schenk, Marc Dreiser, Gerhard Rigoll, and Michael Dorr. 2017. GazeEverywhere: Enabling Gaze-only User Interaction on an Unmodified Desktop PC in Everyday Scenarios. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems (CHI '17). ACM, New York, NY, USA, 3034–3044. https://doi.org/10.1145/3025453.3025455
[51]
Albrecht Schmidt. 2000. Implicit human computer interaction through
context. Personal Technologies 4, 2 (01 Jun 2000), 191–199. https:
//doi.org/10.1007/BF01324126
[52]
Johannes Schöning, Brent Hecht, and Nicole Starosielski. 2008. Evalu-
ating automatically generated location-based stories for tourists. In
CHI’08 extended abstracts on Human factors in computing systems. ACM,
2937–2942.
[53]
Tobii. 2018. Enhanced PC Games with Eye Tracking. https://
tobiigaming.com/games/. Accessed: 2018-12-25.
[54]
Waldo R. Tobler. 1994. Bidimensional Regression. Geographical Anal-
ysis 26, 3 (1994), 187–212. https://doi.org/10.1111/j.1538-4632.1994.
tb00320.x
[55]
Takumi Toyama, Thomas Kieninger, Faisal Shafait, and Andreas Den-
gel. 2011. Museum Guide 2.0 – An Eye-Tracking based Personal Assis-
tant for Museums and Exhibits. 1 (05 2011), 103–110.
[56]
Tom Tullis and Bill Albert. 2013. Chapter 6 - Self-Reported Metrics.
In Measuring the User Experience (Second Edition) (second edition ed.),
Tom Tullis and Bill Albert (Eds.). Morgan Kaufmann, Boston, 121 –
161. https://doi.org/10.1016/B978-0- 12-415781- 1.00006-6
[57]
Wikimedia Foundation, Inc. 2018. Wikipedia: The free encyclopedia.
https://www.wikipedia.org [Online].
CHI 2019 Paper
CHI 2019, May 4–9, 2019, Glasgow, Scotland, UK
Paper 491
... Indeed, eye trackers may become ubiquitous in the near future, with such sensors being utilized in everyday life. Therefore, gaze-controlled HCIs, which can provide more intelligent and personalized information to users, may become pervasive [3][4][5][6]. More importantly, gaze-controlled HCIs are a vital assistive technology since they only require users to move their eyes, freeing up their hands. ...
... Giannopoulos et al. [42] presented a gaze-based pedestrian navigation system that communicates route information based on where and what users are looking at. Kwok et al. [5] designed a gaze-guided narrative system that provides voice content adaptation to city tourists; this system first determines which building a user is looking at through a head-mounted eye tracker and then provides relevant information about the building through voice. ...
Article
The modes of interaction (e.g., mouse and touch) between maps and users affect the effectiveness and efficiency of transmitting cartographic information. Recent advances in eye tracking technology have made eye trackers lighter, cheaper and more accurate, broadening the potential to interact with maps via gaze. In this study, we focused exclusively on using gaze to choose map features (i.e., points, polylines and polygons) via the select operation, a fundamental action preceding other operations in map interactions. We adopted an approach based on the dwell time and buffer size to address the low spatial accuracy and Midas touch problem in gaze-based interactions and to determine the most suitable dwell time and buffer size for the gaze-based selection of map features. We conducted an experiment in which 38 participants completed a series of map feature selection tasks via gaze. We compared the participants’ performance (efficiency and accuracy) between different combinations of dwell times (200 ms, 600 ms and 1000 ms) and buffer sizes (point: 1°, 1.5°, and 2°; polyline: 0.5°, 0.7° and 1°). The results confirmed that a larger buffer size raised efficiency but reduced accuracy, whereas a longer dwell time lowered efficiency but enhanced accuracy. Specifically, we found that a 600 ms dwell time was more efficient in selecting map features than 200 ms and 1000 ms but was less accurate than 1000 ms. However, 600 ms was considered to be more appropriate than 1000 ms because a longer dwell time has a higher risk of causing visual fatigue. Therefore, 600 ms supports a better balance between accuracy and efficiency. Additionally, we found that buffer sizes of 1.5° and 0.7° were more efficient and more accurate than other sizes for selecting points and polylines, respectively. Our results provide important empirical evidence for choosing the most appropriate dwell times and buffer sizes for gaze-based map interactions.
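The dwell-plus-buffer selection mechanism described in this abstract can be sketched in a few lines. The following is an illustrative Python sketch, not the authors' implementation: it assumes gaze samples arrive as (timestamp_ms, x, y) tuples in degrees of visual angle and that point features are given by their centroids; the 600 ms dwell and 1.5° buffer defaults reflect the values the study found to balance efficiency and accuracy.

```python
import math

DWELL_MS = 600      # dwell time balancing efficiency and accuracy
BUFFER_DEG = 1.5    # buffer radius for point features, in degrees

def select_feature(samples, features,
                   dwell_ms=DWELL_MS, buffer_deg=BUFFER_DEG):
    """Return the first feature fixated continuously for dwell_ms.

    samples: time-ordered (timestamp_ms, x, y) gaze samples.
    features: mapping of feature id -> (x, y) centroid.
    """
    dwell_start = {}  # feature id -> timestamp when gaze entered its buffer
    for t, x, y in samples:
        for fid, (fx, fy) in features.items():
            if math.hypot(x - fx, y - fy) <= buffer_deg:
                start = dwell_start.setdefault(fid, t)
                if t - start >= dwell_ms:
                    return fid              # dwell completed: select feature
            else:
                dwell_start.pop(fid, None)  # gaze left buffer: reset timer
    return None  # no feature accumulated enough continuous dwell
```

Resetting the timer as soon as gaze leaves the buffer is what guards against the Midas touch problem: glances that merely pass over a feature never accumulate the required dwell.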
... (Chuang et al., 2019;Kiefer et al., 2015). For example, Kwok et al. (2019) presented a gaze-guided narrative system for tourists. In the system, a tourist wears eye tracking glasses and a connected Bluetooth headphone to explore a city panorama (in either a virtual or a real environment). ...
... For example, when a tour starts, a tourism company may provide tourists with wearable devices embedded with an eye tracker and a headset offering a gazeguided narrative of tourist attractions (e.g. Kwok et al. (2019)'s work). When the tour ends, the devices are collected for reuse. ...
Article
Individuals with different characteristics exhibit different eye movement patterns in map reading and wayfinding tasks. In this study, we aim to explore whether and to what extent map users’ eye movements can be used to detect who created them. Specifically, we focus on the use of gaze data for inferring users’ identities when users are performing map-based spatial tasks. We collected 32 participants’ eye movement data as they utilized maps to complete a series of self-localization and spatial orientation tasks. We extracted five sets of eye movement features and trained a random forest classifier. We used a leave-one-task-out approach to cross-validate the classifier and achieved the best identification rate of 89%, with a 2.7% equal error rate. This result is among the best performances reported in eye movement user identification studies. We evaluated the feature importance and found that basic statistical features (e.g. pupil size, saccade latency and fixation dispersion) yielded better performance than other feature sets (e.g. spatial fixation densities, saccade directions and saccade encodings). The results open the potential to develop personalized and adaptive gaze-based map interactions but also raise concerns about user privacy protection in data sharing and gaze-based geoapplications.
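The leave-one-task-out protocol used above can be illustrated with synthetic data. The study trained a random forest on eye movement features; the sketch below substitutes a simple nearest-centroid classifier so it stays NumPy-only, and every feature vector is a synthetic placeholder for quantities like pupil size, saccade latency, and fixation dispersion.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_tasks, n_feats = 4, 5, 6

# One synthetic feature vector per (user, task); each user gets a
# distinct mean so there is an identity signal to pick up.
X = np.stack([rng.normal(loc=u, scale=0.3, size=n_feats)
              for u in range(n_users) for _ in range(n_tasks)])
y = np.repeat(np.arange(n_users), n_tasks)
tasks = np.tile(np.arange(n_tasks), n_users)

scores = []
for held_out in range(n_tasks):            # leave one task out per fold
    train, test = tasks != held_out, tasks == held_out
    centroids = np.stack([X[train & (y == u)].mean(axis=0)
                          for u in range(n_users)])
    # predict the user whose training centroid is closest to each test vector
    dists = np.linalg.norm(X[test][:, None, :] - centroids[None], axis=2)
    preds = dists.argmin(axis=1)
    scores.append(float((preds == y[test]).mean()))

mean_acc = float(np.mean(scores))
```

Holding out an entire task per fold (rather than random samples) is what makes the reported identification rate meaningful: the classifier must generalize to gaze behavior from an unseen task, not just unseen trials of a familiar one.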
... There are a lot of works dealing with gaze visualization in virtual reality, augmented reality, mixed reality, and real-world environments [17,20,22,24]. Efficient visual guidance can reduce cognitive load during visual search activities [15]; [3] used HoloLens for augmented collaboration in co-design scenarios in shared space. Shared gaze was used to improve robot performance as well [19]. ...
... In order to measure the gaze pattern of the users, we used a Tobii 4C remote eye-tracker installed on a 27" display, with a sampling rate of 90 Hz. Fixations were detected by an implementation of the popular I-DT algorithm [229] with a dispersion threshold of 1° and a duration threshold of 100 ms [229,162,79,166]. After identifying the fixations, we categorised them into different AOIs. ...
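The dispersion-threshold algorithm referenced in this excerpt is simple enough to sketch. Below is a minimal Python version of I-DT, assuming gaze samples as time-ordered (timestamp_ms, x, y) tuples and the standard dispersion measure (max x − min x) + (max y − min y); published variants differ slightly in window handling.

```python
def idt_fixations(samples, dispersion_threshold=1.0, duration_threshold_ms=100):
    """Detect fixations with the dispersion-threshold (I-DT) method.

    samples: time-ordered list of (timestamp_ms, x, y), coordinates in
    degrees of visual angle. Returns (start_ms, end_ms, cx, cy) tuples.
    """
    def dispersion(window):
        xs = [s[1] for s in window]
        ys = [s[2] for s in window]
        return (max(xs) - min(xs)) + (max(ys) - min(ys))

    fixations = []
    i = 0
    while i < len(samples):
        # initial window just covering the minimum fixation duration
        j = i
        while j < len(samples) and samples[j][0] - samples[i][0] < duration_threshold_ms:
            j += 1
        if j == len(samples):
            break  # remaining samples span less than the duration threshold
        if dispersion(samples[i:j + 1]) <= dispersion_threshold:
            # grow the window while the points stay spatially compact
            while j + 1 < len(samples) and dispersion(samples[i:j + 2]) <= dispersion_threshold:
                j += 1
            window = samples[i:j + 1]
            cx = sum(s[1] for s in window) / len(window)
            cy = sum(s[2] for s in window) / len(window)
            fixations.append((window[0][0], window[-1][0], cx, cy))
            i = j + 1  # continue after the detected fixation
        else:
            i += 1  # slide past a saccade sample
    return fixations
```

Each returned centroid can then be tested against AOI boundaries, as in the categorisation step the excerpt describes.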
Thesis
Recommender systems are software tools that automatically select a set of relevant items for users based on their interests. Despite their intelligence to adapt the recommended items to the user, most recommender systems lack the intelligence to adapt their interface to the needs of the user. Additionally, most recommender systems do not allow users to steer the recommendation process or correct wrong assumptions. Moreover, recommender systems often act as black boxes, which might hinder users from steering the recommendation process in an informed way. In this dissertation, we take the first steps towards personalized recommender systems that can adapt themselves based on the personal characteristics of the user. In particular, we investigate how the system can adapt visualizations to control the recommender system and how it can adapt scrutable explanations that allow both control and understanding of the recommender system. More concretely, this dissertation aims to address three main research questions: RQ1 - How do personal characteristics influence the way users perceive and interact with different visualizations to control a music recommender system? RQ2 - Which personal characteristics influence the way users perceive and interact with scrutable explanations in a music recommender system? RQ3 - How should explanations be adapted to different personal characteristics in a music recommender system? To investigate these research questions, we conducted eight different user studies in which we investigated how users interacted with and perceived a music recommender system interface. In particular, we focused on the reaction of users to visualizations to control the recommender system and on scrutable explanations. Additionally, we observed how personal characteristics such as personality, cognitive style, cognitive abilities, and domain experience influenced the reaction of users to these interfaces.
Our results indicate that two personal characteristics influence the way users perceive and interact with visualizations to control recommender systems, namely musical sophistication, and tech-savviness. We also found three personal characteristics that influence the perception of scrutable explanations in a music recommender interface: need for cognition, musical sophistication, and openness. In three user studies, we investigated how scrutable explanations could be tailored to these three personal characteristics. Based on our results we recommend providing explanations up-front for all recommendations at once, but for users with a low need for cognition it must be possible to turn off these explanations. To tailor explanations to users with low musical sophistication, we recommend providing brief explanations which do not require musical knowledge. For users with high musical sophistication, we recommend providing these users with the choice between these brief explanations and interactive explanations which contain a mix of information. To adapt explanations to users with low openness, we recommend providing explanations with only one explanation element. For users high in openness, we recommend providing users the choice between these explanations and explanations that support exploration and that contain multiple explanation elements.
... LBS can be further enhanced by other context information, such as the user's gaze. This allows taking the user's viewing direction into account (Anagnostopoulos et al. 2017), leading, for example, to personalized audio guides that help users find objects in the environment and adapt the audio content to what has previously been looked at (Kwok et al. 2019). This directly relates to geographic human-computer interaction, i.e., people's interaction with geographic information technologies (Hecht et al. 2011). ...
Chapter
Full-text available
Urban mobility and the transport of people have been increasing in volume inexorably for decades. Despite the advantages and opportunities mobility has brought to our society, there are also severe drawbacks such as the transport sector’s role as one of the main contributors to greenhouse-gas emissions and traffic jams. In the future, an increasing number of people will be living in large urban settings, and therefore, these problems must be solved to assure livable environments. The rapid progress of information and communication, and geographic information technologies, has paved the way for urban informatics and smart cities, which allow for large-scale urban analytics as well as supporting people in their complex mobile decision making. This chapter demonstrates how geosmartness, a combination of novel spatial-data sources, computational methods, and geospatial technologies, provides opportunities for scientists to perform large-scale spatio-temporal analyses of mobility patterns as well as to investigate people’s mobile decision making. Mobility-pattern analysis is necessary for evaluating real-time situations and for making predictions regarding future states. These analyses can also help detect behavioral changes, such as the impact of people’s travel habits or novel travel options, possibly leading to more sustainable forms of transport. Mobile technologies provide novel ways of user support. Examples cover movement-data analysis within the context of multi-modal and energy-efficient mobility, as well as mobile decision-making support through gaze-based interaction.
Chapter
The integration of navigation systems and smart tour guide apps has gained popularity among travellers with the rapid development of the internet, mobile technology, and the wide acceptance of smartphones. The purpose of the study is twofold: (1) to assess the growth of smart tour guide apps in India and (2) to examine the tourists' experiences in using smart tour guide apps. To achieve the purpose of the study, a content analysis method was employed to analyse the users' reviews on the "Audio Odigos" and the "Trip My Way," which are very popular tour guide apps in India. The results reveal that smart tour guide apps are more preferred than the human tour guide. An app-based tour guide facilitates exceptional experiences by providing accurate and useful information on historical monument tours, city tours, and destination tours. Thus, the findings can be used to improve the existing apps and to develop more sophisticated apps in the future that can ensure sustainable smart tourism.
Chapter
In this paper, we propose an interdisciplinary theoretical and empirical framework to investigate the particular faculties related to human "narrative cognition" in general, and in relation to MRT in particular. In order to contextualize our approach, we briefly review the cognitive turn in narratology, as well as the state of the art in different domains that have undertaken psychophysiological studies that either characterize aspects relevant to narrative cognition or investigate mixed reality experiences. The idea is to bring together knowledge and insights from narratology, different branches of semiotics, and the cognitive sciences with empirical strategies that bridge the gap between first-person phenomenological approaches and psychophysiological and behavioural methods. We propose a rationale for combining tools and techniques from MRT/VR/AR, interactive digital narratives and storytelling, with a suite of integrated psychophysiological methods (such as EEG, HR, GSR and eye tracking) and phenomenological-subjective approaches.
Conference Paper
Conventional feature-based and model-based gaze estimation methods have proven to perform well in settings with controlled illumination and specialized cameras. In unconstrained real-world settings, however, such methods are surpassed by recent appearance-based methods due to difficulties in modeling factors such as illumination changes and other visual artifacts. We present a novel learning-based method for eye region landmark localization that enables conventional methods to be competitive with the latest appearance-based methods. Despite having been trained exclusively on synthetic data, our method exceeds the state of the art for iris localization and eye shape registration on real-world imagery. We then use the detected landmarks as input to iterative model-fitting and lightweight learning-based gaze estimation methods. Our approach outperforms existing model-fitting and appearance-based methods in the context of person-independent and personalized gaze estimation.
Conference Paper
Pervasive mobile eye tracking provides a rich data source to investigate human natural behavior, providing a high degree of ecological validity in natural environments. However, challenges and limitations intrinsic to unconstrained mobile eye tracking make its development and usage to some extent an art. Nonetheless, researchers are pushing the boundaries of this technology to help assess museum visitors' attention not only between the exhibited works, but also within particular pieces, providing significantly more detailed insights than traditional timing-and-tracking or external observer approaches. In this paper, we present in detail the eye tracking system developed for a large-scale fully-unconstrained study in the Austrian Gallery Belvedere, providing useful information for eye-tracking system designers. Furthermore, the study is described, and we report on usability and real-time performance metrics. Our results suggest that, although the system is comfortable enough, further eye tracker improvements are necessary to make it less conspicuous. Additionally, real-time accuracy already suffices for simple applications such as audio guides for the majority of users, even in the absence of eye-tracker slippage compensation.
Conference Paper
While gaze holds a lot of promise for hands-free interaction with public displays, remote eye trackers with their confined tracking box restrict users to a single stationary position in front of the display. We present EyeScout, an active eye tracking system that combines an eye tracker mounted on a rail system with a computational method to automatically detect and align the tracker with the user’s lateral movement. EyeScout addresses key limitations of current gaze-enabled large public displays by offering two novel gaze-interaction modes for a single user: In “Walk then Interact” the user can walk up to an arbitrary position in front of the display and interact, while in “Walk and Interact” the user can interact even while on the move. We report on a user study that shows that EyeScout is well perceived by users, extends a public display’s sweet spot into a sweet line, and reduces gaze interaction kickoff time to 3.5 seconds – a 62% improvement over state of the art solutions. We discuss sample applications that demonstrate how EyeScout can enable position and movement-independent gaze interaction with large public displays.
Conference Paper
Freeform deformation techniques are powerful and flexible tools for interactive 3D shape editing. However, while interactivity is the key constraint for the usability of such tools, it cannot be maintained when the complexity of either the 3D model or the applied deformation exceeds a given workstation-dependent threshold. In this work, we solve this scalability problem by introducing a streaming system based on a sampling-reconstruction approach. First, a fast out-of-core adaptive simplification algorithm is performed in a pre-processing step for quick generation of a simplified version of the model. The resulting model can then be submitted to arbitrary FFD tools, as its reduced size ensures interactive response. Second, a post-processing step performs a feature-preserving deformation reconstruction that applies to the original model the deformation undergone by its simplified version. Both bracketing steps share a streaming and point-based basis, making them fully scalable and compatible with both point clouds and non-manifold meshes. Our system also offers a generic out-of-core multi-scale layer to FFD tools, since the two bracketing steps remain available for partial up-sampling during the interactive session. Arbitrarily large 3D models can thus be interactively edited with most FFD tools, opening the use of advanced deformation metaphors to models ranging from millions to billions of samples. Our system also makes it possible to work on models that fit in memory but exceed the capabilities of a given FFD tool.
Conference Paper
The incorporation of subtitles in multimedia content plays an important role in communicating spoken content. For example, subtitles in the respective language are often preferred to expensive audio translation of foreign movies. The traditional representation of subtitles displays text centered at the bottom of the screen. This layout can lead to large distances between text and relevant image content, causing eye strain and even making viewers miss visual content. As a recent alternative, the technique of speaker-following subtitles places subtitle text in speech bubbles close to the current speaker. We conducted a controlled eye-tracking laboratory study (n = 40) to compare the regular approach (center-bottom subtitles) with content-sensitive, speaker-following subtitles. We compared different dialog-heavy video clips with the two layouts. Our results show that speaker-following subtitles lead to higher fixation counts on relevant image regions and reduce saccade length, which is an important factor for eye strain.
Conference Paper
We consider a large dataset of real-world, on-road driving from a 100-car naturalistic study to explore the predictive power of driver glances and, specifically, to answer the following question: what can be predicted about the state of the driver and the state of the driving environment from a 6-second sequence of macro-glances? The context-based nature of such glances allows for the application of supervised learning to the problem of vision-based gaze estimation, making it robust, accurate, and reliable in messy, real-world conditions. It is therefore valuable to ask whether such macro-glances can be used to infer behavioral, environmental, and demographic variables. We analyze 27 binary classification problems based on these variables. The takeaway is that glance can be used as part of a multi-sensor real-time system to predict radio-tuning, fatigue state, failure to signal, talking, and several environment variables.
Conference Paper
Geo-localized, mobile applications can simplify a tourist visit, making the relevant Points of Interest more easily and promptly discernible to users. At the same time, such solutions must avoid creating unfitting or rigid user profiles that impoverish the users' options instead of refining them. Currently, user profiles in recommender systems rely on dimensions whose relevance to the user is more often presumed than empirically defined. To avoid this drawback, we build our recommendation system in a two-step process, where profile parameters are evaluated preliminarily and separately from the recommendations themselves. We describe this two-step evaluation process, including an initial survey (N = 206) and a subsequent controlled study (N = 24). We conclude by emphasizing the benefit and generalizability of the approach.
Article
Location-Based Services (LBS) provide more useful, intelligent assistance to users by adapting to their geographic context. For some services, that context goes beyond a location and includes further spatial parameters, such as the user's orientation or field of view. Here, we introduce Gaze-Informed LBS (GAIN-LBS), a novel type of LBS that takes into account the user's viewing direction. Such a system could, for instance, provide audio information about the specific building a tourist is looking at from a vantage point. To determine the viewing direction relative to the environment, we record the gaze direction relative to the user's head with a mobile eye tracker. Image data from the tracker's forward-looking camera serve as input to determine the orientation of the head with respect to the surrounding scene, using computer vision methods that allow one to estimate the relative transformation between the camera and a known view of the scene in real time, without the need for artificial markers or additional sensors. We focus on how to map the Point of Regard of a user to a reference system for which the objects of interest are known in advance. In an experimental validation on three real city panoramas, we confirm that the approach can cope with head movements of varying speed, including fast rotations up to 63 deg/s. We further demonstrate the feasibility of GAIN-LBS for tourist assistance with a proof-of-concept experiment in which a tourist explores a city panorama, where the approach achieved a recall of over 99%. Finally, a GAIN-LBS can provide objective and qualitative ways of examining the gaze of a user based on what the user is currently looking at.
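The mapping step this abstract describes, projecting the Point of Regard from the scene camera into a reference view where the objects of interest are annotated, reduces to applying a planar homography once the camera-to-reference transformation has been estimated (the cited work estimates it with feature-based computer vision methods). The NumPy sketch below is illustrative only: the homography `H`, the object names, and their bounding boxes are hypothetical placeholders, not values from the paper.

```python
import numpy as np

def map_gaze_to_panorama(gaze_xy, H):
    """Project a point of regard from the scene camera image into the
    reference panorama using homogeneous coordinates."""
    x, y = gaze_xy
    p = H @ np.array([x, y, 1.0])   # apply the 3x3 homography
    return p[0] / p[2], p[1] / p[2]  # dehomogenize

def lookup_object(pano_xy, objects):
    """Return the annotated object whose axis-aligned bounding box in the
    reference panorama contains the mapped gaze point, or None."""
    x, y = pano_xy
    for name, (x0, y0, x1, y1) in objects.items():
        if x0 <= x <= x1 and y0 <= y <= y1:
            return name
    return None
```

With gaze mapped into the fixed reference frame, object lookup becomes a simple point-in-region test against the pre-annotated panorama, which is what lets such a system announce the building a tourist is currently looking at.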