A Web-Based Evaluation Tool to Predict Long Eye Glances

Conference Paper · June 2015
DOI: 10.17077/drivingassessment.1584
Conference: 8th International Driving Symposium on Human Factors in Driver Assessment, Salt Lake City, Utah
PROCEEDINGS of the Eighth International Driving Symposium on Human Factors in Driver Assessment, Training and Vehicle Design
A WEB-BASED EVALUATION TOOL TO PREDICT LONG EYE GLANCES
Ja Young Lee
John D. Lee
University of Wisconsin-Madison
Madison, WI, USA
Email: jayoung.lee@wisc.edu, jdlee@engr.wisc.edu
Summary: We present a web-based evaluation tool that simulates drivers’ eye
glances to interface designs of in-vehicle information systems (IVISs). This tool
computes saliency of each location of a candidate interface and simulates eye
fixations based on the saliency, until it arrives at the region of interest. Designers
can use this tool to estimate the duration of drivers’ eye glance needed to find
regions of interest, such as particular icons on a touch screen. The overall goal of
developing this application is to bridge the gap between guidelines and empirical
evaluations. This evaluation tool acts as an interactive model-based design
guideline to help designers craft less distracting IVIS interfaces.
OBJECTIVE
Digital displays are now widely embedded into cars to provide drivers with information and
entertainment. For instance, Tesla Motors has installed a 17” touch display on the center stack to
serve as a master control interface for the car. Both Apple CarPlay and Android Auto have been
developed to provide seamless connection between smartphones and vehicles. Such systems
offer drivers information in a form that might be less distracting than accessing that information
on a smartphone. However, interacting with such In-Vehicle Information Systems (IVISs) might
shift drivers’ attention away from the roadway and increase crash risk.
Naturalistic driving data has shown that instances where drivers’ eyes were off the forward
roadway for more than 2.0 seconds accounted for 18% of crashes and near crashes, producing an
odds ratio of 2.19 (Klauer, Dingus, Neale, Sudweeks, & Ramsey, 2006). Similarly, NHTSA’s
driver distraction guidelines (2014) suggest that mean duration of glance away from the road
should not exceed 2.0 seconds. Long eye glances can occur when drivers search for an object of
interest on a cluttered display (Tsimhoni & Green, 2001). Saliency, or conspicuity, of objects is
one factor that guides visual attention and influences the time to locate objects (Wickens, Goh,
Helleberg, Horrey, & Talleur, 2003). If the object of interest is salient then search times can be
short. However, if the object of interest is not salient and is surrounded by highly salient objects, then search times can be long. Long search times associated with such misplaced salience can lead to
long off-road glances and heightened crash risk.
Empirical evaluation using eye tracking is an expensive and time-consuming method to assess
the distraction potential of candidate IVIS interfaces. Recruiting participants, collecting data, and
analyzing the eye tracking data can take weeks or months. On the other hand, modeling driver
glances represents a promising approach to reduce the cost and time to assess the distraction
potential of displays. N-SEEV is a computational model that was developed to predict a time to
detect visual changes on displays (Steelman-Allen, McCarley, & Wickens, 2009). The model is
based on four attentional influences: salience of the signal, effort required to attend to the signal,
expectancy of the signal, and value of the signal. Although it considers multiple attentional
influences, N-SEEV does not fit perfectly with driving situations, as it focuses on detecting
changes while monitoring a display. Also, it requires experts to determine the perceived value of
design features, making the model less accessible to ordinary designers. The tool introduced in
this paper is simpler than N-SEEV, with more focused functionality: it predicts the search time associated with the salience of visual objects. We also used a markedly more advanced salience model than the one implemented in N-SEEV.
We launched a web-based tool that predicts the distraction potential of design alternatives based
on the saliency of the display elements by estimating search time. The purpose of this tool is to
support designers with a simple model of visual attention that can be used as an interactive
guideline to identify misplaced salience that might otherwise go unaddressed.
METHODS
User Interface
Rather than empirical results, we present a prototype web tool that can serve as an interactive
design guide. The program is written in Python with the Django framework and deployed on the web (http://distraction.engr.wisc.edu/). Figure 1(a) shows how designers can upload two image files
representing different design options. On the first page, designers set five input parameters:
screen resolutions (width and height, in pixels), screen dimensions (width and height, in
millimeters), and the distance from the driver to the image (in millimeters). These parameters are
used to determine the size of the foveal area on the display and the region of interest (e.g., an icon). On
the next page, the original image and the image after saliency calculation are displayed.
Designers can select a region of interest (ROI) for each of the two images (Figure 1(b)). The ROI
identifies the object that the driver would be searching for. The selected ROI is immediately
shown on the corresponding saliency map. On pressing the ‘Simulate’ button, the simulation
executes 1000 search trials where a series of eye movements are produced to estimate the
number of fixations and associated glance time required to locate the ROI. The trials can be
thought of as instances where drivers look to the display trying to find a particular icon. The right
side of Figure 1(b) shows the results as Pareto graphs, showing the distribution of glances and
highlighting the proportion of glances longer than two seconds.
Simulating Eye Fixations
We used the Boolean Map based Saliency (BMS) model (Zhang & Sclaroff, 2013), which is
currently the most accurate saliency algorithm according to the MIT Saliency Benchmark
(http://saliency.mit.edu/). BMS uses surroundedness, which is a geometric cue that Gestalt
psychology has identified as guiding figure and ground assignment. Regions that are surrounded
are seen as the figure. BMS identifies regions separated from the surrounding background to
create a Boolean map (Huang & Pashler, 2007) and uses that to compute saliency. A saliency
map calculated from BMS is a normalized black-and-white image where the most salient areas
are white and the least salient are black.
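As an illustration of the surroundedness idea (this is a simplified sketch, not the published BMS implementation, which randomly thresholds color channels), the following code thresholds a grayscale image at a few assumed levels and treats activated regions that are not connected to the image border as figure, accumulating them into a normalized map:

```python
from collections import deque

def boolean_map_saliency(gray, thresholds=(64, 128, 192)):
    """gray: 2-D list of ints in 0..255. Returns a normalized saliency map."""
    h, w = len(gray), len(gray[0])
    acc = [[0.0] * w for _ in range(h)]
    for t in thresholds:
        for polarity in (True, False):  # above / below the threshold
            bmap = [[(gray[y][x] > t) == polarity for x in range(w)]
                    for y in range(h)]
            # Flood fill from the border: any activated cell reachable from
            # the border is background, not a surrounded (figure) region.
            seen = [[False] * w for _ in range(h)]
            dq = deque((y, x) for y in range(h) for x in range(w)
                       if bmap[y][x] and (y in (0, h - 1) or x in (0, w - 1)))
            for y, x in dq:
                seen[y][x] = True
            while dq:
                y, x = dq.popleft()
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and bmap[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        dq.append((ny, nx))
            # Surrounded (unreached) activated cells count toward saliency.
            for y in range(h):
                for x in range(w):
                    if bmap[y][x] and not seen[y][x]:
                        acc[y][x] += 1.0
    peak = max(max(row) for row in acc) or 1.0
    return [[v / peak for v in row] for row in acc]
```

A bright blob on a dark background, for instance, is surrounded at every threshold below its intensity, so it accumulates the highest value and appears white in the normalized map.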
Figure 1. User interface of the glance calculation web application. Designers can upload designs and set parameters on the front page (a). On the next page (b), designers can see the original images and corresponding saliency maps. The results appear once an ROI is selected for each image and the ‘Simulate’ button is clicked.
Considering visual attention as a spotlight, when one’s eyes focus on a certain location, only about 2.0 degrees around the fixation point is seen clearly (Henderson, 2003). To locate a target outside the focal area, the eyes saccade rapidly over roughly 2.5–20 degrees (Land, Mennie, &
Rusted, 1999). The present application assumed that the eyes fixate on points across the image until they arrive within 2.0 degrees of the ROI. The probability of a saccade landing on a particular location is proportional to its saliency. In the current model, the saliency map was divided into rectangular cells, where each cell corresponds to the 2.0-degree span of focal vision. The pixel size of each cell was calculated from the display size and resolution as well as the distance from the display to the viewer. Formula (1) below was used to calculate how many image pixels (p) fall within k degrees of the visual field, where d is the distance from the observer to the image, r is the image resolution in pixels, and l is the image length or width. This implementation used k = 2, and the formula was applied separately to width and height.

p = (2dr / l) · tan((k / 2) · (π / 180))     (1)

The average saliency within each cell was then calculated, and this value represented the saliency of the cell. Figure 2(c) shows how the saliency map was divided into cells, each assigned a saliency value. From these values, the probability of an eye fixation was calculated with formula (2) below, under the assumption that the probability of fixating a cell is proportional to its saliency. When there are n total cells, the probability of an eye fixation moving from cell i to cell j (P_ij) was calculated from the saliency of cell i (s_i) and cell j (s_j):

P_ij = s_j / (Σ_{k=1}^{n} s_k − s_i)     (2)

It was assumed that the fixation starts at the center of the image because people tend to fixate the center of images (Tatler, 2007). After this initial fixation, the location of the next fixation
was calculated according to the probability associated with the cell saliency, as in formula (2).
With each fixation the program added 230ms to the total time elapsed, assuming 230ms between
saccades (Henderson & Hollingworth, 1999). A cell that has been visited was not visited again,
reflecting the inhibition of return. With each fixation, the system also checked whether the destination cell included the ROI. The search terminated once a fixation arrived at the ROI, and the total
time elapsed was recorded. The process was repeated 1000 times, and the result was represented
as a Pareto chart. Figure 2 summarizes this process.
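The pixel-span computation of formula (1) can be sketched as follows; the display parameters used here are hypothetical, not values from the paper:

```python
import math

# Formula (1): number of image pixels p spanned by k degrees of visual angle,
# given viewing distance d (mm), image resolution r (pixels), and physical
# image length l (mm) along the same axis.
def pixels_per_visual_angle(d, r, l, k=2.0):
    return 2 * d * (r / l) * math.tan(math.radians(k / 2))

# Hypothetical display: 1920 px across 380 mm, viewed from 700 mm.
cell_px = pixels_per_visual_angle(d=700, r=1920, l=380, k=2.0)  # roughly 123 px
```

Running this once for the width and once for the height gives the rectangular cell dimensions used to grid the saliency map.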
Figure 2. Example of an eye fixation simulation: (a) original image; (b) saliency map; (c) average saliency of cells; (d) eye fixation prediction. Image (d) is an example of one run with four saccades and fixations after the initial central fixation. This run of the simulation is recorded as 230 ms × 4 = 920 ms.
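The search process above can be sketched as follows. The cell saliencies, cell count, and ROI below are hypothetical; the 230 ms fixation cost, the saliency-proportional sampling of formula (2), inhibition of return, and the 1000-trial repetition follow the text:

```python
import random

FIXATION_MS = 230  # time added per fixation (Henderson & Hollingworth, 1999)

def simulate_search(saliency, roi, start, rng):
    """saliency: dict cell_id -> average saliency. Returns glance time in ms."""
    current, visited, elapsed = start, {start}, 0
    while current != roi:
        # Formula (2): sample the next cell with probability proportional to
        # saliency; visited cells are excluded (inhibition of return).
        candidates = [c for c in saliency if c not in visited]
        current = rng.choices(candidates,
                              weights=[saliency[c] for c in candidates])[0]
        visited.add(current)
        elapsed += FIXATION_MS
    return elapsed

rng = random.Random(42)
cells = {i: s for i, s in enumerate(
    [0.10, 0.85, 0.30, 0.05, 0.40, 0.20, 0.60, 0.15, 0.25, 0.50, 0.35, 0.02])}
times = [simulate_search(cells, roi=11, start=0, rng=rng) for _ in range(1000)]
p_long = sum(t > 2000 for t in times) / len(times)  # share of glances > 2 s
```

Because the ROI here (cell 11) has low saliency, many simulated glances require several saccades, which is exactly the misplaced-salience pattern the tool is meant to flag.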
RESULTS
The application allows designers to compare the expected percentage of long glances (over 2.0 seconds) for multiple ROIs and multiple design concepts. For example, a designer can test the
distraction potential of two different icons on one screen. The first two results in Figure 3 show the eye glance predictions for two different icons: ‘Maps’ and ‘Settings’. The Pareto plots
show that ‘Maps’ is easier to locate than ‘Settings’. The software estimates that 4% of trials will
fail to locate ‘Maps’ within 2.0 seconds, but 39% will fail to locate ‘Settings’. A designer can
also predict eye glance durations for the same function ‘Settings’ in two different design
alternatives. It would likely take longer to find the ‘Settings’ icon in the second design (P(time > 2s) = 0.55) than in the first design (P(time > 2s) = 0.39). Such comparisons can help
explore trade-offs between design parameters.
Figure 3. Sample runs of the application with different designs: (a) original images and saliency maps; (b) distributions of eye glance durations for each ROI (‘Maps’ and ‘Settings’).
CONCLUSION
A fundamental challenge in evaluating the distraction potential of vehicle displays is the costly
and time-consuming data collection with human participants. Written design guidelines help
address this issue, but they rarely provide quantitative results to guide design. They often fail to
support tradeoff analysis in comparing alternate designs. Computational models, such as the one
described in this paper, can help bridge this gap. Such models can improve the design workflow
by providing designers with immediate and measurable feedback regarding their design
alternatives. This standalone web-based application makes it easy for designers to reflect on
human factors considerations at the start of the design process and thus reduce the cost of
redesigning interfaces at later stages. In addition, hundreds of design alternatives can be quickly and iteratively assessed, helping to produce less distracting interface designs.
This software also has the potential to be integrated into more comprehensive driver distraction models. Many driver models have been developed to predict drivers’ behavior when using IVISs, but none of them simulates dynamic eye glances guided by the saliency of visual objects. For
example, the Keystroke Level Model puts multiple perceptual processes together into ‘mentally
prepare for 1.35 seconds’ (Pettitt, Burnett, & Stevens, 2007), and Distract-R allocates 150ms for
finding and processing visual objects (Salvucci, Zuber, Beregovaia, & Markley, 2005). Likewise,
the Queuing Network Model uses a strategy in which eyes fixate on the closest unattended visual
field with a target feature (Lim & Liu, 2004). These static values can be replaced with dynamic values that reflect specific display characteristics. For instance, the Itti & Koch salience model has been combined with the Distract-R driver model to predict driver distraction (Lee, 2014), where the salience model captures bottom-up (e.g., stimulus-driven) influences and the driver model captures top-down (e.g., goal-driven) influences on driving behavior. Integrating the present application into
these driver models will allow better simulation of driving with secondary tasks. Specifically,
such integration makes it possible to account for value- or expectation-driven attention (top-
down) that might dominate the search of objects of interest with familiar displays in addition to
the saliency of visual objects (bottom-up). It will help address the important fact that saliency is
not the only force governing glance duration and visual search.
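As a hypothetical sketch of this kind of integration (the blending scheme and the weight w below are our assumptions, not part of any of the cited models), a bottom-up saliency map could be combined with a top-down expectancy map before fixations are sampled:

```python
# Blend a bottom-up saliency map with a top-down expectancy map (e.g.,
# learned icon locations) into one sampling distribution over cells.
# The weight w is an assumed free parameter.
def combine_maps(bottom_up, top_down, w=0.5):
    """Both maps: equal-length lists of cell values in [0, 1]."""
    blended = [(1 - w) * b + w * t for b, t in zip(bottom_up, top_down)]
    total = sum(blended) or 1.0
    return [v / total for v in blended]  # normalized sampling distribution
```

The resulting distribution could replace the purely saliency-driven weights in formula (2), letting familiarity with a display pull fixations toward expected locations.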
The current model captures important features of salience-driven attention, and so is an example
of a simple model that represents a specific human function and can address a limited set of
design issues (Rasmussen, 1983). It addresses the issue of misplaced salience based on well-established theories of visual perception and attention, and it has been validated on several test
datasets from other domains. As such, this tool represents an interactive design guideline
conveniently available on the web that can help designers address one contributor to visual
distraction. Nevertheless, further validation with vehicle display designs is required to fully
demonstrate the utility of this tool. Input from potential end-users (i.e., designers) will also help
refine the tool to better support the design process.
REFERENCES
Bylinskii, Z., Judd, T., Durand, F., Oliva, A., & Torralba, A. (2012). MIT saliency benchmark.
http://saliency.mit.edu/
Henderson, J. M. (2003). Human gaze control during real-world scene perception. Trends in
cognitive sciences, 7(11), 498-504.
Henderson, J. M., & Hollingworth, A. (1999). High-level scene perception. Annual review of
psychology, 50(1), 243-271.
Huang, L., & Pashler, H. (2007). A Boolean map theory of visual attention. Psychological
review, 114(3), 599.
Itti, L., & Koch, C. (2000). A saliency-based search mechanism for overt and covert shifts of
visual attention. Vision research, 40(10), 1489-1506.
Judd, T., Ehinger, K., Durand, F., & Torralba, A. (2009). Learning to predict where humans
look. Proceedings of the IEEE international conference on Computer Vision, 2106-2113.
Klauer, S. G., Dingus, T. A., Neale, V. L., Sudweeks, J. D., & Ramsey, D. J. (2006). The impact
of driver inattention on near-crash/crash risk: An analysis using the 100-car naturalistic
driving study data. Technical Report No. DOT HS 810 594. Washington, DC: National
Highway Traffic Safety Administration (NHTSA).
Land, M., Mennie, N., & Rusted, J. (1999). The roles of vision and eye movements in the control
of activities of daily living. Perception, 28(11), 1311-1328.
Lee, J. (2014). Integrating the saliency map with distract-r to assess driver distraction of vehicle
displays. (Doctoral dissertation) Retrieved from ProQuest Dissertations and Theses. (Order
No. 3611485)
Lim, J. H., & Liu, Y. (2004). A Queuing Network Model for Visual Search and Menu Selection.
Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 1846-1850.
National Highway Traffic Safety Administration. (2014). Visual-manual NHTSA driver
distraction guidelines for in-vehicle electronic devices. Washington, DC: National Highway
Traffic Safety Administration (NHTSA), Department of Transportation (DOT).
Navalpakkam, V., & Itti, L. (2006). An integrated model of top-down and bottom-up attention
for optimizing detection speed. Proceedings of IEEE Computer Society Conference on
Computer Vision and Pattern Recognition, 2049-2056.
Pettitt, M., Burnett, G., & Stevens, A. (2007). An extended keystroke level model (KLM) for
predicting the visual demand of in-vehicle information systems. In Proceedings of the
SIGCHI Conference on Human Factors in Computing Systems, 1515-1524.
Rasmussen, J. (1983). Skills, rules, and knowledge; signals, signs, and symbols, and other
distinctions in human performance models. IEEE Transactions on Systems, Man, and Cybernetics, SMC-13(3), 257-266.
Salvucci, D. D., Zuber, M., Beregovaia, E., & Markley, D. (2005). Distract-R: Rapid prototyping
and evaluation of in-vehicle interfaces. Proceedings of the SIGCHI conference on Human
factors in computing systems, 581-589.
Steelman-Allen, K. S., McCarley, J. S., Wickens, C., Sebok, A., & Bzostek, J. (2009, October).
N-SEEV: A computational model of attention and noticing. Proceedings of the Human
Factors and Ergonomics Society Annual Meeting (Vol. 53, No. 12, pp. 774-778). SAGE
Publications.
Tatler, B. W. (2007). The central fixation bias in scene viewing: Selecting an optimal viewing
position independently of motor biases and image feature distributions. Journal of Vision,
7(14), 4.
Tsimhoni, O., & Green, P. (2001). Visual demand of driving and execution of display-intensive
in-vehicle tasks. Proceedings of Human Factors and Ergonomics Society Annual Meeting,
1586-1590.
Wickens, C. D., Goh, J., Helleberg, J., Horrey, W. J., & Talleur, D. A. (2003). Attentional
models of multitask pilot performance using advanced display technology. Human factors,
45(3), 360-380.
Zhang, J., & Sclaroff, S. (2013). Saliency detection: A Boolean map approach. Proceedings of
IEEE International Conference on Computer Vision, 153-160.