Conference Paper

A Visual Search Model for In-Vehicle Interface Design

Abstract

As in-vehicle infotainment systems gain new functionality, their potential to distract drivers increases. Searching for an item on an interface is a critical concern because a poorly designed interface that draws drivers’ attention to less important items can extend drivers’ search for items of interest and pull attention away from roadway events. This potential can be assessed in simulator-based experiments, but computational models of driver behavior might enable designers to assess this potential and revise their designs more quickly than if they have to wait weeks to compile human subjects data. One such model, reported in this paper, predicts the sequence of eye fixations of drivers based on a Boolean Map-based Saliency model augmented with top-down feature bias. Comparing the model predictions to empirical data shows that the model can predict search time, especially in cluttered scenes and when a target item is highlighted. We also describe the integration of this model into a web application (http://distraction.engr.wisc.edu/) that can help assess the distraction potential of interface designs.

Conference Paper
Full-text available
A web-based evaluation tool that simulates drivers’ eye glances to interface designs of in-vehicle information systems (IVISs) is presented. This application computes saliency of each location of a candidate interface and simulates eye fixations based on the saliency, until it arrives at the region of interest. Designers can use this tool to estimate the duration of drivers’ eye glance needed to find regions of interest, such as particular icons on a touch screen. The overall goal of developing this application is to bridge the gap between guidelines and empirical evaluations. This evaluation tool acts as an interactive model-based design guideline to help designers craft less distracting IVIS interfaces.
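The glance-simulation loop this tool describes (fixate the most salient location, suppress it, repeat until the region of interest is reached) can be sketched as follows. This is a minimal illustration, not the tool's actual implementation; the function name, the fixed inhibition radius, and the fixation cap are all assumptions.

```python
import numpy as np

def simulate_search(saliency, roi_mask, ior_radius=2, max_fixations=50):
    """Simulate fixations on a saliency map until one lands in the
    region of interest (ROI). Returns the number of fixations needed,
    or None if the ROI is never reached within max_fixations."""
    s = saliency.astype(float)
    for n in range(1, max_fixations + 1):
        # Fixate the currently most salient location.
        y, x = np.unravel_index(np.argmax(s), s.shape)
        if roi_mask[y, x]:
            return n
        # Inhibition of return: suppress the attended neighborhood
        # so the next fixation moves to the next most salient spot.
        y0, y1 = max(0, y - ior_radius), y + ior_radius + 1
        x0, x1 = max(0, x - ior_radius), x + ior_radius + 1
        s[y0:y1, x0:x1] = -np.inf
    return None
```

Multiplying the returned fixation count by a typical fixation duration (on the order of 250 ms) gives a rough glance-duration estimate of the kind the tool reports.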
Conference Paper
Full-text available
Drivers show a wide range of behavior while performing a secondary task behind the wheel. In the current study, we categorized drivers into groups based on their glance behavior at task boundary (i.e., pressing a touch screen button after reading driving-related messages) and compared driving capabilities of drivers in each group. The comparison between the groups identifies different eye glance strategies, or task switching decisions, and associated vehicle control behaviors. Senders' uncertainty model was adapted to explain the results and to suggest future directions in developing driver models.
Article
Full-text available
To gain insight as to when telematics can be distracting, 16 participants drove a simulator on roads with long curves of several different radii. Participants read electronic maps displayed in the center console while both parked and driving. In separate trials, the visual demand/workload of the same straight and curved sections was measured using the visual occlusion technique. Visual demand was correlated with inverse curve radius. As visual demand increased, driving performance declined. Participants made shorter glances at the display, made more of them, but waited longer between glances. Overall, task completion time increased when the task was performed while driving (versus while parked), except for short duration tasks (a single glance or under 3 seconds timed while parked), where task time decreased. While driving, task completion times were relatively unaffected by the driving workload.
Conference Paper
Full-text available
Driver distraction is a leading cause of motor vehicle crashes. As more in-vehicle systems are developed, they represent increasing potential for distraction. Designers of these systems require a quantitative way to assess their distraction potential that does not involve time-consuming test track or simulator testing. A critical contribution to driver distraction concerns the search time for items in an in-vehicle system display. This study tests the saliency map’s ability to predict search time, and proposes a potential application of the saliency map in assessing driver distraction. Empirical data for search tasks were collected and used to test a modified driver model based on the saliency map. The results show that the modified saliency map can predict search time, and suggest that the driver model could be used to understand how design features influence the bottom-up visual search process. More broadly, such a model can complement guidelines and user testing to help designers to incorporate human factors considerations earlier in the design process.
Conference Paper
Full-text available
Visual saliency has been an increasingly active research area in the last ten years, with dozens of saliency models recently published. Nowadays, one of the big challenges in the field is to find a way to fairly evaluate all of these models. In this paper, we compare the ranking of 12 state-of-the-art saliency models on human eye fixations using 12 similarity metrics. The comparison is done on Jian Li's database containing several hundred natural images. Based on Kendall's concordance coefficient, it is shown that some of the metrics are strongly correlated, leading to a redundancy in the performance metrics reported in the available benchmarks. On the other hand, other metrics provide a more diverse picture of models' overall performance. As a recommendation, three similarity metrics should be used to obtain a complete point of view of saliency model performance.
Article
Full-text available
Because of the visual nature of computer use, researchers and designers of computer systems would like to gain some insight into the visual search strategies of computer users. Icons, a common component of graphical user interfaces, serve as the focus for a set of studies aimed at (1) developing a detailed understanding of how people search for an icon in a typically crowded screen of other icons that vary in similarity to the target, and (2) building a cognitively plausible model that simulates the processes inferred in the human search process. An eye-tracking study of the task showed that participants rarely refixated icons that they had previously examined, and that participants used an efficient search strategy of examining distractor icons nearest to their current point of gaze. These findings were integrated into an ACT-R model of the task using EMMA and a "nearest" strategy. The model fit the response time data of participants as well as a previous model of the task, but was a much better fit to the eye movement data. Michael Fleetwood is an applied cognitive scientist with interests in human performance modeling and human vision; he is a PhD candidate in the psychology department at Rice University. Michael Byrne is an applied cognitive scientist with an interest in developing computational systems for application to human factors problems; he is an assistant professor in the psychology department at Rice University.
Article
Full-text available
Three areas of high-level scene perception research are reviewed. The first concerns the role of eye movements in scene perception, focusing on the influence of ongoing cognitive processing on the position and duration of fixations in a scene. The second concerns the nature of the scene representation that is retained across a saccade and other brief time intervals during ongoing scene perception. Finally, we review research on the relationship between scene and object identification, focusing particularly on whether the meaning of a scene influences the identification of constituent objects.
Article
Full-text available
The aim of this study was to determine the pattern of fixations during the performance of a well-learned task in a natural setting (making tea), and to classify the types of monitoring action that the eyes perform. We used a head-mounted eye-movement video camera, which provided a continuous view of the scene ahead, with a dot indicating foveal direction with an accuracy of about 1 deg. A second video camera recorded the subject's activities from across the room. The videos were linked and analysed frame by frame. Foveal direction was always close to the object being manipulated, and very few fixations were irrelevant to the task. The first object-related fixation typically led the first indication of manipulation by 0.56 s, and vision moved to the next object about 0.61 s before manipulation of the previous object was complete. Each object-related act that did not involve a waiting period lasted an average of 3.3 s and involved about 7 fixations. Roughly a third of all fixations on objects could be definitely identified with one of four monitoring functions: locating objects used later in the process, directing the hand or object in the hand to a new location, guiding the approach of one object to another (e.g. kettle and lid), and checking the state of some variable (e.g. water level). We conclude that although the actions of tea-making are 'automated' and proceed with little conscious involvement, the eyes closely monitor every step of the process. This type of unconscious attention must be a common phenomenon in everyday life.
Article
Full-text available
A theory is presented that attempts to answer two questions. What visual contents can an observer consciously access at one moment? Answer: only one feature value (e.g., green) per dimension, but those feature values can be associated (as a group) with multiple spatially precise locations (comprising a single labeled Boolean map). How can an observer voluntarily select what to access? Answer: in one of two ways: (a) by selecting one feature value in one dimension (e.g., selecting the color red) or (b) by iteratively combining the output of (a) with a preexisting Boolean map via the Boolean operations of intersection and union. Boolean map theory offers a unified interpretation of a wide variety of visual attention phenomena usually treated in separate literatures. In so doing, it also illuminates the neglected phenomena of attention to structure.
Article
Full-text available
A visual attention system, inspired by the behavior and the neuronal architecture of the early primate visual system, is presented. Multiscale image features are combined into a single topographical saliency map. A dynamical neural network then selects attended locations in order of decreasing saliency. The system breaks down the complex problem of scene understanding by rapidly selecting, in a computationally efficient manner, conspicuous locations to be analyzed in detail.
Conference Paper
Human-technology interactions involving errors undermine acceptance and performance. The effect of errors and the ability to recover from them represent a particularly important consideration for design in safety-critical multitasking situations. However, few studies have considered the recovery process of errors in multitasking situations, such as their contribution to driver distraction. This paper investigates errors that drivers make interacting with an infotainment system. In this study, participants (N = 46) drove a simulated vehicle and performed word entry tasks on a touch screen. Errors undermined driving and task performance. We also identified four different error recovery strategies and found that the accumulated information related to the driving situation and the characteristics of an infotainment system affected the choice of strategy. Implications for in-vehicle interface design, driver models, and general multitasking design are discussed.
Article
Most models of visual search, whether involving overt eye movements or covert shifts of attention, are based on the concept of a saliency map, that is, an explicit two-dimensional map that encodes the saliency or conspicuity of objects in the visual environment. Competition among neurons in this map gives rise to a single winning location that corresponds to the next attended target. Inhibiting this location automatically allows the system to attend to the next most salient location. We describe a detailed computer implementation of such a scheme, focusing on the problem of combining information across modalities, here orientation, intensity and color information, in a purely stimulus-driven manner. The model is applied to common psychophysical stimuli as well as to a very demanding visual search task. Its successful performance is used to address the extent to which the primate visual system carries out visual search via one or more such saliency maps and how this can be tested.
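The cross-modality combination this model describes (orientation, intensity, and color maps merged into one saliency map) can be sketched in a few lines. This is an illustrative simplification: the range normalization below stands in for the paper's map-normalization operator, which additionally promotes maps with a few strong peaks.

```python
import numpy as np

def normalize_map(m, eps=1e-8):
    """Rescale a feature map to [0, 1]. A simple stand-in for the
    normalization operator that precedes map combination."""
    m = m - m.min()
    return m / (m.max() + eps)

def combine_saliency(intensity, color, orientation):
    """Average the normalized conspicuity maps into a single
    stimulus-driven saliency map."""
    maps = [normalize_map(m) for m in (intensity, color, orientation)]
    return sum(maps) / len(maps)
```

The resulting map can then drive a winner-take-all selection with inhibition of return, as the abstract describes.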
Article
Voluntary and relatively involuntary subsystems of attention often compete. On one hand, people can intentionally "tune" attention for features that then receive visual priority; on the other hand, more reflexive attentional shifts can "short-circuit" top-down control in the face of urgent, behaviourally relevant stimuli. Thus, it is questionable whether voluntary attentional tuning (i.e., attentional set) can affect one's ability to respond to unexpected, urgent information in the real world. We show that the consequences of such tuning extend to a realistic, safety-relevant scenario. Participants drove in a first-person driving simulation where they searched at every intersection for either a yellow or blue arrow indicating which way to turn. At a critical intersection, a yellow or blue motorcycle - either matching or not matching drivers' attentional set - suddenly veered into drivers' paths and stopped in their way. Collision rates with the motorcycle were substantially greater when the motorcycle did not match drivers' attentional sets.
Conference Paper
A novel Boolean Map based Saliency (BMS) model is proposed. An image is characterized by a set of binary images, which are generated by randomly thresholding the image's color channels. Based on a Gestalt principle of figure-ground segregation, BMS computes saliency maps by analyzing the topological structure of Boolean maps. BMS is simple to implement and efficient to run. Despite its simplicity, BMS consistently achieves state-of-the-art performance compared with ten leading methods on five eye tracking datasets. Furthermore, BMS is also shown to be advantageous in salient object detection.
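The core BMS computation (random thresholding into Boolean maps, then keeping "surrounded" figure regions) can be sketched as follows. This is a schematic version under simplifying assumptions: 4-connectivity, a uniform threshold distribution, and no dilation or smoothing steps from the published pipeline.

```python
from collections import deque
import numpy as np

def surrounded_regions(bmap):
    """Zero out connected True regions that touch the image border,
    keeping only surrounded (figure) regions, per the Gestalt
    figure-ground cue BMS relies on."""
    h, w = bmap.shape
    keep = bmap.copy()
    # Seed a BFS flood fill from every True border pixel.
    q = deque((y, x) for y in range(h) for x in range(w)
              if keep[y, x] and (y in (0, h - 1) or x in (0, w - 1)))
    while q:
        y, x = q.popleft()
        if not keep[y, x]:
            continue
        keep[y, x] = False
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and keep[ny, nx]:
                q.append((ny, nx))
    return keep

def bms_saliency(image, n_thresholds=8, seed=0):
    """Schematic Boolean Map based Saliency: randomly threshold each
    channel into Boolean maps and accumulate surrounded regions."""
    h, w, c = image.shape
    attention = np.zeros((h, w))
    rng = np.random.default_rng(seed)
    for ch in range(c):
        lo, hi = image[:, :, ch].min(), image[:, :, ch].max()
        for t in rng.uniform(lo, hi, n_thresholds):
            # Each threshold yields a Boolean map and its complement.
            for bmap in (image[:, :, ch] > t, image[:, :, ch] <= t):
                attention += surrounded_regions(bmap)
    peak = attention.max()
    return attention / peak if peak > 0 else attention
```

A small, enclosed bright region accumulates attention across many thresholds, which is why BMS responds strongly to figure-like objects.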
Article
In this study, the authors used algorithms to estimate driver distraction and predict crash and near-crash risk on the basis of driver glance behavior using the data set of the 100-Car Naturalistic Driving Study. Driver distraction has been a leading cause of motor vehicle crashes, but the relationship between distractions and crash risk lacks detailed quantification. The authors compared 24 algorithms that varied according to how they incorporated three potential contributors to distraction--glance duration, glance history, and glance location--on how well the algorithms predicted crash risk. Distraction estimated from driver eye-glance patterns was positively associated with crash risk. The algorithms incorporating ongoing off-road glance duration predicted crash risk better than did the algorithms incorporating glance history. Augmenting glance duration with other elements of glance behavior--1.5th power of duration and duration weighted by glance location--produced similar prediction performance as glance duration alone. The distraction level estimated by the algorithms that include current glance duration provides the most sensitive indicator of crash risk. The results inform the design of algorithms to monitor driver state that support real-time distraction mitigation systems.
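One of the duration-weighting variants the study compares raises off-road glance duration to the 1.5th power, so a single long glance counts disproportionately more than several short ones. The function below is an illustrative sketch of that idea, not the paper's algorithm; the tuple-based glance representation is an assumption.

```python
def distraction_estimate(glances, exponent=1.5):
    """Illustrative glance-based distraction estimate.
    `glances` is a list of (duration_seconds, off_road) tuples;
    off-road durations are raised to a power > 1 so long glances
    dominate the estimate."""
    return sum(d ** exponent for d, off_road in glances if off_road)
```

For example, one 2 s off-road glance contributes about 2.83 units, while two 1 s glances contribute only 2.0.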
Visual analysis appears to be functionally divided between an early preattentive level of processing at which simple features are coded spatially in parallel and a later stage at which focused attention is required to conjoin the separate features into coherent objects. Evidence supporting this dichotomy comes from behavioral studies of visual search, from differences in the ease of texture segregation, from reports of illusory conjunctions when attention is overloaded, from subjects' ability to identify simple features correctly even when they mislocate them, and from the substantial benefit of pre-cuing the location of a relevant item when the task requires that features be conjoined but not when simple features are sufficient. Some further studies of search have revealed a striking asymmetry between several pairs of stimuli which differ in the presence or absence of a single part or property. The asymmetry depends solely on which of the pair is allocated the role of target and which is replicated to form the background items. It suggests that search for the presence of a visual primitive is automatic and parallel, whereas search for the absence of the same feature is serial and requires focused attention. The search asymmetry can be used as an additional diagnostic to help define the functional features extracted by the visual system.
Conference Paper
For many applications in graphics, design, and human computer interaction, it is essential to understand where humans look in a scene. Where eye tracking devices are not a viable option, models of saliency can be used to predict fixation locations. Most saliency approaches are based on bottom-up computation that does not consider top-down image semantics and often does not match actual eye movements. To address this problem, we collected eye tracking data of 15 viewers on 1003 images and use this database as training and testing examples to learn a model of saliency based on low, middle and high-level image features. This large database of eye tracking data is publicly available with this paper.
Integration of goal-driven, top-down attention and image-driven, bottom-up attention is crucial for visual search. Yet, previous research has mostly focused on models that are purely top-down or bottom-up. Here, we propose a new model that combines both. The bottom-up component computes the visual salience of scene locations in different feature maps extracted at multiple spatial scales. The top-down component uses accumulated statistical knowledge of the visual features of the desired search target and background clutter, to optimally tune the bottom-up maps such that target detection speed is maximized. Testing on 750 artificial and natural scenes shows that the model’s predictions are consistent with a large body of available literature on human psychophysics of visual search. These results suggest that our model may provide a good approximation of how humans combine bottom-up and top-down cues so as to optimize target detection speed.
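The top-down tuning step described above can be illustrated with a simple signal-to-noise weighting: each feature map is scaled by the ratio of the target's expected response to the distractors' expected response. This is a sketch of the idea under simplifying assumptions, not the paper's exact optimal-gain derivation.

```python
import numpy as np

def topdown_weights(target_means, distractor_means, eps=1e-8):
    """Weight each feature map by a target-to-distractor response
    ratio (a simple signal-to-noise proxy), so features that
    distinguish the target from clutter are amplified."""
    t = np.asarray(target_means, dtype=float)
    d = np.asarray(distractor_means, dtype=float)
    w = t / (d + eps)
    return w / w.sum()  # normalize so the gains sum to 1

def biased_saliency(feature_maps, weights):
    """Combine bottom-up feature maps using the top-down gains."""
    return sum(w * m for w, m in zip(weights, feature_maps))
```

If the target is red and the clutter is mostly blue, the red-channel map receives most of the gain, which is how such models speed up target detection.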
Bylinskii, Z., Judd, T., Borji, A., Itti, L., Durand, F., Oliva, A., & Torralba, A. (n.d.). MIT Saliency Benchmark. Retrieved from http://saliency.mit.edu/
Lee, J. (2014). Integrating the saliency map with Distract-R to assess driver distraction of vehicle displays. University of Wisconsin-Madison.
3M Commercial Graphics Division. (2010). 3M Visual Attention Service Validation Study. Retrieved from http://solutions.3m.com/3MContentRetrievalAPI/BlobServlet?lmd=1371740697000&locale=en_WW&assetType=MMM_Image&assetId=1361624948678&blobAttribute=ImageFile&WT.mc_id=www.3m.com/VASstudy