ABSTRACT: Perception of scenes has typically been investigated using static or simplified visual displays. How attention is used to perceive and evaluate dynamic, realistic scenes is more poorly understood, in part because of the problem of comparing eye fixations to moving stimuli across observers. When the task and stimulus are common across observers, consistent fixation location can indicate that a region has high goal-based relevance. Here we investigated these issues when an observer has a specific and naturalistic task: closed-circuit television (CCTV) monitoring. We concurrently recorded eye movements and ratings of perceived suspiciousness as different observers watched the same set of clips from real CCTV footage. Trained CCTV operators showed greater consistency in fixation location and greater consistency in suspiciousness judgements than untrained observers. Training appears to increase between-operator consistency through learning, in effect "knowing what to look for" in these scenes. We used a novel "Dynamic Area of Focus (DAF)" analysis to show that in CCTV monitoring there is a temporal relationship between eye movements and subsequent manual responses, as we have previously found for a sports video watching task. For both trained CCTV operators and untrained observers, manual responses were most highly related to between-observer eye position spread when a temporal lag was introduced between the fixation and response data. Several hundred milliseconds after between-observer eye positions became most similar, observers tended to push the joystick to indicate perceived suspiciousness. Conversely, several hundred milliseconds after between-observer eye positions became dissimilar, observers tended to rate suspiciousness as low. These data provide further support for the DAF method as an important tool for examining goal-directed fixation behavior when the stimulus is a real moving image.
Frontiers in Human Neuroscience 01/2013; 7:441. · 2.91 Impact Factor
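The DAF analysis depends on a per-frame measure of how tightly different observers' gaze positions cluster. The abstract does not define that measure, so what follows is only a minimal Python/NumPy sketch under an assumed definition: spread is the mean Euclidean distance of each observer's gaze from the across-observer centroid on each frame. The function name `between_observer_spread` and the use of NaN for missing samples are illustrative assumptions, not the authors' code.

```python
import numpy as np

def between_observer_spread(gaze):
    """Per-frame dispersion of gaze across observers (assumed definition).

    gaze: array of shape (n_observers, n_frames, 2) holding x, y positions
    (e.g. pixels or degrees); NaN marks missing samples such as blinks.
    Returns a length-n_frames array: the mean Euclidean distance of each
    observer's gaze from the across-observer centroid on that frame.
    """
    gaze = np.asarray(gaze, dtype=float)
    centroid = np.nanmean(gaze, axis=0)                 # (n_frames, 2)
    dist = np.linalg.norm(gaze - centroid, axis=2)      # (n_observers, n_frames)
    return np.nanmean(dist, axis=0)                     # ignore missing observers

# Three observers, two frames: all agree on frame 0, disagree on frame 1.
gaze = np.array([[[0, 0], [0, 0]],
                 [[0, 0], [2, 0]],
                 [[0, 0], [4, 0]]])
print(between_observer_spread(gaze))
```

Low values mark frames where observers converge on a common area of focus; in a lagged analysis, such a series would be compared with the joystick trace.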
ABSTRACT: Clutter is encountered throughout everyday life, from a messy desk to a crowded street. Such clutter may interfere with our ability to search for objects in these environments, such as our car keys or the person we are trying to meet. A number of computational models of clutter have been proposed and shown to work well for artificial and other simplified scene search tasks. In this paper, we correlate the predictions of different models of visual clutter with human performance in a visual search task using natural scenes. The models we evaluate are the Feature Congestion (Rosenholtz, Li, & Nakano, 2007), Sub-band Entropy (Rosenholtz et al., 2007), Segmentation (Bravo & Farid, 2008), and Edge Density (Mack & Oliva, 2004) measures. The correlations were computed across a range of target-centered subregions to produce a correlation profile, indicating the scale at which clutter affected search performance. Overall, clutter was only weakly correlated with performance (r ≈ 0.2). However, different measures of clutter appear to reflect different aspects of the search task: correlations with Feature Congestion are greatest for the actual target patch, whereas Sub-band Entropy is most highly correlated in a 12° × 12° region centered on the target.
Journal of Vision 01/2013; 13(5). · 2.48 Impact Factor
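The correlation-profile idea can be illustrated with the simplest of the four measures, Edge Density. This hedged Python/NumPy sketch assumes a plain gradient-magnitude threshold in place of a proper edge detector (the published Edge Density measure may use a different detector), square target-centred windows of half-width `h` pixels with `h >= 1`, and one performance score per image (e.g. search time). The function names and the threshold value are hypothetical.

```python
import numpy as np

def edge_density(img, thresh=0.1):
    # Simplified stand-in for an edge detector: the fraction of pixels whose
    # gradient magnitude exceeds `thresh` (img assumed scaled to [0, 1]).
    gy, gx = np.gradient(img.astype(float))
    return float(np.mean(np.hypot(gx, gy) > thresh))

def correlation_profile(images, targets, scores, half_widths, thresh=0.1):
    """Pearson r between clutter and performance for each window size.

    targets: (row, col) of the target in each image.
    half_widths: window half-widths h >= 1 in pixels; each window spans
    (2h + 1) x (2h + 1) pixels, clipped at the image border.
    """
    profile = []
    for h in half_widths:
        clutter = []
        for img, (r, c) in zip(images, targets):
            patch = img[max(r - h, 0):r + h + 1, max(c - h, 0):c + h + 1]
            clutter.append(edge_density(patch, thresh))
        profile.append(np.corrcoef(clutter, scores)[0, 1])
    return np.array(profile)
```

Plotting the returned profile against window size shows the scale at which clutter best tracks performance, which is how a peak at the target patch versus a wider region could be read off.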
ABSTRACT: We propose an innovative motoric measure of slant based on gait: the angle between the foot and the walking surface during walking. This work investigates whether the proposed action-based measure is affected by factors such as the material and inclination of the walking surface. Experimental studies were conducted in a real environment set-up and in its virtual simulation counterpart, evaluating behavioural fidelity and user performance in ecologically valid simulations. In the real environment, the measure slightly overestimated the inclination of the path, whereas in the virtual environment it slightly underestimated it. The results imply that the proposed slant measure is modulated by motoric caution. Since the “reality” of the synthetic environment was relatively high, performance should have revealed the same degree of caution as in the real world; however, that was not the case. People became more cautious when the ground plane was steep, slippery, or virtual.
International Journal of Human-Computer Studies 11/2012; 70(11):781–793. · 1.42 Impact Factor
ABSTRACT: Background / Purpose:
The purpose of this study was to assess how well image measures applied to natural scenes predict human search performance.
Clutter metrics have a predictive capability, but the target has to be considered.
Vision Sciences Society Annual Meeting 2012; 05/2012
ABSTRACT: Various visual functions decline in ageing, and even more so in patients with Alzheimer's disease (AD). Here we investigated whether the complex visual processes involved in ignoring illumination-related variability (specifically, cast shadows) in visual scenes may also be compromised. Participants searched for a discrepant target among items which appeared as posts with shadows cast by light-from-above when upright, but as angled objects when inverted. As in earlier reports, young participants gave slower responses with upright than inverted displays when the shadow-like part was dark, but not when it was white (control condition). This is consistent with visual processing mechanisms making shadows difficult to perceive, presumably to assist object recognition under varied illumination. Contrary to predictions, this interaction of "shadow" colour with item orientation was maintained in the healthy older and AD groups. Thus, the processing mechanisms which assist complex light-independent object identification appear to be robust to the effects of both ageing and AD. Importantly, this means that the complexity of a function does not necessarily determine its vulnerability to age- or AD-related decline. We also report slower responses to dark than to light "shadows" of either orientation in both ageing and AD, in keeping with increasing light scatter in the ageing eye. Curiously, AD patients showed further slowing of responses to "shadows" of either colour at the bottom rather than the top of items, as if they applied shadow-specific rules to non-shadow conditions. This suggests that in AD, shadow-processing mechanisms, while preserved, might be applied in a less selective way.
PLoS ONE 01/2012; 7(9):e45104. · 3.53 Impact Factor
ABSTRACT: Over the last decade, television screens and display monitors have increased considerably in size, but has this improved our televisual experience? Our working hypothesis was that audiences adopt a general strategy that "bigger is better." However, as our visual perceptions do not tap directly into basic retinal image properties such as retinal image size (C. A. Burbeck, 1987), we wondered whether object size itself might be an important factor. To test this, we needed a task that would tap into the subjective experiences of participants watching a movie on different-sized displays with the same retinal subtense. Our participants used a line bisection task to self-report their level of "presence" (i.e., their involvement with the movie) at several probed target locations in a 45-min section of the movie "The Good, The Bad, and The Ugly." Measures of pupil dilation and reaction time to the probes were also obtained. In Experiment 1, we found that subjective ratings of presence increased with physical screen size, supporting our hypothesis. Face scenes also produced higher presence scores than landscape scenes for both screen sizes. In Experiment 2, reaction time and pupil dilation results showed the same trends as the presence ratings, and pupil dilation correlated with presence ratings, providing some validation of the method. Overall, the results suggest that real-time measures of subjective presence might be a valuable tool for measuring audience experience for different types of (i) display and (ii) audiovisual material.
ABSTRACT: Low-level stimulus salience and task relevance together determine the human fixation priority assigned to scene locations (Fecteau and Munoz in Trends Cogn Sci 10(8):382-390, 2006). However, surprisingly little is known about the contribution of task relevance to eye movements during real-world visual search, where stimuli are in constant motion and where the 'target' for the visual search is abstract and semantic in nature. Here, we investigate this issue when participants continuously search an array of four closed-circuit television (CCTV) screens for suspicious events. We recorded eye movements whilst participants watched real CCTV footage and moved a joystick to continuously indicate perceived suspiciousness. We find that when multiple areas of a display compete for attention, gaze is allocated according to relative levels of reported suspiciousness. Furthermore, this measure of task relevance accounted for twice as much variance in gaze likelihood as did the amount of low-level visual change over time in the video stimuli.
Experimental Brain Research 08/2011; 214(1):131-7. · 2.22 Impact Factor
ABSTRACT: We measured the temporal relationship between eye movements and manual responses while experts and novices watched a videotaped football match. Observers used a joystick to continuously indicate the likelihood of an imminent goal. We measured correlations between manual responses and between-subjects variability in eye position. To identify the lag magnitude, we repeated these correlations over a range of possible delays between these two measures and searched for the most negative correlation coefficient. We found lags of the order of 2 s and an effect of expertise on lag magnitude, suggesting that expertise has its effect by directing eye movements to task-relevant areas of a scene more quickly, facilitating a longer processing duration before behavioral decisions are made. This is a powerful new method for examining the eye movement behavior of multiple observers across complex moving images.
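The lag search described above can be sketched in a few lines of Python/NumPy. This is an illustrative reconstruction rather than the authors' code: it assumes the between-subjects variability and the (mean) joystick trace are sampled at the same rate and have equal length, and that a positive lag means the manual response follows the eye data. `best_lag` is a hypothetical name.

```python
import numpy as np

def best_lag(spread, response, max_lag):
    """Find the delay at which eye-position spread best predicts responses.

    Computes Pearson r between spread[t] and response[t + lag] for each
    lag in 0..max_lag samples and returns (lag, r) for the most negative r:
    low spread (observers agree) followed by a high response.
    """
    spread = np.asarray(spread, dtype=float)
    response = np.asarray(response, dtype=float)
    if spread.shape != response.shape:
        raise ValueError("spread and response must have the same length")
    rs = []
    for lag in range(max_lag + 1):
        x = spread[:len(spread) - lag]   # eye data at time t
        y = response[lag:]               # manual response at time t + lag
        rs.append(np.corrcoef(x, y)[0, 1])
    rs = np.array(rs)
    k = int(np.argmin(rs))
    return k, rs[k]
```

The lag is returned in samples; converting it to seconds requires the sampling rate. At an assumed 25 frames per second, for example, a 2 s lag would correspond to 50 samples.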
ABSTRACT: We are studying how people perceive naturalistic suprathreshold changes in the colour, size, shape or location of items in images of natural scenes, using magnitude estimation ratings to characterise the sizes of the perceived changes in coloured photographs. We have implemented a computational model that tries to explain observers' ratings of these naturalistic differences between image pairs. We model the action-potential firing rates of millions of neurons, with linear and non-linear summation behaviour closely modelled on that of real V1 neurons. The numerical parameters of the model's sigmoidal transducer function are set by optimising the same model to experiments on contrast discrimination (contrast 'dippers') on monochrome photographs of natural scenes. The model, optimised on a stimulus-intensity domain in an experiment reminiscent of the Weber-Fechner relation, then produces tolerable predictions of the ratings for most kinds of naturalistic image change. Importantly, rating rises roughly linearly with the model's numerical output, which represents differences in neuronal firing rate in response to the two images under comparison; this implies that rating is proportional to the neuronal response.
Seeing and Perceiving 10/2010; 23(4):349-72. · 0.98 Impact Factor
ABSTRACT: In a virtual environment (VE), efficient techniques are often needed to economize on rendering computation without compromising the information transmitted. The reported experiments devise a functional fidelity metric by exploiting research on memory schemata. According to the proposed measure, similar information would be transmitted across synthetic and real-world scenes depicting a specific schema. This would ultimately indicate which areas in a VE could be rendered in lower quality without affecting information uptake. We examine whether computationally more expensive scenes of greater visual fidelity affect memory performance after exposure to immersive VEs, or whether they are merely more aesthetically pleasing than their lower-quality counterparts. Results indicate that memory schemata function in VEs similarly to real-world environments. “High-level” visual cognition related to late visual processing is unaffected by ubiquitous graphics manipulations such as polygon count and depth of shadow rendering; “normal” cognition operates as long as the scenes look acceptably realistic. However, when the overall realism of the scene is greatly reduced, as in wireframe rendering, visual cognition becomes abnormal. Effects that distinguish schema-consistent from schema-inconsistent objects change because the whole scene now looks incongruent. We have shown that this effect is not due to a failure of basic recognition.
ABSTRACT: The combination of visible-light and infrared visual representations occurs naturally in some creatures, including the rattlesnake. This process, and the widespread use of multi-spectral multi-sensor systems, has influenced research into image fusion methods. Recent advances in image fusion techniques have necessitated novel ways of assessing fused images; such assessment has previously relied on subjective quality ratings combined with computational metrics. Previous work has shown the need to incorporate a task into the assessment process; the current work continues this approach by extending the novel use of scanpath analysis. In our experiments, participants were shown two video sequences, one in high luminance (HL) and one in low luminance (LL), both featuring a group of people walking around a clearing of trees. Each participant was shown the visible and infrared (IR) inputs alone, side by side (SBS), and as average (AVE)-fused, discrete wavelet transform (DWT)-fused, and dual-tree complex wavelet transform (DT-CWT)-fused displays. Participants were asked to track one individual in each video sequence and to respond by key press when other individuals carried out secondary actions. Results showed that the SBS display led to much poorer accuracy than the other displays, while reaction times on the secondary task favoured AVE in the HL sequence and DWT in the LL sequence. Results are discussed in relation to previous findings regarding item saliency and task demands, and we highlight the potential for comparative experiments evaluating human performance when viewing fused sequences against naturally occurring fusion processes such as that of the rattlesnake.
ABSTRACT: Simple everyday tasks, such as visual search, require a visual system that is sensitive to differences. Here we report how observers perceive changes in natural image stimuli, and what happens if objects change color, position, or identity, i.e., when the external scene changes in a naturalistic manner. We investigated whether a V1-based difference-prediction model can predict the magnitude ratings given by observers to suprathreshold differences in numerous pairs of natural images. The model incorporated contrast normalization, surround suppression, and elongated receptive fields. Observers' ratings were better predicted when the model included phase invariance, and even more so when the stimuli were inverted and negated to lessen their semantic impact. Some feature changes were better predicted than others: the model systematically underpredicted observers' perception of the magnitude of blur, but overpredicted their ability to report changes in textures.
Journal of Vision 01/2010; 10(4):12.1-22. · 2.48 Impact Factor
ABSTRACT: The allocation of overt visual attention while viewing photographs of natural scenes is commonly thought to involve both bottom-up feature cues, such as luminance contrast, and top-down factors such as behavioural relevance and scene understanding. Exploiting the fact that light sources are highly visible but uninformative in visual scenes, we develop a mixture model approach that estimates the relative contribution of various low- and high-level factors to patterns of eye movements whilst viewing natural scenes containing light sources. Low-level salience accounts predicted fixations at regions of high luminance contrast and at lights, whereas these factors played only a minor role in the observed human fixations. Conversely, human data were mostly explicable in terms of a central bias and a foreground preference. Moreover, observers were more likely to look near lights than directly at them, an effect that cannot be explained by low-level stimulus factors such as luminance or contrast. These and other results support the idea that the visual system neglects highly visible cues in favour of less visible object information. Mixture modelling might be a good way forward in understanding visual scene exploration, since it makes it possible to measure the extent to which low-level or high-level cues act as drivers of eye movements.
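The core of such a mixture-model analysis can be sketched as follows, under a simplifying assumption that is ours and not necessarily the authors': each candidate factor (e.g. central bias, light sources, foreground) is a fixed, normalised probability map over the image, and only the mixing weights are estimated, by expectation-maximisation (EM). `fit_mixture_weights` is a hypothetical name.

```python
import numpy as np

def fit_mixture_weights(lik, n_iter=200):
    """EM estimate of mixing weights for a mixture with fixed components.

    lik[i, k] is the density of fixation i under component map k (each map
    normalised over the image). Returns weights w, summing to 1, that
    increase sum_i log(sum_k w[k] * lik[i, k]) at every iteration.
    """
    lik = np.asarray(lik, dtype=float)
    n_fix, n_comp = lik.shape
    w = np.full(n_comp, 1.0 / n_comp)              # start from equal weights
    for _ in range(n_iter):
        resp = lik * w                             # E-step: unnormalised responsibilities
        resp /= resp.sum(axis=1, keepdims=True)    # each fixation's responsibilities sum to 1
        w = resp.mean(axis=0)                      # M-step: average responsibility per component
    return w
```

The fitted weights then read directly as the relative contribution of each factor; for instance, a small weight on a light-source map alongside a large weight on a central-bias map would express the pattern the abstract reports.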
ABSTRACT: Differences in the processing mechanisms underlying visual feature and conjunction search are still under debate, one problem being a common emphasis on performance measures (speed and accuracy) which do not necessarily provide insight into the underlying processing principles. Here, eye movements and pupil dilation were used to investigate sampling strategy and processing load during performance of a conjunction task and two feature-search tasks, with younger (18-27 years) and healthy older (61-83 years) age groups compared for evidence of differential age-related changes. The tasks involved equivalent processing time per item, were controlled in terms of target-distractor similarity, and did not allow perceptual grouping. Close matching of the key tasks was confirmed by patterns of fixation duration and an equal number of saccades required to find a target. Moreover, moment-to-moment pupillary dilation was indistinguishable across the tasks for both age groups, suggesting that all required the same total amount of effort or resources. Despite matching, subtle differences in eye movement patterns occurred between tasks: the conjunction task required more saccades to reach a target-absent decision and involved shorter saccade amplitudes than the feature tasks. General age-related changes were manifested in an increased number of saccades and longer fixation durations in older than in younger participants. In addition, older people showed disproportionately longer and more variable fixation durations for the conjunction task specifically. These results suggest a fundamental difference between conjunction and feature search: accurate target identification in the conjunction context requires more conservative eye movement patterns, which are further adjusted in healthy ageing.
The data also highlight the independence of eye movement and pupillometry measures and stress the importance of saccades and strategy for understanding the processing mechanisms driving different types of visual search.
ABSTRACT: Deficits in inefficient visual search task performance in Alzheimer's disease (AD) have been linked both to a general depletion of attentional resources and to a specific difficulty in performing conjunction discriminations. It has been difficult to examine the latter proposal because the uniqueness of conjunction search compared with other visual search tasks has remained a matter of debate. We explored both claims by measuring pupil dilation, as a measure of resource application, while patients with AD performed a conjunction search task and two single-feature search tasks of similar difficulty for healthy individuals. Maximum pupil dilation in the AD group was greater during the conjunction than during the feature search tasks, whereas pupil response was indistinguishable across the three tasks in healthy controls. This, together with patients' false-positive errors on the conjunction task, indicates an AD-specific deficit affecting the ability to combine information across multiple dimensions. In addition, maximum pupil dilation was no smaller in patients than in the control group during task performance, which tends to argue against the concept of general resource depletion in AD. However, eye movement patterns in the patient group indicated that they were less able than controls to use organised strategies to assist with task performance. The data are therefore in keeping with a loss of access to resource-saving strategies, rather than a loss of resources per se, in AD. Moreover, they demonstrate an additional processing mechanism in performing conjunction search compared with inefficient single-feature search.
ABSTRACT: Shadows may be "discounted" in human visual perception because they do not provide stable, lighting-invariant information about the properties of objects in the environment. Using visual search, R. A. Rensink and P. Cavanagh (2004) found that search for an upright discrepant shadow was less efficient than for an inverted one. Here we replicate and extend this work using photographs of real objects (pebbles) and their shadows. The orientation of the target shadows was varied between 30 and 180 degrees. Stimuli were presented upright (light from above, the usual situation in the world) or inverted (light from below, unnatural lighting). RTs for upright images were slower for shadows angled at 30 degrees, exactly as found by Rensink and Cavanagh. However, for all other shadow angles tested, RTs were faster for upright images. This suggests, for small discrepancies in shadow orientation, a switch of processing from a relatively coarse-scaled shadow system to other general-purpose visual routines. Manipulations of the visual heterogeneity of the pebbles that cast the shadows differentially influenced performance. For inverted images, heterogeneity had the expected influence: reducing search efficiency and increasing overall search time. This effect was greatly reduced when images were presented upright, presumably because the distractors were then processed as shadows. We suggest that shadows may be processed in a functionally separate, spatially coarse mechanism. The pattern of results suggests that human vision does not use a shadow-suppressing system in search tasks.
Journal of Vision 02/2009; 9(1):37.1-14. · 2.48 Impact Factor
ABSTRACT: Visual difference predictor (VDP) models have played a key role in digital image applications such as the development of image quality metrics. However, little attention has been paid to their applicability to peripheral vision. Central (i.e., foveal) vision is extremely sensitive for the contrast detection of simple stimuli such as sinusoidal gratings, but peripheral vision is less sensitive. Furthermore, crowding is a well-documented phenomenon whereby differences in suprathreshold peripherally viewed target objects (such as individual letters or patches of sinusoidal grating) become more difficult to discriminate when surrounded by other objects (flankers). We examine three factors that might influence the degree of crowding with natural-scene stimuli (cropped from photographs of natural scenes): 1) location in the visual field, 2) distance between target and flankers, and 3) flanker-target similarity. We ask how these factors affect crowding in a suprathreshold discrimination experiment where observers rate the perceived differences between two sequentially presented target patches of natural images. The targets might differ in the shape, size, arrangement or color of items in the scenes. Changes in uncrowded peripheral targets are perceived as smaller than the same changes viewed foveally. Consistent with previous research on simple stimuli, we find that crowding in the periphery (but not in the fovea) reduces the magnitudes of perceived changes even further, especially when the flankers are closer and more similar to the target. We have tested VDP models based on the response behavior of neurons in visual cortex and the inhibitory interactions between them.
The models did not explain the lower ratings for peripherally viewed changes even when the lower peripheral contrast sensitivity was accounted for, nor could they explain the effects of crowding, which others have suggested might arise from errors in the spatial localization of features in the peripheral image. This suggests that conventional VDP models do not port well to peripheral vision. CR Categories: J.2 (Physical Sciences and Engineering): Engineering; J.4 (Social and Behavioral Sciences): Psychology
ABSTRACT: Despite embodying fundamentally different assumptions about attentional allocation, a wide range of popular models of attention include a max-of-outputs mechanism for selection. Within these models, attention is directed to the items with the most extreme value along a perceptual dimension via, for example, a winner-take-all mechanism. From a detection-theoretic perspective, this MAX observer can be optimal in specific situations; however, this is not always the case with distracter heterogeneity manipulations or in natural visual scenes. We derive a Bayesian maximum a posteriori (MAP) observer, which is optimal in both these situations. While it retains a form of the max-of-outputs mechanism, it is based on the maximum a posteriori probability dimension instead of a perceptual dimension. To test this model, we investigated human visual search performance using a yes/no procedure while adding external orientation uncertainty to distracter elements. The results are much better fitted by the predictions of a MAP observer than a MAX observer. We conclude that a max-like mechanism may well underlie the allocation of visual attention, but that it is based upon a probability dimension, not a perceptual dimension.
Journal of Vision 01/2009; 9(5):15.1-11. · 2.48 Impact Factor
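The contrast between MAX and MAP observers can be illustrated with a small simulation. This Python/NumPy sketch is not the authors' derivation. It assumes Gaussian internal noise, Gaussian external orientation variability for distracters, and a single target of fixed tilt. It also assumes a MAP-style observer that applies the max rule to each item's log likelihood ratio (target vs. distracter), which is monotonic in that item's posterior probability of being the target. All names and parameter values are hypothetical.

```python
import numpy as np

def log_normal_pdf(x, mu, sd):
    # Log density of a Gaussian with mean mu and standard deviation sd.
    return -0.5 * ((x - mu) / sd) ** 2 - np.log(sd * np.sqrt(2.0 * np.pi))

def simulate(n_trials=20000, n_items=4, delta=10.0, sd_int=2.0, sd_ext=20.0, seed=0):
    """Yes/no search: on about half the trials one item is a target tilted by
    delta degrees. Distracter orientations vary externally (sd_ext); every
    item is also seen through internal noise (sd_int).
    Returns (present, max_dv, map_dv)."""
    rng = np.random.default_rng(seed)
    present = rng.random(n_trials) < 0.5
    true = rng.normal(0.0, sd_ext, (n_trials, n_items))   # distracter orientations
    true[present, 0] = delta                              # item 0 becomes the target
    x = true + rng.normal(0.0, sd_int, true.shape)        # noisy internal estimates
    # MAX observer: extreme value on the perceptual (orientation) dimension.
    max_dv = x.max(axis=1)
    # MAP-style observer: extreme value of the per-item log likelihood ratio,
    # target N(delta, sd_int) vs. distracter N(0, sqrt(sd_ext^2 + sd_int^2)).
    llr = log_normal_pdf(x, delta, sd_int) - log_normal_pdf(x, 0.0, np.hypot(sd_ext, sd_int))
    map_dv = llr.max(axis=1)
    return present, max_dv, map_dv

def best_accuracy(present, dv):
    # Proportion correct at the best "respond yes if dv > c" criterion,
    # with c searched over a grid of quantiles of the decision variable.
    criteria = np.quantile(dv, np.linspace(0.0, 1.0, 201))
    return max(np.mean((dv > c) == present) for c in criteria)
```

With large external variability, distracters are often tilted further than the target, which misleads a max rule on the orientation dimension but not one applied to likelihood-based evidence. Comparing `best_accuracy` for the two decision variables makes that difference visible.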