ABSTRACT: Perception of scenes has typically been investigated using static or simplified visual displays. How attention is used to perceive and evaluate dynamic, realistic scenes is less well understood, partly because of the difficulty of comparing eye fixations to moving stimuli across observers. When the task and stimulus are common across observers, consistent fixation location can indicate that the fixated region has high goal-based relevance. Here we investigated these issues when an observer has a specific, naturalistic task: closed-circuit television (CCTV) monitoring. We concurrently recorded eye movements and ratings of perceived suspiciousness as different observers watched the same set of clips from real CCTV footage. Trained CCTV operators showed greater consistency in fixation location and in suspiciousness judgements than untrained observers. Training appears to increase between-operator consistency by teaching observers what to look for in these scenes. We used a novel "Dynamic Area of Focus" (DAF) analysis to show that in CCTV monitoring there is a temporal relationship between eye movements and subsequent manual responses, as we have previously found for a sports video watching task. For both trained CCTV operators and untrained observers, manual responses were most strongly related to between-observer eye-position spread when a temporal lag was introduced between the fixation and response data. Several hundred milliseconds after between-observer eye positions became most similar, observers tended to push the joystick to indicate perceived suspiciousness; conversely, several hundred milliseconds after between-observer eye positions became dissimilar, observers tended to rate suspiciousness as low. These data provide further support for the DAF method as an important tool for examining goal-directed fixation behavior when the stimulus is a real moving image.
Article · Aug 2013 · Frontiers in Human Neuroscience
ABSTRACT: Clutter is encountered throughout everyday life, from a messy desk to a crowded street, and it may interfere with our ability to search these environments for objects such as our car keys or the person we are trying to meet. A number of computational models of clutter have been proposed and shown to work well for artificial and other simplified scene-search tasks. In this paper, we correlate the performance of different models of visual clutter with human performance in a visual search task using natural scenes. The models we evaluate are the Feature Congestion (Rosenholtz, Li, & Nakano, 2007), Sub-band Entropy (Rosenholtz et al., 2007), Segmentation (Bravo & Farid, 2008), and Edge Density (Mack & Oliva, 2004) measures. The correlations were computed across a range of target-centered subregions to produce a correlation profile, indicating the scale at which clutter was affecting search performance. Overall, clutter was only weakly correlated with performance (r ≈ 0.2). However, different measures of clutter appear to reflect different aspects of the search task: correlations with Feature Congestion are greatest for the actual target patch, whereas Sub-band Entropy is most highly correlated in a 12° × 12° region centered on the target.
Article · Apr 2013 · Journal of Vision
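Of the clutter measures compared above, Edge Density is the simplest to illustrate: the fraction of pixels whose local gradient exceeds a threshold. Below is a minimal sketch, assuming the image is a plain 2-D list of luminance values in [0, 1]; the function name and the finite-difference gradient (standing in for the Canny detector used by Mack & Oliva, 2004) are illustrative choices, not the original implementation.

```python
def edge_density(image, threshold=0.1):
    """Edge Density clutter sketch (after Mack & Oliva, 2004): the fraction
    of pixels whose local gradient magnitude exceeds a threshold. A simple
    forward-difference gradient replaces the original Canny edge detector."""
    h, w = len(image), len(image[0])
    edges = 0
    for y in range(h - 1):
        for x in range(w - 1):
            gx = image[y][x + 1] - image[y][x]   # horizontal luminance step
            gy = image[y + 1][x] - image[y][x]   # vertical luminance step
            if (gx * gx + gy * gy) ** 0.5 > threshold:
                edges += 1
    return edges / ((h - 1) * (w - 1))

# A uniform patch has zero clutter; a checkerboard is maximally cluttered.
flat = [[0.5] * 8 for _ in range(8)]
check = [[(x + y) % 2 for x in range(8)] for y in range(8)]
```

Correlating such a scalar per subregion against search times, over subregions of growing size around the target, yields the correlation profile the abstract describes.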
ABSTRACT: An innovative motoric measure of slant based on gait is proposed: the angle between the foot and the walking surface during walking. This work investigates whether the proposed action-based measure is affected by factors such as the material and inclination of the walking surface. Experimental studies were conducted in a real-environment set-up and in its virtual-simulation counterpart, evaluating behavioural fidelity and user performance in ecologically valid simulations. In the real environment, the measure slightly overestimated the inclined path, whereas in the virtual environment it slightly underestimated it. The results imply that the proposed slant measure is modulated by motoric caution. Since the "reality" of the synthetic environment was relatively high, performance should have revealed the same degree of caution as in the real world; however, that was not the case: people became more cautious when the ground plane was steep, slippery, or virtual.
Article · Nov 2012 · International Journal of Human-Computer Studies
ABSTRACT: Various visual functions decline in ageing and even more so in patients with Alzheimer's disease (AD). Here we investigated whether the complex visual processes involved in ignoring illumination-related variability (specifically, cast shadows) in visual scenes may also be compromised. Participants searched for a discrepant target among items which appeared as posts with shadows cast by light-from-above when upright, but as angled objects when inverted. As in earlier reports, young participants gave slower responses with upright than inverted displays when the shadow-like part was dark but not white (control condition). This is consistent with visual processing mechanisms making shadows difficult to perceive, presumably to assist object recognition under varied illumination. Contrary to predictions, this interaction of "shadow" colour with item orientation was maintained in healthy older and AD groups. Thus, the processing mechanisms which assist complex light-independent object identification appear to be robust to the effects of both ageing and AD. Importantly, this means that the complexity of a function does not necessarily determine its vulnerability to age- or AD-related decline.
We also report slower responses to dark than light “shadows” of either orientation in both ageing and AD, in keeping with increasing light scatter in the ageing eye. Rather curiously, AD patients showed further slowed responses to “shadows” of either colour at the bottom than the top of items as if they applied shadow-specific rules to non-shadow conditions. This suggests that in AD, shadow-processing mechanisms, while preserved, might be applied in a less selective way.
ABSTRACT: Over the last decade, television screens and display monitors have increased considerably in size, but has this improved our televisual experience? Our working hypothesis was that audiences adopt a general strategy that "bigger is better." However, as our visual perceptions do not tap directly into basic retinal image properties such as retinal image size (C. A. Burbeck, 1987), we wondered whether object size itself might be an important factor. To test this, we needed a task that would tap into the subjective experiences of participants watching a movie on different-sized displays with the same retinal subtense. Our participants used a line bisection task to self-report their level of "presence" (i.e., their involvement with the movie) at several target locations probed during a 45-min section of the movie "The Good, The Bad, and The Ugly." Measures of pupil dilation and reaction time to the probes were also obtained. In Experiment 1, we found that subjective ratings of presence increased with physical screen size, supporting our hypothesis; face scenes also produced higher presence scores than landscape scenes for both screen sizes. In Experiment 2, reaction time and pupil dilation showed the same trends as the presence ratings, and pupil dilation correlated with presence ratings, providing some validation of the method. Overall, the results suggest that real-time measures of subjective presence might be a valuable tool for measuring audience experience with different types of (i) display and (ii) audiovisual material.
ABSTRACT: Search in completely natural scenes is a paradigm that cannot be directly compared to the typical search tasks studied, in which objects are distinct and definable. Here we look at the possibility of predicting human performance in completely natural scene tasks, using a direct comparison of human performance against new and existing computer models of viewing natural images. For the human task, participants performed a target-present/target-absent search task on 120 natural scenes, the target being a subsection of the scene and the false target matched to the scene. The identical task was given to implementations of existing computational techniques, including Feature Congestion (Rosenholtz et al., 2005 SIGCHI 761-770), Saliency (Itti & Koch, 2001 Journal of Electronic Imaging 10 161-169), the Target Acquisition Model (Zelinsky, 2008 Psychological Review 115 787-835), and a new variation on the Visual Difference Predictor (To et al., 2008 Proceedings of the Royal Society B: Biological Sciences 275 2299-2308). We show that the models generate parameters that predict performance poorly, whereas human A' is predicted reasonably well by simple image clutter. These results lead us to conclude that in natural search tasks the nature of both the scene and the target is important, and that the global influence of local feature groups can affect task difficulty.
ABSTRACT: With sponsorship from Network Rail, Human Engineering Limited and the University of Bristol conducted a programme of work which aimed to assess the feasibility of, and subsequently develop, a visibility assessment tool to support Network Rail's sign and signal sighting work on the railway. This paper outlines the development of this visibility tool, the 'conspicuity camera', from its conception to the current time. In the first phase of this programme, the distances at which a range of signs could be detected were determined through trials in a virtual simulator at a range of line speeds. From this, an angular subtense could be calculated for each sign; the smaller the angular subtense at detection, the greater the visibility of the sign. In the second phase of the study, the potential of using angular subtense to inform on visibility was explored. Specifically, this phase began to examine the feasibility of a tool, composed of a digital SLR camera and the "Cortex Model" of primate vision, for predicting railway signal and sign visibility. Signs from a virtual reality environment were photographed at a range of angular subtenses, and these were subsequently analysed with the Cortex Model. Output from this phase demonstrated that the Cortex Model was able to predict the detection of signs in realistic settings. The most recent phase of work defined a baseline set of visibility measurements, i.e. a range of conspicuity thresholds, from the Cortex Model for signs and signals in the real environment, and validated the model output against subjective estimates from Subject Matter Experts (SMEs). It is envisaged that Network Rail could deploy the conspicuity camera in future to ensure that the signals and signs along a route are clearly visible to drivers.
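The angular subtense computed in the first phase follows from elementary trigonometry; the sketch below is a minimal illustration, assuming a sign characterised by a single physical height and a known detection distance (the function name is a placeholder, not part of the tool).

```python
import math

def angular_subtense_deg(sign_height_m, distance_m):
    """Visual angle (in degrees) subtended by a sign of the given physical
    height at the distance where it was first detected. A smaller subtense
    at detection implies the sign is visible from further away, i.e. it is
    more conspicuous."""
    return math.degrees(2 * math.atan(sign_height_m / (2 * distance_m)))

# A 1 m sign detected at ~57.3 m subtends approximately 1 degree.
theta = angular_subtense_deg(1.0, 57.3)
```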
ABSTRACT: Low-level stimulus salience and task relevance together determine the human fixation priority assigned to scene locations (Fecteau and Munoz in Trends Cogn Sci 10(8):382-390, 2006). However, surprisingly little is known about the contribution of task relevance to eye movements during real-world visual search where stimuli are in constant motion and where the 'target' for the visual search is abstract and semantic in nature. Here, we investigate this issue when participants continuously search an array of four closed-circuit television (CCTV) screens for suspicious events. We recorded eye movements whilst participants watched real CCTV footage and moved a joystick to continuously indicate perceived suspiciousness. We find that when multiple areas of a display compete for attention, gaze is allocated according to relative levels of reported suspiciousness. Furthermore, this measure of task relevance accounted for twice the amount of variance in gaze likelihood as the amount of low-level visual changes over time in the video stimuli.
Article · Aug 2011 · Experimental Brain Research
ABSTRACT: We conducted suprathreshold discrimination experiments to compare how natural-scene information is processed in central and peripheral vision (16° eccentricity). Observers' ratings of the perceived magnitude of changes in naturalistic scenes were lower for peripheral than for foveal viewing, and peripheral orientation changes were rated lower than peripheral colour changes. A V1-based Visual Difference Predictor model of the magnitudes of perceived foveal change was adapted to match the sinusoidal grating sensitivities of peripheral vision, but it could not explain why the ratings for changes in peripheral stimuli were so reduced. Perceived magnitude ratings for peripheral stimuli were further reduced by simultaneous presentation of flanking patches of naturalistic images, a phenomenon that could not be replicated foveally, even after M-scaling the foveal stimuli to reduce their size and their distances from the flankers. The effects of the peripheral flankers are very reminiscent of crowding phenomena demonstrated with letters or Gabor patches.
ABSTRACT: "Presence" is the illusion of being in a mediated experience rather than simply being an observer. It is a concept often applied to the question of realism in virtual environments, but it is equally applicable to the act of watching a movie. A movie provides a markedly different visual environment to that given by the natural world, particularly because of frequent edits, and yet movie audiences achieve high levels of presence. We investigate the relationship between presence and the optical and temporal parameters of movies. We find effects of mean shot length, colour versus black-and-white, and 3D versus 2D. We find that short shots, while being unnatural, are associated with high levels of presence. We consider why such artificial stimuli should appear so real and immersive.
ABSTRACT: Psychophysical measurements of human contrast sensitivity suggest that luminance is encoded in a spatially bandpass manner, whereas chrominance has a low-pass characteristic. However, the Fourier content of natural scenes does not usually reflect this different encoding: such scenes are not particularly rich in low-spatial-frequency chromatic information. Is this true for all such scenes? In particular, what are the spatio-chromatic properties of those scenes, such as fruit in foliage, for which color vision is thought to have evolved? We produced a set of images of natural colored objects on backgrounds of foliage using a digital camera calibrated to give relative human cone responses for each pixel. We transformed these images into red-green chrominance and luminance images, and measured the spectral slopes (log amplitude versus log spatial frequency) of these image pairs. If certain image types are rich in low-spatial-frequency chromatic information, the chromatic slope is steeper than the luminance slope, in keeping with human contrast sensitivity data. We found that images of fruit and other colored objects on backgrounds of foliage are rich in low-spatial-frequency chromatic content when the objects are viewed from relatively close; the effect becomes especially marked when the colored objects are at around normal grasping distance. The effect does not hold when the fruit is seen as part of a distant landscape, and it holds under both sunny and cloudy illumination. The results of this analysis suggest that human spatio-chromatic encoding is particularly well suited to the properties of a subset of natural scenes.
Article · Dec 2010 · Journal of Vision
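The spectral-slope measurement used in this analysis (the least-squares slope of log amplitude against log spatial frequency) can be sketched in one dimension; this is a simplified stand-in for the 2-D image analysis, using a brute-force O(n²) discrete Fourier transform for clarity, and the function name is illustrative.

```python
import cmath
import math

def spectral_slope(signal):
    """Least-squares slope of log amplitude versus log spatial frequency for
    a 1-D luminance or chrominance profile (a 1-D stand-in for the 2-D image
    spectra described above). Natural scenes typically yield slopes near -1."""
    n = len(signal)
    pts = []
    for k in range(1, n // 2):  # skip the DC term and the mirrored half
        amp = abs(sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n)
                      for t in range(n)))
        pts.append((math.log(k), math.log(amp)))
    # Ordinary least-squares slope through the (log f, log amplitude) points.
    mx = sum(x for x, _ in pts) / len(pts)
    my = sum(y for _, y in pts) / len(pts)
    num = sum((x - mx) * (y - my) for x, y in pts)
    den = sum((x - mx) ** 2 for x, _ in pts)
    return num / den
```

Running this separately on the chrominance and luminance planes of an image row and comparing the two slopes mirrors the comparison made in the abstract: a steeper chromatic slope signals relatively more low-spatial-frequency chromatic content.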
ABSTRACT: In a virtual environment (VE), efficient techniques are often needed to economize on rendering computation without compromising the information transmitted. The reported experiments devise a functional fidelity metric by exploiting research on memory schemata. According to the proposed measure, similar information would be transmitted across synthetic and real-world scenes depicting a specific schema, which would ultimately indicate which areas of a VE could be rendered at lower quality without affecting information uptake. We examine whether computationally more expensive scenes of greater visual fidelity affect memory performance after exposure to immersive VEs, or whether they are merely more aesthetically pleasing than their diminished-visual-quality counterparts. Results indicate that memory schemata function in VEs much as they do in real-world environments. "High-level" visual cognition related to late visual processing is unaffected by ubiquitous graphics manipulations such as polygon count and depth of shadow rendering; "normal" cognition operates as long as the scenes look acceptably realistic. However, when the overall realism of the scene is greatly reduced, as in wireframe rendering, visual cognition becomes abnormal: effects that distinguish schema-consistent from schema-inconsistent objects change because the whole scene now looks incongruent. We have shown that this effect is not due to a failure of basic recognition.
Article · Oct 2010 · ACM Transactions on Applied Perception
ABSTRACT: The Euclidean and MAX metrics have been widely used to model cue summation psychophysically and computationally. Both rules are special cases of a more general Minkowski summation rule, (Σ_i c_i^m)^(1/m), where m = 2 and m → ∞, respectively. In vision research, Minkowski summation with power m = 3-4 has been shown to be a superior model of how subthreshold components sum to give an overall detection threshold. We have previously reported that Minkowski summation with power m = 2.84 accurately models summation of suprathreshold visual cues in photographs. In four suprathreshold discrimination experiments, we confirm the previous findings with new visual stimuli and extend the applicability of this rule to cue combination in auditory stimuli (musical sequences and phonetic utterances, where m = 2.95 and 2.54, respectively) and cross-modal stimuli (m = 2.56). In all cases, Minkowski summation with power m = 2.5-3 outperforms the Euclidean and MAX operator models. We propose that this reflects the summation of neuronal responses that are not entirely independent but show some correlation in their magnitudes. Our findings are consistent with electrophysiological research demonstrating signal correlations (r = 0.1-0.2) between sensory neurons presented with natural stimuli.
Article · Oct 2010 · Proceedings of the Royal Society B: Biological Sciences
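The Minkowski summation rule above is compact enough to state directly; the sketch below shows how m interpolates between the Euclidean rule (m = 2) and the MAX operator (m → ∞), with the function name chosen here for illustration.

```python
def minkowski_sum(cues, m):
    """Combine individual cue magnitudes by Minkowski (power) summation:
    (sum_i |c_i|^m)^(1/m). m = 2 gives the Euclidean rule; as m grows the
    result approaches the MAX operator; m ≈ 2.5-3 is the range reported to
    fit the suprathreshold data described above."""
    return sum(abs(c) ** m for c in cues) ** (1.0 / m)

cues = [3.0, 4.0]
euclidean = minkowski_sum(cues, 2)      # 5.0, the familiar Euclidean rule
near_max = minkowski_sum(cues, 100)     # ~4.0, approaching MAX(3, 4)
fitted = minkowski_sum(cues, 2.84)      # between the two, as in the paper
```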
ABSTRACT: We measured the temporal relationship between eye movements and manual responses while experts and novices watched a videotaped football match. Observers used a joystick to continuously indicate the likelihood of an imminent goal. We measured correlations between manual responses and between-subjects variability in eye position. To identify the lag magnitude, we repeated these correlations over a range of possible delays between these two measures and searched for the most negative correlation coefficient. We found lags in the order of 2 sec and an effect of expertise on lag magnitude, suggesting that expertise has its effect by directing eye movements to task-relevant areas of a scene more quickly, facilitating a longer processing duration before behavioral decisions are made. This is a powerful new method for examining the eye movement behavior of multiple observers across complex moving images.
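The lag-search procedure described above can be sketched as follows, assuming the eye-position variability and manual-response series are already sampled on a common time base; `pearson_r` and `best_lag` are illustrative names, not the authors' code, and the most negative correlation is sought because tight between-subject clustering (low spread) should predict a high response some moments later.

```python
def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (sx * sy)

def best_lag(eye_spread, response, max_lag):
    """Correlate eye-position spread at time t with the manual response at
    time t + lag, for every candidate lag (in samples), and return the
    (lag, r) pair with the most negative correlation coefficient."""
    best = None
    for lag in range(max_lag + 1):
        x = eye_spread[:len(eye_spread) - lag] if lag else eye_spread
        y = response[lag:]
        r = pearson_r(x, y)
        if best is None or r < best[1]:
            best = (lag, r)
    return best
```

With real data, the winning lag converted back to seconds gives the processing delay between convergent fixation and the joystick decision.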
ABSTRACT: The extent of dilation of the pupil of the eye is a reliable measure of cognitive load. We have previously shown that, with appropriate luminance controls, pupillary dilation during visual tasks offers insight into the extent of higher-level processing occurring during task performance (Porter, Troscianko, and Gilchrist, Perception 31 suppl, 170-171; 2002). In particular, differences were found in the dilatory pattern between difficult visual search and counting tasks despite matched reaction times and identical stimuli: counting elicited immediate marked pupil dilation, sustained until response, whereas dilation during search increased gradually until response and was reduced in overall magnitude. To investigate whether these patterns correspond to memory load, pupil size was measured during performance of search tasks in which the memory component was manipulated. By changing a traditional "target absent or target present" search task to "one target present or two targets present", spatial memory load was increased, as the need to remember one target's location, once found, was introduced. Accordingly, pupillary dilation was slightly greater in the "one or two" task than in the target absent/present task, but only when nearing response. When target identity varied from trial to trial, greater dilation was seen early in the search process than when target identity remained constant, corresponding to the increased effort in encoding the target. These results suggest that pupil dilation is sensitive to both spatial and recognition memory load, and that memory for target identity and location is differentially involved in different visual search tasks.
Article · Oct 2010 · Journal of Vision
ABSTRACT: The human visual system (HVS), like that of other trichromatic primates, has different contrast sensitivity functions for chromatic and luminance stimuli: the spatial filtering is low-pass for chromatic stimuli and band-pass for luminance. Previous results have shown that a subset of natural scenes, namely those with red objects (e.g. fruit) on a background of leaves, have spatial properties that correspond to this physiological spatial filtering (Parraga, Troscianko and Tolhurst; Current Biology 12, 483-487; 2002). The original dataset on which these conclusions were based consisted of English natural scenes. Here we analysed the spatio-chromatic properties of a dataset of natural scenes obtained in Kibale Forest, Uganda, a natural habitat containing large numbers of wild trichromatic primates. We used the same calibrated digital camera as in the previous study, which delivers L, M and S cone responses, and opponent-channel responses, for each pixel. We obtained 270 images of scenes, many of them containing red fruit, red leaves, red flowers and green leaves, corresponding to the primate visual environment as seen from the ground and from the canopy. All the red fruit and leaves were confirmed as forming a significant part of the diet of trichromatic primates. Our results support the earlier finding (with English plants) that the luminance and chromatic Fourier spectra of pictures containing reddish objects on a background of leaves correspond well to the spatio-chromatic properties of the luminance and red-green systems in human vision, at viewing distances of the same order of magnitude as grasping distance.
Article · Oct 2010 · Journal of Vision
ABSTRACT: We are studying how people perceive naturalistic suprathreshold changes in the colour, size, shape or location of items in images of natural scenes, using magnitude estimation ratings to characterise the sizes of the perceived changes in coloured photographs. We have implemented a computational model that tries to explain observers' ratings of these naturalistic differences between image pairs. We model the action-potential firing rates of millions of neurons, with linear and non-linear summation behaviour closely modelled on that of real V1 neurons. The numerical parameters of the model's sigmoidal transducer function are set by optimising the same model against experiments on contrast discrimination (contrast 'dippers') with monochrome photographs of natural scenes. The model, optimised on a stimulus-intensity domain in an experiment reminiscent of the Weber-Fechner relation, then produces tolerable predictions of the ratings for most kinds of naturalistic image change. Importantly, the rating rises roughly linearly with the model's numerical output, which represents the difference in neuronal firing rate in response to the two images under comparison; this implies that rating is proportional to the neuronal response.
Article · Oct 2010 · Seeing and Perceiving
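The sigmoidal contrast transducer at the heart of such V1-based models is commonly given a Naka-Rushton form; the sketch below is a generic instance of that family, and all parameter values are illustrative defaults, not the fitted values from the model described above.

```python
def naka_rushton(c, rmax=1.0, c50=0.2, p=2.4, q=2.0):
    """Generic Naka-Rushton sigmoidal transducer for contrast c in [0, 1]:
    response = rmax * c^p / (c^q + c50^q). The response accelerates at low
    contrast (the 'dipper' regime) and saturates at high contrast; all
    parameter values here are illustrative, not fitted."""
    return rmax * c ** p / (c ** q + c50 ** q)

# Response grows monotonically with contrast and saturates below rmax.
low, mid, high = naka_rushton(0.1), naka_rushton(0.5), naka_rushton(1.0)
```

In a full model, the parameters would be optimised against contrast-discrimination ('dipper') data, as the abstract describes, before the transducer is applied to each modelled neuron's input.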