
Priming of Simple and Complex Scene Layout: Rapid Function From the Intermediate Level

American Psychological Association
Journal of Experimental Psychology: Human Perception and Performance, 2009, Vol. 35, No. 3, 735–749. DOI: 10.1037/a0013032
Authors: Thomas Sanocki and Noah Sulman, University of South Florida

Abstract

Three experiments examined the time course of layout priming with photographic scenes varying in complexity (number of objects). Primes were presented for varying durations (800–50 ms) before a target scene with 2 spatial probes; observers indicated whether the left or right probe was closer to viewpoint. Reaction time was the main measure. Scene primes provided maximum benefits with 200 ms or less prime duration, indicating that scene priming is rapid enough to influence everyday distance perception. The time course of prime processing was similar for simple and complex scene primes and for upright and inverted primes, suggesting that the prime representation was intermediate level in nature.
Keywords: priming, layout, scene perception, intermediate-level processing, spatial processing
Human abilities such as navigation and scene identification are amazingly good compared to artificial systems. Of particular interest here is the representation and perception of spatial layout within familiar scenes, as examined with a scene priming paradigm. Our hypotheses begin with the assumption that observers extract a representation of a scene's layout over a brief learning period (as short as a few trials with normal scenes; see Sanocki, Michelet, Sellers, & Reynolds, 2006). This representation is assumed to be activated by brief exposure to the scene (a scene prime) and to facilitate distance perception when a similar target scene follows the prime (Sanocki, 2003; Sanocki & Epstein, 1997; Sanocki et al., 2006).
Speed of Processing and Scope of Processing
We view the benefits of scene primes as one facet of the efficient use of spatial information from the environment (see, e.g., Cutting & Vishton, 1995; Domini, Caudek, & Tassinari, 2006; Gibson, 1979; Ni, Braunstein, & Andersen, 2005; Sedgwick, 1986). Another facet of efficient information processing is speed. In the literature on identification of scene categories, the speed of identification is a provocative finding. In seminal experiments, Potter (1975, 1976) found that observers can pick out a prespecified scene category (e.g., beach) with high accuracy from a stream of different scenes, each presented for as little as 167 ms (see also, e.g., Evans & Treisman, 2005; Michod & Intraub, 2007). Using briefly presented low- and high-pass images, Oliva and Schyns (1997; Schyns & Oliva, 1994) found that scene categorization is facilitated by general layout information even when objects are obscured. And, in binary categorization tasks (e.g., Is an animal present?) with novel scenes on each trial, evidence of rapid categorization has been found (e.g., Rousselet, Fabre-Thorpe, & Thorpe, 2002; Rousselet, Joubert, & Fabre-Thorpe, 2005). This includes directional eye movement responses initiated in less than 120 ms from stimulus onset on many trials (Kirchner & Thorpe, 2006).
However, note that categorization requires only a single decision about each scene, limiting the scope of scene processing to that decision. Yet, a typical scene contains a complex of information distributed across space, including multiple spatial relations. Is scene processing efficient across the spatial extent of a scene? Here, we measured the speed of processing scene primes that represented a typical view and contained either a few or many spatial relations. We presented the scene primes for varying amounts of time and used their effects to infer how much they were processed. We asked the following question: How long must the prime be presented for its benefits for target processing to reach maximum levels?
Other seminal research in scene perception has demonstrated effects of semantic constraints and physical-layout constraints throughout much of the scenes (see, e.g., Biederman, 1972, 1981). These experiments are consistent with the hypothesis of rapid processing of an entire scene; however, the experiments were not designed to provide precise information about the time course of processing, including particular stages of processing.
Stages of Processing
Most theories of vision assume that processing proceeds through several stages, from lower to higher levels. In the present experiments, we examined the stage or stages at which the prime-induced scene representations operate. At one extreme, scene primes could be processed to create high-level representations, such as a scene category or a construction of attention (e.g., Logan, 1995; Rensink, 2000). However, such representations are limited in scope, to no more than several arguments or entities; consequently, their usefulness for complex scene layouts would be limited. At the other ...
Thomas Sanocki and Noah Sulman, Department of Psychology, University of South Florida.
This research was supported by the University of South Florida Center for Pattern Recognition. This research was motivated by comments by Pierre Jolicœur and Ken Nakayama.
Correspondence concerning this article should be addressed to Thomas Sanocki, Department of Psychology, PCD 4118, University of South Florida, Tampa, FL 33624. E-mail: sanocki@usf.edu
... Trial timing was chosen to match previous spatial congruency bias paradigms (Golomb et al., 2014). The 500-ms presentation time should allow sufficient time to process the depth cue and accumulate accurate information for depth perception from binocular disparity (Adam et al., 1993; Sanocki & Sulman, 2009; Uttal, Davis, & Welke, 1994). Masks were included to ensure visual afterimages were not used to help with the same/different location task. ...
... For example, as noted earlier, stimulus timing may influence the results. Here we chose a single stimulus duration (500 ms) that has been shown to reliably induce a 2D location bias (Finlayson & Golomb, 2016; Golomb et al., 2014; Shafer-Skelton et al., 2017), and should be sufficient to allow accumulation of disparity information (Adam et al., 1993; Sanocki & Sulman, 2009; Uttal et al., 1994). While this duration was clearly sufficient to evoke some depth effects in our study (RT priming), it remains possible that with longer stimulus durations, we might begin to see a congruency bias for depth as well. ...
Article
Visual cognition in our 3D world requires understanding how we accurately localize objects in 2D and depth, and what influence both types of location information have on visual processing. Spatial location is known to play a special role in visual processing, but most of these findings have focused on the special role of 2D location. One such phenomenon is the spatial congruency bias, where 2D location biases judgments of object features but features do not bias location judgments. This paradigm has recently been used to compare different types of location information in terms of how much they bias different types of features. Here we used this paradigm to ask a related question: whether 2D and depth-from-disparity location bias localization judgments for each other. We found that presenting two objects in the same 2D location biased position-in-depth judgments, but presenting two objects at the same depth (disparity) did not bias 2D location judgments. We conclude that an object’s 2D location may be automatically incorporated into perception of its depth location, but not vice versa, which is consistent with a fundamentally special role for 2D location in visual processing.
... The structure or geometry of a scene provides global contextual information that assists rapid scene analysis in visual search and navigation (Ross & Oliva, 2010). Structural cues such as layout, depth, openness, and perspective can be perceived in a short presentation of an image (Greene & Oliva, 2009; Schyns & Oliva, 1994; Joubert et al., 2007; Sanocki & Sulman, 2009). Further, global scene context or gist (Torralba et al., 2006) and layout (Rensink, 2000) also guide attention to likely target locations in a top-down manner. ...
... Previous research has shown that humans are capable of automatic and rapid analysis of scene structure when navigating an environment or searching for objects. It has also been shown that several global scene properties such as coarse spatial layout (Schyns & Oliva, 1994), naturalness (Joubert et al., 2007), navigability (Greene & Oliva, 2009), complexity or clutter (Sanocki & Sulman, 2009; Rosenholtz et al., 2007), distance and depth (Sanocki, 2003), and openness (Torralba et al., 2006) can be perceived in a short presentation of a scene. Inspired by these findings, in this study, we showed that a particular type of scene structure, related to the scene layout, known as the vanishing point, strongly influences eye movements in free viewing of natural scenes as well as visual search. ...
Article
Full-text available
To investigate whether the vanishing point (VP) plays a significant role in gaze guidance, we ran two experiments. In the first, we recorded fixations of 10 observers (4 female; mean age 22; SD = 0.84) freely viewing 532 images, of which 319 had a VP (shuffled presentation; each image for 4 s). We found that the average number of fixations in a local region (80 x 80 pixels) centered on the VP is significantly higher than the average number of fixations at random locations (t-test; n = 319; p = 1.8e-35). To address the confounding factor of saliency, we learned a combined model of bottom-up saliency and VP. The AUC score of our model (0.85; SD = 0.01) is significantly higher than that of the original saliency model (e.g., 0.8 using the AIM model of Bruce & Tsotsos, 2009; t-test; p = 3.14e-16) and the VP-only model (0.64; t-test; p = 4.02e-22). In the second experiment, we asked 14 subjects (4 female; mean age 23.07; SD = 1.26) to search for a target character (T or L) placed randomly in a 3 x 3 imaginary grid overlaid on an image. Subjects reported their answers by pressing one of two keys. Stimuli consisted of 270 color images (180 with a single VP, 90 without). The target appeared with equal probability inside each cell (15 times L, 15 times T). We found that subjects were significantly faster (and more accurate) when the target appeared inside the cell containing the VP than in cells without the VP (median across 14 subjects: 1.34 s vs. 1.96 s; Wilcoxon rank-sum test; p = 0.0014). Response times in VP cells were also significantly lower than response times on images without a VP (median 2.37 s; p = 4.77e-05). These findings support the hypothesis that the vanishing point, like faces and text (Cerf et al., 2009) and gaze direction (Borji et al., 2014), attracts attention in free viewing and visual search.
... However, further processing is necessary for information beyond gist, such as the identity and details of noncentral objects, relations between objects and surfaces, and atypical information. Often, such further processing does not get completed during brief glances, as has been indicated in a variety of paradigms (e.g., Biederman et al., 1988; Botros, Greene, & Fei-Fei, 2013; Fei-Fei, Iyer, Koch, & Perona, 2007; Franconeri, Scimeca, Roth, Helseth, & Kahn, 2012; Sanocki & Sulman, 2009; Treisman, 1988). ...
Article
Full-text available
How does scene complexity influence the detection of expected and appropriate objects within the scene? Traffic research has indicated that vulnerable road users (VRUs: pedestrians, bicyclists, and motorcyclists) are sometimes not perceived, despite being expected. Models of scene perception emphasize competition for limited neural resources in early perception, predicting that an object can be missed during quick glances because other objects win the competition to be individuated and consciously perceived. We used pictures of traffic scenes and manipulated complexity by inserting or removing vehicles near a to-be-detected VRU (crowding). The observers' sole task was to detect a VRU in the laterally presented pictures. Strong bias effects occurred, especially when the VRU was crowded by other nearby vehicles: Observers failed to detect the VRU (high miss rates), while making relatively few false alarm errors. Miss rates were as high as 65% for pedestrians. The results indicated that scene context can interfere with the perception of expected objects when scene complexity is high. Because urbanization has greatly increased scene complexity, these results have important implications for public safety.
... If visual relations are processed in such a serial fashion, why do we feel as if we have a more detailed percept of the relations around us? One possibility is that other visual information about the objects within the relation supports this percept of detail, such as how many are present (Franconeri et al., 2009), the global shape of their arrangement (Sanocki and Sulman, 2009), and statistical information about their identities (Ariely, 2001). Individual relations may be produced "on demand" so quickly that they give the conscious impression that they were already available (Rensink et al., 1997; Noë and O'Regan, 2000). ...
Article
Full-text available
Describing certain types of spatial relationships between a pair of objects requires that the objects are assigned different “roles” in the relation, e.g., “A is above B” is different than “B is above A.” This asymmetric representation places one object in the “target” or “figure” role and the other in the “reference” or “ground” role. Here we provide evidence that this asymmetry may be present not just in spatial language, but also in perceptual representations. More specifically, we describe a model of visual spatial relationship judgment where the designation of the target object within such a spatial relationship is guided by the location of the “spotlight” of attention. To demonstrate the existence of this perceptual asymmetry, we cued attention to one object within a pair by briefly previewing it, and showed that participants were faster to verify the depicted relation when that object was the linguistic target. Experiment 1 demonstrated this effect for left-right relations, and Experiment 2 for above-below relations. These results join several other types of demonstrations in suggesting that perceptual representations of some spatial relations may be asymmetrically coded, and further suggest that the location of selective attention may serve as the mechanism that guides this asymmetry.
... The primes were presented for 400 ms, followed by a 100-ms mask and then the target. The prime duration seemed a reasonable initial setting, because 200 ms provides an optimal priming effect with familiar scenes (Sanocki & Sulman, 2009), and 500 ms is sufficient for fairly detailed interpretations of novel scenes to be encoded into memory (e.g., Fei-Fei, Iyer, Koch, & Perona, 2007). ...
Article
Facilitatory scene priming is the positive effect of a scene prime on the immediately subsequent spatial processing of a related target, relative to control primes. In the present experiments, a large set of scenes were presented, each several times. The accuracy of a relational spatial-layout judgment was the main measure (which of two probes in a scene was closer?). The effect of scene primes on sensitivity was near zero for the first presentation of a scene; advantages for scene primes occurred only after two or three presentations. In addition, a bias effect emerged in reaction times for novel scenes. These results imply that facilitatory scene priming requires learning and is top-down in nature. Scene priming may require the consolidation of interscene relations in a memory representation.
... We need spatial layout information for getting to a place, searching for things, and interacting with objects, as well as for object perception. Appropriate spatial layouts facilitate object perception in a variety of conditions (Loftus and Mackworth, 1978; Shioiri and Ikeda, 1989; Hollingworth and Henderson, 1999; Sanocki, 2003; Bar, 2004; Davenport and Potter, 2004; Sanocki and Sulman, 2009). The effect of viewpoint in the perception of object layouts differs from that in the perception of objects in terms of the potential contribution of self-motion. ...
Article
Full-text available
We usually perceive things in our surroundings as unchanged despite viewpoint changes caused by self-motion. The visual system therefore must have a function to process objects independently of viewpoint. In this study, we examined whether viewpoint-independent spatial layout can be obtained implicitly. For this purpose, we used a contextual cueing effect, a learning effect of spatial layout in visual search displays known to be an implicit effect. We investigated the transfer of the contextual cueing effect to images from a different viewpoint by using visual search displays of 3D objects. For images from a different viewpoint, the contextual cueing effect was maintained with self-motion but disappeared when the display changed without self-motion. This indicates that there is an implicit learning effect in environment-centered coordinates and suggests that the spatial representation of object layouts can be obtained and updated implicitly. We also showed that binocular disparity plays an important role in the layout representations.
... We explore the flexible system that allows us to judge relative spatial relationships among objects in the visual world. Relational processing for some frequently encountered objects, such as the location and appearance of facial features (Tanaka & Farah, 2006) or the location of features, patterns, or structures within a scene (Henderson & Hollingworth, 1999; Oliva & Torralba, 2007; Sanocki & Sulman, 2009), might be subserved by existing long-term representations. ...
... The speed of these two processes also indicates a similarity between them. Both boundary extension and scene priming can occur with very brief exposures, as little as 42 ms for boundary extension and as little as 50 ms for scene priming (Sanocki & Sulman, 2009). ...
Article
Full-text available
Four experiments examined whether scene processing is facilitated by layout representation, including layout that was not perceived but could be predicted based on a previous partial view (boundary extension). In a priming paradigm (after Sanocki, 2003), participants judged objects' distances in photographs. In Experiment 1, full scenes (target), partial scenes, and two control primes were used. Partial scenes excluded the target objects' locations, but these areas could be predicted. Full and partial scenes produced equal performance facilitation. In Experiment 2, task-irrelevant partial scene primes were also tested. These primes did not facilitate performance (i.e. simple scene previews did not help). Experiment 3 showed that a partial prime's utility depended on the area of the scene that would be tested; the task-irrelevant primes used in Experiment 2 were useful for other distance judgments. Experiment 4 showed that partial scene facilitation is not limited to the area immediately surrounding the prime. The study demonstrated that perceived and mentally extrapolated layouts are equally effective.
Article
The relationship between image features and scene structure is central to the study of human visual perception and computer vision, but many of the specifics of real-world layout perception remain unknown. We do not know which image features are relevant to perceiving layout properties, or whether those features provide the same information for every type of image. Furthermore, we do not know the spatial resolutions required for perceiving different properties. This paper describes an experiment and a computational model that provide new insights on these issues. Humans perceive global spatial layout properties, such as dominant depth, openness, and perspective, from a single image. This work describes an algorithm that reliably predicts human layout judgments. The model's predictions are general, not specific to the observers it was trained on. Analysis reveals that the optimal spatial resolutions for determining layout vary with the content of the space and the property being estimated: openness is best estimated at high resolution, depth at medium resolution, and perspective at low resolution. Given the reliability and simplicity of estimating the global layout of real-world environments, this model could help resolve perceptual ambiguities encountered by more detailed scene reconstruction schemas.
Article
Full-text available
Used 3 converging procedures to determine whether pictures presented in a rapid sequence at rates comparable to eye fixations are understood and then quickly forgotten. In 2 experiments, with 96 and 16 college students, respectively, sequences of 16 color photographs were presented at rates of 113, 167, 250, or 333 msec/picture. In 1 group, Ss were given an immediate test of recognition memory for the pictures and in other groups they searched for a target picture. Even when the target had only been specified by a title (e.g., a boat), detection of a target was strikingly superior to recognition memory. Detection was slightly but significantly better for pictured than named targets. In Exp III, with 8 college students, pictures were presented for 50, 70, 90, or 120 msec preceded and followed by a visual mask; at 120 msec recognition memory was as accurate as detection had been. Results, taken together with those of M. C. Potter and E. I. Levy for slower rates of sequential presentation, suggest that on the average a scene is understood and so becomes immune to ordinary visual masking within about 100 msec but requires about 300 msec of further processing before the memory representation is resistant to conceptual masking from a following picture. Possible functions of a short-term conceptual memory (e.g., the control of eye fixations) are discussed. (25 ref)
Article
Full-text available
Dynamic visual identification was investigated in 4 experiments. In Experiments 1 and 2, 2 perceptual objects (2 frames, each containing a letter or 1 containing a letter and the other a plus sign) were previewed in the periphery. A saccade brought these objects to central vision. During the saccade the display was changed so that 1 frame contained a letter and the other a plus sign, and the subject identified the letter by naming it aloud as rapidly as possible. In Experiment 3, the retinal events of Experiments 1 and 2 were simulated. In Experiment 4, both the preview and the target were presented centrally within a single fixation. In all experiments both object-specific and nonspecific preview benefits were observed. These results support a theory in which the preview benefits observed during visual identification arise from 2 processes, object file review and type priming.
Article
A series of experiments explored a form of object-specific priming. In all experiments a preview field containing two or more letters is followed by a target letter that is to be named. The displays are designed to produce a perceptual interpretation of the target as a new state of an object that previously contained one of the primes. The link is produced in different experiments by a shared location, by a shared relative position in a moving pattern, or by successive appearance in the same moving frame. An object-specific advantage is consistently observed: naming is facilitated by a preview of the target, if (and in some cases only if) the two appearances are linked to the same object. The amount and the object specificity of the preview benefit are not affected by extending the preview duration to 1 s, or by extending the temporal gap between fields to 590 ms. The results are interpreted in terms of a reviewing process, which is triggered by the appearance of the target and retrieves just one of the previewed items. In the absence of an object link, the reviewing item is selected at random. We develop the concept of an object file as a temporary episodic representation, within which successive states of an object are linked and integrated.
Book
Implicit memory refers to a change in task performance due to an earlier experience that is not consciously remembered. The topic of implicit memory has been studied from two quite different perspectives. On the one hand, researchers interested in memory have set out to characterize the memory system (or systems) underlying implicit memory and to see how they relate to those underlying other forms of memory. The alternative framework has considered implicit memory a by-product of perceptual, conceptual, or motor systems that learn; that is, on this view, the systems that support implicit memory are heavily constrained by pressures other than memory per se. Both approaches have yielded results that have been valuable in helping us understand the nature of implicit memory, but the two have been pursued somewhat in isolation and with little collaboration. This volume contrasts these approaches, bringing together scientists from both camps to consider this important issue in psychology and neuroscience.
Article
This study aimed at assessing the processing time of a natural scene in a fast categorization task of its context or “gist”. In Experiment 1, human subjects performed 4 go/no‐go categorization tasks in succession with colour pictures of real‐world scenes belonging to 2 natural categories: “Sea” and “mountain”, and 2 artificial categories: “Indoor” and “urban”. Experiment 2 used colour and grey‐level scenes in the same tasks to assess the role of colour cues on performance. Pictures were flashed for 26 ms. Both experiments showed that the gist of real‐world scenes can be extracted with high accuracy (>90%), short median RT (400–460 ms) and early responses triggered with latencies as short as 260–300 ms. Natural scenes were processed faster than artificial scenes. Categories for which colour could have a diagnostic value were processed faster in colour than in grey. Finally, processing speed is compared for scene and object categorization tasks.
Article
The onset of a new, meaningful visual event is thought to induce conceptual masking (Potter, 1976). Does layout contribute to the "meaning" of a scene? Given the same objects, would changes in layout induce conceptual masking? In Experiment 1, each of 16 photographs (the targets) was presented for 125 ms, interspersed with a to-be-ignored photograph for 250 ms (the "conceptual mask") and a visual noise mask for 500 ms. Each conceptual mask contained the same background with a new set of 3 objects; however, in one condition (N = 32) the objects were always presented in the same layout, whereas in the other (N = 32) they appeared in a different layout each time. Following presentation, a 2AFC test was administered in which each target photograph was paired with a similar distractor (same concept, different details). Although the conceptual mask contained the same new objects in both conditions, recognition memory decreased significantly with a changing layout (80% vs. 70%; t(62) = 3.10, p < .01). In Experiment 2, more complex conceptual masks were presented. There were 4 conceptual mask conditions (N = 40 in each): (a) different objects/same layout, (b) different objects/different layout, (c) same objects/different layout, and (d) same objects/same layout (i.e., a repeating picture). Conceptual masking is limited or nonexistent when the same mask repeats (Intraub, 1984). In comparison with the repeating-picture condition (78% recognized), orthogonal planned comparisons showed that memory decreased significantly when the conceptual mask changed each time: different objects/same layout (72%), different objects/different layout (70%), or same objects/different layout (74%). Thus, even when gist was maintained (same objects, same background), layout changes interfered with memory for the attended pictures. Both experiments demonstrate that conceptual masking involves more than the onset of a new global concept; a new layout also disrupts processing of briefly presented scenes.
Article
Observers responded to full-color images of scenes by indicating which of two critical objects was closer in the pictorial space. These target images were preceded by prime images of the same scene sans critical objects, or by control primes or different-scene primes. Reaction times were faster following same-scene primes than following the various control and different-scene primes. Same-scene facilitation was obtained with color primes, line-drawing primes, and primes with shifted views. The effect occurred with natural scenes having gist and simple artificial scenes having little or no gist. The results indicate that prime-induced representations influence the perception of spatial layout in pictures.