Human object recognition is remarkably efficient. In recent years, significant advances have been made in our understanding of how the brain represents visual objects and organizes them into categories. Recent studies using pattern analysis methods have characterized a representational space of objects in human and primate inferior temporal cortex in which object exemplars are discriminable and cluster according to category (e.g., faces and bodies). In the present study we examined how category structure in object representations emerges in the first 1000 ms of visual processing. Participants viewed 24 object exemplars with a planned categorical structure comprising four levels, ranging from highly specific (individual exemplars) to highly abstract (animate vs. inanimate), while their brain activity was recorded with magnetoencephalography (MEG). We used a sliding time window decoding approach to decode, on a moment-to-moment basis, the exemplar and the exemplar's category that participants were viewing. We found that exemplar and category membership could be decoded from the neuromagnetic recordings shortly after stimulus onset (<100 ms), with decodability peaking thereafter. Latencies for peak decodability varied systematically with the level of category abstraction, with more abstract categories emerging later, indicating that the brain constructs category representations hierarchically. In addition, we examined the stationarity of the patterns of brain activity that encode object category information and show that these patterns vary over time, suggesting that the brain may use flexible, time-varying codes to represent visual object categories.
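The time-resolved decoding logic can be sketched with a toy simulation. This is a minimal illustration, not the study's actual pipeline: data sizes, the effect onset, and the nearest-centroid classifier (in place of whatever decoder the authors used) are all assumptions, and the "window" here is a single time sample.

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials, n_sensors, n_times = 40, 8, 50
labels = np.repeat([0, 1], n_trials // 2)            # two hypothetical categories
X = rng.normal(size=(n_trials, n_sensors, n_times))  # simulated sensor recordings
X[labels == 1, :, 25:] += 0.8                        # category signal appears late

def decode_timecourse(X, y):
    """Leave-one-out nearest-centroid decoding at each time sample."""
    acc = np.zeros(X.shape[2])
    for t in range(X.shape[2]):
        correct = 0
        for i in range(len(y)):
            train = np.ones(len(y), dtype=bool)
            train[i] = False
            c0 = X[train & (y == 0), :, t].mean(axis=0)
            c1 = X[train & (y == 1), :, t].mean(axis=0)
            pred = int(np.linalg.norm(X[i, :, t] - c1) <
                       np.linalg.norm(X[i, :, t] - c0))
            correct += (pred == y[i])
        acc[t] = correct / len(y)
    return acc

accuracy = decode_timecourse(X, labels)
# accuracy hovers near chance early on and rises once the simulated signal onsets
```

In this scheme, the latency of peak decodability for a given category level would simply be the time sample at which the accuracy curve is maximal.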
We studied the decline in sensitivity that occurs with eccentricity for stimuli of different spatial scale defined by either luminance (LM) or contrast (CM) modulation. We show that the detectability of CM stimuli declines with eccentricity in a spatial frequency-dependent manner, and that the rate of sensitivity decline for CM stimuli is roughly that expected from their first-order carriers, except, possibly, at finer scales. Using an equivalent noise paradigm, we investigated why foveal sensitivity for detecting LM and CM stimuli differs, as well as why the detectability of first-order stimuli declines with eccentricity. We show that the former can be modeled by an increase in internal noise, whereas the latter involves both an increase in internal noise and a loss of efficiency. To encompass both the threshold and suprathreshold transfer properties of peripheral vision, we propose a model in terms of the contrast gain of the underlying mechanisms.
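The equivalent-noise logic can be illustrated with the standard linear-amplifier model, in which squared threshold is proportional to the sum of internal and external noise variances divided by sampling efficiency. The parameter values below are purely illustrative, not fits to the paper's data:

```python
import numpy as np

def threshold(sigma_ext, sigma_int, efficiency):
    # linear-amplifier model: c_t^2 = (sigma_int^2 + sigma_ext^2) / efficiency
    return np.sqrt((sigma_int ** 2 + sigma_ext ** 2) / efficiency)

ext_noise = np.array([0.0, 1.0, 4.0])   # external noise levels (arbitrary units)
fovea    = threshold(ext_noise, sigma_int=1.0, efficiency=1.0)
noise_up = threshold(ext_noise, sigma_int=2.0, efficiency=1.0)  # internal noise up
both     = threshold(ext_noise, sigma_int=2.0, efficiency=0.5)  # plus efficiency loss

# An internal-noise increase mainly elevates thresholds at low external noise,
# whereas an efficiency loss elevates thresholds at all external-noise levels.
```

This separation is what lets the paradigm attribute the foveal LM/CM difference to internal noise alone, and the eccentricity-dependent loss to both factors.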
This is a survey of psychophysical studies of motion perception carried out mainly in the last 10 years. It covers a wide range of topics, including the detection and interactions of local motion signals, motion integration across various dimensions for vector computation and global motion perception, second-order motion and feature tracking, motion aftereffects, motion-induced mislocalizations, timing of motion processing, cross-attribute interactions for object motion, motion-induced blindness, and biological motion. While traditional motion research has benefited from the notion of the independent "motion processing module," recent research efforts have also been directed at aspects of motion processing in which interactions with other visual attributes play critical roles. This review tries to highlight the richness and diversity of this large research field and to clarify what has been done and what questions have been left unanswered.
Analyzing a scene requires shifting attention from object to object. Although several studies have attempted to determine the speed of these attentional shifts, there are large discrepancies in their estimates. Here, we adapt a method pioneered by T. A. Carlson, H. Hogendoorn, and F. A. J. Verstraten (2006) that directly measures pure attentional shift times. We also test if attentional shifts can be handled in parallel by the independent resources available in the two cortical hemispheres. We present 10 "clocks," with single revolving hands, in a ring around fixation. Observers are asked to report the hand position on one of the clocks at the onset of a transient cue. The delay between the reported time and the veridical time at cue onset can be used to infer processing and attentional shift times. With this setup, we use a novel subtraction method that utilizes different combinations of exogenous and endogenous cues to determine shift times for both types of attention. In one experiment, subjects shift attention to an exogenously cued clock (baseline condition) in one block, and in other blocks, subjects perform one further endogenous shift to a nearby clock (test condition). In another experiment, attention is endogenously cued to one clock (baseline condition), and on other trials, an exogenous cue further shifts attention to a nearby clock (test condition). Subtracting report delays in the baseline condition from those obtained in the test condition allows us to isolate genuine attentional shift times. In agreement with previous studies, our results reveal that endogenous attention is much slower than exogenous attention (endogenous: 250 ms; exogenous: 100 ms). Surprisingly, the dependence of shift time on distance is minimal for exogenous attention, whereas it is steep for endogenous attention. 
In the final experiment, we find that endogenous shifts are faster across hemifields than within a hemifield, suggesting that the two hemispheres can simultaneously process at least parts of these shifts.
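The subtraction method described above reduces to simple arithmetic: the report delay in the baseline (single-cue) condition is subtracted from the delay in the test (two-shift) condition to isolate the added shift time. The baseline and test delays below are hypothetical placeholders; only the resulting shift times come from the abstract.

```python
def shift_time(delay_test_ms, delay_baseline_ms):
    """Isolate the pure attentional shift time by subtraction."""
    return delay_test_ms - delay_baseline_ms

# hypothetical report delays chosen to reproduce the reported shift times
endogenous_shift = shift_time(delay_test_ms=400, delay_baseline_ms=150)  # 250 ms
exogenous_shift  = shift_time(delay_test_ms=250, delay_baseline_ms=150)  # 100 ms
```

The point of the subtraction is that processing components common to both conditions (cue detection, clock reading, response) cancel, leaving only the extra shift.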
Paying attention to a stimulus affords it many behavioral advantages, but whether attention also changes its subjective appearance is controversial. K. A. Schneider and M. Komlos (2008) demonstrated that the results of previous studies suggesting that attention increased perceived contrast could also be explained by a biased decision mechanism. This bias could be neutralized by altering the methodology to ask subjects whether two stimuli were equal in contrast or not rather than which had the higher contrast. K. Anton-Erxleben, J. Abrams, and M. Carrasco (2010) claimed that, even using this equality judgment, attention could still be shown to increase perceived contrast. In this reply, we analyze their data and conclude that the effects that they reported resulted from fitting symmetric functions that poorly characterized the individual subject data, which exhibited significant asymmetries between the high- and low-contrast tails. The strength of the effect attributed to attentional enhancement in each subject was strongly correlated with this skew. By refitting the data with a response model that included a non-zero asymptotic response in the low-contrast regime, we show that the reported attentional effects are better explained as changes in subjective criteria. Thus, the conclusion of Schneider and Komlos that attention biases the decision mechanism but does not alter appearance is still valid and is in fact supported by the data from Anton-Erxleben et al.
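The methodological point can be demonstrated with a toy simulation: if "equal" responses are generated by a curve with a non-zero asymptote on the low-contrast side, fitting a symmetric function shifts the estimated peak toward lower contrasts, mimicking a PSE shift. The functional forms and parameters here are illustrative assumptions, not the authors' exact response model:

```python
import numpy as np

def p_equal(c, mu=0.5, sigma=0.1, low_asymptote=0.2):
    """'Equal' responses: Gaussian bump with a non-zero floor on the low side."""
    bump = np.exp(-(c - mu) ** 2 / (2 * sigma ** 2))
    floor = np.where(c < mu, low_asymptote, 0.0)
    return floor + (1.0 - floor) * bump

contrasts = np.linspace(0.0, 1.0, 101)
data = p_equal(contrasts)                 # skewed "observed" proportions

# grid-search fit of a symmetric Gaussian (no floor) to the skewed data
mus, sigmas = np.linspace(0.3, 0.7, 81), np.linspace(0.05, 0.3, 51)
best_mu, best_err = None, np.inf
for mu in mus:
    for s in sigmas:
        err = np.sum((data - np.exp(-(contrasts - mu) ** 2 / (2 * s ** 2))) ** 2)
        if err < best_err:
            best_mu, best_err = mu, err

# best_mu falls below the true peak of 0.5: the skew alone shifts the fitted PSE
```

This is the shape of the argument: an apparent PSE shift can arise entirely from asymmetry in the response distributions rather than from a change in appearance.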
A wealth of literature suggests that emotional faces are given special status as visual objects: Cognitive models suggest that emotional stimuli, particularly threat-relevant facial expressions such as fear and anger, are prioritized in visual processing and may be identified by a subcortical “quick and dirty” pathway in the absence of awareness (Tamietto & de Gelder, 2010). Both neuroimaging studies (Williams, Morris, McGlone, Abbott, & Mattingley, 2004) and backward masking studies (Whalen, Rauch, Etcoff, McInerney, & Lee, 1998) have supported the notion of emotion processing without awareness. Recently, our own group (Adams, Gray, Garner, & Graf, 2010) showed adaptation to emotional faces that were rendered invisible using a variant of binocular rivalry: continuous flash suppression (CFS, Tsuchiya & Koch, 2005). Here we (i) respond to Yang, Hong, and Blake's (2010) criticisms of our adaptation paper and (ii) provide a unified account of adaptation to facial expression, identity, and gender, under conditions of unawareness.
Whether attention modulates the appearance of stimulus features is debated. Whereas many previous studies using a comparative judgment have found evidence for such an effect, two recent studies using an equality judgment have not. Critically, these studies have relied on the assumption that the equality paradigm yields bias-free PSE estimates and is as sensitive as the comparative judgment, without testing these assumptions. Anton-Erxleben, Abrams, and Carrasco (2010) compared comparative judgments and equality judgments with and without the manipulation of attention. They demonstrated that the equality paradigm is less sensitive than the comparative judgment and also bias-prone. Furthermore, they reported an effect of attention on the PSE using both paradigms. Schneider (2011) questions the validity of the latter finding, stating that the data in the equality experiment are corrupted because of skew in the response distributions. Notably, this argument supports the original conclusion by Anton-Erxleben et al.: that the equality paradigm is bias-prone. Additionally, the necessary analyses to show that the attention effect observed in Anton-Erxleben et al. was due to skew in the data were not conducted. Here, we provide these analyses and show that although the equality judgment is bias-prone, the effects we observe are consistent with an increase of apparent contrast by attention.
Keywords: Ternus-Pikler display; retinotopic processing; nonretinotopic processing; spatio-temporal filters. Reference: EPFL-ARTICLE-188506, doi:10.1167/13.10.19
The dynamics of overt visual attention shifts evoke certain patterns of responses in eye and head movements. In this work, we detail novel findings regarding the interaction of eye gaze and head pose under various attention-switching conditions in complex environments and safety-critical tasks such as driving. In particular, we find that sudden, bottom-up visual cues in the periphery evoke a different pattern of eye-head movement latencies than do top-down, task-oriented attention shifts. In laboratory vehicle-simulator experiments, a unique and significant (p < 0.05) pattern of preparatory head motions, prior to the gaze saccade, emerges in the top-down case. This finding is validated by qualitative analysis of naturalistic real-world driving data. These results demonstrate that measurements of eye-head dynamics are useful for detecting driver distraction, as well as for classifying human attentive states in time- and safety-critical tasks.
Visual psychophysicists have recently developed tools to measure the maximal speed at which the brain can accurately carry out different types of computations (H. Kirchner & S. J. Thorpe, 2006). We use this methodology to measure the maximal speed with which individuals can make magnitude comparisons between two single-digit numbers. We find that individuals make such comparisons with high accuracy in 306 ms on average and are able to perform above chance in as little as 230 ms. We also find that maximal speeds are similar for "larger than" and "smaller than" number comparisons and in a control task that simply requires subjects to identify the number in a number-letter pair. The results suggest that the brain contains dedicated processes involved in implementing basic number comparisons that can be deployed in parallel with processes involved in low-level visual processing.
Three experimental paradigms were used to investigate the perception of orientation relative to internal categorical standards of vertical and horizontal. In Experiment 1, magnitude estimation of orientation (in degrees) relative to vertical and horizontal replicated a previously reported spatial orientation bias also measured using verbal report: Orientations appear farther from horizontal than they are, whether numeric judgments are made relative to vertical or to horizontal. Analyses of verbal response patterns, however, suggested that verbal reports underestimate the true spatial bias. A non-verbal orientation bisection task (Experiment 2) confirmed that spatial errors are not due to numeric coding and are larger than the 6° error replicated using verbal methods. A spatial error of 8.6° was found in the bisection task, such that an orientation of about 36.4° from horizontal appears equidistant from vertical and horizontal. Finally, using a categorization ("ABX") paradigm in Experiment 3, it was found that there is less memory confusability for orientations near horizontal than for orientations near vertical. Thus, three different types of measures, two of them non-verbal, provide converging evidence that the coding of orientation relative to the internal standards of horizontal and vertical is asymmetrically biased and that horizontal appears to be the privileged axis.
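The reported bisection error is simply the offset of the perceived bisector from the true 45° bisector of the quadrant between horizontal (0°) and vertical (90°):

```python
true_bisector = (0 + 90) / 2    # 45 deg midway between horizontal and vertical
spatial_error = 8.6             # bisection error reported in Experiment 2
perceived_bisector = true_bisector - spatial_error
# about 36.4 deg from horizontal appears equidistant from vertical and horizontal
```

The sign of the error (below 45°) is what indicates that orientations appear farther from horizontal than they are, consistent with horizontal being the privileged axis.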
We investigated the low-level motion mechanisms for color and luminance and their integration process using 2D and 3D motion aftereffects (MAEs). The 2D and 3D MAEs obtained with equiluminant color gratings showed that the visual system has a low-level motion mechanism for color motion as well as for luminance motion. The 3D MAE is an MAE for motion in depth after monocular motion adaptation. Apparent 3D motion can be perceived after prolonged exposure of one eye to lateral motion because the difference in motion signal between the adapted and unadapted eyes generates interocular velocity differences (IOVDs). Since IOVDs cannot be analyzed by the high-level motion mechanism of feature tracking, we conclude that a low-level motion mechanism is responsible for the 3D MAE. Since we found different temporal frequency characteristics for the color and luminance stimuli, MAEs in the equiluminant color stimuli cannot be attributed to a residual luminance component in the color stimulus. Although a similar MAE was found with luminance and color tests for both 2D and 3D motion judgments after adapting to either color or luminance motion, temporal frequency characteristics differed between color and luminance adaptation. The visual system must therefore have a low-level motion mechanism for color signals as it does for luminance signals. We also found that color and luminance motion signals are integrated monocularly before IOVD analysis, as shown by a cross-adaptation effect between color and luminance stimuli. This was supported by an experiment with dichoptic presentation of color and luminance tests, in which the tests were presented to different eyes with four combinations of test and adaptation: a color or luminance test in the adapted eye after color or luminance adaptation. The finding of little or no influence of the adaptation/test combination indicates that color and luminance motion signals are integrated prior to the binocular IOVD process.
A 2D perspective image of a slanted rectangular object is sufficient for a strong 3D percept. Two computational assumptions that could be used to interpret 3D structure from images of rectangles are as follows: (1) converging lines in an image are parallel in the world, and (2) skewed angles in an image are orthogonal in the world. For an accurate perspective image of a slanted rectangle, either constraint implies the same 3D interpretation. However, if an image is rescaled, the 3D interpretations based on parallelism and orthogonality generally conflict. We tested the roles of parallelism and orthogonality by measuring perceived depth within scaled perspective images. Stimuli were monocular images of squares, slanted about a horizontal axis, with an elliptical hole. Subjects judged the length-to-width ratio of the holes, which provided a measure of perceived depth along the object. The rotational alignment of squares within their surface plane was varied from 0 degrees (trapezoidal projected contours) to 20 degrees (skewed projected contours). In consistent-cue conditions, images were accurate projections of either a 10-degree- or a 20-degree-wide square, with slants of 75 degrees and 62 degrees, respectively. In cue-conflict conditions, images were generated either by magnifying a 10-degree image to have a projected size of 20 degrees or by minifying a 20-degree image to have a projected size of 10 degrees. For the aligned squares, which do not produce a conflicting skew cue, we found that subjects' judgments depended primarily on projected size and not on the size used to generate the prescaled images. This is consistent with reliance on the convergence cue, corresponding to a parallelism assumption. As squares were rotated away from alignment, producing skewed projected contours, judgments were increasingly determined by the original image size. This is consistent with use of the skew cue, corresponding to an orthogonality assumption.
Our results demonstrate that both parallelism and orthogonality constraints are used to perceive depth from linear perspective.
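A back-of-the-envelope check connects the two consistent-cue conditions. Under the parallelism (convergence) interpretation, uniformly magnifying an image scales the vanishing-point distance, which divides the tangent of the implied slant by the magnification factor. This derivation is an assumption on our part (standard pinhole geometry), not the authors' stated rendering model:

```python
import math

def implied_slant_after_scaling(slant_deg, magnification):
    """Slant implied by line convergence after uniformly scaling the image."""
    return math.degrees(math.atan(math.tan(math.radians(slant_deg)) / magnification))

# magnifying the 10-degree-wide, 75-degree-slant image by a factor of 2 implies
# a slant close to the 62 degrees of the consistent 20-degree-wide image
print(implied_slant_after_scaling(75.0, 2.0))   # ~61.8
```

The near-agreement with the 62-degree consistent-cue value shows why the two slants quoted in the abstract go together.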
Experimental evidence has given strong support to the theory that the primary visual cortex (V1) realizes a bottom-up saliency map (A. R. Koene & L. Zhaoping, 2007; Z. Li, 2002; L. Zhaoping, 2008a; L. Zhaoping & K. A. May, 2007). Unlike conventional models of texture segmentation, this theory predicted that segmenting two textures in an image I(rel) comprising obliquely oriented bars would become much more difficult when a task-irrelevant texture I(ir) of spatially alternating horizontal and vertical bars is superposed on the original texture I(rel). The irrelevant texture I(ir) interferes with I(rel)'s ability to direct attention. This predicted interference was confirmed (L. Zhaoping & K. A. May, 2007) in the form of a prolonged task reaction time (RT). In this study, we investigate whether and how 3D depth perception, believed to be processed mostly beyond V1 and starting in V2 (J. S. Bakin, K. Nakayama, & C. D. Gilbert, 2000; B. G. Cumming & A. J. Parker, 2000; F. T. Qiu & R. von der Heydt, 2005; R. von der Heydt, H. Zhou, & H. S. Friedman, 2000), contributes additionally to directing attention. We measured the reduction of the interference, that is, of the RT, when the position of the texture grid for I(ir) was offset horizontally from that for I(rel), forming an offset 2D stimulus. This reduction was compared with that when the positional offset was present only in the input image to one eye, or when it was in opposite directions in the images for the two eyes, creating a 3D stimulus with a depth separation between I(ir) and I(rel). A contribution by 3D processes to attentional guidance would be manifested by any extra RT reduction associated with the 3D stimulus over the offset 2D stimulus. This 3D contribution was not present unless the task was so difficult that RT (by button press) based on 2D cues alone was longer than about 1 second.
Our findings suggest that, without other top-down factors, V1 plays a dominant role in attentional guidance during an initial window of processing, while cortical areas beyond V1 play an increasing role in later processing. Subject-dependent variations in the manifestations of the 3D effects also suggest that this later, 3D, contribution to attentional guidance can be easily influenced by top-down control.
A study was conducted to examine the time required to process lateral motion and motion-in-depth for luminance- and disparity-defined stimuli. In a 2 × 2 design, visual stimuli oscillated sinusoidally in either 2D (moving left to right at a constant disparity of 9 arcmin) or 3D (looming and receding in depth between 6 and 12 arcmin) and were defined either purely by disparity (change of disparity over time [CDOT]) or by a combination of disparity and luminance (providing CDOT and interocular velocity differences [IOVD]). Visual stimuli were accompanied by an amplitude-modulated auditory tone that oscillated at the same rate and whose phase was varied to find the latency producing synchronous perception of the auditory and visual oscillations. In separate sessions, oscillations of 0.7 and 1.4 Hz were compared. For the combined CDOT + IOVD stimuli (disparity and luminance [DL] conditions), audiovisual synchrony required a 50 ms auditory lag, regardless of whether the motion was 2D or 3D. For the CDOT-only stimuli (disparity-only [DO] conditions), we found that a similar lag (∼60 ms) was needed to produce synchrony for the 3D motion condition. However, when the CDOT-only stimuli oscillated along a 2D path, the auditory lags required for audiovisual synchrony were much longer: 170 ms for the 0.7 Hz condition, and 90 ms for the 1.4 Hz condition. These results suggest that stereomotion detectors based on CDOT are well suited to tracking 3D motion, but are poorly suited to tracking 2D motion.
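Converting the auditory phase offset at the point of subjective synchrony into a latency is a rescaling by the oscillation period. The helper below is a hypothetical illustration; the phase value is chosen to reproduce the ~50 ms lag reported for the disparity-plus-luminance conditions, not taken from the paper:

```python
def phase_to_lag_ms(phase_deg, freq_hz):
    """Latency equivalent of a phase offset at a given oscillation frequency."""
    return (phase_deg / 360.0) / freq_hz * 1000.0

# a 12.6 deg auditory phase lag at 0.7 Hz corresponds to a ~50 ms latency
print(phase_to_lag_ms(12.6, 0.7))
```

Note that the same latency corresponds to twice the phase angle at 1.4 Hz, which is why lags rather than phases are the natural unit of comparison across the two oscillation rates.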
Two experiments investigated infants' and adults' perception of 3D shape from line junction information. Participants in both experiments viewed a concave wire half-cube frame. In Experiment 1, adults reported that the concave wire frame appeared to be convex when it was viewed monocularly (with one eye covered) and that it appeared to be concave when it was viewed binocularly. In Experiment 2, 5- and 7-month-old infants were shown the concave wire frame under monocular and binocular viewing conditions, and their reaching behavior was recorded. The infants in both age groups reached preferentially toward the center of the wire frame in the monocular condition and toward its edges in the binocular condition. Because infants typically reach to what they perceive to be closest to them, these reaching preferences provide evidence that they perceived the wire frame as convex when they viewed it monocularly and as concave when they viewed it binocularly. These findings suggest that, by 5 months of age, infants, like adults, use line junction information to perceive depth and object shape.
In our previous studies, we showed that monocular perception of 3D shapes is based on a priori constraints, such as 3D symmetry and 3D compactness. The present study addresses the nature of the perceptual mechanisms underlying binocular perception of 3D shapes. First, we demonstrate that binocular performance is systematically better than monocular performance and is close to perfect for three of the four subjects. Veridical shape perception cannot be explained by conventional binocular models, in which shape is derived from depth intervals. In our new model, we use the ordinal depth of points in a 3D shape provided by stereoacuity and combine it with monocular shape constraints by means of Bayesian inference. The stereoacuity threshold used by the model was estimated for each subject. This model can account for the binocular shape performance of all four subjects. It can also explain the fact that when viewing distance increases, the binocular percept gradually reduces to the monocular one, which implies that the monocular percept of a 3D shape is a special case of the binocular percept.
Cue combination rules have often been applied to the perception of surface shape but not to judgements of object location. Here, we used immersive virtual reality to explore the relationship between different cues to distance. Participants viewed a virtual scene and judged the change in distance of an object presented in two intervals, where the scene changed in size between intervals (by a factor of between 0.25 and 4). We measured thresholds for detecting a change in object distance when there were only 'physical' (stereo and motion parallax) or 'texture-based' cues (independent of the scale of the scene) and used these to predict biases in a distance matching task. Under a range of conditions, in which the viewing distance and position of the target relative to other objects was varied, the ratio of 'physical' to 'texture-based' thresholds was a good predictor of biases in the distance matching task. The cue combination approach, which successfully accounts for our data, relies on quite different principles from those underlying traditional models of 3D reconstruction.
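The prediction scheme follows standard reliability-weighted cue combination: each cue is weighted in inverse proportion to its squared threshold, so the ratio of 'physical' to 'texture-based' thresholds fixes the predicted matching bias. This is a minimal sketch of the principle with illustrative numbers, not the paper's full model:

```python
def cue_weights(sigma_physical, sigma_texture):
    """Reliability weights: inversely proportional to squared thresholds."""
    r_p, r_t = 1.0 / sigma_physical ** 2, 1.0 / sigma_texture ** 2
    return r_p / (r_p + r_t), r_t / (r_p + r_t)

def predicted_match(physical_estimate, texture_estimate, sigma_p, sigma_t):
    """Combined distance estimate as a reliability-weighted average."""
    w_p, w_t = cue_weights(sigma_p, sigma_t)
    return w_p * physical_estimate + w_t * texture_estimate

# if physical cues are twice as precise as texture-based cues, the combined
# estimate sits 80% of the way toward the physical-cue value
print(predicted_match(1.0, 0.0, sigma_p=1.0, sigma_t=2.0))
```

The intuition is that when the scene is rescaled, texture-based cues signal "no change" while physical cues signal the true change, and the measured threshold ratio predicts where the matching judgment lands between them.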
Inhibitory capacity was investigated by measuring the eye movements of normal subjects asked to fixate a central point, and to suppress eye movements toward visual distracters appearing in the periphery or in depth. Eight right-handed young adults performed such a suppression or distracter task. In different conditions, the distracter could appear at 10 degrees left or right at a distance of 20, 40, or 150 cm (calling for horizontal saccades), or in a central position far or close (calling for convergence or divergence), or 7.5 degrees up or down at 40 or 150 cm (calling for vertical saccades). Eye movements were recorded binocularly with an infrared light eye-movement device. Results showed that (1) suppression performance was not perfect, as the subjects still produced eye movements; (2) errors were distributed unequally in three-dimensional space, with more frequent errors toward distracters calling for convergence, or leftward and downward saccades at a close distance; (3) distracters calling for saccade suppression yielded saccades in the direction of the distracter (that we called prosaccades), and saccades directed away from it (that we called spontaneous antisaccades); (4) for vergence, only distracters calling for convergence yielded errors, which were always promovements; (5) in addition, a small convergent drift was found for convergence distracters. Differences in the errors between saccade and vergence suggest that different inhibitory mechanisms may be involved in the two systems. Spatial left/right, up/down, and close/far asymmetries are interpreted in terms of attentional biases.
Experience has long-term effects on perceptual appearance (Q. Haijiang, J. A. Saunders, R. W. Stone, & B. T. Backus, 2006). We asked whether experience affects the appearance of structure-from-motion stimuli when the optic flow is caused by observer ego-motion. Optic flow is an ambiguous depth cue: a rotating object and its oppositely rotating, depth-inverted dual generate similar flow. However, the visual system exploits ego-motion signals to prefer the percept of an object that is stationary over one that rotates (M. Wexler, F. Panerai, I. Lamouret, & J. Droulez, 2001). We replicated this finding and asked whether this preference for stationarity, the "stationarity prior," is modulated by experience. During training, two groups of observers were exposed to objects with identical flow, but that were either stationary or moving as determined by other cues. The training caused identical test stimuli to be seen preferentially as stationary or moving by the two groups, respectively. We then asked whether different priors can exist independently at different locations in the visual field. Observers were trained to see objects either as stationary or as moving at two different locations. Observers' stationarity bias at the two respective locations was modulated in the directions consistent with training. Thus, the utilization of extraretinal ego-motion signals for disambiguating optic flow signals can be updated as the result of experience, consistent with the updating of a Bayesian prior for stationarity.
Although the role of surface-level processes has been demonstrated, visual interpolation models often emphasize contour relationships. We report two experiments on the geometric constraints governing 3D interpolation between surface patches without visible edges. Observers were asked to classify pairs of planar patches specified by random-dot disparities and visible through circular apertures (aligned or misaligned) in a frontoparallel occluder. On each trial, surfaces appeared in parallel or converging planes with vertical (Experiment 1) or horizontal (Experiment 2) tilt and variable amounts of slant. We expected the classification task to be facilitated when patches were perceived as connected. We found enhanced sensitivity and speed for 3D relatable vs. nonrelatable patches. Here 3D relatability does not involve oriented edges but rather the inducing patches' orientations computed from stereoscopic information. Performance was markedly affected by slant anisotropy: both sensitivity and speed were worse for patches with horizontal tilt. Despite this anisotropy, the advantage of 3D relatability was nearly identical for the two tilts, suggesting an isotropic unit-formation process. The results are interpreted as evidence that inducing slant constrains surface interpolation in the absence of explicit edge information: 3D contour and surface interpolation processes share common geometric constraints, as formalized by 3D relatability.
A new computational analysis is described that is capable of estimating the 3D shapes of continuously curved surfaces with anisotropic textures that are viewed with negligible perspective. This analysis assumes that the surface texture is homogeneous, and it makes specific predictions about how the apparent shape of a surface should be distorted in cases where that assumption is violated. Two psychophysical experiments are reported in an effort to test those predictions, and the results confirm that observers' ordinal shape judgments are consistent with what would be expected based on the model. The limitations of this analysis are also considered, and a complementary model is discussed that is only appropriate for surfaces viewed with large amounts of perspective.
Analyzing the factors that determine our choice of visual search strategy may shed light on visual behavior in everyday situations. Previous results suggest that increasing task difficulty leads to more systematic search paths. Here we analyze observers' eye movements in an "easy" conjunction search task and a "difficult" shape search task to study visual search strategies in stereoscopic search displays with virtual depth induced by binocular disparity. Standard eye-movement variables, such as fixation duration and initial saccade latency, as well as new measures proposed here, such as saccadic step size, relative saccadic selectivity, and x-y target distance, revealed systematic effects on search dynamics in the horizontal-vertical plane throughout the search process. We found that in the "easy" task, observers start with the processing of display items in the display center immediately after stimulus onset and subsequently move their gaze outwards, guided by extrafoveally perceived stimulus color. In contrast, the "difficult" task induced an initial gaze shift to the upper-left display corner, followed by a systematic left-right and top-down search process. The only consistent depth effect was a trend of initial saccades in the easy task with smallest displays to the items closest to the observer. The results demonstrate the utility of eye-movement analysis for understanding search strategies and provide a first step toward studying search strategies in actual 3D scenarios.
We measured the ability to discriminate 3D shapes across changes in viewpoint and illumination based on rich monocular 3D information and tested whether the addition of stereo information improves shape constancy. Stimuli were images of smoothly curved, random 3D objects. Objects were presented in three viewing conditions that provided different 3D information: shading-only, stereo-only, and combined shading and stereo. Observers performed shape discrimination judgments for sequentially presented objects that differed in orientation by rotation of 0°-60° in depth. We found that rotation in depth markedly impaired discrimination performance in all viewing conditions, as evidenced by reduced sensitivity (d') and increased bias toward judging same shapes as different. We also observed a consistent benefit from stereo, both in conditions with and without change in viewpoint. Results were similar for objects with purely Lambertian reflectance and shiny objects with a large specular component. Our results demonstrate that shape perception for random 3D objects is highly viewpoint-dependent and that stereo improves shape discrimination even when rich monocular shape cues are available.
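Sensitivity (d') and bias in such a same-different design are computed from hit and false-alarm rates via the inverse normal CDF. The rates below are hypothetical illustrations of the qualitative pattern (reduced d' after rotation); the abstract does not report raw proportions:

```python
from statistics import NormalDist

def d_prime(hit_rate, fa_rate):
    """Signal-detection sensitivity from hit and false-alarm rates."""
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)

def criterion(hit_rate, fa_rate):
    """Response bias c; negative values mean a liberal tendency to respond
    'different' (treating 'different' as the signal)."""
    z = NormalDist().inv_cdf
    return -0.5 * (z(hit_rate) + z(fa_rate))

# hypothetical rates showing how rotation in depth could reduce sensitivity
print(d_prime(0.85, 0.15))   # higher sensitivity, e.g., at 0 deg rotation
print(d_prime(0.60, 0.40))   # reduced sensitivity, e.g., after 60 deg rotation
```

Separating d' from the criterion is what allows the stated conclusion that rotation both lowered sensitivity and shifted bias toward "different" responses.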
In the natural world, objects are characterized by a variety of attributes, including color and shape. The contributions of these two attributes to object recognition are typically studied independently of each other, yet they are likely to interact in natural tasks. Here we examine whether color and size (a component of shape) interact in a real three-dimensional (3D) object similarity task, using solid domelike objects whose distinct apparent surface colors are independently controlled via spatially restricted illumination from a data projector hidden from the observer. The novel experimental setup preserves natural cues to 3D shape from shading, binocular disparity, motion parallax, and surface texture, while also providing the flexibility and ease of computer control. Observers performed three distinct tasks: two unimodal discrimination tasks and an object similarity task. Depending on the task, the observer was instructed to select the alternative object that was "bigger than," "the same color as," or "most similar to" the designated reference object, all of which varied in both size and color between trials. For both unimodal discrimination tasks, discrimination thresholds for the tested attribute (e.g., color) were increased by differences in the secondary attribute (e.g., size), although this effect was more robust in the color task. For the unimodal size-discrimination task, the strongest effect of the secondary attribute (color) occurred as a perceptual bias, which we call the "saturation-size effect": Objects with more saturated colors appear larger than objects with less saturated colors. In the object similarity task, discrimination thresholds for color or size differences were significantly larger than in the unimodal discrimination tasks. We conclude that color and size interact in determining object similarity and are effectively analyzed on a coarser scale, due to noise in the similarity estimates of the individual attributes, inter-attribute attentional interactions, or coarser coding of attributes at a "higher" level of object representation.
We investigated how human observers estimate an object's three-dimensional (3D) motion trajectory during visually guided self-motion. Observers performed a task in an immersive virtual reality system consisting of front, left, right, and floor screens of a room-sized cube. In one experiment, we found that the presence of an optic flow simulating forward self-motion in the background induces a world-centered frame of reference, instead of an observer-centered frame of reference, for the perceived rotation of a 3D surface from motion. In another experiment, we found that the perceived direction of 3D object motion is biased toward a world-centered frame of reference when an optic flow pattern is presented in the background. In a third experiment, we confirmed that the effect of the optic flow pattern on the perceived direction of 3D object motion was not caused only by local motion detectors responsible for the change of the retinal size of the target. These results suggest that visually guided self-motion from optic flow induces world-centered criteria for estimates of 3D object motion.
In an immersive virtual reality environment, subjects fail to notice when a scene expands or contracts around them, despite correct and consistent information from binocular stereopsis and motion parallax, resulting in gross failures of size constancy (A. Glennerster, L. Tcheang, S. J. Gilson, A. W. Fitzgibbon, & A. J. Parker, 2006). We determined whether the integration of stereopsis/motion parallax cues with texture-based cues could be modified through feedback. Subjects compared the size of two objects, each visible when the room was of a different size. As the subject walked, the room expanded or contracted, although subjects failed to notice any change. Subjects were given feedback about the accuracy of their size judgments, where the "correct" size setting was defined either by texture-based cues or (in a separate experiment) by stereo/motion parallax cues. With feedback, observers were able to adjust their responses such that fewer errors were made. For texture-based feedback, the pattern of responses was consistent with observers weighting texture cues more heavily. However, for stereo/motion parallax feedback, performance in many conditions became worse such that, paradoxically, biases moved away from the point reinforced by the feedback. This can be explained by assuming either that subjects remap the relationship between stereo/motion parallax cues and perceived size or that they develop strategies to change their criterion for a size match on different trials. In either case, subjects appear not to have direct access to stereo/motion parallax cues.
The precision and accuracy of speed discrimination performance for stereomotion stimuli were assessed for several receding 3D trajectories confined to the horizontal meridian. It has previously been demonstrated in a variety of tasks that detection thresholds are substantially higher when subjects observe a stereomotion stimulus than when simply viewing one of its component monocular half-images--a phenomenon known as stereomotion suppression (C. W. Tyler, 1971). Using monocularly visible motion in depth targets, we found mean speed discrimination thresholds to be higher for stereomotion, compared with monocular lateral speed discrimination thresholds for equivalent stimuli, demonstrating a disadvantage for binocular viewing in the case of speed discrimination as well. Furthermore, speed discrimination thresholds for motion in depth were not systematically affected by trajectory angle; hence, the disadvantage of binocular viewing persists even when there are concurrent changes in binocular visual direction. Lastly, there was a tendency for oblique trajectories of stereomotion to be perceived as faster than equally rapid motion receding directly away from the subject along the midline. Our data, in addition to earlier stereomotion suppression observations, are consistent with a stereomotion system that takes a noisy, weighted difference of the stimulus velocities in the two eyes to compute motion in depth.
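The computation proposed in the final sentence — motion in depth derived from a noisy, weighted difference of the two eyes' velocity signals — can be sketched in a few lines. This is a minimal illustration of that class of model, not the authors' fitted model; the weights and noise level are illustrative assumptions.

```python
import numpy as np

def motion_in_depth_estimate(v_left, v_right, w_left=1.0, w_right=1.0,
                             noise_sd=0.1, rng=None):
    """Estimate motion-in-depth speed as a noisy, weighted difference of the
    horizontal retinal velocities in the two eyes (interocular velocity
    difference scheme). Weights and noise level are illustrative, not fitted."""
    rng = rng or np.random.default_rng()
    diff = w_left * v_left - w_right * v_right
    # Additive internal noise on the combined signal.
    return diff + rng.normal(0.0, noise_sd)
```

With equal and opposite velocities in the two eyes (pure motion toward or away from the head), the difference signal is maximal; with equal velocities (pure lateral motion), it vanishes, which is consistent with stereomotion suppression being specific to the motion-in-depth pathway.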
Among other cues, the visual system uses shading to infer the 3D shape of objects. The shading pattern depends on the illumination and on the reflectance properties (BRDF). In this study, we compared 3D shape perception between identical shapes with different BRDFs. The stimuli were photographs of 3D-printed random smooth shapes that were either painted matte gray or covered with a gray velvet layer. We used the gauge figure task (J. J. Koenderink, A. J. van Doorn, & A. M. L. Kappers, 1992) to quantify 3D shape perception. We found that the shape of velvet objects was systematically perceived to be flatter than that of the matte objects. Furthermore, observers' judgments agreed more closely for matte shapes than for velvet shapes. Lastly, we compared subjective with veridical reliefs and found large systematic differences: Both matte and velvet shapes were perceived as flatter than the actual shape. The isophote pattern of a flattened Lambertian shape resembles the isophote pattern of an unflattened velvet shape. We argue that the visual system uses a similar shape-from-shading computation for matte and velvet objects, one that partly discounts material properties.
We sought to determine if perceived depth can elicit vergence eye movements independent of binocular disparity. A flat surface in the frontal plane appears slanted about a vertical axis when the image in one eye is vertically compressed relative to the image in the other eye: the induced size effect (Ogle, 1938). We show that vergence eye movements accompany horizontal gaze shifts across such surfaces, consistent with the direction of the perceived slant, despite the absence of a horizontal disparity gradient. All images were extinguished during the gaze shifts so that eye movements were executed open-loop. We also used vertical compression of one eye's image to null the perceived slant resulting from prior horizontal compression of that image, and show that this reduces the vergence accompanying horizontal gaze shifts across the surface, even though the horizontal disparity is unchanged. When this last experiment was repeated using vertical expansions in place of the vertical compressions, the perceived slant was increased and so too was the vergence accompanying horizontal gaze shifts, although the horizontal disparity again remained unchanged. We estimate that the perceived depth accounted, on average, for 15-41% of the vergence in our experiments depending on the conditions.
Little is known about the perception of 3D shape in the visual periphery. Here we ask whether identification accuracy in shape-from-texture and shape-from-motion tasks can be equated across the visual field with sufficient stimulus magnification. Both tasks employed 3D surfaces comprising hills, valleys, and plains in three possible locations, yielding a 27-alternative forced-choice (27AFC) task. Participants performed the task at eccentricities of 0 to 16 deg in the right visual field over a 64-fold range of stimulus sizes. Performance reached ceiling levels at all eccentricities, indicating that stimulus magnification was sufficient to compensate for eccentricity-dependent sensitivity loss. The parameter E₂ (in the equation F = 1 + E/E₂) was used to characterize the rate at which stimulus size must increase with eccentricity (E) to achieve foveal levels of performance. Three-parameter models (μ, σ, and E₂) captured most of the variability in the psychometric functions relating stimulus size and eccentricity to accuracy for all participants' data in the two experiments. For the shape-from-texture task, the average E₂ was 1.52, and for the shape-from-motion task, it was 0.61. The E₂ values indicate that sensitivity to structure from motion declines at a faster rate with eccentricity than does sensitivity to structure from texture. Although size scaling with F = 1 + E/E₂ eliminated most eccentricity variation from the structure-from-motion data, there was some evidence that E₂ increases as accuracy decreases in the shape-from-texture task, suggesting that there may be more than one eccentricity-dependent limitation on performance in this task.
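The size-scaling rule above is compact enough to state as code. A minimal sketch (the function name is ours; the E₂ values are the averages reported in the abstract):

```python
def magnification_factor(eccentricity_deg, e2_deg):
    """Size-scaling factor F = 1 + E / E2: the magnification a stimulus needs
    at eccentricity E (deg) to match foveal performance, given the task's E2."""
    return 1.0 + eccentricity_deg / e2_deg

# With the reported averages (E2 = 1.52 for shape-from-texture and
# E2 = 0.61 for shape-from-motion), a stimulus at 16 deg eccentricity needs
# roughly 11.5x (texture) versus about 27x (motion) its foveal size --
# the smaller E2, the steeper the required growth with eccentricity.
```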
Humans often perform visually guided arm movements in a dynamic environment. To accurately plan visually guided manual tracking movements, the brain should ideally transform the retinal velocity input into a spatially appropriate motor plan, taking the three-dimensional (3D) eye-head-shoulder geometry into account. Indeed, retinal and spatial target velocity vectors generally do not align because of differing eye-head postures. Alternatively, the planning could be crude (based only on retinal information) and the movement corrected online using visual feedback. This study investigates how accurate the motor plan generated by the central nervous system is. We computed predictions about the movement plan depending on whether eye and head position are taken into account (spatial hypothesis) or not (retinal hypothesis). For the motor plan to be accurate, the brain should compensate for head roll and the resulting ocular counterroll, as well as for the misalignment between retinal and spatial coordinates when the eyes are in oblique gaze positions. Predictions were tested on human subjects who manually tracked moving targets in darkness and were compared to the initial arm direction, which reflects the motor plan. Subjects tracked the target in a spatially accurate, though imperfect, manner. We conclude that the brain takes the 3D eye-head-shoulder geometry into account when planning visually guided manual tracking.
Recently, T. B. Czuba, B. Rokers, K. Guillet, A. C. Huk, and L. K. Cormack (2011) and Y. Sakano, R. S. Allison, and I. P. Howard (2012) published very similar studies using the motion aftereffect to probe the way in which motion through depth is computed. Here, we compare and contrast the findings of these two studies and incorporate their results with a brief follow-up experiment. Taken together, the results leave no doubt that the human visual system incorporates a mechanism that is uniquely sensitive to the difference in velocity signals between the two eyes, but--perhaps surprisingly--evidence for a neural representation of changes in binocular disparity over time remains elusive.
Surface specularity distorts the optic flow generated by a moving object in a way that provides important cues for identifying surface material properties (Doerschner, Fleming et al., 2011). Here we show that specular flow can also affect the perceived rotation axis of objects. In three experiments, we investigate how three-dimensional shape and surface material interact to affect the perceived rotation axis of unfamiliar irregularly shaped and isotropic objects. We analyze observers' patterns of errors in a rotation axis estimation task under four surface material conditions: shiny, matte textured, matte untextured, and silhouette. In addition to the expected large perceptual errors in the silhouette condition, we find that the patterns of errors for the other three material conditions differ from each other and across shape category, yielding the largest differences in error magnitude between shiny and matte, textured isotropic objects. Rotation axis estimation is a crucial implicit computational step in perceiving structure from motion; therefore, we test whether a structure-from-motion-based model can predict the perceived rotation axis for shiny and matte, textured objects. Our model's predictions closely follow observers' data, even yielding the same reflectance-specific perceptual errors. Unlike previous work (Caudek & Domini, 1998), our model does not rely on the assumption of affine image transformations; however, a limitation of our approach is its reliance on projected correspondence, and it thus has difficulty accounting for the perceived rotation axis of smooth shaded objects and silhouettes. In general, our findings are in line with earlier research demonstrating that shape from motion can be extracted from several different types of optical deformation (Koenderink & Van Doorn, 1976; Norman & Todd, 1994; Norman, Todd, & Orban, 2004; Pollick, Nishida, Koike, & Kawato, 1994; Todd, 1985).
The current study investigated the long-term representation of the spatiotemporal signature (J. V. Stone, 1998) and the nature of its coding in a dynamic object recognition task. In Experiment 1, observers' recognition performance was impaired by an overall reversal of the studied objects' learned view sequences even when the sequences were non-smooth, suggesting that the spatiotemporal appearance of the objects was used for recognition and that this effect was not restricted to smooth motion. In four further experiments, a feature-reversal paradigm was applied in which only the global-scale or the local-scale dynamic feature of the view sequences was reversed at a time. The reversal effect still held, but it was selective to the saliency of the sequence's features, suggesting that recognition relied on a statistical representation based on specific features rather than on the whole view sequence. Furthermore, top-down regulation of sequence smoothness was observed: observers perceived the objects as moving more smoothly than they actually did. These results extend an emerging framework arguing that the spatiotemporal appearance of a dynamic object contributes to its recognition. The spatiotemporal signature might be coded in a feature-based manner under the laws of perceptual organization, with a coding process that adapts to variation in the sequence's temporal order.
Computational models for determining three-dimensional shape from texture based on local foreshortening or gradients of scaling are able to achieve accurate estimates of surface relief from an image when it is observed from the same visual angle with which it was photographed or rendered. These models produce conflicting predictions, however, when an image is viewed from a different visual angle. An experiment was performed to test these predictions, in which observers judged the apparent depth profiles of hyperbolic cylinders under a wide variety of conditions. The results reveal that the apparent patterns of relief from texture are systematically underestimated; convex surfaces appear to have greater depth than concave surfaces, large camera angles produce greater amounts of perceived depth than small camera angles, and the apparent depth-to-width ratio for a given image of a surface is greater for small viewing angles than for large viewing angles. Because these results are incompatible with all existing computational models, a new model is presented based on scaling contrast that can successfully account for all aspects of the data.
The theoretical horopter is an interesting qualitative tool for conceptualizing binocular correspondence, but its quantitative applications have been limited because they have ignored ocular kinematics and vertical binocular sensory fusion. Here we extend the mathematical definition of the horopter to a full surface over visual space, and we use this extended horopter to quantify binocular alignment and visualize its dependence on eye position. We reproduce the deformation of the theoretical horopter into a spiral shape in tertiary gaze as first described by Helmholtz (1867). We also describe a new effect of ocular torsion, where the Vieth-Müller circle rotates out of the visual plane for symmetric vergence conditions in elevated or depressed gaze. We demonstrate how these deformations are reduced or abolished when the eyes follow the modification of Listing's law during convergence called L2, which enlarges the extended horopter and keeps its location and shape constant across gaze directions.
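The geometric core of the construction above — by the inscribed-angle theorem, every point on the Vieth-Müller circle (through the fixation point and the two eyes' nodal points) subtends the same binocular angle as fixation, i.e., zero horizontal disparity — can be checked numerically. A minimal 2D sketch in the visual plane, with an illustrative interocular distance and fixation depth:

```python
import numpy as np

def subtended_angle(p, eye_l, eye_r):
    """Angle (deg) at point p between the lines of sight to the two eyes'
    nodal points -- the binocular angle that p subtends."""
    u, v = eye_l - p, eye_r - p
    c = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(c, -1.0, 1.0)))

a = 0.032                                        # half interocular distance, m
eye_l, eye_r = np.array([-a, 0.0]), np.array([a, 0.0])
fix = np.array([0.0, 0.5])                       # symmetric fixation, 0.5 m

# Vieth-Müller circle through the two nodal points and fixation:
y0 = (fix[1] ** 2 - a ** 2) / (2 * fix[1])       # center lies on the midline
r = np.hypot(y0, a)                              # circle radius
theta = np.deg2rad(70.0)
p = np.array([r * np.cos(theta), y0 + r * np.sin(theta)])  # another arc point
```

Any point `p` on the arc in front of the eyes subtends the same angle as `fix`, which is the zero-disparity property the extended-horopter analysis generalizes to 3D eye positions and torsion.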
A new computational analysis is described for estimating 3D shapes from orthographic images of surfaces that are textured with planar cut contours. For any given contour pattern, this model provides a family of possible interpretations that are all related by affine scaling and shearing transformations in depth, depending on the specific values of its free parameters that are used to compute the shape estimate. Two psychophysical experiments were performed in an effort to compare the model predictions with observers' judgments of 3D shape for developable and non-developable surfaces. The results reveal that observers' perceptions can be systematically distorted by affine scaling and shearing transformations in depth and that the magnitude and direction of these distortions vary systematically with the 3D orientations of the contour planes.
This study tested the perception of symmetry of 3D shapes from single 2D images. Experiment 1 tested discrimination between symmetric and asymmetric 3D shapes from single 2D line drawings. Experiment 2 tested discrimination between different degrees of asymmetry of 3D shapes from single 2D line drawings. The results showed that human performance in these discriminations was reliable. Based on these results, a computational model that performs the discrimination from single 2D images is presented. The model first recovers the 3D shape using a priori constraints: 3D symmetry, maximal 3D compactness, minimum surface area, and maximal planarity of contours. The model then evaluates the degree of symmetry of the 3D shape. The model provided a good fit to the subjects' data.
In this study human color constancy was tested for two-dimensional (2D) and three-dimensional (3D) setups with real objects and lights. Four different illuminant changes, a natural selection task and a wide choice of target colors were used. We found that color constancy was better when the target color was learned as a 3D object in a cue-rich 3D scene than in a 2D setup. This improvement was independent of the target color and the illuminant change. We were not able to find any evidence that frequently experienced illuminant changes are better compensated for than unusual ones. Normalizing individual color constancy hit rates by the corresponding color memory hit rates yields a color constancy index, which is indicative of observers' true ability to compensate for illuminant changes.
Humans can precisely judge relative location between two objects moving with the same speed and direction, as numerous studies have shown. However, the precision for localizing a single moving object relative to stationary references remains a neglected topic. Here, subjects reported the perceived location of a moving object at the time of a cue. The variability of the reported positions increased steeply with the speed of the object, such that the distribution of responses corresponds to the distance that the object traveled in 70 ms. This surprisingly large temporal imprecision depends little on the characteristics of the trajectory of the moving object or of the cue that indicates when to judge the position. We propose that the imprecision reflects a difficulty in identifying which position of the moving object occurs at the same time as the cue. This high-level process may involve the same low temporal resolution binding mechanism that, in other situations, pairs simultaneous features such as color and motion.
The activity of neurons in the primate posterior parietal cortex reflects the location of visual stimuli relative to the eye, body, and world, and is modulated by selective attention and task rules. It is not known, however, how these effects interact with one another. To address this question, we recorded neuronal activity from area 7a of monkeys trained to perform two variants of a delayed match-to-sample task. The monkeys attended a spatial location defined in either spatiotopic (world-centered) or retinotopic (eye-centered) coordinates. We found neuronal responses to be remarkably plastic depending on the task. In contrast to previous studies using the simple version of the delayed match-to-sample task, we discovered that after training in a task where the locus of attention shifted during the trial, neural responses were typically enhanced for a match stimulus. Our results further revealed that responses were mostly enhanced for stimuli matching in spatiotopic coordinates, although the proportion of neurons modulated by either coordinate frame depended on the behavioral task being executed.
Dynamic registration uncertainty of a wavefront-guided correction with respect to the underlying wavefront error (WFE) inevitably decreases retinal image quality. A partial correction may improve average retinal image quality and visual acuity in the presence of registration uncertainties. The purpose of this paper is to (a) develop an algorithm to optimize a wavefront-guided correction that improves visual acuity given registration uncertainty and (b) test the hypothesis that such corrections provide improved visual performance in the presence of these uncertainties compared to a full-magnitude correction or the correction proposed by Guirao, Cox, and Williams (2002). A stochastic parallel gradient descent (SPGD) algorithm was used to optimize the partial-magnitude correction for three keratoconic eyes based on measured scleral contact lens movement. Given its high correlation with logMAR acuity, the retinal image quality metric log visual Strehl was used as a predictor of visual acuity. Predicted values of visual acuity with the optimized corrections were validated by regressing measured acuity loss against predicted loss. Measured loss was obtained from normal subjects viewing acuity charts degraded by the residual aberrations generated by the movement of the full-magnitude correction, the Guirao correction, and the optimized SPGD correction. Partial-magnitude corrections optimized with the SPGD algorithm provide at least a one-line improvement in average visual acuity over both the full-magnitude and the Guirao corrections given the registration uncertainty. This study demonstrates that it is possible to improve average visual acuity by optimizing a wavefront-guided correction in the presence of registration uncertainty.
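For readers unfamiliar with the optimizer class named above, the SPGD update loop has a simple generic form: perturb all parameters simultaneously by a random ±δ vector, measure the resulting change in the quality metric, and step in proportion to that change. This is a minimal sketch of the algorithm family, not the authors' implementation; the gain, perturbation size, and the toy quadratic metric used below are illustrative assumptions (a real application would use a correction vector of Zernike coefficients and an image-quality metric such as log visual Strehl).

```python
import numpy as np

def spgd_optimize(metric, u0, gain=0.5, perturb=0.05, n_iter=500, rng=None):
    """Generic stochastic parallel gradient descent (SPGD) sketch.
    `metric` maps a parameter vector to a scalar quality score to MAXIMIZE."""
    rng = rng or np.random.default_rng(0)
    u = np.asarray(u0, dtype=float)
    for _ in range(n_iter):
        # Simultaneous random +/- perturbation of every parameter.
        delta = perturb * rng.choice([-1.0, 1.0], size=u.shape)
        # Two-sided estimate of the metric change along delta.
        dj = metric(u + delta) - metric(u - delta)
        # Parallel update: ascend the metric over all parameters at once.
        u += gain * dj * delta
    return u
```

SPGD needs only scalar metric evaluations (no analytic gradient), which is why it suits wavefront-correction problems where the merit function is itself an image-quality estimate.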
Scheimpflug imaging was used to measure the shape of the anterior and posterior cornea in six meridians of the right eye of 114 subjects, ranging in age from 18 to 65 years. Subsequently, a three-dimensional model of the shape of the whole cornea was reconstructed, from which the coma aberration of the anterior and of the whole cornea could be calculated. This made it possible to investigate the compensatory role of the posterior surface with respect to the coma aberration of the anterior corneal surface as a function of age. Results show that, on average, the posterior surface compensates approximately 3.5% of the coma of the anterior surface. The compensation tends to be larger for young subjects (6%) than for older subjects (0%). This small effect of the posterior cornea on the coma aberration makes it clear that, for the coma aberration of the whole eye, only the anterior corneal surface and the crystalline lens play a role. Consequently, for the design of an intraocular lens that is able to correct for coma aberration, it would be sufficient to take only the anterior corneal surface into account.
Theoretical and ray-tracing calculations on an accommodative eye model based on published anatomical data, together with wavefront measurements on 15 eyes, were combined to study the change of spherical aberration during accommodation and its influence on the accommodation response. All three methodologies show that primary spherical aberration should decrease during accommodation, while secondary spherical aberration should increase. The hyperbolic shape of the lens surfaces is the main factor responsible for the change of these aberrations during accommodation. Assuming that the eye accommodates to optimize image quality by minimizing the RMS of the wavefront, it is shown that primary spherical aberration decreases the accommodation response, while secondary spherical aberration slightly increases it. The total effect of the spherical aberration is a reduction of around 1/7 D per diopter of stimulus approximation, although that value depends on the pupil size and its reduction during accommodation. The apparent accommodation error (lead and lag) typically present in the accommodation-response curve could then be explained as a consequence of the strategy used by the visual system, and by the measurement apparatus, to select the best image plane, a choice that can be affected by the change of spherical aberration during accommodation.
Correction of spherical aberration (SA) and longitudinal chromatic aberration (LCA) significantly improves monocular visual acuity (VA). In this work, the effect of SA correction in polychromatic and monochromatic light on binocular visual performance is investigated. A liquid-crystal-based binocular adaptive optics visual analyzer capable of operating in polychromatic light is employed. Binocular VA improves when SA is corrected and LCA effects are reduced, both separately and in combination, with the highest value obtained for SA correction in monochromatic light. However, the binocular summation ratio is highest for the baseline condition of uncorrected SA in polychromatic light. Although SA correction in monochromatic light has a greater impact monocularly than binocularly, bilateral correction of both SA and LCA may further improve binocular spatial visual acuity, which may support the use of aspheric-achromatic ophthalmic devices, in particular intraocular lenses (IOLs).