We studied whether the blur/sharpness of an occlusion boundary between a sharply focused surface and a blurred surface is used as a relative depth cue. Observers judged relative depth in pairs of images that differed only in the blurriness of the common boundary between two adjoining texture regions, one blurred and one sharply focused. Two experiments were conducted; in both, observers consistently used the blur of the boundary as a cue to relative depth. However, the strength of the cue, relative to other cues, varied across observers. The occlusion edge blur cue can resolve the near/far ambiguity inherent in depth-from-focus computations.
... Varying accommodation distance has been shown to affect perceived slant, for example, by altering the estimate of distance used to interpret binocular disparities (Watt et al., 2005a; Hoffman et al., 2008). Blur can also contribute significantly to depth perception at occlusion edges (Marshall, Burbeck, Ariely, Rolland, & Martin, 1996; Mather, 1997; Nguyen, Howard, & Allison, 2005), and for scene points nearer and farther than fixation (Held, Cooper, & Banks, 2012). Moreover, changes to the "global" blur gradient in a scene can dramatically affect perception of its overall spatial scale, as seen in the phenomenon of tilt-shift miniaturization, in which increasing the blur gradient causes natural scenes to resemble scale models (Held, Cooper, O'Brien, & Banks, 2010; Vishwanath & Blaser, 2010). ...
... This is consistent with previous work showing that depth-dependent blur can lend perceived 3-D structure a sense of "realness" similar to that derived from binocular stereopsis and motion parallax (Vishwanath & Hibbard, 2013), and demonstrates that, at least under some circumstances, correct focus cues confer benefits for perceptual realism in 3-D imagery. This finding also adds to accumulating evidence that retinal blur in general plays a more important role in perception than has often been thought (Marshall et al., 1996; Mather, 1997; Nguyen et al., 2005; Held et al., 2010; Vishwanath & Blaser, 2010; Held et al., 2012; Vishwanath & Hibbard, 2013; March et al., 2022). ...
... Since antiquity, artists have been able to turn a flat canvas into an impression of three-dimensional depth by tricking our brains with pictorial cues. Pictorial cues in the visual scene such as texture (Johnston et al. 1993; Hornsey and Hibbard 2021), texture gradient (Hillis et al. 2004; Stevens 1981; Tozawa 2012; Tsutsui et al. 2002), blur (Held et al. 2012; Mather 1997), relative size, lighting direction (Langer and Bülthoff 2000), occlusion (Marshall et al. 1996), tilt (Fiorentini and Maffei 1971; Oluk et al. 2022), distance to the horizon (Gardner et al. 2010), and shading (Chen and Tyler 2015) provide additional information for depth perception alongside stereopsis. For 3D shapes, differences in material properties, local luminance contrast at borders, shadows, reflectance, and motion can help us segregate a figure from its background even when their textures match (Troscianko et al. 2009). ...
The spatial frequency (SF) content of an object's texture is an important cue for depth perception, although less is known about the role of background texture. Here, we used bandpass-filtered noise patterns to systematically study the interactions between target and background textures in a virtual environment. During the trials, three square targets were presented at 3 m against a background wall 6 m away from the observer. One of the squares was presented closer than the other two, and the subjects had to indicate it with a key press. The threshold distance from the two reference squares was determined using a staircase procedure. Both the target and background were tested with different combinations of SF textures and a non-textured gray, rendered onto the flat surfaces. Against a gray background, distance thresholds were smallest when the targets carried a mid-SF texture. Performance declined significantly with a non-textured target against a textured background. Across the different target-background texture combinations, the background texture significantly affected performance. We propose several hypotheses to explain these behavioral results. Understanding the effect of surrounding texture can be useful in improving the depth perception experience in virtual reality.
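The abstract does not spell out the staircase rule; purely as an illustration, a minimal 2-down/1-up staircase of the kind commonly used for such threshold estimates (the rule, step size, and reversal count below are our assumptions, not the paper's) could look like this:

```python
# Minimal 2-down/1-up staircase sketch (assumed rule and step size).
def staircase(run_trial, start=0.5, step=0.05, n_reversals=8):
    """run_trial(offset) -> True if the observer picked the nearer square."""
    offset, streak, direction = start, 0, 0
    reversals = []
    while len(reversals) < n_reversals:
        if run_trial(offset):                 # correct response
            streak += 1
            if streak == 2:                   # two correct -> make the task harder
                streak = 0
                if direction == +1:
                    reversals.append(offset)  # direction change = reversal
                direction = -1
                offset = max(offset - step, 0.0)
        else:                                 # incorrect -> make the task easier
            streak = 0
            if direction == -1:
                reversals.append(offset)
            direction = +1
            offset += step
    return sum(reversals[2:]) / len(reversals[2:])  # mean of later reversals
```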
... Figure 1 shows an example from the BSDS300, in which two deer occlude the lawn behind them; their boundaries are considered occlusion edges, while their shadows are not. This method of inferring occlusion relationships between different objects from a monocular 2D image has been widely applied in many fields, including visual tracking [1,2], mobile robotics [3,4], object detection [5][6][7][8], and segmentation [9][10][11]. ...
Analysis of the occlusion relationships between different objects in an image is fundamental to computer vision, encompassing both the accurate detection of multiple objects' contours and the estimation of each contour pixel's occlusion orientation. However, the severe imbalance between an object's edge pixels and the background pixels complicates occlusion relationship reasoning. Although progress has been made using convolutional neural network (CNN)-based methods, the coupling between occlusion contour detection and occlusion orientation prediction has not yet been effectively exploited in a full network architecture. In addition, both the prediction of occlusion orientations and the detection of occlusion edges depend on accurate extraction of the local details of contours. We therefore propose an innovative multitask coupling network model (MTCN), with dedicated submodules to address the above issues. The results of extensive experiments show that the proposed method surpasses state-of-the-art methods by 2.1% and 2.5% in boundary AP and by 3.5% and 2.8% in orientation AP on the PIOD and BSDS datasets, respectively.
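For readers unfamiliar with the representation: in the PIOD/BSDS occlusion-ownership literature, occlusion orientation is usually encoded with a "left rule", under which the foreground lies on the left when walking along the boundary in the direction of the orientation angle. A tiny sketch of that convention (the function name and axes convention are ours):

```python
import math

def foreground_normal(theta):
    """Under the 'left rule', walking along the edge in the direction of the
    orientation angle theta (radians), the foreground lies on the left.
    Returns the unit normal into the foreground, in math axes (y up);
    with image axes (y down) the visual direction flips."""
    tx, ty = math.cos(theta), math.sin(theta)   # walking direction (tangent)
    return (-ty, tx)                            # tangent rotated 90 deg CCW
```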
... The severity of defocus blur increases linearly with distance from the plane of focus, producing gradients of blur across the image [18]; and because image blur depends on depth, it can, in principle, be used to recover depth [19]. In fact, experiments have shown that human vision can exploit this, at least when judging ordinal depth [20][21][22][23][24] or surface slant [25]. Furthermore, depth of field decreases as the focal plane is brought forward in the scene, which means that defocus blur then increases more rapidly with distance from the focal plane. ...
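For concreteness, the geometry behind both claims can be sketched with the standard thin-lens blur-circle formula (illustrative values; this is not code from the cited studies):

```python
def blur_circle_diameter(aperture, focal_len, focus_dist, obj_dist):
    """Thin-lens blur-circle diameter (all lengths in the same units):
    c = A * f * |d - d_f| / (d * (d_f - f)), which is proportional to the
    dioptric distance |1/d - 1/d_f| from the focal plane."""
    return (aperture * focal_len * abs(obj_dist - focus_dist)
            / (obj_dist * (focus_dist - focal_len)))

# Bringing the focal plane forward shrinks depth of field: blur at 4 m
# grows when focus moves from 2 m to 0.5 m (A = 5 mm, f = 50 mm).
print(blur_circle_diameter(0.005, 0.05, 2.0, 4.0))   # ~6.4e-05 m
print(blur_circle_diameter(0.005, 0.05, 0.5, 4.0))   # ~4.9e-04 m
```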
One of the primary jobs of visual perception is to build a three-dimensional representation of the world around us from our flat retinal images. These are a rich source of depth cues but no single one of them can tell us about scale (i.e., absolute depth and size). For example, the pictorial depth cues in a (perfect) scale model are identical to those in the real scene that is being modelled. Here we investigate image blur gradients, which derive naturally from the limited depth of field available for any optical device and can be used to help estimate visual scale. By manipulating image blur artificially to produce what is sometimes called fake tilt shift miniaturization, we provide the first performance-based evidence that human vision uses this cue when making forced-choice judgements about scale (identifying which of an image pair was a photograph of a full-scale railway scene, and which was a 1:76 scale model). The orientation of the blur gradient (relative to the ground plane) proves to be crucial, though its rate of change is less important for our task, suggesting a fairly coarse visual analysis of this image parameter.
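A minimal sketch of how such fake tilt-shift gradients can be produced, by blending progressively blurred copies of a grayscale image away from a sharp horizontal band (the parameters are illustrative, not the study's stimulus-generation code):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def fake_tilt_shift(img, center=0.5, band=0.15, sigmas=(2.0, 4.0, 8.0)):
    """Apply a vertical blur gradient to a grayscale image (2-D array):
    rows inside the band around `center` (fractions of height) stay sharp,
    rows farther away blend toward progressively blurrier copies."""
    img = img.astype(float)
    levels = [img] + [gaussian_filter(img, s) for s in sigmas]  # blur stack
    h = img.shape[0]
    dist = np.abs(np.linspace(0.0, 1.0, h) - center)            # distance from band
    span = max(dist.max() - band, 1e-9)
    idx = np.clip((dist - band) / span * len(sigmas), 0.0, len(sigmas) - 1e-9)
    out = np.empty_like(img)
    for row in range(h):
        lo = int(idx[row])                                      # blend adjacent levels
        frac = idx[row] - lo
        out[row] = (1 - frac) * levels[lo][row] + frac * levels[lo + 1][row]
    return out
```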
... On the other hand, small windows increase the sensitivity to noise (23,24). Pertuz et al. suggested that the optimum window size is a trade-off between spatial resolution and robustness (21). As a more robust approach, an adaptive window size determined by the median absolute deviation (MAD) has been reported. ...
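The cited adaptive scheme is not reproduced here, but the underlying idea can be sketched as growing the analysis window until the MAD of the local focus-measure values clears a noise floor (the growth rule and threshold below are our illustrative assumptions):

```python
import numpy as np

def adaptive_window(fm, row, col, sizes=(5, 9, 13, 17, 21), noise_floor=1e-3):
    """Pick the smallest window around (row, col) whose MAD of the local
    focus-measure map `fm` clears a noise floor: small windows preserve
    spatial resolution; larger ones are used only where needed for robustness."""
    for w in sizes:
        half = w // 2
        patch = fm[max(row - half, 0):row + half + 1,
                   max(col - half, 0):col + half + 1]
        mad = np.median(np.abs(patch - np.median(patch)))
        if mad > noise_floor:
            return w
    return sizes[-1]
```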
... First, although the participants' position was maintained using a chin-rest, any residual movement would have provided motion parallax information indicating the structure of the surface of the display screen. Second, when the participants made eye movements around the scene, to locations at different depicted depths, these were not accompanied by the expected changes in accommodation and image blur [62,71-73]. These focus cues tend to conflict in traditional pictures and displays, but may be rendered in a way that is consistent with the scene content using a multi-focal display [74,75]. ...
Binocular disparity is an important cue to three-dimensional shape. We assessed the contribution of this cue to the reliability and consistency of depth in stereoscopic photographs of natural scenes. Observers viewed photographs of cluttered scenes while adjusting a gauge figure to indicate the apparent three-dimensional orientation of the surfaces of objects. The gauge figure was positioned on the surfaces of objects at multiple points in the scene, and settings were made under monocular and binocular, stereoscopic viewing. Settings were used to create a depth relief map, indicating the apparent three-dimensional structure of the scene. We found that binocular cues increased the magnitude of apparent depth, the reliability of settings across repeated measures, and the consistency of perceived depth across participants. These results show that binocular cues make an important contribution to the precise and accurate perception of depth in natural scenes that contain multiple pictorial cues.
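Turning gauge-figure settings into a depth relief map amounts to integrating the surface gradients implied by each slant/tilt setting; one standard route (not necessarily the authors' exact procedure) is Frankot-Chellappa least-squares integration:

```python
import numpy as np

def integrate_gradients(p, q):
    """Frankot-Chellappa least-squares integration of surface gradients
    p = dz/dx and q = dz/dy into a relative depth map z (arbitrary offset).
    Gauge settings give p = tan(slant)*cos(tilt), q = tan(slant)*sin(tilt)."""
    h, w = p.shape
    u, v = np.meshgrid(np.fft.fftfreq(w) * 2 * np.pi,
                       np.fft.fftfreq(h) * 2 * np.pi)
    P, Q = np.fft.fft2(p), np.fft.fft2(q)
    denom = u**2 + v**2
    denom[0, 0] = 1.0                       # avoid division by zero at DC
    Z = (-1j * u * P - 1j * v * Q) / denom
    Z[0, 0] = 0.0                           # depth is defined up to a constant
    return np.real(np.fft.ifft2(Z))
```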
... The inconsistent defocus patterns destroy the relationship between depth and blur, which is crucial for depth perception [22,23]. Moreover, a distinct interference-induced boundary at the interface between objects at different depths distorts the perception of the relative depth between those objects [24]. Thus, a hologram with high image quality and undistorted depth perception requires the advantages of both diffusive and non-diffusive holograms. ...
Holography is one of the most prominent approaches to realizing true-to-life reconstructions of objects. However, owing to the limited resolution of spatial light modulators compared to static holograms, reconstructed objects exhibit various coherent artifacts, such as content-dependent defocus blur and interference-induced noise. These coherent properties severely distort depth perception, which is central to what holographic displays offer beyond 2D displays: the realization of 3D scenes. Here, we propose a hologram that imitates the defocus blur of incoherent light by engineering the diffracted pattern of coherent light using multi-plane holography, thereby offering real-world-like defocus blur and photorealistic reconstruction. The proposed hologram is synthesized by optimizing a wave field to reconstruct numerous varifocal images after propagation over the corresponding focal distances, where the varifocal images are rendered using a physically-based renderer. Moreover, to reduce the computational costs associated with rendering and optimization, we also demonstrate a network-based synthesis method that requires only an RGB-D image.
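The numerical workhorse in this kind of multi-plane optimization is free-space propagation of the wave field to each focal distance; a minimal angular spectrum propagator (a standard method, sketched here with illustrative parameters rather than the authors' implementation) looks like this:

```python
import numpy as np

def angular_spectrum_propagate(field, wavelength, pitch, z):
    """Propagate a complex wave field a distance z (all lengths in meters)
    with the angular spectrum method; `pitch` is the SLM pixel pitch.
    Multi-plane synthesis repeats this for every target focal distance and
    penalizes the mismatch with the corresponding rendered varifocal image."""
    h, w = field.shape
    fx, fy = np.meshgrid(np.fft.fftfreq(w, d=pitch),
                         np.fft.fftfreq(h, d=pitch))
    arg = 1.0 / wavelength**2 - fx**2 - fy**2
    mask = arg > 0                                    # drop evanescent waves
    kz = 2 * np.pi * np.sqrt(np.where(mask, arg, 0.0))
    H = np.where(mask, np.exp(1j * kz * z), 0.0)      # transfer function
    return np.fft.ifft2(np.fft.fft2(field) * H)
```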
In European painting, a transition took place in which artists started to consciously introduce blurred or soft contours into their works. There may have been several reasons for this. One suggestion in the art historical literature is that this was done to create a stronger sense of volume in the depicted figures or objects. Here we describe four experiments in which we tested whether soft or blurred contours do indeed enhance a sense of volume or depth. In the first three experiments, we found that, for both paintings and abstract shapes, three-dimensionality was actually decreased rather than increased for blurred (and line) contours, in comparison with sharp contours. In the last experiment, we controlled for the position of the blur (on the lit or dark side) and found that blur on the lit side evoked a stronger impression of three-dimensionality. Overall, the experiments robustly show that the art historical conjecture that a blurred contour increases three-dimensionality is not warranted. Because blurred contours can be found in many established artworks, such as those by Leonardo and Vermeer, there must be rationales for this practice other than the creation of a stronger sense of volume or depth.
In three experiments we examined the influence of spatial frequency on perceptual organization. In each experiment a pattern was tested that was ambiguous in terms of figure and ground. The stimuli were 20 variations of the pattern, generated by filling its two regions with horizontal sine wave gratings differing in spatial frequency. Five spatial frequencies were tested: 0.5, 1, 2, 4, and 8 cycles per degree. The response measure was the percentage of viewing time during which one of the regions was seen as the figure. This region was seen as the figure a higher percentage of the time when it contained the relatively higher spatial frequency grating than when it contained the relatively lower spatial frequency grating.
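Stimuli like these are straightforward to regenerate once the display geometry fixes the pixels-per-degree; a sketch (the geometry values are examples, not the original setup):

```python
import numpy as np

def horizontal_grating(cpd, px_per_deg=40, size=512, contrast=1.0):
    """Horizontal sine-wave grating at `cpd` cycles per degree; px_per_deg
    follows from screen resolution and viewing distance."""
    y_deg = np.arange(size) / px_per_deg                      # position in degrees
    row = 0.5 + 0.5 * contrast * np.sin(2 * np.pi * cpd * y_deg)
    return np.tile(row[:, None], (1, size))                   # luminance in [0, 1]

gratings = {f: horizontal_grating(f) for f in (0.5, 1, 2, 4, 8)}
```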
One of the major unsolved problems in designing an autonomous agent [robot] that must function in a complex, moving environment is obtaining reliable, real-time depth information, preferably without the limitations of active scanners. Stereo remains computationally intensive and prone to severe errors, the use of motion information is still quite experimental, and autofocus schemes can measure depth at only one point at a time. We examine a novel source of depth information: focal gradients resulting from the limited depth of field inherent in most optical systems. We prove that this source of information can be used to make reliable depth maps of useful accuracy with relatively minimal computation. Experiments with realistic imagery show that measurement of these optical gradients can potentially provide depth information roughly comparable to stereo disparity or motion parallax, while avoiding image-to-image matching problems. A potentially real-time version of this algorithm is described.
This paper examines a novel source of depth information: focal gradients resulting from the limited depth of field inherent in most optical systems. Previously, autofocus schemes have used depth of field to measure depth by searching for the lens setting that gives the best focus, repeating this search separately for each image point. This search is unnecessary, for there is a smooth gradient of focus as a function of depth. By measuring the amount of defocus, therefore, we can estimate depth simultaneously at all points, using only one or two images. It is proved that this source of information can be used to make reliable depth maps of useful accuracy with relatively minimal computation. Experiments with realistic imagery show that measurement of these optical gradients can provide depth information roughly comparable to stereo disparity or motion parallax, while avoiding image-to-image matching problems.
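A rough sketch of the two-image idea, comparing local high-frequency energy between a focused and a defocused view (a simplification of the approach, with our own choice of focus measure):

```python
import numpy as np
from scipy.ndimage import gaussian_filter, laplace

def relative_blur_map(img_focused, img_defocused, sigma=9.0):
    """Compare local high-frequency energy between a sharply focused and a
    defocused image: where the ratio drops, defocus blur (and hence distance
    from the focal plane) is larger. Mapping the ratio to metric depth
    requires the camera's aperture and focus settings."""
    def local_energy(img):
        e = laplace(img.astype(float)) ** 2        # high-pass energy
        return gaussian_filter(e, sigma) + 1e-8    # local average; avoid /0
    return local_energy(img_defocused) / local_energy(img_focused)
```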
A new method for extracting depth information from 2-dimensional images is proposed. The depth structure of a scene is computed using measurements of the degree of blur in an image that is only partly in sharp focus. Preliminary tests suggest that the method is surprisingly powerful and may have interesting applications.
In recent times, a great deal of interest has been shown among the computer vision and robotics research community in the acquisition of range data for supporting scene analysis, leading to remote (noncontact) determination of the configurations and space-filling extents of three-dimensional object assemblages. This paper surveys a variety of approaches to generalized range finding and presents a perspective on their applicability and shortcomings in the context of computer vision studies.
Image regions corresponding to partially hidden objects are enclosed by two types of bounding contour: those inherent to the object itself (intrinsic) and those defined by occlusion (extrinsic). Intrinsic contours provide useful information regarding object shape, whereas extrinsic contours vary arbitrarily depending on accidental spatial relationships in scenes. Because extrinsic contours can only degrade the process of surface description and object recognition, it is argued that they must be removed prior to a stage of template matching. This implies that the two types of contour must be distinguished relatively early in visual processing and we hypothesize that the encoding of depth is critical for this task. The common border is attached to and regarded as intrinsic to the closer region, and detached from and regarded as extrinsic to the farther region. We also suggest that intrinsic borders aid in the segmentation of image regions and thus prevent grouping, whereas extrinsic borders provide a linkage to other extrinsic borders and facilitate grouping. Support for these views is found in a series of demonstrations, and also in an experiment where the expected superiority of recognition was found when partially sampled faces were seen in a back rather than a front stereoscopic depth plane.