Figure - available from: Virtual Reality
Relationships between actual distance and perceived thirds in the trisection task, presented for two possibilities of nonlinear misperception of distance. Accurate linear perception is shown in black; in the compression instance the perceived thirds are set not far enough, while in the expansion instance the perceived thirds are set further than the true thirds.
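For intuition, a simple power-law model of distance misperception shows why the two nonlinear possibilities push the set thirds in opposite directions. The exponent n below is an assumed illustrative parameter, not a value reported in the article.

```latex
% Sketch under an assumed power-law misperception \hat{d} = a\,d^{\,n} (n is illustrative).
% An observer trisecting a target at distance D places the first marker at d_{1/3} where
% the perceived distance equals one third of the perceived target distance:
\[
a\,d_{1/3}^{\,n} \;=\; \tfrac{1}{3}\,a\,D^{\,n}
\quad\Rightarrow\quad
d_{1/3} \;=\; 3^{-1/n}\,D .
\]
% Compression (n < 1): 3^{-1/n} < 1/3, so the marker is set short of the true third.
% Expansion (n > 1): 3^{-1/n} > 1/3, so the marker is set beyond the true third.
% Linear perception (n = 1) recovers d_{1/3} = D/3, the black line in the figure.
```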

Source publication
Article
Full-text available
We assessed the contribution of binocular disparity and the pictorial cues of linear perspective, texture, and scene clutter to the perception of distance in consumer virtual reality. As additional cues are made available, distance perception is predicted to improve, as measured by a reduction in systematic bias, and an increase in precision. We as...
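The prediction that precision improves as more cues become available follows from standard reliability-weighted (maximum-likelihood) cue combination. The equations below give the textbook form as background; they are not presented here as the specific model fitted in the article.

```latex
% Reliability-weighted combination of independent distance estimates \hat{d}_i with
% variances \sigma_i^2 (textbook MLE form, shown for background):
\[
\hat{d} \;=\; \sum_i w_i\,\hat{d}_i ,
\qquad
w_i \;=\; \frac{1/\sigma_i^{2}}{\sum_j 1/\sigma_j^{2}} ,
\qquad
\sigma_{\mathrm{combined}}^{2} \;=\; \frac{1}{\sum_i 1/\sigma_i^{2}} \;\le\; \min_i \sigma_i^{2} .
\]
% Adding a cue (e.g., disparity on top of pictorial cues) can only lower the combined
% variance, which is why precision is predicted to increase as cues are added.
```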

Similar publications

Article
Full-text available
Binocular disparity is an important cue to three-dimensional shape. We assessed the contribution of this cue to the reliability and consistency of depth in stereoscopic photographs of natural scenes. Observers viewed photographs of cluttered scenes while adjusting a gauge figure to indicate the apparent three-dimensional orientation of the surfaces...

Citations

... Instead, it is shaped by a range of systematic distortions, resulting in a perceived space that deviates in complex ways from Euclidean geometry [1]. These biases encompass a variety of phenomena, including compressed depth perception, where distances at greater depths are systematically underestimated [2][3][4][5][6][7][8]; shifts in vanishing point, which alter the overall geometry of the scene [9,10]; characteristic distortions observed when viewing photographs [11,12]; and context-dependent depth perception, with environmental factors modulating how distance cues are weighted [13][14][15][16]. The precise nature of these distortions, their underlying causes within the visual system, and the form of the spatial representation that emerges from monocular vision remain open and actively debated questions. ...
... To quantitatively analyze the error patterns in both human and DNN depth judgments, we employed exponential-affine fitting. This method is grounded in psychophysical findings that human depth perception is subject to both depth compression [2][3][4][5][6][7][8] and affine distortions [44][45][46][47][48]. Exponential fitting captures the non-linear compression of perceived depth with increasing distance, while affine fitting ... Intriguingly, we demonstrate that DNNs trained for monocular depth estimation also partially exhibit these human-like biases, particularly depth compression, and that higher accuracy in DNNs correlates with greater similarity to human error patterns in terms of affine components. These findings establish a foundation for advancing comparative studies in depth estimation and suggest the potential for developing more human-like depth estimation models in a data-driven manner. ...
... Why do DNNs partially share some of the systematic depth judgment biases observed in humans? Notably, DNNs exhibit the depth compression effect, a well-documented nonlinear phenomenon in human perception [2][3][4][5][6][7][8]. In human vision, such compression is often explained by optimal integration of uncertain sensory cues at long distances, where priors skewed toward nearer distances dominate perception, leading to a systematic underestimation of far distances [72,73]. ...
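The explanation quoted above, that near-biased priors dominate when far distances are uncertain, can be made concrete with a Gaussian prior and likelihood; the symbols below are generic and not taken from the cited references.

```latex
% With a Gaussian likelihood centred on the observed distance d_{\mathrm{obs}}
% (variance \sigma_\ell^2) and a prior centred on a nearer distance \mu_p
% (variance \sigma_p^2), the posterior mean is a precision-weighted average:
\[
\hat{d} \;=\; \frac{\sigma_p^{2}\, d_{\mathrm{obs}} \;+\; \sigma_\ell^{2}\, \mu_p}
                   {\sigma_p^{2} + \sigma_\ell^{2}} .
\]
% Sensory uncertainty \sigma_\ell^2 grows with distance, so far targets are pulled
% harder toward the nearer prior mean \mu_p, producing systematic underestimation
% of far distances while leaving near distances comparatively accurate.
```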
Preprint
Full-text available
Human depth perception from 2D images is systematically distorted, yet the nature of these distortions is not fully understood. To gain insights into this fundamental problem, we compare human depth judgments with those of deep neural networks (DNNs), which have shown remarkable abilities in monocular depth estimation. Using a novel human-annotated dataset of natural indoor scenes and a systematic analysis of absolute depth judgments, we investigate error patterns in both humans and DNNs. Employing exponential-affine fitting, we decompose depth estimation errors into depth compression, per-image affine transformations (including scaling, shearing, and translation), and residual errors. Our analysis reveals that human depth judgments exhibit systematic and consistent biases, including depth compression, a vertical bias (perceiving objects in the lower visual field as closer), and consistent per-image affine distortions across participants. Intriguingly, we find that DNNs with higher accuracy partially recapitulate these human biases, demonstrating greater similarity in affine parameters and residual error patterns. This suggests that these seemingly suboptimal human biases may reflect efficient, ecologically adapted strategies for depth inference from inherently ambiguous monocular images. However, while DNNs capture metric-level residual error patterns similar to humans, they fail to reproduce human-level accuracy in ordinal depth perception within the affine-invariant space. These findings underscore the importance of evaluating error patterns beyond raw accuracy, providing new insights into how humans and computational models resolve depth ambiguity. Our dataset and methodology provide a framework for evaluating the alignment between computational models and human perceptual biases, thereby advancing our understanding of visual space representation and guiding the development of models that more faithfully capture human depth perception.

Author summary
Understanding the characteristics of errors in depth judgments exhibited by humans and deep neural networks (DNNs) provides a foundation for developing functional models of the human brain and artificial models with enhanced interpretability. To address this, we constructed a human depth judgment dataset using indoor photographs and compared human depth judgments with those of DNNs. Our results show that humans systematically compress far distances and exhibit distortions related to viewpoint shift, which remain remarkably consistent across observers. Strikingly, the better the DNNs were at depth estimation, the more they also exhibited human-like biases. This suggests that these seemingly suboptimal human biases could in fact reflect efficient strategies for inferring 3D structure from ambiguous 2D inputs. However, we also found a limit: while DNNs mimicked some human errors, they weren’t as good as humans at judging the relative order of objects in depth, especially when we accounted for viewpoint distortions. We believe that our dataset and discovery of multiple error factors will drive further comparative studies between humans and DNNs, facilitating model evaluations that go beyond simple accuracy to uncover how depth perception truly works, and how it might best be replicated in computational models.
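As a rough illustration of the exponential-affine decomposition described above, the sketch below fits simulated depth judgments with a saturating-exponential compression term followed by a per-image scale and offset. The functional form, parameter names, and simulated data are assumptions made here for illustration; the preprint's full model, with 3D shearing and translation terms, is richer than this one-dimensional toy.

```python
# Toy sketch of an exponential-affine decomposition of depth judgments.
# Assumptions for illustration: a saturating-exponential compression term and a
# per-image scale/offset stand in for the preprint's full affine model.
import numpy as np
from scipy.optimize import curve_fit

def exp_affine(d_true, tau, scale, offset):
    """Saturating compression of far distances, then an affine (scale + offset) term."""
    compressed = 1.0 - np.exp(-d_true / tau)
    return scale * compressed + offset

# Simulated judgments for one image: compressed, rescaled, and noisy.
rng = np.random.default_rng(0)
d_true = np.linspace(0.5, 10.0, 50)                       # metres
d_judged = 6.0 * (1.0 - np.exp(-d_true / 4.0)) + 0.3 + rng.normal(0.0, 0.1, d_true.size)

params, _ = curve_fit(exp_affine, d_true, d_judged, p0=(3.0, 5.0, 0.0))
residuals = d_judged - exp_affine(d_true, *params)        # what the model leaves unexplained

print("fitted (tau, scale, offset):", np.round(params, 2))
print("residual RMS (m):", round(float(np.sqrt(np.mean(residuals**2))), 3))
```

The residuals left after this fit correspond to the "residual error" component the preprint analyses once compression and affine distortions have been removed.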
... The visible room included everyday life stimuli such as chairs and tables, which may have helped participants adapt to VR. Humans have been found to use objects' familiar size to estimate distances to them in laboratory paradigms, especially when spatial cues were reduced, e.g. by monocular viewing [48]. Also in VR, adding spatial cues such as texture or furniture reduced the error of object size estimations [49]. ...
Article
Full-text available
Investigating auditory perception and cognition in realistic, controlled environments is made possible by virtual reality (VR). However, when visual information is presented, sound localization results from multimodal integration. Additionally, using head-mounted displays leads to a distortion of visual egocentric distances. With two different paradigms, we investigated the extent to which different visual scenes influence auditory distance perception and, secondarily, presence and realism. To be more precise, different room models were displayed via HMD while participants had to localize sounds emanating from real loudspeakers. In the first paradigm, we manipulated whether a room was congruent or incongruent with the physical room. In a second paradigm, we manipulated room visibility (displaying either an audiovisually congruent room or a scene containing almost no spatial information) and localization task. Participants indicated distances either by placing a virtual loudspeaker, walking, or verbal report. While audiovisual room incongruence had a detrimental effect on distance perception, no main effect of room visibility was found, but there was an interaction with the task. Overestimation of distances was higher using the placement task in the non-spatial scene. The results suggest an effect of visual scene on auditory perception in VR, implying a need for its consideration, e.g., in virtual acoustics research.
... 2) Visuospatial Perception in VR: The human visual system integrates various cues to determine objects' positions, including monocular images, disparity-based binocular cues, shape, parallax from object motion, and physiological factors such as lens accommodation and gaze convergence [30], [31]. In VR, hardware factors such as the field of view and focus accommodation distance can also affect perceived depth [30]. ...
Article
Full-text available
Autistic individuals often exhibit superior local visual sensitivity but may struggle with global visual processing, affecting their visuomotor integration (VMI). Goal-directed overhand throwing is common in both the physical environment (PE) and virtual reality (VR) games, demanding spatial and temporal accuracy to perceive position and motion, and precise VMI. Understanding VMI in autistic individuals and exploring supportive designs in VR are crucial for rehabilitation and improving accessibility. We assessed static visuospatial accuracy and VMI with autistic (n = 16) and non-autistic (n = 16) adults using spatial estimation and overhand throwing tasks with eye and hand tracking, comparing VR to PE. In VR, all participants exhibited reduced visual accuracy, increased visual scanning, and shortened quiet eye duration and eye following duration after the ball release, which led to decreased throwing performance. However, simplifying visual information in VR throwing improved these measures, and resulted in autistic individuals outperforming non-autistic peers.
... Several studies indicate that in high fidelity, immersion, and graphics quality, distance underestimation is less significant [26,44,63]. Moreover, texture gradients considerably reduced distance underestimation for short distances [25]. Other work suggests that graphics quality minimally impacts distance estimates [36,47,75,77] and reducing visual realism does not significantly impact distance perception [38,76,77]. ...
... Increasing the number of familiar-size objects affords comparing their relative sizes, and increases clutter, occlusion, and shadows, which serve as depth cues that improve spatial judgments [8,10,26,68,73]. With increased distance from the viewer, parallel lines appear to converge (linear perspective) and textures appear denser (texture gradient), providing additional relative depth cues that can help improve distance and depth estimations [7,25,37,68]. In our study, rug and sidewalk tiles alongside grass and carpet textures provided texture gradient cues. ...
... 4.1.1). This finding was surprising as we expected to see distance underestimation decrease gradually from low to high visual complexity as indicated in prior findings [25,26,44,63]. This could be due to differences between our modern hardware compared to what exists in the literature, as newer devices have less distance underestimation [4,7,[32][33][34], and it is plausible that combining the depth cues we implemented when using modern hardware with a large FOV reduces their effect on distance judgment. ...
Article
Full-text available
Virtual Reality (VR) systems are widely used, and it is essential to know whether spatial perception in virtual environments (VEs) is similar to reality. Research indicates that users tend to underestimate distances in VR. Prior work suggests that actual distance judgments in VR may not always match users' self-reported preference of where they think they most accurately estimated distances. However, no explicit investigation has evaluated whether user preferences match actual performance in a spatial judgment task. We used blind walking to explore potential dissimilarities between actual distance estimates and user-selected preferences of visual complexities, VE conditions, and targets. Our findings show a gap between user preferences and actual performance when visual complexities were varied, which has implications for better understanding of visual perception, VR application design, and research in spatial perception, indicating the need to calibrate and align user preferences and true spatial perception abilities in VR.
... Beams et al. reported the impact of camera rotation (e.g., pupil rotation and eye rotation) on the spatial resolution in VR [19]. Perception in VR combines the display and optical limitations of the HMD (e.g., display resolution, refresh rate, aberration, diffraction, veiling glare, contrast degradation, nonuniformity, and color inaccuracy [31,32]) with human visual capabilities (e.g., contrast sensitivity of the native human eye, visual acuity, spectral response, and binocular disparity [33][34][35]). However, it is uncertain whether the measured optical characteristics can precisely describe the perceptual image quality in VR without involving human vision. ...
Article
Full-text available
Visual perception on virtual reality head-mounted displays (VR HMDs) involves human vision in the imaging pipeline. Image quality evaluation of VR HMDs may need to be expanded from optical bench testing by incorporating human visual perception. In this study, we implement a 5-degree-of-freedom (5DoF) experimental setup that simulates the human eye geometry and rotation mechanism. Optical modulation transfer function (MTF) measurements are performed using various camera rotation configurations, namely pupil rotation, eye rotation, and eye rotation with the angle kappa of the human visual system. The measured MTFs of the VR HMD are inserted into a human eye contrast sensitivity model to predict the perceptual contrast sensitivity function (CSF) on a VR HMD. At the same time, we develop a WebXR test platform to perform human observer experiments. Monocular CSFs of human subjects with different interpupillary distances (IPDs) are extracted and compared with those calculated from optical MTF measurements. The results show that image quality, measured as MTF and CSF, degrades at the periphery of the display field of view, especially for subjects with an IPD different from that of the HMD. We observed that both the shift of the visual point on the HMD eyepiece and the angle between the optical axes of the eye and eyepiece degrade image quality due to optical aberration. The CSFs computed from optical measurements correlate with those of the human observer experiment, with the optimal correlation achieved using the eye rotation with angle kappa setup. The findings demonstrate that more precise image quality assessment can be achieved by integrating eye rotation and human eye contrast sensitivity into optical bench testing.
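The abstract describes inserting measured display MTFs into a human-eye contrast sensitivity model to predict a perceptual CSF. One common, simplified way to express such a cascade is to multiply the display's MTF by a model observer CSF at each spatial frequency; the sketch below does this with the classic Mannos-Sakrison CSF approximation and a made-up Gaussian display MTF, neither of which is the specific model or measurement used in the paper.

```python
# Simplified cascade of a display MTF and a model human CSF (illustration only; the
# paper's eye model, measured MTFs, and viewing geometry are more detailed than this).
import numpy as np

def csf_mannos_sakrison(f_cpd):
    """Classic analytic CSF approximation (Mannos & Sakrison, 1974), f in cycles/degree."""
    return 2.6 * (0.0192 + 0.114 * f_cpd) * np.exp(-(0.114 * f_cpd) ** 1.1)

def display_mtf(f_cpd, sigma_cpd=20.0):
    """Hypothetical Gaussian roll-off standing in for a measured HMD MTF."""
    return np.exp(-0.5 * (f_cpd / sigma_cpd) ** 2)

f = np.linspace(0.5, 30.0, 60)                 # spatial frequencies in cycles/degree
perceptual_csf = csf_mannos_sakrison(f) * display_mtf(f)

# Frequency at which the predicted on-display sensitivity peaks.
print("peak of predicted CSF at ~%.1f cpd" % f[np.argmax(perceptual_csf)])
```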
... Representation of Eqs. (1) and (2). ...
Article
Full-text available
Shadows in physical space are copious, yet the impact of specific shadow placement and abundance has yet to be determined in virtual environments. This experiment aimed to identify whether a target’s shadow was used as a distance indicator in the presence of binocular distance cues. Six lighting conditions were created and presented in virtual reality for participants to perform a perceptual matching task. The task was repeated in a cluttered and a sparse environment, where the number of cast shadows (and their placement) varied. Performance in this task was measured by the directional bias of distance estimates and the variability of responses. No significant difference was found between the sparse and cluttered environments; however, given the large amount of variance, one explanation is that some participants utilised the clutter objects as anchors to aid them, while others found them distracting. Under-setting of distances was found in all conditions and environments, as predicted. Having an ambient light source produced the most variable and inaccurate estimates of distance, whereas lighting positioned above the target reduced the misestimation of perceived distance.
... For example, the perception of distance normally arises from the combination of monocular cues, including pictorial cues (e.g., occlusion, perspective, texture gradient, shading), motion parallax and accommodation, with binocular cues (e.g., disparity, vergence; see [17] for a review). These cues provide complementary information and are reported to be similarly effective in VR at intermediate distance (3 m), with pictorial cues dominating at far distance [18]. Conversely, ...
... errors of binocular cues, in particular the matching of disparities, lead to moderate errors in reaching to targets in peri-personal space [19]. Thus, both the visual perception of depth [18,20,21] and visual-motor interaction in depth [15,19] are impaired in VR. ...
... Overall, our results confirmed the expected egocentric distance underestimation and a clear difference between the perceived and the simulated egocentric distances. Our study was conducted in an indoor VE containing a broad range of binocular and pictorial cues, which, according to Hornsey and Hibbard (2021), improve the accuracy of distance estimation in VR. However, despite the high fidelity of the indoor VE, we still observed underestimation. ...
Article
In virtual reality (VR) studies where object distance plays the role of an independent variable, unknown egocentric distance perception values can affect the interpretation of the collected data. It is known that perceived egocentric distance in VR is often underestimated, which may affect other judgments that implicitly depend on it. In order to prepare later experiments on the effect of distance on audiovisual (a)synchrony perception, this study quantifies egocentric distance perception in a virtual indoor environment using two methods: verbal judgment (VJ) and position adjustment (PA). For the VJ method, participants verbally estimated the distance between their own position and a cardboard box placed at nominal distances of 5 m to 30 m, in increments of 5 m. For the PA method, participants were asked to position a cardboard box at instructed nominal distances of 5 m to 13 m, in increments of 1 m. Both methods (VJ and PA) showed significant and substantial levels of underestimation, with simulated distance underestimated on average by 38.5%. Our study suggests taking these findings into account when treating distance as an independent parameter in experiments conducted in purely simulated virtual spaces that do not exist physically in the real world.
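As a worked example of the reported average underestimation, and assuming the 38.5% figure applies uniformly across the tested range:

```latex
\[
\hat{d} \;\approx\; (1 - 0.385)\,d \;=\; 0.615\,d
\quad\Rightarrow\quad
\hat{d}(5\,\mathrm{m}) \approx 3.1\,\mathrm{m},\qquad
\hat{d}(30\,\mathrm{m}) \approx 18.5\,\mathrm{m}.
\]
```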
... It is not, however, possible to predict from these studies the importance of binocular disparity or other cues in everyday vision, since this will depend on the particular balance of information provided by the many depth cues available in any given scene [25]. To answer this, it is necessary to design experiments using complex natural stimuli [26,27]. ...
Article
Full-text available
Binocular disparity is an important cue to three-dimensional shape. We assessed the contribution of this cue to the reliability and consistency of depth in stereoscopic photographs of natural scenes. Observers viewed photographs of cluttered scenes while adjusting a gauge figure to indicate the apparent three-dimensional orientation of the surfaces of objects. The gauge figure was positioned on the surfaces of objects at multiple points in the scene, and settings were made under monocular and binocular, stereoscopic viewing. Settings were used to create a depth relief map, indicating the apparent three-dimensional structure of the scene. We found that binocular cues increased the magnitude of apparent depth, the reliability of settings across repeated measures, and the consistency of perceived depth across participants. These results show that binocular cues make an important contribution to the precise and accurate perception of depth in natural scenes that contain multiple pictorial cues.
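The abstract above describes converting gauge-figure orientation settings into a depth relief map. A common way to do this (not necessarily the authors' exact procedure) is to turn each slant/tilt setting into a local depth gradient and then integrate the gradient field; the grid layout, sample values, sign convention, and naive path integration below are assumptions made for illustration.

```python
# Hypothetical sketch: turn gauge-figure slant/tilt settings into depth gradients and
# integrate them into a relief map (illustration only; the cited study's reconstruction
# procedure may differ).
import numpy as np

def gradients_from_gauge(slant, tilt):
    """Depth gradients (dz/dx, dz/dy) implied by slant/tilt of the local surface normal."""
    p = np.tan(slant) * np.cos(tilt)
    q = np.tan(slant) * np.sin(tilt)
    return p, q

# Fake settings on a 5 x 5 grid of probe points (radians), unit spacing assumed.
rng = np.random.default_rng(1)
slant = rng.uniform(0.1, 0.8, size=(5, 5))
tilt = rng.uniform(0.0, 2.0 * np.pi, size=(5, 5))
p, q = gradients_from_gauge(slant, tilt)

# Naive integration: accumulate dz/dy down the first column, then dz/dx along each row.
relief = np.zeros_like(p)
relief[1:, 0] = np.cumsum(q[1:, 0])                              # vertical path, left edge
relief[:, 1:] = relief[:, [0]] + np.cumsum(p[:, 1:], axis=1)     # horizontal paths

relief -= relief.mean()            # relief is defined only up to an additive constant
print(np.round(relief, 2))
```

Least-squares integration of the gradient field would be more robust than this single-path scheme, but the path version keeps the idea visible in a few lines.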
... Kelly [5] found that the newest HMDs, on average, show distance estimation at about 82% of actual distance, which is improved, but still underestimated relative to the real world. Many factors have been examined as explanations for the underperception of scale including but not limited to: FOV and weight of HMDs [29][30][31][32], geometric distortions in displays [8,[33][34][35][36][37], graphics quality or realism [28,[38][39][40], pictorial or ground-surface cues [41][42][43][44][45], and response measures [38,46,47]. Although some of these variables influence distance estimates when manipulated, none have completely explained the differences between estimates made in VEs and the real world. ...
Article
Full-text available
Decades of research have shown that absolute egocentric distance is underestimated in virtual environments (VEs) when compared with the real world. This finding has implications for the use of VEs in applications that require an accurate sense of absolute scale. Fortunately, this underperception of scale can be attenuated by several factors, making perception more similar to (but still not the same as) that of the real world. Here, we examine these factors as two categories: (i) experience inherent to the observer, and (ii) characteristics inherent to the display technology. We analyse how these factors influence the sources of information for absolute distance perception with the goal of understanding how the scale of virtual spaces is calibrated. We identify six types of cues that change with these approaches, contributing both to a theoretical understanding of depth perception in VEs and a call for future research that can benefit from changing technologies. This article is part of the theme issue ‘New approaches to 3D vision’.