Article
To read the full-text of this research, you can request a copy directly from the authors.

Abstract

Stereoscopic three-dimensional (3-D) services do not always prevail when compared with their two-dimensional (2-D) counterparts, though the former can provide more immersive experience with the help of binocular depth. Various specific 3-D artefacts might cause discomfort and severely degrade the Quality of Experience (QoE). In this paper, we analyze one of the most annoying artefacts in the visualization stage of stereoscopic imaging, namely, crosstalk, by conducting extensive subjective quality tests. A statistical analysis of the subjective scores reveals that both scene content and camera baseline have significant impacts on crosstalk perception, in addition to the crosstalk level itself. Based on the observed visual variations during changes in significant factors, three perceptual attributes of crosstalk are summarized as the sensorial results of the human visual system (HVS). These are shadow degree, separation distance, and spatial position of crosstalk. They are classified into two categories: 2-D and 3-D perceptual attributes, which can be described by a Structural SIMilarity (SSIM) map and a filtered depth map, respectively. An objective quality metric for predicting crosstalk perception is then proposed by combining the two maps. The experimental results demonstrate that the proposed metric has a high correlation (over 88%) when compared with subjective quality scores in a wide variety of situations.
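The idea of combining a 2-D attribute map (SSIM) with a 3-D attribute map (filtered depth) can be illustrated with a minimal sketch. This is not the authors' published formula: the function name `crosstalk_score`, the visibility threshold `ssim_threshold`, the depth convention (0 to 255, larger means nearer) and the multiply-then-average pooling are all assumptions chosen for illustration. The SSIM map is taken as already computed between the original and the crosstalk-distorted view.

```python
def crosstalk_score(ssim_map, depth_map, ssim_threshold=0.977):
    """Pool an SSIM map and a depth map into one score.

    Hypothetical sketch, not the published metric.
    ssim_map  : 2-D list of floats in [0, 1] (original vs. distorted view)
    depth_map : 2-D list of values in [0, 255]; larger = nearer (assumed)
    Returns a value in [0, 1]; larger means more visible crosstalk.
    """
    total = 0.0
    count = 0
    for ssim_row, depth_row in zip(ssim_map, depth_map):
        for s, d in zip(ssim_row, depth_row):
            # Filtered depth: keep depth only where crosstalk is visible,
            # i.e. where the structure changed noticeably (assumed rule).
            filtered_depth = d if s < ssim_threshold else 0
            # 2-D attribute (structural change) weighted by the
            # 3-D attribute (nearness of the affected pixel).
            total += (1.0 - s) * (filtered_depth / 255.0)
            count += 1
    return total / count if count else 0.0


# Illustrative 2x2 input: only the top-right pixel is both distorted and near.
ssim = [[1.0, 0.5], [0.5, 1.0]]
depth = [[255, 255], [0, 255]]
score = crosstalk_score(ssim, depth)
```

In this toy input the distorted background pixel (depth 0) contributes nothing, which reflects the observation that foreground crosstalk weighs more heavily on perception.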


... Besides, the authors of [11] use two similar natural scenes with varying crosstalk levels (0%, 5%, 10%, 15%) and camera baselines (0, 4, and 12 cm) to investigate the effect of crosstalk on perceived image distortion, perceived depth, and visual discomfort. In [15], the authors give a detailed analysis of the effect of 2D and 3D perceptual attributes on crosstalk. Then they integrate the Structural SIMilarity (SSIM) map and the filtered depth map to build an objective metric for crosstalk perception. ...
... Then they integrate the Structural SIMilarity (SSIM) map and the filtered depth map to build an objective metric for crosstalk perception. Although the metric in [15] can predict subjective human judgment with a high correlation, it relies on comparing the crosstalk images with the original images, which is impractical. For instance, crosstalk images have to be synthesized from the original pairs, which inevitably introduces errors. ...
... So the traditional way to measure crosstalk is to display full-black and full-white in the left and right eye channels and to use an optical sensor to measure the amount of leakage between channels. In this measurement, four cross-combinations of full-white and full-black in each eye channel are used, and the system-introduced crosstalk can be modeled as follows [7], [15], [17]: ...
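The model referred to in the excerpt above is elided and is not reproduced here. As a hedged illustration of the measurement procedure it describes, the sketch below uses one commonly used black-white definition: leakage luminance minus black level, divided by signal luminance minus black level. The function name, the reading keys and the luminance values are hypothetical.

```python
def black_white_crosstalk(readings, eye="left"):
    """Percent crosstalk at one eye from full-white/full-black readings.

    readings maps (left_channel, right_channel) to the luminance measured
    at the chosen eye position, with "W" = full-white and "B" = full-black.
    Assumed definition: 100 * (leak - black) / (signal - black).
    The ("W", "W") reading is not needed by this definition.
    """
    black = readings[("B", "B")]
    if eye == "left":
        signal = readings[("W", "B")]  # intended channel white
        leak = readings[("B", "W")]    # only the other channel white
    elif eye == "right":
        signal = readings[("B", "W")]
        leak = readings[("W", "B")]
    else:
        raise ValueError("eye must be 'left' or 'right'")
    if signal <= black:
        raise ValueError("signal luminance must exceed the black level")
    return 100.0 * (leak - black) / (signal - black)


# Made-up luminance readings (cd/m^2) taken at the left-eye position.
left_eye_readings = {
    ("W", "W"): 105.0,
    ("W", "B"): 100.5,
    ("B", "W"): 5.5,
    ("B", "B"): 0.5,
}
left_crosstalk = black_white_crosstalk(left_eye_readings, eye="left")
```

Subtracting the black level keeps the display's residual light output from being counted as leakage.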
Article
Full-text available
We propose a new metric to predict the perceived crosstalk using the original images rather than both the original and ghosted images. The proposed metrics are based on color information. First, we extract a disparity map, a color difference map and a color contrast map from original image pairs. Then we use those maps to construct two new metrics (Vdispc and Vdlogc). Metric Vdispc considers the effect of the disparity map and the color difference map, while Vdlogc addresses the influence of the color contrast map. The prediction performance is evaluated using various types of stereoscopic crosstalk images. By incorporating Vdispc and Vdlogc, the new metric Vpdlc is proposed to achieve a higher correlation with the perceived subjective crosstalk scores. Experimental results show that the new metrics achieve better performance than previous methods, which indicates that color information is one key factor in predicting crosstalk visibility. Furthermore, we construct a new dataset to evaluate our new metrics.
... Subjective evaluation is a psychophysical method that tests whether subjects experience discomfort or fatigue symptoms such as eyestrain, double or blurred vision and headache when watching certain types of stereoscopic images or videos. Researchers have found that several factors may induce visual discomfort, including excessive screen disparity [1], accommodation-vergence conflict [2] [3], binocular asymmetry [4], vertical disparities [5], and crosstalk artifacts [6]. Because human eyes are the ultimate receiver of stereoscopic images, subjective experiment is regarded as the most reliable way to evaluate stereoscopic image quality. ...
... In objective metrics, visual comfort is predicted by quantitative measurement, in which stereoscopic image analysis is usually performed to quantify comfort. Based on the findings of the subjective experiments [1][2][3][4][5][6], several objective SVCA metrics have considered the identified discomfort-inducing factors. For the excessive screen disparity factor, Yong et al. [7][8], Kim et al. [9] and Nojiri et al. [10] investigated the relationship between the disparity distribution and visual comfort for stereoscopic images or video. ...
... For binocular asymmetry factor, Yano et al. [12] detected visual discomfort image scenes based on the correlation of left and right images. For crosstalk artifact factor, Xing et al. [6] proposed an objective quality metric for predicting crosstalk perception by combining the structural similarity map and a filtered depth map. In addition, Jung et al. [13], Sohn et al. [14], Ide et al. [15] and Jones et al. [16] used their proposed SVCA metrics to lessen visual discomfort. ...
... Previous research has found that the crosstalk distortion caused by camera baseline and crosstalk level is usually perceptible in edge regions for the same scene content. In addition, when different scene contents are considered, the closer the 3D depth of the foreground is to the nearest 3D depth of the scene, the more visible the perceived crosstalk is [2] [3], and it is known that human subjects tend to pay more attention to perceptually significant regions (such as salient image regions [4]). In this study, we focus on an objective assessment metric for crosstalk with different crosstalk levels, camera baselines and scene contents. ...
... All three components are combined and pooled to predict the perceived crosstalk. The SSIM quality measure proposed by Wang et al. [5] can describe the 2D crosstalk region to some extent [2]. In our study, the structural dissimilarity map is constructed between the original image and the distorted version containing both the system-introduced and the simulated crosstalk. ...
... The region of visible crosstalk is defined based on SSIM, because it is observed that crosstalk is more visible in the regions where the pixel value of the structural dissimilarity map is smaller than a threshold, whose value is chosen to be 0.023 in our experiment [2]. Since the 3-D perceptual attributes indicate that visible crosstalk on foreground objects has more impact on perception than on background objects, more weight should be assigned to the visible crosstalk of the foreground than to that of the background [2]. Therefore, the following formula is used to extract the filtered visual importance map. ...
Conference Paper
Full-text available
CONTEXT: Nowadays, almost all stereoscopic displays suffer from crosstalk, which is one of the most dominant degradation factors of image quality and visual comfort for 3D display devices. To deal with such problems, it is worthwhile to quantify the amount of perceived crosstalk. OBJECTIVE: Crosstalk measurements are usually based on certain test patterns, but scene content effects are ignored. To evaluate the perceived crosstalk level for various scenes, a subjective test may give a more accurate evaluation. However, it is a time-consuming approach and is unsuitable for real-time applications. Therefore, an objective metric that can reliably predict the perceived crosstalk is needed. A correct objective assessment of crosstalk for different scene contents would be beneficial to the development of crosstalk minimization and cancellation algorithms, which could be used to bring a good quality of experience to viewers. METHOD: A patterned retarder 3D display is used to present 3D images in our experiment. By considering the mechanism of this kind of device, an appropriate simulation of crosstalk is realized by image processing techniques that assign different crosstalk values between the two images of a pair. It can be seen from the literature that the structures of scenes have a significant impact on the perceived crosstalk, so we first extract the differences in structural information between original and distorted image pairs through the Structural SIMilarity (SSIM) algorithm, which can directly evaluate the structural changes between two complex-structured signals. Then the structural changes of the left view and the right view are computed separately and combined into an overall distortion map. Under 3D viewing conditions, because of the added value of depth, the crosstalk of pop-out objects may be more perceptible. To model this effect, the depth map of a stereo pair is generated and the depth information is filtered by the distortion map.
Moreover, human attention is one of the important factors for crosstalk assessment because, when viewing 3D content, perceptually salient regions are highly likely to be a major contributor to the quality of experience. To take this into account, perceptually significant regions are extracted, and a spatial pooling technique is used to combine the structural distortion map, depth map and visual saliency map to predict the perceived crosstalk more precisely. To verify the performance of the proposed crosstalk assessment metric, subjective experiments are conducted with 24 participants viewing and rating 60 stimuli (5 scenes * 4 crosstalk levels * 3 camera distances). After outlier removal and statistical processing, the correlation with the subjective test is examined using the Pearson and Spearman rank-order correlation coefficients. Furthermore, the proposed method is also compared with two traditional 2D metrics, PSNR and SSIM. The objective score is mapped to the subjective scale using a nonlinear fitting function to directly evaluate the performance of the metric. RESULTS: After the above-mentioned processes, the evaluation results demonstrate that the proposed metric is highly correlated with the subjective score when compared with the existing approaches. Because the Pearson coefficient of the proposed metric is 90.3%, it is promising for objective evaluation of the perceived crosstalk. NOVELTY: The main goal of our paper is to introduce an objective metric for stereo crosstalk assessment. The novel contributions are twofold. First, an appropriate simulation of crosstalk that considers the characteristics of patterned retarder 3D displays is developed. Second, an objective crosstalk metric based on a visual attention model is introduced.
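The evaluation step described in this abstract, correlating objective scores with subjective ratings, can be sketched in plain Python. The Pearson and Spearman rank-order coefficients below follow their standard definitions; the score lists are made-up illustrative numbers, not data from the paper, and the nonlinear fitting step mentioned in the abstract is not reproduced.

```python
import math


def pearson(x, y):
    """Pearson linear correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical objective scores and mean opinion scores for five stimuli.
objective = [1.0, 2.0, 3.0, 4.0, 5.0]
subjective = [1.0, 4.0, 9.0, 16.0, 25.0]
plcc = pearson(objective, subjective)
srocc = spearman(objective, subjective)
```

Pearson measures linear agreement, while Spearman only checks that the ordering of stimuli is preserved, which is why both are typically reported together.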
... Image quality assessment (IQA) is currently a fundamental research topic in image processing [1]. However, with the fast development of communication technology, stereoscopic image quality assessment (SIQA) has become a particularly urgent issue because the quality degradation of stereoscopic images has stronger effects on the human visual system [2], such as visual fatigue and dizziness. To solve this problem, a better SIQA method needs to be developed so that we can provide a standard for producers of 3D content and improve 3D technology such as coding, transmission or display. ...
... The T CE * v , vv(l, r) represents the total contrast energy for gain enhancement. The equation (1) can be rewritten as (2). ...
... Quality impairment is sufficient to affect depth perception with as little as 4% [154], but there is no proof that such levels of crosstalk cause discomfort. Annoyance due to crosstalk increases with increasing disparity [164], increasing camera base distance [136,176], contrast [164], and scene content [176]. ...
Article
This paper reviews the causes of discomfort in viewing stereoscopic content. These include objective factors, such as misaligned images, as well as subjective factors, such as excessive disparity. Different approaches to the measurement of visual discomfort are also reviewed, in relation to the underlying physiological and psychophysical processes. The importance of understanding these issues, in the context of new display technologies, is emphasized.
... Our ultimate purpose in introducing the light field technique into the synthesis of the mapping for lenticular 3-D displays is to reduce crosstalk. Crosstalk is a critical defect affecting the image quality in multiview lenticular 3-D displays [9]. It arises from the incomplete isolation of the image channels conveyed to the left and right eyes, so that the content from one channel is partly presented in another channel [10]. ...
... Each subpixel of the synthetic image combines a pair of rays from neighboring view images, the dominate view and non-dominate view. Eq. (6) can be expressed as (7) where (8) (9) and ...
Article
Full-text available
Crosstalk is a primary defect affecting the image quality of stereoscopic three-dimensional (3-D) displays. Until now, crosstalk reduction methods have either required extra devices or needed tedious calibration procedures with precise measurements on each display device. We propose herein a new method of synthesizing lenticular 3-D display images based on light field decomposition and optimization to minimize the crosstalk. The light field concept is introduced into lenticular 3-D display. Rays of the multiview light field are back-projected to the LCD plane to form a synthetic image with subpixel resolution. A weighted value considering all arriving rays is assigned to each subpixel to reduce crosstalk. We developed a new algorithm for merging and assigning rays that gives a smooth fusion of different views and reduces crosstalk. We also performed validation experiments, which demonstrated that our new method is capable of reducing the crosstalk in the synthetic image. Compared with existing methods, our proposed method is simple and effective, and its implementation cost is low.
... Crosstalk is an important factor that affects the 3D viewing experience [27,28]; it is defined as the leakage of one eye's image into the other eye [29,30]. It depends on many factors such as image contrast, parallax, size, and so on. ...
Article
Full-text available
The mainstream light sources of display systems currently include LEDs, OLEDs, micro-LEDs, and lasers, primarily based on the three primary color systems with different color rendering abilities. A narrow-spectrum light source, such as laser, is typically used to enlarge the color gamut of a display system. Another approach is to add more primaries. In this regard, we develop a six-primary-laser projection display system compatible with 2D and 3D display, with wavelengths of 445 nm, 465 nm, 520 nm, 550 nm, 638 nm and 660 nm. We propose a simple, fast method to determine the luminance of each primary laser, by which the gamut volume can be calculated. We also propose a gamut measurement method for the six-primary-laser display system, describe the gamut boundary, and measure the gamut volume. The calculated maximum color gamut of the proposed system is 2347400, corresponding to 184.49% NTSC, while the measured color gamut is 2269900, corresponding to 178.4% NTSC. These results are in good agreement with the theoretical calculations, indicating the accuracy of the proposed analytical and experimental methods. Moreover, the time-multiplexed stereoscopic display technology and the spectral separation method would allow the development of a remarkable three-dimensional visual experience with high light efficiency and low crosstalk in full-field of view.
... Crosstalk is an important factor that affects the 3D viewing experience; it is defined as the leakage of one eye's image into the other eye [11][12]. It depends on many factors such as image contrast, parallax, size, and so on. ...
... sound source localizability due to spatial audio [98], perceived naturalness due to fidelity manipulation [23,24,99]) and their interrelations with perceived quality might open up promising new research directions. In view of immersive visual 3D technology, for example, Avarvand et al. [100] recently proposed the visual P1 component as an indicator of vertical disparity, an important influencing factor on the perceived quality of stereoscopic images [101]. An analogous identification of neurophysiological measures in the auditory modality, with the P1-N1-P2 complex as potential candidate, could prove useful for future psychophysiological evaluation of immersive audio and speech communication technologies [102]. Methodologically, the present study offers an alternative analysis approach for ERP-based assessment and evaluation of speech transmission quality. ...
Article
Full-text available
Objective: Degradations of transmitted speech have been shown to affect perceptual and cognitive processing in human listeners, as indicated by the P3 component of the event-related brain potential (ERP). However, research suggests that previously observed P3 modulations might actually be traced back to earlier neural modulations in the time range of the P1-N1-P2 complex of the cortical auditory evoked potential (CAEP). This study investigates whether auditory sensory processing, as reflected by the P1-N1-P2 complex, is already systematically altered by speech quality degradations. Approach: Electrophysiological data from two studies were analyzed to examine effects of speech transmission quality (high-quality, noisy, bandpass-filtered) for spoken words on amplitude and latency parameters of individual P1, N1 and P2 components. Main results: In the resultant ERP waveforms, an initial P1-N1-P2 manifested at stimulus onset, while a second N1-P2 occurred within the ongoing stimulus. Bandpass-filtered versus high-quality word stimuli evoked a faster and larger initial N1 as well as a reduced initial P2, hence exhibiting effects as early as the sensory stage of auditory information processing. Significance: The results corroborate the existence of systematic quality-related modulations in the initial N1-P2, which may potentially have carried over into P3 modulations demonstrated by previous studies. In future psychophysiological speech quality assessments, rigorous control procedures are needed to ensure the validity of P3-based indication of speech transmission quality. An alternative CAEP-based assessment approach is discussed, which promises to be more efficient and less constrained than the established approach based on P3.
... However, if the hole region is too large, the quality of the resulting virtual image will be decreased. In addition, Xing et al. [15] proposed an objective crosstalk metric for stereoscopic displays that applies a 2D structural similarity (SSIM) map and a depth map to account for perceptual 3D crosstalk attributes. To obtain very high-quality images, paper [16] adopts deep learning technology to remove false contours in HEVC-compressed images. ...
Article
Full-text available
A high-quality multi-view generation algorithm is proposed to eliminate the camera parameter limits for a sparse camera configuration. The proposed algorithm includes multi-view generation and artifact reduction. The feature points of the foreground objects in the input images are used to find the disparity function for the central view, and then we use this function to determine the disparity calibration function for all virtual views to replace the camera parameters. To improve the generated virtual image quality, artifact reduction based on a modified inpainting algorithm is proposed to reconstruct the holes and false contours. Experimental results show that the quality of the virtual view images in multi virtual view generation and in artifact reduction can be raised by 2.67 dB and 4.57 dB, respectively, in terms of PSNR. From the experimental results, it is found that the proposed algorithm can provide multi virtual views of high quality and can be adapted to different multi-view display systems.
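The gains in this abstract are reported in PSNR. For reference, the following minimal sketch computes PSNR from its standard definition, 10 * log10(peak^2 / MSE), for two equally sized grayscale images stored as nested lists. The 8-bit peak of 255 and the tiny images are assumptions for illustration only.

```python
import math


def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two same-sized 2-D images."""
    squared_error = 0.0
    count = 0
    for ref_row, test_row in zip(reference, test):
        for r, t in zip(ref_row, test_row):
            squared_error += (r - t) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(peak * peak / mse)


# Made-up 2x2 images; the test image differs slightly from the reference.
reference_image = [[100, 110], [120, 130]]
synthesized_image = [[101, 109], [120, 132]]
quality_db = psnr(reference_image, synthesized_image)
```

Because PSNR is logarithmic, a gain of a few dB corresponds to a sizeable reduction in mean squared error.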
... In addition to these methods, people have come up with some new metrics for stereoscopic video. Xing et al. [45] proposed three perceptual attributes based on the human visual system (HVS), which are shadow degree, separation distance, and spatial position of crosstalk. In [46], a new HVS model with the phenomena of binocular suppression and recurrent excitation was proposed. ...
Article
Full-text available
Virtual reality (VR), a new type of simulation and interaction technology, has aroused widespread attention and research interest. It is necessary to evaluate virtual reality quality and provide a standard for this rapidly developing technology. To the best of our knowledge, few researchers have built benchmark databases and designed related algorithms, which has hindered the further development of VR technology. In this paper, a freely available dataset (VRQ-TJU) for virtual reality quality assessment is proposed, with subjective scores for each sample. The validity of the designed database has been verified with traditional multimedia quality assessment metrics. In addition, an end-to-end 3D convolutional neural network (CNN) is introduced to predict VR video quality without a reference VR video. This method can extract spatiotemporal features and does not require hand-crafted features. At the same time, a new score fusion strategy is designed based on the characteristics of the VR video projection process. Taking the pre-processed VR video patches as input, the network captures local spatiotemporal features and produces a score for every patch. Then the new quality score fusion strategy is applied to get the final score. This approach shows advanced performance on this database.
... Constructing stereoscopic vision is increasingly critical for supporting the development of data visualization, data analysis, and medical imaging applications [1]. In general, stereoscopic vision [2] produces separate images of a particular target scene for the left and right eyes; in other words, it generates images with a small amount of parallax in order to build up the scene depth for the 3D display. In reality, the reason we perceive a 3D scene is the existence of parallax between the left and right eyes. ...
Article
Full-text available
This paper analyzes employing the stereoscopic vision methods to display the 3-dimension (3D) medical models through focusing on the displaying means and the principles of stereoscopic vision. Relying on the perspective projections, the target spatial object is re-projected corresponding to left and right eyes to re-generate the stereoscopic perspective projections respectively and produce the parallax.
... 3D image and video quality assessment is a more difficult and complex problem compared to its 2D counterpart. Due to different nature of acquisition, representation, transmission, and rendering of 3D images, they suffer from different types of quality artifacts [3][4][5][6]. Moreover, the additional dimension of depth maps in 3D content also introduces various quality artifacts. ...
Article
Full-text available
Three-dimensional (3D) image quality assessment is a challenging problem compared to 2D images due to their different nature of acquisition, representation, coding, and display. The additional dimension of depth in multiview video plus depth (MVD) format is exploited to obtain images at novel intermediate viewpoints using depth image based rendering (DIBR) techniques, enabling 3D television and free-viewpoint television (FTV) applications. Depth maps introduce various quality artifacts in the DIBR-synthesized (virtual) images. In this paper, we propose a novel methodology to evaluate the quality of synthesized views in the absence of the corresponding original reference views. It computes the statistical characteristics of the side views from which the virtual view is generated, and fuses this information to estimate the statistical characteristics of the cyclopean image, which are compared to those of the synthesized image to evaluate its quality. In addition to texture images, the proposed algorithm also considers the depth maps in evaluating the quality of the synthesized images. The algorithm blends two quality metrics, one estimating the texture distortion in the synthesized texture image induced by compression, transmission, 3D warping, or other causes, and the second determining the distortion of the depth maps. The two metrics are combined to obtain an overall quality assessment of the synthesized image. The proposed Synthesized Image Quality Metric (SIQM) is tested on the challenging MCL-3D and SIAT-3D datasets. The evaluation results show that the proposed metric significantly improves over state-of-the-art 3D image quality assessment algorithms.
... We have considered and detailed camera interval and coded degradation at each viewpoint. Studies have focused on the relations among crosstalk, camera baseline, and scene content in stereoscopic images 12), 13) . Generally, there has been limited discussion about the relations between camera interval and coded degradation with an 8 viewpoint lenticular lens method. ...
Article
Recently, the use of 3D video systems without glasses has increased, and therefore 3D image quality and presence evaluation is important. There are various stereo-logical image quality evaluation methods for multi-view 3D systems without glasses. However, there is no uniform method for evaluating 3D video systems. In this study, we focus on camera interval and JPEG coding degradation with a multi-view 3D system. Previously, many studies have examined camera interval or JPEG coding degradation with 3D glasses or the binocular method. In such systems, viewers perceive stereoscopic and depth effects. Moreover, they can see from different angles, increasing viewpoints with multi-view 3D systems. However, viewers feel discomfort when changing their viewpoint. Hence, we consider, in particular, the accommodation of the camera interval and JPEG coding degradation while changing viewpoints. We have performed subjective evaluations using the absolute category rating system to assess the effects of changing the camera interval of 3D CG images or video content using an 8 viewpoint lenticular lens method. We measure assessors' ability to identify the degree of the camera interval. We analyze the results of our subjective evaluations statistically and discuss the results. Using the optimal camera interval, we perform a subjective quality evaluation employing the double stimulus impairment scale to determine assessors' ability to identify JPEG coding degradation by degree. The experimental results of this subjective evaluation are also statistically analyzed.
... With the development of science and technology, image processing technology is becoming more and more mature [8,12,18]. Human visual system (HVS) [14,34] is widely applied into image processing. One of the most important properties of HVS is the contrast sensitivity [3,9] which has been widely used in 2D image processing [4,7]. ...
Article
Full-text available
This paper tries to extend the contrast sensitivity function (CSF) of the human eye into three-dimensional (3D) space, but the experimental results show that the traditional characteristics of the CSF have limitations in 3D space because they lack depth information. In order to investigate the characteristics of the CSF in 3D space, traditional CSF tests are further developed to measure the corresponding properties of the CSF on planes with different inclinations, and to describe the −CSF characteristics of human eyes based on the inclination angles. According to the tests, the mathematical expression of −CSF is built up. In addition, the concept of spatial frequency in the direction of depth (fD) is proposed, and the fD−CSF characteristic surface is also obtained. The proposed 3D CSF is significant for research on human visual characteristics and 3D image processing.
... In this research, we focused on camera interval and coded degradation with a multi-view glassless 3D method [3,4]. Camera interval and coded degradation with 3D glasses, or with two-viewpoint glassless displays, have been examined in a number of studies to date. ...
Article
Full-text available
Recently, we have increasingly been able to watch 3D videos or movies without glasses. However, there are various image quality evaluation methods for multi-view 3D without glasses, and the display methods are not unified. In this paper, we presented 3D CG images using an 8-viewpoint lenticular lens method, evaluated them subjectively with the ACR and DSIS methods, and analyzed the results statistically. The experiment examined whether assessors were able to view the images comfortably depending on the camera interval and viewpoint, and whether they perceived or were annoyed by coded degradation at certain viewpoints.
... In [10], the performances of several state-of-the-art 2D quality metrics were compared for quantification of the quality of stereo pairs formed from two synthesized views. In [11] the authors studied the perception of stereoscopic crosstalk and performed a set of subjective tests to obtain mean opinion scores (MOS) of stereoscopic videos. They attempted to predict the MOS by combination of a structural similarity index (SSIM) map and pre-filtered dense disparity map. ...
Article
Full-text available
3D video is expected to provide an enhanced user experience by using the impression of depth to bring greater realism to the user. Quality assessment plays an important role in the design and optimization of 3D video processing systems. In this paper, a new 3D image quality model that is specifically tailored for mobile 3D video is proposed. The model adopts three quality components, called the cyclopean view, binocular rivalry, and the scene geometry, in which the quality must be quantified. The cyclopean view formation process is simulated and its quality is evaluated using the three proposed approaches. Binocular rivalry is quantified over the distorted stereo pairs, and the scene quality is quantified over the disparity map. Based on the model, the 3D image quality can then be assessed using state-of-the-art 2D quality measures selected appropriately through a machine learning approach. To make the metric simple, fast, and efficient, final selection of the quality features is accomplished by also considering the computational complexity and the CPU running time. The metric is compared with several currently available 2D and 3D metrics. Experimental results show that the compound metric gives a significantly high correlation with the mean opinion scores that were collected through large-scale subjective tests run on mobile 3D video content.
... In [21], objective quality assessment of mobile 3D video for the left and right eye views was discussed. In [22]- [24], multi-view 3D quality assessment was discussed; however, for ROIs, the results were neither satisfactory nor clear. In [25], image quality assessment of 2D still images constituting monochrome pictures was studied. ...
Article
Many studies have addressed image quality assessment of 3D still images and video clips. In particular, it is important to know the region in which assessors are interested or on which they focus in images or video clips, as represented by the ROI (Region of Interest). For multi-view 3D images, it is obvious that there are a number of viewpoints; however, it is not clear whether assessors focus on objects or background regions. It is also not clear what assessors focus on depending on whether the background region is colored or gray scale. Furthermore, while case studies on coded degradation in 2D or binocular stereoscopic videos have been conducted, no such case studies on multi-view 3D videos exist, and therefore, no results are available for coded degradation according to the object or background region in multi-view 3D images. In addition, it has not been established how a colored or gray scale background affects the assessors' gaze points and the subjective image quality. In this study, we conducted subjective evaluation experiments on coded degradation caused by JPEG coding of the background, the object, or both in 3D CG images, using an eight-viewpoint parallax barrier method. We then analyzed the results statistically and classified the evaluation scores using an SVM.
... Crosstalk is a critical factor affecting the image quality in multiview three-dimensional (3D) displays [1], which is caused by the incomplete isolation of different image channels. In order to mitigate the Moiré fringe [2] and balance the horizontal versus vertical resolution [3] of 3D displays, the slanted parallax barrier or lenticular lens array is used [4]. ...
Article
In multiview three-dimensional (3D) displays, crosstalk is one of the most annoying artefacts degrading the quality of the 3D image. In this paper, we present a system-introduced crosstalk measurement method and derive an improved crosstalk reduction method. The proposed measurement method is applied to measure the exact crosstalk among subpixels corresponding to different view images, and the obtained results are very effective for the crosstalk reduction method. Furthermore, an improved crosstalk reduction method is proposed to alleviate crosstalk by searching for the optimal integral intensity values of subpixels on the synthetic image. The derived algorithm, based on a modified Schnorr-Euchner strategy, is implemented to seek the optimal solution to this box-constrained integer least squares (BILS) problem, such that the Euclidean distance between the solution and its target decreases substantially. The method we develop is applicable to both multiview 3D parallax barrier displays and multiview 3D lenticular displays. Both simulation and experimental results indicate that the derived method is capable of improving 3D image quality more effectively than the existing method on multiview 3D displays.
... The perception of degradations measured in subjective assessment studies is influenced by the viewing conditions. In stereoscopic 3DTV, selecting and calibrating the display may be more important than in 2D, as additional technological factors for the display such as maximum perceived brightness or crosstalk may have significant influence, and may be difficult to measure across subjective assessment labs [2] [3]. ...
Conference Paper
Subjective assessment of Quality of Experience in stereoscopic 3D requires new guidelines for the environmental setup as existing standards such as ITU-R BT.500 may no longer be appropriate. A first step is to perform cross-lab experiments in different viewing conditions on the same video sequences. Three international labs performed Absolute Category Rating studies on a freely available video database containing degradations that are mainly related to video quality degradations. Different conditions have been used in the labs: Passive polarized displays, active shutter displays, differences in viewing distance, the number of parallel viewers, and the voting device. Implicit variations were introduced due to the three different languages in Sweden, South Korea, and France. Although the obtained Mean Opinion Scores are comparable, slight differences occur in function of the video degradations and the viewing distance. An analysis on the statistical differences obtained between the MOS of the video sequences revealed that obtaining an equivalent number of differences may require more observers in some viewing conditions. It was also seen that the alignment of the meaning of the attributes used in Absolute Category Rating in different languages may be beneficial. Statistical analysis was performed showing influence of the viewing distance on votes and MOS results.
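As a rough illustration of how Mean Opinion Scores from an Absolute Category Rating session are commonly summarized, the sketch below computes the MOS of one stimulus together with a normal-approximation 95% confidence interval. The function name `mos_with_ci`, the 1.96 multiplier, and the example votes are assumptions made for illustration; they are not taken from the study above.

```python
import math


def mos_with_ci(votes, z=1.96):
    """Return (MOS, half-width of the confidence interval) for one stimulus.

    votes: ratings from individual observers (e.g. 1..5 on an ACR scale).
    z: normal quantile; 1.96 approximates a 95% interval (assumption).
    """
    n = len(votes)
    if n < 2:
        raise ValueError("need at least two votes to estimate a variance")
    mean = sum(votes) / n
    # Unbiased sample variance of the votes.
    var = sum((v - mean) ** 2 for v in votes) / (n - 1)
    # Standard error of the mean, scaled by the chosen quantile.
    half_width = z * math.sqrt(var / n)
    return mean, half_width


if __name__ == "__main__":
    mos, ci = mos_with_ci([5, 4, 4, 3, 4])  # hypothetical votes
    print(f"MOS = {mos:.2f} +/- {ci:.2f}")
```

With few observers the interval is wide, so overlapping intervals between two conditions give a quick, informal hint that more observers may be needed before a difference can be called significant.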
Preprint
Recently, 3D light field (LF) display technology is increasingly being applied for commercial products, as 3D representations are widely used in various fields such as game and metaverse applications. Inter-view crosstalk, which is caused by one or more unintended views being added to the intended view, is a major factor that induces image quality degradation and visual discomfort in 3D LF displays. In this paper, we propose an algorithm that provides optimal crosstalk reduction while preventing color over-shoot artifact that is common in conventional methods. The proposed algorithm applies an optimized crosstalk reduction filter, determined adaptively for every pixel in the image, based on the correlation between neighboring LF views. The experimental results using a LF display prototype show that the proposed method can reduce crosstalk effectively without causing color over-shoot artifacts and reduces crosstalk by 20.4% compared to the previous method.
Article
Considering that the human brain always follows a coarse-to-fine (low-to-high spatial frequency) visual processing and fusion mechanism, we propose a coarse-to-fine feedback guidance based stereo image quality assessment (SIQA) network which considers a coarse-to-fine feedback guidance and adaptive dominant eye mechanism. The proposed network consists of two main sub-network streams, each of which has three branches to extract low, middle and high spatial frequency information in parallel. To better realize the guidance of the high-level features in the low spatial frequency branch to the low-level features in the high spatial frequency branch, an information feedback guidance module (IFGM) is proposed, which realizes a top-down guidance mechanism in each sub-network stream. Simultaneously, according to the theory of ocular dominance in human visual system (HVS), we design an adaptive bi-directional parallax-based binocular fusion module (BPBFM), which synthesizes two types of fusion feature by taking the left and right view features as dominant eye input. Furthermore, in order to obtain the better perceptual quality of stereo images, we design a weighted fusion strategy to weigh the quality scores from the two types of fusion features obtained by using an ensemble model with two multi-layer perceptrons (MLPs). The experimental results on four public stereo image datasets show that the proposed method is superior to the mainstream metrics and achieves an excellent performance.
Article
A switchable autostereoscopic 3-dimensional (3D) display device with wide color gamut is introduced in this paper. In conjunction with a novel directional quantum-dot (QD) backlight, a precise scanning control strategy, and an eye-tracking system, this spatial-sequential solution enables our autostereoscopic display to combine the advantages of full resolution, wide color gamut, low crosstalk, and switchable 2D/3D. We also fabricated an autostereoscopic display prototype and demonstrated its performance. The results indicate that our system can both break the limitation of viewing position and provide high-quality 3D images. We present two working modes in this system. In the spatial-sequential mode, the crosstalk is about 6%. In the time-multiplexed mode, the viewer must wear auxiliary eyewear and the crosstalk is about 1%, close to that of commercial 3D displays (BENQ XL2707-B and ViewSonic VX2268WM). Additionally, our system is completely compatible with active shutter glasses, and its 3D resolution is the same as its 2D resolution. Because of the excellent properties of the QD material, the color gamut can be extended to 77.98% according to the ITU-R recommendation BT.2020 (Rec.2020).
Article
The perceptual quality of stereoscopic images plays an essential role in the human perception of visual information. However, most available stereoscopic image quality assessment (SIQA) methods evaluate 3D visual experience using hand-crafted features or shallow architectures, which cannot model the visual properties of stereo images well. In this paper, we use convolutional neural networks (CNNs) to learn deeper local quality-aware structures for stereo images. With different inputs, two CNN models are designed for no-reference SIQA tasks. The one-column CNN model directly accepts a cyclopean view as the input, and the three-column CNN model jointly considers the cyclopean, left and right views as CNN inputs. The two SIQA frameworks share the same implementation approach: First, to overcome the obstacle of limited SIQA datasets, we accept image patches that have been cropped from corresponding stereopairs as inputs for local quality-sensitive feature extraction. Next, a local feature selection algorithm is used to remove related features on non-salient patches, which could cause large prediction errors. Finally, the reserved local visual structures of salient regions are aggregated into a final quality score in an end-to-end manner. Experimental results on three public SIQA databases demonstrate that our method outperforms most state-of-the-art no-reference (NR) SIQA methods. The results of a cross-database experiment also show the robustness and generality of the proposed method.
Article
Stereoscopic subtitle insertion is a fundamental and essential element in stereoscopic film and TV industry. However, little work has been dedicated to the optimal region selection for stereoscopic subtitle insertion. In addition, there is no public database reported for the performance evaluation of it. In this work, we build the first large-scale video database (TJU3D) for stereoscopic video subtitle insertion, which includes 50 video sequences with rich screen scenes. Compared with 2D subtitle region selection, there are several problems we have to consider in stereoscopic subtitle region selection: 1) the subtitle should avoid depth cue collision and occlusion from objects in stereoscopic video sequences; 2) the disparity value of the subtitle must be minimized to reduce visual discomfort; 3) the temporal coherence constraint must be considered during region selection for subtitles in video sequences. By considering these constraints, we propose an optimal region selection algorithm for stereoscopic subtitle insertion. First, we compute the disparity map of each video frame in video sequences. For each frame, the optimal position and disparity value of the subtitle are determined by a subtitle region selection algorithm, which contains two parts (i.e., the coarse selection and fine selection). After that, by considering the temporal consistency between adjacent frames, the position and disparity value of each frame are further classified and processed in order to avoid the subtitle jitter. We evaluate the proposed method on TJU3D video database through two visual discomfort prediction metrics and one subjective experiment. To further verify the effectiveness of the proposed method, we also validate the performance of the proposed method on video comfort assessment database, i.e., IEEE-SA Stereo database. Experimental results demonstrate that the visual discomfort is greatly reduced when using the proposed method compared with the basic method.
Article
Crosstalk is one of the most severe factors affecting the perceived quality of stereoscopic 3D (S3D) images. It arises from a leakage of light intensity between multiple views, as in auto-stereoscopic displays. Well-known determinants of crosstalk include the co-location contrast and disparity of the left and right images, which have been dealt with in prior studies. However, when a natural stereo image that contains complex naturalistic spatial characteristics is viewed on an auto-stereoscopic display, other factors may also play an important role in the perception of crosstalk. Here, we describe a new way of predicting the perceived severity of crosstalk, which we call the Binocular Perceptual Crosstalk Predictor (BPCP). BPCP uses measurements of three complementary 3D image properties (texture, structural duplication and binocular summation) in combination with two well-known factors (co-location contrast and disparity) to make predictions of crosstalk on two-view auto-stereoscopic displays. The new BPCP model includes two masking algorithms and a binocular pooling method. We explore a new masking phenomenon that we call duplicated structure masking, which arises from structural correlations between the original and distorted objects. We also utilize an advanced binocular summation model to develop a binocular pooling algorithm. Our experimental results indicate that BPCP achieves high correlations against subjective test results, improving upon those delivered by previous crosstalk prediction models.
Article
In this paper we investigate how visualization factors, such as disparity, mobility, angular resolution and viewpoint interpolation, influence the Quality of Experience (QoE) in a stereoscopic multiview environment. In order to do so, we set up a dedicated testing room and conducted subjective experiments. We also developed a framework that emulates a super-multiview environment. This framework can be used to investigate and assess the effects of angular resolution and viewpoint interpolation on the quality of experience produced by multiview systems, and provide relevant cues as to how the baselines of cameras and interpolation strategies in such systems affect user experience. Aspects such as visual comfort, model fluidity, sense of immersion and the 3D experience as a whole have been assessed for several test cases. Obtained results suggest that user experience in a motion parallax environment is not as critically influenced by configuration parameters such as disparity as initially thought. In addition, extensive subjective tests have indicated that while users are very sensitive to angular resolution in multiview 3D systems, this sensitivity tends not to be as critical when a user is performing a task that involves a great amount of interaction with the multiview content. These tests have also indicated that interpolating intermediate viewpoints can be effective in reducing the required view density without degrading the perceived QoE.
Article
Crosstalk is a critical defect affecting image quality in multiview lenticular 3D displays. Existing optimization methods require tedious computations and device-specific optical measurements, and results are often suboptimal. We propose a new method, on the basis of light field acquisition and optimization, for crosstalk reduction in super multiview displays. Theory and algorithms were developed, and experimental validation results showed superior performance.
Article
Perceptual quality prediction for stereoscopic images is of fundamental importance in determining the level of quality perceived by humans in terms of the 3D viewing experience. However, the existing no-reference quality assessment (NR-IQA) framework has its limitation in addressing binocular combination for stereoscopic images. In this paper, we propose a new NR-IQA for stereoscopic images using joint sparse representation. We analyze the relationship between left and right quality predictors, and formulate stereoscopic quality prediction as a combination of feature-prior and feature-distribution. Based on this finding, we extract feature vector that handles different features to be interacted by joint sparse representation, and use support vector regression to characterize feature-prior. Meanwhile, we implement feature-distribution using sparsity regularization as the basis of weights for binocular combination to derive the overall quality score. Experimental results on five public 3D IQA databases demonstrate that in comparison with the existing methods, the devised algorithm achieves high consistent alignment with subjective assessment.
Article
Three-dimensional display technologies based on a lenticular sheet overlaid onto a spatial light modulator screen have been studied for decades. However, the quality of these displays still suffers from an insufficient number of views and zone-jumping between views. We present a subpixel multiplexing method in this paper. We propose to split mapping and alignment into two separate tasks, processed in parallel threads. The alignment thread computes the geometrical relationship between the lenticular sheet and the Liquid Crystal Display (LCD) panel for multiplexing. Afterwards, we conduct the multiplexing procedure through a box-constrained integer least squares algorithm. After multiplexing, each subpixel aggregated on the lenticular sheet is a multiplexed one that mixes a number of subpixels in a local region on the LCD plane. As a result, we multiplex subpixels on the synthetic image up to 27 views with a resolution of 1080 × 1920, and the rendering speed is 73.34 frames per second (fps).
Article
3D video technologies have been widely adopted by video service providers and consumer electronics stakeholders due to their potential of offering an immersive user experience. In case of 3D video streaming, the dynamic network conditions are the bottleneck that limits the content delivery at good perceived quality levels and an effective solution is to employ advanced 3D video adaptation schemes. Accurate real-time objective 3D video quality assessment is a critical factor in adaptive decision making. State-of-the-art objective 3D video quality assessment methods are in general reference-based and require the availability of the original 3D video sequence, which makes them not suitable for real-time applications. This paper proposes the extended no reference objective video quality metric (eNVQM), an innovative metric for real-time 3D video quality assessment. eNVQM estimates the 3D video quality by taking as the input parameters network packet loss, video transmission bitrate, and frame rate. Based on extensive subjective tests, eNVQM models the impact of network packet loss on 3D video at different bitrates and frame rates on the perceived stereoscopic 3D video quality. The performance of eNVQM is investigated by comparing its results with two state-of-the-art objective video quality metrics: 1) structural similarity index and 2) video quality metric. Results show that eNVQM maintains similar accuracy level in estimating 3D video quality with the alternative reference-based metrics.
Article
Perceptual crosstalk prediction for autostereoscopic 3D displays is of fundamental importance in determining the level of quality perceived by humans in terms of the display performance and the 3D viewing experience. However, no robust framework exists to quantify perceptual crosstalk while taking into account the hardware structure of a display as well as its content characteristics via content analysis. In this paper, we present a 3D Perceptual Crosstalk Predictor (3D-PCP) that can be used to predict crosstalk in a unique way when viewing autostereoscopic 3D displays. 3D-PCP captures hardware features using an Optical Fourier transform - Light Measurement Device and content features through content analysis based on information theory. By deriving the disparity, luminance, color, and texture maps, this approach defines the visual entropy, mutual information, and relative entropy in order to investigate the influences of the 3D scene characteristics on perceptual crosstalk. The experimental results demonstrate that the 3D-PCP output is highly correlated with subjective scores.
Article
With the prosperity of stereoscopic industry, the disparity-based stereoscopic technology becomes more and more popular. But this technology may bring viewing discomfort and various image distortions such as depth plane curvature, depth non-linearity and so on. To analyze the origins, characteristics and relations of these perceptual issues, we investigated the stereoscopic distortion model and the comfortable viewing zone. This paper first summarized and compared three different configurations of the disparity-based stereoscopic video capturing systems, and focused on the parallel configuration system. The geometry of stereoscopic camera and display systems was presented. The mapping relationship, known as the distortion model, between the camera space and the viewing space was discussed in the parallel-shifting stereoscopic video systems. This distortion model is the basis of the stereoscopic information processing. The shape distortion factor and the depth factor were employed to explain the puppet theater effect and the cardboard effect. The comfortable viewing zone was considered to reduce the problem of visual discomfort and visual fatigue. Its three different representations were compared both qualitatively and quantitatively. At last, the distortion model and the comfortable viewing zone were combined to draw a conclusion of the stereoscopic shooting rules.
Article
A new metric is proposed to predict the perceived crosstalk using the crosstalk level and the original images rather than the original and ghosted images. Our metric fully utilizes the color information of the original image. We first extract a disparity map, a color difference map and a color contrast map from the original stereo images. This information is then combined to construct new metrics, which have high correlations with the perceived subjective crosstalk scores. The prediction performance of these metrics is evaluated using various types of stereoscopic crosstalk images. In terms of Pearson and Spearman correlation, our approach achieves better evaluation performance than previous methods, which indicates that color information is an important factor in accurately predicting crosstalk visibility.
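Several of the abstracts in this list report agreement between an objective metric and subjective scores as Pearson and Spearman correlations. The following is a minimal sketch of both coefficients in plain Python, assuming two equal-length score lists; the helper names `pearson`, `ranks` and `spearman` are illustrative and do not come from any of the cited papers.

```python
import math


def pearson(x, y):
    """Pearson linear correlation coefficient of two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have the same length (at least 2)")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


if __name__ == "__main__":
    metric = [0.91, 0.78, 0.66, 0.40]  # hypothetical objective scores
    mos = [4.6, 4.1, 3.0, 1.9]         # hypothetical subjective scores
    print(pearson(metric, mos), spearman(metric, mos))
```

Pearson measures how well the relationship is linear, while Spearman only checks that the ordering agrees, which is why quality-assessment papers usually report both.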
Article
Due to the rapid advancements in 3D video technologies, 3D quality of experience (3DQoE) assessment for 3D video networking services has attracted researchers' attention in both academia and industry. The contributing factors of 3DQoE in the end-to-end processing chain are analyzed. The 3D video/image quality based 3DQoE evaluation methodologies and the current research progress in 3DQoE modeling are reviewed. Based on the analysis of the current research status of 3DQoE assessment and modeling, we point out the inadequacies in accuracy, reliability and universality of the current 3DQoE assessment and modeling research, and indicate that a 3DQoE model for massive networking services is a possible future research direction. © 2015, Chinese Institute of Electronics. All rights reserved.
Article
Crosstalk, which is the incomplete separation between the left and right views in 3D displays, induces ghosting and makes it difficult for the eyes to fuse the stereo image for depth perception. The circularly polarized (CP) liquid crystal display (LCD) is one of the mainstream consumer 3D displays, driven by the popularity of 3D movies and gaming. The polarizing system, including the patterned retarder (PR), is one of the major causes of crosstalk in CP LCDs. The contributions of this paper are the modeling of the polarizing system of the CP LCD and a crosstalk reduction method that efficiently cancels crosstalk and preserves image contrast. For the modeling, the practical orientation of the polarized glasses (PG) is considered. In addition, this paper calculates the rotation of the light-propagation coordinate for the Stokes vector as light propagates from the LCD to the PG; this calculation is missing in previous works that apply Mueller calculus. The proposed crosstalk reduction method is formulated as a linear programming problem which can be easily solved. In addition, we propose excluding the highly textured areas in the input images to further preserve image contrast in crosstalk reduction.
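To make the idea of crosstalk and its cancellation concrete, here is a minimal sketch under a simplified model that is an assumption of this example, not the polarizing-system model or the linear program of the paper above: each eye sees its intended view plus a fraction `c` of the other view, intensities are linear-light values in [0, 1], and `c` is symmetric and constant over the image. The function names are hypothetical.

```python
def add_crosstalk(left, right, c):
    """Simulate symmetric linear crosstalk: each eye also sees c times the other view."""
    seen_left = [l + c * r for l, r in zip(left, right)]
    seen_right = [r + c * l for l, r in zip(left, right)]
    return seen_left, seen_right


def precompensate(left, right, c):
    """Invert the 2x2 leakage model per pixel, then clip to the displayable range.

    Solves  l' + c*r' = l  and  r' + c*l' = r  for the displayed values (l', r').
    """
    if not 0 <= c < 1:
        raise ValueError("crosstalk level must satisfy 0 <= c < 1")
    det = 1 - c * c

    def clip(v):
        # A display cannot emit negative light or exceed its peak level.
        return min(1.0, max(0.0, v))

    out_left = [clip((l - c * r) / det) for l, r in zip(left, right)]
    out_right = [clip((r - c * l) / det) for l, r in zip(left, right)]
    return out_left, out_right


if __name__ == "__main__":
    L, R = [0.5, 0.6], [0.4, 0.5]  # hypothetical pixel intensities
    shown_l, shown_r = precompensate(L, R, 0.05)
    print(add_crosstalk(shown_l, shown_r, 0.05))  # close to the original L and R
```

When a dark pixel in one view coincides with a bright pixel in the other, the exact solution is negative and gets clipped, so cancellation is incomplete there. This limitation is one reason constrained formulations that trade residual crosstalk against image contrast are used instead of plain inversion.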
Book
Quality of experience in 3D media requires new and innovative concepts for subjective assessment methodologies. Capturing the observer's opinion may be achieved by providing multiple voting scales, such as 2D image quality, depth quantity, and visual comfort. Pooling these different scales to achieve a single quality percept may be performed differently by each human observer. The chapter dives into the complexity of this subject by explaining the QoE concept using 3DTV as an example. It explains the meaning of the different scales, the current approaches to assess each of them, and the individual influence factors related to the voting which affects reproducibility of the obtained results. Methodologies for assessing the overall preference of experience using pair comparisons with a reasonable number of stimuli are provided. The viewers may also create their own attributes for evaluation in the Open Profiling methodology which has been recently adapted for 3DTV. The drawback of all these assessment methods is that they are intrusive in the sense that the assessor needs to concentrate on the task at hand. Medical and psychophysical measurement methods, such as EEG, EOG, EMG, and fMRI, may eliminate this drawback and are introduced with respect to the different QoE influence factors. Their value at this early stage of development is mostly in supporting and partly predicting subjectively perceived and annotated QoE. The chapter closes with a brief review of the most important technical constraints that impact on the capture, transmission, and display of 3DTV signals. © 2014 Springer Science+Business Media New York. All rights are reserved.
Article
Stereo image quality assessment (SIQA) is a key issue in stereo image processing. Because image pixels are strongly correlated and highly structured, image quality depends mainly on the distortion of the image's structural information. On this basis, an objective stereo image quality assessment (OSIQA) model based on matrix decomposition is proposed. Firstly, the concavity and convexity maps of the image, which reflect image complexity, are extracted through Hessian matrix decomposition, and the left-right image quality assessment (LR-IQA) value is obtained by judging the loss severity of the concavity and convexity maps using singular value decomposition of the left and right images. Secondly, eigenvalues and eigenvectors are extracted from the absolute difference map, that is, the absolute difference between the left and right images of the stereo pair. Eigenvalues reflect image energy in certain directions, and eigenvectors reflect the directionality of the image. The depth perception quality assessment (DP-QA) value is obtained by calculating the degree of structural distortion in the edge and non-edge regions. Finally, the OSIQA value is obtained through nonlinear fitting of the LR-IQA and DP-QA values. Experimental results show that the proposed OSIQA model is consistent with subjective perception. The correlation coefficient and Spearman rank order correlation coefficient between the OSIQA model and subjective perception are more than 0.92, and the root mean squared error is lower than 6.5. Index Terms: Stereo image quality assessment, left-right image quality assessment, depth perception quality assessment, Hessian matrix
Article
In order to establish a stereoscopic image quality assessment method that is consistent with human visual perception, we propose an objective stereoscopic image quality assessment method. It takes into account the strong correlation and high degree of structure among image pixels. The method contains two models. The first is a synthetic quality assessment of the left and right view images based on human visual characteristics: we use the Singular Value Decomposition (SVD), which can represent the degree of distortion, and combine the qualities of the left and right images according to the characteristics of binocular superposition. The second is a stereoscopic perception quality assessment: because the singular values of an image are highly stable, we calculate the distance between the singular values and the structural characteristic similarity of the absolute difference maps, and use the statistical value of the global error to evaluate stereoscopic perception. Finally, we combine the two models to describe the stereoscopic image quality. Experimental results show that the correlation coefficients between the proposed assessment method and human subjective perception are above 0.93, and the mean square errors are all less than 6.2, under JPEG and JP2K compression, Gaussian blurring, Gaussian white noise, H.264 coding distortion, and hybrid cross distortion. This indicates that the proposed objective stereoscopic method is consistent with human visual properties and is practical to use.
Conference Paper
Stereo image quality assessment has attracted great attention due to the rapid development of stereo image applications. In this paper, by analyzing the multi-channel characteristics of the human visual system and the singular value characteristics of images, a reduced-reference objective stereo image quality assessment (RR-OSIQA) method is presented. First, after multi-channel decomposition, the differences in singular values among sub-bands in the wavelet domain are calculated to obtain the left-right image quality assessment (LR-IQA) index. Then, the concept of an absolute disparity image is defined, and the changes in its singular values across sub-bands are used to obtain the stereo perception image quality assessment (SP-IQA) index. The features derived from the singular value decomposition for LR-IQA and SP-IQA are used as indicators for the RR method. Finally, RR-OSIQA is obtained by weighting the LR-IQA and SP-IQA indices. Experimental results show that the correlation coefficients between the assessment and subjective perception are satisfactory over five types of distortion: JPEG, JPEG2000, Gaussian blur, white noise, and H.264 distortions.
Article
This paper proposes a new image enhancement algorithm for stereoscopic images based on edge sharpening of wavelet coefficients. The scheme uses the multi-scale property of the wavelet transform to decompose the original image into a low-frequency approximation sub-image and several high-frequency directional sub-images. At each scale, the low-frequency approximation sub-image is processed by an edge sharpening method and is then decomposed again. Finally, the low-frequency approximation sub-image, sharpened over four decomposition levels, and the high-frequency sub-images from the decomposition are reconstructed to obtain the new image. Experimental results show that, in terms of PSNR, visual effect, and subjective DMOS values, the proposed method enhances images better than conventional edge sharpening and the wavelet transform. It also enhances image edges well while preserving details. Meanwhile, the proposed algorithm has the same computational complexity as the wavelet transform.
Article
In this paper, a novel reduced-reference stereoscopic image quality assessment (RR-SIQA) algorithm is proposed by means of an unconventional use of watermarking technique. Watermarking techniques are usually employed for authenticity verification and copyright protection. Here, watermarking technique is adopted to provide a new approach for RR-SIQA. Firstly, the features of image are extracted in reorganized discrete cosine transform domain, and then embedded into the stereoscopic image as invisible hidden information. In order to improve the reliability of the watermarking, some channel coding techniques are applied before the process of embedding watermark. At the receiver, the watermark can be decoded and used to measure the quality of the distorted stereoscopic image. The proposed algorithm overcomes the limitations of other existing methods that require an auxiliary channel. Experimental results illustrate that the proposed algorithm has a good consistency with subjective quality scores, and can reflect the visual perception of stereoscopic image effectively.
Presentation
Recently, it has become increasingly possible to watch 3D videos and movies without glasses. However, there are various display and image quality evaluation methods for glasses-free multi-view 3D, and the display methods are not unified. In this paper, we presented 3D CG images on an 8-viewpoint lenticular lens display, evaluated them subjectively using the ACR and DSIS methods, and analyzed the results statistically. The experiment examined whether assessors were able to view the images comfortably depending on the camera interval and viewpoint, and to what degree they perceived or were annoyed by coding degradation at certain viewpoints.
Article
Several metrics have been proposed in literature to assess the perceptual quality of two-dimensional images. However, no similar effort has been devoted to quality assessment of stereoscopic images. Therefore, in this paper, we review the different issues related to 3D visualization, and we propose a quality metric for the assessment of stereopairs using the fusion of 2D quality metrics and of the depth information. The proposed metric is evaluated using the SAMVIQ methodology for subjective assessment. Specifically, distortions deriving from coding are taken into account and the quality degradation of the stereopair is estimated by means of subjective tests.
Article
We identify, categorize and simulate artifacts which might occur during delivery of mobile stereoscopic video. We consider the stages of the 3D video delivery dataflow: content creation, conversion to the desired format (multiview or dense-depth 3D video), coding/decoding, transmission, and visualization on a 3D display. The 3D vision of humans works by assessing various depth cues: accommodation, binocular depth cues, pictorial cues and motion parallax. As a consequence, any artifact which modifies these cues will impair the quality of a 3D scene. The perceptibility of each artifact can be estimated through subjective tests. The material for such tests should exhibit various artifacts with different amounts of impairment. We present a system for simulation of such artifacts. The artifacts are organized in groups with similar origins, and each group is simulated by a block in a simulation channel. The channel consists of the following blocks: simulation of sensor limitations, simulation of geometric distortions such as the ones caused by the camera optics, spatial and temporal misalignments between video channels, spatial and temporal artifacts caused by coding, transmission losses, and visualization artifacts. For the case of dense depth video representation, format conversion artifacts are added.
Article
Visual discomfort has been the subject of considerable research in relation to stereoscopic and autostereoscopic displays. In this paper, the importance of various causes and aspects of visual discomfort is clarified. When disparity values do not surpass a limit of 1°, which still provides sufficient range to allow satisfactory depth perception in stereoscopic television, classical determinants such as excessive binocular parallax and accommodation-vergence conflict appear to be of minor importance. Visual discomfort, however, may still occur within this limit and we believe the following factors to be the most pertinent in contributing to this: (1) temporally changing demand of accommodation-vergence linkage, e.g., by fast motion in depth; (2) three-dimensional artifacts resulting from insufficient depth information in the incoming data signal yielding spatial and temporal inconsistencies; and (3) unnatural blur. In order to adequately characterize and understand visual discomfort, multiple types of measurements, both objective and subjective, are required.
Article
We are interested in metrics for automatically predicting the compression settings for stereoscopic images so that we can minimize file size but still maintain an acceptable level of image quality. Initially we investigate how Peak Signal to Noise Ratio (PSNR) measures the quality of varyingly coded stereoscopic image pairs. Our results suggest that symmetric, as opposed to asymmetric, stereo image compression will produce significantly better results. However, PSNR measures of image quality are widely criticized for correlating poorly with perceived visual quality. We therefore consider computational models of the Human Visual System (HVS) and describe the design and implementation of a new stereoscopic image quality metric. This metric matches regions of high spatial frequency between the left and right views of the stereo pair and accounts for HVS sensitivity to contrast and luminance changes in regions of high spatial frequency, based on Michelson's Formula and Peli's Band Limited Contrast Algorithm. To establish a baseline for comparing our new metric with PSNR, we ran a trial measuring stereoscopic image encoding quality with human subjects, using the Double Stimulus Continuous Quality Scale (DSCQS) from the ITU-R BT.500-11 recommendation. The results suggest that our new metric is a better predictor of human image quality preference than PSNR and could be used to predict a threshold compression level for stereoscopic image pairs.
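For readers unfamiliar with the two standard quantities named in the abstract above, the following sketch computes PSNR between two images and the Michelson contrast of a luminance patch. It only illustrates the textbook definitions; it is not the stereoscopic metric described in the abstract, and the function names and the 8-bit peak value of 255 are assumptions.

```python
import numpy as np

def psnr(reference, distorted, peak=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(peak^2 / MSE)."""
    ref = np.asarray(reference, dtype=np.float64)
    dst = np.asarray(distorted, dtype=np.float64)
    mse = np.mean((ref - dst) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(peak ** 2 / mse))

def michelson_contrast(patch):
    """Michelson contrast: (Lmax - Lmin) / (Lmax + Lmin)."""
    p = np.asarray(patch, dtype=np.float64)
    l_max, l_min = p.max(), p.min()
    if l_max + l_min == 0:
        return 0.0  # an all-black patch has no defined contrast; report 0
    return float((l_max - l_min) / (l_max + l_min))
```

For a stereo pair, PSNR would typically be computed per view; how the two values are combined is a design choice that differs between studies.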
Article
In this paper, we propose a depth map quality metric for three-dimensional videos, which include stereoscopic and autostereoscopic videos. Recently, a number of studies have investigated the relationship between perceptual quality and the video impairment caused by various compression methods. Here, however, we consider non-compression issues that are introduced during acquisition and display. For instance, a multiple-camera structure may cause impairments such as misalignment. We demonstrate that the depth map can be a useful tool for revealing such impairments. The proposed depth-map-based quality metrics are depth range, vertical misalignment, and temporal consistency. The depth map is acquired by solving the correspondence problem in the stereoscopic video, widely known as disparity estimation. After disparity estimation, the proposed metrics are calculated and integrated into one value which indicates estimated visual fatigue, based on the results of subjective assessment. We measure the correlation between the objective quality metrics and the subjective quality results to validate our metrics.
Article
Imperfections in binocular image pairs can cause serious viewing discomfort. For example, in stereo vision systems eye strain is caused by unintentional mismatches between the left and right eye images (stereo imperfections). Head-mounted displays can induce eye strain due to optical misalignments. We have experimentally determined the level of (dis)comfort experienced by human observers viewing brief presentations of imperfect binocular image pairs. We used a wide range of binocular image imperfections that are representative for commonly encountered optical errors (spatial distortions: shifts, magnification, rotation, keystone), imperfect filters (photometric asymmetries: luminance, color, contrast, crosstalk), and stereoscopic disparities. The results show that nearly all binocular image asymmetries seriously reduce visual comfort if present in a large enough amount. From our data we estimate threshold values for the onset of discomfort. The database collected in this study allows a more accurate prediction of visual comfort from the specification of a given binocular viewing system. Being able to predict the level of visual discomfort from the specification of binocular viewing systems greatly helps the design and selection process. This paper provides the basis.
Article
Visual discomfort has been the subject of considerable research in relation to stereoscopic and autostereoscopic displays, but remains an ambiguous concept used to denote a variety of subjective symptoms potentially related to different underlying processes. In this paper we clarify the importance of various causes and aspects of visual comfort. Classical causative factors such as excessive binocular parallax and accommodation-convergence conflict appear to be of minor importance when disparity values do not surpass one degree limit of visual angle, which still provides sufficient range to allow for satisfactory depth perception in consumer applications, such as stereoscopic television. Visual discomfort, however, may still occur within this limit and we believe the following factors to be the most pertinent in contributing to this: (1) excessive demand of accommodation-convergence linkage, e.g., by fast motion in depth, viewed at short distances, (2) 3D artefacts resulting from insufficient depth information in the incoming data signal yielding spatial and temporal inconsistencies, and (3) unnatural amounts of blur. In order to adequately characterize and understand visual discomfort, multiple types of measurements, both objective and subjective, are needed.
Article
In this paper, the new challenges of 3DTV for subjective assessment are discussed. Conventional 2D methods have severe limitations, which are revealed here. Based on an understanding of the new characteristics brought by 3DTV, changes and additions to the requirements for subjective assessment are proposed in order to develop a novel subjective video quality assessment methodology for 3DTV. In particular, depth rendering for 3D displays is selected for further discussion. The depth rendering abilities are defined as a combination of the physical parameters and the perceptual constraints. We analyze different types of stereoscopic and multiview displays. Several problems regarding depth rendering are discussed in order to highlight the diversity and complexity of assessing 3DTV.
Article
While objective and subjective quality assessment of 2D images and video have been an active research topic in the recent years, emerging 3D technologies require new quality metrics and methodologies taking into account the fundamental differences in the human visual perception and typical distortions of stereoscopic content. Therefore, this paper presents a comprehensive stereoscopic video database that contains a large variety of scenes captured using a stereoscopic camera setup consisting of two HD camcorders with different capture parameters. In addition to the video, the database also provides subjective quality scores obtained using a tailored single stimulus continuous quality scale (SSCQS) method. The resulting mean opinion scores can be used to evaluate the performance of visual quality metrics as well as for the comparison and for the design of new metrics.
Conference Paper
The two-dimensional quality metric peak signal-to-noise ratio (PSNR) is often used to evaluate the quality of coding schemes for integral imaging (II) based 3D-images. The PSNR may be applied to the full II, resulting in a single accumulated quality metric covering all possible views. Alternatively, it may be applied to each view, resulting in a metric that depends on viewing angle. However, both of these approaches fail to capture a coding scheme's distribution of artifacts at different depths within the 3D-image. In this paper we propose a metric that determines the 3D-image quality at different depths. First we introduce this ID measure, and the operations that it is based on, followed by the experimental setup used to evaluate it. Finally, the metric is evaluated on a set of 3D-images, each coded using four different coding schemes, and compared with visual inspection of the introduced coding distortion. The results indicate a good correlation with the coding artifacts and their distribution over different depths.
Article
Objective methods for assessing perceptual image quality traditionally attempted to quantify the visibility of errors (differences) between a distorted image and a reference image using a variety of known properties of the human visual system. Under the assumption that human visual perception is highly adapted for extracting structural information from a scene, we introduce an alternative complementary framework for quality assessment based on the degradation of structural information. As a specific example of this concept, we develop a structural similarity index and demonstrate its promise through a set of intuitive examples, as well as comparison to both subjective ratings and state-of-the-art objective methods on a database of images compressed with JPEG and JPEG2000. A MATLAB implementation of the proposed algorithm is available online at http://www.cns.nyu.edu/~lcv/ssim/.
Article
Objective methods for assessing perceptual image quality have traditionally attempted to quantify the visibility of errors between a distorted image and a reference image using a variety of known properties of the human visual system. Under the assumption that human visual perception is highly adapted for extracting structural information from a scene, we introduce an alternative framework for quality assessment based on the degradation of structural information. As a specific example of this concept, we develop a Structural Similarity Index and demonstrate its promise through a set of intuitive examples, as well as comparison to both subjective ratings and state-of-the-art objective methods on a database of images compressed with JPEG and JPEG2000. A MatLab implementation of the proposed algorithm is available online at http://www.cns.nyu.edu/~lcv/ssim/.
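As a rough illustration of the structural similarity idea, the sketch below evaluates the SSIM expression once over the whole image. This is a simplification and an assumption on our part: the published index is computed in local windows and averaged into a map, whereas `global_ssim` (a hypothetical name) uses global statistics only. The stabilizing constants follow the commonly used choices K1 = 0.01 and K2 = 0.03.

```python
import numpy as np

def global_ssim(x, y, dynamic_range=255.0):
    """SSIM formula applied to whole-image statistics (simplified sketch)."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    if x.shape != y.shape:
        raise ValueError("images must have the same shape")
    c1 = (0.01 * dynamic_range) ** 2  # stabilizes the luminance term
    c2 = (0.03 * dynamic_range) ** 2  # stabilizes the contrast/structure term
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = np.mean((x - mu_x) * (y - mu_y))
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float(numerator / denominator)
```

Identical images score 1, and the score drops as structure is lost. To obtain a per-pixel SSIM map, as used by map-based metrics, the same expression would be evaluated inside a sliding window.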
Conference Paper
Two factors contributing to "ghosting" (image doubling) in plano-stereoscopic CRT displays are phosphor decay and dynamic range of the shutters. A ghosting threshold must be crossed before comfortable fusion can take place. The ghosting threshold changes as image brightness increases and with higher-contrast subjects and those with larger parallax values. Because of the defects of existing liquid crystal shutters, we developed a liquid-crystal shutter with high dynamic range, good transmission, and high speed. With these shutters, residual ghosting is a result of phosphor persistence.
Conference Paper
The ghost-image issue induced by crosstalk in stereoscopic, especially autostereoscopic, display systems has been believed to be a major factor to jeopardize stereopsis. Nevertheless, it is found that in some cases the stereopsis remains effective even with serious crosstalk. In fact, many other factors, such as contrast ratio, disparity, and monocular cues of the images, play important roles in the fusion of stereo images. In this paper, we study the factors in an image that may affect stereo fusion, and provide a macroscopic point of view to get a reasonable criterion of system crosstalk. Both natural and computer-generated images are used for detailed evaluation. Image processing techniques are adopted to produce desired characteristics. The results of this research shall be of reference value to content makers of stereoscopic displays, in addition to their designers.
Article
While the causes and nature of crosstalk, as well as crosstalk reduction techniques have been extensively studied, it is still difficult to eliminate. Perceptually, crosstalk is one of the most annoying distortions in the visualization stage of stereoscopic imaging. Therefore, to understand how users perceive crosstalk is of fundamental importance to improve the quality of 3D presentations. In this paper, we aim at analyzing the impact of crosstalk level, camera baseline and scene content on users' perception of crosstalk. Extensive subjective tests are conducted and the opinion scores are statistically analyzed and discussed. The results indicate that crosstalk level, camera baseline, as well as scene content all have major impacts on the perception of crosstalk. We also show that these three factors correlate with each other in terms of impact on the crosstalk perception. Furthermore, we propose a content descriptor for crosstalk perception (CDCP) and show its effectiveness.
Article
Nowadays, crosstalk is probably one of the most annoying distortions in 3D displays. So far, display designers still have a relative lack of knowledge about the relevant subjective attributes of crosstalk and how they are combined in an overall 3D viewing experience model. The aim of the current experiment is to investigate three perceptually important attributes influencing the overall viewing experience: perceived image distortion, perceived depth, and visual strain. The stimulus material used in this experiment consisted of two natural scenes varying in depth (0, 4, and 12cm camera base distance) and crosstalk level (0, 5, 10, and 15%). Subjects rated the attributes according to the ITU BT.500–10 in a controlled experiment. Results show that image distortion ratings show a clear increase with increasing crosstalk and increasing camera base distance. Especially higher crosstalk levels are more visible at larger camera base distances. Ratings of visual strain and perceived depth only increase with increasing camera base distance and remain constant with increasing crosstalk (at least until 15% crosstalk).
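Stimuli with a controlled crosstalk level, such as the 0, 5, 10, and 15% conditions mentioned above, are often produced with a simple linear leakage model in which a fraction of the unintended view is added to the intended one. The sketch below assumes that model; the abstract does not state how its stimuli were generated, and `add_crosstalk` is a hypothetical helper.

```python
import numpy as np

def add_crosstalk(left, right, level, peak=255.0):
    """Simulate symmetric crosstalk: each eye sees its own view plus
    `level` (a fraction in [0, 1]) of the other view, clipped to the display range."""
    if not 0.0 <= level <= 1.0:
        raise ValueError("level must be a fraction between 0 and 1")
    left = np.asarray(left, dtype=np.float64)
    right = np.asarray(right, dtype=np.float64)
    if left.shape != right.shape:
        raise ValueError("views must have the same shape")
    left_seen = np.clip(left + level * right, 0.0, peak)
    right_seen = np.clip(right + level * left, 0.0, peak)
    return left_seen, right_seen
```

With a larger camera base distance, the two views differ more, so the leaked copy appears farther from the original edge. This is consistent with the observation above that crosstalk is more visible at larger camera base distances.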
Conference Paper
Crosstalk is a critical factor determining the image quality of stereoscopic displays. Also known as ghosting or leakage, high levels of crosstalk can make stereoscopic images hard to fuse and lack fidelity; hence it is important to achieve low levels of crosstalk in the development of high-quality stereoscopic displays. In the wider academic literature, the terms crosstalk, ghosting and leakage are often used interchangeably but it would be helpful to have unambiguous descriptive and mathematical definitions of these terms. The paper reviews a wide range of mechanisms by which crosstalk occurs in various stereoscopic displays, including: time-sequential on PDPs and CRTs (phosphor afterglow, shutter timing, shutter efficiency), MicroPol LCDs (polarization quality, viewing angle), time-sequential on LCDs (pixel response rate, update method, shutter timing & efficiency), autostereoscopic (inter-zone crosstalk), polarised projection (quality of polarisers and screens), anaglyph (spectral quality of glasses and displays). Crosstalk reduction and crosstalk cancellation are also discussed along with methods of measuring and characterising crosstalk.
Chapter
Three-dimensional television (3DTV) technology is becoming increasingly popular, as it can provide a high-quality and immersive experience to end users. Stereoscopic imaging is a technique capable of recording 3D visual information or creating the illusion of depth. Most 3D compression schemes are developed for stereoscopic images, including schemes that apply traditional two-dimensional (2D) compression techniques and schemes that also consider theories of binocular suppression. The compressed stereoscopic content is delivered to customers through communication channels. However, both compression and transmission errors may degrade the quality of stereoscopic images. Subjective quality assessment is the most accurate way to evaluate the quality of visual presentations in either 2D or 3D modality, even though it is time-consuming. This chapter offers an introduction to related issues in perceptual quality assessment for stereoscopic images. We report the results of a subjective quality experiment on stereoscopic images focusing on four typical distortion types: Gaussian blurring, JPEG compression, JPEG2000 compression, and white noise. Furthermore, although many 2D image quality metrics have been proposed that work well on 2D images, developing quality metrics for 3D visual content is an almost unexplored issue. Therefore, this chapter further introduces some well-known 2D image quality metrics and investigates their capabilities in stereoscopic image quality assessment. As an important attribute of stereoscopic images, disparity refers to the difference in image location of an object seen by the left and right eyes, and it has a significant impact on stereoscopic image quality assessment. Thus, a study on the integration of disparity information in quality assessment is presented.
The experimental results demonstrated that better performance can be achieved if the disparity information and original images are combined appropriately in the stereoscopic image quality assessment.
Article
Stereo matching is one of the most active research areas in computer vision. While a large number of algorithms for stereo correspondence have been developed, relatively little work has been done on characterizing their performance. In this paper, we present a taxonomy of dense, two-frame stereo methods. Our taxonomy is designed to assess the different components and design decisions made in individual stereo algorithms. Using this taxonomy, we compare existing stereo methods and present experiments evaluating the performance of many different variants. In order to establish a common software platform and a collection of data sets for easy evaluation, we have designed a stand-alone, flexible C++ implementation that enables the evaluation of individual components and that can easily be extended to include new algorithms. We have also produced several new multi-frame stereo data sets with ground truth and are making both the code and data sets available on the Web. Finally, we include a comparative evaluation of a large set of today's best-performing stereo algorithms.
Conference Paper
Three-dimensional (3D) imaging has attracted considerable attention recently due to its increasingly wide range of applications. Consequently, perceived quality is a highly important issue in assessing the performance of all 3D imaging applications. The perceived distortion and depth of any stereoscopic image are strongly dependent on local features, such as edge, flat and texture areas. In this paper, we propose a no-reference (NR) perceptual quality assessment for JPEG coded stereoscopic images based on segmented local features of artifacts and disparity. In this method, the local feature information of the stereoscopic image pair, such as edge, flat and texture areas, as well as the blockiness and zero crossing rate within the blocks of the images, is evaluated for artifacts and disparity. The results on our subjective stereoscopic image database indicate that the model performs quite well over a wide range of image content and distortion levels.
Conference Paper
Compared to metrics proposed to assess the quality of two-dimensional (2D) images, there are very few metrics devoted to quality assessment of stereoscopic presentations. Crosstalk is one of the most annoying distortions in the visualization stage of stereoscopic imaging technology. This paper proposes a perceptual quality metric which takes characteristics of stereoscopic images into account for predicting quality levels of crosstalk perception in stereoscopic images, based on an understanding of three main factors, crosstalk level, camera baseline and scene content. The experimental results demonstrate that the proposed metric has Pearson correlation of 87.7% when compared to the ground truth results from the subjective experiments on the crosstalk perception, which is much better than the traditional 2D metrics without integrating 3D depth information.
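Figures such as the 87.7% Pearson correlation above are obtained by comparing metric predictions with subjective scores for the same stimuli. A minimal sketch of that computation follows; the function name `pearson` is an assumption, and any numbers passed to it in practice would come from the actual subjective experiment, not from this example.

```python
import numpy as np

def pearson(predicted, subjective):
    """Pearson linear correlation coefficient between two score vectors."""
    a = np.asarray(predicted, dtype=np.float64)
    b = np.asarray(subjective, dtype=np.float64)
    if a.shape != b.shape or a.ndim != 1 or a.size < 2:
        raise ValueError("expected two 1-D vectors of equal length >= 2")
    a_c = a - a.mean()  # center both vectors
    b_c = b - b.mean()
    denom = np.sqrt(np.sum(a_c ** 2) * np.sum(b_c ** 2))
    if denom == 0:
        raise ValueError("correlation is undefined for constant input")
    return float(np.sum(a_c * b_c) / denom)
```

In quality assessment studies the predictions are often passed through a fitted nonlinear mapping before the correlation is computed; that step is omitted here.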
Human factors of 3-D images: Results of recent research at Heinrich-Hertz-Institut Berlin
  • S Pastoor
Reference softwares for depth estimation and view synthesis,
  • M Tanimoto
Xing and A. Perkis High-Quality Visual Experience: Creation, Processing and Interactivity of High-resolution and High-dimensional Video Signals
  • J You
  • G Jiang