Towards standardized 3DTV QoE assessment: Cross-lab study on display technology and viewing environment parameters
Marcus Barkowsky, Jing Li, Taehwan Han, Sungwook Youn, Jiheon Ok, Chulhee Lee, Christer Hedberg, Indirajith Vijai Ananth, Kun Wang, Kjell Brunnström, et al.
To cite this version:
Marcus Barkowsky, Jing Li, Taehwan Han, Sungwook Youn, Jiheon Ok, et al. Towards standardized 3DTV QoE assessment: Cross-lab study on display technology and viewing environment parameters. SPIE Electronic Imaging, Feb 2013, San Francisco, United States. 8648, pp. 864809, 2013. <10.1117/12.2004050>. <hal-00789054>
HAL Id: hal-00789054
https://hal.archives-ouvertes.fr/hal-00789054
Submitted on 15 Feb 2013
HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
Towards standardized 3DTV QoE assessment: Cross-lab study on
display technology and viewing environment parameters
Marcus Barkowsky (a), Jing Li (a), Taehwan Han (d), Sungwook Youn (d), Jiheon Ok (d), Chulhee Lee (d), Christer Hedberg (b), Indirajith Vijai Ananth (b), Kun Wang (b,c), Kjell Brunnström (b,c), Patrick Le Callet (a)
(a) LUNAM Université, Université de Nantes, IRCCyN UMR CNRS 6597, France; Email: Firstname.Lastname@univ-nantes.fr
(b) Netlab Department, Acreo Swedish ICT AB, Sweden; Email: Firstname.Lastname@acreo.se
(c) Mid Sweden University, Sweden
(d) School of Electrical and Electronic Engineering, Yonsei University, South Korea; Email: Lastname@yonsei.ac.kr
ABSTRACT
Subjective assessment of Quality of Experience in stereoscopic 3D requires new guidelines for the environmental setup, as existing standards such as ITU-R BT.500 may no longer be appropriate. A first step is to perform cross-lab experiments under different viewing conditions on the same video sequences. Three international labs performed Absolute Category Rating studies on a freely available video database containing degradations that are mainly related to video quality. Different conditions were used across the labs: passive polarized displays, active shutter displays, and differences in viewing distance, the number of parallel viewers, and the voting device. Implicit variations were introduced by the three different languages in Sweden, South Korea, and France. Although the obtained Mean Opinion Scores are comparable, slight differences occur as a function of the video degradations and the viewing distance. An analysis of the statistical differences between the MOS of the video sequences revealed that obtaining an equivalent number of differences may require more observers in some viewing conditions. It was also seen that aligning the meaning of the attributes used in Absolute Category Rating across languages may be beneficial. Statistical analysis further showed an influence of the viewing distance on the individual votes and the MOS results.
Keywords: Subjective assessment, Viewing environment, Stereoscopic Displays, 3D Quality of Experience, Cross-Lab
Validation, Standardization
1. INTRODUCTION
Reliable and reproducible subjective measurement of Quality of Experience (QoE) in 3DTV is currently being investigated for optimizing 3D service parameters and as a necessary prerequisite for the development of objective models. QoE for 3DTV is known to extend over several psychophysical dimensions, such as picture quality, depth sensation, and visual comfort, which may be combined into higher-level indicators such as naturalness, presence, and visual experience [1].
The perception of degradations measured in subjective assessment studies is influenced by the viewing conditions. In
stereoscopic 3DTV, selecting and calibrating the display may be more important than in 2D, as additional technological
factors for the display such as maximum perceived brightness or crosstalk may have significant influence, and may be
difficult to measure across subjective assessment labs [2][3].
The influence of the viewing environment, such as illumination, viewing distance, voting interface, observer screening, and the training and introduction to the experiment, is expected to differ significantly from what was observed when 2D recommendations, such as ITU-R BT.500, were established. For example, on line-interleaved passive polarized displays, the horizontal resolution often exceeds the vertical resolution per view by a factor of two, raising the question of the appropriate viewing distance, 3H or 5H, for Full-HD content.
Recently, the Video Quality Experts Group (VQEG) started a re-evaluation of the viewing conditions and the display specifications in preparation for new recommendations in standardization organizations such as ITU and EBU [4]. A freely available, common set of video sequences has been published that contains only symmetric video coding and resolution reduction as degradations. Measuring only on the video quality scale may therefore be sufficient, as opposed to asymmetric video coding, changes in camera distance, or transmission errors, which also relate to visual discomfort and depth realism [5][6].
This paper analyzes the results obtained from three subjective experiments on the aforementioned database. The experiments were conducted in three different locations, with different equipment and viewing conditions: at Acreo Swedish ICT AB in Sweden, at the IRCCyN lab of the University of Nantes in France, and at Yonsei University in South Korea. Section 2 introduces the properties of the evaluated video sequences. The subjective experiment design is explained in Section 3, and the obtained observer votes are analyzed in Section 4, followed by the conclusion in Section 5.
2. VIDEO CONTENT AND DEGRADATIONS
A detailed description of the video content and the applied distortions has been published in [7]. A short summary for the
Nantes-Madrid 3D Stereoscopic source sequences (NAMA3DS1) and the distortions introduced in the Coding and
Spatial Degradations (COSPAD1) dataset [8] is presented in the following.
The source content (SRC) has been selected in order to provide a wide variety of different content types as can be seen in
Figure 1. All sequences have been captured using a Panasonic AG-3DA1E twin-lens camera at 1920x1080p25
resolution. Most of them have durations of 16 seconds and were stored uncompressed on a ClearView Extreme System.
SRC2, SRC3, and SRC5 have been stored on the SD cards of the camera at a maximum bitrate of 24Mbit/s per view, and
SRC10 has a length of 13 seconds.
Figure 1: Source sequence thumbnails. SRC1: Barrier (frame #245), SRC2: Basket (#285), SRC3: Boxers (#189), SRC4: Hall (#200), SRC5: Lab (#390), SRC6: News report (#150), SRC7: Phone call (#181), SRC8: Soccer (#193), SRC9: Tree branches (#200), SRC10: Umbrella (#230).
The degradations, called Hypothetical Reference Circuits (HRC), are summarized in Table 1. They have been chosen to exhibit mostly perceptual impairments on the image quality scale, i.e. avoiding influences on depth realism or visual comfort. In some cases this may not hold true; in particular, the strong coding degradations in HRC3 and HRC4 may lead to a sensation of binocular rivalry and therefore to visual discomfort. Depending on the content characteristics, spatial details in high-frequency regions may be lost, which also removes exploitable disparity information and reduces the perceived depth effect. For example, in SRC9 the depth differences between the leaves may no longer be perceived.
Table 1: Degradations

HRC | Type                        | Parameters
0   | None                        | Reference sequence
1   | Video coding (H.264)        | QP 32
2   | Video coding (H.264)        | QP 38
3   | Video coding (H.264)        | QP 44
4   | Still image coding (JPEG2k) | 2 Mb/s
5   | Still image coding (JPEG2k) | 8 Mb/s
6   | Still image coding (JPEG2k) | 16 Mb/s
7   | Still image coding (JPEG2k) | 32 Mb/s
8   | Reduction of resolution     | ↓4 downsampling
9   | Image sharpening            | Edge enhancement
10  | Downsampling & sharpening   | HRC 8 + HRC 9
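As an illustration, the two spatial degradations (HRC8-style resolution reduction and HRC9-style sharpening, chained for HRC10) can be approximated in a few lines. The exact resampling and edge-enhancement filters used to produce the dataset are not specified here, so the block-average downsampling, pixel-repetition upscaling, and unsharp masking below are illustrative assumptions, not the dataset's actual processing.

```python
import numpy as np

def downsample_upsample(img, factor=4):
    """Reduce resolution by block-averaging, then upscale by pixel
    repetition, approximating an HRC8-style x4 resolution reduction."""
    h, w = img.shape
    h2, w2 = h - h % factor, w - w % factor  # crop to a multiple of factor
    low = img[:h2, :w2].reshape(h2 // factor, factor,
                                w2 // factor, factor).mean(axis=(1, 3))
    return np.repeat(np.repeat(low, factor, axis=0), factor, axis=1)

def sharpen(img, amount=1.0):
    """Simple unsharp-mask edge enhancement, approximating HRC9."""
    p = np.pad(img, 1, mode='edge')
    # 3x3 box blur built from the nine shifted neighbourhood views
    blur = sum(p[i:i + img.shape[0], j:j + img.shape[1]]
               for i in range(3) for j in range(3)) / 9.0
    return np.clip(img + amount * (img - blur), 0, 255)

# HRC10 chains both degradations: downsampling followed by sharpening
frame = np.random.default_rng(0).uniform(0, 255, (1080, 1920))
hrc10 = sharpen(downsample_upsample(frame))
```

In a real pipeline each view of the stereo pair would be processed identically, since the dataset uses only symmetric degradations.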
3. SUBJECTIVE EXPERIMENT SETUP
3.1 Viewing environment and displays
In total, four different conditions were used by the three laboratories in Sweden, South Korea and France as shown in
Table 2. In all experiments, the maximum display brightness perceived through the polarized or active shutter glasses
was measured and the background illumination was adjusted to 15% as specified by ITU-R BT.500. The main
differences were in terms of language, display technology, and number of observers, as well as the voting device. At Acreo, half of the sequences were presented at a viewing distance of 3H; the other half was shown at 5H. Additional observer information was obtained in the labs.
Table 2: Viewing environment

Experiment:               EXP1 | EXP2 | EXP3a | EXP3b
Laboratory:               IRCCyN, Nantes | Yonsei, South Korea | Acreo, Sweden (EXP3a/b)
Display:                  Philips 46PFL9705H | Hyundai S465D | Hyundai S465D (EXP3a/b)
Technology:               Active shutter glasses | Polarized FPR glasses | Polarized Frame-Pattern-Retarder (FPR) (EXP3a/b)
Viewing distance:         3H (1.72m) | 3H (1.72m) | 3H (2.5m) | 5H (4.2m)
Voting device:            Screen | Paper | Screen (EXP3a/b)
Language:                 French | Korean | Swedish/English (EXP3a/b)
Number of observers:      29 | 28 | 24 | 24
Obs. per viewing session: 1 | 2 | 1 | 1
Observer screening:       acuity, stereo-acuity, color | stereo-acuity, color | acuity, stereo-acuity, color (EXP3a/b)
Additional observer info: age, gender, 3D viewing experience, directing eye | eye distance | age, gender, 3D viewing experience, simulator sickness questionnaire (EXP3a/b)
3.2 Subjective Test Setup
Absolute Category Rating with Hidden Reference (ACR-HR) as specified in ITU-T P.910 was conducted in all labs. The
training instructions were translated from a shared English version into the three native languages. At Acreo, half of the observers were native Swedish speakers; the other half performed the experiment in English.
All observers watched all 110 processed video sequences (PVS) in EXP1 and EXP2. In EXP3, the PVS were split into two groups. Each group contains all HRCs of SRC2; the other SRCs were distributed equally, and the HRCs were selected so as to obtain a uniform distribution of Mean Opinion Scores (MOS) based on the results of the two other labs, leading to the partitioning shown in Table 3. Half of the observers started viewing SetA at 3H and continued after a break with SetB at 5H; the other half started viewing SetB at 3H before watching SetA at 5H. The voting in EXP3 was performed on three scales simultaneously; "visual discomfort" and "sense of presence" will not be analyzed in this paper.
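As background for the MOS values analyzed in Section 4, a minimal sketch of how a Mean Opinion Score and a confidence interval are commonly computed from the 1-5 ACR votes of one PVS is given below. The Student-t interval follows common practice (e.g. ITU-R BT.500); it is not claimed to be the exact procedure used by the three labs.

```python
import numpy as np
from scipy import stats

def mos_with_ci(votes, confidence=0.95):
    """Mean Opinion Score and confidence interval for one PVS from
    ACR votes on the 5-point scale (1=Bad ... 5=Excellent)."""
    votes = np.asarray(votes, dtype=float)
    n = votes.size
    mos = votes.mean()
    # Student-t half-width, as commonly used for small observer panels
    half = (stats.t.ppf(0.5 + confidence / 2, n - 1)
            * votes.std(ddof=1) / np.sqrt(n))
    return mos, (mos - half, mos + half)

# hypothetical votes from an 8-observer panel for a single PVS
mos, ci = mos_with_ci([4, 5, 3, 4, 4, 5, 3, 4])
```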
Table 3: Subset selection at Acreo

       HRC0 HRC1 HRC2 HRC3 HRC4 HRC5 HRC6 HRC7 HRC8 HRC9 HRC10
SRC1    A    B    B    B    A    A    A    B    A    A    B
SRC2    A,B  A,B  A,B  A,B  A,B  A,B  A,B  A,B  A,B  A,B  A,B
SRC3    B    B    A    A    B    B    A    A    A    A    B
SRC4    B    A    B    A    B    A    A    B    B    A    B
SRC5    A    A    A    B    A    A    B    A    B    B    B
SRC6    B    A    B    B    A    B    A    A    A    B    B
SRC7    A    B    B    A    A    A    B    A    B    B    A
SRC8    B    A    A    A    B    B    B    A    A    B    B
SRC9    A    A    A    A    A    B    A    B    B    B    B
SRC10   A    A    B    A    B    B    A    B    B    B    A
At Yonsei University (EXP2), two observers were seated in front of the screen and the votes were written on paper; in the other labs, only a single observer per session watched the PVS and a voting interface appeared either on a separate screen (IRCCyN, EXP1) or on the same screen (Acreo, EXP3). EXP1 used an active shutter glasses system, while the two other labs used the same polarized passive display.
4. RESULTS
4.1 Inter-lab Mean Opinion Score analysis
The Mean Opinion Scores (MOS) are first analyzed across the different labs. For this analysis, the two viewing distances in EXP3 are not distinguished.
The Pearson Linear Correlation Coefficient (PC), the Spearman Rank Order Correlation Coefficient (SROCC), and the scatterplots depicted in Figure 2 show that, in general, high correlation was obtained between the different labs. Slight differences can be seen in the usage of the voting scale; a linear regression curve was fitted to the data, showing that the observers in EXP1 gave lower votes than those in EXP2, followed by those in EXP3, which is also visible in the grand mean MOS values of 3.10, 3.13, and 3.37, respectively. None of the differences is statistically significant at a 95% confidence level using a Student's t-test or a Wilcoxon rank-sum test.
Figure 2: Inter-lab scatterplots; a-c use different markers to distinguish HRCs, d-f to distinguish SRCs. EXP1 vs. EXP2: PC=0.9740, SROCC=0.9636; EXP1 vs. EXP3: PC=0.9804, SROCC=0.9811; EXP2 vs. EXP3: PC=0.9797, SROCC=0.9634.
The largest divergence from the regression line is observed when comparing EXP1 and EXP2: the RMSE after fitting is 0.27, compared to 0.23 and 0.22 for (b) and (c) in Figure 2, in line with the rank order of the Pearson correlation coefficients. The relation to the viewing environment conditions appears complex; no simple explanation can be provided at this moment.
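The inter-lab agreement figures used above (PC, SROCC, RMSE after a first-order fit, and the significance tests on the grand means) can be reproduced with a short sketch. The use of scipy and a first-order polynomial fit is our assumption for illustration, not a statement about the authors' actual tooling.

```python
import numpy as np
from scipy import stats

def compare_labs(mos_a, mos_b):
    """Agreement measures between two labs' per-PVS MOS vectors:
    Pearson and Spearman correlation, RMSE after a linear fit of
    lab A onto lab B, and p-values for the difference in means."""
    a, b = np.asarray(mos_a, float), np.asarray(mos_b, float)
    pc = stats.pearsonr(a, b)[0]
    srocc = stats.spearmanr(a, b)[0]
    slope, intercept = np.polyfit(a, b, 1)   # linear regression a -> b
    rmse = np.sqrt(np.mean((slope * a + intercept - b) ** 2))
    t_p = stats.ttest_ind(a, b)[1]           # Student t on the grand means
    w_p = stats.ranksums(a, b)[1]            # Wilcoxon rank-sum test
    return pc, srocc, rmse, t_p, w_p
```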
Analyzing the influence of the SRC sequences on the overall votes after linear fitting to EXP1, the highest mean shift occurs for SRC1 (Barrier gate), which was voted with a MOS of 3.33, 3.02, and 3.16 in the three experiments. This may be due to the different display technologies: the fine structures of the trees may be rendered particularly well on the full-resolution shutter glasses display. A display difference was expected for SRC2: the camera pan in this sequence has been criticized by several experts as leading to temporal sampling problems on shutter glasses displays. The observer may get confused by the alternation of the left and right views while perceiving fast motion at the same time, as depth estimation and movement estimation coincide. However, this sequence does not show any particularity, with MOS values of 3.12, 3.19, and 3.15.
The influence of the HRCs on the overall voting is largest for HRC8, the downsampling of the resolution by a factor of 4. After alignment to EXP1, the three MOS values are 2.54, 2.75, and 2.77. While not statistically significant, this may indicate that observers perceive more degradation on the Full-HD active display in EXP1 than on the polarized screen, which has half the vertical resolution per view, in EXP2 and EXP3.
4.2 Number of significantly different PVS
In subjective experiment studies, conclusions often depend on obtaining statistical differences between PVSs. In most cases, increasing the number of subjects increases the number of statistically different PVSs. This has been analyzed for the three labs. The diagram in Figure 3 shows, for a given number of observers, the average number of statistical differences obtained using a Monte Carlo simulation with 1000 trials. It can be observed that in EXP2, the same number of statistical differences can be obtained with approximately 3 more observers than in EXP1. In EXP3, approximately 7 additional observers are required. Analyzing the source of this difference requires further subjective experiments.
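A minimal sketch of such a Monte Carlo procedure is shown below: observers are repeatedly subsampled and the fraction of significantly different PVS pairs is averaged over the trials. The simple unpaired t-test between vote columns is an assumption for illustration; the actual significance criterion used for Figure 3 may differ.

```python
from itertools import combinations

import numpy as np
from scipy import stats

def pct_significant(votes, n_obs, trials=1000, alpha=0.05, seed=0):
    """Monte Carlo estimate (Figure 3 style) of the percentage of PVS
    pairs whose scores differ significantly when only n_obs randomly
    drawn observers are used. votes: (observers, pvs) vote matrix."""
    rng = np.random.default_rng(seed)
    n_total, n_pvs = votes.shape
    pairs = list(combinations(range(n_pvs), 2))
    hits = 0
    for _ in range(trials):
        # draw a random observer subset without replacement
        sub = votes[rng.choice(n_total, size=n_obs, replace=False)]
        for i, j in pairs:
            # unpaired t-test between the two PVSs' vote columns
            if stats.ttest_ind(sub[:, i], sub[:, j])[1] < alpha:
                hits += 1
    return 100.0 * hits / (trials * len(pairs))
```

Sweeping n_obs over, say, 14 to 24 and plotting the returned percentage per lab reproduces the shape of the curve described above.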
Figure 3: Percentage of statistically different PVS as a function of the number of observers
4.3 Influence of language on absolute category votes
Four different languages were used in this subjective experiment: French, Korean, English, and Swedish. It is known that the Absolute Category Rating scale may be influenced by the meaning of the attributes in each language. Based on the research reviewed in [9], the individual votes were aligned to a common scale. Among the four languages, French was selected for the common scale because a complete experiment with 24 observers was available and mapping values to an absolute scale have been published in the literature. First, the French votes were mapped to the values provided in [9]. Second, for each of the other experiments, the numerical vote values were mapped to the common scale under the assumption that the MOS values should coincide, using the Root Mean Square Error (RMSE) as the criterion. For the English and Swedish languages in EXP3, the five attribute values were therefore fitted on a minimum of 12 observers with 110 votes each. Figure 4a shows the results of the alignment. It should be noted that the subjects taking the test in English were not native English speakers, so the difference between this language scale and the others should be interpreted with caution. Most of the attribute rank orders correspond to the expected results from [9]. The remaining diagrams in Figure 4 show the scatterplots of EXP1 compared to EXP2 and EXP3 with the corresponding languages. The Pearson Correlation and Spearman Rank Order coefficients are included in the diagrams and show no significant improvement compared to Figure 2.
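The alignment step can be sketched as a small optimization problem: assign each of the five ACR attributes of one language a value on the common scale so that the resulting per-PVS means best match the reference-scale MOS in the RMSE sense. The Nelder-Mead solver and the equidistant starting values below are illustrative assumptions, not the authors' stated procedure.

```python
import numpy as np
from scipy.optimize import minimize

def fit_category_values(votes, mos_ref):
    """Find values on the common (e.g. 0..1) scale for the five ACR
    attributes (Bad..Excellent) of one language, such that per-PVS
    means of the mapped votes best match the reference MOS.
    votes: (observers, pvs) integer array with entries 1..5;
    mos_ref: reference-scale MOS per PVS."""
    votes = np.asarray(votes)
    mos_ref = np.asarray(mos_ref, dtype=float)

    def rmse(values):
        mapped = values[votes - 1]        # replace each vote by its value
        return np.sqrt(np.mean((mapped.mean(axis=0) - mos_ref) ** 2))

    start = np.linspace(0.1, 0.9, 5)      # initial guess: equidistant labels
    return minimize(rmse, start, method='Nelder-Mead').x
```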
Figure 4: Alignment of adjectives in different languages (a: aligned attribute values per language; b-d: inter-lab scatterplots on the aligned scales)
4.4 Comparing individual votes for viewing distances of 3H and 5H at Acreo
In EXP3, SRC2 was presented to all subjects both at 3H and at 5H viewing distance. Therefore, 24x11=264 pairs are available that compare the two viewing distances. A two-way ANOVA was performed with HRC and viewing distance as within-subject factors (see footnote 1), showing statistical significance for both main factors: slight significance for viewing distance (F(1,23)=6.5, p=0.02) and strong significance for HRC, as expected (F(10,230)=73.73, p<0.01). Analyzing the 11 conditions separately, a statistical difference is only present for HRC9, when the resolution is reduced by a factor of 4 and upscaled again (M3H=2.667, SD3H=0.917; M5H=3.173, SD5H=0.761; Student's t p=0.02384, Wilcoxon p=0.02854).
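The per-HRC paired comparison above (means, standard deviations, and the two p-values) might be computed as sketched below. The paper does not state which Wilcoxon variant was used; the paired signed-rank form is assumed here since the 3H and 5H votes come from the same observers.

```python
import numpy as np
from scipy import stats

def compare_distances(votes_3h, votes_5h):
    """Paired 3H/5H comparison for one HRC: means, standard
    deviations, and p-values from a paired Student-t test and a
    Wilcoxon signed-rank test on the same observers' votes."""
    a = np.asarray(votes_3h, dtype=float)
    b = np.asarray(votes_5h, dtype=float)
    t_p = stats.ttest_rel(a, b)[1]    # paired Student-t test
    w_p = stats.wilcoxon(a, b)[1]     # Wilcoxon signed-rank test
    return a.mean(), a.std(ddof=1), b.mean(), b.std(ddof=1), t_p, w_p
```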
Figure 5: Scatterplot of observer votes at 3H and 5H for SRC2 in EXP3 (uniform random noise added to the 1-5 absolute category rating votes for display purposes only)
1 In EXP3 the videos were divided into two sets, with one SRC (SRC2) and all its HRCs present in both sets as a common set. The common set is therefore not a pure within-subject variable. Analyzing viewing distance as a between-subject effect on the common set, however, gave no significant difference. We believe it is nevertheless closer to a within-subject effect, since the videos of the common set were randomly mixed with the whole dataset.
Figure 4 scatterplot correlations on the aligned common scale: French vs. Korean: PC=0.974, SROCC=0.962; French vs. English: PC=0.966, SROCC=0.955; French vs. Swedish: PC=0.970, SROCC=0.961.
Figure 5 shows a scatterplot of all votes; uniform random noise with an amplitude of 0.5 was added to the quantized 5-point absolute category rating votes for display purposes only. HRC9 uses a different marker style, and the linear regression curve (solid line) deviates from the main diagonal (dotted line) in favor of higher votes at 5H. The resolution reduction may be less perceptible at larger viewing distances: 14 observers ranked these sequences at least one attribute higher at 5H, while only 4 preferred the quality at 3H over 5H; 6 observers were undecided. It should be noted that the analysis may be biased by carry-over effects, as all subjects started their viewing session at 3H.
The results indicate that observers may have seen a difference between the 3H and 5H viewing distances individually, but the influence on the Mean Opinion Score is limited. A larger experiment involving more observers and more PVSs is required.
5. CONCLUSIONS
Quality of Experience assessment in stereoscopic 3D remains a challenging topic. The subjective experiment dataset used by the three labs presented in this paper was deliberately limited to the image quality degradation scale. The obtained Mean Opinion Scores were comparable, although the lab setups differed within reasonable limits. Among the most important differences: active shutter glasses or passive polarized display technology was used; one or two observers judged the video sequences at the same time; and the voting was performed either on paper, on the presentation display, or on a separate display. The analysis showed that differences may also be related to the meaning of the adjectives on the ACR voting scale in the different languages. Last but not least, the viewing distance on passive polarized screens was evaluated, showing that observers tend to perceive fewer degradations when seated at 5H, corresponding to the vertical resolution per view, than at 3H, corresponding to the horizontal resolution per view.
Further experimental studies are required to verify the obtained results, including more observers and further viewing conditions. This research was enabled by a freely available dataset and will be furthered by collecting observers' votes in different viewing conditions, allowing further statistical analysis of the influence of the lab setup conditions.
REFERENCES
[1] Lambooij, M., IJsselsteijn, W., Bouwhuis, D. G., & Heynderickx, I., "Evaluation of Stereoscopic Images: Beyond 2D Quality," IEEE Transactions on Broadcasting, 57(2), 432-444 (2011).
[2] Tourancheau, S., Wang, K., Bulat, J., Cousseau, R., Janowski, L., Brunnström, K., et al., "Reproducibility of crosstalk measurements on active glasses 3D LCD displays based on temporal characterization," SPIE Electronic Imaging Stereoscopic Displays and Applications XXIII, 8288 (2012).
[3] Xing, L., You, J., Ebrahimi, T., & Perkis, A., "Assessment of Stereoscopic Crosstalk Perception," IEEE Transactions on Multimedia, 14(2), 326-337 (2012).
[4] Video Quality Experts Group (VQEG), "Test Plan on Evaluation and Specification of Viewing Conditions and Environmental Setup for 3D Video Quality Assessment," http://www.its.bldrdoc.gov/vqeg/projects/3dtv/3dtv.aspx (2012).
[5] Gutierrez, J., Perez, P., Jaureguizar, F., Cabrera, J., & Garcia, N., "Subjective assessment of the impact of transmission errors in 3DTV compared to HDTV," 3DTV Conference: The True Vision - Capture, Transmission and Display of 3D Video (3DTV-CON) (2011).
[6] Aflaki, P., Hannuksela, M. M., Häkkinen, J., Lindroos, P., & Gabbouj, M., "Subjective study on compressed asymmetric stereoscopic video," 17th IEEE International Conference on Image Processing (ICIP), 4021-4024 (2010).
[7] Urvoy, M., Barkowsky, M., Gutiérrez, J., Cousseau, R., Koudota, Y., Ricordel, V., et al., "NAMA3DS1-COSPAD1: Subjective video quality assessment database on coding conditions introducing freely available high quality 3D stereoscopic sequences," IEEE Fourth International Workshop on Quality of Multimedia Experience (QoMEX) (2012).
[8] IRCCyN-IVC, "Nantes-Madrid 3D Stereoscopic database - Coding and Spatial Degradations NAMA3DS1-COSPAD1," http://www.irccyn.ec-nantes.fr/spip.php?article1052, Nantes (2012).
[9] Zielinski, S., Rumsey, F., & Bech, S., "On Some Biases Encountered in Modern Audio Quality Listening Tests - A Review," Journal of the AES, 56(6), 427-451 (2008).
... Comparison between different subjective 3D video quality evaluations has been presented in [14] (between crowdbased and lab-based test; authors compared MOS quality scores, using video sequences coded with MVC+D and 3D-AVC, with different bitrate, and converted to different synthesized views for subjective test), [20] (three different laboratories; similar setup for tests as in [14]) and [21] (three different laboratories; authors tested ten degradation types from NAMAS1-COSPAD data set [22]). In [23], authors carried an experiment to determine the impact of certain video characteristics such as fast in-scene motion, large changes in disparity, and depth discontinuities caused by subtitles, in terms of visual comfort via different measurement methods. ...
... Spearman's correlation between laboratories in [20] was 0.9340-0.9399. In [21] (3 different laboratories; authors tested ten degradation types from NAMAS1-COSPAD data set), Spearman's correlation between laboratories was 0.9634-0.9811. When comparing number of grades per sequence with [8], it should be noted that 2D image quality assessment can be done more easily than 3D video quality assessment, especially for crowdsourced tests. ...
Article
Full-text available
This article proposes a new method for subjective 3D video quality assessment based on crowdsourced workers—Crowd3D. The limitations of traditional laboratory-based grade collection procedures are outlined, and their solution through the use of a crowd-based approach is described. Several conceptual and technical requirements of crowd-based 3D video quality assessment methods are identified and the solutions adopted described in detail. The system built takes the form of a web-based platform that supports 3D video monitors, and orchestrates the entire process of observer validation, content presentation and quality, depth, and comfort grade recording in a remote database. The crowdsourced subjective 3D quality assessment system uses as source contents a set of 3D video and grades database assembled earlier in a laboratory setting. To evaluate the validity of the crowd-based approach, the grades gathered using the crowdsourced system were analysed and compared to a set of grades obtained in laboratory settings using the same data set. Results show that it is possible to obtain Pearson’s and Spearman’s correlation up to 0.95 for quality Difference Mean Opinion Score and 0.96 for quality Mean Opinion Score, when comparing with laboratory grades. Apart from the present study, the 3D video quality assessment platform proposed can be used with advantage for further related research activities, reducing the time and cost compared to the traditional laboratory-based quality assessments.
... The most important method applied to discomfort measurement is probably the ITU recommendation BT.500-10, which measures a wide range of image impairments on a scale from "imperceptible" to "very annoying" [60]. For example, it has been applied for measuring the effect of crosstalk [11], but it was originally designed to measure image quality, not comfort. Even when it is modified by researchers, this recommendation is still important because it defines many test conditions which can help to improve consistency between tests. ...
Article
This paper reviews the causes of discomfort in viewing stereoscopic content. These include objective factors, such as misaligned images, as well as subjective factors, such as excessive disparity. Different approaches to the measurement of visual discomfort are also reviewed, in relation to the underlying physiological and psychophysical processes. The importance of understanding these issues, in the context of new display technologies, is emphasized.
... Modern quality assessment methods have been adapted to stereoscopic content and, while they do not model discomfort caused by 3D effects explicitly, discomfort measures are typically integrated into a complete QoE score and are therefore relevant to this discussion. Due to the number of recent assessment methods, there is increasing need to standardise QoE across labs [10]. For an overview of quality assessment methods for stereo 3D, we refer to [195]. ...
Article
Full-text available
Visual discomfort is a significant obstacle to the wider use of stereoscopic 3D displays. Many studies have identified the most common causes of discomfort, and a rich body of literature has emerged in recent years with proposed technological and algorithmic solutions. In this paper, we present the first comprehensive review of available image processing methods for reducing discomfort in stereoscopic images and videos. This review covers improved acquisition, disparity re-mapping, adaptive blur, crosstalk cancellation and motion adaptation, as well as improvements in display technology.
... Systems " [ICETE 2013] 05 " A feasible solution to provide cloud computing over optical networks " 0 [Taheri and Ansari 2013] 06 " Paired comparison-based subjective quality assessment of 2 stereoscopic images " 07 " A dynamic system model of time-varying subjective quality of video 0 streams over HTTP " 08 " Robustness of speech quality metrics to background noise and 0 network degradations: Comparing ViSQOL, PESQ and POLQA " [Hines et al. 2013] 09 " Use-and QoE-related aspects of personal cloud applications: An 0 exploratory survey " [Vandenbroucke et al. 2013] 10 " How much longer to go? The influence of waiting time and progress 0 indicators on quality of experience for mobile visual search applied to print media " [Cao et al. 2013] 11 " High definition H.264/AVC subjective video database for evaluating 0 the influence of slice losses on quality perception " [Staelens et al. 2013] 12 " Perceptual experience of time-varying video quality " 0 [Rehman and Wang 2013] 13 " Providing quality of experience for users: The next DBMS challenge " 0 [de Carvalho Costa and Furtado 2013] 14 " Subjective evaluation of stereoscopic image quality " [Moorthy et al. 2013] 3 15 " A network-aware virtual machine placement algorithm in mobile 0 cloud computing environment " [Chang et al. 2013] 16 " Optimal design of virtual networks for resilient cloud services " 0 [Barla et al. 2013] 17 " Spatiotemporal no-reference video quality assessment model 0 on distortions based on encoding " [Zerman et al. 2013b] 18 " Towards standardized 3DTV QoE assessment: Cross-lab study on display 0 technology and viewing environment parameters " [Barkowsky et al. 2013] 19 " A survey on 3D quality of experience and 3D quality assessment " 0 [Moorthy and Bovik 2013] 20 " Approach for service search and peer selection in P2P service overlays " 0 [Fiorese et al. 2013] 21 " On quality of experience of scalable video adaptation " 1 22 " QoE analysis for scalable video adaptation " [Li et al. 
2012a] 0 23 " QoE-aware resource allocation for scalable video transmission over 0 multiuser MIMO-OFDM systems " [Li et al. 2012b] (Continued) 24 " PNN-based QoE measuring model for video applications over LTE 0 system " [He et al. 2012] 25 " QoE assessment of multimedia video consumption on tablet devices " 0 [Floris et al. 2012] 26 " A reputation based vertical handover decision making framework 0 (R-VHDF) " [Loukil et al. 2012] 27 " Comparison of stereoscopic technologies in various configurations " 0 [Fliegel et al. 2012] 28 " Comparison of objective quality metrics on the scalable extension of 0 H.264/AVC " [Lee 2012] //// 29 " ...
Article
In the context of distributed databases (DDBs), the absence of mathematically well-defined equations to evaluate quality of service (QoS), especially with statistical models, seems to have drawn the database community's attention away from the performance guarantees that could be handled by concepts related to quality of experience (QoE). In this article, we target a definition of QoE based on the completeness of QoS to support decisions concerning performance correction at the system level. The study also presents a statistical bibliometric analysis preceding the proposed model, the idea being to show the origin of the first studies with a correlated focus, together with their initial conceptualizations, before proposing a new model. The model provides concise QoS definitions, grouped to form a basis for QoE analysis. It is foreseen that a DDB system will thereby be able to auto-evaluate and become aware of recovery situations before they happen.
... Recently, Barkowsky et al. [2] have studied a cross-lab 3DTV quality assessment method, focusing mainly on the effect of different lab conditions such as passive polarized displays, active shutter displays, viewing distance, the number of parallel viewers, and the voting device. ...
Conference Paper
Consistent and reproducible subjective measurement of 3D video quality is investigated for evaluating 3D service parameters and as an essential criterion towards the development of objective models. This paper analyzes the results obtained from three test laboratories to evaluate the performance of the MVC+D and 3D-AVC 3D video coding standards using identical video content and following similar methodologies and instructions. The correlation between laboratories was investigated using analysis similar to the benchmarking of objective metrics. The analyses show that test laboratories employing different displays and different subjects can still produce highly correlated results, provided they follow similar guidelines when carrying out the assessments.
... Table 2 reports the performance indexes. Results show that there is a very strong correlation between crowd-based and lab-based evaluations, as the correlation indexes are above 0.97, similar to the correlation between different laboratories conducting the same experiment on stereoscopic monitors [16]. The PLCC, RMSE, and OR indexes are slightly better for 'sweep' than for 'mono' when no fitting or linear fitting is considered. ...
Conference Paper
Crowdsourcing is becoming a popular, cost-effective alternative to lab-based evaluations for subjective quality assessment. However, crowd-based evaluations are constrained by the limited availability of display devices among typical online workers, which makes the evaluation of 3D content a challenging task. In this paper, we investigate two possible approaches to crowd-based quality assessment of multiview video plus depth (MVD) content on 2D displays: using a virtual view, and using a free-viewpoint video corresponding to a smooth camera motion during a time freeze. We conducted the corresponding crowdsourcing experiments using seven MVD sequences encoded at different bit rates with the upcoming 3D-AVC video coding standard. The crowdsourcing results demonstrate high correlation with subjective evaluations performed on a stereoscopic monitor in a laboratory environment. The analysis shows no statistically significant difference between the two approaches.
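The citing passage above quantifies crowd-vs-lab agreement with PLCC, RMSE, and OR indexes. A minimal sketch of how such indexes are commonly computed between two sets of Mean Opinion Scores (the function and variable names are illustrative, and the 0.5-MOS outlier tolerance is an assumed default, not a value from the paper):

```python
import numpy as np

def agreement_indexes(mos_lab, mos_crowd, tolerance=0.5):
    """Compare two sets of Mean Opinion Scores.

    PLCC : Pearson linear correlation coefficient.
    RMSE : root-mean-square error after a first-order (linear)
           mapping of the crowd scores onto the lab scale.
    OR   : outlier ratio, the fraction of points whose mapped
           error exceeds `tolerance` (assumed default here).
    """
    x = np.asarray(mos_crowd, dtype=float)
    y = np.asarray(mos_lab, dtype=float)
    plcc = float(np.corrcoef(x, y)[0, 1])
    a, b = np.polyfit(x, y, 1)          # linear fitting: y ~ a*x + b
    err = (a * x + b) - y
    rmse = float(np.sqrt(np.mean(err ** 2)))
    outlier_ratio = float(np.mean(np.abs(err) > tolerance))
    return plcc, rmse, outlier_ratio
```

With nearly collinear score sets the indexes approach their ideal values (PLCC near 1, RMSE near 0, OR of 0), matching the strong agreement reported above.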
Article
This paper provides complementary data to the review of biases in audio quality listening tests by Zieliński et al. (2008) [1]. The paper presents selected illustrations of range-equalizing bias, centering bias, stimulus-spacing bias, contraction bias, and bias due to nonlinear properties of the assessment scale. The illustrations are given in graphical form, accompanied by discussions of the respective biases, using empirical data obtained by various researchers over the past 15 years. The presented collection of illustrations, along with the discussion, may help experimenters identify potential biases affecting their data and avoid typical pitfalls in reporting the outcomes of listening tests.
Chapter
Crowdsourcing enables new possibilities for QoE evaluation by moving the evaluation task from the traditional laboratory environment into the Internet, allowing researchers to easily access a global pool of subjects for the evaluation task. This makes it possible not only to include a more diverse population and real-life environments in the evaluation, but also to reduce the turn-around time and significantly increase the number of subjects participating in an evaluation campaign by circumventing the bottlenecks of a traditional laboratory setup. In order to exploit these advantages, the differences between laboratory-based and crowd-based QoE evaluation must be considered; this chapter therefore discusses both these differences and their impact on QoE evaluation.
Article
Multimedia technology aims to improve people's viewing experience, seeking better immersiveness and naturalness. HDTV, 3DTV, and Ultra HDTV are recent illustrative examples of this trend. Quality of Experience (QoE) in multimedia encompasses multiple perceptual dimensions. For instance, in 3DTV three primary dimensions have been identified in the literature: image quality, depth quality, and visual comfort. In this thesis, focusing on 3DTV, two basic questions about QoE are studied. One is "how to subjectively assess QoE while taking care of its multidimensional aspect?". The other is dedicated to one particular dimension: "what induces visual discomfort, and how can it be predicted?". In the first part, the challenges of subjective assessment of QoE are introduced, and a possible solution called "Paired Comparison" is analyzed. To overcome the drawbacks of the Paired Comparison method, a new formalism based on a set of optimized paired-comparison designs is proposed and evaluated in different subjective experiments. The test results verify the efficiency and robustness of this new formalism. An application is then described, focusing on the evaluation of influence factors on 3D QoE. In the second part, the influence of 3D motion on visual discomfort is studied and an objective visual discomfort model is proposed. The model shows high correlation with the subjective data obtained under various experimental conditions. Finally, a physiological study on the relationship between visual discomfort and eye-blinking rate is presented.
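The thesis's optimized paired-comparison formalism is not detailed in the abstract above. A classical way to turn paired-comparison data into interval scale values is Thurstone Case V scaling, sketched below as a minimal example (function and variable names are illustrative; this is not necessarily the formalism the thesis proposes):

```python
import numpy as np
from statistics import NormalDist

def thurstone_case_v(win_counts):
    """Convert a paired-comparison win matrix into interval scale
    values (Thurstone Case V scaling).

    win_counts[i][j] = how often stimulus i was preferred over j.
    Preference proportions are mapped to z-scores through the
    inverse normal CDF and averaged per stimulus.
    """
    w = np.asarray(win_counts, dtype=float)
    n = w + w.T                        # trials per pair
    p = np.full_like(w, 0.5)
    mask = n > 0
    p[mask] = w[mask] / n[mask]
    np.fill_diagonal(p, 0.5)
    p = np.clip(p, 0.01, 0.99)         # avoid infinite z-scores
    z = np.vectorize(NormalDist().inv_cdf)(p)
    return z.mean(axis=1)              # one scale value per stimulus
```

For a win matrix in which A consistently beats B and B consistently beats C, the recovered scale values preserve that ordering on an interval scale.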
Book
Quality of experience in 3D media requires new and innovative concepts for subjective assessment methodologies. Capturing the observer's opinion may be achieved by providing multiple voting scales, such as 2D image quality, depth quantity, and visual comfort. Pooling these different scales to achieve a single quality percept may be performed differently by each human observer. The chapter dives into the complexity of this subject by explaining the QoE concept using 3DTV as an example. It explains the meaning of the different scales, the current approaches to assess each of them, and the individual influence factors related to the voting which affects reproducibility of the obtained results. Methodologies for assessing the overall preference of experience using pair comparisons with a reasonable number of stimuli are provided. The viewers may also create their own attributes for evaluation in the Open Profiling methodology which has been recently adapted for 3DTV. The drawback of all these assessment methods is that they are intrusive in the sense that the assessor needs to concentrate on the task at hand. Medical and psychophysical measurement methods, such as EEG, EOG, EMG, and fMRI, may eliminate this drawback and are introduced with respect to the different QoE influence factors. Their value at this early stage of development is mostly in supporting and partly predicting subjectively perceived and annotated QoE. The chapter closes with a brief review of the most important technical constraints that impact on the capture, transmission, and display of 3DTV signals. © 2014 Springer Science+Business Media New York. All rights are reserved.
Conference Paper
Research in stereoscopic 3D coding, transmission and subjective assessment methodology depends largely on the availability of source content that can be used in cross-lab evaluations. While several studies have already been presented using proprietary content, comparisons between the studies are difficult since discrepant contents are used. Therefore in this paper, a freely available dataset of high quality Full-HD stereoscopic sequences shot with a semiprofessional 3D camera is introduced in detail. The content was designed to be suited for usage in a wide variety of applications, including high quality studies. A set of depth maps was calculated from the stereoscopic pair. As an application example, a subjective assessment has been performed using coding and spatial degradations. The Absolute Category Rating with Hidden Reference method was used. The observers were instructed to vote on video quality only. Results of this experiment are also freely available and will be presented in this paper as a first step towards objective video quality measurement for 3DTV.
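The Absolute Category Rating with Hidden Reference method mentioned above pairs each processed sequence's rating with the rating its (hidden) reference received, so content-dependent offsets can be removed. A minimal sketch of the usual differential computation, following ITU-T P.910 (names are illustrative):

```python
def acr_hr_differential(vote_pvs, vote_hidden_ref, scale_max=5):
    """Differential viewer score for ACR with Hidden Reference:
    DV = V(PVS) - V(REF) + scale_max, so a processed sequence
    rated as high as its own hidden reference scores scale_max
    (here a 5-point ACR scale)."""
    return vote_pvs - vote_hidden_ref + scale_max
```

For example, a vote of 3 on a processed sequence whose hidden reference was rated 5 yields a differential score of 3, while matching the reference yields the top score.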
Conference Paper
Crosstalk is one of the main display-related perceptual factors degrading image quality and causing visual discomfort on 3D displays. It causes visual artifacts such as ghosting effects, blurring, and lack of color fidelity, which are considerably annoying and can lead to difficulties in fusing stereoscopic images. On stereoscopic LCDs with shutter glasses, crosstalk is mainly due to dynamic temporal aspects: imprecise target luminance (highly dependent on the combination of left-view and right-view pixel color values in disparity regions) and synchronization issues between the shutter glasses and the LCD. These different factors largely influence the reproducibility of crosstalk measurements across laboratories and need to be evaluated in several different locations involving similar and differing conditions. In this paper we propose a fast and reproducible measurement procedure for crosstalk based on high-frequency temporal measurements of both display and shutter responses. It permits full characterization of crosstalk for any right/left color combination and at any spatial position on the screen. Such a reliable objective crosstalk measurement method at several spatial positions is considered a mandatory prerequisite for evaluating the perceptual influence of crosstalk in further subjective studies.
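The paper's procedure relies on high-frequency temporal measurements of display and shutter responses; independent of that, the classical black-level-corrected crosstalk ratio that such luminance measurements feed into can be sketched as follows (a simplified static definition with illustrative names; it does not model the dynamic timing effects the paper measures):

```python
def crosstalk_percent(lum_leak, lum_black, lum_white):
    """Black-level-corrected crosstalk ratio in percent.

    lum_leak : luminance reaching one eye when that eye's view is
               black and the opposite view is full white
    lum_black: luminance when both views are black
    lum_white: luminance when the measured eye's view is full white
    """
    return 100.0 * (lum_leak - lum_black) / (lum_white - lum_black)
```

For instance, a leakage reading of 5 cd/m2 against a 1 cd/m2 black level and a 101 cd/m2 white level corresponds to 4% crosstalk.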
Article
Perceived image quality is a standard evaluation concept for 2D imaging systems. When applied to stereoscopic 3D imaging systems, however, it does not incorporate the added value of stereoscopic depth. Higher-level evaluation concepts (naturalness and viewing experience) are proposed that are sensitive to both image quality and stereoscopic depth. A 3D Quality Model is constructed in which such higher-level evaluation concepts are expressed as a weighted sum of image quality and perceived depth. This model is validated by means of three experiments in which stereoscopic depth (camera base distances and screen disparity) and image quality (white Gaussian noise and Gaussian blur) are varied. The resulting stimuli are evaluated in terms of naturalness, viewing experience, image quality, and depth percept. The analysis revealed that viewing experience and naturalness incorporated variations in image quality to a similar extent, yet the added value of stereoscopic depth was incorporated significantly more by naturalness. This result identifies naturalness as the most appropriate concept for evaluating the 3D quality of stereoscopic stills. The 3D Quality Model based on naturalness as the evaluation concept is validly applicable to stereoscopic stills; the naturalness score is determined approximately 75% by image quality and approximately 25% by the added value of stereoscopic depth.
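The roughly 75%/25% split reported above suggests a weighted-sum form for the 3D Quality Model. A sketch of that form (treating the reported proportions directly as linear weights is a simplification, and the names are illustrative):

```python
def naturalness_score(image_quality, depth_added_value,
                      w_iq=0.75, w_depth=0.25):
    """Weighted-sum 3D Quality Model sketch: naturalness as a
    linear combination of 2D image quality and the added value of
    stereoscopic depth, both assumed on the same rating scale."""
    return w_iq * image_quality + w_depth * depth_added_value
```

With these weights, a stimulus with high image quality but little depth benefit still scores close to its image quality, reflecting the dominance of image quality reported in the abstract.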
Conference Paper
Asymmetric stereoscopic video coding takes advantage of the binocular suppression of human vision by representing one of the views with lower quality. This paper describes a subjective quality test with asymmetric stereoscopic video. Different options for achieving compressed mixed-quality and mixed-resolution asymmetric stereo video were studied and compared to symmetric stereo video. The bitstreams for the different coding arrangements were simulcast-coded according to the Advanced Video Coding (H.264/AVC) standard. The results showed that in most cases, resolution-asymmetric stereo video with a downsampling ratio of 1/2 along both coordinate axes provided quality similar to symmetric and quality-asymmetric full-resolution stereo video. These results were achieved under the same bitrate constraint while the processing complexity decreased considerably. Moreover, in all test cases, the symmetric and mixed-quality full-resolution stereoscopic video bitstreams resulted in similar quality at the same bitrates.
Article
A careful evaluation of listening tests designed to measure audio quality shows that they are vulnerable to systematic errors, which include biases due to affective judgments, response mapping bias, and interface bias. As a result of factors such as personal preferences, the appearance of the equipment, and the listeners' expectations or mood, errors can range up to 40% with respect to the total range of the scale. As a general conclusion, test results should be considered relative, rather than absolute. Scales in previous studies, which have been assumed to be linear, may exhibit departure from linearity. The visual appearance of the user interface may lead to severe quantization of the distribution of scores. Recommendations are offered to improve audio quality tests.
Article
A systematic review of typical biases encountered in modern audio quality listening tests is presented. The following three types of bias are discussed in more detail: bias due to affective judgments, response mapping bias, and interface bias. In addition, a potential bias due to perceptually nonlinear graphic scales is discussed. A number of recommendations aiming to reduce the aforementioned biases are provided, including an in-depth discussion of direct and indirect anchoring techniques.
Article
Stereoscopic three-dimensional (3-D) services do not always prevail when compared with their two-dimensional (2-D) counterparts, though the former can provide more immersive experience with the help of binocular depth. Various specific 3-D artefacts might cause discomfort and severely degrade the Quality of Experience (QoE). In this paper, we analyze one of the most annoying artefacts in the visualization stage of stereoscopic imaging, namely, crosstalk, by conducting extensive subjective quality tests. A statistical analysis of the subjective scores reveals that both scene content and camera baseline have significant impacts on crosstalk perception, in addition to the crosstalk level itself. Based on the observed visual variations during changes in significant factors, three perceptual attributes of crosstalk are summarized as the sensorial results of the human visual system (HVS). These are shadow degree, separation distance, and spatial position of crosstalk. They are classified into two categories: 2-D and 3-D perceptual attributes, which can be described by a Structural SIMilarity (SSIM) map and a filtered depth map, respectively. An objective quality metric for predicting crosstalk perception is then proposed by combining the two maps. The experimental results demonstrate that the proposed metric has a high correlation (over 88%) when compared with subjective quality scores in a wide variety of situations.
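The abstract above names the two maps that feed the metric, an SSIM map for the 2D attributes and a filtered depth map for the 3D attribute, without giving the pooling. One plausible combination is sketched below; it is purely illustrative and not the published metric (all names and the depth-weighting scheme are assumptions):

```python
import numpy as np

def crosstalk_perception_score(ssim_map, depth_map, alpha=0.5):
    """Illustrative pooling of the two attribute maps: the SSIM
    distortion (1 - SSIM) is weighted by normalized depth, so
    crosstalk in regions of larger depth can count differently,
    then averaged. Higher score = less perceived crosstalk."""
    s = np.asarray(ssim_map, dtype=float)
    d = np.asarray(depth_map, dtype=float)
    span = np.ptp(d)
    d_norm = (d - d.min()) / span if span > 0 else np.zeros_like(d)
    distortion = (1.0 - s) * (alpha + (1.0 - alpha) * d_norm)
    return float(1.0 - distortion.mean())
```

A distortion-free SSIM map yields the maximum score of 1.0, and any structural degradation lowers it; the actual published combination would be fitted against the subjective scores described above.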
Article
Recently, broadcast 3D video content has reached households with the first generation of 3DTV. However, few studies have analyzed the Quality of Experience (QoE) perceived by end-users in this scenario. This paper studies the impact of transmission errors in 3DTV, considering that the video is delivered in side-by-side format over a conventional packet-based network. For this purpose, a novel evaluation methodology based on standard single-stimulus methods, with the aim of keeping the viewing conditions as close as possible to the home environment, has been proposed. The effects of packet losses in monoscopic and stereoscopic videos are compared from the results of subjective assessment tests. Other aspects concerning 3D content, such as naturalness, sense of presence, and visual fatigue, were also measured. The results show that although the final perceived QoE is acceptable, some errors cause important binocular rivalry and, therefore, substantial visual discomfort.
IRCCyN-IVC, "Nantes-Madrid 3D Stereoscopic database - Coding and Spatial Degradations NAMA3DS1-COSPAD1," http://www.irccyn.ec-nantes.fr/spip.php?article1052, Nantes, (2012).

Lambooij, M., IJsselsteijn, W., Bouwhuis, D. G., & Heynderickx, I., "Evaluation of Stereoscopic Images: Beyond 2D Quality," IEEE Transactions on Broadcasting, 57(2), 432–444, (2011).