ADABOOST NEURAL NETWORK AND CYCLOPEAN VIEW FOR NO-REFERENCE
STEREOSCOPIC IMAGE QUALITY ASSESSMENT
Oussama Messai (a), Fella Hachouf (a), Zianou Ahmed Seghir (b)
(a) Laboratoire Automatique et Robotique, Université des Frères Mentouri Constantine 1, Algeria.
(b) Computing Department, University of Abbes Laghrour Khenchela, Algeria.
mr.oussama.messai@gmail.com, hachouf.fella@gmail.com, zianou_ahmed_seghir@yahoo.fr
ABSTRACT

Stereoscopic imaging is widely used in many fields. In many scenarios, the quality of stereo images can be affected by various degradations, such as asymmetric distortion. Accordingly, to guarantee the best quality of experience, robust and accurate reference-less metrics are required for the quality assessment of stereoscopic content. Most existing no-reference stereo Image Quality Assessment (IQA) models are not consistent under asymmetric distortions. This paper presents a new no-reference stereoscopic image quality assessment metric based on human visual system (HVS) modeling and an advanced machine-learning algorithm. The proposed approach consists of two stages. In the first stage, a cyclopean image is constructed that accounts for binocular rivalry, in order to cover the asymmetrically distorted part. In the second stage, the gradient magnitude, relative gradient magnitude, and gradient orientation are extracted and used as predictive features for quality. To obtain the best overall performance across databases, the Adaptive Boosting (AdaBoost) idea of machine learning combined with an artificial neural network model has been adopted. The benchmark LIVE 3D phase-I, phase-II, and IRCCyN/IVC 3D databases have been used to evaluate the performance of the proposed approach. Experimental results demonstrate that the proposed metric achieves high consistency with subjective assessment and outperforms blind stereo IQA metrics over various types of distortion. The implementation MATLAB code can be found at: https://github.com/o-messai/3DBIQA-AdaBoost
Index Terms— Stereoscopic quality assessment, no-reference, binocular rivalry, cyclopean view, neural network, AdaBoost.
1. INTRODUCTION
According to the latest theatrical market statistics collected by the Motion Picture Association of America (MPAA), the number of worldwide 3D screens continued to grow in 2016 at a faster pace (17%) than in 2015 (15%) [1]. 3D film production methods have improved in recent years. This development will undoubtedly introduce more successful stereoscopic/3D films, such as Avatar in 2009 (Avatar 2 is expected in late 2020).
Stereoscopic images and videos are not limited to the entertainment industry. Stereo visualization concerns many applications, such as remote education [2], medical body exploration [3], robot navigation [4], and so forth. Therefore, it is reasonable to believe that the amount of stereo content will continue growing throughout the next few years. Since additional perceptual issues such as visual discomfort/fatigue must be considered, the quality assessment of stereo images and videos is becoming more complex. Furthermore, our understanding of the perceptual components that determine the quality of the stimulus remains limited. In most cases, the fatigue experienced by a viewer is caused by asymmetric distortions, which trigger binocular rivalry/suppression. This paper mainly takes this annoying experience into account, as a previous work did [5].
Numerous factors affect stereo/2D image quality during scene capture, such as lighting, color, structure, and noise. In addition, images can be distorted in some applications by compression, transmission, rendering, and display. All these factors influence our judgment when we assess the quality of an image.
Image Quality Assessment (IQA) is critically important in various image processing applications such as acquisition, compression, and especially image enhancement. In general, quality evaluation can be divided into two classes: subjective and objective methods. Subjective evaluation is based on the opinion scores of human observers, mostly expressed in terms of the Mean Opinion Score (MOS) or Difference Mean Opinion Score (DMOS). These methods are effective and reliable for evaluating perceptual quality, but they have a number of limitations: they are time-consuming, costly, and cannot be applied in online applications. On the other hand, objective evaluation aims to design a computational model that predicts human-perceived image quality accurately and automatically. Objective evaluation metrics offer many advantages, such as rapid assessment and low cost, and they are easy to incorporate into image processing systems and applications. For this reason, a significant amount of research has been devoted to the development of objective evaluation metrics that bring us closer to the ultimate goal of simulating the perceptual process of the human visual system (HVS).
Objective methods can be further divided into three categories: Full-Reference (FR), Reduced-Reference (RR), and No-Reference (NR) metrics. In FR methods, the algorithm has access to a pristine version of the stimulus, which is then compared with the deformed version. In many practical applications, the original stimulus is not fully available, so RR and NR methods are needed. Reduced-reference metrics utilize only partial information about the pristine stimulus, while NR methods, also called blind metrics, do not have access to the reference at all.
A lot of effort has been dedicated to understanding the HVS and applying this knowledge to image processing. The main goal of objective IQA is to create an approach that mimics the HVS, so that the perceived quality of an image is predicted automatically. To assess this objective, it is important to compare the performance of the metric with subjective evaluation, which represents perceptual human judgment.
In this paper, a new no-reference model is designed. It will be useful for practical applications, such as compression and quality improvement. The remainder of the paper is organized as follows: Section 2 presents related work. Afterward, the overall framework of the proposed model is described in section 3. Experimental results are discussed in section 4. Finally, section 5 concludes the paper and outlines future work.
2. RELATED WORKS
Generally, stereo IQA methods can be classified into two classes. The first class does not include disparity information: it applies 2D IQA models directly to the stereo IQA problem by computing the mean quality of the left and right views. For instance, Moorthy et al. [6] extended their blind 2D IQA method (DIIVINE) to stereo images. The NR 2D-extended metrics usually extract feature vectors separately for the left and right images, which are then weight-averaged to obtain the final feature vector for training. A novel FR QA metric for stereopairs called the binocular energy quality metric (BEQM) has been proposed by Bensalma et al. [7]. It estimates stereo image quality by computing the binocular energy difference between the original and distorted stereopairs.
The second class is based on the belief that 3D quality cannot be accurately deduced from the average of the two views, but rather from the binocular perceptual view (cyclopean view), in which disparity information is used. Indeed, research has shown that 3D viewing quality is correlated with disparity quality, and the quality-aware features of stereo images are clearly different from those of their 2D counterparts [8, 9]. For example, Akhter et al. [10] designed a no-reference stereo IQA algorithm that extracts features from the disparity map and the stereo image pair. Benoit et al. [11] proposed a FR stereo QA model that employs two measures: the difference between the left and right reference pictures and the distorted images, and the difference between the disparity map of the original stereo pair and that of the distorted one. A similar FR stereo IQA has been suggested by You et al. [12]. They used a variety of 2D IQA models for stereo images and combined the predicted quality scores from both the disparity map and the stereo pair. The authors of [13, 14] proposed PSNR-based stereo QA metrics. Hewage et al. [14] computed the edges of the disparity map, then used PSNR between the pristine and test edge maps to predict the quality. Meanwhile, Gorley et al. [13] did not use or measure disparity/depth information; they compute quality scores on matched feature points delivered by SIFT [15] and RANSAC [16] applied to the left and right views.
Recently, a considerable amount of research has been carried out on how the visual system processes the signals perceived by the two eyes, and on applying these findings to the stereo IQA problem. Chen et al. [17] proposed a FR quality assessment model that utilizes the linear expression of the cyclopean view [18], influenced by binocular suppression/rivalry between the left and right views. An extended version of this framework has been used to create a NR model using natural scene statistics features extracted from stereo image pairs [19]. In [20], another FR metric has been adopted by Hachicha et al. for stereoscopic images. They used the Binocular Just Noticeable Difference (BJND) [21] approach to model binocular rivalry theory. Fang et al. [22] proposed an unsupervised blind model for stereoscopic images. From the monocular and cyclopean view patches, they extracted various quality-aware features in the spatial and frequency domains; a Bhattacharyya-like distance was then used to produce a quality score. Another referenceless dictionary learning (DL)-based metric has been proposed in [23]. The authors simulated the main functional structure of binocular vision; a log-Gabor filter is then used to extract features, and k-nearest-neighbors (KNN) has been deployed to map the quality score.
In particular, natural scene statistics (NSS) have proven to be reliable in IQA algorithms. For example, Su et al. [24] built a NR stereo IQA framework that synthesizes a cyclopean view and then extracts bivariate and generalized univariate NSS features. A further NSS-based framework was developed by Appina et al. [25]. Lv et al. [26] also developed a NR stereo IQA metric; their scheme computes binocular self-similarity and binocular integration using NSS features.
In the literature, compared to the number of full-reference metrics, only a small number of no-reference stereo IQA metrics have been proposed. In addition, the performance of most metrics is not consistent under asymmetric distortions. Consequently, blind stereo IQA is still in its initial development phase. To address these problems, this paper proposes an automatic no-reference metric for stereoscopic images based on human visual perception, taking into account the presence of binocular rivalry/suppression. The model also uses an advanced machine-learning algorithm to achieve better performance.
3. PROPOSED APPROACH
Human binocular perception is a complex visual process that is not yet fully understood. In order to design a model that can assess the quality of stereoscopic images, research on human binocular perception is required. The cyclopean image hypothesis is therefore used, taking into account binocular suppression and the disparity map. Metrics that use this hypothesis, such as the FR stereo IQA model in [17], have achieved good precision.
The Back-Propagation (BP) neural network is widely used for regression and classification [27]. Liu et al. [28] combined the AdaBoost algorithm with BP neural networks and proposed a NR 2D IQA metric that showed robustness and good performance.
Motivated by these ideas and based on our previous works [5, 29], we develop a new NR quality predictor for stereoscopic images. In summary, the model involves three steps: first, a cyclopean image is constructed using Gabor filter responses and the disparity map. Second, gradient characteristics of the cyclopean image and the disparity map are extracted. Finally, to predict a quality score based on feature learning, the AdaBoost algorithm combined with an artificial neural network is used.
3.1. Cyclopean view
The cyclopean image differs from a usual 2D image in the depth information it contains. The first objective of the proposed stereo IQA algorithm is to estimate the actual cyclopean view formed in the observer's mind when a stereo image is presented. The HVS processes and combines visual signals from both eyes into a single combined perception [30]. However, the HVS is not completely understood, so the cyclopean image that is actually formed in our minds remains unclear.
3.1.1. Disparity map
Nowadays, depth/disparity estimation from stereoscopic images is important in many applications, such as augmented reality, 3D reconstruction, and navigation. Disparity information has proven to be a strongly effective factor in stereo image and video quality; it is therefore necessary information for assessing the quality of stereo content.

Fig. 1: Top: left (a) and right (b) views of the stereo image. Bottom: ground-truth disparity (c) versus the estimated disparity (d).

Intensive research has been conducted on the design of stereo matching algorithms (disparity estimation). However, there is no agreement on the type of stereo matching algorithm to be used in stereo IQA, beyond a preference for those with low complexity. Therefore, a stereo matching model with balanced complexity and performance is deployed.
The chosen algorithm is called SSIM-based stereo matching. It is an improved version of the Sum of Absolute Differences (SAD) stereo matching algorithm [31]; the modification consists in replacing SAD by SSIM when computing disparities. SSIM [32] scores are used to select the best matches, by maximizing the SSIM score between the current block from the left image and candidate blocks in the right image along the horizontal direction. The maximum number of pixels to be searched is the maximum disparity. The disparity map values are then the offsets between the current pixel and the best-SSIM location. A 7-by-7 block size has been used, while the maximum disparity has been set to 25. Figure 1 shows an estimated disparity map versus the ground-truth disparity using the SSIM-based stereo matching algorithm.
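As an illustration, the following is a minimal sketch of the SSIM-based block matching described above, assuming rectified grayscale uint8 views stored as NumPy arrays. The 7x7 block and maximum disparity of 25 follow the text; the search loop and the disparity sign convention are our assumptions.

```python
import numpy as np
from skimage.metrics import structural_similarity as ssim

def ssim_disparity(left, right, block=7, max_disp=25):
    h, w = left.shape
    half = block // 2
    disp = np.zeros((h, w), dtype=np.float32)
    for y in range(half, h - half):
        for x in range(half, w - half):
            ref = left[y - half:y + half + 1, x - half:x + half + 1]
            best_score, best_d = -1.0, 0
            # Scan candidate blocks along the horizontal direction only.
            for d in range(min(max_disp, x - half) + 1):
                cand = right[y - half:y + half + 1,
                             x - d - half:x - d + half + 1]
                score = ssim(ref, cand, data_range=255)
                if score > best_score:
                    best_score, best_d = score, d
            disp[y, x] = best_d  # offset of the best-SSIM location
    return disp
```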
3.1.2. Gabor filter responses
Various theories have been proposed to explain binocular rivalry, a visual phenomenon that has recently been investigated by many researchers. Binocular rivalry, or suppression, is a failure of the brain to fuse the left and right views, causing fatigue or discomfort to the viewer.
As mentioned before, binocular rivalry occurs when the two images of a stereo pair present different kinds or degrees of distortion. Consequently, the perceived quality of the stereo image cannot be predicted from the average quality of the left and right views. A reasonable explanation of binocular rivalry has been established by Levelt [18]. Levelt et al. conducted a series of experiments which clearly demonstrate that binocular suppression or rivalry is strongly governed by low-level sensory factors, and concluded that visual stimuli with more contours or higher contrast tend to dominate the rivalry. Motivated by this result, the energy of the Gabor filter bank responses on the left and right images is used to simulate the suppression selection (binocular rivalry) when the cyclopean image is computed.
The Gabor filter bank is a band-pass filter that extracts luminance and chromatic channel features. The filter is related to the function of primary visual cortex cells in primates [33]. It models the frequency-oriented decomposition in the primary visual cortex and captures energy in both space and frequency in a highly localized way [34].

Fig. 2: Gabor filter responses from the left and right views.
The used Gabor filter is as follows:

$$GF(x,y) = \frac{1}{2\pi\sigma_x\sigma_y}\, e^{-\frac{1}{2}\left[\left(\frac{x'}{\sigma_x}\right)^2 + \left(\frac{y'}{\sigma_y}\right)^2\right]}\, e^{i(x\zeta_x + y\zeta_y)}$$

with

$$x' = (x - m_x)\cos(\theta) + (y - m_y)\sin(\theta)$$
$$y' = -(x - m_x)\sin(\theta) + (y - m_y)\cos(\theta) \qquad (1)$$

where $m_x$ and $m_y$ define the center of the Gabor receptive field ($m_x$ and $m_y$ are the $x$ and $y$ locations of the center with respect to the original coordinate system), $\sigma_x$ and $\sigma_y$ are the standard deviations of an elliptical Gaussian envelope along the $x'$ and $y'$ directions, $\zeta_x$ and $\zeta_y$ are the spatial frequencies, and $\theta$ orients the filter. The design of the Gabor filter bank is based on the work conducted by Chun et al. [35].
In visual perception studies, spatial frequency is expressed as the number of cycles per degree of visual angle. The spatial-frequency theory holds that the visual cortex operates not only on a code of lines and straight edges but also on a spatial-frequency code. To support this theory, a series of experiments was conducted by P. Issa et al. [36], who studied the effect of spatial frequency on primary visual cortex responses in cats. The authors concluded that visual cortex neurons respond even more robustly to sine-wave gratings at specific angles in their receptive fields than they do to edges or bars. Therefore, using a band-pass filter over multiple orientations is favorable for extracting features to which the visual cortex responds. The choice of the spatial center frequency is inspired by the result of Schor et al. [37], who found that the stereoscopic acuity of human vision normally falls off quickly for stimuli dominated by spatial frequencies lower than 2.4 cycles/degree. Based on these findings, filters with spatial center frequencies in the range from 2.4 to 4 cycles/degree should produce responses to which a human observer is more sensitive. Therefore, the local energy is estimated by summing the Gabor filter magnitude responses over eight orientations at a spatial frequency of 3.67 cycles/degree ($\zeta_x = \zeta_y = 3.67$). The standard deviations $\sigma_x$ and $\sigma_y$ are set to 0.01 ($\sigma_x = \sigma_y = 0.01$). As an example, figure 2 shows the filter outputs on the left and right views.
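Below is a hedged sketch of the local-energy computation: Gabor magnitude responses summed over the eight orientations of Table 1. Note that skimage's `gabor` takes frequency in cycles/pixel, so the 3.67 cycles/degree value would need conversion for a given viewing geometry; the default frequency below is only a placeholder assumption.

```python
import numpy as np
from skimage.filters import gabor

def gabor_local_energy(image, frequency=0.1, n_orient=8):
    energy = np.zeros_like(image, dtype=np.float64)
    for k in range(n_orient):
        theta = k * np.pi / n_orient        # 0, 22.5, ..., 157.5 degrees
        real, imag = gabor(image, frequency=frequency, theta=theta)
        energy += np.hypot(real, imag)      # magnitude of complex response
    return energy
```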
3.1.3. Cyclopean image construction
The visual signals from the two eyes are combined by the HVS in a process called binocular summation, which enhances vision and increases the ability to detect faint objects [30]. However, current knowledge of the HVS is too modest to guide the development of a mathematical model that perfectly simulates the process in the human brain. Therefore, a popular choice is to replace the complex simulation with simplified mathematical models. A linear model of the cyclopean image was first proposed in 1968 by Levelt [18] to explain the phenomenon of binocular rivalry.

Fig. 3: Synthesized cyclopean image by the proposed framework.
The model is as follows:

$$C = w_l I_l + w_r I_r \qquad (2)$$

where $I_l$ and $I_r$ are the left and right images, respectively, and $w_l$ and $w_r$ are the weighting coefficients for the left and right eyes, with $w_l + w_r = 1$.
In order to construct the cyclopean image, Levelt's linear model has been used. The energy of the Gabor filter bank responses is used to compute the weights, while the SSIM-based stereo matching algorithm is employed to create the disparity map. The used model is:

$$C(x,y) = w_l(x,y) \times I_l(x,y) + w_r(x+m,y) \times I_r(x+m,y) \qquad (3)$$

where the weights $w_l$ and $w_r$ are given by:

$$w_l(x,y) = \frac{GI_l(x,y)}{GI_l(x,y) + GI_r(x+m,y)} \qquad (4)$$

$$w_r(x+m,y) = \frac{GI_r(x+m,y)}{GI_l(x,y) + GI_r(x+m,y)} \qquad (5)$$

where $GI_l$ and $GI_r$ are the summations of the Gabor filter magnitude responses from the left and right views, respectively, and $m$ is the disparity index that maps pixels from the left image $I_l$ to those in the right image $I_r$. The filter of the form (1) is used to compute the magnitude responses over eight orientations for better accuracy. In equation (1), $\theta$ refers to the filter's orientation; Table 1 shows the used orientation degrees.
Table 1: Magnitude response orientation degrees.

Orientation number | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
Orientation degree | 0 | 22.5 | 45 | 67.5 | 90 | 112.5 | 135 | 157.5
Results from previous work [5] have shown that the synthesized cyclopean view is reliable for perceptual quality evaluation. Figure 3 shows an example of a cyclopean image; here, a stereo image without distortion has been used (Figure 1, left view (a) and right view (b)). Figure 4 summarizes the construction steps of the cyclopean image.
Fig. 4: The flowchart of the formed cyclopean image.
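The following is a minimal sketch of the cyclopean synthesis of Eqs. (3)-(5), assuming integer disparities such as those from the `ssim_disparity` sketch and Gabor energies from `gabor_local_energy` above; the epsilon guard against a zero denominator is our addition.

```python
import numpy as np

def cyclopean(left, right, disp, energy_l, energy_r):
    h, w = left.shape
    cyc = np.zeros((h, w), dtype=np.float64)
    for y in range(h):
        for x in range(w):
            xr = int(np.clip(x + disp[y, x], 0, w - 1))  # matched column
            gl, gr = energy_l[y, x], energy_r[y, xr]
            wl = gl / (gl + gr + 1e-12)                  # Eq. (4)
            cyc[y, x] = wl * left[y, x] + (1 - wl) * right[y, xr]  # Eq. (3)
    return cyc
```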
3.2. Feature extraction
In addition to screen height and the number of displayed pixels, viewing conditions, namely the visual angle and viewing distance, also influence the perceived stereo image quality. However, the visual angle and viewing distance are not taken into consideration in this study. We instead simulate the HVS and focus on local pixel distortions that can occur in 3D image processing applications.
The primary visual cortex receives visual information coming from the eyes. After it reaches the visual cortex, the human mind processes this sensory input and uses it to understand the scene. Image gradients provide important visual information that is essential for understanding the scene. Therefore, we believe such information is important for the human visual system to understand a scene and judge its quality. This view is supported by the numerous FR IQA schemes based on the concept of gradient similarity. For our problem, we use gradient magnitude and orientation as quality-aware features to evaluate the quality of stereoscopic images.
3.2.1. Gradient magnitude and orientation
Three gradient maps are produced from the cyclopean image and the disparity map using the horizontal and vertical directional derivatives, $d_x$ and $d_y$ respectively. A Gaussian distribution function is used as a kernel in a 5-by-5 mask to compute the directional gradient components $d_x(i,j)$ and $d_y(i,j)$. The mask weights are samples from the 2D Gaussian function defined as follows:

$$G(x,y,\sigma) = \frac{1}{2\pi\sigma^2}\, e^{-\frac{x^2+y^2}{2\sigma^2}} \qquad (6)$$

where $\sigma$ controls the amount of smoothing. If $\sigma$ increases, more samples must be taken to represent the Gaussian function accurately. The derivatives have been computed using central differences. In our implementation, we used a limited smoothing mask, as it tends to extract more edge information, which makes the gradients more sensitive to distortions. Thus, $\sigma$ is fixed to 0.5 ($\sigma = 0.5$).
The gradient magnitude (GM), relative gradient orientation (RO), and relative gradient magnitude (RM) are computed for each cyclopean image and disparity map. The gradient magnitude is defined as:

$$|\nabla I(i,j)|_{GM} = \sqrt{d_x(i,j)^2 + d_y(i,j)^2} \qquad (7)$$

while the gradient orientation is given by:

$$\angle\nabla I(i,j) = \arctan\left(\frac{d_y(i,j)}{d_x(i,j)}\right) \qquad (8)$$
Fig. 5: Examples of the constructed cyclopean image and the GM, RM, and RO maps for different types of distortion.
The relative gradient orientation is defined as follows:

$$\angle\nabla I(i,j)_{RO} = \angle\nabla I(i,j) - \angle\nabla I(i,j)_{AV} \qquad (9)$$

where the local average orientation is:

$$\angle\nabla I(i,j)_{AV} = \arctan\left(\frac{d_y(i,j)_{AV}}{d_x(i,j)_{AV}}\right) \qquad (10)$$

while the average directional derivative over $x$ and $y$ is defined by:

$$\nabla I_\gamma(i,j)_{AV} = \frac{1}{MN}\sum_{m,n} \nabla I_\gamma(i-m, j-n) \qquad (11)$$

where $M$ and $N$ describe the size of the patches; a $3\times 3$ square neighborhood has been chosen ($M = N = 3$), and $\gamma$ refers either to the horizontal $x$ or the vertical $y$ direction. Finally, the relative gradient magnitude is defined by:

$$|\nabla I(i,j)|_{RM} = \sqrt{\left(d_x(i,j) - d_x(i,j)_{AV}\right)^2 + \left(d_y(i,j) - d_y(i,j)_{AV}\right)^2} \qquad (12)$$
The standard deviation of each gradient histogram (GM, RO, RM) is computed as the final feature, giving $S_{GM}$, $S_{RO}$ and $S_{RM}$, respectively. The standard deviation is the square root of the variance, defined by:

$$S(h) = \sqrt{\frac{1}{N-1}\sum_{x=1}^{N}\left(h(x) - \bar{h}\right)^2} \qquad (13)$$

where $\bar{h}$ is the sample mean of the histogram $h(x)$ (normalized to unit sum), and $N$ is the number of observations in the sample.
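A sketch of the three per-map features of Eqs. (7)-(13) follows, assuming Gaussian-derivative filtering with sigma = 0.5 as in the text; the 64-bin histogram is an assumption, since the bin count is not specified.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, uniform_filter

def gradient_features(img, sigma=0.5):
    # Gaussian-smoothed directional derivatives d_y and d_x.
    dy = gaussian_filter(img, sigma, order=(1, 0))
    dx = gaussian_filter(img, sigma, order=(0, 1))
    gm = np.hypot(dx, dy)                                 # Eq. (7), GM
    dx_av = uniform_filter(dx, size=3)                    # Eq. (11), 3x3
    dy_av = uniform_filter(dy, size=3)
    rm = np.hypot(dx - dx_av, dy - dy_av)                 # Eq. (12), RM
    ro = np.arctan2(dy, dx) - np.arctan2(dy_av, dx_av)    # Eq. (9), RO
    feats = []
    for m in (gm, ro, rm):
        h, _ = np.histogram(m, bins=64)
        h = h / h.sum()                    # normalize histogram to unit sum
        feats.append(np.std(h, ddof=1))    # Eq. (13), standard deviation
    return feats                           # [S_GM, S_RO, S_RM]
```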
Display resolution is an important factor in judging quality; the subjective evaluation of a given stereo image varies when this factor changes. Therefore, for objective evaluation, a multiscale method is a convenient way to incorporate image details at different resolutions. Wang et al. [38] proposed a multiscale quality assessment metric that outperforms the single-scale SSIM [32] model. The authors compared different down-sampling parameters and noted that down-sampling by a factor of 0.5 gives the best performance. Consequently, the cyclopean image is down-sampled by a factor of 0.5 (divided by 2), accounting for changes in stereo image resolution and viewing conditions.
For example, the distance from the viewer to the screen can change the size of the cyclopean view formed in the brain. The features $S_{GM}$, $S_{RO}$ and $S_{RM}$ are computed at each scale, yielding six feature elements from the cyclopean image. The final feature vector $F$ has nine elements:

$$F = [S_{GM1}, S_{RO1}, S_{RM1}, S_{GM2}, S_{RO2}, S_{RM2}, S_{GMd}, S_{ROd}, S_{RMd}] \qquad (14)$$

Fig. 6: Flowchart of the proposed measure.

Fig. 7: 3D plot of the extracted features $S_{GM1}$, $S_{RO1}$, and $S_{RM1}$ from the cyclopean image using the LIVE 3D phase I, phase II and IVC 3D databases.
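Assembling the nine-element feature vector $F$ of Eq. (14) can be sketched as follows, assuming `gradient_features` from the earlier sketch; the 0.5 down-sampling factor follows the text.

```python
import numpy as np
from skimage.transform import rescale

def feature_vector(cyclopean_img, disparity_map):
    f1 = gradient_features(cyclopean_img)                  # scale 1
    f2 = gradient_features(rescale(cyclopean_img, 0.5))    # scale 2
    fd = gradient_features(disparity_map)                  # disparity map
    return np.array(f1 + f2 + fd)                          # 9 features
```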
Figure 5 illustrates the computed GM, RM, and RO maps from the cyclopean image over five well-known distortions. It can be observed that the distortions affect the computed gradient maps differently.
Figure 6 displays the overall process for measuring the quality of stereo images, while Figure 7 shows a 3D plot of the extracted features over the three databases. The represented features have been extracted from the cyclopean image at scale 1. The colored dots in the figure represent the extracted indicators in three dimensions. This 3D view shows that the feature points follow the same pattern on all databases, which contain stereo images of different quality. Consequently, the extracted gradient indicators can be deployed for assessing the quality of stereoscopic images.
3.3. Learning for image quality evaluation
Machine Learning (ML) plays an important role in the development of modern picture quality models, although only a limited number of IQA models have used advanced ML techniques such as AdaBoost [39]. AdaBoost, short for Adaptive Boosting, is an algorithm that sequentially trains a new simple model based on the errors of the previous model. A weight is assigned to each model, and in the end the whole set is combined into an overall predictor. AdaBoost is one of the most useful ensemble methods [40]. It can be used in conjunction with many other types of learning algorithms, usually called Weak Learners (WL). A boosting ensemble generally outperforms a single feature-learning model [41]. The boosting procedure tends to discover the examples and data points that are hard to predict and focuses the next model on predicting these examples better, by sequentially building a new simple model based on the errors of the previous one.
3.3.1. AdaBoost neural networks
The Back-Propagation neural network is a powerful predictor. To improve the performance of this neural network regression model, the AdaBoost idea has been implemented, with an Artificial Neural Network (ANN) with two hidden layers used as the WL. The AdaBoost neural network can be less susceptible to over-fitting than other learning algorithms; to further guard against it, 15% of the training dataset has been held out as a validation set for each neural network model.
The overall flow of the AdaBoost BP neural network model that computes the predicted output $Q$ on a test set $F$ is as follows. First, set the number $L$ of Weak Learners (the BP artificial neural network models). Second, train the $i$-th ANN on the sets $X_j$ and $Y_j$, and estimate the predicted output $Y^{pred}_{i,j}$ on the testing set. Afterward, a distribution $D_i$ for the $i$-th ANN is used to compute the evaluation error, defined below (the initial values of $D_1$ are set to 1):
Algorithm 1: Adaptive Boosting (AdaBoost) regression.

Input: L, F // L: the number of Weak Learners (WL); F: stereo image feature vector.
Output: Q // Q: the predicted quality.
Data: dataset for training and testing.

n ← 1; i ← 1 // Initialization.
Tr ← random(Data, 80%); Te ← random(Data, 20%) // Divide the data randomly for training and testing.
M ← size(Te) // Get the testing vector size.
D1(1:M) ← 1 // Initialize the first distribution.
for n = 1 : L do
    Ttr ← random(Tr, 85%); Vtr ← random(Tr, 15%) // Hold out 15% of the train set for validation.
    WLn ← random(weights, biases)
    Train(WLn, Ttr) // Train and validate the WL.
    Terr(1:M) ← 0; Err ← 0 // Reset the testing and evaluation errors for each WL.
    Terr ← Test(WLn, Te) // Compute the testing error.
    for i = 1 : M do
        if Terr(i) > 0.2 then
            Err ← Err + Dn(i) // Accumulate the evaluation error of the n-th WL.
            Dn+1(i) ← Dn(i) × (1 + δ) // Update the distribution for the next WL.
        else
            Dn+1(i) ← Dn(i)
    wn ← 1 / e^Err
    Pn ← WLn(F) // Get the prediction of the n-th WL.
Q ← Σ(n=1..L) wn · Pn // Compute the final quality score.
$$D_{i+1,j} = D_{i,j} \times \left(1 + \delta \cdot I\!\left(Y_j - Y^{pred}_{i,j}\right)\right), \quad i = 1,\dots,L, \quad j = 1,\dots,M \qquad (15)$$
The $i$-th ANN evaluation error $Err_i$ with the corresponding distribution $D_i$ is defined as:

$$Err_i = \sum_{j=1}^{M} \left| D_{i,j} \times I\!\left(Y_j - Y^{pred}_{i,j}\right) \right| \qquad (16)$$

where $I$ is a binary function:

$$I(x) = \begin{cases} 1 & \text{if } x > 0.2, \\ 0 & \text{otherwise.} \end{cases} \qquad (17)$$
Fig. 8: Structure of the used BP neural network.
Third, assign a weight $w_i$ to the $i$-th ANN using its error $Err_i$. Finally, the $i$-th ANN predicts the quality $P_i$ for the input $F$. For each ANN model, the adjusted weights and biases are randomly initialized; hence, the procedure produces $L$ WL models with different prediction scores. The error threshold of the binary function $I$ is set to 0.2. $M$ is set to the size of the vector dedicated to testing, and $j$ indexes the $j$-th element of a vector, ranging over the integers between 1 and $M$; for instance, $D_{1,j}$ stands for the $j$-th element of the vector $D_1$. $\delta$ is a constant multiplication factor; both the threshold and $\delta$ are fixed to 0.2.
A convex function is used to convert the error of each ANN into its weight, so that ANN models with a low error receive a high weight and models with a high error receive a small weight. $\omega_i$ is the $i$-th ANN weight, given as:

$$\omega_i = \frac{1}{e^{Err_i}} \qquad (18)$$

The overall predicted measure is given by the weighted sum of the collection:

$$Q = \sum_{i=1}^{L} \omega_i \times P_i \qquad (19)$$
For the training dataset output, human scores in the form of DMOS are min-max normalized to [0, 1]. Hence, the predicted measure ranges from 0 to 1, where values closer to 0 indicate better stereo image quality. Algorithm 1 describes the developed AdaBoost regression algorithm.
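A condensed sketch of Algorithm 1 is shown below, using scikit-learn's MLPRegressor as the Weak Learner. L = 20, delta = 0.2, and the 0.2 error threshold follow the paper; note that MLPRegressor applies a single activation (ReLU here) to both hidden layers, unlike the tanh+ReLU mix of Fig. 8, and the solver settings are assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def adaboost_nn(X_tr, y_tr, X_te, y_te, F, L=20, delta=0.2, thr=0.2):
    D = np.ones(len(y_te))            # distribution over test points
    weights, preds = [], []
    for _ in range(L):
        wl = MLPRegressor(hidden_layer_sizes=(9, 9), max_iter=2000,
                          early_stopping=True, validation_fraction=0.15)
        wl.fit(X_tr, y_tr)            # train one Weak Learner
        hard = np.abs(wl.predict(X_te) - y_te) > thr   # I(x), Eq. (17)
        err = D[hard].sum()                            # Eq. (16)
        D[hard] *= (1 + delta)        # re-weight hard examples, Eq. (15)
        weights.append(1.0 / np.exp(err))              # Eq. (18)
        preds.append(wl.predict(F.reshape(1, -1))[0])
    return float(np.dot(weights, preds))               # Eq. (19)
```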
3.3.2. Network architecture
The AdaBoost neural network has been used to predict stereo image quality from the features obtained from the disparity map and cyclopean image. In the BP neural network, nine input cells have been deployed, matching the size of the final feature vector $F$ described in equation (14); the elements of $F$ are also shown in figure 6 as input elements of the ANN. Two hidden layers have been employed with nine neurons each. The applied transfer functions are the tangent sigmoid and ReLU for the first and second hidden layers, respectively, as shown in figure 8. A pure linear transfer function $f(x) = x$ has been used for the single-node output layer. For the hidden layers, a number of tests were carried out using various activation functions; the tangent sigmoid and ReLU functions were selected for their best performance.
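For concreteness, the two-hidden-layer architecture of Fig. 8 can be sketched in PyTorch: nine inputs, two hidden layers of nine neurons (tangent sigmoid, then ReLU), and a single linear output node.

```python
import torch.nn as nn

class WeakLearner(nn.Sequential):
    def __init__(self, n_in=9, n_hidden=9):
        super().__init__(
            nn.Linear(n_in, n_hidden), nn.Tanh(),      # first hidden layer
            nn.Linear(n_hidden, n_hidden), nn.ReLU(),  # second hidden layer
            nn.Linear(n_hidden, 1),                    # linear output f(x)=x
        )
```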
4. EXPERIMENTAL RESULTS AND ANALYSIS
The proposed approach has been tested on different databases. The obtained results have been compared to several FR and NR stereo IQA metrics, including six FR and eight NR stereo schemes. The standard performance assessment used by the Video Quality Experts Group (VQEG) has been adopted: objective scores are fitted to the subjective ones using a logistic function [42] based on five parameters ($\theta_1$, $\theta_2$, $\theta_3$, $\theta_4$ and $\theta_5$). The logistic mapping function used for the nonlinear regression is given by equation (20):

$$Q_{map} = \theta_1\left(\frac{1}{2} - \frac{1}{1 + \exp\left(\theta_2\left(Q - \theta_3\right)\right)}\right) + \theta_4 Q + \theta_5 \qquad (20)$$

where $Q$ and $Q_{map}$ are the objective quality scores before and after the nonlinear mapping, and the $\theta_i$ ($i = 1$ to $5$) are selected for the best fit.
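A sketch of the five-parameter logistic mapping of Eq. (20), fitted with SciPy, is given below; the initial parameter guesses are assumptions.

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic5(q, t1, t2, t3, t4, t5):
    # Eq. (20): theta_1 * (1/2 - 1/(1 + exp(theta_2 (q - theta_3))))
    #           + theta_4 * q + theta_5
    return t1 * (0.5 - 1.0 / (1.0 + np.exp(t2 * (q - t3)))) + t4 * q + t5

def fit_mapping(q_obj, dmos):
    q_obj, dmos = np.asarray(q_obj), np.asarray(dmos)
    p0 = [dmos.max(), 1.0, q_obj.mean(), 0.0, dmos.mean()]
    params, _ = curve_fit(logistic5, q_obj, dmos, p0=p0, maxfev=10000)
    return logistic5(q_obj, *params)   # mapped objective scores Q_map
```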
Three widely used performance indicators have been chosen to benchmark the proposed metric against the relevant state-of-the-art techniques: the Pearson linear correlation coefficient (LCC), Spearman's rank-order correlation coefficient (SROCC), and the root mean squared error (RMSE), all computed between the objective and human scores (DMOS). LCC and RMSE assess prediction accuracy, while SROCC evaluates prediction monotonicity. Higher values of LCC and SROCC (close to 1) and lower values of RMSE (close to 0) indicate superior rank-order correlation and better precision with respect to human quality judgments. For a perfect match between the objective and subjective scores, LCC = SROCC = 1 and RMSE = 0. Cross-validation provides a more accurate estimate of model performance; several cross-validation techniques exist, such as Leave-One-Out Cross-Validation (LOOCV) and K-fold cross-validation. The K-fold technique uses all data points to contribute to an understanding of how well the model learns from some data and predicts new data.
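The three indicators can be sketched as follows, assuming `pred` holds the mapped objective scores and `dmos` the subjective ones.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

def benchmark(pred, dmos):
    pred, dmos = np.asarray(pred), np.asarray(dmos)
    lcc, _ = pearsonr(pred, dmos)       # prediction accuracy
    srocc, _ = spearmanr(pred, dmos)    # prediction monotonicity
    rmse = np.sqrt(np.mean((pred - dmos) ** 2))
    return lcc, srocc, rmse
```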
In order to ensure that the proposed approach is robust across content and driven by quality-aware indicators, 5-fold cross-validation over the three databases has been used. For every database, the dataset has been divided into 5 folds, where each fold contains an 80% train set and a 20% test set selected randomly. Overlap between the test and train sets has been avoided, to ensure that the reported results do not depend on features derived from known spatial information, which could artificially improve performance. To demonstrate the generalization of the proposed metric across databases, cross-database tests have been conducted. For further statistical performance analysis, T-test scores have been computed over the correlation coefficients LCC and SROCC. In the different tests, the correlation of the feature vector with subjective human judgment has been studied, and the complexity and running time of the proposed approach have been measured as well. Finally, the influence of the formed cyclopean image and disparity map has been examined.
4.1. Datasets
The different experiments have been carried out on three databases: IRCCyN/IVC 3D, LIVE 3D phase I, and LIVE 3D phase II. The IVC 3D image quality database, established in 2008 [11], is the first public-domain database on stereoscopic image quality. Test conditions include JPEG and JPEG2000 compression as well as blur. The dataset contains 96 stereoscopic images with associated subjective scores, at a resolution of 512 × 512 pixels. It is built from 6 reference images and 16 distorted versions of each source, generated from 3 distortion types (JPEG, JP2K, blur) applied symmetrically to both views of the stereopair.
The popular databases LIVE 3D-I and LIVE 3D-II were established in 2012 and 2013, respectively [43, 17]. Phase I consists of 365 distorted stereo images with a resolution of 640 × 360 pixels. There are eighty stereo images for each of JPEG, JPEG2000 (JP2K), White Noise (WN), and Fast Fading (FF); the remaining 45 stereo images represent blur distortion. All distortions are symmetric in nature. Phase II consists of 360 distorted stereoscopic images and includes asymmetric and symmetric distorted stereopairs over the same five distortion types as phase I. The two phases constitute the largest and most comprehensive stereoscopic image quality database currently available. The three publicly available datasets have been used to test the performance of the proposed model on several different types of distortion. Table 2 summarizes the three databases used for the performance evaluation.
Table 2: Summary of the three databases.

Database | Stereopairs (sym., asym.) | Distortions
LIVE 3D-I | 365 (365, 0) | JP2K, JPEG, WN, Blur, FF
LIVE 3D-II | 360 (120, 240) | JP2K, JPEG, WN, Blur, FF
IVC 3D | 90 (90, 0) | JP2K, JPEG, Blur, down/up-scaling
4.2. Feature vector correlation with human score
In this section, the correlation of the feature vector $F$ with DMOS is evaluated. It is worth recalling that the regression model input is a vector of nine elements; because of the limits of human spatial perception, it is hard to demonstrate the discriminative capacity of the features graphically, for example in a four-dimensional scatter plot. Instead, three plots are used to describe the correlation of the three adopted indicators $S_{GM}$, $S_{RO}$, $S_{RM}$ with the human opinion score.
The relationship between stereo image quality and the three features is visually illustrated in the form of three-dimensional plots: the extracted features are used as axes, and each stereo image corresponds to a scatter point in this coordinate system. All the stereoscopic images from the LIVE 3D-I and LIVE 3D-II databases are used for this demonstration. As shown in Figure 9, the plots refer to features from the cyclopean image at scale 1, at scale 2, and from the disparity map, respectively from top to bottom. To differentiate the five types of distortion, we use distinct markers and map the DMOS rating of each stereo image to a preset color map.
The ideal scenario is that the points are well separated by distortion kind. It can be seen from Figure 9 that the scatter points of the five distortions are generally distinguishable. The stereo images used are distorted increasingly from low to high levels; this can also be observed in the plots, where the adopted features vary smoothly in space in correspondence with quality. There is some correlation between the extracted features, in particular between $S_{GM}$ and $S_{RM}$, where the correlation in terms of LCC equals 0.751; nevertheless, the deployed features provide good performance. This topic is discussed further in section 4.6.
Fig. 9: Illustration of the discriminative power of the extracted features. Respectively from top to bottom: the axes correspond to features from the cyclopean image at scale 1, at scale 2, and from the disparity map. (Zoom in to distinguish the markers.)
4.3. Comparison with other stereo IQA methods
The overall performance of the proposed scheme has shown good efficiency and consistency. The obtained results have been compared with several full-reference and no-reference stereo IQA metrics, including six FR and eight NR metrics.
For the comparison, two models have been created. The first model, called 3D-nnet, is a plain neural network regression model; it is equivalent to $L = 1$ in the AdaBoost algorithm (Algorithm 1). The second model, named 3D-AdaBoost, is a neural network combined with the AdaBoost technique as demonstrated previously, where 20 neural network models have been employed ($L = 20$). We find that the performance of the proposed measure improves as additional neural network models (Weak Learners) are added, saturating at a certain number and decreasing beyond it. Note that both models have the same network architecture, and that during training, 15% of the training set is held out for validation. Box plots of the SROCC results of the two models are displayed in figure 10. Comparing the proposed models indicates that performance can be improved by the adopted Adaptive Boosting technique.
Fig. 10: Comparison box plots of the SROCC of the proposed models. The SROCC results are split into four groups (quartiles), each containing 25% of the results. The red line in the rectangle marks the median value; the upper and lower ends of the rectangle mark the third and first quartiles, respectively. The dashed lines span the range of mild outliers, and the symbol "+" marks extreme outliers.
Tables 3, 4 and 5 show the results against DMOS of all stereo IQA algorithms on LIVE 3D-I, and tables 6, 7 and 8 show the corresponding results on LIVE 3D-II. Moreover, the performance on symmetrically and asymmetrically distorted stimuli is shown separately in table 9, while table 10 provides detailed results over the five distortions. The 3D-nnet model exhibits good performance: among all the compared metrics, it obtained the best SROCC score on the two LIVE 3D databases (SROCC = 0.916 on LIVE 3D-I and SROCC = 0.891 on LIVE 3D-II). Finally, table 11 shows the results on the IVC 3D database. In these tables, the best NR results are highlighted in bold.
The proposed 3D-AdaBoost model gives the best performance among all compared no-reference algorithms, while the full-reference method of Chen [17] yields the best performance among the FR methods. Figure 11 shows the prediction responses against the human DMOS scores on the three databases. Even though the proposed model is not designed for a specific distortion type, the comparison results on each individual distortion indicate the superiority of the proposed method over the three databases. More specifically, notice that most existing stereo IQA methods remain limited in capability and efficiency on asymmetric degradations: these metrics are appropriate for symmetric distortions but insufficient for asymmetric ones. In contrast, the results in table 10 show that the proposed framework delivers efficient performance over both asymmetric and symmetric distortions; the method achieves LCC scores of 0.930 and 0.903 on asymmetric and symmetric degradations, respectively. The scatter plots of figures 12 and 13 show the predicted quality on these two types of distortion separately.
It should be noted that the neural network model used here performs better than the commonly used Support Vector Regression (SVR). The same evaluation process was followed for SVR with 5-fold cross-validation. A radial basis function (RBF) kernel was selected, and the other parameters, such as the number of support vectors and iterations, were adjusted automatically during training for the best fit. Table 12 shows the superiority of the implemented neural network architecture over SVR; the mean scores of each learning method over the three databases (LIVE 3D-I, LIVE 3D-II, and IVC 3D) have been calculated (see table 12).
Table 3: SROCC against DMOS on the LIVE 3D phase I dataset.

Method | Type | WN | JP2K | JPEG | Blur | FF | All
Benoit [11] | FR | 0.930 | 0.910 | 0.603 | 0.931 | 0.699 | 0.899
You [12] | FR | 0.940 | 0.860 | 0.439 | 0.882 | 0.588 | 0.878
Gorley [13] | FR | 0.741 | 0.015 | 0.569 | 0.750 | 0.366 | 0.142
Chen [17] | FR | 0.948 | 0.888 | 0.530 | 0.925 | 0.707 | 0.916
Hewage [14] | FR | 0.940 | 0.856 | 0.500 | 0.690 | 0.545 | 0.814
Bensalma [7] | FR | 0.905 | 0.817 | 0.328 | 0.915 | 0.915 | 0.874
DIIVINE [6] | NR | - | - | - | - | - | 0.882
Akhter [10] | NR | 0.914 | 0.866 | 0.675 | 0.555 | 0.640 | 0.383
Chen [19] | NR | 0.919 | 0.863 | 0.617 | 0.878 | 0.652 | 0.891
Lv [26] | NR | - | - | - | - | - | 0.897
Appina [25] | NR | 0.910 | 0.917 | 0.782 | 0.865 | 0.666 | 0.911
Zhou [23] | NR | 0.921 | 0.856 | 0.562 | 0.897 | 0.771 | 0.901
Fang [22] | NR | 0.883 | 0.880 | 0.523 | 0.523 | 0.650 | 0.877
3D-nnet | NR | 0.938 | 0.874 | 0.569 | 0.866 | 0.685 | 0.916
Proposed 3D-AdaBoost | NR | 0.941 | 0.899 | 0.625 | 0.887 | 0.777 | 0.930
Table 4: LCC against DMOS on the LIVE 3D phase I dataset.

Method | Type | WN | JP2K | JPEG | Blur | FF | All
Benoit [11] | FR | 0.925 | 0.939 | 0.640 | 0.948 | 0.747 | 0.902
You [12] | FR | 0.941 | 0.877 | 0.487 | 0.919 | 0.730 | 0.881
Gorley [13] | FR | 0.796 | 0.485 | 0.312 | 0.852 | 0.364 | 0.451
Chen [17] | FR | 0.942 | 0.912 | 0.603 | 0.942 | 0.776 | 0.917
Hewage [14] | FR | 0.895 | 0.904 | 0.530 | 0.798 | 0.669 | 0.830
Bensalma [7] | FR | 0.914 | 0.838 | 0.838 | 0.838 | 0.733 | 0.887
DIIVINE [6] | NR | - | - | - | - | - | 0.893
Akhter [10] | NR | 0.904 | 0.905 | 0.729 | 0.617 | 0.503 | 0.626
Chen [19] | NR | 0.917 | 0.907 | 0.695 | 0.917 | 0.735 | 0.895
Lv [26] | NR | - | - | - | - | - | 0.901
Appina [25] | NR | 0.919 | 0.938 | 0.806 | 0.881 | 0.758 | 0.917
Zhou [23] | NR | - | - | - | - | - | 0.929
Fang [22] | NR | 0.900 | 0.911 | 0.547 | 0.903 | 0.718 | 0.880
3D-nnet | NR | 0.941 | 0.919 | 0.625 | 0.908 | 0.777 | 0.923
Proposed 3D-AdaBoost | NR | 0.941 | 0.926 | 0.668 | 0.935 | 0.845 | 0.939
4.4. Performance using T-test
The T-test is one of several types of statistical tests [44]. It asks whether the difference between two groups represents a true difference or is likely a meaningless statistical fluctuation: a result of 1 indicates that the groups are statistically different, and 0 indicates that they are statistically similar. In order to investigate the statistical performance of the proposed metric, it is compared with the state-of-the-art methods. We conducted a left-tailed T-test with 90% confidence, applied over 100 trials, on the PLCC and SROCC values. The results provided in table 13 show the superiority of the proposed method over the existing ones.
Table 5: RMSE against DMOS on the LIVE 3D phase I dataset.

Method | Type | WN | JP2K | JPEG | Blur | FF | All
Benoit [11] | FR | 6.307 | 4.426 | 5.022 | 4.571 | 8.257 | 7.061
You [12] | FR | 5.621 | 6.206 | 5.709 | 5.679 | 8.492 | 7.746
Gorley [13] | FR | 10.197 | 11.323 | 6.211 | 7.562 | 11.569 | 14.635
Chen [17] | FR | 5.581 | 5.320 | 5.216 | 4.822 | 7.837 | 6.533
Hewage [14] | FR | 7.405 | 5.530 | 5.543 | 8.748 | 9.226 | 9.139
Bensalma [7] | FR | - | - | - | - | - | 7.558
DIIVINE [6] | NR | - | - | - | - | - | 7.301
Akhter [10] | NR | 7.092 | 5.483 | 4.273 | 11.387 | 9.332 | 14.827
Chen [19] | NR | 6.433 | 5.402 | 4.523 | 5.898 | 8.322 | 7.247
Lv [26] | NR | - | - | - | - | - | -
Appina [25] | NR | 6.664 | 4.943 | 4.391 | 6.938 | 9.317 | 6.598
Zhou [23] | NR | - | - | - | - | - | 6.010
Fang [22] | NR | - | - | - | - | - | 7.191
3D-nnet | NR | 5.622 | 5.083 | 5.104 | 6.059 | 7.819 | 6.277
Proposed 3D-AdaBoost | NR | 5.593 | 4.867 | 4.862 | 5.104 | 6.633 | 5.605
Table 6: SROCC against DMOS on the LIVE 3D phase II dataset.

Method | Type | WN | JP2K | JPEG | Blur | FF | All
Benoit [11] | FR | 0.923 | 0.751 | 0.867 | 0.455 | 0.773 | 0.728
You [12] | FR | 0.909 | 0.894 | 0.795 | 0.813 | 0.891 | 0.786
Gorley [13] | FR | 0.875 | 0.110 | 0.027 | 0.770 | 0.601 | 0.146
Chen [17] | FR | 0.940 | 0.814 | 0.843 | 0.908 | 0.884 | 0.889
Hewage [14] | FR | 0.880 | 0.598 | 0.736 | 0.028 | 0.684 | 0.501
Bensalma [7] | FR | 0.938 | 0.803 | 0.846 | 0.846 | 0.846 | 0.751
DIIVINE [6] | NR | - | - | - | - | - | 0.346
Akhter [10] | NR | 0.714 | 0.724 | 0.649 | 0.682 | 0.559 | 0.543
Chen [19] | NR | 0.950 | 0.867 | 0.867 | 0.900 | 0.933 | 0.880
Lv [26] | NR | - | - | - | - | - | 0.862
Appina [25] | NR | 0.932 | 0.864 | 0.839 | 0.846 | 0.860 | 0.888
Zhou [23] | NR | 0.936 | 0.647 | 0.737 | 0.911 | 0.798 | 0.819
Fang [22] | NR | 0.955 | 0.714 | 0.709 | 0.807 | 0.872 | 0.838
3D-nnet | NR | 0.939 | 0.812 | 0.745 | 0.900 | 0.934 | 0.891
Proposed 3D-AdaBoost | NR | 0.943 | 0.842 | 0.837 | 0.913 | 0.925 | 0.913
Table 7: LCC against DMOS on the LIVE 3D phase II dataset.

Method | Type | WN | JP2K | JPEG | Blur | FF | All
Benoit [11] | FR | 0.926 | 0.784 | 0.853 | 0.535 | 0.807 | 0.784
You [12] | FR | 0.912 | 0.905 | 0.830 | 0.784 | 0.915 | 0.800
Gorley [13] | FR | 0.874 | 0.372 | 0.322 | 0.934 | 0.706 | 0.515
Chen [17] | FR | 0.957 | 0.834 | 0.862 | 0.963 | 0.901 | 0.907
Hewage [14] | FR | 0.891 | 0.664 | 0.734 | 0.450 | 0.746 | 0.558
Bensalma [7] | FR | 0.943 | 0.666 | 0.857 | 0.907 | 0.909 | 0.769
DIIVINE [6] | NR | - | - | - | - | - | 0.442
Akhter [10] | NR | 0.772 | 0.776 | 0.786 | 0.795 | 0.674 | 0.568
Chen [19] | NR | 0.947 | 0.899 | 0.901 | 0.941 | 0.932 | 0.895
Lv [26] | NR | - | - | - | - | - | 0.870
Appina [25] | NR | 0.920 | 0.867 | 0.829 | 0.878 | 0.836 | 0.845
Zhou [23] | NR | - | - | - | - | - | 0.856
Fang [22] | NR | 0.961 | 0.740 | 0.764 | 0.968 | 0.867 | 0.860
3D-nnet | NR | 0.948 | 0.821 | 0.758 | 0.960 | 0.921 | 0.900
Proposed 3D-AdaBoost | NR | 0.953 | 0.835 | 0.859 | 0.978 | 0.925 | 0.922
Table 8: RMSE against DMOS on the LIVE 3D phase II dataset.

Method | Type | WN | JP2K | JPEG | Blur | FF | All
Benoit [11] | FR | 4.028 | 6.096 | 3.787 | 11.763 | 6.894 | 7.490
You [12] | FR | 4.396 | 4.186 | 4.086 | 8.649 | 4.649 | 6.772
Gorley [13] | FR | 5.202 | 9.113 | 6.940 | 4.988 | 8.155 | 9.675
Chen [17] | FR | 3.368 | 5.562 | 3.865 | 3.747 | 4.966 | 4.987
Hewage [14] | FR | 10.713 | 7.343 | 4.976 | 12.436 | 7.667 | 9.364
Bensalma [7] | FR | - | - | - | - | - | 7.203
DIIVINE [6] | NR | - | - | - | - | - | 10.012
Akhter [10] | NR | 7.416 | 6.189 | 4.535 | 8.450 | 8.505 | 9.294
Chen [19] | NR | 3.513 | 4.298 | 3.342 | 4.725 | 4.180 | 5.102
Lv [26] | NR | - | - | - | - | - | -
Appina [25] | NR | 4.325 | 5.087 | 4.756 | 6.662 | 6.519 | 7.279
Zhou [23] | NR | - | - | - | - | - | 6.041
Fang [22] | NR | - | - | - | - | - | 5.767
3D-nnet | NR | 3.394 | 5.598 | 4.780 | 3.889 | 4.481 | 4.905
Proposed 3D-AdaBoost | NR | 3.226 | 5.396 | 3.752 | 2.859 | 4.352 | 4.352
Fig. 11: Scatter plots of subjective scores versus scores from the proposed scheme on the three stereopair IQA databases.
Table 9: SROCC results on symmetric and asymmetric distortions from the LIVE 3D phase II dataset.

Method | Type | Symmetric | Asymmetric
Benoit [11] | FR | 0.860 | 0.671
You [12] | FR | 0.914 | 0.701
Gorley [13] | FR | 0.383 | 0.056
Chen [17] | FR | 0.923 | 0.842
Hewage [14] | FR | 0.656 | 0.496
Bensalma [7] | FR | 0.841 | 0.721
DIIVINE [6] | NR | - | -
Akhter [10] | NR | 0.420 | 0.517
Chen [19] | NR | 0.918 | 0.834
Lv [26] | NR | - | -
Appina [25] | NR | 0.857 | 0.872
Zhou [23] | NR | - | -
Fang [22] | NR | - | -
3D-nnet | NR | 0.861 | 0.902
Proposed 3D-AdaBoost | NR | 0.898 | 0.917
Table 10: Detailed results of SROCC, LCC, and RMSE on symmetric / asymmetric distortions from LIVE 3D-II.

Method | Indicator | WN | JP2K | JPEG | Blur | FF | All
Proposed 3D-AdaBoost (Symmetric) | SROCC | 0.923 | 0.829 | 0.933 | 0.848 | 0.889 | 0.898
 | LCC | 0.938 | 0.922 | 0.946 | 0.913 | 0.903 | 0.903
 | RMSE | 3.701 | 3.709 | 3.819 | 3.425 | 4.876 | 4.609
Proposed 3D-AdaBoost (Asymmetric) | SROCC | 0.897 | 0.926 | 0.897 | 0.921 | 0.945 | 0.917
 | LCC | 0.930 | 0.947 | 0.917 | 0.932 | 0.953 | 0.930
 | RMSE | 4.191 | 4.006 | 4.747 | 3.450 | 3.387 | 4.216
Table 11: SROCC, LCC, and RMSE against DMOS on the IVC 3D database.

Method | Type | SROCC | LCC | RMSE
Benoit [11] | FR | - | - | -
You [12] | FR | - | - | -
Gorley [13] | FR | - | - | -
Chen [17] | FR | 0.676 | 0.683 | 17.100
Hewage [14] | FR | - | - | -
Bensalma [7] | FR | - | - | -
DIIVINE [6] | NR | 0.422 | 0.486 | 18.259
Akhter [10] | NR | - | - | -
Chen [19] | NR | 0.851 | 0.835 | 12.088
Lv [26] | NR | - | - | -
Appina [25] | NR | - | - | -
Zhou [23] | NR | - | - | -
Fang [22] | NR | - | - | -
3D-nnet | NR | 0.780 | 0.779 | 13.830
Proposed 3D-AdaBoost | NR | 0.831 | 0.845 | 11.776
4.5. Cross-database performance
The above tests are useful for assessing the robustness and generalization of the proposed metric, since all results are obtained by training and testing with 5-fold cross-validation.
Table 12: Mean of the SROCC, LCC, and RMSE results over the three databases using different regressors.

Method | SROCC | LCC | RMSE
SVR | 0.8223 | 0.8406 | 9.0530
3D-nnet | 0.8623 | 0.8673 | 8.3373
Proposed 3D-AdaBoost | 0.8913 | 0.9020 | 7.2443
Fig. 12: Scatter plot of the asymmetric distortion scores from the LIVE 3D phase II IQA database using the 3D-AdaBoost method.

Fig. 13: Scatter plot of the symmetric distortion scores from the LIVE 3D phase II IQA database using the 3D-AdaBoost method.
Table 13: T-test results (90% confidence) of the proposed metric against the others using PLCC and SROCC from LIVE 3D-I and II.

Dataset | Indicator | Akhter | Chen | Appina | Zhou | Fang | 3D-nnet
LIVE I | LCC | 1 | 1 | 1 | 1 | 1 | 0
LIVE I | SROCC | 1 | 1 | 0 | 1 | 1 | 1
LIVE II | LCC | 1 | 1 | 1 | 1 | 1 | 1
LIVE II | SROCC | 1 | 1 | 1 | 1 | 1 | 1
We extend to cross-database experiments to demonstrate the performance capability of the proposed metric. The LIVE 3D phase I and phase II databases have been selected for these experiments because of their similar numbers of stereo images. The model is trained on one database and tested on the other.
The Weak Learners (WL) of the 3D-AdaBoost model (Algorithm 1) are trained, validated, and tested on the LIVE 3D phase I database to obtain a model that is then evaluated on the LIVE 3D phase II database, and vice versa. That is, images of one database are used for training, validation, and testing, and images of the other database serve as the final test set. The results obtained on LIVE 3D-I using LIVE 3D-II for training are shown in table 14, whereas table 15 corresponds to the inverse process. Tables 16 and 17 present detailed results over the five distortions. SROCC has been used as the performance index, and the best results are highlighted in bold.
It can be noticed that 3D-AdaBoost trained on the LIVE 3D phase I database achieves lower performance than the model trained on phase II. This is due to the lack of asymmetric distortions in the LIVE 3D phase I database. However, it is interesting to observe that the 3D-AdaBoost method still produces good results on the LIVE 3D phase I database. Although the results of the other methods were not obtained through cross-dataset tests, the proposed metric ensures competitive performance on any commonly encountered type of distortion. The scatter plots of figures 14 and 15 show the 3D-AdaBoost responses in the cross-dataset tests.
The overall experimental results show that the proposed method has good consistency with human subjective evaluation across the five distortion types, and the cross-database tests confirm the reliability of the proposed metric for measuring stereoscopic image quality. Among the five distortions, JPEG distortion has the lowest accuracy. We believe this is due to the complexity of this compression distortion; it should therefore be addressed separately in stereo image quality assessment.
Table 14: SROCC, LCC, and RMSE results on the LIVE 3D-I dataset (trained on LIVE 3D-II).

Method | Type | SROCC | LCC | RMSE
DIIVINE [6] | NR | 0.882 | 0.893 | 7.301
Akhter [10] | NR | 0.383 | 0.626 | 14.827
Chen [19] | NR | 0.891 | 0.626 | 7.247
Lv [26] | NR | 0.897 | 0.901 | -
Appina [25] | NR | 0.911 | 0.917 | 6.598
Zhou [23] | NR | 0.901 | 0.929 | 6.010
Fang [22] | NR | 0.877 | 0.880 | 7.191
3D-nnet | NR | 0.880 | 0.888 | 7.514
Proposed 3D-AdaBoost | NR | 0.887 | 0.897 | 7.224
4.6. Influence of cyclopean view and disparity map
In order to demonstrate the efficiency of the proposed approach for measuring stereo image quality, numerous tests have been conducted.
Table 15: SROCC, LCC, and RMSE results on the LIVE 3D-II dataset (trained on LIVE 3D-I).

Method | Type | SROCC | LCC | RMSE
DIIVINE [6] | NR | 0.346 | 0.442 | 10.012
Akhter [10] | NR | 0.543 | 0.568 | 9.294
Chen [19] | NR | 0.543 | 0.895 | 5.102
Lv [26] | NR | 0.862 | 0.870 | -
Appina [25] | NR | 0.888 | 0.845 | 7.279
Zhou [23] | NR | 0.819 | 0.856 | 6.041
Fang [22] | NR | 0.838 | 0.860 | 5.767
3D-nnet | NR | 0.798 | 0.813 | 6.561
Proposed 3D-AdaBoost | NR | 0.823 | 0.832 | 6.253
Table 16: SROCC against DMOS on the LIVE 3D-I dataset (trained on LIVE 3D-II).

Method | Type | WN | JP2K | JPEG | Blur | FF | All
DIIVINE [6] | NR | - | - | - | - | - | 0.882
Akhter [10] | NR | 0.914 | 0.866 | 0.675 | 0.555 | 0.640 | 0.383
Chen [19] | NR | 0.919 | 0.863 | 0.617 | 0.878 | 0.652 | 0.891
Lv [26] | NR | - | - | - | - | - | 0.897
Appina [25] | NR | 0.910 | 0.917 | 0.782 | 0.865 | 0.666 | 0.911
Zhou [23] | NR | 0.921 | 0.856 | 0.562 | 0.897 | 0.771 | 0.901
Fang [22] | NR | 0.883 | 0.880 | 0.523 | 0.523 | 0.650 | 0.877
3D-nnet | NR | 0.955 | 0.873 | 0.588 | 0.808 | 0.527 | 0.880
Proposed 3D-AdaBoost | NR | 0.956 | 0.889 | 0.556 | 0.875 | 0.530 | 0.892
Table 17: SROCC against DMOS on the LIVE 3D-II dataset (trained on LIVE 3D-I).

Method | Type | WN | JP2K | JPEG | Blur | FF | All
DIIVINE [6] | NR | - | - | - | - | - | 0.346
Akhter [10] | NR | 0.714 | 0.724 | 0.649 | 0.682 | 0.559 | 0.543
Chen [19] | NR | 0.950 | 0.867 | 0.867 | 0.900 | 0.933 | 0.880
Lv [26] | NR | - | - | - | - | - | 0.862
Appina [25] | NR | 0.932 | 0.864 | 0.839 | 0.846 | 0.860 | 0.888
Zhou [23] | NR | 0.936 | 0.647 | 0.737 | 0.911 | 0.798 | 0.819
Fang [22] | NR | 0.955 | 0.714 | 0.709 | 0.807 | 0.872 | 0.838
3D-nnet | NR | 0.882 | 0.803 | 0.772 | 0.925 | 0.936 | 0.798
Proposed 3D-AdaBoost | NR | 0.932 | 0.826 | 0.737 | 0.881 | 0.924 | 0.824
Fig. 14: Scatter plot of cross-dataset scores on LIVE 3D phase I
database using 3D-AdaBoost method.
These tests cover the possibilities of the feature extraction part. A simple feature extraction, using the pixel sum and the pixel average, has also been included for comparison. The learning part remains the same as described, using 5-fold cross-validation. The 3D-AdaBoost model receives a different input for each combination, and the mean performance of each combination is calculated over the three databases. The results of these tests are shown in table 18.
Fig. 15: Scatter plot of cross-dataset scores on LIVE 3D phase II
database using 3D-AdaBoost method.
The pixel sum $P_S$ is defined as follows:

$$P_S = \sum_{i=1}^{m}\sum_{j=1}^{n} I(i,j) \qquad (21)$$

where $I$ is the left or right image. The pixel average $P_A$ is defined by:

$$P_A = \frac{1}{m\,n}\sum_{i=1}^{m}\sum_{j=1}^{n} I(i,j) \qquad (22)$$
From the results it can be observed that the pixel sum and pixel average indicators give bad performance, because these features do not correlate with image quality. Meanwhile, the proposed features give good performance thanks to their relationship with distortion type and quality degradation, as shown previously in Figure 9. It is also noticeable that the performance improves when disparity map features are used, which supports the study conducted in [9]; its authors used different measures to illustrate the relationship between the perceptual quality of stereo views and the quality of the disparity map, and concluded that the quality of the depth map is highly correlated with the overall 3D quality.
As discussed earlier, 2D IQA metrics may not transfer to the stereo IQA problem, since averaging either the scores or the features obtained from the left and right images fails to account for asymmetric distortions. The improved 2D IQA metric DIIVINE [6] for stereo images provides good performance on the LIVE 3D-I database but low performance on LIVE 3D-II. This is because the LIVE 3D-II database mainly contains asymmetrically distorted stereo images (see Table 2). Moreover, since stereo images typically contain redundant information, extracting features from both the left and right images may produce redundant features. Therefore, the extracted features (S_GM, S_RO, and S_RM) from the left and right images are averaged. Afterward, the 3D-AdaBoost model is used to map these features to a quality prediction. It is also noticed that using a 2-scale cyclopean image increases the accuracy of quality prediction. We assume that the distance in feature space between cyclopean scale-1 and scale-2 features is learned during training, helping the model make better predictions. Additionally, using the pixel sum P_S and pixel average P_A as features decreases the performance, as shown in Table 18.
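For clarity, a minimal sketch of this gradient-feature pipeline is given below, assuming MATLAB's Image Processing Toolbox; the 3-by-3 averaging neighborhood and the mean/standard-deviation pooling are illustrative choices, not necessarily the exact statistics used for S_GM, S_RO, and S_RM.

function f = grad_features(I)
% I: grayscale image (double). Returns pooled gradient statistics.
[GM, GO] = imgradient(I);                  % gradient magnitude and orientation
h  = fspecial('average', 3);               % local 3x3 mean filter
RM = GM - imfilter(GM, h, 'replicate');    % relative gradient magnitude
RO = GO - imfilter(GO, h, 'replicate');    % relative gradient orientation
f  = [mean(GM(:)) std(GM(:)) ...
      mean(RM(:)) std(RM(:)) ...
      mean(RO(:)) std(RO(:))];
end

% Left/right feature vectors are then averaged to suppress redundancy:
% f = (grad_features(Il) + grad_features(Ir)) / 2;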
Even though the features are somewhat correlated, the model gives good results. Tests have been carried out to support the use of all three gradient features (S_GM, S_RO, and S_RM): the performance deteriorates whenever one or two of them are neglected, as shown in Table 19. It is therefore important to use all three features for better quality assessment accuracy.
Furthermore, an additional experiment has been conducted without the Gabor filter, setting w_l and w_r to 1 in equation (2). The results in Table 20 indicate that the cyclopean model using Gabor weights outperforms the simple cyclopean model, in particular on LIVE 3D-II. Compared with the stereopair image model, the simple cyclopean model is also competitive, but it may be less accurate under asymmetric distortion, as discussed earlier. The results in Table 20 thus support using the cyclopean view rather than the raw stereo pair for the quality assessment problem. The superiority of the cyclopean image on symmetric and asymmetric degradations is also shown in Table 21. Notice that the performance of the stereopair image method in Table 22 drops significantly on LIVE 3D-II over all distortions; consequently, extracting features directly from stereo images is not reliable for asymmetric distortions, whereas the cyclopean image method remains consistent. In the tables, N refers to the number of input features to the 3D-AdaBoost regression model. Overall, we conclude that the adopted cyclopean model, the quality indicators (S_GM, S_RO, and S_RM), and their combination are effective for assessing the quality of stereopair images.
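A minimal sketch of the Gabor-weighted cyclopean synthesis discussed above follows; setting the weights to 1 recovers the simple cyclopean model. It assumes MATLAB's Image Processing Toolbox (gabor, imgaborfilt), and the wavelengths, orientations, and horizontal-warp disparity compensation are simplifications for illustration.

function C = cyclopean(Il, Ir, d)
% Il, Ir: grayscale views (double); d: horizontal disparity map (pixels).
g  = gabor([4 8], [0 45 90 135]);          % small multiscale Gabor bank
El = sum(imgaborfilt(Il, g), 3);           % summed left-view Gabor energy
Er = sum(imgaborfilt(Ir, g), 3);           % summed right-view Gabor energy
[x, y] = meshgrid(1:size(Il, 2), 1:size(Il, 1));
Irw = interp2(Ir, x + d, y, 'linear', 0);  % right view on the left-view grid
Erw = interp2(Er, x + d, y, 'linear', 0);  % warped right-view energy
wl = El ./ (El + Erw + eps);               % normalized rivalry weights
wr = 1 - wl;                               % wl = wr = 1 gives the simple model
C  = wl .* Il + wr .* Irw;                 % Gabor-weighted cyclopean view
end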
Table 18: Mean of SROCC, LCC, and RMSE results over the three databases using various features and combinations.

Used material                               Features                      N    SROCC   LCC     RMSE
Stereopair image                            P_S                           2    0.107   0.196   16.182
Stereopair image                            P_A                           2    0.255   0.208   16.097
Stereopair image                            P_S, P_A                      4    0.244   0.245   15.995
Stereopair image                            S_GM, S_RO, S_RM              3    0.740   0.769   10.386
Stereopair image, disparity                 S_GM, S_RO, S_RM              6    0.858   0.869   8.203
Stereopair image, disparity                 S_GM, S_RO, S_RM, P_S, P_A    12   0.716   0.741   9.803
Cyclopean view (scale 1)                    S_GM, S_RO, S_RM              3    0.776   0.801   9.802
Cyclopean view (scale 1)                    S_GM, S_RO, S_RM, P_S, P_A    5    0.676   0.705   10.552
Cyclopean view (scales 1 and 2)             S_GM, S_RO, S_RM              6    0.798   0.818   9.444
Cyclopean view (scales 1 and 2)             S_GM, S_RO, S_RM, P_S, P_A    10   0.688   0.715   10.298
Cyclopean view (scales 1 and 2), disparity  S_GM, S_RO, S_RM              9    0.891   0.902   7.244
Cyclopean view (scales 1 and 2), disparity  S_GM, S_RO, S_RM, P_S, P_A    15   0.628   0.635   12.445
Table 19: Mean of SROCC, LCC, and RMSE results over the three databases using different gradient features and combinations.

Used material                               Features            N   SROCC   LCC     RMSE
Cyclopean view (scales 1 and 2), disparity  S_RO                3   0.725   0.748   10.698
Cyclopean view (scales 1 and 2), disparity  S_GM                3   0.724   0.751   10.722
Cyclopean view (scales 1 and 2), disparity  S_RM                3   0.751   0.709   10.995
Cyclopean view (scales 1 and 2), disparity  S_RO, S_RM          6   0.809   0.817   9.401
Cyclopean view (scales 1 and 2), disparity  S_GM, S_RM          6   0.844   0.849   8.768
Cyclopean view (scales 1 and 2), disparity  S_RO, S_GM          6   0.838   0.847   8.823
Cyclopean view (scales 1 and 2), disparity  S_GM, S_RO, S_RM    9   0.891   0.902   7.244
Table 20: Cyclopean view versus stereopair image method results over the three databases.

Used material                      Features            N   Indicator   LIVE 3D-I   LIVE 3D-II   IVC 3D
Stereopair image                   S_GM, S_RO, S_RM    3   SROCC       0.905       0.725        0.590
                                                           LCC         0.913       0.791        0.602
                                                           RMSE        6.657       6.896        17.606
Cyclopean image (scale 1)          S_GM, S_RO, S_RM    3   SROCC       0.908       0.797        0.622
                                                           LCC         0.920       0.850        0.634
                                                           RMSE        6.417       5.944        17.046
Cyclopean image, simple (scale 1)  S_GM, S_RO, S_RM    3   SROCC       0.904       0.780        0.607
                                                           LCC         0.914       0.828        0.622
                                                           RMSE        6.644       6.323        17.263
Table 21: Cyclopean view versus stereopair image method results on symmetric and asymmetric distortions from the LIVE 3D-II dataset.

Used material                      Features            N   Indicator   Symmetric   Asymmetric
Stereopair image                   S_GM, S_RO, S_RM    3   SROCC       0.672       0.745
                                                           LCC         0.779       0.802
                                                           RMSE        6.746       6.851
Cyclopean image (scale 1)          S_GM, S_RO, S_RM    3   SROCC       0.733       0.822
                                                           LCC         0.840       0.855
                                                           RMSE        5.832       5.954
Cyclopean image, simple (scale 1)  S_GM, S_RO, S_RM    3   SROCC       0.734       0.796
                                                           LCC         0.814       0.834
                                                           RMSE        6.249       6.337
Table 22: SROCC results of the cyclopean view versus the stereopair image method over the LIVE 3D-I and LIVE 3D-II databases.

Used material                      Database     WN      JP2K    JPEG    Blur    FF      All
Stereopair image                   LIVE 3D-I    0.943   0.867   0.597   0.816   0.615   0.905
                                   LIVE 3D-II   0.497   0.684   0.606   0.870   0.732   0.725
Cyclopean image (scale 1)          LIVE 3D-I    0.943   0.869   0.589   0.867   0.680   0.908
                                   LIVE 3D-II   0.924   0.676   0.678   0.858   0.735   0.797
Cyclopean image, simple (scale 1)  LIVE 3D-I    0.942   0.872   0.595   0.821   0.678   0.904
                                   LIVE 3D-II   0.911   0.706   0.674   0.844   0.727   0.780
4.7. Computational Complexity
The computational complexity of the proposed algorithm is discussed in this section. The most computationally expensive stage is the cyclopean image construction, since it involves computing the left- and right-view weights with a multiscale Gabor filter. The complexity of the proposed measure depends on the size of the testing vectors (M) and the number of weak learners (L); the overall complexity is therefore O(M · L). The computation time of the proposed model has been measured on a laptop with an Intel i5-2410M CPU (2.30 GHz) and 8 GB of RAM: the run time is 72.5238 s, including training. Since no complexity details are reported for the other NR methods, the complexities of state-of-the-art metrics have not been compared.
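The O(M · L) cost is visible in the prediction step, sketched below under the same assumptions (and hypothetical function names) as the training sketch in Section 4.6: each of the M test vectors is evaluated by all L weak learners, whose outputs are combined by their confidence weights (a weighted median is also common in AdaBoost.R2).

function yhat = predict_adaboost_nn(nets, alphas, X)
% One pass of M test vectors through L weak learners: O(M * L).
preds = zeros(size(X, 1), numel(nets));
for t = 1:numel(nets)
    preds(:, t) = nets{t}(X')';           % weak-learner predictions
end
yhat = preds * alphas(:) / sum(alphas);   % confidence-weighted average
end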
The run time also depends on the stereoscopic image resolution and on the hardware computing power. The test was conducted on the stereo image shown in Figure 1, with a resolution of 640 × 360 pixels. The run time grows with the number of neural network models used and with the number of neurons per network. Note that it can be reduced via parallel computing (e.g., GPU cards), since the proposed method is based on neural networks.
5. CONCLUSION
In this paper, a new blind stereoscopic IQA metric has been proposed. The model is based on human binocular perception and an advanced machine-learning algorithm. Efficient perceptual features are extracted from the gradient magnitude (GM) map, the relative gradient orientation (RO) map, and the relative gradient magnitude (RM) map. Experimental results show that the extracted features are sensitive to the five common distortions. To account for variations in stereo image resolution and viewing conditions, multiscale gradient maps of the cyclopean image have been employed. An AdaBoost neural network maps the stereo image features to a quality score. The overall results indicate that the metric correlates well with subjective DMOS scores over symmetric and asymmetric distortions, and that it outperforms state-of-the-art methods in both accuracy and efficiency on the three publicly available stereoscopic IQA databases: LIVE 3D-I, LIVE 3D-II, and IRCCyN/IVC 3D.
In future work, we believe the extracted features can also be useful for developing no-reference stereoscopic video quality models. The AdaBoost idea could also be extended by incorporating other feature-learning algorithms.
6. REFERENCES
[1] Motion Picture Association of America, “2016 theatrical mar-
ket statistics report,” 2016.
[2] Albert M William and Darrell L Bailey, “Stereoscopic visual-
ization of scientific and medical content,” in ACM SIGGRAPH
2006 Educators Program, New York, NY, USA, 2006, SIG-
GRAPH ’06, ACM.
[3] C. F. Westin, “Extracting brain connectivity from diffusion mri
[life sciences],” IEEE Signal Processing Magazine, vol. 24, no.
6, pp. 124–152, 2007.
[4] Jacky Baltes, Sancho McCann, and John Anderson, “Hu-
manoid robots: Abarenbou and daodan,” RoboCup-Humanoid
League Team Description, 2006.
[5] O. Messai, F. Hachouf, and Z. Ahmed Seghir, “Blind stereo-
scopic image quality assessment using cyclopean view and
neural network,” in The fifth IEEE Global Conference on Sig-
nal and Information Processing (GlobalSIP). IEEE, 2017, pp.
196–200.
[6] Anush Krishna Moorthy and Alan Conrad Bovik, “Blind image quality assessment: From natural scene statistics to perceptual quality,” IEEE Transactions on Image Processing, vol. 20, no. 12, pp. 3350–3364, 2011.
[7] R. Bensalma and Mohamed-Chaker Larabi, “A perceptual metric for stereoscopic image quality assessment based on the binocular energy,” Multidimensional Systems and Signal Processing, vol. 24, no. 2, pp. 281–316, 2013.
[8] P. Seuntiens, “Visual experience of 3D TV,” Doctoral thesis, Eindhoven University of Technology, 2006.
[9] Amin Banitalebi-Dehkordi, Mahsa T Pourazad, and Panos Nasiopoulos, “A study on the relationship between depth map quality and the overall 3D video quality of experience,” in 2013 3DTV Vision Beyond Depth (3DTV-CON). IEEE, 2013, pp. 1–4.
[10] R. Akhter, ZM Parvez Sazzad, Yuukou Horita, and Jacky
Baltes, “No-reference stereoscopic image quality assessment,”
in IS&T/SPIE Electronic Imaging. International Society for
Optics and Photonics, 2010, pp. 75240T–75240T.
[11] A. Benoit, Patrick Le Callet, Patrizio Campisi, and Romain
Cousseau, “Quality assessment of stereoscopic images,”
EURASIP journal on image and video processing, vol. 2008,
no. 1, pp. 1–13, 2009.
[12] J. You, Liyuan Xing, Andrew Perkis, and Xu Wang, “Percep-
tual quality assessment for stereoscopic images based on 2d
image quality metrics and disparity analysis,” in Proc. of In-
ternational Workshop on Video Processing and Quality Metrics
for Consumer Electronics, Scottsdale, AZ, USA, 2010.
[13] P. Gorley and Nick Holliman, “Stereoscopic image quality
metrics and compression,” in Electronic Imaging 2008. Inter-
national Society for Optics and Photonics, 2008, pp. 680305–
680305.
[14] CTER Hewage, Stewart T Worrall, Safak Dogan, and AM Kon-
doz, “Prediction of stereoscopic video quality using objective
quality models of 2-d video,” Electronics letters, vol. 44, no.
16, pp. 963–965, 2008.
[15] David G Lowe, “Object recognition from local scale-invariant features,” in Computer Vision, 1999. The Proceedings of the Seventh IEEE International Conference on. IEEE, 1999, vol. 2, pp. 1150–1157.
[16] Martin A Fischler and Robert C Bolles, “Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography,” in Readings in Computer Vision, pp. 726–740. Elsevier, 1987.
[17] M. Chen, Che-Chun Su, Do-Kyoung Kwon, Lawrence K Cormack, and Alan C Bovik, “Full-reference quality assessment of stereopairs accounting for rivalry,” Signal Processing: Image Communication, vol. 28, no. 9, pp. 1143–1155, 2013.
[18] W. J. M. Levelt, On Binocular Rivalry, The Hague–Paris: Mouton, 1968, p. 107.
[19] Ming-Jun Chen, Lawrence K Cormack, and Alan C Bovik,
“No-reference quality assessment of natural stereopairs,” IEEE
Transactions on Image Processing, vol. 22, no. 9, pp. 3379–
3391, 2013.
[20] W. Hachicha, Azeddine Beghdadi, and Faouzi Alaya Cheikh, “Stereo image quality assessment using a binocular just noticeable difference model,” in Image Processing (ICIP), 2013 20th IEEE International Conference on. IEEE, 2013, pp. 113–117.
[21] Y. Zhao, Zhenzhong Chen, Ce Zhu, Yap-Peng Tan, and Lu Yu, “Binocular just-noticeable-difference model for stereoscopic images,” IEEE Signal Processing Letters, vol. 18, no. 1, pp. 19–22, 2011.
[22] Meixin Fang and Wujie Zhou, “Toward an unsupervised blind stereoscopic 3D image quality assessment using joint spatial and frequency representations,” AEU-International Journal of Electronics and Communications, vol. 94, pp. 303–310, 2018.
[23] Wujie Zhou, Weiwei Qiu, and Ming-Wei Wu, “Utilizing dic-
tionary learning and machine learning for blind quality assess-
ment of 3-d images,” IEEE Transactions on Broadcasting, vol.
63, no. 2, pp. 404–415, 2017.
[24] Che-Chun Su, Lawrence K Cormack, and Alan C Bovik, “Oriented correlation models of distorted natural images with application to natural stereopair quality evaluation,” IEEE Transactions on Image Processing, vol. 24, no. 5, pp. 1685–1699, 2015.
[25] Balasubramanyam Appina, Sameeulla Khan, and Sumohana S. Channappayya, “No-reference stereoscopic image quality assessment using natural scene statistics,” Signal Processing: Image Communication, vol. 43, pp. 1–14, 2016.
[26] Yaqi Lv, Mei Yu, Gangyi Jiang, Feng Shao, Zongju Peng, and
Fen Chen, “No-reference stereoscopic image quality assess-
ment using binocular self-similarity and deep neural network,”
Signal Processing: Image Communication, vol. 47, pp. 346–
357, 2016.
[27] M. Korytkowski, Leszek Rutkowski, and Rafal Scherer, “On
combining backpropagation with boosting,” in Neural Net-
works, 2006. IJCNN’06. International Joint Conference on.
IEEE, 2006, pp. 1274–1277.
[28] L. Liu, Yi Hua, Qingjie Zhao, Hua Huang, and Alan Conrad
Bovik, “Blind image quality assessment by relative gradient
statistics and adaboosting neural network,” Signal Processing:
Image Communication, vol. 40, pp. 1–15, 2016.
[29] Oussama Messai, Fella Hachouf, and Zianou Ahmed Seghir,
“Deep learning and cyclopean view for no-reference stereo-
scopic image quality assessment,” in 2018 International Con-
ference on Signal, Image, Vision and their Applications (SIVA).
IEEE, 2018, pp. 1–6.
[30] R. Blake, David H Westendorf, and Randall Overton, “What is suppressed during binocular rivalry?,” Perception, vol. 9, no. 2, pp. 223–231, 1980.
[31] Karsten Mühlmann, Dennis Maier, Jürgen Hesser, and Reinhard Männer, “Calculating dense disparity maps from color stereo images, an efficient implementation,” International Journal of Computer Vision, vol. 47, no. 1-3, pp. 79–88, 2002.
[32] Z. Wang, Alan C Bovik, Hamid R Sheikh, and Eero P Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600–612, 2004.
[33] John G Daugman, “Two-dimensional spectral analysis of cortical receptive field profiles,” Vision Research, vol. 20, no. 10, pp. 847–856, 1980.
[34] D. J Field, “Relations between the statistics of natural images
and the response properties of cortical cells,” JOSA A, vol. 4,
no. 12, pp. 2379–2394, 1987.
[35] C. Su, Alan C Bovik, and Lawrence K Cormack, “Natu-
ral scene statistics of color and range,” in Image Processing
(ICIP), 2011 18th IEEE International Conference on. IEEE,
2011, pp. 257–260.
[36] Naoum P Issa, Christopher Trepel, and Michael P Stryker, “Spatial frequency maps in cat visual cortex,” Journal of Neuroscience, vol. 20, no. 22, pp. 8504–8514, 2000.
[37] C. Schor, Ivan Wood, and Jane Ogawa, “Binocular sensory
fusion is limited by spatial resolution,” Vision research, vol.
24, no. 7, pp. 661–665, 1984.
[38] Zhou Wang, Eero P Simoncelli, and Alan C Bovik, “Multiscale structural similarity for image quality assessment,” in The Thirty-Seventh Asilomar Conference on Signals, Systems & Computers, 2003. IEEE, 2003, vol. 2, pp. 1398–1402.
[39] Yoav Freund and Robert E Schapire, “A decision-theoretic
generalization of on-line learning and an application to boost-
ing,” Journal of computer and system sciences, vol. 55, no. 1,
pp. 119–139, 1997.
[40] Peter L Bartlett and Mikhail Traskin, “AdaBoost is consistent,” Journal of Machine Learning Research, vol. 8, pp. 2347–2368, 2007.
[41] B. P Roe, Hai-Jun Yang, Ji Zhu, Yong Liu, Ion Stancu, and
Gordon McGregor, “Boosted decision trees as an alternative
to artificial neural networks for particle identification,” Nu-
clear Instruments and Methods in Physics Research Section A:
Accelerators, Spectrometers, Detectors and Associated Equip-
ment, vol. 543, no. 2, pp. 577–584, 2005.
[42] Hamid R Sheikh, Muhammad F Sabir, and Alan C Bovik, “A
statistical evaluation of recent full reference image quality as-
sessment algorithms,” IEEE Transactions on image process-
ing, vol. 15, no. 11, pp. 3440–3451, 2006.
[43] A.K. Moorthy, Che-Chun Su, Anish Mittal, and Alan Conrad
Bovik, “Subjective evaluation of stereoscopic image quality,”
Signal Processing: Image Communication, vol. 28, no. 8, pp.
870–883, 2013.
[44] ITU-T Rec. P.1401, “Methods, metrics and procedures for statistical evaluation, qualification and comparison of objective quality prediction models,” International Telecommunication Union, Geneva, Switzerland, 2012.
[45] Feng Shao, Weisi Lin, Shanshan Wang, Gangyi Jiang, and Mei
Yu, “Blind image quality assessment for stereoscopic images
using binocular guided quality lookup and visual codebook,”
IEEE Transactions on Broadcasting, vol. 61, no. 2, pp. 154–
165, 2015.
[46] Z. Wang, Alan C Bovik, and Ligang Lu, “Why is image quality assessment so difficult?,” in Acoustics, Speech, and Signal Processing (ICASSP), 2002 IEEE International Conference on. IEEE, 2002, vol. 4, pp. IV–3313.
[47] M. Chen, Alan C Bovik, and Lawrence K Cormack, “Study on distortion conspicuity in stereoscopically viewed 3D images,” in IVMSP Workshop, 2011 IEEE 10th. IEEE, 2011, pp. 24–29.
[48] N Clayton Silver and William P Dunlap, “Averaging correlation coefficients: should Fisher's z transformation be used?,” Journal of Applied Psychology, vol. 72, no. 1, p. 146, 1987.