Full-Reference Objective Quality Assessment of
Tone-Mapped Images
Hadi Hadizadeh and Ivan V. Bajić, Senior Member, IEEE
Abstract—In this paper, we present a novel method for full-
reference (FR) image quality assessment (IQA) of tone-mapped
images displayed on standard low dynamic range (LDR) displays.
Due to the dynamic range compression caused by the tone-
mapping process, a mixture of several artifacts and distortions
may be produced in the tone-mapped images. This makes the
quality assessment of the tone-mapped images very challenging.
Due to the diversity of such artifacts and distortions, we propose
a “bag of features” (BOF) approach to tackle this problem.
Specifically, in the proposed method, a number of different
perceptually relevant quality-related features are first extracted
from a given tone-mapped image and its reference HDR image.
These features are designed such that they capture different
aspects and attributes of the tone-mapped image such as its
structural fidelity, naturalness, overall brightness, etc. A support
vector regressor is then trained based on the extracted features,
and it is used for measuring the visual quality of a tone-mapped
image. Our experimental results indicate that the proposed
method achieves high accuracy as compared to several existing
methods.
Index Terms—image quality assessment, tone mapping, image
naturalness, structural fidelity, natural scene statistics
I. INTRODUCTION
HIGH dynamic range (HDR) images have gained a lot of
popularity in recent years due to their ability to accu-
rately represent a wide range of intensity levels on the order
of 10,000:1 [1]. They also provide better contrast variation,
with a higher degree of detail preservation and representation.
However, in order to display HDR images on the traditional
low dynamic range (LDR) displays, they need to be tone-
mapped by the so-called tone mapping operators (TMOs)
[1]. Different TMOs produce different LDR results, and so
a natural question is how to objectively evaluate the visual
quality of the resultant tone-mapped LDR images. In fact,
due to the dynamic range compression caused by the tone-
mapping process, several image attributes may be changed
after tone mapping, so various distortions and artifacts may
be produced in the tone-mapped images [2]. As discussed in
[3], various image attributes contribute to the visual quality of
a tone-mapped image. These include [3]: brightness, contrast,
color, detail, and artifacts.

H. Hadizadeh is with the Quchan University of Advanced Technology, Quchan, 94771-67335, Iran (Email: h.hadizadeh@qiet.ac.ir). I. V. Bajić is with Simon Fraser University, Burnaby, BC, V5A 1S6, Canada (Email: ibajic@ensc.sfu.ca). The corresponding author is Hadi Hadizadeh.

A. Related Works

With the rapid growth of multimedia applications such as
the optimization for video compression [4] and transmission
[5], transcoding [6], image/video enhancement and retargeting
[7], 3D viewing [8], telecommuting [9], [10], quality assess-
ment of dehazed images [11], quality assessment of multiply-
distorted images [12], the need for objective image quality
assessment (IQA) is increasing significantly. Over the past
decades, several IQA methods have been developed for LDR
images [13]–[17] but there are very few methods that are
specially designed for HDR and tone-mapped images.
Recently, some research efforts have been made to develop
efficient methods for full-reference (FR) image quality assess-
ment of tone-mapped images. Perhaps the earliest method in
this regard is the one proposed in [18]. In this method an
objective quality index called the tone mapping quality index
(TMQI) was proposed for quality assessment of tone-mapped
images. TMQI is based on the fusion of an SSIM-motivated
[19] structural fidelity measure with a statistical naturalness
measure, where both use only the luminance information,
and the color information is ignored. In [20], based on the
local phase information of images, an objective index called
the feature similarity index for tone-mapped images (FSITM)
was proposed. FSITM compares the locally-weighted mean
phase angle map of an original HDR image to that of its
associated tone-mapped image. The results reported in [20]
demonstrated that the combination of FSITM with TMQI
provides high accuracy in quality assessment of tone-mapped
images. However, a shortcoming of FSITM is that it does
not present any specific method for fusing the quality index
of individual color channels. Hence, the authors of FSITM
provided their experimental results based on computing the
FSITM index in different color channels separately.
In [21], the authors combined TMQI with the visual saliency
[22] model for HDR images proposed in [23] for pooling
local quality scores. The resultant method called SHDR-TMQI
achieved better accuracy than TMQI. They also proposed
two other methods called TMQI-NSS-σ and TMQI-NSS-Entropy,
where in the former method TMQI is combined with
a natural scene statistics (NSS) model [24] based on mean
subtracted contrast normalized (MSCN) pixels [25] while in
the latter method the local entropy is used for pooling the
structural fidelity score computed by TMQI. Recently, the
TMQI method was revisited in [26] by improving the structural
fidelity and statistical naturalness components in TMQI. The
improved method was named TMQI-II. Also, in [27], a no-
reference (blind) IQA method called BTMQI was developed
for assessing the quality of tone-mapped images.
None of the above-mentioned methods fully utilize color
information, and most of them do not utilize color at all. How-
ever, since the luminance level is reduced by the tone-mapping
process, and the perception of colors depends on the luminance
level [28], the color appearance may be changed by the tone-
mapping process [29], [30]. Hence, the proper utilization of
the color information may improve the accuracy of the quality
assessment of tone-mapped images. Moreover, some TMOs
use the color information to produce LDR images with better
quality and appearance [30], [31]. Therefore, it is essential to
use the color information for a better quality assessment of
tone-mapped images generated by various TMOs.
B. Contributions
In this work, we present a method for FR IQA of tone-
mapped images, which we refer to as the Tone-mapped Image
Quality (TIQ) index. As discussed earlier, after tone mapping, a
mixture of complex distortions and artifacts may be produced
in the resultant tone-mapped images. Such distortions may
change the structure, color appearance and also naturalness
of the produced LDR images. Hence, in general, the quality
assessment of tone-mapped images is a difficult problem. To
tackle this problem, we sought to produce a bag of quality-
sensitive and discriminative image features, and use them to
predict the visual quality of tone-mapped images.
The employed features measure several attributes of the
tone-mapped images either in a no-reference or full-reference
manner. Some of them use the color information while the
others use only the luminance information. More specifically,
we propose the following eight features (named F1 to F8):
•F1: We propose this feature to measure the structural
fidelity of a tone-mapped image with respect to its
reference HDR image based on a photometric-invariant
color descriptor [32], which uses the image 1st-order
derivative in the perceptual opponent color space [32],
[33]. To the best of our knowledge, this feature has not
been previously used for any IQA task.
•F2: This feature is borrowed from [18], and it measures
the statistical naturalness of a tone-mapped image based
on the mean and standard deviation of the luminance
channel of the tone-mapped image.
•F3: In [25], inspired by the nonlinear response of certain
cortical neurons, a statistical model of locally-normalized
luminance coefficients was developed for blind IQA
of natural LDR images, where the coefficients in this
model are computed by local mean subtractions and
the biologically-inspired divisive normalization operation
[34]. The resultant coefficients are called the mean-
subtracted contrast normalized (MSCN) coefficients [25].
Here, we postulate that two images with similar percep-
tual qualities should produce similar MSCN coefficients.
Hence, we propose this feature to measure the similarity
between the MSCN coefficients of a tone-mapped image
and its reference HDR image in each color channel.
•F4: It is known that the MSCN coefficients of a pristine
natural image follow a Gaussian distribution. However,
distortions alter the Gaussianity of this distribution [25].
In [25], the statistical distribution of these coefficients
and their pairwise products in the luminance channel was
modeled by two separate generalized Gaussian distribu-
tions (GGD), and their parameters were used for blind
IQA of LDR images. We here exploit these features
for capturing the statistical naturalness of tone-mapped
images in each color channel.
•F5: To better describe the shape of the distribution
of the MSCN coefficients and their pair-wise products,
we propose to use the skewness and kurtosis of these
coefficients as another feature.
•F6: It is known that the human visual system is very
sensitive to structural degradation of a scene [19]. To
measure the structural degradation caused by the tone-
mapping process, we propose to use the normalized
histogram of the codes produced by applying the local
binary pattern (LBP) operator on the gradient magnitude
of both the tone-mapped image and its reference HDR
image [35]. Note that LBP provides a computationally-
efficient method for describing the structure (texture) of
an image, and the gradient magnitude provides contrast
information, which is an important factor affecting human
visual system’s perception of image quality [36]. There-
fore, this feature is able to describe the spatial structure of
the contrast information of both the tone-mapped image
and its reference HDR image.
•F7: We also propose to use the key [2] of both the tone-
mapped image and its reference HDR image as another
feature to measure the overall brightness of the tone-
mapped image with respect to its HDR image.
•F8: Due to the dynamic range compression, the details in
very bright or very dark regions are lost, and the inten-
sities of the pixels in these regions are saturated/clipped
at either the lowest or highest intensity level [2]. As
discussed in [3], the reproduction of details in very bright
or very dark regions is an important factor affecting the
perceptual quality of the tone-mapped images. Hence, we
use the area ratio of such regions as another feature,
where the ratio is computed by dividing the area of such
regions by the total area of the image.
To the best of our knowledge, among the above-mentioned
features, F1, F3, F5, F7, and F8 are new in the context of
IQA, especially for tone-mapped images.
In order to get a single quality score for a given tone-
mapped image, we train a support vector regressor (SVR) [37]
based on the features extracted from a set of training data,
and use the trained SVR for predicting the visual quality of a
given test image. To the best of our knowledge, we are the first
to propose a bag of features (BOF) approach for the quality
assessment of tone-mapped images. The proposed approach
is generic in the sense that any other useful feature can
easily be added to it without the need for changing the overall
framework. Unlike the existing works for quality assessment
of tone-mapped images, our proposed method does not ignore
the color information. Also, unlike almost all of the existing
methods, which have been evaluated on a common database with
very few sample images and without checking the results for
statistical significance, we evaluate our proposed method
on a larger database based on paired comparisons, and check
the statistical significance of our results. Our experimental
results indicate that the proposed method achieves high accuracy
in predicting the visual quality of tone-mapped images
as compared to several existing methods in this field.
This paper is organized as follows. In Section II, we present
the proposed method. The experimental results are given in
Section III followed by conclusions in Section IV.
II. THE PROPOSED METHOD
Given a tone-mapped LDR image and its HDR image, our
goal is to estimate the visual (perceptual) quality of the tone-
mapped image objectively. Due to the reduction in dynamic
range, the structure and the global appearance of the tone-
mapped LDR image may degrade significantly. In fact, a
mixture of several distortions and artifacts may be produced
after tone mapping [2]. This makes the quality assessment of
tone-mapped images a very challenging task, because different
distortions change the image attributes and statistics differently.
To tackle this difficult problem, we sought to produce a bag of
different quality-aware structural and statistical image features,
and train a regressor to map the extracted features to perceptual
quality scores. In the sequel, we first introduce the employed
features, and then describe the training procedure.
A. The proposed features
1) Feature F1: The first feature measures the structural
fidelity of the input tone-mapped image with respect to its
HDR image based on a descriptor that uses the 1st-order Gaus-
sian derivatives of an image. Note that in [38] it was shown
that the local structure of an image in a neighborhood can be
represented by a local Taylor series expansion of the neigh-
borhood, where the coefficients of the series can be computed
using local Gaussian derivatives. In [39] it was also discussed
that the local image structure can be described by the local
Gaussian derivatives. In [40], it is argued that due to the extra
information available in color images, the 1st-order Gaussian
derivatives are sufficient for the local structure description.
Based on these facts, a color descriptor was developed in
[32] based on the 1st-order Gaussian derivatives, where the
descriptor is invariant to photometric and illumination-related
scene-accidental effects such as shadowing, etc. We use this
descriptor to compute F1as follows.
First, the input HDR and its LDR image are transformed to
the opponent-color (OC) space [32]. The transformation for
an RGB image I = (R, G, B) is performed as:

[O_1, O_2, O_3]^T = \left[ \frac{R - G}{\sqrt{2}},\ \frac{R + G - 2B}{\sqrt{6}},\ \frac{R + G + B}{\sqrt{3}} \right]^T,   (1)
where O_1 and O_2 are the chromatic components, and O_3
is the luminance component. Unlike the RGB color space,
the OC space is uncorrelated with respect to photometric
events (e.g. shading, shadows, and specularities) [32], so
more discriminative and salient features can be extracted from
an image in this space. Note that based on the opponent-
process theory of color vision [41], it is believed that the
color perception is due to the combined differential response
of different color components. Hence, the OC space can be
considered as a sort of perceptual color space [22], [32].
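As a concrete illustration, the transform in (1) can be applied directly to a floating-point RGB array. The following minimal NumPy sketch assumes the image is already a linear-valued H x W x 3 array; the function and variable names are ours and not part of the paper.

```python
import numpy as np

def rgb_to_opponent(img):
    """Opponent-color transform of Eq. (1); img is an H x W x 3 float RGB array."""
    R, G, B = img[..., 0], img[..., 1], img[..., 2]
    O1 = (R - G) / np.sqrt(2)             # chromatic component
    O2 = (R + G - 2.0 * B) / np.sqrt(6)   # chromatic component
    O3 = (R + G + B) / np.sqrt(3)         # luminance component
    return np.stack([O1, O2, O3], axis=-1)
```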
To measure the structural fidelity, we use a color invariance
descriptor based on the 1st-order image derivatives [32]. Let
R_t, G_t, and B_t be the 1st-order derivatives of R, G, and B
along the t ∈ {x, y} direction, respectively. Based on these
image derivatives, the derivatives of the two opponent colors
are computed as follows:

O_{1,t} = (R_t − G_t) / \sqrt{2},   (2)

O_{2,t} = (R_t + G_t − 2B_t) / \sqrt{6}.   (3)

An angular color descriptor called the opponent angle is then
calculated as follows to obtain a feature map Φ_t [32]:

Φ_t = \tan^{-1}\left( O_{1,t} / O_{2,t} \right),   (4)
where the division is performed in a pixel-wise manner.
As discussed in [32], the opponent angle descriptor is
invariant to several scene-accidental and photometric events
such as the highlights, illuminant intensity, etc. Hence, it can
be used to extract the color structure of an image robustly.
Let Φ_t^h be the opponent angle image of the given HDR
image of size W × H, and Φ_t^l be the opponent angle image
of the corresponding tone-mapped LDR image. To measure
the structural fidelity of the tone-mapped image, we use the
following popular formula [19], [36] to compute a similarity
map S_t based on Φ_t^h and Φ_t^l:

S_t = \frac{2 Φ_t^l ⊙ Φ_t^h + c}{Φ_t^l ⊙ Φ_t^l + Φ_t^h ⊙ Φ_t^h + c},   (5)

where ⊙ denotes pixel-wise multiplication, and the division is performed in a pixel-wise manner. Also, c is a positive constant (e.g., 1) to avoid instability. Based on (5), we derive the first quality-aware feature F1 of length 2 as F1 = [mean(S_x), mean(S_y)], where mean(·) computes the mean value of all elements in its input matrix. Note that the values in S_x and S_y are bounded between 0 and 1, and the higher their value, the higher the similarity between the two opponent angle images.
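A minimal sketch of F1 under Eqs. (2)-(5) is given below. The paper uses 1st-order Gaussian derivatives; here plain finite differences (np.gradient) stand in for them, and arctan2 is used as a numerically stable form of the ratio in (4), so this should be read as an approximation rather than the authors' exact implementation.

```python
import numpy as np

def opponent_angle(img, axis):
    """Opponent angle map of Eq. (4) along one image axis (0 = y, 1 = x)."""
    Rt = np.gradient(img[..., 0], axis=axis)   # finite-difference stand-in for
    Gt = np.gradient(img[..., 1], axis=axis)   # the 1st-order Gaussian derivative
    Bt = np.gradient(img[..., 2], axis=axis)
    O1t = (Rt - Gt) / np.sqrt(2)               # Eq. (2)
    O2t = (Rt + Gt - 2.0 * Bt) / np.sqrt(6)    # Eq. (3)
    return np.arctan2(O1t, O2t)                # stable form of tan^{-1}(O1t / O2t)

def feature_F1(hdr_rgb, ldr_rgb, c=1.0):
    """F1 = [mean(Sx), mean(Sy)] using the similarity map of Eq. (5)."""
    feats = []
    for axis in (1, 0):                        # x then y
        phi_h = opponent_angle(hdr_rgb, axis)
        phi_l = opponent_angle(ldr_rgb, axis)
        S = (2.0 * phi_l * phi_h + c) / (phi_l ** 2 + phi_h ** 2 + c)
        feats.append(float(S.mean()))
    return feats
```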
This way of measuring the structural fidelity enables us to
compare the structure of two images with different dynamic
range as the employed color opponent angle descriptor is
independent of the dynamic range. An illustration of the
opponent angle images for two sample tone-mapped images of
the same HDR scene is shown in Fig. 1. In this example, the
right tone-mapped image in the first row has a worse quality
than the left tone-mapped image. We observe that its opponent
angle images shown in the last row are less similar to the
opponent angle images of the original HDR image shown in
the second row than the opponent angle images of the first
tone-mapped image shown in the third row.
2) Feature F2: As discussed in [18], [26], the structural
fidelity and image naturalness are two important factors that
govern the visual quality of a distorted image. In fact, one
can find two images with the same structure but with very
different appearance. In the literature, various methods exist
for measuring or modeling the image naturalness based on
natural image statistics. Perhaps the most relevant existing
Fig. 1. An illustration of various opponent angle images using a jet colormap, in which different colors represent different angles. Top row: two different tone-mapped images from the same HDR scene. Second row: the color visualizations of Φ_x^h (left) and Φ_y^h (right). Third row: the color visualizations of Φ_x^l and Φ_y^l related to the left tone-mapped image in the first row. Last row: the color visualizations of Φ_x^l and Φ_y^l related to the right tone-mapped image in the first row.
method in this regard is the one proposed in [18], in which
a statistical naturalness model was proposed based on the
statistics of the brightness and contrast of tone-mapped images.
Due to the effectiveness of this model, we exploit it as a
single feature in our proposed bag of features approach. In
this method, the mean (m) and the standard deviation (s) of
the histogram of about 3000 gray-scale natural 8 bits/pixel
images in [42] were respectively modeled by a Gaussian and
Beta probability density functions (PDFs) as follows:
P_m(m) = \frac{1}{\sqrt{2π}\,σ_m} \exp\left( -\frac{(m - µ_m)^2}{2σ_m^2} \right),   (6)

and

P_s(s) = \frac{(1 - s)^{β_s - 1}\, s^{α_s - 1}}{B(α_s, β_s)},   (7)

where B(·, ·) is the Beta function [43], and the parameters of the two PDFs were estimated as µ_m = 115.94, σ_m = 27.99, α_s = 4.4, β_s = 10.1. A statistical image naturalness measure, F2, was then proposed as:

F2(m, s) = \frac{1}{T} P_m(m) P_s(s),   (8)

where T is a normalization factor given by T = \max\{P_m(m) P_s(s)\}. In our proposed method, we use F2(m, s) (or simply F2) as another feature for measuring the naturalness/quality of the tone-mapped images. Note that this feature is computed based on only the luminance information of the tone-mapped image.
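The following sketch evaluates Eqs. (6)-(8) with SciPy's normal and Beta densities. Since the Beta density is defined on [0, 1], the brightness spread s must be brought into that range; the rescaling used below (division by 128) is our own assumption, not a value given in the paper or in [18].

```python
import numpy as np
from scipy.stats import norm, beta

MU_M, SIGMA_M = 115.94, 27.99   # parameters of Pm, Eq. (6)
ALPHA_S, BETA_S = 4.4, 10.1     # parameters of Ps, Eq. (7)

def feature_F2(luma):
    """Statistical naturalness of [18]; luma is the 8-bit luminance channel."""
    m = float(luma.mean())
    s = float(luma.std()) / 128.0   # assumed mapping of the spread into [0, 1]
    p_m = norm.pdf(m, loc=MU_M, scale=SIGMA_M)
    p_s = beta.pdf(s, ALPHA_S, BETA_S)
    # T = max{Pm * Ps}: product of the two density maxima
    # (the Beta mode is at (a - 1) / (a + b - 2) for a, b > 1)
    T = norm.pdf(MU_M, MU_M, SIGMA_M) * beta.pdf(
        (ALPHA_S - 1) / (ALPHA_S + BETA_S - 2), ALPHA_S, BETA_S)
    return p_m * p_s / T            # Eq. (8)
```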
3) Feature F3: It is widely known that natural images
possess certain regular statistical properties that are affected
by the presence of distortions [24], [25]. In fact, the regularity
of natural scene statistics (NSS) has been well established
and studied in the vision science literature in both the spatial
domain [24], and in the wavelet domain [44]. For instance,
Wainwright et al. [34] empirically found that bandpass natural
images exhibit striking nonlinear statistical dependencies, and
that applying a specific nonlinear operation called divisive
normalization [24], which mimics the nonlinear response
behavior of certain cortical neurons [45] by dividing the rectified
linear neuronal responses by a weighted sum of rectified
neighboring responses, greatly reduces such observed
statistical dependencies and also Gaussianizes the processed
image data. Such an operation is also known as the contrast-
gain control, which models the local contrast masking process
in the early human vision [46], [34]. In the past decade,
several NSS-based IQA methods have been built on this
principle. For instance, in [25], a statistical model of locally-
normalized luminance coefficients in the spatial domain was
proposed for no-reference IQA, where the parameters of this
model were shown to be useful for measuring image natu-
ralness and image perceptual quality. The locally-normalized
luminance coefficients in this model are computed by local
mean subtractions and divisive normalization. The resultant
coefficients are called mean-subtracted contrast normalized
(MSCN) coefficients [25].
In this paper, inspired by the method from [25], we pro-
pose to use the similarity between the MSCN coefficients
of a given HDR image and its tone-mapped LDR image as
another feature for measuring the visual quality of the tone-
mapped image. In fact, we reason that two images with similar
perceptual quality should produce similar MSCN images. For
this purpose, let Cube a color channel of a given RGB image
Iof size W×H, where u∈ {R, G, B}. Also, let Cu(i, j )
denote the value of Cuat location (i, j). The MSCN pixel
(coefficient) at location (i, j)in Cuis obtained as:
Mu(i, j) = Cu(i, j )−µu(i, j)
σu(i, j)+1 ,(9)
where µu(i, j)and σu(i, j )are the local mean and local
standard deviation around location (i, j)defined as:
µu(i, j) =
K
X
k=−K
Z
X
z=−Z
wk,z Cu(i+k, j +z),(10)
σu(i, j) = v
u
u
t
K
X
k=−K
Z
X
z=−Z
wk,z Cu(i+k, j +z)−µu(i, j)2,
(11)
where wk,z is the value at location (k, z)in a 2D Gaussian
weighting function of size (2K+ 1) ×(2Z+ 1). In our
experiments we used K=Z= 3. We also found that the
accuracy of the proposed method is not very sensitive to the
values of Kand Z.
Fig. 2. An illustration of the MSCN fields. Top row: the original tone-mapped image (left) and its lightness component (right). Bottom row: the MSCN field of the tone-mapped image (left), and the MSCN field of the original HDR image (right).

By calculating (9) for all pixels, an MSCN field (image) M_u of size W × H is obtained for each color channel C_u. Let M_u^h and M_u^l be the MSCN fields of a given HDR image and its tone-mapped LDR image in channel u, respectively. Based on M_u^h and M_u^l we compute the following feature map for each color channel u:

Ψ_u = \frac{2 M_u^h ⊙ M_u^l + c}{M_u^h ⊙ M_u^h + M_u^l ⊙ M_u^l + c},   (12)

where, similar to (5), the division is performed in a pixel-wise manner and c = 1. Finally, we compute the following feature vector based on (12):

F3 = [mean(Ψ_R), mean(Ψ_G), mean(Ψ_B)].   (13)
An illustration of the MSCN fields of an HDR image and its tone-mapped image is shown in Fig. 2.
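A compact sketch of the MSCN field of Eqs. (9)-(11) and the F3 similarity of Eqs. (12)-(13) follows. The Gaussian-weighted 7x7 window (K = Z = 3) is approximated here with scipy.ndimage.gaussian_filter, so the exact window normalization may differ from the authors' implementation.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def mscn(channel, sigma=7.0 / 6.0):
    """MSCN field of Eq. (9); the local Gaussian window of Eqs. (10)-(11)
    is approximated by a Gaussian filter with the given sigma."""
    c = channel.astype(np.float64)
    mu = gaussian_filter(c, sigma)
    var = gaussian_filter(c * c, sigma) - mu * mu
    sd = np.sqrt(np.maximum(var, 0.0))
    return (c - mu) / (sd + 1.0)

def feature_F3(hdr_rgb, ldr_rgb, c=1.0):
    """F3 of Eq. (13): per-channel mean of the similarity map of Eq. (12)."""
    feats = []
    for u in range(3):
        Mh, Ml = mscn(hdr_rgb[..., u]), mscn(ldr_rgb[..., u])
        psi = (2.0 * Mh * Ml + c) / (Mh * Mh + Ml * Ml + c)
        feats.append(float(psi.mean()))
    return feats
```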
4) Feature F4: It is known that the MSCN coefficients
of natural pristine images follow a Gaussian distribution
[25]. However, the distribution deviates from Gaussianity
in the presence of various distortions. Hence, by quantifying
the amount of deviation from Gaussianity, it is possible to
predict the perceptual quality of a distorted image [25]. It was
also shown that a zero-mean generalized Gaussian distribution
(GGD) can be used to model the statistical distribution of
the MSCN coefficients [25], where the zero-mean GGD for
random variable x is given by:

f(x; α, β) = \frac{α}{2η\,Γ(1/α)} \exp\left( -\left( \frac{|x|}{η} \right)^{α} \right),   (14)

where

η = \sqrt{ \frac{β\,Γ(1/α)}{Γ(3/α)} },   (15)

and Γ(·) is the Gamma function [43]. In this distribution, α controls the shape of the distribution while β controls the variance. The parameters of the GGD, (α, β), can be estimated using the moment-matching approach proposed in [47]. The GGD parameters of the MSCN field of distorted images have been used in several no-reference IQA methods [25]. Due to the success of such methods, we also exploit them here in our proposed approach. Let (α_u, β_u) be the GGD parameters of M_u^l, i.e., the MSCN field of the color channel u of the given RGB tone-mapped image.
In [25], it was shown that modeling the statistical relation-
ships between neighboring MSCN pixels is also beneficial
for better estimation of the image perceptual quality. As
discussed in [25], MSCN coefficients in pristine images are
very homogenous and the signs of adjacent coefficients exhibit
a regular structure, which gets disturbed in the presence of dis-
tortion. Inspired by this finding, we also model the distribution
of pairwise products of neighboring MSCN coefficients along
the horizontal and vertical directions in M_u^l. Let D_{x,u}^l(i, j) and D_{y,u}^l(i, j) be the pairwise products of neighboring MSCN pixels in M_u^l along the horizontal and vertical directions, respectively. Specifically, we define:

D_{x,u}^l(i, j) = M_u^l(i, j)\, M_u^l(i, j + 1),   (16)

D_{y,u}^l(i, j) = M_u^l(i, j)\, M_u^l(i + 1, j).   (17)

We model each of these products by a separate zero-mean GGD. Let (α_u^{(t)}, β_u^{(t)}) be the corresponding GGD parameters for D_{t,u}^l, with t ∈ {x, y}. Based on the above parameters, we define the following feature vector:

F4 = \left[ α_u, α_u^{(t)}, β_u, β_u^{(t)} \mid u ∈ \{R, G, B\},\ t ∈ \{x, y\} \right].   (18)
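A sketch of F4 is given below; the GGD shape is estimated with the standard moment-matching lookup described in [47] and popularized by [25], and the empirical variance is returned as the scale-related parameter. This mirrors the description above but is not the authors' code, and the grid resolution is an assumption.

```python
import numpy as np
from scipy.special import gamma as gamma_fn

def estimate_ggd(x, shape_grid=np.arange(0.2, 10.0, 0.001)):
    """Moment-matching GGD fit (Eqs. 14-15): returns (alpha, beta)."""
    x = x.ravel()
    r_hat = np.mean(x ** 2) / (np.mean(np.abs(x)) ** 2 + 1e-12)
    # For a zero-mean GGD, E[x^2]/E[|x|]^2 = Gamma(1/a)Gamma(3/a)/Gamma(2/a)^2.
    r_grid = gamma_fn(1.0 / shape_grid) * gamma_fn(3.0 / shape_grid) / gamma_fn(2.0 / shape_grid) ** 2
    alpha = shape_grid[np.argmin(np.abs(r_hat - r_grid))]
    beta = np.mean(x ** 2)          # variance-like scale parameter
    return float(alpha), float(beta)

def feature_F4(ldr_rgb):
    """GGD parameters of the MSCN field and of its horizontal/vertical
    pairwise products (Eqs. 16-17) for each RGB channel."""
    feats = []
    for u in range(3):
        M = mscn(ldr_rgb[..., u])                       # mscn() as sketched for F3
        feats += estimate_ggd(M)                        # (alpha_u, beta_u)
        feats += estimate_ggd(M[:, :-1] * M[:, 1:])     # horizontal products, Eq. (16)
        feats += estimate_ggd(M[:-1, :] * M[1:, :])     # vertical products, Eq. (17)
    return feats
```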
5) Feature F5: To describe the shape of the distribution of the MSCN fields even better, we define the following feature vector based on the skewness and kurtosis of the MSCN field of the 3 RGB channels of the given tone-mapped image: F5 = [skewness(M_u^l), kurtosis(M_u^l) | u ∈ {R, G, B}].
6) Feature F6: During the tone-mapping process, several
distortions may be produced in the resultant image, so the
structure of the original HDR image may be degraded sig-
nificantly, thereby reducing the visual quality of the tone-
mapped image. To capture the structural degradation caused
by the tone-mapping process, we propose another dynamic-
range-independent feature based on the popular local binary
pattern (LBP) operator [35].
The LBP operator is able to describe the local structure of an
image neighborhood with a specific code. In other words, LBP
is able to encode the image primitive micro-structures, such
as edges, lines, spots and other local features. As discussed
in [35], a histogram of the LBP codes (of length p + 2) computed by a rotation-invariant and uniform variant of LBP, denoted by LBP^{riu2}_{(p,r)}, provides an effective feature vector for describing the image structure (texture), where p is the number of neighbors and r is the radius of the local neighborhoods.
The LBP operator is illumination invariant in the sense that
the LBP codes of a texture under different illumination levels
do not change.
To capture the structural information of the given tone-mapped image and its reference HDR image, we compute the normalized LBP histogram of the gradient magnitude of the luminance channel of both images, and use the resultant normalized histograms as the following feature vector:

F6 = \left[ \mathrm{hist}\left( \mathrm{LBP}^{riu2}_{(p,r)}(Y^l) \right),\ \mathrm{hist}\left( \mathrm{LBP}^{riu2}_{(p,r)}(Y^h) \right) \right],   (19)

where Y^l and Y^h are the luminance channels of the tone-mapped image and its HDR image, respectively. In our experiments, we used p = 8 and r = 1, so the length of F6 is 20. Note that, as mentioned earlier, the gradient magnitude provides contrast information, which is an important factor affecting the HVS's perception of quality [36], and this is the reason for using the gradient magnitude. Hence, F6 describes the spatial structure of the contrast information of the given tone-mapped image and its reference HDR image. Similar to [36], we use the Scharr gradient operator [36] for computing the gradient magnitude.
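Below is a sketch of F6 using scikit-image, whose 'uniform' LBP method yields rotation-invariant uniform codes in the range 0..p+1 (p + 2 distinct values) and whose scharr filter gives the gradient magnitude; function names are ours and the histogram normalization follows Eq. (19).

```python
import numpy as np
from skimage.feature import local_binary_pattern
from skimage.filters import scharr

def lbp_hist(luma, p=8, r=1):
    """Normalized histogram of LBP^riu2 codes of the Scharr gradient magnitude."""
    gm = scharr(luma.astype(np.float64))                       # gradient magnitude
    codes = local_binary_pattern(gm, p, r, method='uniform')   # p + 2 distinct codes
    hist, _ = np.histogram(codes, bins=np.arange(p + 3))
    return hist / hist.sum()

def feature_F6(ldr_luma, hdr_luma, p=8, r=1):
    """F6 of Eq. (19): concatenated histograms for the LDR and HDR luminance."""
    return np.concatenate([lbp_hist(ldr_luma, p, r), lbp_hist(hdr_luma, p, r)])
```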
7) Feature F7: As another feature, we propose to use the key [2] of both the tone-mapped image L and its HDR image H as F7 = [key(L), key(H)], where the key of an image I is defined as follows:

\mathrm{key}(I) = \frac{\log(I_{avg}) − \log(I_{min})}{\log(I_{max}) − \log(I_{min})},   (20)

where I_{avg}, I_{max}, and I_{min} denote the average, maximum, and minimum intensity/luminance of I, respectively.
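Eq. (20) can be evaluated directly; in the sketch below a small epsilon keeps the logarithms finite for pixels equal to zero, which is our own guard rather than part of the definition.

```python
import numpy as np

def key(img, eps=1e-6):
    """Key of an image, Eq. (20): position of the log-average luminance
    between the log-minimum and log-maximum."""
    i_min, i_max, i_avg = img.min() + eps, img.max() + eps, img.mean() + eps
    return (np.log(i_avg) - np.log(i_min)) / (np.log(i_max) - np.log(i_min))

def feature_F7(ldr_luma, hdr_luma):
    return [key(ldr_luma), key(hdr_luma)]
```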
8) Feature F8: Due to the dynamic range compression,
some regions in a tone-mapped image may be saturated
(clipped) at either the maximum or minimum intensity level,
causing the loss of details in very bright or very dark regions.
To measure this effect, we use the percentage of the area of
such saturated regions as another feature as follows:
F8 = [φ(Y^l > 0.9 × Y_{max}),\ φ(Y^l < 0.1 × Y_{max})],   (21)

where Y^l is the luminance channel of the tone-mapped image, Y_{max} is the maximum intensity/luminance in Y^l, and φ(B) returns the number of pixels in Y^l that satisfy rule B divided by the total number of pixels in Y^l.
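F8 in Eq. (21) reduces to counting near-saturated pixels; a direct sketch:

```python
import numpy as np

def feature_F8(ldr_luma):
    """Fractions of pixels clipped near the top and bottom of the intensity range, Eq. (21)."""
    y_max = float(ldr_luma.max())
    n = ldr_luma.size
    bright = np.count_nonzero(ldr_luma > 0.9 * y_max) / n
    dark = np.count_nonzero(ldr_luma < 0.1 * y_max) / n
    return [bright, dark]
```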
B. Regression
In order to map the extracted features to a quality score,
we first concatenate all the computed features to obtain a
single feature vector for each given tone-mapped image. We
then train a support vector regressor (SVR) [37] with the
feature vector of a set of training images along with their
corresponding mean opinion scores (MOS) [48]. Among various
existing IQA methods, SVR is the most common tool for learning
a nonlinear mapping between image features and a single
quality score [25], [49], [50]. The SVR is widely
used in many applications due to its high accuracy, ability to
deal with high-dimensional data, and flexibility in modeling
diverse sources of data [48]. In our approach, we used an SVR
with a radial basis kernel function. After training the SVR,
given any test image’s feature vector as input to the trained
SVR, a quality score can be predicted. The training procedure
will be described in Section III.
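A minimal training sketch with scikit-learn is shown below; the RBF kernel follows the text, while the feature standardization and the C and epsilon values are our own reasonable defaults (the paper does not report its hyperparameters).

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

def train_tiq_regressor(X_train, y_train):
    """X_train: one row per image with the concatenated features F1..F8;
    y_train: the corresponding MOS values."""
    model = make_pipeline(StandardScaler(), SVR(kernel='rbf', C=10.0, epsilon=0.1))
    model.fit(X_train, y_train)
    return model

def predict_quality(model, feature_vector):
    """Predicted quality score for a single test image's feature vector."""
    return float(model.predict(np.asarray(feature_vector).reshape(1, -1))[0])
```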
III. RESULTS
To evaluate the proposed method, we utilized the dataset
provided in [18], which we here refer to as Yeganeh’s
database, and also the PairComp TMO database provided in
[51]. Yeganeh's database contains 15 HDR images of various
resolutions (VGA or smaller), each accompanied by 8 different LDR
images produced by different TMOs. The quality
of the LDR images is ranked from 1 (best quality) to 8 (worst
quality). The ranks were obtained based on the subjective
TABLE I
ACRONYMS FOR VARIOUS METHODS.

Method           Acronym
TIQ              M1
TMQI             M2
TMQI-NSS-Ent     M3
TMQI-NSS-σ       M4
FSITM_R          M5
FSITM_G          M6
FSITM_B          M7
FSITM_R-TMQI     M8
FSITM_G-TMQI     M9
FSITM_B-TMQI     M10
TMQI-II          M11
BTMQI            M12
assessment of 20 subjects using a direct scaling method [52].
Hence, for each LDR image in this database, there is a mean
opinion score (MOS), which represents its visual quality.
The PairComp TMO database contains 10 HDR images in
FullHD (1920 × 1080 pixels) for which there are 9 different
tone-mapped images. Fig. 3 shows 9 different tone-mapped
images of an HDR image from this database. The subjective
data provided in this database was obtained by paired com-
parison (PC), which is a popular indirect scaling method [52].
It is known that PC is more reliable than other methodologies
like ranking and rating [51], [53]. Also, PC is less demanding
for the observers as compared to direct scaling methods since
it is easier for the observers to select which image or video in
a pair has a better quality rather than to relate the quality of an
image or a video to a particular quality level on a given scale
[52]. Moreover, PC has a higher discriminatory power when
the small qualitative differences appear in the dataset [51],
[52], [54]. To reduce the number of comparisons, the Adaptive
Square Design (ASD) PC procedure [55] was employed in
this database. The total number of paired comparisons in
this database was 18 ×10 = 180 for each observer, and 40
observers participated in the study. In half of the experiments
the HDR contents were displayed as a reference, and the
observers were asked to choose the image with higher quality
with respect to the HDR image. We used this half of the
database in our experiments.
For the objective evaluation of various methods including
the proposed method on Yeganeh’s database, we utilized the
well-known and widely-used Spearman rank-order correlation
coefficient (SRCC) and the Kendall rank-order correlation
coefficient (KRCC) metrics [48], [49], [56]. The SRCC metric
serves as a measure of prediction monotonicity while the
KRCC metric is a measure of rank correlation.
We used Yeganeh's database to train the SVR described
in the previous section. Specifically, for SVR learning, we
randomly partitioned Yeganeh's database into training
and test sets such that 80% of the images in the database
constitutes the training set, and the remaining 20% forms
the test set. For performance comparisons, the median/mean
values of the SRCC and KRCC metrics were calculated across
1000 random training-test splits to mitigate any bias due to the
division of data. Note that such a training procedure is widely used in related works [25], [27], [49], [56].

Fig. 3. Different tone-mapped images for an HDR image ('14.exr') from PairComp TMO. Each image was produced by a different TMO.
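The evaluation loop sketched below follows the protocol described above: 1000 random 80/20 splits, an SVR trained on each training portion, and the median/mean SRCC and KRCC on the held-out images. Splitting at the level of source HDR scenes (so that no content appears in both sets) is our assumption; the paper only states that 80% of the images form the training set.

```python
import numpy as np
from scipy.stats import spearmanr, kendalltau
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

def evaluate_splits(X, y, scene_ids, n_splits=1000, train_frac=0.8, seed=0):
    """Median/mean SRCC and KRCC over random training-test splits."""
    rng = np.random.default_rng(seed)
    scenes = np.unique(scene_ids)
    srcc, krcc = [], []
    for _ in range(n_splits):
        train_scenes = rng.choice(scenes, size=int(train_frac * len(scenes)), replace=False)
        tr = np.isin(scene_ids, train_scenes)
        model = make_pipeline(StandardScaler(), SVR(kernel='rbf'))
        model.fit(X[tr], y[tr])
        pred = model.predict(X[~tr])
        rho, _ = spearmanr(pred, y[~tr])
        tau, _ = kendalltau(pred, y[~tr])
        srcc.append(rho)
        krcc.append(tau)
    return np.median(srcc), np.mean(srcc), np.median(krcc), np.mean(krcc)
```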
In Table II, the accuracy of the proposed method (TIQ) is
compared with several existing methods based on the SRCC
and KRCC metrics on Yeganeh’s database. For simplicity,
we name each of the compared methods with an acronym
as defined in Table I. To generate the results in Table II,
the median, mean, and standard deviation (STD) of SRCC
and KRCC of various methods across all the 1000 random
test sets were computed, and reported in this table. As seen
from these results, the median and the mean SRCC of the
proposed method are higher than those of the other compared methods.
Also, the median and the mean KRCC of the proposed method
are higher than those of the other methods. From these results, we see
that the proposed method outperforms the other methods on
Yeganeh’s database. However, to assess whether this conclu-
sion can be generalized beyond this particular database, we
need to perform statistical significance testing. To this end,
we performed a two-tailed t-test [57] for the difference in
the mean SRCC and KRCC scores. The resulting p-values
are shown in Table III for SRCC and Table IV for KRCC.
From Table III, we see that the proposed method (M1) has a
higher mean SRCC score than M2, M3 and M4 at the 95%
significance level (p < 0.05). However, it is statistically tied
with other methods at the 95% significance level (p > 0.05).
From Table IV, M1 has a higher KRCC score than M3 and
M4 (p < 0.05) and is statistically tied with other methods
(p > 0.05) at the 95% significance level. Due to the small
number of images in Yeganeh’s database and the resulting
small number of degrees of freedom in the t-test, it becomes
hard to obtain high levels of statistical significance for the
difference in the mean SRCC and KRCC scores. Hence,
we supplement these results by testing on a larger database,
PairComp TMO.
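As a sketch of this significance test, the per-split SRCC samples of two methods can be compared with a two-tailed t-test; SciPy's independent two-sample test is used below (a paired test over the same splits would be an equally defensible choice).

```python
from scipy.stats import ttest_ind

def compare_mean_srcc(srcc_a, srcc_b, alpha=0.05):
    """Two-tailed t-test on the mean SRCC of two methods over the random test splits."""
    t_stat, p_value = ttest_ind(srcc_a, srcc_b)
    return p_value, p_value < alpha   # significant difference at the 95% level?
```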
In fact, in order to use an objective IQA metric, one must
know whether the metric’s score difference between two given
TABLE II
COMPARING THE PROPOSED METHOD (TIQ) WITH VARIOUS METHODS ON YEGANEH'S DATABASE.

Method   SRCC Mean   SRCC Median   SRCC STD   KRCC Mean   KRCC Median   KRCC STD
M1 0.885 0.892 0.062 0.771 0.792 0.123
M2 0.804 0.788 0.109 0.680 0.639 0.125
M3 0.655 0.710 0.296 0.548 0.576 0.270
M4 0.665 0.716 0.288 0.557 0.574 0.272
M5 0.814 0.810 0.120 0.713 0.715 0.140
M6 0.815 0.832 0.133 0.694 0.715 0.170
M7 0.717 0.854 0.256 0.595 0.713 0.268
M8 0.853 0.855 0.087 0.751 0.715 0.140
M9 0.843 0.856 0.086 0.733 0.786 0.105
M10 0.812 0.858 0.122 0.684 0.715 0.142
M11 0.768 0.790 0.132 0.626 0.618 0.169
M12 0.803 0.823 0.123 0.645 0.653 0.120
images is statistically significant or not. Hence, as discussed
in [58], a quantification is needed for the accuracy or the
so-called resolving power of the objective IQA metric. The
resolving power of an IQA metric can be defined as the
difference between two scores of the IQA metric for which
the corresponding subjective-score distributions have means
that are statistically different from each other at a certain
significance level like 95% [58].
To overcome the problems mentioned for metrics like SRCC
and KRCC, in recommendation ITU-T J.149 [58], it was sug-
gested to use classification errors on paired comparison data to
evaluate the accuracy or resolving power of a given objective
IQA metric. A classification error is made when the prediction
made by the objective IQA metric and the judgement provided
by the subjective evaluation lead to different conclusions on a
pair of stimuli, A and B, for example. Three types of error can
happen [58]: False Tie, False Ranking, and False Differentiation. The relevant situations in which these errors occur are
listed in Table V. In such a methodology, the percentage of
TABLE III
THE RESULTS OF THE STATISTICAL SIGNIFICANCE TEST ON THE OBTAINED MEAN SRCC VALUES. THE VALUE IN EACH CELL SHOWS THE CORRESPONDING p-VALUE WHEN COMPARING THE METHODS IN THE CORRESPONDING ROW AND COLUMN.
M1 M2 M3 M4 M5 M6 M7 M8 M9 M10 M11 M12
M1 1.0000 0.0403 0.0052 0.0046 0.1705 0.2371 0.0793 0.6410 0.2701 0.1154 0.0032 0.0001
M2 0.0403 1.0000 0.0673 0.0684 0.9858 0.7063 0.1804 0.0241 0.0043 0.7983 0.0043 0.7654
M3 0.0052 0.0673 1.0000 0.5819 0.0937 0.0614 0.7438 0.0209 0.0235 0.0810 0.0027 0.0001
M4 0.0046 0.0684 0.5819 1.0000 0.0993 0.0632 0.7691 0.0208 0.0222 0.0826 0.0032 0.0003
M5 0.1705 0.9858 0.0937 0.0993 1.0000 0.4321 0.1154 0.1972 0.3653 0.9132 0.0001 0.0034
M6 0.2371 0.7063 0.0614 0.0632 0.4321 1.0000 0.0867 0.3313 0.5382 0.7715 0.0005 0.0519
M7 0.0793 0.1804 0.7438 0.7691 0.1154 0.0867 1.0000 0.0623 0.0830 0.1111 0.0036 0.0000
M8 0.6410 0.0241 0.0209 0.0208 0.1972 0.3313 0.0623 1.0000 0.4203 0.0530 0.0001 0.0001
M9 0.2701 0.0043 0.0235 0.0222 0.3653 0.5382 0.0830 0.4203 1.0000 0.0652 0.0034 0.0003
M10 0.1154 0.7983 0.0810 0.0826 0.9132 0.7715 0.1111 0.0530 0.0652 1.0000 0.0030 0.0527
M11 0.0032 0.0043 0.0027 0.0032 0.0001 0.0005 0.0036 0.0001 0.0034 0.0030 1.0000 0.0024
M12 0.0001 0.7654 0.0001 0.0003 0.0034 0.0519 0.0000 0.0001 0.0003 0.0527 0.0024 1.0000
TABLE IV
THE RESULTS OF THE STATISTICAL SIGNIFICANCE TEST ON THE OBTAINED MEAN KRCC VALUES. THE VALUE IN EACH CELL SHOWS THE CORRESPONDING p-VALUE WHEN COMPARING THE METHODS IN THE CORRESPONDING ROW AND COLUMN.
M1 M2 M3 M4 M5 M6 M7 M8 M9 M10 M11 M12
M1 1.0000 0.0776 0.0023 0.0034 0.3616 0.2710 0.0723 0.9974 0.6352 0.1424 0.0001 0.0000
M2 0.0776 1.0000 0.0652 0.0713 0.7120 0.7763 0.1811 0.0177 0.0071 0.9920 0.0038 0.0567
M3 0.0023 0.0652 1.0000 0.5814 0.0545 0.0571 0.8454 0.0173 0.0208 0.1027 0.0001 0.0000
M4 0.0034 0.0713 0.5814 1.0000 0.0663 0.0671 0.8763 0.0193 0.0220 0.1075 0.0001 0.0000
M5 0.3616 0.7120 0.0545 0.0663 1.0000 0.8273 0.0620 0.3444 0.5491 0.6980 0.0001 0.0001
M6 0.2710 0.7763 0.0571 0.0671 0.8273 1.0000 0.0708 0.2953 0.4488 0.7570 0.0001 0.0001
M7 0.0723 0.1811 0.8454 0.8763 0.0620 0.0708 1.0000 0.0474 0.0651 0.1165 0.0001 0.0001
M8 0.9974 0.0177 0.0173 0.0193 0.3444 0.2953 0.0474 1.0000 0.4224 0.0173 0.0001 0.0001
M9 0.6352 0.0071 0.0208 0.0220 0.5491 0.4488 0.0651 0.4224 1.0000 0.0196 0.0000 0.0000
M10 0.1432 0.9926 0.1000 0.1078 0.6982 0.7571 0.1160 0.0179 0.0196 1.0000 0.0001 0.0001
M11 0.0001 0.0038 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0000 0.0001 1.0000 0.0045
M12 0.0000 0.0567 0.0000 0.0000 0.0001 0.0001 0.0001 0.0001 0.0000 0.0001 0.0037 1.0000
TABLE V
VARIOUS TYPES OF THE CLASSIFICATION ERRORS.

Objective \ Subjective   A > B              A = B                   A < B
A > B                    Correct Decision   False Differentiation   False Ranking
A = B                    False Tie          Correct Decision        False Tie
A < B                    False Ranking      False Differentiation   Correct Decision
Correct Decision, False Tie, False Differentiation, and False
Ranking are recorded from all possible distinct pairs as a
function of the difference in the metric values. Therefore,
using this approach, a set of graphs are obtained for the given
objective IQA metric. To compare the accuracy of various IQA
metrics, one needs to compare their corresponding graphs.
However, this comparison may not be very easy and practical,
especially if the number of metrics being compared is high
[59]. Also, in [58], no method was proposed to determine the
statistical significance of the obtained results [59].
Very recently, a new analysis tool was proposed in [52]
to evaluate the performance and accuracy of objective IQA
metrics based on ground truth preference scores using classi-
fication errors characterized by the popular receiver operating
characteristic (ROC) analysis [60]. In the ROC analysis, sim-
ilar to the classification errors analysis in [58], a threshold on
the objective scores is varied while recording the true positive
rates (TPR) and false positive rates (FPR). By plotting TPR
as a function of FPR, an ROC curve is obtained.
The ROC analysis was designed for binary classification
scenarios. However, when comparing a pair of stimuli, A and
B, there are 3 possible outcomes: A < B, A = B, or A > B.
Hence, to have a better understanding of the objective metric
performance in the different situations, the following three
separate ROC analyses are performed [52]:
1) Different/Similar ROC Analysis: this analysis is used
to measure the ability of the IQA metric to discriminate
between significant and not significant visual quality
differences in a pair of stimuli.
2) Better/Worse ROC Analysis: this analysis measures the
ability of the metric to determine which stimulus in a
pair has a better visual quality.
3) Better/Equal-Worse ROC Analysis: this analysis il-
lustrates the ability of the metric to determine whether
stimulus A has similar or worse visual quality than
stimulus B or if it has significantly better visual quality
than stimulus B.
Hence, for any given objective IQA metric, 3 ROC curves
corresponding to the abovementioned 3 ROC analyses are
obtained. To compare the ROC curves of two different IQA
metrics, the authors in [52] suggested to compare the area
under the curve (AUC) of the corresponding ROC curves of
the two methods. They also used the statistical test proposed by DeLong et al. [61] to check for the significance of the difference between the AUC values of the two metrics.
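As a rough sketch of one of these analyses, the Better/Worse case reduces to a binary ROC problem once the pairs with a statistically significant subjective preference are selected: the objective score difference is the decision variable and the subjectively preferred stimulus is the positive label. The helper below uses scikit-learn's AUC; the significance test of DeLong et al. [61] is not reproduced here.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def better_worse_auc(obj_a, obj_b, subj_prefers_a):
    """Better/Worse AUC over pairs with a significant subjective preference.
    obj_a, obj_b: objective scores of the two stimuli in each pair;
    subj_prefers_a: 1 if A was subjectively preferred, 0 if B was."""
    delta = np.asarray(obj_a) - np.asarray(obj_b)   # decision variable
    return roc_auc_score(subj_prefers_a, delta)
```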
In this paper, we used the ROC analysis tool proposed in
[52] on the PairComp TMO database to compare the perfor-
mance of the proposed method with other existing methods. To
get the quality scores on this database, we used an SVR trained on all images in Yeganeh's database. This allows us to evaluate how well the proposed method generalizes to other, unseen databases. The related ROC curves for different methods are shown in Fig. 4.

Fig. 4. The ROC curves for various methods in three different analyses (Better/Equal-Worse, Better/Worse, and Different/Similar).
Note that the point at the top left corner of the ROC space
corresponds to a perfect classification whereas a completely
random performance (guess) would give a point along the
diagonal line from the bottom left corner to the top right corner
in this space. Also, as discussed in [52], the Better/Equal-
Worse ROC analysis relies on both abilities to discriminate
between significant and not significant visual quality differ-
ences in a pair of stimuli, and also to determine which stimulus
in a pair has the better visual quality for pairs with significant
visual quality differences. Therefore, the Better/Equal-Worse
ROC curve typically lies between the curves for the Differ-
ent/Similar and Better/Worse analyses. Moreover, if the three
ROC curves for a metric are well separated, then the metric is
better at distinguishing between significant and not significant
visual quality differences than at determining which stimulus
in a pair has the better visual quality or vice versa, depending
on which ROC curve lies above the other ones [52].
From Fig. 4 we note that methods M2, M8, M9, and M10
perform about the same in any of the three ROC analyses
since their three ROC curves are close to each other, and also
they are close to the diagonal line, which means that their
performance is close to random. The other methods, however,
have better performance as their three ROC curves come closer
to the top left corner of the ROC space.
The AUC values for different cases of all the compared
methods are listed in Table VI. The significance of the dif-
ference between the AUC values of any pair of the compared
methods at the 95% significance level for each of the 3 ROC
analyses was also calculated using the statistical test proposed
by DeLong et al. [61]. The corresponding p-values for the Better/Worse, Better/Equal-Worse, and Different/Similar analyses are shown in Tables VII, VIII, and IX, respectively.
As seen from the results in Tables VI and VII, the top
three performing methods in the Better/Worse analysis are the
proposed method (M1), M3, and M4. In particular, the AUC
of the proposed method in this analysis is significantly higher
than the other methods with p < 0.05 for all pairs, which
means that its performance is statistically better at the 95%
significance level. This indicates that the proposed method has a
better ability to determine which stimulus in a pair has better
visual quality as compared to other methods.
As seen from the results in Tables VI and VIII, the top
three performing methods in the Better/Equal-Worse analysis
are again the proposed method (M1), M3, and M4. Also we
note that the AUC of the proposed method in this analysis is
significantly higher than the other methods. This shows that
the proposed method has better ability to determine whether
stimulus A has similar or worse visual quality than stimulus
B, or if it has significantly better visual quality than stimulus
B.
Finally, from the results reported in Tables VI and IX we
TABLE VI
COMPARING THE PROPOSED METHOD (M1) WITH VARIOUS METHODS IN 3 DIFFERENT ROC ANALYSES BASED ON AUC.
Method Different/Similar Better/Equal-Worse Better/Worse
M1 0.5538 0.6871 0.7602
M2 0.4826 0.5387 0.5565
M3 0.5480 0.6483 0.7040
M4 0.5474 0.6465 0.7015
M5 0.5169 0.6086 0.6520
M6 0.5194 0.5987 0.6382
M7 0.5284 0.5993 0.6375
M8 0.4943 0.5494 0.5709
M9 0.4925 0.5479 0.5691
M10 0.4912 0.5485 0.5704
M11 0.4733 0.5272 0.5406
M12 0.4906 0.5365 0.5504
TABLE X
CONTRIBUTION OF DIFFERENT INDIVIDUAL FEATURES BASED ON THEIR MEDIAN SRCC ON YEGANEH'S DATABASE.

F1     F2     F3     F4     F5     F6     F7     F8
0.52   0.68   0.50   0.68   0.44   0.55   0.50   0.55
observe that in the Different/Similar analysis, the performance
and accuracy of the proposed method are statistically better than those of
all methods except for M3, M4, and M5. This means that the
proposed method has a better ability than all methods except
for M3, M4, and M5 in distinguishing between significant and
not significant visual quality differences in a pair of stimuli.
Based on the above discussion, we can conclude that the proposed method outperforms the existing methods in most cases, as several lines of evidence confirm its better performance.
We next evaluate the contribution of each of the utilized
features to understand the relation between each of the features
and perceptual quality better. For this purpose, we trained sep-
arate SVRs based on each of the individual features extracted
from the same random training data that was described earlier
for Yeganeh’s database. We report the median SRCC scores
of various features across the 1000 test sets in Table X. As
seen from these results, F2 and F4 achieve the highest SRCC
scores (0.68) while F5 has the lowest score (0.44). Also, the
SRCC scores of all individual features are below 0.69, which
means that no feature alone is sufficient to achieve high
quality prediction accuracy. This justifies the reason for using
a bag of features in the proposed method. We also measured
the SRCC score for different combinations of the features. We
observed that the best accuracy is obtained when using all features
together, with a median SRCC of 0.89. Note that features F1,
F3, F4, and F5 use color information. Among these, F3, F4,
and F5 can also be computed based on only the luminance
information. If we disable F1 and use all other features, the
overall median SRCC is reduced from 0.89 to 0.85. Also, if we
use only the luminance information for computing F3, F4, and
F5, with all other features enabled, the overall median SRCC
is reduced from 0.89 to 0.83. This indicates that the color
information boosts the accuracy of the proposed method.
IV. CONCLUSIONS
In this paper we presented a novel bag of features (BOF)
method for full-reference objective quality assessment of tone-
mapped images. In the proposed method a set of features
is first extracted from a given tone-mapped image and its
reference HDR image to measure different aspects of the
tone-mapped image including its structural fidelity, natural-
ness, color, overall brightness, etc. A support vector regressor
(SVR) is then trained based on the features extracted from
a set of training images, and the trained SVR is used to
predict the quality of a given tone-mapped image. Unlike the
existing similar methods that overlook the color information,
the proposed method utilizes the color information. The per-
formance and accuracy of the proposed method were compared
with several existing methods on two databases. Experimental
results demonstrated that the proposed method achieves a high
accuracy as compared to other existing methods. Also, the
statistical significance testing of the obtained results confirmed
the superiority of the proposed method over the existing
similar methods in many cases at the 95% significance level.
The proposed method is generic, and any other suitable feature
can be added to it to improve its accuracy without changing
the overall framework.
REFERENCES
[1] E. Reinhard, G. Ward, S. Pattanaik, P. Debevec, W. Heidrich, and
K. Myszkowski, High Dynamic Range Imaging: Acquisition, Display,
and Image-Based Lighting. San Mateo, CA: Morgan Kaufmann, 2010.
[2] E. Reinhard, W. Heidrich, P. Debevec, S. Pattanaik, G. Ward, and
K. Myszkowski, High Dynamic Range Imaging: Acquisition, Display,
and Image-Based Lighting. Morgan Kaufmann, 2005.
[3] M. Cadik, M. Wimmer, L. Neumann, and A. Artusi, “Evaluation of HDR
tone mapping methods using essential perceptual attributes,” Computers
& Graphics, vol. 32, pp. 330–349, 2008.
[4] Y. Chen, X. Zhao, L. Zhang, and J.-W. Kang, “Multiview and 3D video
compression using neighboring block based disparity vectors,” IEEE
Trans. Multimedia, vol. 18, no. 4, pp. 576–589, 2016.
[5] A. Hameed, R. Dai, and B. Balas, “A decision-tree-based perceptual
video quality prediction model and its application in FEC for wireless
multimedia communications,” IEEE Trans. Multimedia, vol. 18, no. 4,
pp. 764–774, 2016.
[6] L. P. Van, J. D. Praeter, G. V. Wallendael, S. V. Leuven, J. D. Cock, and
R. V. de Walle, “Efficient bit rate transcoding for high efficiency video
coding,” IEEE Trans. Multimedia, vol. 18, no. 3, pp. 364–378, 2016.
[7] L. Ma, L. Xu, Y. Zhang, Y. Yan, and K. N. Ngan, “No-reference
retargeted image quality assessment based on pairwise rank learning,”
IEEE Trans. Multimedia, vol. 18, no. 11, pp. 2228–2237, 2016.
[8] F. Shao, K. Li, W. Lin, G. Jiang, and Q. Dai, “Learning blind quality
evaluator for stereoscopic images using joint sparse representation,”
IEEE Trans. Multimedia, vol. 18, no. 10, pp. 2104–2114, 2016.
[9] K. Gu, S. Wang, H. Yang, G. Zhai, X. Yang, and W. Zhang, “Saliency-
guided quality assessment of screen content images,” IEEE Trans.
Multimedia, vol. 18, no. 6, pp. 1098–1110, 2016.
[10] K. Gu, S. Wang, H. Yang, W. Lin, G. Zhai, X. Yang, and W. Zhang,
“Saliency-guided quality assessment of screen content images,” IEEE
Trans. Multimedia, vol. 18, no. 6, pp. 1098–1110, Aug. 2016.
[11] K. Gu, D. Tao, J.-F. Qiao, and W. Lin, “Learning a no-reference quality
assessment model of enhanced images with big data,” IEEE Trans.
Neural Networks and Learning Systems, 2017.
[12] K. Gu, G. Zhai, X. Yang, and W. Zhang, “Hybrid no-reference quality
metric for singly and multiply distorted images,” IEEE Trans. Broad-
casting, vol. 60, no. 3, pp. 555–567, Aug. 2014.
[13] Q. Li, W. Lin, J. Xu, and Y. Fang, “Blind image quality assessment using
statistical structural and luminance features,” IEEE Trans. Multimedia,
vol. 18, no. 12, pp. 2457–2469, 2016.
[14] L. Li, D. Wu, J. Wu, H. Li, W. Lin, and A. C. Kot, “Image sharpness
assessment by sparse representation,” IEEE Trans. Multimedia, vol. 18,
no. 6, pp. 1085–1097, 2016.
TABLE VII
THE RESULTS OF THE STATISTICAL SIGNIFICANCE TEST ON THE BETTER/WORSE ROC ANALYSIS. THE VALUE IN EACH CELL SHOWS THE CORRESPONDING p-VALUE WHEN COMPARING THE METHODS IN THE CORRESPONDING ROW AND COLUMN.
M1 M2 M3 M4 M5 M6 M7 M8 M9 M10 M11 M12
M1 1.0000 0.0000 0.0024 0.0020 0.0004 0.0001 0.0001 0.0000 0.0000 0.0000 0.0000 0.0000
M2 0.0000 1.0000 0.0008 0.0009 0.0000 0.0010 0.0023 0.0000 0.0018 0.0037 0.5555 0.3210
M3 0.0024 0.0008 1.0000 0.0235 0.1847 0.0980 0.0931 0.0022 0.0019 0.0020 0.0001 0.0007
M4 0.0020 0.0009 0.0235 1.0000 0.2079 0.1124 0.1069 0.0026 0.0023 0.0025 0.0001 0.0007
M5 0.0004 0.0000 0.1847 0.2079 1.0000 0.0067 0.1721 0.0001 0.0000 0.0001 0.0009 0.0000
M6 0.0001 0.0010 0.0980 0.1124 0.0067 1.0000 0.9315 0.0027 0.0017 0.0017 0.0043 0.0008
M7 0.0001 0.0023 0.0931 0.1069 0.1721 0.9315 1.0000 0.0066 0.0044 0.0040 0.0040 0.0018
M8 0.0000 0.0000 0.0022 0.0026 0.0001 0.0027 0.0066 1.0000 0.1135 0.8325 0.2660 0.0000
M9 0.0000 0.0018 0.0019 0.0023 0.0000 0.0017 0.0044 0.1135 1.0000 0.4785 0.2969 0.0015
M10 0.0000 0.0037 0.0020 0.0025 0.0001 0.0017 0.0040 0.8325 0.4785 1.0000 0.2745 0.0033
M11 0.0000 0.5555 0.0001 0.0001 0.0009 0.0043 0.0040 0.2660 0.2969 0.2745 1.0000 0.4832
M12 0.0000 0.3210 0.0007 0.007 0.0000 0.0008 0.0018 0.0000 0.0015 0.0033 0.4832 1.0000
TABLE VIII
THE RESULTS OF THE STATISTICAL SIGNIFICANCE TEST ON THE BETTER/EQUAL-WORSE ROC ANALYSIS. THE VALUE IN EACH CELL SHOWS THE CORRESPONDING p-VALUE WHEN COMPARING THE METHODS IN THE CORRESPONDING ROW AND COLUMN.
M1 M2 M3 M4 M5 M6 M7 M8 M9 M10 M11 M12
M1 1.0000 0.0000 0.0100 0.0062 0.0013 0.0006 0.0010 0.0000 0.0000 0.0000 0.0000 0.0000
M2 0.0000 1.0000 0.0034 0.0039 0.0005 0.0048 0.0076 0.0001 0.0054 0.0130 0.5989 0.1341
M3 0.0100 0.0034 1.0000 0.0627 0.2420 0.1481 0.1521 0.0077 0.0068 0.0070 0.0005 0.0036
M4 0.0062 0.0039 0.0627 1.0000 0.2647 0.1639 0.1682 0.0088 0.0078 0.0080 0.0006 0.0040
M5 0.0013 0.0005 0.2420 0.2647 1.0000 0.0245 0.3111 0.0010 0.0006 0.0006 0.0038 0.0060
M6 0.0006 0.0048 0.1481 0.1639 0.0245 1.0000 0.9231 0.0112 0.0075 0.0074 0.0120 0.0053
M7 0.0010 0.0076 0.1521 0.1682 0.3111 0.9231 1.0000 0.0182 0.0130 0.0114 0.0104 0.0082
M8 0.0000 0.0001 0.0077 0.0088 0.0010 0.0112 0.0182 1.0000 0.1104 0.6512 0.3204 0.0002
M9 0.0000 0.0054 0.0068 0.0078 0.0006 0.0075 0.0130 0.1104 1.0000 0.6831 0.3568 0.0055
M10 0.0000 0.0130 0.0070 0.0080 0.0006 0.0074 0.0114 0.6512 0.6831 1.0000 0.3417 0.0177
M11 0.0000 0.5989 0.0005 0.0006 0.0038 0.0120 0.0104 0.3204 0.3568 0.3417 1.0000 0.6103
M12 0.0000 0.1341 0.0036 0.0040 0.0060 0.0053 0.0082 0.0002 0.0055 0.0177 0.6103 1.0000
TABLE IX
THE RESULTS OF THE STATISTICAL SIGNIFICANCE TEST ON THE DIFFERENT/SIMILAR ROC ANALYSIS. THE VALUE IN EACH CELL SHOWS THE CORRESPONDING p-VALUE WHEN COMPARING THE METHODS IN THE CORRESPONDING ROW AND COLUMN.
M1 M2 M3 M4 M5 M6 M7 M8 M9 M10 M11 M12
M1 1.0000 0.0403 0.3052 0.4102 0.5813 0.0317 0.0303 0.0210 0.0205 0.0206 0.0160 0.0165
M2 0.0403 1.0000 0.1348 0.1405 0.2699 0.2447 0.1466 0.0805 0.1989 0.3469 0.8054 0.5432
M3 0.3052 0.1348 1.0000 0.8166 0.4696 0.4990 0.6391 0.2189 0.2007 0.1873 0.0918 0.1391
M4 0.4102 0.1405 0.8166 1.0000 0.4791 0.5094 0.6508 0.2268 0.2083 0.1948 0.0945 0.1511
M5 0.5813 0.2699 0.4696 0.4791 1.0000 0.8119 0.5679 0.4562 0.4168 0.3919 0.2694 0.2712
M6 0.0317 0.2447 0.4990 0.5094 0.8119 1.0000 0.5671 0.4209 0.3839 0.3590 0.2431 0.2403
M7 0.0303 0.1466 0.6391 0.6508 0.5679 0.5671 1.0000 0.2810 0.2525 0.2299 0.1628 0.1550
M8 0.0210 0.0805 0.2189 0.2268 0.4562 0.4209 0.2810 1.0000 0.4596 0.5616 0.5779 0.0912
M9 0.0205 0.1989 0.2007 0.2083 0.4168 0.3839 0.2525 0.4596 1.0000 0.7338 0.6106 0.2212
M10 0.0206 0.3469 0.1873 0.1948 0.3919 0.3590 0.2299 0.5616 0.7338 1.0000 0.6354 0.8123
M11 0.0160 0.8054 0.0918 0.0945 0.2694 0.2431 0.1628 0.5779 0.6106 0.6354 1.0000 0.6623
M12 0.0165 0.5432 0.1391 0.1511 0.2712 0.2403 0.1550 0.0912 0.2212 0.8123 0.6623 1.0000
[15] A. K. Moorthy and A. C. Bovik, “A two-step framework for constructing
blind image quality indices,” IEEE Signal Process. Lett., vol. 17, no. 5,
pp. 513–516, 2010.
[16] M. A. Saad, A. C. Bovik, and C. Charrier, “Blind image quality
assessment: A natural scene statistics approach in the DCT domain,”
IEEE Trans. Image Process., vol. 21, no. 8, pp. 3339–3352, 2012.
[17] K. Gu, M. Liu, G. Zhai, X. Yang, and W. Zhang, “Quality assessment
considering viewing distance and image resolution,” IEEE Trans. Broad-
casting, vol. 61, no. 3, pp. 520–531, Aug. 2015.
[18] H. Yeganeh and Z. Wang, “Objective quality assessment of tone-mapped
images,” IEEE Trans. Image Process., vol. 22, no. 2, pp. 657–667, Feb.
2013.
[19] Z. Wang, A. Bovik, H. Sheikh, and E. Simoncelli, “Image quality
assessment: from error visibility to structural similarity,” IEEE Trans.
Image Process., vol. 13, no. 4, pp. 600–612, Apr. 2004.
[20] H. Z. Nafchi, A. Shahkolaei, R. Farrahi, and M. Cheriet, “FSITM: A
feature similarity index for tone-mapped images,” IEEE Signal Process.
Lett., vol. 22, no. 8, pp. 1026–1029, Aug. 2015.
[21] D. Kundu and B. L. Evans, “Visual attention guided quality assessment
of tone-mapped images using scene statistics,” IEEE Int. Conf. on Image
Process., pp. 25–28, Sep. 2016.
[22] L. Itti, C. Koch, and E. Niebur, “A model of saliency-based visual
attention for rapid scene analysis,” in IEEE Trans. Pattern Anal. Mach.
Intell., vol. 20, no.11, November 1998, pp. 1254–1259.
[23] J. Petit, R. Bremond, and J.-P. Tarel, “Saliency maps of high dynamic
range images,” Proc. ACM Symp. Appl. Perception in Graphics and
Visualization, pp. 134–134, 2009.
[24] D. L. Ruderman, “The statistics of natural images,” Network: Computation in Neural Systems, vol. 5, no. 4, pp. 517–548, 1994.
[25] A. Mittal, A. K. Moorthy, and A. C. Bovik, “No-reference image quality
assessment in the spatial domain,” IEEE Trans. Image Process., vol. 21,
no. 12, pp. 4695–4708, 2012.
[26] K. Ma, H. Yeganeh, K. Zeng, and Z. Wang, “High dynamic range image
compression by optimizing tone mapped image quality index,” IEEE
Trans. Image Process., vol. 24, no. 10, pp. 3086–3097, 2015.
[27] K. Gu, S. Wang, G. Zhai, S. Ma, X. Yang, W. Lin, W. Zhang, and
W. Gao, “Blind quality assessment of tone-mapped images via analysis
of information, naturalness and structure,” IEEE Trans. Multimedia,
vol. 18, no. 3, pp. 432–443, 2016.
[28] M. Fairchild, Color Appearance Models. Wiley-IS&T, 2005.
[29] P. Shirley, A. Robison, and R. K. Morley, “A simple algorithm for
managing color in global tone reproduction,” Journal of Graphics, GPU,
and Game Tools, vol. 15, no. 3, pp. 199–205, 2011.
[30] R. Mantiuk, R. Mantiuk, A. Tomaszewska, and W. Heidrich, “Color
correction for tone mapping,” Computer Graphics Forum, vol. 28, no. 2,
2009.
[31] C. Schlick, “Quantization techniques for the visualization of high
dynamic range pictures,” Eurographics, pp. 7–20, 1994.
[32] J. van de Weijer and C. Schmid, “Boosting color saliency in image
feature detection,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 28,
no. 1, pp. 150–156, 2006.
[33] A. B. Watson, Digital Images and Human Vision. The MIT Press,
1993.
[34] M. J. Wainwright, O. Schwartz, and E. P. Simoncelli, “Natural image
statistics and divisive normalization: modeling nonlinearities and adapta-
tion in cortical neurons,” Statistical Theories of the Brain, pp. 203–222,
2002.
[35] T. Ojala, M. Pietikainen, and T. Maenpaa, “Multiresolution gray-scale
and rotation invariant texture classification with local binary patterns,”
IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 7, pp. 971–987,
2002.
[36] L. Zhang, L. Zhang, X. Mou, and D. Zhang, “FSIM: A feature similarity
index for image quality assessment,” IEEE Trans. Image Process.,
vol. 20, no. 8, pp. 2378–2386, Aug. 2011.
[37] “The libSVM package,” https://www.csie.ntu.edu.tw/~cjlin/libsvm/.
[38] J. Koenderink and A. van Doorn, “Representation of local geometry in
the visual system,” Biol. Cybern., vol. 55, no. 6, pp. 367–375, 1987.
[39] L. Griffin, “The second order local-image-structure solid,” IEEE Trans.
Pattern Anal. Mach. Intell., vol. 29, no. 8, pp. 1355–1366, 2007.
[40] P. Montesinos, V. Gouet, and R. Deriche, “Differential invariants for
color images,” Proc. 14th Int. Conf. Pattern Recog., pp. 838–840, 1998.
[41] D. H. Hubel, Eye, Brain, and Vision. W. H. Freeman, 1995.
[42] “UCID: Uncompressed colour image database,” 2004. [Online]. Available: http://www-staff.lboro.ac.uk/cogs/datasets/UCID/ucid.html
[43] A. Papoulis, Probability and Statistics. Pearson, 1989.
[44] A. Srivastava, A. B. Lee, E. P. Simoncelli, and S. C. Zhu, “On advances
in statistical modeling of natural images,” Journal of Mathematical
Imaging and Vision, vol. 18, no. 1, pp. 17–33, 2003.
[45] D. J. Heeger, “Normalization of cell responses in cat striate cortex,”
Visual Neuroscience, vol. 9, no. 2, pp. 181–197, 1992.
[46] M. Carandini, D. J. Heeger, and J. A. Movshon, “Linearity and normal-
ization in simple cells of the macaque primary visual cortex,” Journal
of Neuroscience, vol. 17, no. 21, pp. 8621–8644, 1997.
[47] K. Sharifi and A. Leon-Garcia, “Estimation of shape parameter for
generalized gaussian distributions in subband decompositions of video,”
IEEE Trans. on Circuits Syst. Video Technol., vol. 5, no. 1, pp. 52–56,
1995.
[48] A. K. Moorthy and A. C. Bovik, “Blind image quality assessment:
From natural scene statistics to perceptual quality,” IEEE Trans. Image
Process., vol. 20, pp. 3350–3364, 2011.
[49] H. Hadizadeh and I. V. Bajić, “No-reference image quality assessment
using statistical wavelet-packet features,” Pattern Recognition Letters,
vol. 80, pp. 144–149, Sep. 2016.
[50] ——, “Color Gaussian Jet features for no-reference quality assessment
of multiply-distorted images,” IEEE Signal Process. Letters, vol. 23,
no. 12, pp. 1717–1721, Dec. 2016.
[51] L. Krasula, M. Narwaria, K. Fliegel, and P. Le Callet, “Influence of
HDR reference on observers preference in tone-mapped images evaluation,” in Quality of
Multimedia Experience (QoMEX), May 2014.
[52] P. Hanhart, L. Krasula, P. Le Callet, and T. Ebrahimi, “How to benchmark
objective quality metrics from paired comparison data?” 8th Interna-
tional Conference on Quality of Multimedia Experience (QoMEX), Jun.
2016.
[53] R. Mantiuk, A. Tomaszewska, and R. Mantiuk, “Comparison of four
subjective methods for image quality assessment,” Computer Graphics
Forum, vol. 31, no. 8, pp. 2478–2491, 2012.
[54] H. Hadizadeh, I. V. Bajić, P. Saeedi, and S. Daly, “Good-looking green
images,” 18th IEEE International Conference on Image Process. (ICIP),
Sep. 2011.
[55] J. Li, M. Barkowsky, and P. Le Callet, “Boosting paired comparison
methodology in measuring visual discomfort of 3D-TV: performances
of three different designs,” SPIE-IS&T Electronic Imaging: Stereoscopic
Display and Applications XXIV, 2013.
[56] Q. Li, W. Lin, and Y. Fang, “No-reference quality assessment for
multiply-distorted images in gradient domain,” IEEE Signal Process.
Lett., vol. 23, no. 4, pp. 541–545, Apr. 2016.
[57] D. J. Sheskin, Handbook of Parametric and Nonparametric Statistical
Procedures. Chapman & Hall/CRC, 2007.
[58] ITU-T Recommendation J.149, Method for specifying accuracy and
cross-calibration of Video Quality Metrics (VQM), J.149 Std., 2004.
[59] L. Krasula, K. Fliegel, P. Le Callet, and M. Klima, “On the accuracy of
objective image and video quality models: New methodology for per-
formance evaluation,” 2016 Eighth International Conference on Quality
of Multimedia Experience (QoMEX), Jun. 2016.
[60] T. Fawcett, “An introduction to ROC analysis,” Pattern Recognition
Letters, vol. 27, pp. 861–874, 2006.
[61] E. R. DeLong, D. M. DeLong, and D. L. Clarke-Pearson, “Comparing
the areas under two or more correlated receiver operating characteristic
curves: A nonparametric approach,” Biometrics, vol. 44, no. 3, pp. 837–
845, 1988.
Hadi Hadizadeh received the B.Sc.Eng. degree
in electronic engineering from the Shahrood Uni-
versity of Technology, Shahrood, Iran, in 2005,
the M.S. degree in electrical engineering from the
Iran University of Science and Technology, Tehran,
Iran, in 2008, and the Ph.D. degree in engineering
science from Simon Fraser University, Burnaby,
BC, Canada, in 2013. He is currently an Assistant
Professor at the Quchan University of Advanced
Technology, Quchan, Khorasan Razavi, Iran. His
current research interests include perceptual im-
age/video coding, visual attention modeling, error resilient video transmission,
image/video processing, computer vision, multimedia communication, and
machine learning. He was a recipient of the Best Paper Runner-up Award at
ICME 2012 in Melbourne, Australia and the Microsoft Research and Canon
Information Systems Research Australia Student Travel Grant for ICME 2012.
In 2013, he served as the Vice Chair of the Vancouver Chapter of the
IEEE Signal Processing Society.
Ivan V. Bajić (S’99-M’04-SM’11) is an Associate Professor of Engineering Science and co-director of the
Multimedia Lab at Simon Fraser University, Burn-
aby, BC, Canada. His research interests include sig-
nal, image, and video processing and compression,
multimedia ergonomics, and communications. He
has authored about a dozen and co-authored another
ten dozen publications in these fields. He has served
on the organizing and/or program committees of
various conferences in the field, including GLOBE-
COM, ICC, ICME, and ICIP. He was the Chair of
the Media Streaming Interest Group of the IEEE Multimedia Communications
Technical Committee from 2010 to 2012, and is currently a member of the
IEEE Multimedia Systems and Applications Technical Committee and the
IEEE Multimedia Signal Processing Technical Committee. He is also serving
as Associate Editor of IEEE TRANSACTIONS ON MULTIMEDIA and IEEE SIGNAL PROCESSING MAGAZINE, and the Chair of the Vancouver Chapter
of the IEEE Signal Processing Society.