Saliency-guided Just Noticeable Distortion Estimation Using the Normalized Laplacian Pyramid

Hadi Hadizadeh, Atiyeh Rajati, and Ivan V. Bajić, Senior Member, IEEE
Abstract—The human visual system (HVS), like any other
physical system, has limitations. For instance, it is known that
the HVS can only sense content changes that are larger than
the so-called just noticeable distortion (JND) threshold. Also, to
reduce the computational load on the brain, the visual attention
mechanism is deployed such that regions with higher visual
saliency are processed with higher priority than other less-salient
regions. It is also known that visual saliency has a modulatory
effect on JND thresholds. In this letter, we present a novel pixel-
wise JND estimation method that considers the interplay between
visual saliency and JND thresholds. In the proposed method,
the largest JND thresholds of a given image are found such
that the perceptual distance between the image and its JND
noise-contaminated version is minimized in a perceptual space
defined by the coefficients of the image in a normalized Laplacian
pyramid. Experimental results indicate that the proposed method
outperforms four of the latest JND models for static images.
Index Terms—just noticeable distortion, visual saliency
I. INTRODUCTION
It is known that the human visual system (HVS) cannot
sense small visual variations whose amplitude is below
the so-called just noticeable distortion (JND) threshold due
to several physical limitations in the eyes and the brain [1],
[2]. JND modeling is widely used for perceptual redundancy
estimation in images/videos for a variety of different applica-
tions such as image/video coding and transmission [3], quality
assessment [4], watermarking [5], etc. Perceptual redundancies
in visual contents may also be produced by the visual attention
(VA) mechanism of the human brain [6]. VA provides an
automatic mechanism for selection of particular aspects of a
visual scene that are most relevant to our ongoing behavior
while eliminating interference from less relevant data so as to
reduce the computational load on the brain [6].
In the literature, several models have been developed for
JND estimation in images and videos in both the pixel and
subband domain [7]–[13]. For instance, in [9], a JND model
was proposed based on measuring edge and texture masking
[1]. Wu et al. modeled the fact that the HVS is insensitive
to irregular visual content and introduced a structure-
uncertainty-based JND model in [10]. An enhanced pixel-wise JND estimation method was recently proposed in [7] based on measuring pattern complexity and luminance contrast.

H. Hadizadeh and A. Rajati are with the Quchan University of Advanced Technology, Quchan, Iran, and I. V. Bajić is with Simon Fraser University, Burnaby, BC, V5A 1S6, Canada. The corresponding author is H. Hadizadeh (h.hadizadeh@qiet.ac.ir).
According to current knowledge, it is believed that VA can be driven by "visual saliency," which is a measure of the propensity of a specific location in a scene to draw VA [6]. A region is said to be visually salient if it possesses certain characteristics that make it stand out from its surrounding regions and draw attention [14]. Existing computational models of saliency for static images [14]–[16] can automatically produce a saliency map that predicts the salient regions in a given image.
It is known that visual saliency has a modulatory effect on
JND thresholds [5], [8], [17]. Specifically, it is known that JND
thresholds in attended (or very salient) regions are smaller than
JND thresholds in un-attended (or less salient) regions [17].
Hence, to better estimate visual redundancies, it is reasonable
to consider the interplay between JND thresholds and saliency.
None of the above-mentioned JND models consider the effect
of visual saliency on JND thresholds.
In the literature, there are very few existing JND models
that consider visual saliency. Notable methods include the
ones proposed in [5], [17]. In these two methods, the JND
thresholds are scaled by a set of fixed linear saliency mod-
ulation functions. Recently, a saliency-modulated JND model
was proposed in [8] in the DCT (discrete cosine transform)
domain, in which the JND thresholds estimated by a DCT-
based JND model are scaled by two non-linear modulation
functions based on the visual saliency of the pixels in the
given image. The results reported in [8] indicated that this
method outperforms [17] and [5]. Hence, we consider [8] as
the representative of the earlier saliency-based JND models.
In this letter, we present a novel JND estimation method,
which takes the visual saliency information into account.
In the proposed method, a differentiable saliency-weighted
perceptual image quality metric is first defined to measure
the perceptual difference between two images decomposed by
a normalized Laplacian pyramid (NLP) [18], in which some
biological mechanisms in the early visual system (e.g., the
center-surround filtering and local gain control [1]) are sim-
ulated. The employed metric is designed such that it assigns
larger weights to NLP coefficients corresponding to pixels with
higher visual saliency and vice versa. The JND thresholds
are then considered as an invisible noise in the sense that
if the pixel values of a given image are increased/decreased
by their corresponding JND thresholds, then the resultant
noisy image should not be distinguishable from the original
pristine image. Based on this assumption, the largest JND
thresholds of a given image are then adaptively estimated such
that the perceptual distance between the original image and
the JND noise-contaminated image is minimized while the
invisible noise energy (or equivalently the amplitude of the
JND thresholds) is maximized. Note that the method proposed in [8] is non-adaptive: the parameters of its saliency modulation functions are fixed for all images, and these fixed parameters may not be the best for every kind of image. In contrast, our proposed method estimates the JND thresholds in an adaptive manner, based on the image content and its visual saliency. Experimental
results indicate that the proposed method outperforms four
of the latest JND models for static images including [8]. To
the best of our knowledge, we are the first to use a saliency-
weighted perceptual image quality metric for automatic JND
estimation, and this is the main contribution of this letter.
This letter is organized as follows. In Section II, the
proposed method is presented. The experimental results are
given in Section III followed by conclusions in Section IV.
II. THE PROPOSED METHOD
Let $J$ be a gray-scale image for which we wish to estimate the JND map $M$. Suppose that $M$ is available, and we generate an image $I$ such that $I = J + M$. If the JND thresholds in $M$ are accurate, then we expect the perceptual quality of $I$ to be equal, or very close, to the perceptual quality of $J$. In this case, $I$ can be considered a noisy version of $J$ in which the injected noise is not visible. In practice, we are interested in finding the largest possible JND thresholds, so as to detect the largest possible perceptual redundancies. In other words, the best $M$ is the one that maximizes the MSE (mean-squared error, i.e., the noise energy) between $I$ and $J$ while keeping the perceptual quality of $I$ equal (or very close) to the perceptual quality of $J$. To estimate $M$, we define the following cost function:
$$Q(I|J) = (1-\lambda)\, D(I,J) - \lambda\, \mathrm{MSE}(I,J), \qquad (1)$$
where $D(I,J)$ is an image quality metric that measures the perceptual distance between $I$ and $J$, and $0 < \lambda < 1$ is a constant by which the weight and scale of the two terms are controlled. The best $M$ can then be estimated as $M = \hat{I} - J$, where $\hat{I}$ is obtained by solving the following minimization problem:

$$\hat{I} = \arg\min_{I} Q(I|J), \quad \text{s.t.}\ \forall j:\ G_{\min} \le I_j \le G_{\max}, \qquad (2)$$

where $I_j$ denotes the value of $I$ at location $j$, and $G_{\min}$ and $G_{\max}$ are the minimum and maximum possible gray levels (i.e., 0 and 255 in our case). Assuming that there are $M$ pixels in $I$ and $J$, $\mathrm{MSE}(I,J) = \frac{1}{M}\sum_{m=1}^{M}\left(I_m - J_m\right)^2$. Note that (1) is minimized when $D(I,J)$ is minimized while $\mathrm{MSE}(I,J)$ is maximized. Generally, when $\mathrm{MSE}(I,J)$ increases, $D(I,J)$ may either increase or remain unchanged, because added noise (distortion) with larger energy may still remain invisible. At the optimum, the value of $Q(I|J)$ may therefore be negative.
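As a rough illustration (not part of the published implementation), the following Python sketch evaluates the cost in (1) for a candidate image. It assumes a callable `nlp_distance` implementing $D(I,J)$ (a sketch of such a function appears later in this section); the function names and the default `lam` value are illustrative.

```python
import numpy as np

def mse(I, J):
    # Mean-squared error between the candidate image I and the original J.
    I = I.astype(np.float64)
    J = J.astype(np.float64)
    return np.mean((I - J) ** 2)

def q_cost(I, J, nlp_distance, lam=0.01):
    # Eq. (1): a small perceptual distance D(I, J) is rewarded, while a large
    # (invisible) noise energy MSE(I, J) is also rewarded; lam trades them off.
    return (1.0 - lam) * nlp_distance(I, J) - lam * mse(I, J)
```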
For computing $D(I,J)$, we seek an image quality metric with the following properties: 1) it must measure the perceptual dissimilarity between two images; 2) it should be simple to compute and differentiable, so that it can easily be used in an optimization loop; 3) saliency information should be easy to integrate into it. For this purpose, we utilized the Normalized Laplacian Pyramid (NLP) [18], a multi-scale nonlinear representation that mimics the operations of the retina and lateral geniculate nucleus in the HVS. In [19], it was shown that distances measured between two images represented in the perceptual space defined by the NLP are highly correlated with human judgments. In fact, as will be discussed in the sequel, the NLP distance has all the above-mentioned properties required of $D(I,J)$.

Fig. 1. The flowchart of the normalized Laplacian pyramid [19].

For computing the NLP distance, the pixels in $J$ are first transformed using a power-law transformation to obtain $x = J^{\gamma}$, which simulates the transformation of light to voltage in retinal photoreceptors. As shown in Fig. 1, the NLP is then built recursively from $x$ as [18]: $x^{(l+1)} = D L x^{(l)}$ and $z^{(l)} = x^{(l)} - L U x^{(l+1)}$, where the superscript $(l)$ denotes the $l$-th level of the pyramid, $D(\cdot)$ and $U(\cdot)$ indicate down- and up-sampling by a factor of 2, respectively, and $L$ denotes filtering by a spatially separable 5-tap filter $(0.05, 0.25, 0.4, 0.25, 0.05)$, as in [18]. Within each frequency channel $(l)$, each coefficient of $z^{(l)}$ is divided by a weighted local sum of the element-wise amplitudes of the coefficients plus a positive constant $\sigma$ [18]: $y^{(l)} = z^{(l)} \div \left(\sigma + H * |z^{(l)}|\right)$, where $\div$ and $*$ denote pixel-wise division and linear convolution, respectively, and $H$ is a local weighting filter. This equation implements divisive normalization, which is widely used to describe the responses of neurons in different parts of the visual system [20], [21]. Assuming that there are $L$ pyramid levels, the set of NLP coefficients $\{y^{(l)};\ l = 1, \dots, L\}$ provides a perceptual representation of $x$ [19].
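For concreteness, the following is a minimal Python sketch of the NLP decomposition described above, using the parameter values reported at the end of this section. The scaling of the input to [0, 1] before the power-law front end, the 4x gain on the interpolation filter after zero-insertion upsampling, reflective boundary handling, and treating the coarsest level like the other bands are all assumptions; the reference implementation of [18], [19] may differ.

```python
import numpy as np
from scipy.ndimage import convolve

# 5-tap separable filter from [18]; sigma, gamma, and H are the values
# reported at the end of this section.
F1D = np.array([0.05, 0.25, 0.4, 0.25, 0.05])
L2D = np.outer(F1D, F1D)                      # separable low-pass filter L
H = np.array([[0.04, 0.05, 0.04],
              [0.05, 0.06, 0.05],
              [0.04, 0.05, 0.04]])
SIGMA, GAMMA = 0.19, 0.38

def nlp(img, levels=6):
    # Return the NLP coefficient bands y^(1), ..., y^(levels) of a grayscale image.
    x = (img.astype(np.float64) / 255.0) ** GAMMA   # power-law front end (input scaled to [0,1], an assumption)
    bands = []
    for _ in range(levels):
        low = convolve(x, L2D, mode='reflect')            # L x^(l)
        down = low[::2, ::2]                              # D(.) : downsample by 2
        up = np.zeros_like(x)
        up[::2, ::2] = down                               # U(.) : upsample by zero insertion
        z = x - convolve(up, 4.0 * L2D, mode='reflect')   # z^(l) = x^(l) - L U x^(l+1); 4x gain offsets inserted zeros
        y = z / (SIGMA + convolve(np.abs(z), H, mode='reflect'))  # divisive normalization
        bands.append(y)
        x = down
    return bands
```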
Inspired by [22], to measure the perceptual distance between two images $I$ and $J$, we first compute the absolute differences between the NLP coefficients of the two images within each frequency channel as $d_i^{(l)} = \left| y_i^{(l)} - \hat{y}_i^{(l)} \right|$, where $y_i^{(l)}$ and $\hat{y}_i^{(l)}$ denote the $i$-th NLP coefficient of $I$ and $J$ in the $l$-th channel, respectively. We then use the summation model proposed in [22] to compute a single distance value as follows. First, the $\ell_\alpha$ norm of the calculated differences within each channel is computed. The $\ell_\beta$ norm is then used to combine the obtained values across all channels:

$$D(I,J) = \left[ \frac{1}{L} \sum_{l=1}^{L} \left( \frac{1}{N_l} \sum_{i=1}^{N_l} \left( d_i^{(l)} \right)^{\alpha} \right)^{\beta/\alpha} \right]^{1/\beta}, \qquad (3)$$
where $N_l$ is the number of NLP coefficients in the $l$-th channel. This metric treats all spatial locations equally. However, to weight different spatial locations based on their visual importance (saliency), we propose to use a saliency-weighted version of $d_i^{(l)}$ as follows: $d_i^{(l)} = w_i^{(l)} \left| y_i^{(l)} - \hat{y}_i^{(l)} \right|$, where $w_i^{(l)}$
denotes the normalized saliency value of the $i$-th pixel in $J$, down-scaled to the size of the $l$-th channel. To compute the saliency information, any saliency model can be used. Here, we use the saliency model proposed in [15]. Our motivation for using this model was the promising results reported in [23].
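Continuing the sketch above (and reusing its `nlp` function), the saliency-weighted distance of (3) could be computed as follows. Down-scaling the saliency map by simple subsampling, and assuming it is normalized to [0, 1], are illustrative choices rather than the authors' exact ones.

```python
import numpy as np
from functools import partial

def nlp_distance(I, J, saliency, alpha=2.0, beta=0.5, levels=6):
    # Saliency-weighted NLP distance D(I, J) of Eq. (3), reusing nlp() from the
    # sketch above. `saliency` is a per-pixel map of the same size as the images.
    yI, yJ = nlp(I, levels), nlp(J, levels)
    w = saliency.astype(np.float64)
    per_level = []
    for l in range(levels):
        d = w * np.abs(yI[l] - yJ[l])                 # d_i^(l) = w_i^(l) * |y_i^(l) - y_hat_i^(l)|
        per_level.append(np.mean(d ** alpha) ** (beta / alpha))
        w = w[::2, ::2]                               # weights for the next (coarser) level
    return float(np.mean(per_level) ** (1.0 / beta))

# The saliency map can be bound in advance so that the metric matches the
# two-argument form assumed by q_cost: D = partial(nlp_distance, saliency=s).
```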
To solve (2), we used the Adaptive Moment Estimation (Adam) algorithm [24], for which the derivative of $Q(I|J)$ with respect to $I_j$ can be calculated analytically as follows:

$$\frac{\partial Q(I|J)}{\partial I_j} = (1-\lambda)\,\frac{\partial D(I,J)}{\partial I_j} - \lambda\,\frac{\partial \mathrm{MSE}(I,J)}{\partial I_j}, \qquad (4)$$

where $\frac{\partial \mathrm{MSE}(I,J)}{\partial I_j} = \frac{2}{M}\left(I_j - J_j\right)$, and

$$\frac{\partial D(I,J)}{\partial I_j} = \frac{1}{\beta}\, D(I,J)^{1-\beta}\, \frac{\partial}{\partial I_j} \left[ \frac{1}{L} \sum_{l=1}^{L} \left( \frac{1}{N_l} \sum_{i=1}^{N_l} \left( d_i^{(l)} \right)^{\alpha} \right)^{\beta/\alpha} \right], \qquad (5)$$

where the derivative on the right-hand side is calculated as

$$\frac{\partial}{\partial I_j} \left[ \frac{1}{L} \sum_{l=1}^{L} \left( \frac{1}{N_l} \sum_{i=1}^{N_l} \left( d_i^{(l)} \right)^{\alpha} \right)^{\beta/\alpha} \right] = \frac{\beta}{\alpha L} \sum_{l=1}^{L} \left( \frac{1}{N_l} \sum_{i=1}^{N_l} \left( d_i^{(l)} \right)^{\alpha} \right)^{\frac{\beta}{\alpha}-1} \frac{\partial}{\partial I_j} \left[ \frac{1}{N_l} \sum_{i=1}^{N_l} \left( d_i^{(l)} \right)^{\alpha} \right]. \qquad (6)$$

The derivative on the right-hand side of the above equation is

$$\frac{\partial}{\partial I_j} \left[ \frac{1}{N_l} \sum_{i=1}^{N_l} \left( d_i^{(l)} \right)^{\alpha} \right] = \frac{\alpha}{N_l} \sum_{i=1}^{N_l} \left( d_i^{(l)} \right)^{\alpha-1} \frac{\partial d_i^{(l)}}{\partial I_j}, \qquad (7)$$

where $\frac{\partial d_i^{(l)}}{\partial I_j} = w_i^{(l)}\, \mathrm{sgn}\!\left( y_i^{(l)} - \hat{y}_i^{(l)} \right) \frac{\partial y_i^{(l)}}{\partial I_j}$, and $\frac{\partial y_i^{(l)}}{\partial I_j} = \frac{\partial y_i^{(l)}}{\partial z^{(l)}} \frac{\partial z^{(l)}}{\partial x_j^{(l)}} \frac{\partial x_j^{(l)}}{\partial I_j}$. We then calculate $\frac{\partial y_i^{(l)}}{\partial z_k^{(l)}}$ as follows:

$$\frac{\partial y_i^{(l)}}{\partial z_k^{(l)}} = \begin{cases} \dfrac{\sigma + q_i^{(l)} - H_{i,i}\, \mathrm{sgn}\!\left(z_i^{(l)}\right) z_i^{(l)}}{\left(\sigma + q_i^{(l)}\right)^2}, & k = i \\[2ex] \dfrac{-H_{i,k}\, \mathrm{sgn}\!\left(z_k^{(l)}\right) z_i^{(l)}}{\left(\sigma + q_i^{(l)}\right)^2}, & k \neq i \end{cases} \qquad (8)$$

where $q_i^{(l)}$ is the value of $H * |z^{(l)}|$ at location $i$, and $H_{i,k}$ is the value of $H$ at location $k$, assuming that the center of $H$ is at location $i$. We also obtain $\frac{\partial z^{(l)}}{\partial x_j^{(l)}} = t_j^{(l)}$, where $t_j^{(l)}$ is the $j$-th column of $T^{(l)}$, the matrix of the linear transformation performed by the Laplacian pyramid, i.e., $z^{(l)} = T^{(l)} x^{(l)}$. In fact, $T^{(l)}$ can be computed from Fig. 1; for details, please refer to [18]. Finally, we get $\frac{\partial x_j^{(l)}}{\partial I_j} = \frac{1}{\gamma}\, I_j^{\left(\frac{1}{\gamma}-1\right)}$.
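To make (8) concrete, the sketch below builds the dense Jacobian of the divisive-normalization stage for a small 2-D block. Treating out-of-bounds neighbors as zero and relying on the symmetry of $H$ (so that convolution and correlation coincide) are assumptions; computing the full dense Jacobian is only practical for small blocks and is intended for verification, not as the authors' implementation.

```python
import numpy as np
from scipy.ndimage import convolve

def dn_jacobian(z, H, sigma):
    # Jacobian dy/dz of y = z / (sigma + H * |z|), Eq. (8), for a small 2-D block.
    q = convolve(np.abs(z), H, mode='constant')     # q_i = (H * |z|)_i, zero outside the block
    rows, cols = z.shape
    n = z.size
    zf, qf = z.ravel(), q.ravel()
    hr, hc = H.shape[0] // 2, H.shape[1] // 2
    Jac = np.zeros((n, n))
    for i in range(n):
        ri, ci = divmod(i, cols)
        for dr in range(-hr, hr + 1):
            for dc in range(-hc, hc + 1):
                rk, ck = ri + dr, ci + dc
                if not (0 <= rk < rows and 0 <= ck < cols):
                    continue
                k = rk * cols + ck
                Hik = H[hr + dr, hc + dc]           # H_{i,k}: value of H at k, centered at i
                if k == i:
                    Jac[i, k] = (sigma + qf[i] - Hik * np.sign(zf[i]) * zf[i]) / (sigma + qf[i]) ** 2
                else:
                    Jac[i, k] = -Hik * np.sign(zf[k]) * zf[i] / (sigma + qf[i]) ** 2
    return Jac
```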
To estimate $\sigma$, $H$, $\alpha$, $\beta$ in (3), and $\gamma$, we followed [22] and optimized these parameters so as to maximize the Pearson linear correlation between the distance values predicted by (3) and the mean opinion scores (MOS) provided in the popular LIVE image quality assessment database [25]. The optimization procedure was the same as in [22]. We obtained $\sigma = 0.19$, $\alpha = 2$, $\beta = 0.5$, and $\gamma = 0.38$, and $H$ was obtained as the following $3 \times 3$ filter: [0.04 0.05 0.04; 0.05 0.06 0.05; 0.04 0.05 0.04]. We also found experimentally that $\lambda = 0.01$ and $L = 6$ achieve the best results for our purpose.
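Putting the pieces together, a projected-gradient loop with Adam [24] for solving (2) might look as follows. The learning rate, iteration count, and initializing $I$ at $J$ are illustrative assumptions; `grad_fn` stands for the analytic gradient of (4)-(8) or any numerical substitute.

```python
import numpy as np

def estimate_jnd(J, grad_fn, iters=200, lr=1.0, g_min=0.0, g_max=255.0):
    # Minimize Q(I|J) with Adam and clip I to the valid gray-level range,
    # enforcing the constraint in Eq. (2). grad_fn(I, J) must return dQ/dI.
    I = J.astype(np.float64).copy()
    m = np.zeros_like(I)
    v = np.zeros_like(I)
    b1, b2, eps = 0.9, 0.999, 1e-8                  # standard Adam constants
    for t in range(1, iters + 1):
        g = grad_fn(I, J)
        m = b1 * m + (1.0 - b1) * g
        v = b2 * v + (1.0 - b2) * g * g
        m_hat = m / (1.0 - b1 ** t)
        v_hat = v / (1.0 - b2 ** t)
        I = I - lr * m_hat / (np.sqrt(v_hat) + eps)
        I = np.clip(I, g_min, g_max)                # enforce G_min <= I_j <= G_max
    return I - J                                    # JND map M = I_hat - J
```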
III. EXPERIMENTAL RESULTS
For evaluation of the proposed method, similar to [7], [8],
[10], a subjective experiment was performed to compare the
efficacy of the proposed method with the following four latest
JND models: [7] (EJND), [8] (SJND), [10] (Wu2013), [9]
(Liu2010). For this purpose, we used the 12 images from
[7] (named I1 to I12) for comparisons among different JND
models. These images are often used either for comparing
different JND models or for image quality assessment [7].
During the experiment, two JND noise-contaminated images of the same scene (one produced by the proposed method and the other produced by the method being compared) were randomly juxtaposed on the right and left parts of a screen with a mid-gray background. To produce a JND noise-contaminated image $\hat{J}$ from a pristine image $J$ with JND map $M$, similar to many existing works [7], [8], we used the following formula: $\hat{J} = J + \eta N \odot M$, where $N$ is a random noise pattern whose elements take the values $-1$ or $+1$ independently and with equal probability, $\odot$ denotes pixel-wise multiplication, and $\eta$ adjusts the noise level. The reason for using $N$ is to avoid creating a fixed artificial spatial pattern. In our experiments, the noise level for all JND models was adjusted such that the PSNR of every noise-contaminated image equals $26 \pm 0.01$ dB, so that distortions are easier to see.
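As an illustration of how the noise level can be matched across models, the sketch below scales $\eta$ in closed form so that the PSNR of $\hat{J}$ approximately hits the target. The letter only states that the noise level was adjusted to 26 dB, so the closed-form choice of $\eta$ (and ignoring the small PSNR change caused by clipping) is an assumption.

```python
import numpy as np

def contaminate(J, M, target_psnr=26.0, seed=0):
    # J_hat = J + eta * N (element-wise) M, with eta chosen so that
    # PSNR(J_hat, J) is approximately target_psnr (before clipping).
    rng = np.random.default_rng(seed)
    N = rng.choice([-1.0, 1.0], size=J.shape)                  # +/-1, equally likely
    target_mse = 255.0 ** 2 * 10.0 ** (-target_psnr / 10.0)    # PSNR = 10*log10(255^2 / MSE)
    eta = np.sqrt(target_mse / np.mean(M.astype(np.float64) ** 2))
    J_hat = J.astype(np.float64) + eta * N * M
    return np.clip(J_hat, 0, 255)
```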
A 17-inch LG T1710B monitor with a maximum brightness of 300 cd/m² and a resolution of 1024 × 768 pixels was used. The brightness and contrast of the display were set to 50%. The experiment was run in a quiet classroom with 24 naive subjects (15 males, 9 females) with normal or corrected-to-normal eyesight, aged between 18 and 23. The viewing environment and viewing conditions were set following the ITU-R BT.500-11 recommendation [26]. The illumination in the room was in the range of 100-150 lux. The distance between the display and the subjects was fixed at 70 cm. Each participant was familiarized with the task before the start of the experiment via a short printed instruction sheet. The total length of the
experiment for each participant was approximately 16 minutes.
Each image pair was shown for 10 seconds. After this presentation, a mid-gray blank screen was shown for 5 seconds. During this period, similar to [7], the subjects were asked to decide which image had the better quality (left or right) and how much better it was, according to the following scoring rule: 0 (same quality), 1 (slightly better), 2 (better), 3 (much better). Participants did not know which image was obtained by which method. In a randomly chosen half of the trials, the image produced by the proposed method was presented on the left side of the screen, and in the other half on the right side, in order to counteract side bias in the responses. This gave a total of $12 \times 2 \times 4$ trials (duplicated to balance left and right presentation) for each subject. The obtained results are
shown in Table I. In this table, 'Mean' refers to the mean of the quality scores given by the subjects to each image, where a positive Mean value indicates that the proposed JND model outperforms the competing JND model. A two-sided Pearson's chi-square ($\chi^2$) test [27] was used to examine the statistical significance of the results based on the number of votes collected for each model. The null hypothesis is that
there is no preference for either the proposed method or the other JND model. The $p$-value [27] is indicated in the table. In experimental sciences, as a rule of thumb, the null hypothesis is rejected when $p < 0.05$. When this happens in Table I, the two images cannot be considered to have the same subjective quality at the 5% significance level, since one of them has obtained a significantly higher quality score and therefore appears to have better quality. In the table, cases with $p > 0.05$ are indicated in bold typeface. We did not use any outlier removal in our experiments because, with a relatively small number of subjects, removing some of the responses would reduce the statistical power of the test.
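For reference, a two-category Pearson chi-square test on the collected votes can be run as sketched below. How tie votes were handled and how votes were grouped are not specified in the letter, so the two-category setup and the example counts are hypothetical.

```python
from scipy.stats import chisquare

def preference_test(votes_proposed, votes_other):
    # Pearson chi-square test of the null hypothesis "no preference between the
    # two methods", applied to raw vote counts (ties assumed to be excluded).
    stat, p = chisquare([votes_proposed, votes_other])   # equal expected counts under H0
    return stat, p

# Hypothetical counts: 30 votes favoring the proposed method, 14 the competitor.
print(preference_test(30, 14))
```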
As seen from the results in Table I, the proposed method
outperforms EJND on all images except for I1, I8, I9, and
I10. Specifically, its performance is statistically the same as
EJND on I1 as its p-value is greater than 0.05 while on I8,
I9, and I10, the proposed method performs slightly worse
than EJND. However, looking across all trials, we observe that the proposed method outperforms EJND, with an average quality difference score of 0.38 and an overall $p = 0.0045$, which is a statistically significant result: under the null hypothesis, the probability of such an outcome occurring by chance is only 0.45%. Note that I8, I9, and I10 have complex content with multiple attention-grabbing objects, whereas the saliency maps produced by [15] show only one salient object. Hence, we believe that the lower performance of the proposed method on these images may be related to the inaccurate saliency
maps. We also observe that the proposed method performs
statistically better than SJND on all images (with a mean
quality difference score of 0.53) except for I8, I9, and I10
where its performance is statistically the same as SJND.
Compared with Wu2013 and Liu2010, we observe that the
proposed method outperforms both of these methods on all
images with a mean quality difference score of 0.62 and
1.07, respectively, and the obtained results are all statistically
significant. Fig. 2 shows a visual example comparing various
JND models on I6 based on their JND noise-contaminated
image at the same level of noise energy (PSNR=19.06 dB).
For this example, we intentionally used a low PSNR so that the distortions are easier to see at this scale. As seen from this
figure, the proposed method achieves better perceptual quality
compared to the other methods.
We also compared the computational complexity of the
proposed method with other JND models implemented using
their original code in Matlab on an Intel i7-3790K CPU at
4.00 GHz with 8 GB RAM on a sample 512×512 image. The
average execution times (in seconds) for Liu2010, Wu2013, EJND, SJND, and the proposed method were 0.51, 3.82, 0.57, 1.92, and 4.2, respectively (1.1 s for saliency computation and 3.1 s for the cost-function minimization). The proposed
method is the slowest among these but it enables more accurate
JND estimation. The speed of the proposed method can be
increased by using a faster method for saliency computation
and a faster algorithm for the cost function minimization.
TABLE I
THE RESULTS OF COMPARING THE PROPOSED METHOD WITH EACH OF THE FOUR LATEST JND MODELS ON THE 12 TESTED IMAGES.
VS. EJND VS. SJND VS. Wu2013 VS. Liu2010
Mean p-value Mean p-value Mean p-value Mean p-value
I1 0.03 0.1489 0.12 0.0445 0.15 0.0312 0.45 0.0121
I2 0.69 0.0013 0.74 0.0009 0.57 0.0072 1.25 0.0003
I3 1.12 0.0001 1.45 0.0001 1.57 0.0001 2.4 0.0001
I4 1.04 0.0001 1.23 0.0001 1.32 0.0002 2.1 0.0001
I5 0.24 0.0094 0.11 0.0463 0.19 0.0401 0.45 0.0121
I6 0.63 0.0039 0.45 0.0121 0.78 0.0007 1.16 0.0004
I7 0.43 0.0180 0.76 0.0008 0.61 0.0042 1.09 0.0006
I8 -0.12 0.0193 0.08 0.0543 0.12 0.0445 0.44 0.0176
I9 -0.26 0.0014 0.03 0.1489 0.10 0.0469 0.29 0.0080
I10 -0.33 0.0001 0.01 0.7728 0.13 0.0431 0.17 0.0411
I11 0.27 0.0082 0.41 0.0192 0.67 0.0018 1.14 0.0004
I12 0.84 0.0008 1.04 0.0004 1.23 0.0003 1.89 0.0001
Avg 0.38 0.0045 0.53 0.0012 0.62 0.0007 1.07 0.0001

Fig. 2. Comparing various JND models based on their JND noise-contaminated images at the same level of noise energy. From top to bottom and left to right: original image, Liu2010, Wu2013, EJND, SJND, and the proposed method. Please zoom in to see the distortions better.

IV. CONCLUSIONS

In this letter, we presented a novel JND estimation method, which utilizes the visual saliency information of an image for
a better prediction of JND thresholds. The main idea behind the proposed method is that a JND noise-contaminated image (i.e., a noisy version of an image whose noise amplitude is equal to the image's JND thresholds) should be indistinguishable from the original pristine image. Hence, to estimate the JND thresholds of a given image, one can find the largest JND thresholds such that the perceptual distance between the image and its JND noise-contaminated version is minimized. For
this purpose, a saliency-weighted perceptual distance is first
defined in the normalized Laplacian domain. It is then used
in an optimization process to estimate the JND thresholds
based on the above-mentioned idea. The proposed method
was compared with the latest JND models in a subjective
experiment. The experimental results demonstrated that, on
average, the proposed method outperforms the compared
methods. Although the proposed method was presented for
grayscale images, it can easily be extended to color images. For example, for color images, $D(I,J)$ can be defined as the mean of the NLP distances of the individual color channels, and $\mathrm{MSE}(I,J)$ can be computed over all color channels.
The new cost function can then be minimized using the same
optimization procedure proposed for the grayscale images.
REFERENCES
[1] A. B. Watson, Digital Images and Human Vision. The MIT Press, 1993.
[2] F. A. A. Kingdom, Psychophysics: A Practical Introduction. Academic Press, 2009.
[3] X. Yang, W. Lin, Z. Lu, E. Ong, and S. Yao, “Motion-compensated
residue pre-processing in video coding based on just-noticeable-
distortion profile,” IEEE Trans. Circuits Syst. Video Technol., vol. 15,
pp. 745–752, 2005.
[4] W. Lin and C. J. Kuo, “Perceptual visual quality metrics: A survey,”
J. Visual Communication and Image Representation, vol. 22, no. 4, pp.
297–312, 2011.
[5] Y. Niu, M. Kyan, L. Ma, A. Beghdadi, and S. Krishnan, “Visual saliency's
modulatory effect on just noticeable distortion profile and its application
in image watermarking,” Signal Process.: Image Comm., vol. 28, pp.
917–928, 2013.
[6] L. Itti, G. Rees, and J. K. Tsotsos, Neurobiology of Attention. Academic
Press, 2005.
[7] J. Wu, L. Li, W. Dong, G. Shi, W. Lin, and C.-C. J. Kuo, “Enhanced
just noticeable difference model for images with pattern complexity,”
IEEE Trans. Image Process. (to appear), 2017.
[8] H. Hadizadeh, “A saliency-modulated just-noticeable-distortion model
with non-linear saliency modulation functions,” Pattern Recognit. Lett.,
vol. 84, no. C, pp. 49–55, 2016.
[9] A. Liu, W. Lin, M. Paul, C. Deng, and F. Zhang, “Just noticeable
difference for images with decomposition model for separating edge
and textured regions,” IEEE Trans. Circuits Syst. Video Technol., vol. 20,
no. 11, pp. 1648–1652, 2010.
[10] J. Wu, W. Lin, G. Shi, X. Wang, and F. Li, “Pattern masking estimation
in image with structural uncertainty,” IEEE Trans. Image Process.,
vol. 22, no. 12, pp. 4892–4904, 2013.
[11] X. Zhang, W. Lin, and P. Xue, “Improved estimation for just-noticeable
visual distortion,” Signal Processing, vol. 28, pp. 795–808, 2005.
[12] A. Ahumada and H. Peterson, “Luminance-model-based DCT quan-
tization for color image compression,” Vision Visual Process. Digital
Display III, pp. 365–374, 1992.
[13] C.-H. Chou and Y.-C. Li, “A perceptually tuned subband image coder
based on the measure of just-noticeable-distortion profile,” IEEE Trans.
Image Processing, vol. 5, no. 6, pp. 467–476, 1995.
[14] L. Itti, C. Koch, and E. Niebur, “A model of saliency-based visual
attention for rapid scene analysis,” IEEE Trans. Pattern Anal. Machine
Intell., vol. 20, pp. 1254–1259, Nov. 1998.
[15] L. Zhang, Z. Gu, and H. Li, “SDSP: A novel saliency detection method
by combining simple priors,” Proc. IEEE Int. Conf. Image Process., pp.
171–175, Sep. 2013.
[16] V. A. Mateescu, H. Hadizadeh, and I. V. Bajić, “Evaluation of several
visual saliency models in terms of gaze prediction accuracy on video,” in
IEEE QoEMC’12, in conjunction with IEEE Globecom’12, Dec. 2012.
[17] Z. Lu, W. Lin, X. Yang, E. Ong, and S. Yao, “Modeling visual attention's
modulatory aftereffects on visual sensitivity and quality evaluation,”
IEEE Trans. Image Process., vol. 14, pp. 1928–1942, 2005.
[18] P. J. Burt and E. H. Adelson, “The Laplacian pyramid as a compact
image code,” IEEE Trans. Commun., vol. 31, pp. 532–540, 1983.
[19] V. Laparra, J. Balle, A. Berardino, and E. Simoncelli, “Perceptual image
quality assessment using a normalized Laplacian pyramid,” Proc. IS&T
Intl Symposium on Electronic Imaging, Conf. on Human Vision and
Electronic Imaging, Feb. 2016.
[20] O. Schwartz and E. P. Simoncelli, “Natural signal statistics and sensory
gain control,” Nat. Neurosci., vol. 4, no. 8, pp. 819–825, 2001.
[21] D. Heeger, “Normalization of cell responses in cat striate cortex,” Vis. Neurosci., vol. 9, pp. 181–198, 1992.
[22] V. Laparra, J. M. Mari, and J. Malo, “Divisive normalization image
quality metric revisited,” JOSA A, vol. 27, no. 4, pp. 852–864, 2010.
[23] L. Zhang, Y. Shen, and H. Li, “VSI: A visual saliency-induced index
for perceptual image quality assessment,” IEEE Trans. Image Process.,
vol. 23, no. 10, pp. 4270–4281, Oct. 2014.
[24] D. P. Kingma and J. L. Ba, “Adam: A method for stochastic optimiza-
tion,” 3rd Intl. Conf. Learning Represent., 2015.
[25] H. R. Sheikh, K. Seshadrinathan, A. K. Moorthy, Z. Wang, A. C. Bovik,
and L. K. Cormack, “Image and video quality assessment research at
LIVE,” http://live.ece.utexas.edu/research/quality, 2014, [Online].
[26] ITU-R BT.500-11, “Method for the subjective assessment of the quality
of television pictures,” ITU, Tech. Rep., 2002.
[27] D. J. Sheskin, Handbook of Parametric and Nonparametric Statistical
Procedures. Chapman & Hall/CRC, 2007.