Hindawi Publishing Corporation
ISRN Signal Processing
Volume , Article ID ,  pages
http://dx.doi.org/.//
Review Article
Seven Challenges in Image Quality Assessment:
Past, Present, and Future Research
Damon M. Chandler
School of Electrical and Computer Engineering, Oklahoma State University, Stillwater, OK 74078, USA
Correspondence should be addressed to Damon M. Chandler; damon.chandler@okstate.edu
Received  October ; Accepted  November 
Academic Editors: S. Li, C. S. Lin, and K. Wang
Copyright © Damon M. Chandler. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Image quality assessment (IQA) has been a topic of intense research over the last several decades. With each year comes an increasing number of new IQA algorithms, extensions of existing IQA algorithms, and applications of IQA to other disciplines. In this article, I first provide an up-to-date review of research in IQA, and then I highlight several open challenges in this field. The first half of this article discusses key properties of visual perception, image quality databases, and existing full-reference, no-reference, and reduced-reference IQA algorithms. Yet, despite the remarkable progress that has been made in IQA, many fundamental challenges remain largely unsolved. The second half of this article highlights some of these challenges. I specifically discuss challenges related to the lack of complete perceptual models for natural images, compound and suprathreshold distortions, and multiple distortions, and the interactive effects of these distortions on the images. I also discuss challenges related to IQA of images containing nontraditional distortions, and I discuss challenges related to computational efficiency. The goal of this article is not only to help practitioners and researchers keep abreast of the recent advances in IQA, but also to raise awareness of the key limitations of current IQA knowledge.
1. Introduction
Digital imaging and image-processing technologies have revolutionized the way in which we capture, store, receive, view, utilize, and share images. Today, we have come to expect the ability to instantly share photos online, to send and receive multimedia MMS messages at a moment's notice, and to stream live video across the globe instantaneously. These conveniences are possible because the digital cameras and photo-editing systems used by photographers and artists, the compression and transmission systems used by distributors and network engineers, and the various multimedia and display technologies enjoyed by consumers all have the ability to process images in ways that were unthinkable just years ago.
But despite the innovation and rapid advances in technology, and despite the prevalence of higher-definition and more immersive content, one thing has remained constant throughout the digital imaging revolution: the biological hardware used by consumers, the human visual system. Although personal preferences can and do change over time and can and do vary from person to person, the underlying neural circuitry and biological processing strategies have changed very little over measurable human history. As a result, digital processing can alter an image's appearance in ways that humans can reliably and consistently judge to be either detrimental or beneficial to the image's visual quality.
Because of the prevalence of these alterations, a crucial requirement for any system that processes images is a means of assessing the impacts of such alterations on the resulting visual quality. To meet this need, numerous algorithms for image quality assessment (IQA) have been researched and developed over the last several decades. Today, IQA research has emerged as an active subdiscipline of image processing, and many of the resulting techniques and algorithms have begun to benefit a wide variety of applications. Variations of IQA algorithms have proved useful for applications such as image and video coding (e.g., []), digital watermarking (e.g., []), unequal error protection (e.g., []), denoising (e.g., []), image synthesis (e.g., [,]), and various other areas (e.g., for predicting intelligibility in sign language video []).
Many of the techniques employed by modern IQA algorithms are founded in the early research on quality evaluation
of optical systems and analog television broadcast and display systems (e.g., []). For example, in their paper titled "Quality in Television Pictures," Goldmark and Dyer [] stated that

"The factors which chiefly determine the quality of a television picture are (1) definition, (2) contrast range, (3) gradation, (4) brilliance, (5) flicker, (6) geometric distortion, (7) size, (8) color, and (9) noise." [16].
Although no objective quality assessment formulae were presented in [], many of today's IQA algorithms do indeed employ measures of one or more of these factors. Later work by Winch [], on the topic of color TV quality, further pushed toward objective quality assessment by providing guidelines for how photometric and colorimetric properties could be used to derive "characteristic data for correlation with the subjective preferences"; such properties are now commonly employed in modern IQA algorithms. On the optics front, in their paper titled "On the Assessment of Optical Images," Fellgett and Linfoot proposed two key strategies and associated numerical measures of image quality: "assessment by similarity" and "assessment by information content" []. Indeed, variations of these ideas have been used by many of today's IQA algorithms.
It is interesting to note that nearly all of these early research efforts mentioned the need to take into account the characteristics of human vision during the quality assessment process. Five of the earliest efforts to explicitly model properties of the human visual system (HVS) for IQA were published by Sakrison and Algazi [], by Budrikis [], by Stockham [], by Mannos and Sakrison [], and by Schade []. Although no extensive IQA algorithms were presented in these early papers, many of the properties which are used in modern HVS-based IQA algorithms, such as luminance and contrast sensitivity and visual masking, were also suggested in these papers. At that time, Budrikis forecasted that

"Full evaluations are as yet impossible but seem very likely for the foreseeable future, although probably entailing considerable computational tasks." [23].

Today, we still have yet to achieve full evaluations of quality, though remarkable progress has been made, as I will point out in this paper.
At a glance, the IQA problem for digital images may not seem as difficult a task as reported in the literature. After all, digital processing alters an image's pixel values, and the task of estimating quality requires merely mapping these numerical changes to corresponding visual preferences. Of course, anything that involves the human visual system is rarely straightforward. Humans do not see images as collections of pixels, and consequently, the appropriate mapping varies depending on the image, on the type of processing, on the numerical and psychological interaction between these two, and on numerous additional factors. As an example, Figure 1 shows an original image and 11 altered versions of that image, each with the same peak signal-to-noise ratio in comparison to the reference. Clearly, a mapping based only on the energy of the differences in pixel values cannot capture the wide range of visual qualities exhibited by these images.
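This failure mode is easy to reproduce numerically. The sketch below (NumPy, with a synthetic random "image" standing in for a real photograph) constructs two distortions with identical error energy, and hence identical PSNR, even though one spreads the error over every pixel while the other concentrates it in a small region that a human would judge very differently:

```python
import numpy as np

def psnr(ref, dist, peak=255.0):
    """Peak signal-to-noise ratio in dB between a reference and a distorted image."""
    mse = np.mean((ref.astype(np.float64) - dist.astype(np.float64)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

rng = np.random.default_rng(0)
ref = rng.integers(0, 256, size=(64, 64)).astype(np.float64)

# Distortion 1: white noise spread over every pixel.
noise = rng.standard_normal(ref.shape)
noise *= 8.0 / np.sqrt(np.mean(noise ** 2))   # force an RMS error of 8 gray levels
noisy = ref + noise

# Distortion 2: the same total error energy, concentrated in one corner block.
block = np.zeros_like(ref)
block[:16, :16] = 1.0
err = block * rng.standard_normal(ref.shape)
err *= 8.0 / np.sqrt(np.mean(err ** 2))       # identical RMS error over the image
local = ref + err

# Both distorted images report essentially the same PSNR.
print(psnr(ref, noisy), psnr(ref, local))
```

Both calls return about 30.1 dB, yet nothing in the PSNR value distinguishes the globally noisy image from the one with a single badly corrupted region.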
The task of judging quality in Figure 1 is facilitated by the presence of an original, undistorted reference image. In his seminal collection titled "Image Quality: A Comparison of Photographic and Television Systems," Schade [] stated that

"Image quality is a subjective judgment made by a mental comparison of an external image with image impressions stored and remembered more or less distinctly by the observer. ... Moreover, the rating of a given image may be greatly influenced by the availability of a much better image for comparison purposes." [25].

Most IQA algorithms operate in this relative-to-a-reference fashion; these are so-called full-reference algorithms, which take as input a reference image and a processed (usually distorted) image and yield as output either a scalar value denoting the overall visual quality or a spatial map denoting the local quality of each image region (see Section 3). More recently, researchers have begun to develop no-reference and reduced-reference algorithms, which attempt to yield the same quality estimates either by using only the processed/distorted image (no-reference IQA; see Section 4) or by using the processed/distorted image and only partial information about the reference image (reduced-reference IQA; see Section 4).
All three types of IQA algorithms can perform quite well at predicting quality. Some of today's best-performing full-reference algorithms have been shown to generate estimates of quality that correlate highly with human ratings of quality, typically yielding high Spearman and Pearson correlation coefficients. Research in no-reference and reduced-reference IQA is much less mature; however, recent methods have been shown to yield quality estimates which also correlate highly with human ratings of quality, sometimes yielding correlation coefficients which rival the most competitive full-reference methods.
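These two performance measures are straightforward to compute. The sketch below uses fabricated scores purely for illustration; it computes the Pearson linear correlation coefficient (PLCC) directly, and the Spearman rank-order correlation coefficient (SROCC) as the Pearson correlation of the ranks, which is valid here because the scores contain no ties (scipy.stats.pearsonr and spearmanr provide equivalent one-liners):

```python
import numpy as np

def plcc(x, y):
    """Pearson linear correlation coefficient."""
    return float(np.corrcoef(x, y)[0, 1])

def srocc(x, y):
    """Spearman rank-order correlation: Pearson correlation of the ranks
    (valid for tie-free data such as the illustrative scores below)."""
    rank = lambda v: np.argsort(np.argsort(v)).astype(np.float64)
    return plcc(rank(x), rank(y))

# Fabricated example: mean opinion scores (MOS) for ten images and the
# scores produced by a hypothetical IQA algorithm on the same images.
mos = np.array([2.1, 3.4, 1.2, 4.5, 3.9, 2.8, 1.7, 4.1, 3.0, 2.5])
alg = np.array([2.3, 3.1, 1.5, 4.4, 4.0, 2.6, 1.9, 4.3, 2.8, 2.7])

print(f"PLCC = {plcc(mos, alg):.3f}, SROCC = {srocc(mos, alg):.3f}")
```

In practice, the PLCC is usually computed after fitting a monotonic (e.g., logistic) mapping between algorithm scores and MOS; the SROCC is invariant to any monotonic mapping.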
e eld of image quality assessment is rapidly advancing.
With each year comes an increasing number of papers on
new IQA algorithms, extensions of existing IQA algorithms,
and applications of these IQA techniques to other disciplines.
Here,theobjectiveofthispaperisnotonlytoprovidean
overview of the strategies used in IQA algorithms, but also—
and more so—to highlight the current challenges in this eld.
is paper is meant to complement previous reviews and
chapters on IQA [](seealso[,]forrelatedreviews
on video quality assessment). Here, I rst provide a more
recent survey of research in IQA to help practitioners and
researchers keep abreast of the recent advances in IQA. Next,
I discuss several open research challenges which are needed
to further push IQA algorithms toward achieving the “full
evaluations” envisioned by Budrikis.
In the rst three sections of this paper, I provide an up-
to-date survey of research in IQA. As in previous reviews,
Section summarizes several important properties of human
visual perception which are used, at least to some extent,
[Figure 1: panels show the original image and its distorted versions: white noise, rotation by 0.26°, Gaussian blur, luminance reduction, JPEG, JPEG2000, wavelet distortion, contrast reduction, gamma adjustment, quantization, and quantization + dithering.]
Figure 1: Original and distorted versions of the image child swimming from the CSIQ database []. Note the large variation in perceived quality, despite the fact that all distorted images have the same peak signal-to-noise ratio.
directly or indirectly by the vast majority of IQA algorithms. However, I also discuss with each of these properties some of the early experiments in vision science that were performed to uncover the properties; the goal of this discussion is to provide a context which can help define bounds on the applicability of each of the properties. This section also provides an up-to-date survey of the publicly available ground-truth datasets (image quality databases) that can be used to quantify the performances of IQA algorithms in predicting quality. (Note that research on color perception and the specific effects of color on image quality are not covered in this paper. Color-perception research has its own long history, most of which predates research in IQA. The reader is referred to [] for discussions on the influences of color on image quality.)
Sections 3 and 4 provide concise surveys of previous and recent IQA algorithms. Again, the primary objective of these surveys is to help the reader keep abreast of the latest IQA techniques. Section 3 surveys full-reference IQA algorithms. Section 4 surveys no-reference and reduced-reference IQA algorithms. For a more specific and thorough discussion of the use of natural-scene statistics for image and video quality assessment, I refer you to the recent review by Bovik [].
One point should become evident after reading previous reviews and the reviews provided in Sections 2, 3, and 4 of this paper: remarkable progress has been made since the pioneering IQA work of Budrikis, Goldmark, Sakrison, Schade, Stockham, Winch, and others. Today's IQA algorithms can perform extremely well at predicting quality for a variety of images and distortion types.

Yet, beneath the surface of this seemingly orderly picture, behind the scenes of this wealth of IQA knowledge that we have gained, lies a cloudier portrait fueled by a growing number of counterexamples (images, distortions, and other alterations) which modern IQA algorithms are ill-equipped to handle. Under the covers of the numerous successes in IQA research lies a long list of unanswered questions and unsolved challenges.

In Section 5, I discuss seven of these challenges. Some of the challenges are fundamental; some are more application-specific; most of the challenges have been or are actively being researched. But all remain largely unsolved.
() Section . discusses the challenges IQA researchers
face when designing a model of human visual pro-
cessing which can cope with natural images. is
section highlights the need for improved models of
primary visual cortex, the need for more ground-
truth data on natural images, and the need for models
which incorporate processing by higher-level visual
areas.
() Section . discusses the challenges researchers
face when designing an algorithm that can cope
with the variety of distortions that IQA algorithms
can encounter. is section discusses the need for
improved visual summation models which can han-
dle the broadband nature of distortions, and the need
for more research on the perception of suprathreshold
distortions.
() Section . discusses the challenges researchers face
when designing an IQA algorithm that can model the
inuence of the distortion on the images appearance.
is section discusses the dierences between distor-
tions which are perceived as additive and distortions
which aect the images objects. is section also
highlights the need to consider the adaptive visual
strategies and other higher-level eects that humans
use when judging quality.
() Section . discusses the challenges researchers face
when designing an IQA algorithm that can cope
with images which are simultaneously distorted by
multiple types of distortion. is section reviews
previous work on the eects of multiple distortions
on image quality, and it discusses the potential per-
ceptual interactions between the distortions and their
joint eects on images.
() Section . discusses the challenges researchers face
when designing an IQA algorithm that can deal with
geometric changes to images. is section reviews
existing IQA algorithms which have been designed
to handle basic geometric changes, and it discusses
research eorts on IQA of textures, which can contain
more radical geometric and photometric changes.
() Section . discusses the challenges researchers face
when designing an IQA algorithm that can perform
IQA of enhanced images. is section describes
eorts to model the perceptual eects of enhance-
mentonquality,anditdiscussestheneedformore
thorough image quality databases which contain
enhanced images.
() Section . discusses the challenges surrounding run-
time performance and memory requirements of IQA
algorithms. is section reviews previous eorts to
accelerate existing IQA algorithms, and it discusses
the need for further related performance analyses and
accelerations.
It is important to note that these seven challenges are by no means an exhaustive list of research topics in IQA that require further investigation. Rather, I have selected these particular challenges to highlight some key limitations of current IQA knowledge and to point out areas which can begin to answer broader questions on IQA. Additional important open challenges can be garnered from the Proceedings of SPIE "Image Quality and System Performance" and "Human Vision and Electronic Imaging," among others.
2. Image Quality Assessment by Humans

A common approach toward designing an IQA algorithm is to first consider the physical attributes of images that humans find pleasing or displeasing. By understanding how these physical changes give rise to perceptual changes, one can begin to develop an estimate of image quality based on measures of the physical changes. Numerous studies in the fields of visual psychophysics and visual neuroscience have quantified relationships between the physical attributes of visual stimuli and the corresponding psychological and neurophysiological responses. The results of these studies have provided important insights into the goals and functions of the HVS, and many of these findings have been used in IQA algorithms. In Section 2.1, I provide a brief review of the basic properties of the HVS that are commonly taken into account, either explicitly or implicitly, in the vast majority of IQA algorithms.

Another approach toward gaining insight into how humans judge quality is to directly collect quality ratings from a representative pool of human subjects on a database of altered images. Several such quality-rating studies have been conducted, and the results of these studies are commonly released in the form of so-called image quality databases. These databases generally contain the set of reference and altered images used in the study, along with corresponding average quality ratings for each altered image. In Section 2.2, I provide a survey of the various publicly available image quality databases, including a brief discussion of how the data are used for quantifying the predictive performances of IQA algorithms.
2.1. Psychophysical Underpinnings of Image Quality. Research in visual psychophysics aims to provide a better understanding of the human visual system (HVS) by linking changes in the physical attributes of a visual stimulus to the corresponding changes in psychological responses (visual perception and cognition). These studies generally entail carefully designed experiments on human subjects using highly controlled visual stimuli and viewing conditions. Many of the most fundamental properties of visual perception which are used for IQA have been obtained from the results of such studies; the most commonly used of these properties are summarized in this section.

It must be stressed that the primary goal of the vast majority of research in visual psychophysics is to gain knowledge of how the HVS operates; any relations to image quality are usually secondary and are usually not extensively discussed in such studies. Consequently, it is often up to the designer of an IQA algorithm to decide how the psychophysical findings
relate to image quality. Nonetheless, due in part to the increasing popularity of IQA algorithms, an increasing number of psychophysical studies have been devoted specifically toward image quality (e.g., []).
2.1.1. Contrast Sensitivity Function. Psychophysical studies have shown that the minimum contrast needed to detect a visual target (e.g., distortions) depends on the spatial frequency of the target [,]. This minimum contrast is called the contrast detection threshold, and the inverse of this threshold is called contrast sensitivity. When contrast sensitivity is plotted as a function of the spatial frequency of the target, the resulting profile is the contrast sensitivity function (CSF).

Contrast thresholds for sine waves were first measured by Schade [] in an experiment that presented human observers with achromatic sine-wave gratings of various spatial frequencies. The key result of Schade's experiment was the discovery that contrast sensitivity varies with the spatial frequency of the grating; the resulting CSF is bandpass, indicating that we are least sensitive to very-low-frequency and very-high-frequency targets, with a peak in sensitivity near 4 cycles per degree of visual angle (c/deg).
The reduction in sensitivity at high frequencies has been attributed to the optics of the eye, to receptor spacing, and to quantum noise. Reduced sensitivity at low spatial frequencies is believed to occur, in part, due to limited receptive-field sizes and due to masking effects imposed by the target's DC component. However, when contrast sensitivity is measured using Gabor functions, the CSF tends to be much more low-pass [], and such low-pass-type CSFs are most commonly utilized in IQA algorithms. The CSF has also been measured as a function of the orientation of the sine-wave grating, commonly resulting in reduced sensitivity to diagonal orientations as compared to horizontal and vertical orientations (the oblique effect [,]). Alternative theories of the neural underpinnings of the CSF have also been proposed based on the statistical properties of natural scenes [,].
In IQA algorithms, the CSF is commonly taken into account by prefiltering the images with a 2D spatial filter designed based on the psychophysical results. One popular CSF filter, which is shown in Figure 2, was proposed by Mannos and Sakrison [] and further adjusted by Daly []; its frequency response, H(f, θ), is given by

    H(f, θ) = 2.6 (0.0192 + λ f_θ) exp[−(λ f_θ)^1.1],  if f ≥ f_peak c/deg,
    H(f, θ) = 0.981,                                   otherwise,             (1)

where f denotes the radial spatial frequency in c/deg, θ ∈ [−π, π] denotes the orientation, and where

    f_θ = f / [0.15 cos(4θ) + 0.85]                                           (2)

accounts for the oblique effect (see []). In Figure 2, the parameter λ was set to λ = 0.228, resulting in the CSF taking on its maximum value of 0.981 at f_peak = 4 c/deg (and forced to be this value for frequencies below f_peak) when θ = 0 or π/2. A very thorough treatment of the use of the CSF in IQA has been published by Barten [].
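Equations (1) and (2) translate directly into a frequency-domain filter. The following sketch (NumPy) builds the filter's frequency response on an FFT sampling grid; the image size and the pixels-per-degree viewing parameter are illustrative choices, not values prescribed by the model:

```python
import numpy as np

def csf_filter(size, pixels_per_degree, lam=0.228, f_peak=4.0):
    """Frequency response of the Mannos-Sakrison CSF with Daly's adjustments:
    oblique-effect frequency scaling, equation (2), and clamping to the peak
    value (0.981) below f_peak. A sketch of equations (1)-(2)."""
    # Frequency grid in cycles per degree (c/deg).
    fx = np.fft.fftfreq(size) * pixels_per_degree
    fy = np.fft.fftfreq(size) * pixels_per_degree
    FX, FY = np.meshgrid(fx, fy)
    f = np.hypot(FX, FY)
    theta = np.arctan2(FY, FX)

    # Oblique effect: effective frequency is raised at diagonal orientations.
    f_theta = f / (0.15 * np.cos(4.0 * theta) + 0.85)

    H = 2.6 * (0.0192 + lam * f_theta) * np.exp(-(lam * f_theta) ** 1.1)
    return np.where(f >= f_peak, H, 0.981)   # low-pass clamp below the peak
```

Multiplying an image's FFT by this response and inverting the transform applies the CSF prefiltering described above.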
2.1.2. Visual Masking. Another finding from visual perception research which is commonly taken into account in IQA algorithms is the fact that certain regions of an image can hide distortions better than other regions, a finding that can be attributed to visual masking []. Visual masking is a general term that refers to the perceptual phenomenon in which the presence of a masking signal (the mask) reduces a subject's ability to detect a given target signal. The task of detection becomes a masked detection, and contrast thresholds denote masked detection thresholds. In IQA, it is commonly assumed that the image serves as the mask and the distortions serve as the target of detection.
Luminance masking and pattern masking are the two most common forms of masking employed in IQA algorithms. Detection thresholds tend to increase due to an increase in the luminance of the background (mask) upon which the target is placed (luminance masking [,]), a process which is believed to be mediated by retinal adaptation []. For masks consisting of spatial patterns, detection thresholds also tend to increase when the contrast of the mask is increased [,,], a postretinal process believed to be attributable to cortical processing []. Current explanations of pattern masking can generally be divided into three paradigms:

(1) noise masking, which attributes the reduction in sensitivity to the corruptive effects of the mask on internal decision variables [];
(2) contrast masking, which attributes the reduction in sensitivity to contrast gain control [] (discussed later);
(3) entropy masking, which attributes the reduction in sensitivity to an observer's unfamiliarity with the mask [].

Because a mask's contrast is readily computable, contrast masking has been exploited in a variety of IQA and image-processing applications (e.g., []; see Section 3). The extent to which a mask constitutes visual noise and the extent to which an observer is unfamiliar with a mask are phenomena which are more difficult to quantify; accordingly, noise and entropy masking are less commonly used in IQA (though, see []).
Contrast masking results are commonly reported in the form of threshold-versus-contrast (TvC) curves, in which masked detection thresholds are plotted as a function of the contrast of the mask. Figure 3 depicts TvC curves for the detection of a sine-wave grating presented against noise and sine-wave-grating masks (after []). Masked detection thresholds generally increase as the contrast of the mask is increased and often demonstrate a region of facilitation (i.e., a decrease in threshold; the "dipper effect") at lower mask contrasts, depending on the dimensional relationships between the target and the mask (e.g., differences in spatial frequency, orientation, and phase). Note that learning effects have been shown to lower the slopes of the TvC curves [,].
[Figure 2: (a) 3D view of the CSF filter: gain versus horizontal and vertical spatial frequency (c/deg); (b) top-down view of the CSF filter; (c) 1D slice of the CSF filter: gain (dB) versus radial spatial frequency (c/deg), for the ±45° orientation and the 0°/90° orientation.]
Figure 2: Frequency response of a 2D filter used to model the CSF. This particular model is from Mannos and Sakrison [] with further adjusting by Daly []; see (1).
In IQA, a variety of methods have been used to account for masking, particularly in full-reference IQA (discussed later in Section 3). A common approach to explicitly account for masking is to measure the local luminance and contrast in the reference image and then attenuate the estimate of the visibility of the distortions in the distorted image based on these measures (e.g., using a power-function relationship between attenuation and contrast). Other IQA algorithms implicitly incorporate masking either by using local statistical measures which take into account the local contrast or by adjusting the simulated neural responses in the context of a computational neural model of the HVS (see Section 3).
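A minimal sketch of the explicit approach just described, assuming a simple power-function relationship; the constants c0 and eps below are illustrative placeholders, not values from any published algorithm, which would instead be fit to psychophysical (TvC) data:

```python
import numpy as np

def masked_visibility(ref_block, dist_block, c0=0.05, eps=0.6):
    """Attenuate a raw distortion-visibility estimate by the local contrast
    of the reference block (contrast masking). c0 and eps are illustrative."""
    ref = ref_block.astype(np.float64)
    err = dist_block.astype(np.float64) - ref

    # Local RMS contrast of the mask (the reference block).
    mean_lum = ref.mean()
    mask_contrast = ref.std() / (mean_lum + 1e-12)

    # Raw visibility: RMS error relative to mean luminance.
    raw = np.sqrt(np.mean(err ** 2)) / (mean_lum + 1e-12)

    # Power-function masking: the effective threshold grows as contrast^eps
    # once the mask contrast exceeds c0 (cf. the TvC curves of Figure 3).
    threshold = c0 * max(1.0, mask_contrast / c0) ** eps
    return raw / threshold   # larger values suggest more visible distortion
```

Applied block by block, this yields the kind of spatial visibility map that masking-aware full-reference algorithms pool into a quality score; the same error is reported as far more visible on a flat background than on a high-contrast texture.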
2.1.3. Multichannel Model of the HVS. Schade used sine-wave gratings in his CSF study based on the notion that any stimulus can be described as a superposition of sine waves. Campbell and Robson [] extended this idea by measuring detection thresholds for both sine-wave and square-wave gratings. Because a square wave is composed of numerous sine waves, the peak-to-peak contrast of a square wave will always be lower than the peak-to-peak contrast of its fundamental sine wave (by a factor of approximately 1.27 in []). The results from Campbell and Robson's experiment revealed that the thresholds for the square-wave gratings were indeed approximately 1.27 times lower than those found for the sine-wave gratings. They concluded from this finding that the HVS performs a local spatial-frequency decomposition of a stimulus in which the frequency components are detected independently via multiple spatial-frequency channels. This paradigm is known as the multichannel model of human vision [].
[Figure 3: demonstrative TvC curves plotting log10 contrast of the target against log10 contrast of the mask, for a noise mask (slope 1) and a sine-wave mask (slope 0.7), with an inset depicting the target.]
Figure 3: Demonstrative threshold-versus-contrast (TvC) curves for detection of a target consisting of a sine-wave grating in the presence of noise or sine-wave-grating masks. The horizontal axis denotes the contrast of the mask; the vertical axis denotes the contrast of the target (relative to the contrast threshold for detecting the target in the unmasked condition). Note that learning effects have been shown to lower the slopes of the TvC curves [].
Further evidence in support of the multichannel model has been provided by visual adaptation and summation experiments [,]. The CSF measured for a subject adapted to a sine-wave grating of a particular spatial frequency or orientation shows attenuation only within a narrow band of frequencies/orientations around the frequency/orientation of the grating [,]. Visual summation experiments have revealed that a compound target (e.g., a plaid composed of two sine waves) is detectable only when one of its components reaches its own detection threshold, a finding which is consistent with a multichannel model with independent channels [,] (the components of the compound target must be separated in spatial frequency by at least one octave or sufficiently separated in orientation; see []). Similar experiments have shown channels tuned to other dimensions such as color and direction of motion [,].

The multichannel model has also been used to explain the shape of the CSF. Brady and Field [] and Graham et al. [] predicted the shape of the CSF via a model with equally sensitive spatial-frequency channels; the reduction in detection performance at high spatial frequencies was attributed to extrinsic noise that dominates the response of channels tuned to high frequencies, thus resulting in decreased signal-to-noise ratios for these higher-frequency channels.
2.1.4. Computational Neural Models of V1. The multichannel model has inspired several related computational neural models of primary visual cortex (V1). These computational models have been used both to predict masking results and for IQA [,,]. Models of this type first compute modeled neural responses to the reference image (mask), then compute modeled neural responses to the distorted image (mask + target), and then deem the distortions (target) detectable if the two sets of neural responses sufficiently differ. Quality can be estimated based on the predicted masked thresholds and/or the difference in simulated neural responses.

Figures 4 and 5 show block diagrams of the stages used in a typical computational neural model of V1 used to predict masked detection thresholds (Figure 4) or used to estimate quality (Figure 5). The pixel values of the reference and distorted images are first converted to either luminance or lightness values, and then both images are filtered with a 2D spatial filter designed to mimic the CSF. Alternatively, the CSF can be accounted for by scaling the coefficients of the frequency-based decomposition used to mimic the neural array. Next, two sets of simulated neural-array responses (one set for the reference image, one set for the distorted image) are computed via a filterbank. Further adjustments are made to account for neural nonlinearities and interactions (gain control) [,]. The adjusted neural responses are then compared and collapsed across space, frequency, and orientation. The resulting threshold prediction or quality estimate is determined based on the comparison, that is, based on the extent to which the simulated neural responses to the reference image (mask) differ from the simulated neural responses to the distorted image (mask + target).
Frequency-Based Decomposition. To simulate an array of
visual neurons in primary visual cortex (V1), and to account
for the multichannel analysis performed by the HVS, the
computational models employ some form of local frequency-based
decomposition. Standard approaches to this decomposition
include a steerable pyramid (e.g., []), a Gaussian
pyramid (e.g., []), an overcomplete wavelet decomposition
(e.g., []), radial filters (e.g., []), and cortex filters (e.g.,
[,,]).
As shown in Figure , at a particular scale/orientation
of this local frequency-based decomposition, the resulting
matrix of transform coefficients represents the initially linear
responses of a simulated array of neurons located at each
spatial position in the image. Although real neurons cannot
yield negative responses, negative coefficients are permitted
and assumed to model the responses of co-located neurons
that are tuned 180° out-of-phase (i.e., with an inhibitory
central region and excitatory flanking regions).
The parameters of the decomposition are often tuned
based on psychophysical and neurophysiological data (e.g.,
five or more radial frequency bands with – octave bandwidths,
– orientations with –° bandwidths). In an
IQA setting, the spatial-frequency decomposition is applied
to both the reference image and the distorted image, yielding
two sets of coefficients. The resulting coefficients are meant to
simulate the initial linear responses of the neurons; they must
be further adjusted to account for the neurons' nonlinear
response properties.
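As a concrete illustration, the decomposition stage can be sketched with a small Gabor filterbank. This is only a minimal sketch: the filter sizes, spatial frequencies, bandwidths, and the function names (`gabor_kernel`, `decompose`) are illustrative assumptions, not the tuned parameters of any particular published model.

```python
import numpy as np
from scipy.signal import fftconvolve

def gabor_kernel(size, freq, theta, sigma):
    """One oriented Gabor filter; parameters are illustrative, not fitted to data."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)        # rotate coordinates to theta
    envelope = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))
    kernel = envelope * np.cos(2.0 * np.pi * freq * xr)
    return kernel - kernel.mean()                     # zero mean: no response to flat regions

def decompose(image, freqs=(0.05, 0.1, 0.2), n_orient=4):
    """Return {(freq, theta): coefficient map}, the simulated linear
    responses of arrays of similarly tuned neurons tiled across the image."""
    responses = {}
    for f in freqs:
        for k in range(n_orient):
            theta = k * np.pi / n_orient
            kern = gabor_kernel(21, f, theta, sigma=4.0)
            responses[(f, theta)] = fftconvolve(image, kern, mode="same")
    return responses
```

Applied to both the reference and distorted images, this yields the two sets of coefficients described above, one map per simulated channel.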
Gain Control. Numerous studies have shown that the
responses of neurons in V1 are nonlinearly related to the
contrast of the stimulus to which the neurons are exposed (see
[,]). In the low-contrast regime, the neurons exhibit
a threshold-type behavior in which a minimum contrast is
required in order to yield any response. In the high-contrast
regime, the neurons exhibit a saturation-type behavior in
Figure : Block diagram of a typical computational neural model of V1 used to predict masked detection thresholds (here with a noise mask
and a circular sine-wave target). Three key stages are employed in most neural models: (1) a frequency-based decomposition which models
the initially linear responses of an array of visual neurons, (2) application of a pointwise nonlinearity to the decomposition coefficients
and inhibition based on the values of other coefficients [,], and (3) pointwise differences between the adjusted coefficients and
summation of these adjusted coefficient differences across space, spatial frequency, and orientation so as to arrive at a single scalar differential
response value or a map of differential response values.
Figure : Block diagram of a typical computational neural model of V1 used to estimate quality. Three key stages are employed in most
neural models: (1) a frequency-based decomposition which models the initially linear responses of an array of visual neurons, (2) application
of a pointwise nonlinearity to the decomposition coefficients and inhibition based on the values of other coefficients [,], and (3)
pointwise differences between the adjusted coefficients and summation of these adjusted coefficient differences across space, spatial frequency,
and orientation so as to arrive at a single scalar differential response value or a map of differential response values.
which further increases in contrast yield no corresponding
increases in response. Studies have also shown that responses
of V1 neurons can be inhibited by neighboring neurons
in space, frequency, and orientation (the inhibitory pool).
This inhibition from the neighboring neurons is commonly
attributed to a gain control mechanism which is designed
to keep the neuron operating in its linear regime and thus
prevent saturation.
To account for these response properties, neural models
apply a divisive normalization to the coefficients of the local
frequency-based decomposition. Let c(u_0, f_0, θ_0) correspond
to the coefficient at location u_0, center frequency f_0, and
orientation θ_0. The (nonlinear) response of a neuron tuned
to these parameters, R(u_0, f_0, θ_0), is most often simulated via

R(u_0, f_0, θ_0) = k · [w(f_0, θ_0) c(u_0, f_0, θ_0)]^p / ( b^q + Σ_{(u,f,θ)∈S} c(u, f, θ)^q ),   ( )

where k is a gain factor, w(f, θ) represents an optional
weight designed to take into account the CSF, b represents
a saturation constant, p provides the pointwise nonlinearity
to the current neuron, q provides the pointwise nonlinearity
to the neurons in the inhibitory pool, and the set S indicates
which other neurons are included in the inhibitory pool. The
Figure : Demonstration of how spatial filtering (local frequency-based decomposition at a particular scale and orientation) mimics the
initially linear responses of an array of similarly tuned neurons tiled across space. Note that actual V1 neurons cannot yield a negative response;
however, negative values are still computed during the filtering and are assumed to model the responses of another array of neurons with a
180° out-of-phase tuning (i.e., with an inhibitory central region and excitatory flanking regions).
parameters k, b, p, and q are commonly adjusted to fit the
experimental masking data. For example, model parameters
have been optimized for detection thresholds measured using
simple sinusoidal gratings [], for filtered white noise [],
and for TvC curves of target Gabor patterns with sinusoidal
masks [,]. Typically, p and q lie in the range of 2 to 4,
and the inhibitory pool consists of neural responses in
the same spatial-frequency band (f_0), at orientations within
±45° of θ_0, and within a local spatial neighborhood (e.g.,
-connected neighbors).
Equation ( ) is applied to each coefficient of the decomposition
of the reference image and to each coefficient of the
decomposition of the distorted image. This operation results
in two sets of simulated neural responses: (1) a set of neural
responses to the reference image {R_ref(u, f, θ)} and (2) a set of
neural responses to the distorted image {R_dst(u, f, θ)}.
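The divisive normalization above can be sketched in a few lines of numpy. This is a simplified illustration, assuming the coefficients of one spatial-frequency band are stacked into an array of shape (n_orientations, H, W); the constants k, p, q, b, the use of absolute values, and the 3×3 spatial pool over neighboring orientations are placeholder assumptions, not values fitted to masking data.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def divisive_normalization(coeffs, k=1.0, p=2.4, q=2.35, b=0.01):
    """
    coeffs: array of shape (n_orient, H, W) of linear responses in one
    spatial-frequency band. Returns simulated nonlinear responses R.
    Constants are illustrative placeholders, not fitted parameters.
    """
    n_orient = coeffs.shape[0]
    excitation = np.abs(coeffs) ** p                  # pointwise nonlinearity (exponent p)
    pooled = np.empty_like(coeffs, dtype=float)
    for i in range(n_orient):
        # inhibitory pool S: same band, neighboring orientations (i-1, i, i+1),
        # and a 3x3 spatial neighborhood around each location
        neighbors = [(i - 1) % n_orient, i, (i + 1) % n_orient]
        pooled[i] = sum(uniform_filter(np.abs(coeffs[j]) ** q, size=3)
                        for j in neighbors)
    return k * excitation / (b ** q + pooled)         # divisive normalization
```

Running this once on the reference coefficients and once on the distorted coefficients produces the two response sets {R_ref} and {R_dst} described above.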
Summation of Responses. The final stage used in most V1
models entails comparing the two sets of simulated neural
responses {R_ref(u, f, θ)} and {R_dst(u, f, θ)}. When used as a
masking model (Figure ), to generate a map indicating the
local visibility of the target, the responses at each location u
are compared and pooled across frequency and orientation
as follows:

Distortions are visible at location u = Yes, if [ Σ_{f,θ} | R_ref(u, f, θ) − R_dst(u, f, θ) |^β ]^{1/β} ≥ T; No, otherwise,   ( )

where T is a predefined threshold which is typically held
constant across images and where the summation exponent β
is either chosen to match published results from summation
studies or adjusted to fit published masking data. In an IQA
setting (Figure ), the comparison with T is often replaced
with a sigmoid or logistic nonlinearity that maps the β-norm
to an estimate of quality.
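Both uses of the pooled β-norm, the thresholded visibility map and the quality estimate, can be sketched as follows. The values of β and T, and the specific logistic mapping in `quality_score`, are illustrative assumptions rather than parameters from any published model.

```python
import numpy as np

def visibility_map(R_ref, R_dst, beta=3.0, T=1.0):
    """
    R_ref, R_dst: arrays of shape (n_bands, H, W) of simulated neural
    responses to the reference and distorted images.
    Returns a boolean map: True where distortions are predicted visible.
    """
    diff = np.abs(R_ref - R_dst) ** beta
    pooled = diff.sum(axis=0) ** (1.0 / beta)   # collapse across frequency/orientation
    return pooled >= T

def quality_score(R_ref, R_dst, beta=3.0):
    """IQA variant: map the pooled beta-norm through a logistic-style nonlinearity."""
    pooled = (np.abs(R_ref - R_dst) ** beta).sum() ** (1.0 / beta)
    return 1.0 / (1.0 + pooled)   # 1 for identical images, approaching 0 with distortion
```

For identical reference and distorted responses the pooled difference is zero, so no location is flagged visible and the quality estimate is at its maximum.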
Numerous variations of ( ) have been proposed in
the literature; often the models are tuned to fit specific
psychophysical data. The summation can also be applied
across a local spatial neighborhood around u to determine
a regional rather than a pointwise visibility. However, it is
important to emphasize that the neural model is designed to
mimic an array of visual neurons. This neurophysiological
underpinning limits the choice of model parameters and
operations to those which are biologically plausible.
In Section ., I review several IQA algorithms which have
employed variants of this V-based model. It is also important
to note that the vast majority of masking data have been
obtained using simplistic, highly controlled targets (e.g., sine
waves or Gabor patches) presented against unnatural masks
(e.g., sine waves, Gabor patches, and noise). Consequently,
most computational V models employ parameters which
have been selected for such targets and masks. As I discuss
later in Section ., images can impose unique perceptual
eects which cannot be fully captured by current V models.
2.2. Image Quality Databases. Another approach toward
gaining insight into how humans judge quality is to directly
collect quality ratings from a representative pool of human
subjects on a database of altered images. Such ratings can also
be used to evaluate and refine IQA algorithms. Image quality
databases provide this crucial ground-truth information.
These databases typically contain a set of reference and altered
images and average ratings of quality for each distorted
image. The averages are generally taken across subjects,
typically after z-score normalization and other adjustments
(e.g., outlier tests) to attempt to account for individual
biases; see []. The resulting averages are almost always
reported in the form of mean opinion scores (MOS values)
or differential mean opinion scores (DMOS values). For
databases containing distorted images, a larger MOS (smaller
DMOS) denotes greater quality, whereas a smaller MOS
(larger DMOS) denotes lesser quality. Some databases further
provide the standard deviations of the ratings across subjects.
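The per-subject z-score normalization and averaging described above can be sketched as follows. This is a deliberately simplified version, assuming ratings arranged as a subjects × images array; real protocols also include subject screening and outlier rejection (e.g., per ITU-R BT.500), which are omitted here.

```python
import numpy as np

def mos_from_ratings(ratings):
    """
    ratings: array of shape (n_subjects, n_images) of raw quality ratings.
    Z-score each subject's ratings to reduce individual bias and scale
    differences, then average across subjects to obtain per-image scores.
    (Outlier rejection, used in real protocols, is omitted.)
    """
    ratings = np.asarray(ratings, dtype=float)
    mean = ratings.mean(axis=1, keepdims=True)          # per-subject mean rating
    std = ratings.std(axis=1, ddof=1, keepdims=True)    # per-subject spread
    z = (ratings - mean) / std
    mos = z.mean(axis=0)             # mean opinion score per image (in z-units)
    spread = z.std(axis=0, ddof=1)   # per-image variability across subjects
    return mos, spread
```

Subtracting each subject's own mean and dividing by their own spread removes constant rating biases (e.g., one subject who rates everything harshly) before the scores are pooled.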
Here, I rst briey summarize the existing publicly
available image quality databases, and then I discuss tech-
niques which are used to evaluate the performances of IQA
algorithms on the databases.
2.2.1. List of Image Quality Databases. There are over  publicly
available image quality databases, the details of which
are described below and summarized in Table  (ordered by
year of release). Many of these databases are listed as part
of the extensive list of multimedia databases provided by
the QUALINET consortium (European Network on Quality
of Experience in Multimedia Systems and Services) [].
Both Sheikh et al. [] and Lin and Kuo [] have provided
analyses of the performances of various IQA algorithms on
some of these databases. In addition, in [], Winkler has
provided quantitative comparisons of various aspects (source
content, test conditions, and subjective ratings) of some of
these databases. Note that 3D image quality databases are not
listed here; see [].
(i) IRCCyN/IVC Image Quality Database (IVC). The
IRCCyN/IVC database [,