Hanhart et al. EURASIP Journal on Image and Video Processing (2015) 2015:39
DOI 10.1186/s13640-015-0091-4
RESEARCH Open Access
Benchmarking of objective quality metrics
for HDR image quality assessment
Philippe Hanhart1*, Marco V. Bernardo2,3, Manuela Pereira3, António M. G. Pinheiro2 and Touradj Ebrahimi1
Abstract
Recent advances in high dynamic range (HDR) capture and display technologies have attracted a lot of interest from
scientific, professional, and artistic communities. As in any technology, the evaluation of HDR systems in terms of
quality of experience is essential. Subjective evaluations are time consuming and expensive, and thus objective
quality assessment tools are needed as well. In this paper, we report and analyze the results of an extensive
benchmarking of objective quality metrics for HDR image quality assessment. In total, 35 objective metrics were
benchmarked on a database of 20 HDR contents encoded with 3 compression algorithms at 4 bit rates, leading to a
total of 240 compressed HDR images, using subjective quality scores as ground truth. Performance indexes were
computed to assess the accuracy, monotonicity, and consistency of the metric estimation of subjective scores.
Statistical analysis was performed on the performance indexes to discriminate small differences between metrics.
Results demonstrated that metrics designed for HDR content, i.e., HDR-VDP-2 and HDR-VQM, are the most reliable
predictors of perceived quality. Finally, our findings suggested that the performance of most full-reference metrics can
be improved by considering non-linearities of the human visual system, while further efforts are necessary to improve
performance of no-reference quality metrics for HDR content.
Keywords: Image quality assessment, Objective metrics, High dynamic range, JPEG XT
1 Introduction
Recently, the world of multimedia has been observing
a growth in new imaging modalities aiming at improv-
ing user immersion capability, providing more realistic
perception of content, and consequently, reaching new
levels of quality of experience. This trend began
with the introduction of 3D capable devices in the con-
sumer market, providing depth perception, followed by
ultra high definition (UHD), focused on higher pixel res-
olutions beyond high definition, high frame rate (HFR),
to provide more fluid motion, and, more recently, high
dynamic range (HDR), intended to capture a wider range
of luminance values. Moreover, academia, industry and
service providers have proposed new models to further
enrich content, such as plenoptic and holographic sys-
tems, although these latest modalities are still in very early
stages. Many aspects of these new trends still need further
improvement. For instance, consumers still experience a
lack of reliable 3D content and suffer from discomfort
caused by long exposure. UHD systems, although they have
been entering consumer markets, still face a shortage of
appropriate content.
*Correspondence: philippe.hanhart@epfl.ch
1Multimedia Signal Processing Group, EPFL, Lausanne, Switzerland
Full list of author information is available at the end of the article
HDR imaging systems are stepping into the multimedia
technologies for consumer market. HDR pursues a more
complete representation of information that the human
eye can see, capturing all the brightness information of
the visible range of a scene, even in extreme lighting con-
ditions. Hence, it pursues the representation of the entire
dynamic range and color gamut perceived by human
visual system (HVS). HDR imaging can be exploited to
improve quality of experience in multimedia applications
[1] and to enhance intelligibility in security applications
where lighting conditions cannot be controlled [2].
HDR systems are becoming available for the general
public. Acquisition systems are present in a large vari-
ety of photographic equipment and even in some mobile
devices. Typically, computer rendering and merging of
multiple low dynamic range (LDR) images taken at dif-
ferent exposure settings are the two methods used to
generate HDR images [3]. Nowadays, HDR images can
© 2015 Hanhart et al. Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0
International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and
reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the
Creative Commons license, and indicate if changes were made.
Content courtesy of Springer Nature, terms of use apply. Rights reserved.
also be acquired using specific image sensors. HDR dis-
plays are also becoming increasingly available and enable
representation of better contrasts, higher luminance, and
wider color gamut [4]. Optionally, tone mapping operators
(TMO) that map HDR content into the luminance range
and color gamut of conventional displays can be used [5].
In addition to the acquisition and display technolo-
gies, JPEG has been standardizing new codecs for HDR
content. JPEG XT is a recent standard for JPEG backward-
compatible compression of HDR images [6]. Using this
compression standard, HDR images are coded in two lay-
ers. A tone-mapped version of the HDR image is encoded
using the legacy JPEG format in a base layer, and the
extra HDR information is encoded in a residual layer. The
advantage of this layered scheme is that any conventional
JPEG decoder can extract the tone-mapped image, keep-
ing backward compatibility and allowing for display on a
conventional LDR monitor. Furthermore, a JPEG XT com-
pliant decoder can use the residual layer to reconstruct
a lossy or even lossless version of the HDR image. Cur-
rently, JPEG XT defines four profiles (A, B, C, and D) for
HDR image compression, of which profile D is a very sim-
ple entry-level decoder that roughly uses the 12-bit mode
of JPEG. Profiles A, B, and C all take into account the
non-linearity of the human visual system. They essentially
differ on the strategy used for creating the residual infor-
mation and on the pre- and post-processing techniques. In
profile A, the residual is represented as a ratio of the lumi-
nance of the HDR image and the tone-mapped image after
inverse gamma correction. The residual is log-encoded
and compressed as an 8-bit greyscale image [7]. In pro-
file B, the image is split into “overexposed” areas and LDR
areas. The extension image is represented as a ratio of
the HDR image and the tone-mapped image, after inverse
gamma correction. Note that instead of a ratio, profile B
uses a difference of logarithms. Finally, profile C computes
the residual image as a ratio of the HDR image and the
inverse tone-mapped image. Unlike the other profiles, the
inverse TMO is not a simple inverse gamma, but rather a
global approximation of the inverse of the (possibly local)
TMO that was used to generate the base-layer image. Sim-
ilarly to profile B, the ratio is implemented as a difference
of logarithms. However, instead of using the exact math-
ematical log operation, profile C uses a piecewise linear
approximation, defined by re-interpreting the bit-pattern
of the half-logarithmic IEEE representation of floating-
point numbers as integers, which is exactly invertible [8].
MPEG is also starting a new standardization effort on
HDR video [9], revealing the growing importance of HDR
technologies.
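The exact-invertibility trick used by profile C, namely reinterpreting the IEEE floating-point bit pattern as an integer to obtain a piecewise-linear approximation of the logarithm, can be sketched in a few lines. This is only an illustration of the idea using half-precision floats; the function names and the bias/scale handling are our assumptions, not the exact mapping standardized in JPEG XT.

```python
import numpy as np

def pseudo_log2(x):
    """Piecewise-linear approximation of log2(x) for positive x, obtained
    by reinterpreting the IEEE half-float bit pattern as an integer
    (illustrative sketch of the idea behind JPEG XT profile C)."""
    bits = np.asarray(x, dtype=np.float16).view(np.uint16).astype(np.float64)
    # 2**10 mantissa steps per exponent, exponent bias of 15 for binary16
    return bits / 2.0**10 - 15.0

def pseudo_exp2(y):
    """Exact inverse of pseudo_log2: rebuild the half-float bit pattern."""
    bits = np.round((np.asarray(y, dtype=np.float64) + 15.0) * 2.0**10)
    return bits.astype(np.uint16).view(np.float16).astype(np.float64)
```

At powers of two the approximation coincides with the exact logarithm, and the round trip `pseudo_exp2(pseudo_log2(x))` recovers `x` up to half-float precision, which is what makes this style of mapping exactly invertible.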
As for any technology, evaluation of HDR systems, in
terms of quality of experience, is essential. Subjective eval-
uations are time consuming and expensive, thus objective
quality assessment tools are needed as well. To the best
of our knowledge, only three objective metrics have been
developed so far for HDR content. The most relevant
work on this domain is the HDR visual detection predic-
tor (HDR-VDP) metric proposed by Mantiuk et al. [10],
which is an extension of Daly’s VDP [11] for the HDR
domain. The second version of this metric, HDR-VDP-
2 [12, 13], is considered as the state-of-the-art in HDR
image quality assessment. The dynamic range indepen-
dent metric (DRIM) proposed in [14] can also be used for
HDR quality assessment. Nevertheless, this metric results
in three distortion maps, which are difficult to interpret,
as there is no pooling of the different values. Recently,
the high dynamic range video quality metric (HDR-VQM)
was proposed by Narwaria et al. [15]. The metric was
designed for quality assessment of HDR video content, but
can also be used for HDR still images.
To overcome the lack of HDR objective metrics, LDR
metrics, e.g., PSNR, were also used to evaluate HDR qual-
ity, especially in early HDR studies. However, LDR metrics
are designed for gamma encoded images, typically hav-
ing luminance values in the range 0.1–100 cd/m², while
HDR images have linear values and are meant to capture
a much wider range of luminance. Originally, gamma
encoding was developed to compensate for the charac-
teristics of cathode ray tube (CRT) displays, but it also
takes advantage of the non-linearity in HVS to optimize
quantization when encoding an image [16]. Under com-
mon illumination conditions, the HVS is more sensitive
to relative differences between darker and brighter tones.
According to Weber’s law, the HVS sensitivity approxi-
mately follows a logarithm function for light luminance
values [17]. Therefore, in several studies, LDR metrics
have been computed in the log domain to predict HDR
quality. However, at the darkest levels, the HVS sensitivity
is closer to a square-root behavior, according to Rose-
DeVries law [18, 19]. To extend the range of LDR metrics
and to consider the sensitivity of the HVS, Aydin et al.
[20] have proposed the perceptually uniform (PU) encod-
ing. Another approach to apply LDR metrics on HDR
images was proposed in [21]. This technique consists
in tone-mapping the HDR image to several LDR images
with different exposure ranges and to take the average
objective score computed on each exposure. However,
this approach is more time consuming and requires more
computational power, proportional to the number of
exposures.
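As a sketch of this multi-exposure approach, the following hypothetical helper simulates a few exposures by scaling and gamma-encoding the HDR images, computes PSNR on each, and averages the scores. The exposure values, the gamma, and the simple exposure operator are illustrative assumptions, not the exact procedure of [21].

```python
import numpy as np

def psnr(a, b, peak=255.0):
    """Standard PSNR between two images in an 8-bit range."""
    mse = np.mean((a - b) ** 2)
    return 10.0 * np.log10(peak**2 / mse)

def multi_exposure_psnr(ref_hdr, test_hdr, stops=(-2, 0, 2), gamma=2.2):
    """Average PSNR over several simulated exposures of two HDR images
    (a sketch of the approach of [21]; parameters are illustrative)."""
    scores = []
    for ev in stops:
        gain = 2.0 ** ev
        # simple exposure scaling + gamma encoding into an 8-bit LDR range
        ref_ldr = 255.0 * np.clip(ref_hdr * gain, 0, 1) ** (1.0 / gamma)
        test_ldr = 255.0 * np.clip(test_hdr * gain, 0, 1) ** (1.0 / gamma)
        scores.append(psnr(ref_ldr, test_ldr))
    return float(np.mean(scores))
```

The cost grows linearly with the number of exposures, which is the computational drawback noted above.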
For LDR content, extensive studies have shown that not
all metrics can be considered as reliable predictors of per-
ceived quality [22, 23], while only a few recent studies have
benchmarked objective metrics for HDR quality assess-
ment. The study of Valenzise et al. [24] compared the
performance of PSNR and SSIM, computed in the loga-
rithmic and PU [20] spaces, and HDR-VDP. The authors
have concluded that non-uniformity must be corrected for
a proper metric application, as most have been designed
for perceptual uniform scales. Another subjective study
was reported by Mantel et al. in [25]. A comparison with
objective metrics in physical domain and using a gamma
correction to approximate perceptually uniform lumi-
nance is also presented, concluding that the mean relative
squared error (MRSE) metric provides the best perfor-
mance in predicting quality. The correlation between 13
well-known full-reference metrics and perceived quality
of compressed HDR content is investigated in [26]. The
metrics were applied on the linear domain, and results
show that only HDR-VDP-2 and FSIM predicted visual
quality reasonably well. Finally, Narwaria et al. [15] have
reported that their HDR-VQM metric performs simi-
lar or slightly better than HDR-VDP-2 for HDR image
quality assessment. Regarding HDR video quality assess-
ment, four studies were also reported by Azimi et al. [27],
Rerabek et al. [28], Hanhart et al. [9], and Narwaria et
al. [15]. The authors of [15] found that HDR-VQM is the
best metric, far beyond HDR-VDP-2, in contradiction to
the findings of [9], which showed lower performance for
HDR-VQM when compared to HDR-VDP-2. Also, the
other two studies found that HDR-VDP-2 has the high-
est correlation. The divergence between these findings
might be due to the contents and types of artifacts con-
sidered in the different studies. Indeed, the first three
studies consider HDR video sequences captured using
HDR video cameras, manually graded, and encoded with
compression schemes based on AVC or HEVC, whereas
Narwaria et al. have mostly used computer-generated
contents, an automatic algorithm to adjust the luminance
of the HDR video sequences, and their own backward
compatible HDR compression algorithm.
The main limitation of these studies lies in the small
number of images or video sequences used in their exper-
iments, which was limited to five or six contents. Also,
a proper adaptation of the contents to the HDR display
and correction of the metrics for non-uniformity were
not always considered. Therefore, in this paper, we report
and analyze the results of an extensive benchmarking of
objective quality metrics for HDR image quality assess-
ment. In total, 35 objective metrics were benchmarked
using subjective scores as ground truth. The database used
in our experiments [29, 30] is composed of 20 different
contents and a total of 240 compressed HDR images with
corresponding subjective quality scores. The HDR images
are adapted (resized, cropped, and tone-mapped using
display-adaptive tone-mapping operator) to SIM2 HDR
monitor. The objective metrics were computed in the lin-
ear, logarithmic, PU [20], and Dolby perceptual quantizer
(PQ) [31] domains. Additionally, the metrics were com-
puted both on the luminance channel alone and as the
average quality score of the Y, Cb, and Cr channels. For
each metric, objective scores were fitted to subjective
scores using logistic fitting. Performance indexes were
computed to assess the accuracy, monotonicity, and con-
sistency of the metrics estimation of subjective scores.
Finally, statistical analysis was performed on the perfor-
mance indexes computed from 240 data points to discrim-
inate small differences between two metrics. Hence, with
this study, we expect to produce a valid contribution for
future objective quality studies on HDR imaging.
The remainder of the paper is organized as follows.
The dataset and corresponding subjective scores used
as ground truth are described in Section 2.1. The dif-
ferent metrics benchmarked in this study are defined
in Section 2.2. In Section 2.3, the methodology used to
evaluate the performance of the metrics is described.
Section 3 provides a detailed analysis of the objective
results and discusses the reliability of objective metrics.
Finally, Section 4 concludes the paper.
2 Methodology
The results of subjective tests can be used as ground truth
to evaluate how well objective metrics estimate perceived
quality. In this paper, we use the publicly available dataset
provided by Korshunov et al. [29, 30] to benchmark 35
objective metrics. This section describes in details the
dataset, objective metrics, and performance analysis used
in our benchmark.
2.1 Dataset and subjective scores
The dataset is composed of 20 HDR images with a reso-
lution of 944 × 1080 pixels. The dataset contains scenes
with architecture, landscapes, and portraits and is com-
posed of HDR images fused from multiple exposure pic-
tures, frames extracted from HDR video, and computer-
generated images. Since publicly available HDR images are
usually not graded, the images are adjusted for a SIM2
HDR monitor using a display-adaptive TMO [32] to map
the relative radiance representation of the images to an
absolute radiance and color space of the HDR monitor.
These display-adapted images are then considered as orig-
inal images and compressed with JPEG XT using profiles
A, B, and C. The base and extension layers chroma-
subsampling are set to 4:2:0 and 4:4:4, respectively, while
optimized Huffman coding is enabled for all implemen-
tations. For each content and profile, four different bit
rates were selected, leading to a total of 240 compressed
HDR images. Figure 1 shows tone-mapped versions of the
images in the dataset, and Table 1 reports the dynamic
range and key [33] characteristics of these images. The key
is in the range [0, 1] and gives a measure of the overall
brightness:

key = (log L_avg − log L_min) / (log L_max − log L_min)    (1)
Fig. 1 Display-adapted images of the dataset. The reinhard02 TMO was used for images (a)–(g), and the mantiuk06 TMO was used for the
remaining images. Copyrights: 2006–2007 Mark D. Fairchild, Blender Foundation | www.sintel.org, under Creative Commons BY, Mark Evans,
under Creative Commons BY
where L_min, L_max, and L_avg are the minimum, maximum,
and average luminance values, respectively, computed
after excluding 1 % of the darkest and lightest pixels.
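Equation (1) can be implemented in a few lines. In this sketch, the excluded 1 % of pixels is split evenly between the dark and bright tails, and L_avg is taken as the arithmetic mean of the trimmed luminance; both are our assumptions about details the text leaves open.

```python
import numpy as np

def image_key(luminance, exclude=0.01):
    """Key (overall brightness measure) of an HDR luminance map, Eq. (1).
    Assumptions: the excluded fraction is split evenly between the darkest
    and lightest pixels, and L_avg is the arithmetic mean after trimming."""
    lum = np.sort(np.ravel(luminance))
    cut = int(lum.size * exclude / 2)
    lum = lum[cut:lum.size - cut]
    log_min, log_max = np.log(lum[0]), np.log(lum[-1])
    log_avg = np.log(lum.mean())
    return (log_avg - log_min) / (log_max - log_min)
```

By construction the result lies in [0, 1]: dark images have a key near 0 and bright images a key near 1.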
The evaluation was performed using a full HD 47” SIM2
HDR monitor with individually controlled LED backlight
modulation, capable of displaying content with luminance
values ranging from 0.001 to 4000 cd/m². In every session,
three subjects assessed the displayed test images simul-
taneously. They were seated in an arc configuration, at a
constant distance of 3.2 times the picture height as recom-
mended in [34], which corresponds to 1.87 m and a visual
resolution of 60 pixels per degree. The laboratory was
equipped with a controlled lighting system with a 6500 K
color temperature, while a mid gray color was used for
all background walls and curtains. The background lumi-
nance behind the monitor was set to 20 cd/m² and did not
directly reflect off of the monitor.
The double-stimulus impairment scale (DSIS) Variant
I methodology [35] was used for the evaluation. For
scoring, a five-grade impairment scale (1: very annoying,
2: annoying, 3: slightly annoying, 4: perceptible, but not
annoying, 5: imperceptible) was used. Two images were
presented in side-by-side fashion with 32 pixels of black
border separating the two images: one of the two images
was always the reference (unimpaired) image, while the
other was the test image. To reduce the effect of order of
images on the screen, the participants were divided into
two groups: the left image was always the reference image
for the first group, whereas the right image was always
the reference image for the second group. After the pre-
sentation of each pair of images, a six-second voting time
followed. Subjects were asked to rate the impairments of
the test images in relation to the reference image.
Before the experiment, oral instructions were provided
to the subjects to explain their tasks. Additionally, a train-
ing session was organized, allowing subjects to familiarize
themselves with the test procedure. For this purpose two
images outside of the dataset were used. Five samples were
manually selected by expert viewers for each image so
that the quality of samples was representative of the rating
scale.
Since the total number of test samples was too large for
a single test session, the overall experiment was split into
three sessions of approximately 16 min each. Between the
sessions, subjects took a 15-min break. The test material
was randomly distributed over the test sessions. To reduce
contextual effects, the order of displayed stimuli was ran-
domized applying different permutations for each group
of subjects, whereas the same content was never shown
consecutively.
Table 1 Characteristics of HDR images from the dataset
Image                  Dynamic range    Key
507 4.097 0.743
AirBellowsGap 4.311 0.768
BloomingGorse2 2.336 0.748
CanadianFalls 2.175 0.729
DevilsBathtub 2.886 0.621
dragon 4.386 0.766
HancockKitchenInside 4.263 0.697
LabTypewriter 4.316 0.733
LasVegasStore 4.131 0.636
McKeesPub 3.943 0.713
MtRushmore2 4.082 0.713
PaulBunyan 2.458 0.702
set18 4.376 0.724
set22 3.162 0.766
set23 3.359 0.764
set24 3.862 0.778
set31 4.118 0.678
set33 4.344 0.698
set70 3.441 0.735
showgirl 4.369 0.723
sintel 3.195 0.781
WillyDesk 4.284 0.777
Min 2.175 0.621
Max 4.386 0.781
Mean 3.722 0.727
Median 4.089 0.731
A total of 24 naïve subjects (12 females and 12 males)
took part in the experiments. Subjects were aged between
18 and 30 years, with an average age of 22.1. All subjects
were screened for correct visual acuity and color vision
using Snellen and Ishihara charts, respectively.
The subjective scores were processed by first detect-
ing and removing subjects whose scores deviated strongly
from others. The outlier detection was applied to the set
of results obtained from the 24 subjects and performed
according to the guidelines described in Section 2.3.1 of
Annex 2 of [35]. Two outliers were detected. Then, the
mean opinion score (MOS) was computed for each test
stimulus as the mean score across the 22 valid subjects,
as well as the associated 95 % confidence interval (CI),
assuming a Student’s t-distribution of the scores. As it
can be observed in Fig. 2, MOS values reflect the subjects'
perception fairly well, with enough MOS samples for each
meaningful value range. More details about the dataset
and subjective evaluations can be found in [29].
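The score processing described above (after outlier removal) can be sketched as follows; the Student's t critical values are standard two-sided 95 % table values for the relevant degrees of freedom, and the outlier screening of [35] itself is omitted.

```python
import numpy as np

# Two-sided 95 % Student's t critical values (standard table values)
T_CRIT_95 = {21: 2.080, 23: 2.069}  # df = n - 1 for 22 or 24 subjects

def mos_and_ci(scores):
    """Mean opinion score and 95 % confidence interval half-width for one
    test stimulus, assuming a Student's t-distribution of the scores
    (a sketch of the processing described in the paper)."""
    scores = np.asarray(scores, dtype=np.float64)
    n = scores.size
    mos = scores.mean()
    sem = scores.std(ddof=1) / np.sqrt(n)  # standard error of the mean
    return mos, T_CRIT_95[n - 1] * sem
```

For the 22 valid subjects of this study, df = 21 and the critical value is 2.080; for other sample sizes, `scipy.stats.t.ppf` can replace the small table.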
2.2 Objective quality metrics
Depending on the amount of information required about
the reference image, objective metrics can be classified
into three categories:
i) Full-reference (FR) metrics, which compare the test
image with a reference image
ii) Reduced-reference (RR) metrics, which have access
to a number of features from the reference image,
extract the same features from the test image and
compare them
iii) No-reference (NR) metrics, which do not use any
information about the reference image
In this study, only FR and NR metrics were considered.
2.2.1 Full-reference metrics
To the best of our knowledge, there are only two metrics
for HDR quality assessment that have a publicly available
implementation: (1) HDR-VDP: high dynamic range vis-
ible difference predictor [10, 12, 13] and (2) HDR-VQM:
an objective quality measure for high dynamic range
video [15].
The original HDR-VDP metric [10] was the first met-
ric designed for HDR content. It is an extension of the
VDP model [11] that considers a light-adaptive contrast
sensitivity function (CSF), which is necessary for HDR
content as the ranges of light adaptation can vary substan-
tially. The metric was further extended [12] with different
features, including a specific model of the point spread
function (PSF) of the optics of the eye, as human opti-
cal lens flare can be very strong in high contrast HDR
content. The front-end amplitude non-linearity is based
on integration of the Weber-Fechner law. HDR-VDP is a
calibrated metric and takes into account the angular res-
olution. The metric uses a multi-scale decomposition. A
neural noise block is defined to calculate per-pixel proba-
bilities maps of visibility and the predicted quality metric.
In this study, we used the latest version of HDR-VDP, i.e.,
version 2.2.1 [13], referred to as HDR-VDP-2 in this paper.
HDR-VQM was designed for quality assessment of
HDR video content. The metric is computed in the PU
space and relies on a multi-scale and multi-orientation
analysis, similar to HDR-VDP, based on a subband
decomposition using log-Gabor filters to estimate the
subband errors. The subband errors are pooled over non-
overlapping spatiotemporal tubes to account for short-
term memory effects. Further spatial and long-term tem-
poral poolings are performed to compute the overall qual-
ity score. In the case of still images, only spatial pooling is
performed.
The remaining FR metrics considered in this study are
all designed for LDR content and can be divided into
different categories: difference measures and statistical-
oriented metrics, structural similarity measures, visual
Fig. 2 MOS values distribution
information measures, information weighted metrics,
HVS inspired metrics, and objective color difference mea-
sures (studied in the vision science). The complete list of
considered FR metrics is provided in the following:
Difference measures and statistical-oriented metrics
These metrics are based on pixel color differences
and provide a measure of the difference between the
reference image and the distorted image. The
following metrics of this category were considered:
(3) MSE: mean squared error, (4) PSNR: peak
signal-to-noise ratio, and (5) SNR: signal-to-noise
ratio.
Structural similarity measures
These metrics model the quality based on pixel
statistics to model the luminance (using the mean),
the contrast (variance), and the structure (cross-
correlation) [36]. The metrics considered in this
category are the following: (6) UQI: universal quality
index [37], (7) SSIM: structural similarity index [38],
(8) MS-SSIM: multiscale SSIM index [39], (9)
M-SVD: measure - singular value decomposition [40],
and (10) QILV: quality index on local variance [41].
The MS-SSIM index is a multiscale extension of
SSIM, which has a higher correlation with perceived
quality when compared to SSIM. It is a perceptual
metric based on the content features extraction and
abstraction. This quality metric considers that the
HVS uses the structural information from a scene
[38]. The structure of objects in the scene can be
represented by their attributes, which are
independent of both contrast and average luminance.
Hence, the changes in the structural information
from the reference and distorted images can be
perceived as a measure of distortion. The MS-SSIM
algorithm calculates multiple SSIM values at multiple
image scales. By running the algorithm at different
scales, the quality of the image is evaluated for
different viewing distances. MS-SSIM also puts less
emphasis on the luminance component when
compared to contrast and structure components [39].
Visual information measures
These metrics aim at measuring the image
information by modeling the psycho-visual features
of the HVS or by measuring the information fidelity.
Then, the models are applied to the reference and
distorted images, resulting in a measure of the
difference between them. The following metrics on
this category were considered: (11) IFC: image fidelity
criterion [42], (12) VIF: visual information fidelity
[43], (13) VIFp: VIF pixel-based version [43], and (14)
FSIM: feature similarity index [44].
The VIF criterion analyses the natural scene
statistics, using an image degradation model and the
HVS model. This FR metric is based on the
quantification of the Shannon information present in
both the reference and the distorted images. VIFP is
derived from the VIF criterion.
FSIM is a perceptual metric that results from SSIM.
FSIM adds the comparison of low-level feature sets
between the reference and the distorted images [44].
Hence, FSIM analyzes the high phase congruency
extracting highly informative features and the
gradient magnitude, to encode the contrast
information. This analysis is complementary and
reflects different aspects of the HVS in assessing the
local quality of an image.
Information weighted metrics
The metrics in this category are based on the
modeling of relative local importance of the image
information. As not all regions of the image have the
same importance in the perception of distortion, the
image differences computed by any metric are assigned
local weights, resulting in a more perceptual
measure of quality. The following metrics were
computed: (15) IW-MSE: information content
weighting MSE [45], (16) IW-PSNR: information
content weighting PSNR [45], and (17) IW-SSIM:
information content weighting SSIM [45].
HVS-inspired metrics
These metrics try to model empirically the human
perception of images from natural scenes. The
following metrics were considered: (18) JND_st: just
noticeable distortion [46], (19) WSNR: weighted SNR
[47, 48], and (20) DN: divisive normalization [36].
Objective color difference measures
The color difference metrics were developed because
the CIE1976 color difference [49] magnitude in
different regions of the color space did not appear
correlated with perceived colors. These metrics were
designed to compensate for the non-linearities of the
HVS present on the CIE1976 model. The following
CIE metrics were computed: (21) CIE1976 [49], (22)
CIE94 [50], (23) CMC [51], and (24) CIEDE2000 [52].
The CIEDE2000 metric is a color difference measure
that includes not only weighting factors for lightness,
chroma, and hue but also factors to handle the
relationship between chroma and hue. The
CIEDE2000 computation is not reliable in all color
spaces. However, in this case, it can be used because
the tested images are represented in the CIELAB
color space that allows a precise computation.
2.2.2 No-reference metrics
These metrics are based on the analysis of a set of well-
known sharpness measures. The following NR metrics
were considered: (25) JND: just noticeable distortion [46],
(26) VAR: variance [53], (27) LAP: laplacian [54], (28)
GRAD: gradient [54], (29) FTM: frequency threshold met-
ric [55], (30) HPM: HP metric [56], (31) Marziliano:
Marziliano blurring metric [55], (32) KurtZhang: kurtosis-
based metric [57], (33) KurtWav: kurtosis of wavelet
coefficients [58], (34) AutoCorr: auto correlation [54], and
(35) RTBM: Riemannian tensor-based metric [59].
2.2.3 Metrics computation and transform domains
LDR metrics are designed for gamma encoded images,
typically having luminance values in the range 0.1–100
cd/m², while HDR images have linear values and are
meant to capture a much wider range of luminance.
Therefore, in this study, metrics were computed not only
in the linear space but also in transformed spaces that pro-
vide a more perceptual uniformity. This space transfor-
mation was not applied to HDR-VDP-2 and HDR-VQM,
which are calibrated metrics and require absolute lumi-
nance values as input. The color difference metrics, i.e.,
CIE1976, CIE94, CMC, and CIEDE2000, were also not
computed in transformed spaces. These color difference
measures require a conversion from the RGB representation
to the CIELAB color space, considering a D65
100 cd/m² reflective white point as the reference white point.
Before any metric was computed, images were clipped
to the range [0.001, 4000] cd/m² (the theoretical range of
luminance values that the HDR monitor used in the subjective
tests can render) to mimic the physical clipping performed
by the HDR display. To compute the metrics in the linear
domain, these luminance values were normalized to the
interval [0, 1]. This normalization was not applied to the HDR
metrics or to the color difference metrics.
The remaining metrics were computed in three trans-
form domains: the log domain, the PU domain [20],
and the PQ domain [31]. The PU transform is derived
using the threshold-integration method [60]. The transform
is constrained such that luminance values in the
range 0.1–80 cd/m², as produced by a typical CRT display,
are mapped to the range 0–255 to mimic the
sRGB non-linearity. The PQ transform is derived from
the Barten contrast sensitivity function [61]. The PQ
curve has a square-root and log behavior at the darkest
and highest light levels, respectively, while it exhibits a
slope similar to the gamma non-linearities between those
extreme luminance regions. Figure 3 depicts the normalized
responses of the log, PU, and PQ transforms in the
range [0, 4000] cd/m².
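The PQ transform of [31] was later standardized as SMPTE ST 2084; a sketch of its commonly used closed form is shown below (the PU transform, in contrast, is defined by a numerical fit in [20] and is not reproduced here):

```python
import numpy as np

def pq_encode(luminance):
    """SMPTE ST 2084 (PQ) inverse EOTF: absolute luminance in cd/m^2 -> [0, 1]."""
    m1 = 2610 / 16384        # = 0.1593017578125
    m2 = 2523 / 4096 * 128   # = 78.84375
    c1 = 3424 / 4096         # = 0.8359375
    c2 = 2413 / 4096 * 32    # = 18.8515625
    c3 = 2392 / 4096 * 32    # = 18.6875
    y = np.clip(np.asarray(luminance, dtype=float), 0.0, 10000.0) / 10000.0
    y_m1 = y ** m1
    return ((c1 + c2 * y_m1) / (1 + c3 * y_m1)) ** m2

print(pq_encode(100.0))    # ~0.51: diffuse white maps near mid-range
print(pq_encode(10000.0))  # 1.0: the PQ peak luminance
```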
These transformations were applied before any normalization;
only after their application were the resulting
color components normalized to the interval [0, 1].
After normalization, the values, considered to be in the
RGB color space, were transformed to the YCbCr color
space [62]. The exception is the DN metric, which uses
the RGB components directly. The metrics were computed
on each of these components separately, and two final
scores were considered: the quality score computed on the
luminance channel alone and the average quality score of
the Y, Cb, and Cr channels.
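A sketch of this two-variant computation, assuming the analog-form BT.709 conversion and a generic two-input metric function (the helper names are illustrative):

```python
import numpy as np

def rgb_to_ycbcr(rgb):
    """ITU-R BT.709 conversion for components normalized to [0, 1] (analog form)."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b   # luma
    cb = (b - y) / 1.8556                      # blue-difference chroma
    cr = (r - y) / 1.5748                      # red-difference chroma
    return y, cb, cr

def metric_y(metric, ref_rgb, dist_rgb):
    """'_Y' variant: metric computed on the luminance channel alone."""
    return metric(rgb_to_ycbcr(ref_rgb)[0], rgb_to_ycbcr(dist_rgb)[0])

def metric_m(metric, ref_rgb, dist_rgb):
    """'_M' variant: average of the metric over the Y, Cb, and Cr channels."""
    refs, dists = rgb_to_ycbcr(ref_rgb), rgb_to_ycbcr(dist_rgb)
    return float(np.mean([metric(r, d) for r, d in zip(refs, dists)]))
```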
Content courtesy of Springer Nature, terms of use apply. Rights reserved.
Hanhart et al. EURASIP Journal on Image and Video Processing (2015) 2015:39 Page 8 of 18
Fig. 3 Comparison of responses. Comparison of responses for the transformation functions in two different luminance ranges (a: [0, 120] cd/m²,
b: [120, 4000] cd/m²)
2.3 Benchmarking of quality metrics
To evaluate how well an objective metric is able to esti-
mate perceived quality, the MOS obtained from subjective
experiments are taken as ground truth and compared to
predicted MOS values obtained from objective metrics.
To compute the predicted MOS M̃, a regression analysis
on each objective metric's results O was performed using a
logistic function as a regression model:

M̃ = a + b / (1 + exp(c · (O − d)))    (2)
where a, b, c, and d are the parameters that define the
shape of the logistic fitting function and were determined
using a least squares method.
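With SciPy, this regression can be sketched as follows; the objective scores and MOS values below are made-up illustrations, not data from the study:

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(o, a, b, c, d):
    # Eq. (2): predicted MOS as a logistic function of the objective score o
    return a + b / (1 + np.exp(c * (o - d)))

# Hypothetical objective scores and corresponding MOS values
obj = np.array([30.0, 32.0, 34.0, 36.0, 38.0, 40.0, 42.0, 44.0])
mos = np.array([1.2, 1.6, 2.3, 3.0, 3.6, 4.1, 4.4, 4.6])

# Least-squares fit of the four parameters a, b, c, d
params, _ = curve_fit(logistic, obj, mos, p0=[5.0, -4.0, 1.0, np.median(obj)])
mos_pred = logistic(obj, *params)
```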
2.3.1 Performance indexes
Performance indexes to assess the accuracy of objective
metrics were computed following the same procedure
as in [63]. In particular, the Pearson linear correlation
coefficient (PLCC) and the unbiased estimator of the root-
mean-square error (RMSE) were used. The Spearman
rank order correlation (SROCC) coefficient and the out-
lier ratio (OR) were also used to estimate respectively the
monotonicity and the consistency of the objective met-
ric as compared with the ground truth subjective data.
The OR is the ratio of points for which the error between
the predicted and actual MOS values exceeds the 95%
confidence interval of MOS values.
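A sketch of the four indexes, assuming per-point 95% confidence intervals are available and that the unbiased RMSE divides by N − 4 to account for the four parameters fitted in Eq. (2):

```python
import numpy as np
from scipy import stats

def performance_indexes(mos, mos_pred, ci95):
    """PLCC, SROCC, unbiased RMSE, and outlier ratio of predicted vs. actual MOS."""
    mos = np.asarray(mos, dtype=float)
    mos_pred = np.asarray(mos_pred, dtype=float)
    plcc = stats.pearsonr(mos, mos_pred)[0]        # accuracy
    srocc = stats.spearmanr(mos, mos_pred)[0]      # monotonicity
    n = len(mos)
    rmse = np.sqrt(np.sum((mos - mos_pred) ** 2) / (n - 4))        # unbiased estimator
    outlier_ratio = float(np.mean(np.abs(mos - mos_pred) > ci95))  # consistency
    return plcc, srocc, rmse, outlier_ratio
```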
2.3.2 Statistical analysis
To determine whether the difference between two perfor-
mance index values corresponding to two different met-
rics is statistically significant, two-sample statistical tests
were performed on all four performance indexes. In par-
ticular, for the PLCC and SROCC, a Z-test was performed
using Fisher z-transformation. For the RMSE, an F-test
was performed, whereas a Z-test for the equality of two
proportions was performed for the OR. No processing was
applied to correct for the multiple comparisons. The sta-
tistical tests were performed according to the guidelines
of recommendation ITU-T P.1401 [64].
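For the PLCC case, the Z-test based on the Fisher z-transformation can be sketched as follows, assuming two independent samples of the same size n:

```python
import numpy as np
from scipy import stats

def plcc_z_test(r1, r2, n, alpha=0.05):
    """Two-tailed Z-test on two correlation coefficients via Fisher z-transform."""
    z1, z2 = np.arctanh(r1), np.arctanh(r2)  # Fisher z-transformation
    se = np.sqrt(2.0 / (n - 3))              # std. error of z1 - z2 (equal n)
    z = (z1 - z2) / se
    p_value = 2 * stats.norm.sf(abs(z))
    return p_value < alpha, p_value

# With n = 240 data points, 0.95 vs. 0.90 is a significant difference
significant, p = plcc_z_test(0.95, 0.90, 240)
```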
3 Results
Figures 4, 5, 6, and 7 report the accuracy, monotonic-
ity, and consistency indexes, as defined in Section 2.3,
for the metrics computed in the different domains. The
metrics are sorted from best (top) to worst (bottom)
performing, based on the different performance indexes
(higher PLCC/SROCC and lower RMSE/OR values indi-
cate better performance). As HDR-VDP-2 and HDR-
VQM require absolute luminance values as input, these
metrics were computed neither on the chrominance chan-
nels nor in the transform domains. Similarly, the different
color difference metrics were computed only in the linear
domain, after converting the absolute RGB values to the
CIELAB color space. The DN metric was computed on the
RGB components, considering all three channels together.
Finally, the remaining 28 metrics were computed both on
the luminance channel alone (_Y suffix) and as the aver-
age quality score of the luminance, blue-difference, and
red-difference channels (_M suffix). The statistical anal-
ysis results are reported in the same tables. This analysis
was performed on the performance indexes computed
from 240 data points to discriminate small differences
between two metrics. Metrics whose performance indexes
are connected by a line are considered statistically not
significantly different. For example, in the linear domain,
Fig. 4 Accuracy, consistency, and monotonicity indexes for each objective metric computed in the linear space
Fig. 5 Accuracy, consistency, and monotonicity indexes for each objective metric computed in the logarithm space
according to PLCC, there is no statistical evidence to show
performance differences between IFC and FSIM com-
puted on the luminance channel, but they are statistically
different from HDR-VDP-2 (see Fig. 4).
3.1 Best performing metrics
As expected, HDR-VDP-2 and HDR-VQM, which are
the only true HDR quality metrics considered in this
study, computed on absolute luminance values, are the
Fig. 6 Accuracy, consistency, and monotonicity indexes for each objective metric computed in the PU space
best performing metrics when compared to all other
metrics and domains. Both metrics have a correlation
above 0.95 and a particularly low RMSE (around 0.35)
and low OR, whereas all other metrics have an OR
above 0.48. HDR-VDP-2 (OR = 0.35) has a slightly
lower OR than HDR-VQM (OR = 0.4083), but there
is no statistical evidence to show a significant differ-
ence. However, HDR-VQM is over three times faster than
Fig. 7 Accuracy, consistency, and monotonicity indexes for each objective metric computed in the PQ space
HDR-VDP-2 [15], which makes it a suitable alternative to
HDR-VDP-2.
The results for HDR-VDP-2 are in line with the findings
of [26], slightly better than those of Valenzise et al. [24],
but in contradiction with Mantel et al. [25], who reported
a much lower correlation. However, Mantel et al. used
unusual combinations of parameters for the base and
extension layers, especially for content BloomingGorse.
Narwaria et al. [15] found that HDR-VQM performed
significantly better than HDR-VDP-2 for both
video and still image content. However, our results show
that both metrics have similar performance, while it was
reported in [9] that HDR-VQM performs lower than
HDR-VDP-2 for HDR video compression. The divergence
between these findings might be due to the contents and
types of artifacts considered in the different studies.
In contrast to the HDR metrics, the NR metrics show
the worst performance with PLCC and SROCC values
below 0.5 and RMSE and OR values above 1 and 0.8,
respectively, independently of the domain in which the
metric was computed. These results show that NR metrics
do not reach satisfactory prediction accuracy even
when computed in a perceptual domain and that specific
NR metrics should be designed for HDR image quality
assessment.
3.2 Difference measures and statistical-oriented metrics
Results show that MSE-based metrics, i.e., MSE, SNR,
and PSNR, are not very reliable predictors of perceived
quality when computed in the linear domain, with corre-
lation between 0.65 and 0.75. Higher PLCC values were
reported in [26] for MSE and SNR (PLCC = 0.88), but
the study was performed considering only five contents.
These metrics are known to be very content dependent
[65], which might explain the drop in performance when
considering 20 images. The correlation of MSE-based
metrics computed on the luminance channel alone can
be improved by about 0.1 by considering a more percep-
tual domain than the linear domain, which does not take
into account the contrast sensitivity response of the HVS.
In the log and PU domains, the correlation is about 0.83
and 0.84, respectively, which is in line with the results
from [24]. Nevertheless, the performance of the MSE-
based metrics computed as the average quality score of
the Y,Cb,andCrchannels did not improve when con-
sidering perceptual domains. These observations indicate
that the log, PU, and PQ domains can better represent the
luminance sensitivity of the HVS than the linear domain,
but they might not be optimal for the chrominance
sensitivity.
3.3 Objective color difference measures
In the linear domain, the color difference metrics, with the
exception of the original CIE1976 color difference met-
ric, are the best performing pixel-based metrics. They
outperform the MSE-based metrics, but there is no sta-
tistical evidence to show a significant improvement over
SNR computed on the luminance alone. Nevertheless,
their correlation with perceived visual quality is only
about 80 %, with an OR above 69 %, which cannot be
considered a reliable prediction. Since the release of
the CIE1976 color difference metric, two extensions have
been developed in 1994 and 2000 to better address per-
ceptual non-uniformities of the HVS. But, according to
the benchmarking results, further improvements might
be necessary for HDR images to handle non-uniformities
in low and high luminance ranges, outside of the typical
range of LDR displays. The color difference metrics are
computed in the CIELAB color space, which considers rel-
ative luminance values with respect to a reference white
point, typically a reflective D65 white point of about 100–
120 cd/m². This reference white point is similar to the
targeted peak luminance that is typically considered when
calibrating LDR reference monitors. Therefore, for HDR
images, one would be tempted to set the luminance of
the reference white point considered in the color conver-
sion equal to the peak luminance of the HDR monitor.
However, this leads to lower performance of the color
difference metrics, and the reflective white point should
therefore be used for HDR content instead.
3.4 Structural similarity and visual information measures
The performance of SSIM and its multiscale exten-
sion, MS-SSIM, is improved by considering logarithm
instead of linear values and is even further improved
by considering the PU or PQ transform. In particu-
lar, on the luminance channel, the correlation of SSIM
is increased by about 0.15 from linear to logarithm,
while MS-SSIM improved by only about 0.03. From log
to PU/PQ, improvements are relatively low for SSIM,
whereas MS-SSIM exhibits a gain of about 0.04. Results
show that MS-SSIM (luminance only) performs the best
in PU and PQ spaces according to the PLCC, SROCC,
Fig. 8 Statistical analysis comparing the HDR metrics and best performing metric of each domain
Fig. 9 Subjective versus objective results. Subjective versus objective evaluation results for the HDR metrics (a-b) and best performing metric of
each domain (c-f). Each symbol, i.e., combination of marker and color, corresponds to a specific content
and RMSE indexes. The correlation obtained for SSIM
in the log and PU domains is similar to the results of
Valenzise et al. [24]. On the other hand, UQI, which cor-
responds to the special case of SSIM when the constants
C1 and C2 are set to 0, does not perform better in the
log, PU, or PQ space than in the linear domain. Similar
correlation results for SSIM and MS-SSIM are reported
in [26] as in this paper (for the linear domain). However,
it is reported that the relative change between the worst
and best qualities for SSIM and MS-SSIM was less than
0.003 and 0.0003 %, respectively. In this study, the average
relative change computed over all domains is 16.5 and
11.5 % for SSIM and MS-SSIM, respectively. One major
difference between the two works is the use of absolute
luminance values in [26], whereas luminance values were
linearly mapped from the theoretical display range to the
range [0, 1] in this paper. For LDR content, SSIM uses
different values for C1 and C2 depending on whether the
images are in the range [0, 1] or [0, 255]. For HDR content,
our findings suggest that the value of these constants
should be adjusted according to the luminance range and
depending on whether scaling of the values is performed
or not.
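In the reference SSIM implementation, the constants scale with the assumed dynamic range L of the input, C1 = (K1·L)² and C2 = (K2·L)² with K1 = 0.01 and K2 = 0.03; a sketch of how they could be adapted to the chosen HDR scaling (the appropriate data_range for HDR values remains an open choice, as discussed above):

```python
def ssim_constants(data_range, k1=0.01, k2=0.03):
    """SSIM stabilizing constants C1, C2 scaled to the value range of the input.

    data_range is 1.0 for images normalized to [0, 1] and 255 for 8-bit images;
    for HDR data it depends on whether (and how) luminance values were scaled.
    """
    return (k1 * data_range) ** 2, (k2 * data_range) ** 2

c1, c2 = ssim_constants(255)  # ~6.5025 and ~58.5225, the usual 8-bit values
```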
Metrics that quantify the loss of image information, i.e.,
VIF, its pixel-based version, VIFP, and its predecessor, IFC,
also show good performance. In particular, IFC (lumi-
nance only) is the second best performing metric in the
linear domain. While the performance of IFC is not influ-
enced by the domain in which the metric is computed,
the performance of VIF(P) is significantly improved when
considering a more perceptual domain than the linear
space. In the log domain, results show that VIF computed
on the luminance alone is the best performing metric.
Note that the correlation reported for VIF(P) in this paper
is significantly better than the one reported in [26]. Sim-
ilarly to (MS-)SSIM, the difference might be due to the
scaling procedure. Among the other HVS-based metrics,
FSIM also shows good performance, especially in the PU
and PQ space (RMSE below 0.5). In the linear domain,
results are similar to our previous work.
3.5 Statistical analyses
To determine how the best metrics of each domain com-
pare to each other, a direct benchmarking of the two HDR
metrics, which are the best performing metrics in the lin-
ear space, and the best performing metric of the log, PU,
and PQ spaces was performed. The PSNR metric com-
puted on the luminance channel in the log space was
added to this comparison, as this metric is widely used in
HDR compression studies. Figure 8 reports the results of
the statistical analysis of the six metrics. To identify met-
rics computed in the log, PU, and PQ spaces, the LOG_,
PU2_, and PQ2_ prefixes are used, respectively. According to
PLCC and SROCC, there is no statistical evidence to show
performance differences between HDR-VDP-2, HDR-
VQM, and MS-SSIM computed on the luminance channel
in the PU space. However, HDR-VDP-2 and HDR-VQM
have a significantly lower RMSE than all other metrics.
Figure 9 depicts the scatter plots of subjective versus
objective results for these metrics. As it can be observed,
the data points are well concentrated near the fitting curve
for HDR-VDP-2, as well as for HDR-VQM, while they
are more scattered for the other metrics, especially in
the case of LOG_PSNR_Y, which shows higher content
dependency. These findings indicate that HDR-VDP-2
and HDR-VQM have a very high consistency when com-
pared to the other metrics. Nevertheless, HDR-VDP-2
is complex and requires heavy computational resources,
which limits its use in many applications. HDR-VQM and
MS-SSIM computed in the PU space are lower complexity
alternatives to HDR-VDP-2.
The statistical analysis was also used to understand
whether there is a statistically significant difference
between the performance of each metric when computed
on the luminance component alone and when computed
on all components. Only results from the analysis per-
formed on the 28 metrics that were computed both on
the Y channel alone and as the average quality score
of the Y, Cb, and Cr channels were considered. Table 2
reports the number of metrics for which one approach
was significantly better than the other one, as well as when
no significant difference between the two approaches
was observed. The analysis was performed individually
for each performance index and domain. In the linear
domain, there is no statistical evidence to show perfor-
mance differences between the two approaches for about
80 % of the metrics. However, in the log, PU, and PQ space,
roughly half of the metrics perform significantly better
Table 2 Comparison of the 28 metrics computed on the Y and YCbCr channels. Comparison of the metrics computed as the average
quality score of the Y channel alone and as the average quality score of the YCbCr channels

                    lin                     log                     PU                      PQ
                    PLCC SROCC RMSE OR      PLCC SROCC RMSE OR      PLCC SROCC RMSE OR      PLCC SROCC RMSE OR
Y is better         6    7     8    3       16   14    16   8       16   14    15   14      14   14    15   14
Similar             22   21    20   25      11   14    12   20      12   14    13   14      14   14    13   14
YCbCr is better     0    0     0    0       1    0     0    0       0    0     0    0       0    0     0    0
Table 3 Comparison of the 57 metrics computed on all domains. Results represent the number of times a metric computed in the
domain i performs significantly better than when computed in the domain j, where i and j are the row and column of the table

        PLCC                SROCC               RMSE                OR
        lin  log  PU  PQ    lin  log  PU  PQ    lin  log  PU  PQ    lin  log  PU  PQ
lin     0    9    5   6     0    9    6   8     0    8    5   5     0    2    1   1
log     10   0    2   2     10   0    4   4     8    0    0   0     8    0    3   3
PU      14   17   0   11    13   15   0   11    11   16   0   10    9    7    0   7
PQ      13   17   8   0     11   15   8   0     11   16   6   0     9    7    3   0
when computed on the luminance channel alone. Accord-
ing to PLCC, the JND metric, FR version, computed in
the log domain, is the only case for which better perfor-
mance is achieved when considering all channels. As HDR
is often considered in combination with wide color gamut
(WCG), it is expected that the fidelity of color reproduc-
tion will play a more important role in the context of HDR
when compared to LDR. We believe that improvements
can be achieved by considering different domains for com-
puting the metrics on the chrominance channels and by
using better pooling strategies.
Similarly, the statistical analysis was also used to under-
stand whether there is a statistically significant difference
between the performance of a particular metric computed
in one domain and another domain. Only results from the
analysis performed on the 57 metrics that were computed
in all domains were considered. Table 3 reports the number
of times a metric computed in the domain i performs
significantly better than when computed in the domain j,
where i and j are the row and column of the table. Results
show that most metrics perform the best in the PU and
PQ spaces when compared to the lin and log spaces, which
is in line with our previous observations. Note that results
based on PLCC, SROCC, and RMSE are in agreement,
while the OR metric shows fewer cases where statistically
significant differences are observed. Additionally, there
are also metrics for which computations performed in the
linear and logarithm domains perform better than in the
PU and PQ space. Overall, there is no optimal domain that
performs the best for all metrics. Instead, different metrics
should use different domains to maximize the correlation
with perceived quality.
4Conclusions
In this paper, 35 objective metrics were benchmarked on a
database of 240 compressed HDR images using subjective
quality scores as ground truth. In addition to the linear
space, metrics were computed in the logarithm, PU, and
PQ domains to mimic non-linearities of the HVS. Results
showed that the performance of most full-reference met-
rics can be improved by considering perceptual trans-
forms when compared to linear values. On the other hand,
our findings suggested that a lot of work remains to be
done for no-reference quality assessment of HDR con-
tent. Our benchmark demonstrated that HDR-VDP-2 and
HDR-VQM are ultimately the most reliable predictors
of perceived quality. Nevertheless, HDR-VDP-2 is com-
plex and requires heavy computational resources, which
limits its use in many applications. HDR-VQM is over
three times faster, which makes it a suitable alternative
to HDR-VDP-2. Alternatively, MS-SSIM computed in the
PU space is another lower complexity substitute, as there
is no statistical evidence to show performance differences
between these metrics in terms of PLCC and SROCC.
Even though the numbers of contents and compressed
images considered in the experiments are quite large, dif-
ferent performance might be observed for other contents
and types of artifacts.
Competing interests
The authors declare that they have no competing interests.
Acknowledgements
This work has been conducted in the framework of the Swiss National
Foundation for Scientific Research (FN 200021-143696-1), Swiss SERI project
Compression and Evaluation of High Dynamic Range Image and Video
(C12.0081), Portuguese Instituto de Telecomunicações - “FCT – Fundação para
a Ciência e a Tecnologia" (project UID/EEA/50008/2013), and COST IC1005 The
digital capture, storage, transmission, and display of real-world lighting HDRi.
Author details
1Multimedia Signal Processing Group, EPFL, Lausanne, Switzerland. 2Optics
Center, UBI, Covilhã, Portugal. 3Instituto de Telecomunicações, UBI, Covilhã,
Portugal.
Received: 4 May 2015 Accepted: 3 November 2015
References
1. P Hanhart, P Korshunov, T Ebrahimi, in SPIE Applications of Digital Image
Processing XXXVII. Subjective evaluation of higher dynamic range video,
(2014)
2. P Korshunov, H Nemoto, A Skodras, T Ebrahimi, in Proc. SPIE 9138, Optics,
Photonics, and Digital Technologies for Multimedia Applications III.
Crowdsourcing-based Evaluation of Privacy in HDR Images, (2014)
3. E Reinhard, G Ward, S Pattanaik, P Debevec, High Dynamic Range Imaging:
Acquisition, Display, and Image-Based Lighting (The Morgan Kaufmann
Series in Computer Graphics). (Morgan Kaufmann Publishers Inc., San
Francisco, CA, USA, 2005)
4. H Seetzen, W Heidrich, W Stuerzlinger, G Ward, L Whitehead, M
Trentacoste, A Ghosh, A Vorozcovs, High Dynamic Range Display
Systems. ACM Trans. Graph. 23(3), 760–768 (2004)
5. E Reinhard, M Stark, P Shirley, J Ferwerda, Photographic tone
reproduction for digital images. ACM Trans. Graph. 21(3), 267–276 (2002)
6. T Richter, in Picture Coding Symposium. On the standardization of the JPEG
XT image compression, (2013)
7. G Ward, M Simmons, in ACM SIGGRAPH 2006 Courses. JPEG-HDR: a
backwards-compatible, high dynamic range extension to JPEG, (2006)
8. T Richter, in SPIE Applications of Digital Image Processing XXXVII. On the
integer coding profile of JPEG XT, vol. 9217, (2014)
9. P Hanhart, M Rerabek, T Ebrahimi, in Proc. SPIE, Applications of Digital
Image Processing XXXVIII. Towards high dynamic range extensions of
HEVC: subjective evaluation of potential coding technologies, (2015)
10. R Mantiuk, S Daly, K Myszkowski, H-P Seidel, in SPIE Human Vision and
Electronic Imaging X, vol. 5666. Predicting visible differences in high
dynamic range images: model and its calibration, (2005)
11. SJ Daly, in SPIE Human Vision, Visual Processing, and Digital Display III, vol.
1666. Visible differences predictor: an algorithm for the assessment of
image fidelity, (1992)
12. R Mantiuk, KJ Kim, AG Rempel, W Heidrich, HDR-VDP-2: A calibrated visual
metric for visibility and quality predictions in all luminance conditions.
ACM Trans. Graph. 30(4), 40:1–40:14 (2011)
13. M Narwaria, RK Mantiuk, M Perreira Da Silva, P Le Callet, HDR-VDP-2.2: a
calibrated method for objective quality prediction of high-dynamic range
and standard images. J. Electron. Imaging. 24(1), 010501 (2015)
14. TO Aydin, R Mantiuk, K Myszkowski, H-P Seidel, Dynamic Range
Independent Image Quality Assessment. ACM Trans. Graph. 27(3),
69:1–69:10 (2008)
15. M Narwaria, M Perreira Da Silva, P Le Callet, HDR-VQM: An objective
quality measure for high dynamic range video. Signal Process. Image
Commun. 35, 46–60 (2015)
16. C Poynton, Digital Video and HD: Algorithms and Interfaces.
(Elsevier/Morgan Kaufmann, Burlington, Vermont, USA, 2012)
17. SK Shevell, The Science of Color. (Elsevier, Boston, Massachusetts, USA,
2003)
18. A Rose, The sensitivity performance of the human eye on an absolute
scale. J. Opt. Soc. Am. A. 38(2), 196–208 (1948)
19. H De Vries, The quantum character of light and its bearing upon
threshold of vision, the differential sensitivity and visual acuity of the eye.
Physica. 10(7), 553–564 (1943)
20. TO Aydın, R Mantiuk, K Myszkowski, H-P Seidel, in SPIE Human Vision and
Electronic Imaging XIII, vol. 6806. Extending quality metrics to full
luminance range images, (2008)
21. J Munkberg, P Clarberg, J Hasselgren, T Akenine-Möller, High Dynamic
Range Texture Compression for Graphics Hardware. ACM Trans. Graph.
25(3), 698–706 (2006)
22. HR Sheikh, MF Sabir, AC Bovik, A Statistical Evaluation of Recent Full
Reference Image Quality Assessment Algorithms. IEEE Trans. Image
Process. 15(11), 3440–3451 (2006)
23. K Seshadrinathan, R Soundararajan, AC Bovik, LK Cormack, Study of
Subjective and Objective Quality Assessment of Video. IEEE Trans. Image
Process. 19(6), 1427–1441 (2010)
24. G Valenzise, F De Simone, P Lauga, F Dufaux, in SPIE Applications of Digital
Image Processing XXXVII, vol. 9217. Performance evaluation of objective
quality metrics for HDR image compression, (2014)
25. C Mantel, SC Ferchiu, S Forchhammer, in 16th International Workshop on
Multimedia Signal Processing. Comparing subjective and objective quality
assessment of HDR images compressed with JPEG-XT, (2014)
26. P Hanhart, MV Bernardo, P Korshunov, M Pereira, AMG Pinheiro, T
Ebrahimi, in 6th International Workshop on Quality of Multimedia
Experience. HDR image compression: A new challenge for objective
quality metrics, (2014)
27. M Azimi, A Banitalebi-Dehkordi, Y Dong, MT Pourazad, P Nasiopoulos, in
International Conference on Multimedia Signal Processing. Evaluating the
Performance of Existing Full-Reference Quality Metrics on High Dynamic
Range (HDR) Video Content, (2014)
28. M Rerabek, P Hanhart, P Korshunov, T Ebrahimi, in 9th International
Workshop on Video Processing and Quality Metrics for Consumer Electronics.
Subjective and objective evaluation of HDR video compression, (2015)
29. P Korshunov, P Hanhart, T Richter, A Artusi, R Mantiuk, T Ebrahimi, in 7th
International Workshop on Quality of Multimedia Experience. Subjective
quality assessment database of HDR images compressed with JPEG XT,
(2015)
30. Subjective Quality Assessment Database of HDR Images Compressed
with JPEG XT. http://mmspg.epfl.ch/jpegxt-hdr. Accessed 23 July 2015
31. S Miller, M Nezamabadi, S Daly, Perceptual signal coding for more efficient
usage of bit codes. SMPTE Motion Imaging J. 122(4), 52–59 (2013)
32. R Mantiuk, S Daly, L Kerofsky, Display adaptive tone mapping. ACM Trans.
Graph. 27(3), 68 (2008)
33. AO Akyüz, E Reinhard, Color appearance in high-dynamic-range imaging.
SPIE J. Electron. Imaging. 15(3) (2006)
34. ITU-R BT.2022, General viewing conditions for subjective assessment of
quality of SDTV and HDTV television pictures on flat panel displays.
International Telecommunication Union (2012)
35. ITU-R BT.500-13, Methodology for the subjective assessment of the quality
of television pictures. International Telecommunication Union (2012)
36. V Laparra, J Muñoz-Marí, J Malo, Divisive normalization image quality
metric revisited. J. Opt. Soc. Am. A. 27(4), 852–864 (2010)
37. Z Wang, AC Bovik, A universal image quality index. IEEE Signal Process.
Lett. 9(3), 81–84 (2002)
38. Z Wang, AC Bovik, HR Sheikh, EP Simoncelli, Image quality assessment:
from error visibility to structural similarity. IEEE Trans. Image Process.
13(4), 600–612 (2004)
39. Z Wang, EP Simoncelli, AC Bovik, in 37th Asilomar Conference on Signals,
Systems and Computers. Multiscale structural similarity for image quality
assessment, (2003)
40. A Shnayderman, A Gusev, AM Eskicioglu, An SVD - based grayscale image
quality measure for local and global assessment. IEEE Trans. Image
Process. 15(2), 422–429 (2006)
41. S Aja-Fernández, RSJ Estepar, C Alberola-Lopez, C-F Westin, in 28th Annual
International Conference of the IEEE Engineering in Medicine and Biology
Society. Image Quality Assessment based on Local Variance, (2006)
42. HR Sheikh, AC Bovik, G de Veciana, An information fidelity criterion for
image quality assessment using natural scene statistics. IEEE Trans. Image
Process. 14(12), 2117–2128 (2005)
43. HR Sheikh, AC Bovik, Image information and visual quality. IEEE Trans.
Image Process. 15(2), 430–444 (2006)
44. L Zhang, D Zhang, X Mou, D Zhang, FSIM: A feature similarity index for
image quality assessment. IEEE Trans. Image Process. 20(8), 2378–2386
(2011)
45. Z Wang, Q Li, Information Content Weighting for Perceptual Image
Quality Assessment. IEEE Trans. Image Process. 20(5), 1185–1198 (2011)
46. XK Yang, WS Ling, ZK Lu, EP Ong, SS Yao, Just noticeable distortion model
and its applications in video coding. Signal Process. Image Commun.
20(7), 662–680 (2005)
47. J Mannos, DJ Sakrison, The effects of a visual fidelity criterion of the
encoding of images. IEEE Trans. Inf. Theory. 20(4), 525–536 (1974)
48. T Mitsa, KL Varkur, in IEEE International Conference on Acoustics, Speech, and
Signal Processing. Evaluation of contrast sensitivity functions for the
formulation of quality measures incorporated in halftoning algorithms,
(1993)
49. CIE, Colorimetry Official Recommendation of the International
Commission on Illumination. CIE publication 15.2, CIE Central Bureau
(1986)
50. CIE, Industrial Colour-Difference Evaluation. CIE publication 116, CIE
Central Bureau (1995)
51. FJJ Clarke, R McDonald, B Rigg, Modification to the JPC79
Colour-difference Formula. J. Soc. Dye. Colour. 100(4), 128–132 (1984)
52. M Luo, G Cui, B Rigg, The development of the CIE 2000 colour-difference
formula: CIEDE2000. Color Res. Appl. 26(5), 340–350 (2001)
53. SJ Erasmus, KCA Smith, An automatic focusing and astigmatism correction
system for the SEM and CTEM. J. Microsc. 127(2), 185–199 (1982)
54. CF Batten, Autofocusing and astigmatism correction in the scanning electron
microscope. Master’s thesis. (University of Cambridge, UK, 2000)
55. AV Murthy, LJ Karam, in 2nd International Workshop on Quality of
Multimedia Experience. A MATLAB-based framework for image and video
quality evaluation, (2010)
56. D Shaked, I Tastl, in IEEE International Conference on Image Processing.
Sharpness measure: towards automatic image enhancement, (2005)
57. N Zhang, A Vladar, M Postek, B Larrabee, in Proceedings of the Section of
Physical and Engineering Sciences of the American Statistical Society. A
kurtosis-based statistical measure for two-dimensional processes and its
application to image sharpness, (2003)
58. R Ferzli, LJ Karam, J Caviedes, in 1st International Workshop on Video
Processing and Quality Metrics for Consumer Electronics. A robust image
sharpness metric based on kurtosis measurement of wavelet coefficients,
(2005)
Content courtesy of Springer Nature, terms of use apply. Rights reserved.
59. R Ferzli, LJ Karam, in 3rd International Workshop on Video Processing and
Quality Metrics for Consumer Electronics. A no-reference objective
sharpness metric using Riemannian tensor, (2007)
60. H Wilson, A transducer function for threshold and suprathreshold human
vision. Biol. Cybern. 38(3), 171–178 (1980)
61. PG Barten, Contrast Sensitivity of the Human Eye and Its Effects on Image
Quality. (SPIE Optical Engineering Press, Bellingham, Washington, USA,
1999)
62. ITU-R BT.709, Parameter values for the HDTV standards for production
and international programme exchange. International
Telecommunication Union (2002)
63. P Hanhart, P Korshunov, T Ebrahimi, in 18th International Conference on
Digital Signal Processing. Benchmarking of quality metrics on ultra-high
definition video sequences, (2013)
64. ITU-T P.1401, Methods, metrics and procedures for statistical evaluation,
qualification and comparison of objective quality prediction models.
International Telecommunication Union (2012)
65. Q Huynh-Thu, M Ghanbari, Scope of validity of PSNR in image/video
quality assessment. Electron. Lett. 44(13), 800–801 (2008)