BELIF: BLIND QUALITY EVALUATOR OF LIGHT FIELD IMAGE WITH TENSOR
STRUCTURE VARIATION INDEX
Likun Shi, Shengyang Zhao, Zhibo Chen*
CAS Key Laboratory of Technology in Geo-spatial Information Processing and Application System
University of Science and Technology of China, Hefei 230027, China
ABSTRACT
With the development of immersive media, Light Field Image
(LFI) quality assessment is becoming more and more impor-
tant, which helps to better guide light field acquisition, pro-
cessing and application. However, almost all existing LFI
quality assessment schemes utilize the 2D or 3D quality as-
sessment methods while ignoring the intrinsic high dimen-
sional characteristics of LFI. Therefore, we adopt the tensor
theory to explore the LF 4D structure characteristics and pro-
pose the first Blind quality Evaluator of LIght Field image
(BELIF). We generate cyclopean images tensor from the orig-
inal LFI and then the features are extracted by the tucker de-
composition. Specifically, Tensor Spatial Characteristic Fea-
tures (TSCF) for spatial quality and Tensor Structure Vari-
ation Index (TSVI) for angular consistency are designed to
fully assess the LFI quality. Extensive experimental results
on the public LFI databases demonstrate that BELIF signifi-
cantly outperforms the existing image quality assessment al-
gorithms.
Index Terms— Light field, Image quality assessment,
Objective model, Tensor, Angular consistency.
1. INTRODUCTION
As a recent emerging media modality, Light Field Image
(LFI) has attracted widespread attention. Unlike traditional
2D and 3D technology, light field can describe the distri-
bution of light rays in free space. Specifically, LFIs can
simultaneously record the direction and intensity information
of radiance, which ideally provides abundant depth cues and
6 degree-of-freedom (DOF) viewing experience. Therefore,
monitoring the quality of LFI is critical to better guiding the
procedure of light field acquisition, processing and applica-
tion techniques.
The LFI is a 4D signal composed of multiple Sub-
Aperture Images (SAIs), which can be represented as a 2D
image array, as shown in Fig. 1. Here the u and v dimensions
are referred to as the angular dimensions, and the s and t
dimensions are referred to as the spatial dimensions. Due
This work was supported in part by NSFC under Grants 61571413
and 61632001. *Corresponding author (Email: chenzhibo@ustc.edu.cn).
Fig. 1. Illustration of the Light Field Image (LFI).
to the high dimensional characteristic of the LFI, its quality
is affected by three factors, spatio-angular resolution, spatial
quality and angular consistency [1]. Here, spatio-angular
resolution refers to the number of SAIs in a LFI and the
resolution of a SAI. Spatial quality indicates the quality of
SAIs and angular consistency measures the visual coherence
between SAIs. Since spatio-angular resolution is determined
by the acquisition devices, this paper focuses on the effects of
spatial quality and angular consistency. Currently, LFI qual-
ity evaluation mainly concentrates on subjective evaluation.
Since the real light field display is still under exploration [2],
some research works attempt to display LF with the available
facilities. For example, Paudyal et al. [3] and Viola et al. [4]
analyzed the effect of different compression methods on the
quality of LFIs based on 2D display and Adhikarla et al. [5]
and Shi et al. [6] considered the quality effects from LFI
compression, rendering and synthesis.
However, subjective evaluation is resource and time con-
suming, which is not applicable for real applications. There-
fore an effective objective quality assessment model is neces-
sary. In general, image quality assessment (IQA) algorithms
can be categorized as full-reference (FR), reduced-reference
(RR) and no-reference (NR) according to the availability of
original image information. The FR-IQA methods measure
the difference between the reference image and distorted im-
age. For example, structure similarity between reference and
distorted images is measured in SSIM [7], binocular rivalry
difference is calculated in [8], and morphological pyramid
Fig. 2. Flow diagram of the proposed BELIF model.
decomposition followed by mean squared errors on the
decomposed levels is utilized in MP-PSNR [9]. The RR-IQA
methods utilize partial infor-
mation of the reference image for quality assessment, such
as [10, 11, 12]. The NR-IQA method only measures the dis-
torted images, which is more applicable in most real scenar-
ios without having access to the original reference image. For
example, Natural Scene Statistics (NSS) theory is employed
in NIQE [13] and BRISQUE [14] and binocular fusion and
binocular rivalry are simulated in BSVQE [15].
However, none of the aforementioned schemes consider the
intrinsic high dimensional characteristics of LFI, especially
the distortion caused by angular consistency. Therefore, con-
sidering both the high dimensionality property of the LFI and
potential applicability of the proposed quality evaluator, we
propose the first NR-LFI quality assessment scheme based on
tensor theory. Mathematically, a LFI is a 4D tensor. The
tensor theory can effectively describe the characteristics and
distributions in the high-dimensional space. It has been suc-
cessfully applied to many fields of computer vision, such as
compression and recognition [16]. In this work, we propose
the first Blind quality Evaluator of LIght Field image (BELIF)
based on the tensor theory, in which a novel Tensor Structure
Variation Index (TSVI) is designed to measure angular con-
sistency degradation. Specifically, we first mimic the prop-
erties of the binocular vision to generate cyclopean images
and decompose the cyclopean image array along the angular
dimension to obtain the tensor decomposition components.
Secondly, we extract Tensor Spatial Characteristic Features
(TSCF) to measure the degradation of spatial quality. Thirdly,
TSVI is calculated by analyzing the structural similarity dis-
tribution between the first decomposition component and LFI
cyclopean images. Finally, a regression model is applied to
train and predict the quality of distorted LFIs. Extensive ex-
perimental results verify that BELIF is superior to existing
objective algorithms and achieves state-of-the-art performance.
The remainder of the paper is organized as follows. Sec-
tion 2 describes the proposed model in detail. In Section 3,
we illustrate the experimental results. Finally, Section 4 con-
cludes the paper.
Fig. 3. Tucker decomposition components. (a) First compo-
nent; (b) Second component; (c) Third component; (d) energy
distribution of all decomposition components.
2. PROPOSED BELIF MODEL
In this section, we describe the proposed model in detail. As
shown in Fig. 2, after generating cyclopean images based
on binocular vision theory, we utilize Tucker decomposition
to decompose it along the angular dimension. Then, TSCF
and TSVI are proposed to monitor spatial quality and angular
consistency respectively. Finally, regression model is used to
predict LFI quality.
2.1. LFI Cyclopean Images
As the LFI can provide binocular cues directly, we mimic
the response of HVS to estimate the cyclopean image that
is formed in the observer’s mind when a stereo image pair
is stereoscopically presented [17]. Since the cyclopean im-
age contains both left and right view information and takes
into account the influence of binocular visual characteristics,
it can reflect the quality of the received image pair [8]. In our
model, horizontally adjacent SAIs in LFI are treated as left
and right views, respectively. Then, we synthesise the cyclo-
pean image C_{u,v} according to [18],

    C_{u,v} = W_{u,v} I_{u,v} + W_{u,v+1} I_{u,v+1}    (1)

where I_{u,v} denotes the SAI at angular coordinate (u, v),
with u ∈ {1, ..., U} and v ∈ {1, ..., V}, and W_{u,v} and
W_{u,v+1} are the weights defined in [18]. Finally, a cyclopean
image array is obtained with angular resolution U × (V − 1).
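The synthesis of Eq. (1) can be sketched as follows. The exact weights of [18] are derived from binocular spatial activity, so the gradient-energy weighting below is only a simplified stand-in, and all function names are illustrative rather than from the paper:

```python
import numpy as np

def box_mean(x, k=7):
    """Local mean over a k x k window (simple box filter)."""
    pad = k // 2
    xp = np.pad(x, pad, mode="edge")
    out = np.zeros_like(x, dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            out += xp[dy:dy + x.shape[0], dx:dx + x.shape[1]]
    return out / (k * k)

def cyclopean(left, right, k=7):
    """Weighted fusion of horizontally adjacent SAIs (Eq. 1).

    Local gradient energy serves as a crude stand-in for the
    binocular-activity weights of [18]."""
    gl = np.gradient(left.astype(np.float64))
    gr = np.gradient(right.astype(np.float64))
    el = box_mean(gl[0] ** 2 + gl[1] ** 2, k)
    er = box_mean(gr[0] ** 2 + gr[1] ** 2, k)
    w = (el + 1e-12) / (el + er + 2e-12)  # W_{u,v}; W_{u,v+1} = 1 - w
    return w * left + (1.0 - w) * right

def cyclopean_array(sais):
    """sais: (U, V, H, W) SAI array -> (U, V-1, H, W) cyclopean array."""
    U, V = sais.shape[:2]
    return np.stack([[cyclopean(sais[u, v], sais[u, v + 1])
                      for v in range(V - 1)] for u in range(U)])
```

Note that fusing an image with itself returns the image unchanged, since the weights then sum to one with w = 0.5 everywhere.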
2.2. Tucker Decomposition
Images in the cyclopean image array have high texture sim-
ilarity, indicating that the cyclopean image array has a large
amount of redundant information in the angular dimension.
To alleviate this problem, we adopt tensor decomposition to
remove redundant information from the angular dimension.
Specifically, the Tucker decomposition is used to achieve di-
mensionality reduction [16]. It decomposes a tensor into a
core tensor multiplied by a matrix along each dimension. So
for the cyclopean image array C, we have

    C ≈ G ×_1 A^(1) ×_2 A^(2) ×_3 A^(3)    (2)

where the cyclopean image array is reshaped into a 3D tensor
along the angular dimension. The tensor G ∈ R^{R1×R2×R3}
is the core tensor, whose entries illustrate the level of inter-
action between the different components. A^(1) ∈ R^{K1×R1}
and A^(2) ∈ R^{K2×R2} are the factor matrices in the spatial
dimensions, and A^(3) ∈ R^{K3×R3} is the angular-dimension
factor matrix. In our model, we set K_n = R_n for n = 1, 2, 3.
The angular decomposition components can then be ob-
tained by multiplying the core tensor with the factor matrices
A^(1) and A^(2) along each spatial mode:

    T = G ×_1 A^(1) ×_2 A^(2)    (3)
Here we utilize the alternating least squares method pro-
vided by the tensor toolbox [19] to implement the Tucker de-
composition. Fig. 3(a)-(c) show the first three components
and the energy of each component is shown in Fig. 3(d). Ob-
viously, the texture information and energy mainly concen-
trate in the first component, which represents the basic texture
structure information of the LFI. It is also observed that the
second and third components contain the high frequency in-
formation with relatively higher energy. We find that the first
three components contain more than 80% of the energy, so we
treat them as the three most important dimensionality-reduced
images.
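The decomposition of Eqs. (2)-(3) can be sketched with a truncated higher-order SVD (HOSVD), a common closed-form variant of Tucker; the paper itself uses the ALS-based implementation of the tensor toolbox [19], so this numpy version is an illustrative stand-in only:

```python
import numpy as np

def unfold(T, mode):
    """Mode-n unfolding: move `mode` to the front and flatten the rest."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def mode_mult(T, M, mode):
    """Mode-n product T x_n M."""
    Tm = np.moveaxis(T, mode, 0)
    out = (M @ Tm.reshape(Tm.shape[0], -1)).reshape((M.shape[0],) + Tm.shape[1:])
    return np.moveaxis(out, 0, mode)

def tucker_hosvd(T, ranks):
    """Truncated HOSVD: factor matrices from mode-wise SVDs, then the core."""
    factors = [np.linalg.svd(unfold(T, n), full_matrices=False)[0][:, :r]
               for n, r in enumerate(ranks)]
    G = T
    for n, A in enumerate(factors):
        G = mode_mult(G, A.T, n)  # G = T x_1 A1' x_2 A2' x_3 A3'
    return G, factors

# Cyclopean array reshaped into a 3D tensor (spatial x spatial x angular):
C = np.random.default_rng(0).random((16, 16, 6))
G, (A1, A2, A3) = tucker_hosvd(C, C.shape)      # full ranks, exact
T_comp = mode_mult(mode_mult(G, A1, 0), A2, 1)  # Eq. (3): G x_1 A1 x_2 A2
```

With full ranks the multilinear product G ×_1 A^(1) ×_2 A^(2) ×_3 A^(3) recovers C exactly, and each slice T_comp[:, :, r] is one angular decomposition component.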
2.3. Features Extraction
2.3.1. Tensor Spatial Characteristic Features (TSCF)
In practical applications, operations such as compression in-
evitably lead to deterioration of the LFI spatial quality. To
measure changes in spatial quality, we extract the TSCF fea-
ture. First, we analyze the global statistics of the first com-
ponent and extract the NSS feature f_nss. Specifically, mean
subtracted contrast normalized (MSCN) coefficients are ob-
tained and the statistical distribution is fitted using the zero-
mean asymmetric generalized gaussian distribution (AGGD)
model [14]. In addition, it is observed that distortion will
change the local information in the first three components.
Then both spatial and spectral entropy are computed on 8×8
blocks without overlapping [20]. Here spectral entropy is cal-
culated after excluding DC coefficients in the DCT domain.
Then the skewness and mean of the entropy values are ex-
tracted as f_local. Further, we compute the mean, entropy,
kurtosis and skewness of the energy distribution as supple-
mentary features f_energy. Finally, we combine all these
features to obtain the TSCF F_TSCF,

    F_TSCF = {f_nss, f_local, f_energy}    (4)
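A minimal sketch of the TSCF ingredients (MSCN coefficients and blockwise spatial entropy) follows; the AGGD fitting of [14] and the DCT-domain spectral entropy of [20] are omitted for brevity, the box filter replaces the Gaussian window of [14], and the bin count is an illustrative choice rather than a value from the paper:

```python
import numpy as np

def box_mean(x, k=7):
    """Local mean over a k x k window (simple box filter)."""
    pad = k // 2
    xp = np.pad(x, pad, mode="edge")
    out = np.zeros_like(x, dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            out += xp[dy:dy + x.shape[0], dx:dx + x.shape[1]]
    return out / (k * k)

def mscn(img, k=7, c=1.0):
    """Mean-subtracted contrast-normalized coefficients."""
    img = img.astype(np.float64)
    mu = box_mean(img, k)
    sigma = np.sqrt(np.maximum(box_mean(img ** 2, k) - mu ** 2, 0.0))
    return (img - mu) / (sigma + c)

def block_entropies(img, b=8, bins=16):
    """Spatial entropy of non-overlapping b x b blocks."""
    ents = []
    for i in range(0, img.shape[0] - b + 1, b):
        for j in range(0, img.shape[1] - b + 1, b):
            hist, _ = np.histogram(img[i:i + b, j:j + b], bins=bins)
            p = hist / hist.sum()
            p = p[p > 0]
            ents.append(float(-(p * np.log2(p)).sum()))
    return np.array(ents)

def f_local(img):
    """Mean and skewness of the blockwise entropy distribution."""
    e = block_entropies(img)
    mu, sd = e.mean(), e.std()
    skew = ((e - mu) ** 3).mean() / (sd ** 3 + 1e-12)
    return np.array([mu, skew])
```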
2.3.2. Tensor Structure Variation Index (TSVI)
In addition to spatial quality, angular consistency also affects
the quality of the LFIs. Usually, angular reconstruction oper-
ations, such as interpolation, will break angular consistency.
To measure the degradation of angular consistency, we pro-
pose the Tensor Structure Variation Index (TSVI). Specifi-
cally, we measure the structural similarity S between each
image of the cyclopean image array and the first decomposi-
tion component,
    S_{u,v} = F_ss(C_{u,v}, T_1)    (5)

where T_1 is the first decomposition component and (u, v),
with u ∈ {1, ..., U} and v ∈ {1, ..., V − 1}, is the angular
coordinate of the cyclopean image array. F_ss is the function
that calculates structural similarity; we utilize the popular
SSIM [7] in this paper.
The structural similarity distributions of two LFIs selected
from Win5-LID [6] are shown in Fig. 4. When the angular consistency is not de-
stroyed, the distribution of structural similarity is smooth, as
shown in Fig. 4(a). However, when the angular consistency is
disrupted by interpolation distortion, the distribution of struc-
tural similarity changes significantly. Fig. 4(b)-(d) show the
structural similarity distribution of the LFIs introducing the
nearest neighbor interpolation distortion. As the angular con-
sistency deteriorates, the degree of variation in the structural
similarity distribution of the LFI gradually increases.
Finally, we treat the structural similarity distribution as a
matrix and extract the mean, standard deviation, and singular
values as the feature F_TSVI,

    F_TSVI = {avg(S), std(S), svd(S)}    (6)
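TSVI can then be sketched as below; a global-statistics SSIM is used as a lightweight stand-in for the windowed SSIM [7] the paper employs, and the function names are illustrative:

```python
import numpy as np

def global_ssim(x, y, L=1.0):
    """Single-number SSIM from global image statistics (a simplified
    stand-in for the windowed SSIM of [7])."""
    c1, c2 = (0.01 * L) ** 2, (0.03 * L) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

def tsvi_features(cyc, t1):
    """cyc: (U, V-1, H, W) cyclopean array; t1: first Tucker component.

    Returns Eq. (6): mean, standard deviation and singular values
    of the structural-similarity map S of Eq. (5)."""
    U, Vm1 = cyc.shape[:2]
    S = np.array([[global_ssim(cyc[u, v], t1) for v in range(Vm1)]
                  for u in range(U)])
    sv = np.linalg.svd(S, compute_uv=False)
    return np.concatenate(([S.mean(), S.std()], sv))
```

When every cyclopean view equals the first component, S is an all-ones map: its mean is 1 and its standard deviation is 0, matching the smooth distributions of undistorted LFIs in Fig. 4(a).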
3. EXPERIMENT RESULTS
3.1. LFI Databases
In our experiment, we conduct comparison experiments on
Win5-LID [6] and MPI-LFA [5] databases. The Win5-LID
database comprises 220 quality annotated LFIs based on 10
reference LFIs that are subject to 6 different types of distor-
tions at different distortion levels. Distortion types include
JPEG2000, HEVC, linear interpolation (LN), nearest neigh-
bor interpolation (NN), and two CNN models. The associated
overall mean opinion score (MOS) value is also provided for
each LFI.
The MPI-LFA contains 336 quality annotated LFIs. 14
reference LFIs are distorted by 3D extension of HEVC, LIN-
EAR, NN, optical flow estimation (OPT), quantized depth
maps (DQ) and Gaussian blur (GAUSS) with six degra-
dation levels for each distortion type. For quality assess-
ment, the authors report just-objectionable-differences
(JODs), which are similar to difference mean opinion scores
(DMOS) [21]; a lower value indicates worse quality.
Fig. 4. Distribution of structural similarity with different nearest neighbor distortion levels from Win5-LID database [6]. Above
is the ’Bike’ LFI and below is the ’Dish’ LFI. (a) is the original image, and the distortion levels are gradually increased from
(b) to (d).
Table 1. Performance Comparison.
Win5-LID MPI-LFA
Metrics SROCC LCC RMSE SROCC LCC RMSE
PSNR 0.6026 0.6189 0.8031 0.8078 0.7830 1.2697
SSIM [7] 0.7346 0.7596 0.6650 0.7027 0.7123 1.4327
VIF [22] 0.6665 0.7032 0.7270 0.7843 0.7861 1.2618
FSIM [23] 0.8233 0.8318 0.5675 0.7776 0.7679 1.3075
MSSIM [24] 0.8266 0.8388 0.5566 0.7675 0.7518 1.3461
IWSSIM [25] 0.8352 0.8435 0.5492 0.8124 0.7966 1.2340
IFC [26] 0.5028 0.5393 0.8611 0.7573 0.7445 1.3629
NIQE [13] 0.2086 0.2645 0.9861 0.0665 0.1950 2.0022
BRISQUE [14] 0.6687 0.7510 0.5619 0.6724 0.7597 1.1317
NFERM [27] 0.6328 0.7213 0.5767 0.6454 0.7451 1.1036
Chen [8] 0.5269 0.6070 0.8126 0.7668 0.7585 1.3303
SINQ [18] 0.8029 0.8362 0.5124 0.8524 0.8612 0.9939
BSVQE [15] 0.8179 0.8425 0.4801 0.8570 0.8751 0.9561
3DSwIM [28] 0.4320 0.5262 0.8695 0.5565 0.5489 1.7063
MW-PSNR Reduc [29] 0.5326 0.4766 0.8989 0.7217 0.6757 1.5048
MW-PSNR Full [29] 0.5147 0.4758 0.8993 0.7232 0.6770 1.5023
MP-PSNR Reduc [30] 0.5374 0.4765 0.8989 0.7210 0.6747 1.5067
MP-PSNR Full [9] 0.5335 0.4766 0.8989 0.7203 0.6730 1.5099
APT [31] 0.3058 0.4087 0.9332 0.0710 0.0031 2.0413
BELIF 0.8719 0.8910 0.4294 0.8854 0.9096 0.7877
3.2. Comparison with the Existing Metrics
Currently there is no LFI objective quality evaluation model.
To demonstrate the effectiveness of BELIF, we compare
it with several FR and NR IQA metrics, including seven 2D-FR
metrics [7, 22, 23, 24, 25, 26], three 2D-NR metrics [13, 14,
27], one 3D-FR metric [8], two 3D-NR metrics [15, 18], five
multi-view FR metrics [28, 29, 9, 30] and one multi-view NR
metric [31]. Correlation between MOS and predicted results
is computed using SROCC, PCC (denoted LCC in Table 1),
and RMSE [8]. The SROCC measures monotonicity, while
the PCC evaluates the linear relationship between the pre-
dicted score and MOS. The RMSE measures prediction ac-
curacy. SROCC and PCC values close to 1 represent high
positive correlation, and a lower RMSE value indicates better
performance. Then the support vector regression (SVR) with
a radial basis function kernel [32] is chosen as the regression
model. We randomly select 80% of the database as the train-
ing set and the remaining 20% as the test set.
Table 2. Ablation Study Results.
Win5-LID MPI-LFA
SROCC LCC RMSE SROCC LCC RMSE
BELIF w/o F_TSVI 0.7902 0.8473 0.5051 0.8678 0.8916 0.8188
BELIF 0.8719 0.8910 0.4294 0.8854 0.9096 0.7877
This random split is repeated 1000 times, and the median of
the correlation coefficients is reported as the final result.
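The evaluation criteria above can be sketched in a few lines of numpy (rank ties are broken arbitrarily here, unlike the tie-averaged ranks of a full SROCC implementation, and the SVR regressor [32] is not reproduced):

```python
import numpy as np

def pcc(a, b):
    """Pearson linear correlation coefficient (LCC)."""
    a = a - a.mean()
    b = b - b.mean()
    return (a * b).sum() / np.sqrt((a * a).sum() * (b * b).sum())

def srocc(a, b):
    """Spearman rank-order correlation: PCC computed on the ranks."""
    def ranks(x):
        r = np.empty(len(x))
        r[np.argsort(x)] = np.arange(1, len(x) + 1)
        return r
    return pcc(ranks(a), ranks(b))

def rmse(a, b):
    """Root-mean-square prediction error."""
    return float(np.sqrt(np.mean((a - b) ** 2)))
```

A prediction that is a strictly increasing function of MOS yields SROCC = 1 even when it is nonlinear, which is why SROCC (monotonicity) and PCC (linearity) are reported together.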
The results of all methods on Win5-LID and MPI-LFA
are shown in Table 1. Obviously, BELIF outperforms all the
existing algorithms on both databases. The reason is that ex-
isting 2D and 3D algorithms primarily measure the degrada-
tion of spatial quality without considering the angular consis-
tency in the 4D LFI. Although multi-view IQA methods can
measure the distortion caused by angular reconstruction, they
don’t take into account the compression distortion or similar
distortions. Therefore we can conclude that BELIF can effec-
tively capture the degradation in spatial quality and angular
consistency.
Additionally, we train BELIF on the Win5-LID database
and test it on the same distortion types in the MPI-LFA
database. The SROCC reaches 0.8309, which demonstrates
that BELIF generalizes well across databases.
3.3. Ablation Study
To demonstrate the validity of the proposed TSVI, we perform
an ablation study; the results are shown in Table 2. Obvi-
ously, TSVI can significantly improve the performance of the
model.
4. CONCLUSION
In this paper, we have presented the first Blind quality Evalu-
ator of LIght Field image (BELIF), which is based on tensor
decomposition theory. BELIF effectively assesses distortions
in both spatial quality and angular consistency, and the results
show that it outperforms existing metrics. In the future, we
will extend this framework to the quality evaluation of light
field video signals.
5. REFERENCES
[1] Gaochang Wu, Belen Masia, Adrian Jarabo, Yuchen Zhang, Liangyong
Wang, Qionghai Dai, Tianyou Chai, and Yebin Liu, “Light field image
processing: An overview,” IEEE Journal of Selected Topics in Signal
Processing, vol. 11, no. 7, pp. 926–954, 2017.
[2] Jon Karafin, “On the support of light field and holographic video dis-
play technologies,” ISO/IEC JTC 1/SC 29/WG 11 Macau, CN, 2017.
[3] Pradip Paudyal, Federica Battisti, Mårten Sjöström, Roger Olsson, and
Marco Carli, “Towards the perceptual quality evaluation of compressed
light field images,” IEEE Transactions on Broadcasting, vol. 63, no. 3,
pp. 507–522, 2017.
[4] Irene Viola, Martin Řeřábek, and Touradj Ebrahimi, “Comparison and
evaluation of light field image coding approaches,” IEEE Journal of Se-
lected Topics in Signal Processing, vol. 11, no. 7, pp. 1092–1106, 2017.
[5] Vamsi Kiran Adhikarla, Marek Vinkler, Denis Sumin, Rafal K Man-
tiuk, Karol Myszkowski, Hans-Peter Seidel, and Piotr Didyk, “Towards
a quality metric for dense light fields,” in Proceedings of the IEEE Con-
ference on Computer Vision and Pattern Recognition, 2017, pp. 58–67.
[6] Likun Shi, Shengyang Zhao, Wei Zhou, and Zhibo Chen, “Perceptual
evaluation of light field image,” in International Conference on Image
Processing (ICIP). IEEE, 2018.
[7] Zhou Wang, Alan C Bovik, Hamid R Sheikh, and Eero P Simoncelli,
“Image quality assessment: from error visibility to structural similar-
ity,” IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600–
612, 2004.
[8] Ming-Jun Chen, Che-Chun Su, Do-Kyoung Kwon, Lawrence K Cor-
mack, and Alan C Bovik, “Full-reference quality assessment of stere-
opairs accounting for rivalry,” Signal Processing: Image Communica-
tion, vol. 28, no. 9, pp. 1143–1155, 2013.
[9] Dragana Sandić-Stanković, Dragan Kukolj, and Patrick Le Callet,
“DIBR synthesized image quality assessment based on morphological
pyramids,” in 3DTV-CON: Immersive and Interactive 3D Media Experi-
ence over Networks, Lisbon, July 2015.
[10] Zhou Wang and Eero P Simoncelli, “Reduced-reference image qual-
ity assessment using a wavelet-domain natural image statistic model,”
in Human Vision and Electronic Imaging X. International Society for
Optics and Photonics, 2005, vol. 5666, pp. 149–160.
[11] Zhou Wang and Alan C Bovik, “Reduced-and no-reference image qual-
ity assessment,” IEEE Signal Processing Magazine, vol. 28, no. 6, pp.
29–40, 2011.
[12] Abdul Rehman and Zhou Wang, “Reduced-reference image quality
assessment by structural similarity estimation,” IEEE Transactions on
Image Processing, vol. 21, no. 8, pp. 3378–3389, 2012.
[13] Lin Zhang, Lei Zhang, and Alan C Bovik, “A feature-enriched com-
pletely blind image quality evaluator,” IEEE Transactions on Image
Processing, vol. 24, no. 8, pp. 2579–2591, 2015.
[14] Anish Mittal, Anush Krishna Moorthy, and Alan Conrad Bovik, “No-
reference image quality assessment in the spatial domain,” IEEE Trans-
actions on Image Processing, vol. 21, no. 12, pp. 4695–4708, 2012.
[15] Zhibo Chen, Wei Zhou, and Weiping Li, “Blind stereoscopic video
quality assessment: From depth perception to overall experience,”
IEEE Transactions on Image Processing, vol. 27, no. 2, pp. 721–734,
2018.
[16] Tamara G Kolda and Brett W Bader, “Tensor decompositions and ap-
plications,” SIAM review, vol. 51, no. 3, pp. 455–500, 2009.
[17] Bela Julesz, Foundations of Cyclopean Perception, University of
Chicago Press, 1971.
[18] Lixiong Liu, Bao Liu, Che-Chun Su, Hua Huang, and Alan Conrad
Bovik, “Binocular spatial activity and reverse saliency driven no-
reference stereopair quality assessment,” Signal Processing: Image
Communication, vol. 58, pp. 287–299, 2017.
[19] Brett W. Bader, Tamara G. Kolda, et al., “Matlab tensor toolbox version
2.6,” Available online, February 2015.
[20] Lixiong Liu, Bao Liu, Hua Huang, and Alan Conrad Bovik, “No-
reference image quality assessment based on spatial and spectral en-
tropies,” Signal Processing: Image Communication, vol. 29, no. 8, pp.
856–863, 2014.
[21] Recommendation ITU-T P.913, “Methods for the subjective assessment
of video quality, audio quality and audiovisual quality of internet video
and distribution quality television in any environment,” 2016.
[22] Hamid R Sheikh and Alan C Bovik, “Image information and visual
quality,” IEEE Transactions on Image Processing, vol. 15, no. 2, pp.
430–444, 2006.
[23] Lin Zhang, Lei Zhang, Xuanqin Mou, and David Zhang, “Fsim: A fea-
ture similarity index for image quality assessment,” IEEE transactions
on Image Processing, vol. 20, no. 8, pp. 2378–2386, 2011.
[24] Zhou Wang, Eero P Simoncelli, and Alan C Bovik, “Multiscale struc-
tural similarity for image quality assessment,” in Signals, Systems and
Computers, 2004. Conference Record of the Thirty-Seventh Asilomar
Conference on. IEEE, 2003, vol. 2, pp. 1398–1402.
[25] Zhou Wang and Qiang Li, “Information content weighting for percep-
tual image quality assessment,” IEEE Transactions on Image Process-
ing, vol. 20, no. 5, pp. 1185–1198, 2011.
[26] Hamid R Sheikh, Alan C Bovik, and Gustavo De Veciana, “An in-
formation fidelity criterion for image quality assessment using natural
scene statistics,” IEEE Transactions on image processing, vol. 14, no.
12, pp. 2117–2128, 2005.
[27] Deepti Ghadiyaram and Alan C Bovik, “Perceptual quality prediction
on authentically distorted images using a bag of features approach,”
Journal of vision, vol. 17, no. 1, pp. 32–32, 2017.
[28] Federica Battisti, Emilie Bosc, Marco Carli, Patrick Le Callet, and Si-
mone Perugia, “Objective image quality assessment of 3d synthesized
views,” Signal Processing: Image Communication, vol. 30, pp. 78–88,
2015.
[29] Dragana Sandić-Stanković, Dragan Kukolj, and Patrick Le Callet,
“DIBR synthesized image quality assessment based on morphological
wavelets,” in Quality of Multimedia Experience (QoMEX), 2015 Sev-
enth International Workshop on. IEEE, 2015, pp. 1–6.
[30] Dragana Sandić-Stanković, Dragan Kukolj, and Patrick Le Callet,
“Multi-scale synthesized view assessment based on morphological
pyramids,” Journal of Electrical Engineering, vol. 67, no. 1, pp. 3–11,
2016.
[31] Ke Gu, Vinit Jakhetiya, Jun-Fei Qiao, Xiaoli Li, Weisi Lin, and Daniel
Thalmann, “Model-based referenceless quality metric of 3d synthe-
sized images using local image description,” IEEE Transactions on
Image Processing, vol. 27, no. 1, pp. 394–405, 2018.
[32] Chih-Chung Chang and Chih-Jen Lin, “Libsvm: a library for support
vector machines,” ACM transactions on intelligent systems and tech-
nology (TIST), vol. 2, no. 3, pp. 27, 2011.
... LFIs distinct from both natural images and SCIs, are captured by light field cameras that record comprehensive data on the angle and position of light rays. Existing representative datasets for light field image quality assessment include MPI-LFA (Kiran Adhikarla et al. 2017), VALID (Viola and Ebrahimi 2018), and Win5-LID (Shi, Zhao, and Chen 2019). LFIs contain rich scene information, making it more challenging to assess their quality. ...
... Therefore, relatively few NR-IQA methods have been proposed for LFIs. One pioneering approach is BELIF (Shi, Zhao, and Chen 2019), which uses tensor theory to process LFIs by transforming a raw LFI into a circular image tensor and applying Tucker decomposition for feature extraction. The extracted features, namely tensor spatial features and tensor structure change indices, assess spatial quality and angular consistency, respectively. ...
Article
Full-text available
No‐reference image quality assessment (NR‐IQA) has garnered significant attention due to its critical role in various image processing applications. This survey provides a comprehensive and systematic review of NR‐IQA methods, datasets, and challenges, offering new perspectives and insights for the field. Specifically, we propose a novel taxonomy for NR‐IQA methods based on distortion scenarios and design principles, which distinguishes this work from previous surveys. Representative methods within each category are thoroughly examined, with a focus on their strengths, limitations, and performance characteristics. Additionally, we review 20 widely used NR‐IQA datasets that serve as benchmarks for evaluating these methods, providing detailed information on the number of images, distortion types, and distortion levels for each dataset. Furthermore, we identify and discuss key challenges currently faced by NR‐IQA methods, such as handling diverse and complex distortions, ensuring generalisation across datasets and devices, and achieving real‐time performance. We also suggest potential future research directions to address these issues. In summary, this survey offers a comprehensive and systematic examination of NR‐IQA methods, datasets, and challenges, offering valuable insights and guidance for researchers and practitioners working in the NR‐IQA domain.
... Due to the unique structural features of LFI, however, current 2D IQA techniques are unable to accurately predict the user-perceived quality. Although there are some recently proposed NR-LFIQA methods such as BELIEF [6] and NR-LFQA [7], their performance is limited as they are fundamentally designed based on 2D IQA methodologies such as naturalness statistics and structural similarity (SSIM) [8]. As a result, they gain sub-optimal performance and predict inaccurately on certain distortion types such as EPICNN [9] and USCD [10] (See Section IV for details). ...
... Ideally, a comprehensive LFI-specific algorithm should consider both spatial quality and angular consistency. For example, recent work on NR-LFIQA includes BELIEF [6], Tensor-NLFQ [26], LGF-LFC [27], and NR-LFQA [7]. Particularly, NR-LFQA proposed gradient direction distribution (GDD) to measure the deterioration of angular consistency. ...
Preprint
Full-text available
In multimedia broadcasting, no-reference image quality assessment (NR-IQA) is used to indicate the user-perceived quality of experience (QoE) and to support intelligent data transmission while optimizing user experience. This paper proposes an improved no-reference light field image quality assessment (NR-LFIQA) metric for future immersive media broadcasting services. First, we extend the concept of depthwise separable convolution (DSC) to the spatial domain of light field image (LFI) and introduce "light field depthwise separable convolution (LF-DSC)", which can extract the LFI's spatial features efficiently. Second, we further theoretically extend the LF-DSC to the angular space of LFI and introduce the novel concept of "light field anglewise separable convolution (LF-ASC)", which is capable of extracting both the spatial and angular features for comprehensive quality assessment with low complexity. Third, we define the spatial and angular feature estimations as auxiliary tasks in aiding the primary NR-LFIQA task by providing spatial and angular quality features as hints. To the best of our knowledge, this work is the first exploration of deep auxiliary learning with spatial-angular hints on NR-LFIQA. Experiments were conducted in mainstream LFI datasets such as Win5-LID and SMART with comparisons to the mainstream full reference IQA metrics as well as the state-of-the-art NR-LFIQA methods. The experimental results show that the proposed metric yields overall 42.86% and 45.95% smaller prediction errors than the second-best benchmarking metric in Win5-LID and SMART, respectively. In some challenging cases with particular distortion types, the proposed metric can reduce the errors significantly by more than 60%.
... In the field of no-reference light-field image-quality assessment, since light field images can be considered as a low-rank tensor, Shi et al. [11] utilized the tensor structure of light field image arrays to study their angular and spatial characteristics. They designed a Blind Quality Evaluator Of Light Field Image (BELIF) that generates the first component of the hyper-volume image tensor through Tucker decomposition, and evaluates the spatial quality and angular consistency of distorted LFIs based on the features and structural changes of the tensor in space. ...
... To verify the comprehensive performance of this model, comparisons and analyses were conducted with four existing full-reference two-dimensional image-quality assessment models, four no-reference light-field image-quality assessment models, and two fullreference light-field image-quality assessment models on the same databases. These models include IWSSIM [32], SSIM [33], FSIM [34], MS-SSIM [35], BELIF [11], NR-LFQA [13], Tensor-NLFQ [14], VBLFI [3], MDFM [4], and MPFS [36]. In the table below, the best result for each evaluation metric is presented in bold with underline, the second best result is shown in bold, and "-" indicates missing performance metrics in the paper. ...
Article
Full-text available
Light field images can record multiple information about the light rays in a scene and provide multiple views from a single image, offering a new data source for 3D reconstruction. However, ensuring the quality of light field images themselves is challenging, and distorted image inputs may lead to poor reconstruction results. Accurate light field image quality assessment can pre-judge the quality of light field images used as input for 3D reconstruction, providing a reference for the reconstruction results before the reconstruction work, significantly improving the efficiency of 3D reconstruction based on light field images. In this paper, we propose an Adaptive Vision Transformer-based light-field image-quality assessment model (AViT-LFIQA). The model adopts a multi-view sub-aperture image sequence input method, greatly reducing the number of input images while retaining as much information as possible from the original light field image, alleviating the training pressure on the neural network. Furthermore, we design an adaptive learnable attention layer based on ViT, which addresses the lack of inductive bias in ViT by using adaptive diagonal masking and a learnable temperature coefficient strategy, making the model more suitable for training on small datasets of light field images. Experimental results demonstrate that the proposed model is effective for various types of distortions and shows superior performance in light-field image-quality assessment.
Article
Due to the distortions occurring at various stages from acquisition to visualization, light field image quality assessment (LFIQA) is crucial for guiding the processing of light field images (LFIs). In this letter, we propose a new blind LFIQA metric via frequency domain analysis and auxiliary learning, termed as FABLFQA. First, spatial-angular patches are extracted from LFIs and further processed through discrete cosine transform to obtain light field frequency maps. Subsequently, a concise and efficient frequency-aware deep learning network is designed to extract frequency features, including the frequency descriptor, 3D ConvBlock, and frequency transformer. Finally, a distortion type discrimination auxiliary task is employed to facilitate the learning of the main quality assessment task. Experimental results on three representative LFI datasets show that the proposed metric outperforms the state-of-the-art metrics. The code of the proposed metric will be publicly available at https://github.com/oldblackfish/FABLFQA .
Article
Light field imaging captures both the intensity and directional information of light rays, providing users with more immersive visual experience. However, during the processes of imaging, processing, coding and reconstruction, light field images (LFIs) may encounter various distortions that degrade their visual quality. Compared to two-dimensional image quality assessment, light field image quality assessment (LFIQA) needs to consider not only the image quality in the spatial domain but also the quality degradation in the angular domain. To effectively model the factors related to visual perception and LFI quality, this paper proposes a multi-scale attention feature fusion based blind LFIQA metric, named MAFBLiF. The proposed metric consists of the following parts: MLI-Patch generation, spatial-angular feature separation module, spatial-angular feature extraction backbone network, pyramid feature alignment module and patch attention module. These modules are specifically designed to extract spatial and angular information of LFIs, and capture multi-level information and regions of interest. Furthermore, a pooling scheme guided by the LFI’s gradient information and saliency is proposed, which integrates the quality of all MLI-patches into the overall quality of the input LFI. Finally, to demonstrate the effectiveness of the proposed metric, extensive experiments are conducted on three representative LFI quality evaluation datasets. The experimental results show that the proposed metric outperforms other state-of-the-art image quality assessment metrics. The code will be publicly available at https://github.com/oldblackfish/MAFBLiF .
Article
Full-text available
Stereoscopic video quality assessment (SVQA) is a challenging problem. It has not been well investigated how to measure depth perception quality independently under different distortion categories and degrees, especially how to exploit depth perception to assist the overall quality assessment of 3D videos. In this paper, we propose a new Depth Perception Quality Metric (DPQM) and verify that it outperforms existing metrics on our published 3D-HEVC video database. Further, we validate its effectiveness by applying the crucial part of the DPQM to a novel Blind Stereoscopic Video Quality Evaluator (BSVQE) for overall 3D video quality assessment. In the DPQM, we introduce the feature of Auto-Regressive prediction based Disparity Entropy (ARDE) measurement and the feature of energy weighted video content measurement, which are inspired by the free-energy principle and the binocular vision mechanism. In the BSVQE, the binocular summation and difference operations are integrated together with the Fusion Natural Scene Statistic (FNSS) measurement and the ARDE measurement to reveal the key influence from texture and disparity. Experimental results on three stereoscopic video databases demonstrate that our method outperforms state-of-the-art SVQA algorithms for both symmetrically and asymmetrically distorted stereoscopic video pairs of various distortion types.
Article
Full-text available
Light field imaging has emerged as a technology allowing to capture richer visual information from our world. As opposed to traditional photography, which captures a 2D projection of the light in the scene integrating the angular domain, light fields collect radiance from rays in all directions, demultiplexing the angular information lost in conventional photography. On the one hand, this higher-dimensional representation of visual data offers powerful capabilities for scene understanding, and substantially improves the performance of traditional computer vision problems such as depth sensing, post-capture refocusing, segmentation, video stabilization, material classification, etc. On the other hand, the high-dimensionality of light fields also brings up new challenges in terms of data capture, data compression, content editing and display. Taking these two elements together, research in light field image processing has become increasingly popular in the computer vision, computer graphics and signal processing communities. In this article, we present a comprehensive overview and discussion of research in this field over the past 20 years. We focus on all aspects of light field image processing, including the basic light field representation and theory, acquisition, super-resolution, depth estimation, compression, editing, processing algorithms for light field display, and computer vision applications of light field data.
Article
True light field displays, also known as holographic displays, converge bundles of light that vary depending on horizontal and vertical viewing angle and location, no different than real objects in the world. They offer an alternative to a number of other technologies that are often marketed as holographic but fail to achieve the realism and visual comfort of a true light field display. In order to help demystify the requirements for real light field displays, competing technologies and content formats are reviewed, and several tests are provided for determining whether a display or dataset is truly holographic. Pixel density and data bandwidth requirements for a light field display are given as a function of its design parameters.
Article
The recent advances in light field imaging, supported among others by the introduction of commercially available cameras, e.g. Lytro or Raytrix, are changing the ways in which visual content is captured and processed. Efficient storage and delivery systems for light field images must rely on compression algorithms. Several methods to compress light field images have been proposed recently. However, in-depth evaluations of compression algorithms have rarely been reported. This paper aims at evaluating the perceived visual quality of light field images and at comparing the performance of a few state-of-the-art algorithms for light field image compression. First, a processing chain for light field image compression and decompression is defined for two typical use cases, professional and consumer. Then, five light field compression algorithms are compared by means of a set of objective and subjective quality assessments. An interactive methodology recently introduced by the authors, as well as a passive methodology, is used to perform these evaluations. The results provide a useful benchmark for future development of compression solutions for light field images.
Article
We develop a new model for no-reference 3D stereopair quality assessment that considers the impact of binocular fusion, rivalry, suppression, and a reverse saliency effect on the perception of distortion. The resulting framework, dubbed the S3D INtegrated Quality (SINQ) Predictor, first fuses the left and right views of a stereopair into a single synthesized cyclopean image using a novel modification of an existing binocular perceptual model. Specifically, the left and right views of a stereopair are fused using a measure of “cyclopean” spatial activity. A simple product estimate is also calculated as the correlation between left and right disparity-corrected corresponding binocular pixels. Univariate and bivariate statistical features are extracted from the four available image sources: the left view, the right view, the synthesized “cyclopean” spatial activity image, and the binocular product image. Based on recent evidence regarding the placement of 3D fixation by subjects viewing stereoscopic 3D (S3D) content, we also deploy a reverse saliency weighting on the normalized “cyclopean” spatial activity image. Both one- and two-stage frameworks are then used to map the feature vectors to predicted quality scores. SINQ is thoroughly evaluated on the LIVE 3D image quality database (Phase I and Phase II). The experimental results show that SINQ delivers better performance than state of the art 2D and 3D quality assessment methods on six public databases, especially on asymmetric distortions. A software release of SINQ has been made available online: http://live.ece.utexas.edu/research/Quality/SINQ_release.zip.
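The cyclopean fusion described in the SINQ abstract above — weighting the left and right views by a measure of local spatial activity before combining them — can be sketched roughly as follows. The gradient-magnitude activity proxy and the test images are assumptions for illustration, not the authors' exact binocular model:

```python
import numpy as np

def local_activity(view, eps=1e-6):
    # Local spatial activity: gradient magnitude as a simple proxy
    gy, gx = np.gradient(view.astype(float))
    return np.hypot(gx, gy) + eps

def cyclopean(left, right):
    # Weight each view by its normalized local activity and fuse
    al, ar = local_activity(left), local_activity(right)
    wl = al / (al + ar)
    return wl * left + (1 - wl) * right

left = np.tile(np.linspace(0, 1, 16), (16, 1))  # hypothetical left view
right = np.roll(left, 1, axis=1)                # right view, 1-px shift
fused = cyclopean(left, right)
print(fused.shape)
```

Quality features would then be extracted from the fused image rather than from either view alone, which is what lets such models handle asymmetric distortions.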
Article
New challenges have been brought out along with the emergence of 3D-related technologies such as virtual reality (VR), augmented reality (AR), and mixed reality (MR). Free viewpoint video (FVV), owing to its flexible selection of direction and viewpoint and its applications in remote surveillance, remote education, etc., has been perceived as the development direction of next-generation video technologies and has drawn a wide range of researchers' attention. Since FVV images are synthesized via a depth image-based rendering (DIBR) procedure in the "blind" environment (without reference images), a reliable real-time blind quality evaluation and monitoring system is urgently required. But existing assessment metrics do not render human judgments faithfully, mainly because geometric distortions are generated by DIBR. To this end, this paper proposes a novel referenceless quality metric of DIBR-synthesized images using the autoregression (AR)-based local image description. It was found that, after the AR prediction, the reconstruction error between a DIBR-synthesized image and its AR-predicted image can accurately capture the geometry distortion. The visual saliency is then leveraged to improve the proposed blind quality metric by a sizable margin. Experiments validate the superiority of our no-reference quality method as compared with prevailing full-, reduced- and no-reference models.
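The AR-based local description in the abstract above rests on a simple idea: predict each pixel linearly from its neighbours and treat the prediction residual as a distortion map. A minimal sketch with one global least-squares model over the 8-neighbourhood (the cited work fits local models; this simplification is ours):

```python
import numpy as np

def ar_residual(img):
    # Predict each interior pixel from its 8 neighbours with a single
    # least-squares AR model; the residual highlights structures
    # (e.g. geometric distortions) the linear model cannot explain.
    img = img.astype(float)
    h, w = img.shape
    shifts = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
              if (dy, dx) != (0, 0)]
    # Each column holds one shifted copy of the interior region
    cols = [img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx].ravel()
            for dy, dx in shifts]
    A = np.stack(cols, axis=1)
    b = img[1:h - 1, 1:w - 1].ravel()
    coef, *_ = np.linalg.lstsq(A, b, rcond=None)
    pred = (A @ coef).reshape(h - 2, w - 2)
    return np.abs(img[1:h - 1, 1:w - 1] - pred)

rng = np.random.default_rng(1)
res = ar_residual(rng.random((32, 32)))  # hypothetical test image
print(res.shape)
```

On smooth natural content the residual is small, while DIBR-style geometric artifacts break the local linear predictability and stand out in the map.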
Article
Evaluation of the perceived quality of light field images, as well as testing new processing tools, or even assessing the effectiveness of objective quality metrics, relies on the availability of test datasets and corresponding quality ratings. This paper presents the SMART light field image quality dataset. The dataset consists of source images (raw data without optical corrections), compressed images, and annotated subjective quality scores. Furthermore, an analysis of the perceptual effects of compression on the SMART dataset is presented. Next, the impact of image content on the perceived quality is studied with the help of image quality attributes. Finally, the performance of 2-D image quality metrics when applied to light field images is analyzed.