Chapter
PDF available

Experimental Comparison of PSNR and SSIM Metrics for Video Quality Estimation

Authors: Zoran Kotevski and Pece Mitrevski

Abstract

Since the development of digital video technology, the approach to video quality estimation has changed to reflect the nature of digital video. Broadly, two types of metrics are used to measure the objective quality of processed digital video: purely mathematically defined metrics (DELTA, MSAD, MSE, SNR and PSNR), where the error is calculated as the difference between original and processed pixels, and metrics that model characteristics of the Human Visual System, or HVS (SSIM, NQI, VQM), where perceptual quality is considered in the overall estimation. In this paper, an overview and experimental comparison of the PSNR and SSIM metrics for video quality estimation is presented.
http://link.springer.com/chapter/10.1007%2F978-3-642-10781-8_37
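As a concrete illustration of the two metric families the abstract contrasts, the sketch below computes PSNR (purely mathematical) and SSIM (HVS-inspired) for a single pair of frames. This is a minimal sketch, not the chapter's own test code: it assumes 8-bit grayscale frames held as NumPy arrays, uses a hypothetical synthetic frame pair, and relies on scikit-image for SSIM.

```python
# Minimal sketch: PSNR (purely mathematical) vs. SSIM (HVS-inspired)
# for one frame pair. Assumes 8-bit grayscale NumPy arrays and
# scikit-image; illustrative only, not the chapter's test code.
import numpy as np
from skimage.metrics import structural_similarity

def psnr(reference: np.ndarray, processed: np.ndarray) -> float:
    """Peak signal-to-noise ratio in dB for 8-bit images."""
    mse = np.mean((reference.astype(np.float64) - processed.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical frames, zero error
    return 10.0 * np.log10(255.0 ** 2 / mse)

# Hypothetical frames: a synthetic reference and a noisy "processed" version.
rng = np.random.default_rng(0)
ref = rng.integers(0, 256, size=(288, 352), dtype=np.uint8)  # CIF-sized frame
noise = rng.normal(0.0, 10.0, ref.shape)
proc = np.clip(ref.astype(np.float64) + noise, 0, 255).astype(np.uint8)

print(f"PSNR: {psnr(ref, proc):.2f} dB")
print(f"SSIM: {structural_similarity(ref, proc, data_range=255):.4f}")
```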
... PSNR values, depending on the video codec quality (SD, HD, 2K, 4K, and 8K), range from 20 to 70 dB. PSNR is computed by the following mathematical equations [31]: ...
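The excerpt truncates the equations. For reference, the standard definitions of MSE and PSNR for an M × N reference frame I and processed frame K (with MAX_I = 255 for 8-bit video) are reproduced below; the exact form given in the cited source [31] is not shown in the excerpt.

```latex
\mathrm{MSE} = \frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N}\bigl[I(i,j) - K(i,j)\bigr]^{2},
\qquad
\mathrm{PSNR} = 10 \log_{10}\!\left(\frac{MAX_I^{2}}{\mathrm{MSE}}\right)
```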
Article
Full-text available
Nowadays, smart multimedia network services have become crucial in the healthcare system. The network parameters of Quality of Service (QoS) widely affect the efficiency and accuracy of multimedia streaming in wireless environments. This paper proposes an adaptation framework model, which relates the QP (quantization parameter) in the H.264 and H.265 codecs to the QoS of 5G wireless technology. In addition, the effects of QP and packet loss have been studied because of their impact on video streaming. Packet loss, a 5G wireless network characteristic, is emulated to determine the impact of QP on the received video quality using objective and subjective quality metrics such as PSNR (peak signal-to-noise ratio), SSIM (structural similarity), and DMOS (differential mean opinion score). In this research, a testbed is implemented to stream the encoded video from the server to the end users. The application model framework automatically evaluates the QoE (Quality of Experience). Accordingly, the model detects network packet loss and selects the optimum QP value to enhance the QoE for the end users. The application has been tested on low- and high-motion video with full high definition (HD) resolution (1920 × 1080) taken from ( https://www.xiph.org/downloads/ ). Test results based on the objective and subjective quality measurements indicate that optimal values of QP = 35 and QP = 30 were chosen for low and high motion, respectively, to satisfy user QoE requirements.
... SSIM was proposed as an alternative metric since it quantifies the relation between pixels and their neighbourhood (i.e., the structural information). Several works have focused on the weaknesses of these metrics [16,25,30,34,35], where the main criticism is that images subjected to different compression artifacts and distortion effects (such as additive Gaussian blurring) exhibit similar PSNR and SSIM values. Additional work [12] has shown analytical and experimental relations between the two metrics, meaning that they are not independent. ...
Preprint
Full-text available
Neural Radiance Fields (NeRF) have attracted significant attention due to their ability to synthesize novel scene views with great accuracy. However, inherent to their underlying formulation, the sampling of points along a ray with zero width may result in ambiguous representations that lead to further rendering artifacts such as aliasing in the final scene. To address this issue, the recent variant mip-NeRF proposes an Integrated Positional Encoding (IPE) based on a conical view frustum. Although this is expressed with an integral formulation, mip-NeRF instead approximates the integral as the expected value of a multivariate Gaussian distribution. This approximation is reliable for short frustums but degrades for highly elongated regions, which arise when dealing with distant scene objects under a larger depth of field. In this paper, we explore an exact approach to calculating the IPE by using a pyramid-based integral formulation instead of an approximated conical-based one. We denote this formulation Exact-NeRF and contribute the first approach to offer a precise analytical solution to the IPE within the NeRF domain. Our exploratory work illustrates that this exact formulation matches the accuracy of mip-NeRF and furthermore provides a natural extension to more challenging scenarios without further modification, such as the case of unbounded scenes. Our contribution aims both to address the hitherto unexplored issue of frustum approximation in earlier NeRF work and to provide insight into the potential consideration of analytical solutions in future NeRF extensions.
... Researchers have also presented various metrics, and their efficacy differed with respect to the challenges faced in using them for judging video and network quality [4]. In this work, we use PSNR and SSIM to compare the received audio and video quality, similar to the work by Kotevski and Mitrevski [16]. ...
Preprint
Full-text available
Video conferencing platforms have been appropriated during the COVID-19 pandemic for different purposes, including classroom teaching. However, the platforms are not designed for many of these objectives. When users, such as educationists, select a platform, it is unclear which platform will perform better given the same network and hardware resources to meet the required Quality of Experience (QoE). Similarly, when developers design a new video conferencing platform, they do not have clear guidelines for making design choices given the QoE requirements. In this paper, we provide a set of network and systems measurements, and quantitative user studies, to measure the performance of video conferencing apps in terms of both Quality of Service (QoS) and QoE. Using those metrics, we measure the performance of Google Meet, Microsoft Teams, and Zoom, three platforms popular in education and business. We find a substantial difference in how the three apps treat video and audio streams, and we see that this choice of treatment affects their consumption of hardware resources. Our user studies confirm the findings of our quantitative measurements. While each platform has its benefits, we find that no app is ideal. A user can choose a suitable platform depending on which of audio quality, video quality, network bandwidth, CPU, or memory matters most.
... The SSIM represents image quality in terms of a similarity index between two images and was proposed by [Kotevski and Mitrevski, 2010]. It represents image quality by combining three factors: luminance distortion, contrast distortion, and loss of correlation [Ieremeiev et al., 2020], and is given as ...
Thesis
Full-text available
This study deals with developments to increase the possibilities offered by laboratory X-ray computed tomography in material science, focusing on contrast enhancement and on time resolution. First, the feasibility of using a new-generation photon-counting detector (PCD) in lab-CT was evaluated. The standard imaging performance and the spectral capabilities of four PCDs were characterized and compared to a standard flat-panel detector, and the potential of PCDs for spectral and single-shot K-edge imaging was investigated. Second, a model-based optimization strategy was developed to define suitable CT scanning parameters for dynamic in situ acquisitions with an image quality allowing qualitative or quantitative analysis. The model is based on three modules: noise modelling in the feature of interest, an X-ray absorption simulation tool, and a screening algorithm that outputs the possible scanning configurations together with the probability of detecting the feature size of interest for each configuration. As an application aspect of the thesis, a real-time in situ test with sub-minute temporal resolution was performed with the experimentally optimized CT set-up. The experimental configuration is compared with the configurations proposed by the optimization model, which were found to be in line with the chosen setup. The application corresponds to the real-time monitoring of the microstructural evolution of 3D-printed cellulose parts during air drying, with qualitative and quantitative analysis, and illustrates the quantitative characterization capabilities of lab-CT for high-speed in situ imaging.
Keywords: X-ray tomography, In-situ tomography, Time-resolved imaging, Photon-counting detector, Parameter optimization, Image analysis
... Even when the image is degraded by major types of distortions such as Gaussian blur, motion blur, and noise, the related SSIM and PSNR scores can be larger than those of the enhanced image, which strongly contradicts human visual perception. More related works showing the drawbacks of SSIM and PSNR can be found in [8][9][10][11]. ...
Experiment Findings
Full-text available
This is the supplementary material for our publication "Luminosity Rectified Blind Richardson-Lucy Deconvolution for Single Retinal Image Restoration".
... PSNR has the drawback that it does not take into account the bias of the human eye when observing the same amount of noise in different image structures. On the other hand, SSIM [31] yields more realistic values and gives better performance than PSNR [32], as it considers three components: luminance, contrast, and structural information. SSIM assigns a specific equation to each component: the luminance of a digital image can be estimated as a function of the mean intensity, the contrast as a function of the standard deviation, and the structural information can be extracted after luminance subtraction and variance normalization; the factors are then combined into a single expression, as given by the equation: ...
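The excerpt breaks off before the combined equation. The standard single-scale SSIM combination (Wang et al.), which the description above matches, is reproduced here for reference; the exact variant used by the citing work may differ.

```latex
\mathrm{SSIM}(x, y) =
\frac{(2\mu_x \mu_y + C_1)(2\sigma_{xy} + C_2)}
     {(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}
```

Here μ is the mean intensity (luminance term), σ the standard deviation (contrast term), σ_xy the cross-covariance (structure term), and C1, C2 are small constants that stabilize the divisions.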
Article
Full-text available
The importance of digital image authentication has grown in the last decade, particularly with the widespread availability of digital media and image manipulation tools. As a result, different techniques have been developed to detect fraudulent alterations in digital images and restore the original data. In this paper, a new algorithm is proposed to authenticate images by hiding a copy of the approximation band in the original image. The approximation band is hidden by embedding it inside the image pixels. The intensity of the hiding is decided using a perceptual map that simulates the human visual system and adds more intensity in areas where the human eye cannot recognize changes. The perceptual map consists of three parts: a luminance mask, a texture mask, and an edge detection mask. Results show a high ability to blindly recover images after different attacks such as removal and blocking attacks. At the same time, the structural similarity index of the resulting images was higher than 0.99 for all tested images.
Article
The direct light field acquisition method using a lens array requires a complex system and has a low resolution. On the other hand, light fields can also be acquired indirectly by back-projection of focal stack images without a lens array, providing a resolution as high as the sensor resolution. However, this also requires a bulky optical system design to fix the field of view (FOV) between the focal stacks, and an additional device for sensor shifting. Moreover, the reconstructed light field is texture-dependent and low-quality because either a high-pass filter or a guided filter is used for back-projection. This paper presents a simple light field acquisition method based on the chromatic aberration of only one defocused image pair. An image with chromatic aberration has a different defocus distribution for each of the R, G, and B channels. Thus, the focal stack can be synthesized with a structural similarity (SSIM) of 0.96 from only one defocused image pair. This image pair is also used to estimate the depth map by depth-from-defocus (DFD) using chromatic aberration (chromatic DFD), and the depth map obtained by chromatic DFD is used for high-quality light field reconstruction. Compared to existing indirect light field acquisition, the proposed method requires only one pair of defocused images and can clearly reconstruct light field images, with Blind/Referenceless Image Spatial Quality Evaluator (BRISQUE) scores lowered by 17%–38% and Perception-Based Image Quality Evaluator (PIQE) scores lowered by 19%–45%. A defocused image pair is acquired by our customized compact optical system consisting of only three lenses, including a varifocal lens. Image processing and image quality evaluation are all performed using MATLAB. © 2021 Optical Society of America under the terms of the OSA Open Access Publishing Agreement
Article
ASTM C1678 outlines an approach to estimate the fracture strength of glasses and ceramics through empirical relationships relating the strength to characteristic fractographic length scales, such as the 'mirror radius'. However, the process of measuring radii is subjective, and the relationship suggested by the ASTM standard has been shown to be relatively inaccurate for flexural stress fields. This research introduces and tests a visual analysis algorithm to carry out the fractographic analysis of silicate glasses automatically and objectively. The fracture surfaces of various silicate glasses produced by both tensile and flexural stress fields were considered. First, optical images of the fracture surfaces were gathered, and unique, descriptive features such as the shape of the 'mirror-mist boundary' were extracted using visual analysis tools. Next, a newly developed algorithm compared the processed images with a database comprising fracture samples of known strengths, fracture toughness, stress fields, and geometric features. Lastly, dimensional analysis principles coupled with a broad experimental set of over 2100 fracture surfaces were used to accurately estimate the strengths of the imaged fracture surfaces.
Conference Paper
Full-text available
Many recently proposed perceptual image quality assessment algorithms are implemented in two stages. In the first stage, image quality is evaluated within local regions, resulting in a quality/distortion map over the image space. In the second stage, a spatial pooling algorithm combines the quality/distortion map into a single quality score. While great effort has been devoted to developing algorithms for the first stage, little has been done to find the best strategies for the second stage (a simple spatial average is often used). In this work, we investigate three spatial pooling methods for the second stage: Minkowski pooling, local quality/distortion-weighted pooling, and information content-weighted pooling. Extensive experiments with the LIVE database show that all three methods may improve the prediction performance of perceptual image quality measures, but the third demonstrates the best potential to be a general and robust method that leads to consistent improvement over a wide range of image distortion types.
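To make the pooling stage concrete, the Minkowski form named above can be written as follows. This is the standard formulation, not necessarily the exact one used in the paper; q_i is the local quality/distortion value at spatial location i, N the number of locations, and p the pooling exponent.

```latex
Q = \left( \frac{1}{N} \sum_{i=1}^{N} q_i^{\,p} \right)^{1/p}
```

With p = 1 this reduces to the simple spatial average; applied to a distortion map, larger p gives more weight to the most severely distorted regions.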
Book
55% new material in the latest edition of this "must-have" for students and practitioners of image & video processing! This Handbook is intended to serve as the basic reference point on image and video processing in the field, in the research laboratory, and in the classroom. Each chapter has been written by carefully selected, distinguished experts specializing in that topic and carefully reviewed by the Editor, Al Bovik, ensuring that the greatest depth of understanding is communicated to the reader. Coverage includes introductory, intermediate and advanced topics and, as such, this book serves equally well as classroom textbook and reference resource.
- Provides practicing engineers and students with a highly accessible resource for learning and using image/video processing theory and algorithms
- Includes a new chapter on image processing education, which should prove invaluable for those developing or modifying their curricula
- Covers the various image and video processing standards that exist and are emerging, driving today's explosive industry
- Offers an understanding of what images are, how they are modeled, and gives an introduction to how they are perceived
- Introduces the necessary, practical background to allow engineering students to acquire and process their own digital image or video data
- Culminates with a diverse set of applications chapters, covered in sufficient depth to serve as extensible models to the reader's own potential applications
About the Editor: Al Bovik is the Cullen Trust for Higher Education Endowed Professor at The University of Texas at Austin, where he is the Director of the Laboratory for Image and Video Engineering (LIVE). He has published over 400 technical articles in the general area of image and video processing and holds two U.S. patents. Dr. Bovik was a Distinguished Lecturer of the IEEE Signal Processing Society (2000), received the IEEE Signal Processing Society Meritorious Service Award (1998) and the IEEE Third Millennium Medal (2000), and was a two-time Honorable Mention winner of the international Pattern Recognition Society Award. He is a Fellow of the IEEE, was Editor-in-Chief of the IEEE Transactions on Image Processing (1996-2002), has served and continues to serve on many other professional boards and panels, and was the Founding General Chairman of the IEEE International Conference on Image Processing, held in Austin, Texas, in 1994.
Article
Objective methods for assessing perceptual image quality traditionally attempted to quantify the visibility of errors (differences) between a distorted image and a reference image using a variety of known properties of the human visual system. Under the assumption that human visual perception is highly adapted for extracting structural information from a scene, we introduce an alternative complementary framework for quality assessment based on the degradation of structural information. As a specific example of this concept, we develop a Structural Similarity Index and demonstrate its promise through a set of intuitive examples, as well as comparison to both subjective ratings and state-of-the-art objective methods on a database of images compressed with JPEG and JPEG2000.
Article
In this article, we have reviewed the reasons why we (collectively) want to love or leave the venerable (but perhaps hoary) MSE. We have also reviewed emerging alternative signal fidelity measures and discussed their potential application to a wide variety of problems. The message we are trying to send here is not that one should abandon use of the MSE nor to blindly switch to any other particular signal fidelity measure. Rather, we hope to make the point that there are powerful, easy-to-use, and easy-to-understand alternatives that might be deployed depending on the application environment and needs. While we expect (and indeed, hope) that the MSE will continue to be widely used as a signal fidelity measure, it is our greater desire to see more advanced signal fidelity measures being used, especially in applications where perceptual criteria might be relevant. Ideally, the performance of a new signal processing algorithm might be compared to other algorithms using several fidelity criteria. Lastly, we hope that we have given further motivation to the community to consider recent advanced signal fidelity measures as design criteria for optimizing signal processing algorithms and systems. It is in this direction that we believe that the greatest benefit eventually lies.
Article
Motion is one of the most important types of information contained in natural video, but direct use of motion information in the design of video quality assessment algorithms has not been deeply investigated. Here we propose to incorporate a recent model of human visual speed perception [Nat. Neurosci. 9, 578 (2006)] and model visual perception in an information communication framework. This allows us to estimate both the motion information content and the perceptual uncertainty in video signals. Improved video quality assessment algorithms are obtained by incorporating the model as spatiotemporal weighting factors, where the weight increases with the information content and decreases with the perceptual uncertainty. Consistent improvement over existing video quality assessment algorithms is observed in our validation with the video quality experts group Phase I test data set.
Article
We propose a new universal objective image quality index, which is easy to calculate and applicable to various image processing applications. Instead of using traditional error summation methods, the proposed index is designed by modeling any image distortion as a combination of three factors: loss of correlation, luminance distortion, and contrast distortion. Although the new index is mathematically defined and no human visual system model is explicitly employed, our experiments on various image distortion types indicate that it performs significantly better than the widely used distortion metric mean squared error. Demonstrative images and an efficient MATLAB implementation of the algorithm are available online at http://anchovy.ece.utexas.edu/~zwang/research/quality_index/demo.html.
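The three-factor decomposition described in this abstract corresponds to the well-known universal quality index Q. Its standard form, reproduced here for reference (for image windows x and y with means x̄, ȳ, variances σ_x², σ_y², and covariance σ_xy), is:

```latex
Q = \underbrace{\frac{\sigma_{xy}}{\sigma_x \sigma_y}}_{\text{correlation}}
    \cdot
    \underbrace{\frac{2\bar{x}\bar{y}}{\bar{x}^2 + \bar{y}^2}}_{\text{luminance}}
    \cdot
    \underbrace{\frac{2\sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2}}_{\text{contrast}}
  = \frac{4\,\sigma_{xy}\,\bar{x}\bar{y}}{(\sigma_x^2 + \sigma_y^2)(\bar{x}^2 + \bar{y}^2)}
```

The first factor measures loss of correlation, the second luminance distortion, and the third contrast distortion, matching the three factors named in the abstract; SSIM later generalized this index by adding the stabilizing constants C1 and C2.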