Article

No Reference Image Quality Assessment Based on Deep Learning with Distortion Type Prediction

Abstract

In this paper, we propose an image quality estimation method using deep learning. Conventional image quality estimation methods analyzed image characteristics for each type of image distortion and constructed a model from them. In recent years, methods that use machine learning to learn the relationship between distortion and image quality automatically have been proposed. Furthermore, methods based on deep learning, which is frequently used for general image recognition, have been proposed. In this paper, we propose an image quality estimation method that uses a deep neural network that takes the type of distortion into account. We show that our method improves the estimation accuracy of image quality compared with a conventional CNN model.

... These IQA methods using the CNN achieved high performance. Therefore, we proposed a model that incorporates the distortion type prediction with more parameters [7]. The model used the L2 norm as an evaluation metric. ...
... The network structure for this study was developed based on the model in [7]. It consists of five layers of units, 32 × 32 − 26 × 26 − 2 × 50 − 800 − 800 − 7. We use the same normalization as in [5] for the input. ...
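The layer sizes quoted above can be checked with a little shape arithmetic. The sketch below is an illustration only, not the authors' implementation. It assumes a stride-1, unpadded 7 × 7 convolution (the kernel size that takes a 32-pixel side down to 26), 50 feature maps, and that "2 × 50" denotes per-map max and min pooling. None of these assumptions is confirmed by the excerpt, and the function names are hypothetical.

```python
def valid_conv_output(size, kernel):
    """Side length after a stride-1 convolution with no padding."""
    return size - kernel + 1


def max_min_pool(feature_maps):
    """Reduce each 2-D feature map to its maximum and its minimum value."""
    maxima = [max(max(row) for row in fmap) for fmap in feature_maps]
    minima = [min(min(row) for row in fmap) for fmap in feature_maps]
    return maxima + minima


PATCH = 32   # input patch side, taken from the quoted structure
KERNEL = 7   # assumption: inferred from 32 -> 26, not stated in the excerpt
N_MAPS = 50  # assumption: read off "2 x 50"

side = valid_conv_output(PATCH, KERNEL)

# Dummy feature maps standing in for real convolution outputs.
maps = [[[float(k + i + j) for j in range(side)] for i in range(side)]
        for k in range(N_MAPS)]
pooled = max_min_pool(maps)

# Pooled vector, two fully connected layers, then the 7 outputs
# (what each of the 7 outputs means is not stated in the excerpt).
layer_widths = [len(pooled), 800, 800, 7]
print(side, layer_widths)
```

Under these assumptions the pooled vector has 100 entries, which matches the "2 × 50" term in the quoted structure.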
... By adding such elements, it is assumed that each distortion signal can be captured more precisely and the image quality can be estimated more accurately. To predict the subjective score and distortion label, our model minimizes the following loss function [7]: ...
Article
Current image quality assessment (IQA) methods require the original images for evaluation. However, recently, IQA methods that use machine learning have been proposed. These methods learn the relationship between the distorted image and the image quality automatically. In this paper, we propose an IQA method based on deep learning that does not require a reference image. We show that a convolutional neural network with distortion prediction and fixed filters improves the IQA accuracy.
Article
A memory-based network that provides estimates of continuous variables and converges to the underlying (linear or nonlinear) regression surface is described. The general regression neural network (GRNN) is a one-pass learning algorithm with a highly parallel structure. It is shown that, even with sparse data in a multidimensional measurement space, the algorithm provides smooth transitions from one observed value to another. The algorithmic form can be used for any regression problem in which an assumption of linearity is not justified.
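As a rough illustration of the one-pass idea, a GRNN prediction can be written as a Gaussian-weighted average of the stored training targets. The sketch below is a minimal version under stated assumptions: the function name, the smoothing parameter `sigma`, and the toy data are hypothetical, and no guard is included for the case where every weight underflows to zero.

```python
import math


def grnn_predict(x, train_x, train_y, sigma=0.5):
    """Predict y at x as a kernel-weighted average of the training targets.

    "Training" is just storing (train_x, train_y); each stored sample
    contributes with weight exp(-d^2 / (2 * sigma^2)), where d is its
    Euclidean distance to x.
    """
    weights = []
    for xi in train_x:
        d2 = sum((a - b) ** 2 for a, b in zip(x, xi))
        weights.append(math.exp(-d2 / (2.0 * sigma ** 2)))
    total = sum(weights)
    return sum(w * y for w, y in zip(weights, train_y)) / total


# Toy one-dimensional data (hypothetical).
train_x = [[0.0], [1.0], [2.0]]
train_y = [0.0, 1.0, 4.0]

smooth = grnn_predict([1.5], train_x, train_y, sigma=0.5)
sharp = grnn_predict([1.0], train_x, train_y, sigma=0.05)
print(smooth, sharp)
```

A larger `sigma` gives smoother transitions between observed values, while a very small `sigma` makes the prediction at a training point collapse onto that point's target.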
Article
When a large feedforward neural network is trained on a small training set, it typically performs poorly on held-out test data. This "overfitting" is greatly reduced by randomly omitting half of the feature detectors on each training case. This prevents complex co-adaptations in which a feature detector is only helpful in the context of several other specific feature detectors. Instead, each neuron learns to detect a feature that is generally helpful for producing the correct answer given the combinatorially large variety of internal contexts in which it must operate. Random "dropout" gives big improvements on many benchmark tasks and sets new records for speech and object recognition.
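The random omission described above can be sketched in a few lines. This is a simplified illustration rather than the paper's code: the function names are hypothetical, the drop probability of 0.5 follows the "half of the feature detectors" wording, and scaling activations at test time is used here as a stand-in for using all units with reduced outgoing weights.

```python
import random


def dropout_train(activations, rng, drop_prob=0.5):
    """Zero each activation independently with probability drop_prob."""
    return [0.0 if rng.random() < drop_prob else a for a in activations]


def dropout_test(activations, drop_prob=0.5):
    """Keep every unit, scaled by the probability that it was kept in training."""
    return [a * (1.0 - drop_prob) for a in activations]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
hidden = [1.0] * 1000

train_out = dropout_train(hidden, rng)
test_out = dropout_test([2.0, 4.0])
print(sum(train_out), test_out)
```

Because a fresh random mask is drawn for every training case, no unit can rely on a specific partner being present, which is the co-adaptation argument made in the abstract.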
Article
Reduced-reference image quality assessment (RRIQA) methods estimate image quality degradations with partial information about the "perfect-quality" reference image. In this paper, we propose an RRIQA algorithm based on a divisive normalization image representation. Divisive normalization has been recognized as a successful approach to model the perceptual sensitivity of biological vision. It also provides a useful image representation that significantly improves statistical independence for natural images. By using a Gaussian scale mixture statistical model of image wavelet coefficients, we compute a divisive normalization transformation (DNT) for images and evaluate the quality of a distorted image by comparing a set of reduced-reference statistical features extracted from DNT-domain representations of the reference and distorted images, respectively. This leads to a generic or general-purpose RRIQA method, in which no assumption is made about the types of distortions occurring in the image being evaluated. The proposed algorithm is cross-validated using two publicly-accessible subject-rated image databases (the UT-Austin LIVE database and the Cornell-VCL A57 database) and demonstrates good performance across a wide range of image distortions.
Article
We develop a no-reference image quality assessment (QA) algorithm that deploys a general regression neural network (GRNN). The new algorithm is trained on and successfully assesses image quality, relative to human subjectivity, across a range of distortion types. The features deployed for QA include the mean value of phase congruency image, the entropy of phase congruency image, the entropy of the distorted image, and the gradient of the distorted image. Image quality estimation is accomplished by approximating the functional relationship between these features and subjective mean opinion scores using a GRNN. Our experimental results show that the new method accords closely with human subjective judgment.
Article
The development of general-purpose no-reference approaches to image quality assessment still lags recent advances in full-reference methods. Additionally, most no-reference or blind approaches are distortion-specific, meaning they assess only a specific type of distortion assumed present in the test image (such as blockiness, blur, or ringing). This limits their application domain. Other approaches rely on training a machine learning algorithm. These methods, however, are only as effective as the features used to train their learning machines. To ameliorate this, we introduce the BLIINDS index (BLind Image Integrity Notator using DCT Statistics), a no-reference approach to image quality assessment that does not assume a specific type of distortion of the image. It predicts image quality from the statistics of local discrete cosine transform coefficients, and it requires only minimal training. The method is shown to correlate highly with human perception of quality.
Article
The mainstream approach to image quality assessment has centered around accurately modeling the single most relevant strategy employed by the human visual system (HVS) when judging image quality (e.g., detecting visible differences; extracting image structure/information). In this paper, we suggest that a single strategy may not be sufficient; rather, we advocate that the HVS uses multiple strategies to determine image quality. For images containing near-threshold distortions, the image is most apparent, and thus the HVS attempts to look past the image and look for the distortions (a detection-based strategy). For images containing clearly visible distortions, the distortions are most apparent, and thus the HVS attempts to look past the distortion and look for the image's subject matter (an appearance-based strategy). Here, we present a quality assessment method (MAD: Most Apparent Distortion) which attempts to explicitly model these two separate strategies. Local luminance and contrast masking are used to estimate detection-based perceived distortion in high-quality images, whereas changes in the local statistics of spatial-frequency components are used to estimate appearance-based perceived distortion in low-quality images. We show that a combination of these two measures can perform well in predicting subjective ratings of image quality.
Conference Paper
In this paper, a novel learning based method is proposed for No-Reference image quality assessment. Instead of examining the exact prior knowledge for the given type of distortion and finding a suitable way to represent it, our method aims to directly get the quality metric by means of learning. At first, some training examples are prepared for both high-quality and low-quality classes; then a binary classifier is built on the training set; finally the quality metric of an un-labeled example is denoted by the extent to which it belongs to these two classes. Different schemes to acquire examples from a given image, to build the binary classifier and to model the quality metric are proposed and investigated. While most existing methods are tailored for some specific distortion type, the proposed method might provide a general solution for No-Reference image quality assessment. Experimental results on JPEG and JPEG2000 compressed images validate the effectiveness of the proposed method.
Article
This paper presents an efficient metric for quantifying the visual fidelity of natural images based on near-threshold and suprathreshold properties of human vision. The proposed metric, the visual signal-to-noise ratio (VSNR), operates via a two-stage approach. In the first stage, contrast thresholds for detection of distortions in the presence of natural images are computed via wavelet-based models of visual masking and visual summation in order to determine whether the distortions in the distorted image are visible. If the distortions are below the threshold of detection, the distorted image is deemed to be of perfect visual fidelity (VSNR = ∞) and no further analysis is required. If the distortions are suprathreshold, a second stage is applied which operates based on the low-level visual property of perceived contrast, and the mid-level visual property of global precedence. These two properties are modeled as Euclidean distances in distortion-contrast space of a multiscale wavelet decomposition, and VSNR is computed based on a simple linear sum of these distances. The proposed VSNR metric is generally competitive with current metrics of visual fidelity; it is efficient both in terms of its low computational complexity and in terms of its low memory requirements; and it operates based on physical luminances and visual angle (rather than on digital pixel values and pixel-based dimensions) to accommodate different viewing conditions.
Article
Objective methods for assessing perceptual image quality traditionally attempted to quantify the visibility of errors (differences) between a distorted image and a reference image using a variety of known properties of the human visual system. Under the assumption that human visual perception is highly adapted for extracting structural information from a scene, we introduce an alternative complementary framework for quality assessment based on the degradation of structural information. As a specific example of this concept, we develop a structural similarity index and demonstrate its promise through a set of intuitive examples, as well as comparison to both subjective ratings and state-of-the-art objective methods on a database of images compressed with JPEG and JPEG2000. A MATLAB implementation of the proposed algorithm is available online at http://www.cns.nyu.edu/~lcv/ssim/.
Article
The structural similarity image quality paradigm is based on the assumption that the human visual system is highly adapted for extracting structural information from the scene, and therefore a measure of structural similarity can provide a good approximation to perceived image quality. This paper proposes a multi-scale structural similarity method, which supplies more flexibility than previous single-scale methods in incorporating the variations of viewing conditions. We develop an image synthesis method to calibrate the parameters that define the relative importance of different scales. Experimental comparisons demonstrate the effectiveness of the proposed method.
Conference Paper
We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0% which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called "dropout" that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.
Article
Objective methods for assessing perceptual image quality have traditionally attempted to quantify the visibility of errors between a distorted image and a reference image using a variety of known properties of the human visual system. Under the assumption that human visual perception is highly adapted for extracting structural information from a scene, we introduce an alternative framework for quality assessment based on the degradation of structural information. As a specific example of this concept, we develop a Structural Similarity Index and demonstrate its promise through a set of intuitive examples, as well as comparison to both subjective ratings and state-of-the-art objective methods on a database of images compressed with JPEG and JPEG2000. A MatLab implementation of the proposed algorithm is available online at http://www.cns.nyu.edu/~lcv/ssim/.
Conference Paper
In this work we describe a Convolutional Neural Network (CNN) to accurately predict image quality without a reference image. Taking image patches as input, the CNN works in the spatial domain without using hand-crafted features that are employed by most previous methods. The network consists of one convolutional layer with max and min pooling, two fully connected layers and an output node. Within the network structure, feature learning and regression are integrated into one optimization process, which leads to a more effective model for estimating image quality. This approach achieves state of the art performance on the LIVE dataset and shows excellent generalization ability in cross dataset experiments. Further experiments on images with local distortions demonstrate the local quality estimation ability of our CNN, which is rarely reported in previous literature.
Conference Paper
In this paper, we present an efficient general-purpose objective no-reference (NR) image quality assessment (IQA) framework based on unsupervised feature learning. The goal is to build a computational model to automatically predict human perceived image quality without a reference image and without knowing the distortion present in the image. Previous approaches for this problem typically rely on hand-crafted features which are carefully designed based on prior knowledge. In contrast, we use raw-image-patches extracted from a set of unlabeled images to learn a dictionary in an unsupervised manner. We use soft-assignment coding with max pooling to obtain effective image representations for quality estimation. The proposed algorithm is very computationally appealing, using raw image patches as local descriptors and using soft-assignment for encoding. Furthermore, unlike previous methods, our unsupervised feature learning strategy enables our method to adapt to different domains. CORNIA (Codebook Representation for No-Reference Image Assessment) is tested on LIVE database and shown to perform statistically better than the full-reference quality measure, structural similarity index (SSIM) and is shown to be comparable to state-of-the-art general purpose NR-IQA algorithms.
Article
Recent years have witnessed dramatically increased interest and demand for accurate, easy-to-use, and practical image quality assessment (IQA) and video quality assessment (VQA) tools that can be used to evaluate, control, and improve the perceptual quality of multimedia content in a wide variety of practical multimedia signal acquisition, communication, and display systems. There is a vast and increasing proliferation of such content over both wireline and wireless networks. Think of the Internet: Youtube, Facebook, Google Video, Flickr and so on; networked high-definition television (HDTV), Internet Protocol TV (IPTV) and unicast home video-on-demand (Netflix and Hulu, for example); and an explosion of wireless video traffic that is expected to more than double every year over the next five years [1].
Conference Paper
It is often desirable to evaluate an image based on its quality. For many computer vision applications, a perceptually meaningful measure is the most relevant for evaluation; however, most commonly used measures do not map well to human judgements of image quality. A further complication of many existing image measures is that they require a reference image, which is often not available in practice. In this paper, we present a “blind” image quality measure, where potentially neither the ground-truth image nor the degradation process is known. Our method uses a set of novel low-level image features in a machine learning framework to learn a mapping from these features to subjective image quality scores. The image quality features stem from natural image measures and texture statistics. Experiments on a standard image quality benchmark dataset show that our method outperforms the current state of the art.
Article
Our approach to blind image quality assessment (IQA) is based on the hypothesis that natural scenes possess certain statistical properties which are altered in the presence of distortion, rendering them un-natural; and that by characterizing this un-naturalness using scene statistics, one can identify the distortion afflicting the image and perform no-reference (NR) IQA. Based on this theory, we propose an NR (blind) algorithm, the Distortion Identification-based Image Verity and INtegrity Evaluation (DIIVINE) index, that assesses the quality of a distorted image without need for a reference image. DIIVINE is based on a 2-stage framework involving distortion identification followed by distortion-specific quality assessment. DIIVINE is capable of assessing the quality of a distorted image across multiple distortion categories, in contrast to most NR IQA algorithms, which are distortion-specific in nature. DIIVINE is based on natural scene statistics which govern the behavior of natural images. In this paper, we detail the principles underlying DIIVINE, the statistical features extracted and their relevance to perception, and thoroughly evaluate the algorithm on the popular LIVE IQA database. Further, we compare the performance of DIIVINE against leading full-reference (FR) IQA algorithms and demonstrate that DIIVINE is statistically superior to the often used measure of peak signal-to-noise ratio (PSNR) and statistically equivalent to the popular structural similarity index (SSIM). A software release of DIIVINE has been made available online: http://live.ece.utexas.edu/research/quality/DIIVINE_release.zip for public use and evaluation.
Article
Present-day no-reference image quality assessment (NR IQA) algorithms usually assume that the distortion affecting the image is known. This is a limiting assumption for practical applications, since in a majority of cases the distortions in the image are unknown. We propose a new two-step framework for no-reference image quality assessment based on natural scene statistics (NSS). Once trained, the framework does not require any knowledge of the distorting process and the framework is modular in that it can be extended to any number of distortions. We describe the framework for blind image quality assessment, and a version of this framework, the blind image quality index (BIQI), is evaluated on the LIVE image quality assessment database. A software release of BIQI has been made available online: http://live.ece.utexas.edu/research/quality/BIQI_release.zip.
Article
We present a full- and no-reference blur metric as well as a full-reference ringing metric. These metrics are based on an analysis of the edges and adjacent regions in an image and have very low computational complexity. As blur and ringing are typical artifacts of wavelet compression, the metrics are then applied to JPEG2000 coded images. Their perceptual significance is corroborated through a number of subjective experiments. The results show that the proposed metrics perform well over a wide range of image content and distortion levels. Potential applications include source coding optimization and network resource management.
Conference Paper
In this paper, we propose a new learning based No-Reference Image Quality Assessment (NR-IQA) algorithm, which uses a visual codebook consisting of robust appearance descriptors extracted from local image patches to capture complex statistics of natural image for quality estimation. We use Gabor filter based local features as appearance descriptors and the codebook method encodes the statistics of natural image classes by vector quantizing the feature space and accumulating histograms of patch appearances based on this coding. This method does not assume any specific types of distortion and experimental results on the LIVE image quality assessment database show that this method provides consistent and reliable performance in quality estimation that exceeds other state-of-the-art NR-IQA approaches and is competitive with the full reference measure PSNR.
Article
This paper proposes a no-reference quality assessment metric for images subject to quantization noise in block-based DCT (discrete cosine transform) domain, as those resulting from JPEG or MPEG encoding. The proposed method is based on natural scene statistics of the DCT coefficients, whose distribution is usually modeled by a Laplace probability density function, with parameter λ. A new method for λ estimation from quantized coefficient data is presented; it combines maximum-likelihood with linear prediction estimates, exploring the correlation between λ values at adjacent DCT frequencies. The resulting coefficient distributions are then used for estimating the local error due to lossy encoding. Local error estimates are also perceptually weighted, using a well-known perceptual model by Watson. When confronted with subjective quality evaluation data, results show that the quality scores that result from the proposed algorithm are well correlated with the human perception of quality. Since no knowledge about the original (reference) images is required, the proposed method resembles a no-reference quality metric for image evaluation.
Article
Measurement of visual quality is of fundamental importance to numerous image and video processing applications. The goal of quality assessment (QA) research is to design algorithms that can automatically assess the quality of images or videos in a perceptually consistent manner. Image QA algorithms generally interpret image quality as fidelity or similarity with a "reference" or "perfect" image in some perceptual space. Such "full-reference" QA methods attempt to achieve consistency in quality prediction by modeling salient physiological and psychovisual features of the human visual system (HVS), or by signal fidelity measures. In this paper, we approach the image QA problem as an information fidelity problem. Specifically, we propose to quantify the loss of image information to the distortion process and explore the relationship between image information and visual quality. QA systems are invariably involved with judging the visual quality of "natural" images and videos that are meant for "human consumption." Researchers have developed sophisticated models to capture the statistics of such natural signals. Using these models, we previously presented an information fidelity criterion for image QA that related image quality with the amount of information shared between a reference and a distorted image. In this paper, we propose an image information measure that quantifies the information that is present in the reference image and how much of this reference information can be extracted from the distorted image. Combining these two quantities, we propose a visual information fidelity measure for image QA. We validate the performance of our algorithm with an extensive subjective study involving 779 images and show that our method outperforms recent state-of-the-art image QA algorithms by a sizeable margin in our simulations. The code and the data from the subjective study are available at the LIVE website.
Conference Paper
Human observers can easily assess the quality of a distorted image without examining the original image as a reference. By contrast, designing objective No-Reference (NR) quality measurement algorithms is a very difficult task. Currently, NR quality assessment is feasible only when prior knowledge about the types of image distortion is available. This research aims to develop NR quality measurement algorithms for JPEG compressed images. First, we established a JPEG image database and subjective experiments were conducted on the database. We show that Peak Signal-to-Noise Ratio (PSNR), which requires the reference images, is a poor indicator of subjective quality. Therefore, tuning an NR measurement model towards PSNR is not an appropriate approach in designing NR quality metrics. Furthermore, we propose a computational and memory efficient NR quality assessment model for JPEG images. Subjective test results are used to train the model, which achieves good quality prediction performance.
H. R. Sheikh, Z. Wang, L. Cormack, and A. C. Bovik: "LIVE image quality assessment database release 2," http://live.ece.utexas.edu/research/quality
"User requirements for objective perceptual video quality measurements in digital cable television," ITU-T Recommendation J.143, 2000.