Article

Scope of validity of PSNR in image/video quality assessment

Authors:
Q. Huynh-Thu and M. Ghanbari

Abstract

Experimental data are presented that clearly demonstrate the scope of application of peak signal-to-noise ratio (PSNR) as a video quality metric. It is shown that as long as the video content and the codec type are not changed, PSNR is a valid quality measure. However, when the content is changed, correlation between subjective quality and PSNR is highly reduced. Hence PSNR cannot be a reliable method for assessing the video quality across different video contents.


... The terms P(i, j) and C(i, j) refer to the pixel located at the i-th row and j-th column of the plain and ciphered images, respectively. A larger MSE value indicates better encryption security [22]. ...
... I_max is the maximum pixel value of the image [22]. Table 6 summarizes this analysis of the proposed encryption technique. ...
... It is a correlation-based measure that determines the resemblance between the encrypted image E(x, y) and the plain image P(x, y), both of dimension M × N; the mathematical formula for SC is given in [22]. Another measure is the square root of the average of the squared errors; it is commonly used as an excellent general-purpose error measure for statistical evaluations [23]. ...
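The excerpt above evaluates encryption by the MSE between the plain and ciphered images. A minimal sketch of that computation, assuming 8-bit grayscale arrays (the function and variable names are illustrative, not taken from the cited paper):

```python
import numpy as np

def mse(plain: np.ndarray, cipher: np.ndarray) -> float:
    """Mean squared error between a plain image P and a ciphered image C,
    both of dimension M x N: MSE = (1 / (M * N)) * sum_(i,j) (P(i,j) - C(i,j))**2."""
    p = plain.astype(np.float64)  # promote to float to avoid uint8 overflow
    c = cipher.astype(np.float64)
    return float(np.mean((p - c) ** 2))

# In encryption analysis, a *large* MSE between the plain and ciphered
# images indicates that the cipher changed the pixels substantially.
plain = np.zeros((4, 4), dtype=np.uint8)       # all-black plain image
cipher = np.full((4, 4), 255, dtype=np.uint8)  # all-white ciphered image
print(mse(plain, cipher))  # 65025.0, i.e. 255**2
```

Note the cast to float64 before subtraction: subtracting uint8 arrays directly would wrap around and silently corrupt the result.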
Article
Full-text available
This article introduces an encryption scheme for any digital data. The introduced encryption scheme is primarily based on three layers: substitution, permutation, and bit XOR. The substitution phase is acquired by multiple substitution boxes constructed over the multiplicative group of the commutative chain ring R8. For the permutation step, to permute the pixels of the image, a sequence of the parity bit is obtained by a recursive map through Blum integer. This phase considerably reduces the correlation between neighboring pixels. Since the behavior of chaos is remarkable in secure communication, for bit XOR operation a chaotic sequence is generated through a four-dimensional (4D) chaotic attractor. In the end, the proposed encoding scheme is assessed by different analyses. The overall results confirmed that the presented work has good cryptographic features which makes it suitable for low-profile mobile applications.
... However, some works in the literature have tried to approximate it using the L 2 distance and the Peak Signal-to-Noise Ratio (PSNR) [20]. PSNR is a metric used to quantify the quality of a reconstructed or compressed image by measuring the ratio of the maximum possible power of the original signal to the power of the noise introduced during compression or reconstruction. ...
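The PSNR described above can be sketched as follows for 8-bit images; this is a generic reference implementation, not the exact code used by the cited works:

```python
import numpy as np

def psnr(reference: np.ndarray, distorted: np.ndarray, max_val: float = 255.0) -> float:
    """PSNR in dB: 10 * log10(max_val**2 / MSE); infinite for identical images."""
    err = np.mean((reference.astype(np.float64) - distorted.astype(np.float64)) ** 2)
    if err == 0:
        return float("inf")
    return float(10.0 * np.log10(max_val ** 2 / err))

ref = np.zeros((8, 8), dtype=np.uint8)
noisy = ref.copy()
noisy[0, 0] = 16                   # perturb a single pixel: MSE = 16**2 / 64 = 4.0
print(round(psnr(ref, noisy), 2))  # 42.11
```

Because PSNR is a fixed function of the MSE, it inherits the limitation discussed in this article: it tracks subjective quality well only within a single content/codec, not across contents.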
Chapter
Full-text available
Although machine learning models achieve high classification accuracy against benign examples, they are vulnerable to adversarial machine learning (AML) attacks which generate adversarial examples by adding well-crafted perturbations to the benign examples. The perturbations can be increased to enhance the attack success rate, however, if the perturbations are added without considering the semantic or perceptual similarity between the benign and adversarial examples, the attack can be easily perceived/detected. As such, there exists a trade-off between the attack success rate and the perceptual similarity. In this paper, we propose a novel Semantic-Preserving Adversarial Transformation (SPAT) framework which facilitates an advantageous trade-off between the two metrics. SPAT modifies the optimisation objective of an AML attack to include the goal of increasing the attack success rate as well as the goal of maintaining the perceptual similarity between benign and adversarial examples. Our experiments on a variety of datasets including CIFAR-10, GTSRB, and MNIST demonstrate that SPAT-transformed AML attacks achieve better perceptual similarity while maintaining the attack success rates as the conventional AML attacks.
... A higher PSNR value indicates a lower level of image distortion. The PSNR [24] is calculated as PSNR = 10 · log10(I_max² / MSE). ...
Preprint
Full-text available
With current Near-infrared (NIR) image colorization methods, the color and details of the colorized images are not well restored. Thus, in this paper, we propose an unsupervised color feature control SE attention StyleGAN (CF-StyleGAN) method for the NIR image colorization task. The proposed method is based on histogram LAB color and brightness feature extraction, which solves the problem whereby the color and brightness of the results do not match the actual situation. The proposed Squeeze-and-Excitation-based StyleGAN (SE-SGAN) method, which introduces a channel attention mechanism based on StyleGAN and utilizes both standard deviation adaptive normalization and the Mish activation function in the synthesis network, can improve the quality of the output image. The proposed method was evaluated experimentally on the KAIST dataset. We found that the proposed CF-StyleGAN outperformed existing methods and achieved state-of-the-art NIR image colorization results. Experimental results show that the peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) values of the colorized images were 27.15 and 0.83, respectively.
... In our experiments, for images with ground truth, we can evaluate each method by two commonly used quantitative metrics, the peak signal-to-noise ratio (PSNR) [41] and the structural similarity index (SSIM) [40]. For the images without ground truth (i.e., real dataset), we provide some visual results. ...
Preprint
Full-text available
Single image deraining (SID) has shown its importance in many advanced computer vision tasks. Though many CNN based image deraining methods have been proposed, how to effectively remove raindrops while maintaining background structure remains a challenge that needs to be overcome. Most of the deraining work focuses on removing rain streaks, but in heavy rain images, the dense accumulation of rainwater or the rain curtain effect significantly interferes with the effective removal of rain streaks, and often introduces some artifacts that make the scene more blurry. In this paper, we propose a new network structure R-PReNet for single image deraining with good background structure maintaining. This framework fully utilizes the cyclic recursive structure of PReNet. Moreover, we introduce residual channel prior (RCP) and feature fusion modules for better deraining performance by focusing on background feature information. Compared with the previous methods, our method has significantly improvement effect on the rainstorm image with the artifacts removing and good visual detail restoring.
... Then, the content of the selected video, especially the Chinese introduction, should be appropriate, not too difficult, and matched to the students' Chinese level. At the same time, if the Chinese part of the video is too brief or the pronunciation is not standard, it is not conducive to students' learning, and errors may even occur (Huynh-Thu & Ghanbari, 2008). On the contrary, some popular-science documentaries are of great significance for foreign students' learning, but the proportion of language introduction is often too large, with many proper nouns, Chinese elements, historical figures, and stories, which makes it difficult for students to understand and remember (Lee et al., 2012). ...
Article
Full-text available
With the continuous development of speech recognition technology, automatic subtitle generation has gradually attracted attention. However, the quality of short videos is uneven, and their cultural teaching is often one-sided, irregular, and unsystematic. In the self-media era, it has become possible to apply short-video subtitle generation algorithms to international Chinese education and teaching; however, Chinese teachers should pay attention to problems that may arise in self-media videos and adopt appropriate teaching strategies. This paper discusses the development of international Chinese education and teaching in the new-media environment, along with its characteristics, advantages, disadvantages, and existing problems. The short-video subtitle generation algorithm provides a new channel for international Chinese education and teaching, enhances the vitality of education, and expands educational channels.
... To evaluate the performance of the zero-value pixel predictions from the probabilistic model described in Section 2.2, we used the following commonly used metrics for comparing images [39,40]: the root mean square error (RMSE) [41], peak signal-to-noise ratio (PSNR) [42,43], and structural similarity index measure (SSIM) [44,45]. These metrics were used to compare the prediction maps with synthetic multicamera interference and the prediction maps with real multicamera interference. ...
Article
Full-text available
The behavior of multicamera interference in 3D images (e.g., depth maps), which is based on infrared (IR) light, is not well understood. In 3D images, when multicamera interference is present, there is an increase in the amount of zero-value pixels, resulting in a loss of depth information. In this work, we demonstrate a framework for synthetically generating direct and indirect multicamera interference using a combination of a probabilistic model and ray tracing. Our mathematical model predicts the locations and probabilities of zero-value pixels in depth maps that contain multicamera interference. Our model accurately predicts where depth information may be lost in a depth map when multicamera interference is present. We compare the proposed synthetic 3D interference images with controlled 3D interference images captured in our laboratory. The proposed framework achieves an average root mean square error (RMSE) of 0.0625, an average peak signal-to-noise ratio (PSNR) of 24.1277 dB, and an average structural similarity index measure (SSIM) of 0.9007 for predicting direct multicamera interference, and an average RMSE of 0.0312, an average PSNR of 26.2280 dB, and an average SSIM of 0.9064 for predicting indirect multicamera interference. The proposed framework can be used to develop and test interference mitigation techniques that will be crucial for the successful proliferation of these devices.
... Once all routes were isolated, the following step was to compare the participant-traced routes with the pre-traced target route provided by the experimenters. A wide range of measures can be used to assess image quality, including the peak signal-to-noise ratio (Huynh-Thu & Ghanbari, 2008), visual information fidelity (Sheikh & Bovik, 2006), and structural similarity index (Wang et al., 2004), all three of which can also be used to assess the similarity between a reference image and a modified version implementing different algorithms. Following a test of these three measures for comparisons using our data, we found that overall, the structural similarity index ("SSIM") was the most effective at detecting differences between pre-traced and participant-traced routes. ...
Article
Full-text available
The “map task” is an interactive, goal-driven, real-time conversational task used to elicit semi-controlled natural language production data. We present recommendations for creating a bespoke map task that can be tailored to individual research projects and administered online using a chat interface. As proof of concept, we present a case study exemplifying our own implementation, designed to elicit informal written communication in either English or Spanish. Eight experimental maps were created, manipulating linguistic factors including lexical frequency, cognate status, and semantic ambiguity. Participants (N = 40) completed the task in pairs and took turns (i) providing directions based on a pre-traced route, or (ii) following directions to draw the route on an empty map. Computational measures of image similarity (e.g., the structural similarity index) between pre-traced and participant-traced routes showed that participants completed the task successfully; we describe how this measure can be used to quantify task success. We also provide a comparative analysis of the language elicited in English and Spanish. The most frequently used words were roughly equivalent in both languages, encompassing primarily commands and items on the maps. Similarly, abbreviations, swear words, and slang present in both datasets indicated that the task successfully elicited informal communication. Interestingly, Spanish turns were longer and displayed a wider range of morphologically complex forms. English, conversely, displayed strategies mostly absent in Spanish, such as the use of cardinal directions as a communicative strategy. We consider the online map task a promising method for examining a variety of phenomena in applied linguistics research.
... Equations (1) and (2) [30,31] are used to calculate the testing accuracy of PI-NMD. Furthermore, Equations (3)-(5) are used to calculate the testing accuracy of PI-Clas. ...
Article
Full-text available
Despite the significant number of classification studies conducted using plant images, studies on nonlinear motion blur are limited. In general, motion blur results from movements of the hands of a person holding a camera for capturing plant images, or when the plant moves owing to wind while the camera is stationary. When these two cases occur simultaneously, nonlinear motion blur is highly probable. Therefore, a novel deep learning-based classification method applied on plant images with various nonlinear motion blurs is proposed. In addition, this study proposes a generative adversarial network-based method to reduce nonlinear motion blur; accordingly, the method is explored for improving classification performance. Herein, experiments are conducted using a self-collected visible light images dataset. Evidently, nonlinear motion deblurring results in a structural similarity index measure (SSIM) of 73.1 and a peak signal-to-noise ratio (PSNR) of 21.55, whereas plant classification results in a top-1 accuracy of 90.09% and F1-score of 84.84%. In addition, the experiment conducted using two types of open datasets resulted in PSNRs of 20.84 and 21.02 and SSIMs of 72.96 and 72.86, respectively. The proposed method of plant classification results in top-1 accuracies of 89.79% and 82.21% and F1-scores of 84% and 76.52%, respectively. Thus, the proposed network produces higher accuracies than the existing state-of-the-art methods.
... The spectral and spatial levels are given by the spatial frequency (SF) [59] and the gray-level difference (GLD) [60]. Finally, PSNR [61], Qabf [62], and SSIM [63] relate to noise, quality, and similarity, as shown in Table 2. Likewise, the resultant image quality is observed and measured with various no-reference quality metrics given in [64][65][66]: BRISQUE, PIQE, and DIIVINE are based on perceptual quality, whereas the metric in [67] is based on the discrete cosine transform. Likewise, metrics for image quality assessment through visual saliency guidance (IQVG) [68], the patch-based metric QAC [69], pseudo-structural similarity (PSS) [70], blind multiple pseudo-reference images (BMPRI) [71], and the perceptual-quality metric FRIQUEE [72] are measured and given in Table 3. ...
Preprint
Full-text available
Radar images, popularly known as Synthetic Aperture Radar (SAR) images, and optical images are very useful in satellite-based information retrieval. However, the complementary nature of these two data types leads to misinterpretation of information in fields such as military surveillance, weather forecasting, and resource management. The study of image fusion on optical and SAR images has overcome some of these limitations in remote sensing, with applications in cloud removal, change detection, and infrastructure identification. Traditional image fusion approaches such as wavelet analysis, the cosine transform, and sparse representation have not been effective at analysing the nature of SAR and optical images. Hence, this study uses advanced neural-network algorithms, which have proven effective in image fusion, especially in medical diagnostics, social networks, and surveillance. Recent studies of deep-learning-based approaches, specifically the Generative Adversarial Network (GAN) architecture, show promising developments in image fusion. Building on advances in Principal Component Analysis for image compression, segmentation, and representation in satellite imagery, the proposed work develops a hybrid fusion model, an innovative combination of Principal Component Analysis and a GAN, to produce the target image. The proposed unsupervised algorithm improves the results and preserves the characteristics of both image types. The results confirm the importance of the GAN in SAR-optical fusion with the proposed PCA-GAN SAR Optical Fusion Network. Subsequently, the assessment of the quality metrics reveals that the current study outperforms prior works and is very useful for future geoinformation applications.
... For the compared methods, we manually adjust parameters to ensure optimal performance according to the authors' suggestions. For numerical comparison, we use the peak signal-to-noise ratio (PSNR) [60] and the structural similarity (SSIM) [61] to evaluate the performance of the different methods. All numerical experiments are implemented in Windows 10 64-bit and MATLAB R2022a on a desktop computer with an Intel(R) Core(TM) i9-12900 CPU at 2.40 GHz and 64 GB of RAM. ...
Article
Full-text available
Recently, transform-based tensor nuclear norm (TNN) methods have received increasing attention as a powerful tool for multi-dimensional visual data (color images, videos, and multispectral images, etc.) recovery. Especially, the redundant transform-based TNN achieves satisfactory recovery results, where the redundant transform along spectral mode can remarkably enhance the low-rankness of tensors. However, it suffers from expensive computational cost induced by the redundant transform. In this paper, we propose a learnable spatial-spectral transform-based TNN model for multi-dimensional visual data recovery, which not only enjoys better low-rankness capability but also allows us to design fast algorithms accompanying it. More specifically, we first project the large-scale original tensor to the small-scale intrinsic tensor via the learnable semi-orthogonal transforms along the spatial modes. Here, the semi-orthogonal transforms, serving as the key building block, can boost the spatial low-rankness and lead to a small-scale problem, which paves the way for designing fast algorithms. Secondly, to further boost the low-rankness, we apply the learnable redundant transform along the spectral mode to the small-scale intrinsic tensor. To tackle the proposed model, we apply an efficient proximal alternating minimization-based algorithm, which enjoys a theoretical convergence guarantee. Extensive experimental results on real-world data (color images, videos, and multispectral images) demonstrate that the proposed method outperforms state-of-the-art competitors in terms of evaluation metrics and running time.
... Qualitative and quantitative analyses were performed using these four benchmark datasets. In our experiments, we apply PSNR [47] and SSIM [45] as assessment metrics to evaluate the performance quantitatively. For fairness, we use the same code [48] to calculate the performance metrics. ...
Article
Full-text available
Rain removal is very important for many applications in computer vision, and it is a challenging problem due to its ill‐posed nature, especially for single‐image deraining. In order to remove rain streaks more thoroughly, as well as to retain more details, a progressive dilation dense residual fusion network is proposed. The entire network is designed in a cascade manner with multiple fusion blocks. The fusion block consists of a dilation dense residual block (DDRB) and a dense residual feature fusion block (DRFFB), where DDRB is created for feature extraction and DRFFB is mainly designed for feature fusion operation. Meanwhile, detail compensation memory mechanism (DCMM) is leveraged between each of two cascade modules to retain more background details. Compared with previous state‐of‐the‐art methods, extensive experiments show that the proposed method can achieve better results, in terms of rain streaks removal and background details preservation. Furthermore, the authors’ network also shows its superiority for image noise removal.
... Because the MSE loss function focuses on the misfit of each single grid point, we include two extra metrics to assess the overall quality of the emulation results: peak signal-to-noise ratio (PSNR, Korhonen & You, 2012) and structural similarity index measure (SSIM, Brunet et al., 2011). Both of these metrics are widely used to measure the quality of image and video compression (Huynh-Thu & Ghanbari, 2008;Wang & Bovik, 2009), with formal definitions given in Text S6 of the Supporting Information S1. A high PSNR value indicates low noise level, and vice versa, while the SSIM provides a similarity metric between 0 and 1 where a higher value indicates better emulation quality. ...
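The standard SSIM uses a sliding Gaussian window over the image; a stripped-down, single-window variant is enough to see how the index behaves (this sketch is not the implementation used in the cited works, and the helper name is illustrative):

```python
import numpy as np

def ssim_global(x: np.ndarray, y: np.ndarray, max_val: float = 255.0) -> float:
    """Single-window (global) SSIM:
    ((2*mu_x*mu_y + C1) * (2*cov_xy + C2)) /
    ((mu_x**2 + mu_y**2 + C1) * (var_x + var_y + C2)),
    with the usual constants C1 = (0.01*L)**2, C2 = (0.03*L)**2 for range L."""
    c1 = (0.01 * max_val) ** 2
    c2 = (0.03 * max_val) ** 2
    xf = x.astype(np.float64)
    yf = y.astype(np.float64)
    mu_x, mu_y = xf.mean(), yf.mean()
    var_x, var_y = xf.var(), yf.var()
    cov_xy = ((xf - mu_x) * (yf - mu_y)).mean()
    return float((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
                 / ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)))

img = np.arange(64, dtype=np.uint8).reshape(8, 8)
print(ssim_global(img, img))        # identical images score 1.0
print(ssim_global(img, 255 - img))  # an inverted image scores far below 1
```

Identical images always score 1, and the index is bounded above by 1; unlike PSNR, it compares luminance, contrast, and structure rather than raw pixel error.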
Article
Full-text available
Plain Language Summary Piecing together the history of ice sheet change during past glacial cycles is not only important for understanding past sea‐level change but also for predicting how ongoing glacial rebound contributes to future sea‐level change. Traditionally, a physics‐based “sea‐level model” is used to predict the sea‐level change associated with a particular reconstruction of past ice sheet change and compare the results with geological records of past sea level. However, a fundamental limitation of this approach is the need to compute sea‐level change for a large number of plausible ice histories, which is often prohibited by the computational resources required to repeatedly solve the complex physical equations. In this paper, we describe a machine‐learning‐based statistical model that can mimic the behavior of a physics‐based sea‐level model. This statistical model is computationally cheap and we demonstrate that it is able to accurately predict global sea‐level change for a suite of 150 “unseen” ice histories. Our statistical model predicts sea‐level change 100–1,000 times faster than a physics‐based model, making it an ideal tool for investigating and improving our understanding of global ice sheet change.
... We compare our method with existing state-of-the-art deraining methods, including DerainNet [35], SEMI [24], DIDMDN [30], URML [25], RESCAN [26], SPANet [10], PReNet [27], MSPFN [31], CRDNet [11], MPRNet [32], IDLIR [14], Uformer-B [13], IDT [34] and Semi-SwinDerain [36]. Continuing along the trajectory of previous works [34,32,27], we use PSNR [40] and SSIM [41] to evaluate the deraining performance of synthetic images, and use NIQE [42] to evaluate the real dataset. ...
Preprint
Since rain streaks show a variety of shapes and directions, learning the degradation representation is extremely challenging for single image deraining. Existing methods are mainly targeted at designing complicated modules to implicitly learn latent degradation representation from coupled rainy images. This way, it is hard to decouple the content-independent degradation representation due to the lack of explicit constraint, resulting in over- or under-enhancement problems. To tackle this issue, we propose a novel Latent Degradation Representation Constraint Network (LDRCNet) that consists of Direction-Aware Encoder (DAEncoder), UNet Deraining Network, and Multi-Scale Interaction Block (MSIBlock). Specifically, the DAEncoder is proposed to adaptively extract latent degradation representation by using the deformable convolutions to exploit the direction consistency of rain streaks. Next, a constraint loss is introduced to explicitly constraint the degradation representation learning during training. Last, we propose an MSIBlock to fuse with the learned degradation representation and decoder features of the deraining network for adaptive information interaction, which enables the deraining network to remove various complicated rainy patterns and reconstruct image details. Experimental results on synthetic and real datasets demonstrate that our method achieves new state-of-the-art performance.
... In this study, Peak Signal-to-Noise Ratio (PSNR [11]) and Structural SIMilarity (SSIM [12]) are primarily employed to evaluate image reconstruction quality. PSNR is a widely used metric for image reconstruction evaluation, which measures pixel loss between two images. ...
Article
Full-text available
Deep convolutional neural networks have significantly enhanced the performance of single image super-resolution in recent years. However, the majority of the proposed networks are single-channel, making it challenging to fully exploit the advantages of neural networks in feature extraction. This paper proposes a Multi-attention Fusion Recurrent Network (MFRN), which is a multiplexing architecture-based network. Firstly, the algorithm reuses the feature extraction part to construct the recurrent network. This technology reduces the number of network parameters, accelerates training, and captures rich features simultaneously. Secondly, a multiplexing-based structure is employed to obtain deep information features, which alleviates the issue of feature loss during transmission. Thirdly, an attention fusion mechanism is incorporated into the neural network to fuse channel attention and pixel attention information. This fusion mechanism effectively enhances the expressive power of each layer of the neural network. Compared with other algorithms, our MFRN not only exhibits superior visual performance but also achieves favorable results in objective evaluations. It generates images with sharper structure and texture details and achieves higher scores in quantitative tests such as image quality assessment.
... Figure 2b depicts several frames. We used the peak signal-to-noise ratio (PSNR) [39] and the structural similarity (SSIM) [40] as quantitative measures of performance, and the visual results of the reconstructed images are presented for qualitative evaluation. ...
Article
Full-text available
Dynamic Magnetic Resonance Imaging (DMRI) reconstruction is a challenging theme in image processing. A variety of dimensionality reduction methods using vectorization have been proposed. However, most of them gave rise to a loss of spatial and temporal information. To deal with this problem, this article develops a DMRI reconstruction method in a nonlocal framework by integrating the nonlocal sparse tensor with low-rank tensor regularization. The sparsity constraint employs the Tucker decomposition tensor sparse representation, and the t-product-based tensor nuclear norm is used to set the low-rank constraint. Both constraints are handled in a nonlocal framework, which can take advantage of data redundancy in DMRI. Furthermore, the nonlocal sparse tensor representation we proposed constructs a tensor dictionary in the spatio-temporal dimension, making sparsity more efficient. Consequently, our method can better exploit the multi-dimensional coherence of DMRI data due to its sparsity and low-rankness and the fact that it uses a different tensor decomposition-based method. The Alternating Direction Method of Multipliers (ADMM) has been used for optimization. Experimental results show that the performance of the proposed method is superior to several conventional methods.
... (1) Peak signal-to-noise ratio (PSNR) [53]: a quantitative standard for assessing the quality of image generation, measured in dB; the higher the PSNR, the less visual distortion there will be. The calculation formula is expressed in Equation (14). ...
Article
Full-text available
Images are easily polluted by noise in the process of acquisition and transmission, which will affect people's understanding and utilization of knowledge and information in images. Therefore, image denoising, as a classic problem, has received extensive attention from researchers. At present, many image denoising methods based on deep learning have been proposed and achieved good performance. However, most existing methods are insufficient in acquiring and utilizing crucial information in the image when removing noise under complex image denoising tasks such as blind denoising and real-world denoising, resulting in the loss of fine details in the reconstructed image. To overcome this shortcoming, in this paper, we propose a novel image denoising algorithm combining attention mechanism and residual UNet network, named Att-ResUNet. Specifically, we propose a novel UNet-based image denoising framework, which employs residual enhancement blocks and skip connections to form global–local residuals, which can fuse multi-scale global context and local features to more thoroughly capture and remove hidden noise in the image. A channel attention mechanism is introduced, which can better focus on the crucial information in the image and improve the denoising performance. In addition, we use adaptive average pooling for down-sampling, which can preserve more image structure information, reduce the loss of edge details, and adopt a residual learning strategy to enhance the learning and expressive capabilities of the denoising model. Extensive experiments on several publicly available standard datasets demonstrate the superiority of our method over 15 state-of-the-art methods and achieve excellent denoising performance. Compared with mainstream methods, our method outperforms current state-of-the-art methods by up to 0.76 dB and 1.10 dB on PSNR evaluation metrics on BSD68 and Set12 datasets, respectively. 
Notably, our method achieves an average PSNR value of 37.88 dB on the CC dataset in real-world denoising experiments, a significant improvement of 2.14 dB over the most advanced methods.
... We also leverage work on multi-resolution encoding, which has been shown to significantly improve reconstructions by encoding data as a multi-resolution subset of high-frequency embeddings, as measured by commonly used reconstruction metrics: Learned Perceptual Image Patch Similarity (LPIPS) [58], Structural Similarity (SSIM) [5], and peak signal-to-noise ratio (PSNR) [16]. ...
Preprint
Full-text available
We present a novel neural radiance model that is trainable in a self-supervised manner for novel-view synthesis of dynamic unstructured scenes. Our end-to-end trainable algorithm learns highly complex, real-world static scenes within seconds and dynamic scenes with both rigid and non-rigid motion within minutes. By differentiating between static and motion-centric pixels, we create high-quality representations from a sparse set of images. We perform extensive qualitative and quantitative evaluation on existing benchmarks and set the state-of-the-art on performance measures on the challenging NVIDIA Dynamic Scenes Dataset. Additionally, we evaluate our model performance on challenging real-world datasets such as Cholec80 and SurgicalActions160.
... Meanwhile, for the different feature descriptors, we use the MT to measure their computational efficiency. Finally, the traditional Peak Signal-to-Noise Ratio (PSNR) index [44] is used to measure alignment quality. ...
Article
Full-text available
Deep‐sea image is of great significance for exploring seabed resources. However, the information of a single image is limited. Besides, deep‐sea image with low contrast and colour distortion further restricts useful feature extraction. To address the issues above, this paper presents a multi‐channel fusion and accelerated‐KAZE (AKAZE) feature detection algorithm for deep‐sea image stitching. First, the authors restore deep‐sea image in LAB colour space and RGB colour space, respectively; in LAB space, the authors use homomorphic filtering in L colour channel, and in RGB space, the authors adopt multi‐scale Retinex with chromaticity preservation algorithm to adjust the colour information. Then, the authors blend two pre‐processed images with dark channel prior weighted coefficient. After that, the authors detect feature points with the AKAZE algorithm and obtain feature descriptors with Boosted Efficient Binary Local Image Descriptor. Finally, the authors match the feature points and warp deep‐sea images to obtain the stitched image. Experimental results demonstrate that the authors’ method generates high‐quality stitched image with minimized seam. Compared with state‐of‐the‐art algorithms, the proposed method has better quantitative evaluation, visual stitching results, and robustness.
... The evaluation indicators used in this paper are SSIM [60], PSNR [61], and LPIPS [62], all of which are used to evaluate image quality. ...
Article
Full-text available
INTRODUCTION: Video inpainting is an important task in computer vision and a key component of many practical applications, including video occlusion removal, traffic monitoring, and old-film restoration. Its goal is to fill missing regions of a video sequence with plausible content while maintaining temporal continuity and spatial consistency. OBJECTIVES: In previous studies, complex scenes with fast-moving objects or moving backgrounds often cause optical-flow estimation to fail, so current video inpainting algorithms fall short of practical requirements. To avoid optical-flow failure, this paper proposes a transformer-guided video inpainting model based on local spatial-temporal joints. METHODS: First, exploiting the rich spatial-temporal relationships between local flows, a Local Spatial-Temporal Joint Network (LSTN) comprising an encoder, a decoder, and a transformer module is designed to roughly inpaint the locally corrupted frames, and a deep flow network computes the local bidirectional corrupted flows. Then, the local corrupted optical-flow maps are fed into a Local Flow Completion Network (LFCN) with pseudo-3D convolution and an attention mechanism to obtain a complete set of bidirectional local optical-flow maps. Finally, the roughly inpainted local frames and the completed bidirectional local optical-flow maps are sent to the spatial-temporal transformer, which outputs the inpainted video frames. RESULTS: Experiments show that the algorithm achieves high-quality results on the video object removal task and improves on advanced methods across several metrics. CONCLUSION: The transformer-guided video inpainting algorithm based on local spatial-temporal joints obtains high-quality optical-flow information and inpainted videos.
... The peak signal-to-noise ratio (PSNR) [41] and SSIM, two commonly applied image-quality indicators, quantitatively analyze the reconstruction results and objectively evaluate the model's ability to retain detail during image reconstruction. SSIM was discussed in Section 3.1.5, ...
Article
Full-text available
Defect detection is crucial in quality control for fabric production. Deep-learning-based unsupervised reconstruction methods are widely recognized as a way to address the scarcity of fabric defect samples, high labeling costs, and insufficient prior knowledge. However, these methods have several weaknesses when reconstructing defect images into high-quality defect-free images, such as image blurring, defect residue, and texture inconsistency, which lead to false and missed detections. Therefore, this article proposes an unsupervised detection method for fabric surface defects based on a timestep-adaptive diffusion model. First, the Simplex Noise–Denoising Diffusion Probabilistic Model (SN-DDPM) is constructed to recursively optimize the distribution of the posterior latent vector, gradually approaching the probability distribution of the surface features of defect-free samples through multiple iterative diffusions. Meanwhile, a timestep-adaptive module dynamically adjusts the optimal timestep, enabling the model to adapt flexibly to different data distributions. During detection, the SN-DDPM reconstructs the defect images into defect-free images, and image differencing, frequency-tuned salient detection (FTSD), and threshold binarization are used to segment the defects. The results reveal that, compared with seven other unsupervised detection methods, the proposed method achieves higher F1 and IoU values, improved by at least 5.42% and 7.61%, respectively, demonstrating that the proposed method is effective and accurate.
... An Image Quality Assessment (IQA) of the image before and after the attack can measure the degradation of the original image from a perspective more aligned with human visual perception. In IQA-related studies, metrics such as Structural Similarity (SSIM [69]) and Peak Signal-to-Noise Ratio (PSNR [70]) measure image similarity based on low-level features such as image structure and pixel statistics, while Zhang et al. pointed out that human judgments of image similarity rely on higher-order image structure and context. To comprehensively measure the feature similarity of the adversarial examples, AMS consists of two sub-metrics, Average Deep Metrics Similarity (ADMS) and Average Low-level Metrics Similarity (ALMS). ...
Article
Full-text available
The vulnerability of deep-learning-based image classification models to erroneous conclusions in the presence of small perturbations crafted by attackers has prompted attention to the question of the models’ robustness level. However, the question of how to comprehensively and fairly measure the adversarial robustness of models with different structures and defenses as well as the performance of different attack methods has never been accurately answered. In this work, we present the design, implementation, and evaluation of Canary, a platform that aims to answer this question. Canary uses a common scoring framework that includes 4 dimensions with 26 (sub)metrics for evaluation. First, Canary generates and selects valid adversarial examples and collects metrics data through a series of tests. Then it uses a two-way evaluation strategy to guide the data organization and finally integrates all the data to give the scores for model robustness and attack effectiveness. In this process, we use Item Response Theory (IRT) for the first time to ensure that all the metrics can be fairly calculated into a score that can visually measure the capability. In order to fully demonstrate the effectiveness of Canary, we conducted large-scale testing of 15 representative models trained on the ImageNet dataset using 12 white-box attacks and 12 black-box attacks and came up with a series of in-depth and interesting findings. This further illustrates the capabilities and strengths of Canary as a benchmarking platform. Our paper provides an open-source framework for model robustness evaluation, allowing researchers to perform comprehensive and rapid evaluations of models or attack/defense algorithms, thus inspiring further improvements and greatly benefiting future work.
... The ideal evaluation metric would be one that measures the similarity between the generated sketches and the initial sketches. Unfortunately, none of the available evaluation metrics [20,67,73] for measuring image similarity can be applied directly to sketches. To address this limitation, we conduct a user study. ...
Preprint
Full-text available
Artificial Intelligence Generated Content (AIGC) has shown remarkable progress in generating realistic images. However, in this paper, we take a step "backward" and address AIGC for the most rudimentary visual modality of human sketches. Our focus is on the creative nature of sketches, and on the view that creative sketching should take the form of an interactive process. We further enable text to drive the sketch ideation process, allowing creativity to be freely defined, while simultaneously tackling the challenge of "I can't sketch". We present a method to generate controlled sketches using a text-conditioned diffusion model trained on pixel representations of images. Our proposed approach, referred to as SketchDreamer, integrates a differentiable rasteriser of Bezier curves that optimises an initial input to distil abstract semantic knowledge from a pretrained diffusion model. We utilise Score Distillation Sampling to learn a sketch that aligns with a given caption, which importantly enables both text and sketch to interact with the ideation process. Our objective is to empower non-professional users to create sketches and, through a series of optimisation processes, transform a narrative into a storyboard by expanding the text prompt while making minor adjustments to the sketch input. Through this work, we hope to inspire the way we create visual content, democratise the creative process, and inspire further research in enhancing human creativity in AIGC. The code is available at \url{https://github.com/WinKawaks/SketchDreamer}.
... In this article, to quantitatively measure the performance of different denoising approaches, we introduce the peak signal-to-noise ratio (PSNR) [62], structural similarity (SSIM) [63], and normalized root-mean-square error (NRMSE) [64] as evaluation indicators for the synthetic data examples. They are defined as follows. ...
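Of the three indicators named above, NRMSE is the least standardized: conventions differ on whether to normalize by the reference's dynamic range, its mean, or its maximum. A minimal sketch assuming range normalization (the function name and toy signals are ours):

```python
import math

def nrmse(reference, estimate):
    """Root-mean-square error normalized by the dynamic range of the
    reference signal (one common convention; others divide by the mean
    or the maximum instead)."""
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    rng = max(reference) - min(reference)
    return math.sqrt(mse) / rng

clean = [0, 2, 4, 6, 8, 10]   # toy noise-free trace
denoised = [1, 2, 3, 6, 9, 10]  # toy denoiser output
print(nrmse(clean, denoised))  # ≈ 0.0707
```

Lower is better, with 0 indicating a perfect reconstruction; the normalization makes scores comparable across signals with different amplitudes.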
Article
Full-text available
Random noise attenuation is an essential procedure in seismic data processing and is crucial for improving the signal-to-noise ratio (SNR) of seismic data. Recently, deep learning (DL) has emerged as a promising tool for seismic data denoising. Although DL-based methods have excellent learning and representation capabilities, they lack the interpretability of traditional hand-crafted denoisers. Furthermore, the supervised learning used in most previous work is usually not feasible for real applications, as it requires constructing a great number of noisy/noise-free training data pairs. We develop a total variation (TV) regularized self-supervised Bayesian deep learning model, dubbed TVRBNN, which combines the advantages of Bayesian deep neural networks (BNN) and TV regularization for seismic random noise suppression. A significant characteristic of the proposed model is that it does not rely on ground-truth seismic data as training labels and solely utilizes the observed noisy data to train TVRBNN. Synthetic and field experiments are conducted to verify the effectiveness of the TVRBNN model in seismic data denoising. Compared with classical non-learning denoising approaches and a state-of-the-art self-supervised DL model, TVRBNN can effectively enhance the lateral continuity of seismic events and preserve geological structure information while effectively removing random noise.
... Thus, it is essential to select appropriate evaluation indexes for the reconstructed image. To quantitatively compare the reconstructed image with the original image, commonly used metrics, including the structural similarity (SSIM) [27] and peak signal-to-noise ratio (PSNR) [28], are utilized. ...
Article
Full-text available
Image super-resolution (SR) reconstruction technology can improve the quality of low-resolution (LR) images. There are many available deep learning networks different from traditional machine learning algorithms. However, these networks are usually prone to poor performance on complex computation, vanishing gradients, and loss of useful information. In this work, we propose a sub-pixel convolutional neural network (SPCNN) for image SR reconstruction. First, to reduce the strong correlation, the RGB mode was translated into YCbCr mode, and the Y channel data was chosen as the input LR image. Meanwhile, the LR image was chosen as the network input to reduce computation instead of the interpolation reconstructed image as used in the super-resolution convolutional neural network (SRCNN). Then, two convolution layers were built to obtain more features, and four non-linear mapping layers were used to achieve different level features. Furthermore, the residual network was introduced to transfer the feature information from the lower layer to the higher layer to avoid the gradient explosion or vanishing gradient phenomenon. Finally, the sub-pixel convolution layer based on up-sampling was designed to reduce the reconstruction time. Experiments on three different data sets proved that the proposed SPCNN performs superiorly to the Bicubic, sparsity constraint super-resolution (SCSR), anchored neighborhood regression (ANR), and SRCNN methods on reconstruction precision and time consumption.
... As neural networks become deeper, even small changes to individual parameters can have a significant impact on the final output, rendering traditional image watermarking inapplicable. Similarly, the Peak Signal-to-Noise Ratio (PSNR) [11], commonly used in the image field to indicate the similarity between images, is not a suitable indicator of model variation in neural networks. Moreover, white-box watermarks often require replacing the least significant bits (LSBs), and watermarks that have little impact on small models can lead to a significant performance decrease when placed in deep neural models. ...
Preprint
Artificial Intelligence (AI) has found wide application, but also poses risks due to unintentional or malicious tampering during deployment. Regular checks are therefore necessary to detect and prevent such risks. Fragile watermarking is a technique used to identify tampering in AI models. However, previous methods have faced challenges including risks of omission, additional information transmission, and inability to locate tampering precisely. In this paper, we propose a method for detecting tampered parameters and bits, which can be used to detect, locate, and restore parameters that have been tampered with. We also propose an adaptive embedding method that maximizes information capacity while maintaining model accuracy. Our approach was tested on multiple neural networks subjected to attacks that modified weight parameters, and our results demonstrate that our method achieved great recovery performance when the modification rate was below 20%. Furthermore, for models where watermarking significantly affected accuracy, we utilized an adaptive bit technique to recover more than 15% of the accuracy loss of the model.
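As context for the LSB-replacement idea mentioned in the snippet above, here is a toy sketch of embedding and reading a watermark bit in the least significant bit of integer-quantized parameters (the function names and values are ours; real fragile-watermarking schemes operate on the bit representations of float weights or on quantized tensors):

```python
def embed_bit(param, bit):
    """Overwrite the least significant bit of an integer-quantized
    parameter with a watermark bit. Changes the value by at most 1,
    which is what keeps the scheme low-impact on model accuracy."""
    return (param & ~1) | (bit & 1)

def extract_bit(param):
    """Read the watermark bit back out of the LSB."""
    return param & 1

# Embed the bit string 1, 0, 1, 1 into four quantized parameters.
params = [52, 37, 90, 13]
bits = [1, 0, 1, 1]
marked = [embed_bit(p, b) for p, b in zip(params, bits)]
recovered = [extract_bit(p) for p in marked]
# Any tampering that flips a marked parameter's LSB breaks the
# recovered bit string, which is how the fragile check detects it.
```

The fragility is the point: unlike robust watermarks, the embedded pattern is meant to be destroyed by tampering, so a mismatch on extraction both detects and localizes the modification.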
... To evaluate the proposed method, peak signal-to-noise ratio (PSNR) (Huynh-Thu and Ghanbari, 2008) and structural similarity index (SSIM) (Horé and Ziou, 2010; Zhou et al., 2004) are used as quantitative measurements for the XCAT phantom data. The PSNR expresses the ratio between the (denoised) low-dose CT image and the corresponding full-dose CT image as PSNR = 10 log10(MAX² / MSE), where MAX is the maximum pixel value and MSE is the mean squared error between the two images. ...
Article
Full-text available
Electrocardiogram (ECG)-gated multi-phase computed tomography angiography (MP-CTA) is frequently used for diagnosis of coronary artery disease. Radiation dose may become a potential concern as the scan needs to cover a wide range of cardiac phases during a heart cycle. A common method to reduce radiation is to limit the full dose acquisition to a predefined range of phases while reducing the radiation dose for the rest. Our goal in this study is to develop a spatiotemporal deep learning method to enhance the quality of low-dose CTA images at phases acquired at reduced radiation dose. Recently, we demonstrated that a deep learning method, Cycle-Consistent generative adversarial networks (CycleGAN), could effectively denoise low-dose CT images through spatial image translation without labeled image pairs in both low-dose and full-dose image domains. As CycleGAN does not utilize the temporal information in its denoising mechanism, we propose to use RecycleGAN, which could translate a series of images ordered in time from the low-dose domain to the full-dose domain through an additional recurrent network. To evaluate RecycleGAN, we use the XCAT phantom program, a highly realistic simulation tool based on real patient data, to generate MP-CTA image sequences for 18 patients (14 for training, 2 for validation and 2 for test). Our simulation results show that RecycleGAN can achieve better denoising performance than CycleGAN based on both visual inspection and quantitative metrics. We further demonstrate the superior denoising performance of RecycleGAN using clinical MP-CTA images from 50 patients.
... A) Peak signal-to-noise ratio (PSNR) [58]: a quantitative standard for assessing the quality of generated images, measured in dB; the higher the PSNR, the less the visual distortion. Its calculation formula is given in (14). B) Structural Similarity (SSIM) [59]: a measure of similarity between two images, comparing labels and generated results from three aspects: brightness, contrast, and structure. ...
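The three-way brightness/contrast/structure comparison behind SSIM can be sketched in a few lines. Below is a single-window simplification in plain Python; the standard metric averages SSIM over local (e.g. 11x11 Gaussian) windows, and the stabilizing constants use the usual K1 = 0.01, K2 = 0.03. The function name and toy arrays are ours:

```python
def ssim_global(x, y, max_val=255.0):
    """Single-window SSIM between two equal-size images given as flat
    pixel lists. A simplified sketch: means compare brightness,
    variances compare contrast, and the covariance compares structure."""
    n = len(x)
    mu_x, mu_y = sum(x) / n, sum(y) / n
    var_x = sum((v - mu_x) ** 2 for v in x) / n
    var_y = sum((v - mu_y) ** 2 for v in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    c1 = (0.01 * max_val) ** 2  # usual K1
    c2 = (0.03 * max_val) ** 2  # usual K2
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))

x = [52, 55, 61, 59, 79, 61, 76, 41]
y = [v + 10 for v in x]  # a uniformly brightened copy
# ssim_global(x, x) is 1.0; ssim_global(x, y) drops below 1
# because the brightness (mean) term penalizes the shift.
```

Unlike PSNR, a pure pixel-error measure, SSIM stays high for distortions that preserve local structure, which is why the two are usually reported together.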
Article
Full-text available
Deep convolutional neural networks (DCNN) have attracted considerable interest in image denoising because of their excellent learning capacity. However, most existing methods cannot fully extract and utilize fine features during denoising, resulting in insufficient detailed information and limited expressive ability, especially in complex denoising tasks. Motivated by these challenges, this paper proposes a feature-enhanced multi-scale residual network (FEMRNet), mainly comprising an enhanced feature extraction block (EFEB), a multi-scale residual backbone (MSRB), a detail information recovery block (DIRB), and a merge reconstruction block (MRB). Specifically, the EFEB increases the receptive field through dilated convolution with different expansion factors, and multi-scale convolution further enhances the features. The MSRB integrates global and local feature information through residual denoising blocks and skip connections to enhance the inferencing ability of denoising models. The DIRB finely extracts information from the image and incorporates temporal information via a convGRU to restore image details. Finally, the MRB constructs a clean image by subtracting the fused noise mapping obtained from the MSRB and DIRB from the given noisy image. Additionally, extensive experiments are conducted on commonly used denoising benchmarks. Comparisons with state-of-the-art methods and ablation experiments show that our method achieves promising denoising performance.
Preprint
Full-text available
Low-light images often suffer from noise and color distortion. Object detection, semantic segmentation, instance segmentation, and other tasks are challenging when working with low-light images because of image noise and chromatic aberration. We also found that the conventional Retinex theory loses information in adjusting the image for low-light tasks. In response to the aforementioned problem, this paper proposes an algorithm for low illumination enhancement. The proposed method, KinD-LCE, uses a light curve estimation module to enhance the illumination map in the Retinex decomposed image, improving the overall image brightness. An illumination map and reflection map fusion module were also proposed to restore the image details and reduce detail loss. Additionally, a total variation loss function was applied to eliminate noise. The proposed method was trained using the GladNet dataset and tested with the Low-Light dataset. The ExDark dataset was used for validation in downstream tasks. The benchmark experiments demonstrated that the proposed algorithm achieved PSNR (19.7216) and SSIM (0.8213) values, which are close to state-of-the-art results.
Article
BACKGROUND: An effective method for achieving low-dose CT is to keep the number of projection angles constant while reducing radiation dose at each angle. However, this leads to high-intensity noise in the reconstructed image, adversely affecting subsequent image processing, analysis, and diagnosis. OBJECTIVE: This paper proposes a novel Channel Graph Perception based U-shaped Transformer (CGP-Uformer) network, aiming to achieve high-performance denoising of low-dose CT images. METHODS: The network consists of convolutional feed-forward Transformer (ConvF-Transformer) blocks, a channel graph perception block (CGPB), and spatial cross-attention (SC-Attention) blocks. The ConvF-Transformer blocks enhance the ability of feature representation and information transmission through the CNN-based feed-forward network. The CGPB introduces Graph Convolutional Network (GCN) for Channel-to-Channel feature extraction, promoting the propagation of information across distinct channels and enabling inter-channel information interchange. The SC-Attention blocks reduce the semantic difference in feature fusion between the encoder and decoder by computing spatial cross-attention. RESULTS: By applying CGP-Uformer to process the 2016 NIH AAPM-Mayo LDCT challenge dataset, experiments show that the peak signal-to-noise ratio value is 35.56 and the structural similarity value is 0.9221. CONCLUSIONS: Compared to the other four representative denoising networks currently, this new network demonstrates superior denoising performance and better preservation of image details.
Article
Full-text available
This research presents a method for eliminating salt-and-pepper noise using a deep learning system. The method consists of two steps. In the first step, deep learning systems are trained to identify the noise densities. The second step involves improving the AWMF (Adaptive Weighted Median Filter) method to remove the noise. The deep learning system is adapted to define the sub-window size used in the restoration process. The experimental results demonstrate that the improved AWMF method outperforms the state-of-the-art methods at high noise densities and requires less processing time.
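For contrast with the adaptive weighted variant the paper improves, the classical baseline for salt-and-pepper noise is a fixed-window median filter, which can be sketched as follows (a toy implementation, ours not the paper's; border pixels are copied unchanged):

```python
def median_filter3(img):
    """Apply a 3x3 median filter to a 2-D grayscale image given as a
    list of lists. Impulse (salt-and-pepper) outliers are replaced by
    the median of their neighborhood; borders are left as-is."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            window = sorted(img[i + di][j + dj]
                            for di in (-1, 0, 1) for dj in (-1, 0, 1))
            out[i][j] = window[4]  # median of the 9 window values
    return out

# A single "salt" impulse (255) in a flat region of 10s is removed:
noisy = [[10, 10, 10], [10, 255, 10], [10, 10, 10]]
clean = median_filter3(noisy)  # clean[1][1] == 10
```

Adaptive variants like AWMF grow the window and weight its entries based on the local noise density, which is exactly the sub-window-size decision the cited paper delegates to the deep learning system.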
Article
Full-text available
Random fluctuations are inescapable feature in biological systems, but appropriate intensity of randomness can effectively facilitate information transfer and memory encoding within the nervous system. In the study, a modified spiking neuron–astrocyte network model with excitatory–inhibitory balance and synaptic plasticity is established. This model considers external input noise, and allows investigating the effects of intrinsic random fluctuations on working memory tasks. It is found that the astrocyte network, acting as a low-pass filter, reduces the noise component of the total input currents and improves the recovered images. The memory performance is enhanced by selecting appropriate intensity of random fluctuations, while excessive intensity can inhibit signal transmission of network. As the intensity of random fluctuations gradually increases, there exists a maximum value of the working memory performance. The cued recall of the network markedly decreases excessive input noise relative to test images. Meanwhile, a greater contrast effect is observed as the external input noise increases. In addition, synaptic plasticity reduces the firing rates and firing peaks of neurons, thus stabilizing the working memory activity during the test. The outcomes of this study may provide some inspirations for comprehending the role of random fluctuations in working memory mechanisms and neural information processing within the cerebral cortex.
Article
Image retrieval, empowered by deep metric learning, is undoubtedly a building block in today’s media-sharing practices, but it also poses a severe risk of digging user privacy via retrieval. State-of-the-art countermeasures are built on adversarial learning, which would spoil the image-sharing mood with significant latency. To relieve the cumbersome experience of such data-centric approaches, we propose a plug-and-play privacy-preserving design (MIP) against image retrieval violations by exploring the rule-based triggering characteristics of model backdoors. The basic idea is to inject a privacy-preserving backdoor into the global retrieval model via backdoor learning, thus preventing shared images with such triggers from being searched. At its core, two types of triplet loss functions are invented, namely, imperceptible loss for normal retrieval performance and privacy-sensitive loss for disturbing retrieval with deliberate privacy backdoor injection. Extensive experiments on four widely used, realistic datasets showcase that MIP provides an outstanding privacy-preserving (backdoor) success rate, e.g., the poisoned retrieval mAP could be reduced to 0.33% (98.12%↓) in CUB-200, 0.04% (99.84%↓) in In-Shop, 0.64% (99.59%↓) in CARS196 and 0.01% (99.98%↓) in SOP, respectively, while maintaining similar normal retrieval performance (average 0.02%↓); provides a superior efficiency (7 orders of latency reduction) than the baselines. Besides, as a model-centric solution, MIP yields imperceptible visual changes and is demonstrated to resist potential black-box defenses (e.g., image filtering) and white-box defenses (e.g., fine-pruning). The code and data will be made available at https://github.com/lqsunshine/MIP.
Article
Orthopedic spine disease is one of the most common diseases in the clinic. The diagnosis of spinal orthopedic injury is an important basis for the treatment of spinal orthopedic diseases. Due to the complexity of the spine structure, doctors usually need to rely on orthopedic CT image data for accurate diagnosis. In some cases, such as in under-resourced areas or emergency situations, it is difficult for doctors to make accurate diagnoses using only 2D x-ray images due to the lack of 3D imaging equipment or time constraints. Therefore, an approach based on 2D x-ray images is needed to solve this problem. In this paper, a novel 3D spine reconstruction technique based on 2D orthogonal x-ray images (3DSRNet) is designed. 3DSRNet uses a generative adversarial network architecture and novel modules to make 3D spine reconstruction more accurate and efficient. A spine reconstruction CNN-transformer framework (SRCT) is employed to effectively integrate local bone-surface information and long-range spinal structure information. A spine reconstruction texture framework (SRTE) is used to extract spine texture features to enhance pixel-level reconstruction. Experiments show that 3DSRNet achieves excellent 3D spine reconstruction results on multiple metrics including PSNR (45.4666 dB), SSIM (0.8850), CS (0.7662), MAE (23.6696), MSE (9016.1044), and LPIPS (0.0768).
Article
Face recognition is an essential technology in intelligent transportation and security within smart cities. Nevertheless, face images taken in nighttime urban environments often suffer from low brightness, small sizes, and low resolution, which pose significant challenges for accurate face feature recognition. To address this issue, we propose the Low-light Small-target Face Enhancement (LSFE) method, a collaborative learning-based image brightness enhancement approach specifically designed for small-target faces in low-light environments. LSFE employs a multilevel feature stratification module to acquire detailed face image features at different levels, revealing hidden facial image information within the dark. In addition, we design a network combining collaborative learning and self-attention mechanisms, which effectively captures long-distance pixel dependencies in low-brightness face images and enhances their brightness in a stepwise manner. The enhanced feature maps are then fused through a branch fusion module. Experimental results demonstrate that LSFE can more effectively enhance the luminance of small-target face images in low-light scenes while retaining more visual information, compared to other existing methods.
Article
An abstract is not available.
Article
A unified approach to the coder control of video coding standards such as MPEG-2, H.263, MPEG-4, and the draft video coding standard H.264/AVC (advanced video coding) is presented. The performance of the various standards is compared by means of PSNR and subjective testing results. The results indicate that H.264/AVC compliant encoders typically achieve essentially the same reproduction quality as encoders that are compliant with the previous standards while typically requiring 60% or less of the bit rate.