Article

Scope of validity of PSNR in image/video quality assessment

Wiley
Electronics Letters
Authors: Q. Huynh-Thu, M. Ghanbari

Abstract

Experimental data are presented that clearly demonstrate the scope of application of peak signal-to-noise ratio (PSNR) as a video quality metric. It is shown that as long as the video content and the codec type are not changed, PSNR is a valid quality measure. However, when the content is changed, correlation between subjective quality and PSNR is highly reduced. Hence PSNR cannot be a reliable method for assessing the video quality across different video contents.
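As an illustration of the kind of analysis the abstract describes (not the authors' actual experimental protocol), the sketch below computes PSNR with NumPy and correlates it with hypothetical subjective scores; all data values and names are illustrative assumptions.

import numpy as np
from scipy.stats import pearsonr

def psnr(reference, distorted, max_value=255.0):
    # Standard PSNR in dB between two frames of identical size.
    mse = np.mean((reference.astype(np.float64) - distorted.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(max_value ** 2 / mse)

# Hypothetical per-sequence mean PSNR values (dB) and subjective MOS scores.
mean_psnr = np.array([38.2, 35.7, 41.0, 30.4])
mos = np.array([4.1, 3.6, 4.4, 2.9])

# Correlation between the objective metric and subjective quality.
r, _ = pearsonr(mean_psnr, mos)
print(f"Pearson correlation between PSNR and MOS: {r:.3f}")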


... Finally, three picture quality assessment measures are used to evaluate segmentation outcomes. These are Peak Signal-to-Noise Ratio (PSNR) [56], Structural Similarity ...
... The PSNR is the most widely used statistic for evaluating image quality, with higher values suggesting less distortion. It is defined as follows [56]: ...
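For context, the standard PSNR definition for an 8-bit reference image I and distorted image K of size M×N (the usual formulation, not necessarily the exact one given in [56]) is:

\mathrm{PSNR} = 10\,\log_{10}\!\left(\frac{255^2}{\mathrm{MSE}}\right), \qquad \mathrm{MSE} = \frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N}\bigl(I(i,j)-K(i,j)\bigr)^2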
Article
Full-text available
To improve the performance of the original backtracking search algorithm (BSA), this work combines BSA with a centralized population initialization and an elitism-based local escape operator and proposes a new, improved variant of BSA called CiBSA. BSA's performance is improved by centralized population initialization regarding search capability, convergence speed, and capacity to jump out of the local optimum. The suggested algorithm's elitism-based local escape operator prevents BSA's search from stagnating and enhances convergence rates and local search efficiency. The performance of the proposed algorithm has been tested on a set of 54 test suites. An extensive comparison with other recent meta-heuristic algorithms (MA) demonstrates that both methodologies substantially assist BSA in boosting the quality of its solutions and speeding up the convergence rate. The proposed CiBSA is then applied to a real-world problem: image segmentation using Kapur's entropy function. The proposed CiBSA is successfully applied to the segmentation of COVID-19 CT images using Kapur's entropy for grey-scale images. Considering the performance metrics Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index (SSIM), Feature Similarity Index (FSIM), and Intersection over Union (IoU), the suggested CiBSA outperformed conventional MA algorithms. In COVID-19 CT image segmentation trials, CiBSA shows superior segmentation capability and adaptability at varied threshold levels compared with other approaches. As a result, CiBSA has considerable potential for enhancing COVID-19 diagnosis accuracy.
... We conduct knowledge distillation experiments using five publicly available datasets: Rain1400 [11] and Test1200 [12] for deraining, Gopro [13] and HIDE [14] for deblurring, and SIDD [15] for denoising. To evaluate restoration performance, we use two full-reference metrics: Peak Signal-to-Noise Ratio (PSNR [16]) in dB and Structural Similarity Index (SSIM [17]). Model complexity is assessed by measuring FLOPs and inference time on each 512 × 512 image. ...
... As shown in Table III, our distilled models exhibit significantly lower computational complexity compared to other models. At the same time, their performance in terms of PSNR [16] and SSIM [17] is considerably better than that of lightweight models, approaching the performance of the more complex MPRNet [1]. ...
Preprint
Model compression through knowledge distillation has seen extensive application in classification and segmentation tasks. However, its potential in image-to-image translation, particularly in image restoration, remains underexplored. To address this gap, we propose a Simultaneous Learning Knowledge Distillation (SLKD) framework tailored for model compression in image restoration tasks. SLKD employs a dual-teacher, single-student architecture with two distinct learning strategies: Degradation Removal Learning (DRL) and Image Reconstruction Learning (IRL), simultaneously. In DRL, the student encoder learns from Teacher A to focus on removing degradation factors, guided by a novel BRISQUE extractor. In IRL, the student decoder learns from Teacher B to reconstruct clean images, with the assistance of a proposed PIQE extractor. These strategies enable the student to learn from degraded and clean images simultaneously, ensuring high-quality compression of image restoration models. Experimental results across five datasets and three tasks demonstrate that SLKD achieves substantial reductions in FLOPs and parameters, exceeding 80%, while maintaining strong image restoration performance.
... For quantitative analysis of image quality, we employ two full-reference metrics: Peak Signal-to-Noise Ratio (PSNR) [18] in dB, and Structural Similarity Index (SSIM) [19]. To assess model complexity, we measure FLOPs and inference time on each 512×512 image. ...
Preprint
Transformer-based encoder-decoder models have achieved remarkable success in image-to-image transfer tasks, particularly in image restoration. However, their high computational complexity, manifested in elevated FLOPs and parameter counts, limits their application in real-world scenarios. Existing knowledge distillation methods in image restoration typically employ lightweight student models that directly mimic the intermediate features and reconstruction results of the teacher, overlooking the implicit attention relationships between them. To address this, we propose a Soft Knowledge Distillation (SKD) strategy that incorporates a Multi-dimensional Cross-net Attention (MCA) mechanism for compressing image restoration models. This mechanism facilitates interaction between the student and teacher across both channel and spatial dimensions, enabling the student to implicitly learn the attention matrices. Additionally, we employ a Gaussian kernel function to measure the distance between student and teacher features in kernel space, ensuring stable and efficient feature learning. To further enhance the quality of reconstructed images, we replace the commonly used L1 or KL divergence loss with a contrastive learning loss at the image level. Experiments on three tasks (image deraining, deblurring, and denoising) demonstrate that our SKD strategy significantly reduces computational complexity while maintaining strong image restoration capabilities.
... We utilize a set of quantitative evaluation metrics to assess the performance of our method. Specifically, we employ the peak signal-to-noise ratio (PSNR) [56], structural similarity index (SSIM) [57], and spectral angle mapper (SAM) [58]. PSNR is employed to quantify the pixel-by-pixel disparity between the demosaicked and super-resolved spectral images and the ground truth (GT). ...
Article
Full-text available
In spectral imaging, the constraints imposed by hardware often lead to a limited spatial resolution within spectral filter array images. On the other hand, the process of demosaicking is challenging due to intricate filter patterns and a strong spectral cross correlation. Moreover, demosaicking and super resolution are usually approached independently, overlooking the potential advantages of a joint solution. To this end, we use a two-branch framework, namely a pseudo-panchromatic image network and a pre-demosaicking sub-branch coupled with a novel deep residual demosaicking and super resolution module. This holistic approach ensures a more coherent and optimized restoration process, mitigating the risk of error accumulation and preserving image quality throughout the reconstruction pipeline. Our experimental results underscore the efficacy of the proposed network, showcasing improved performance both qualitatively and quantitatively when compared to the sequential combination of state-of-the-art demosaicking and super resolution. With our proposed method, we obtained on the ARAD-1K dataset an average PSNR of 48.02 dB for demosaicking only, equivalent to the best method of the state of the art. Moreover, for joint demosaicking and super resolution our model averages 35.26 dB and 26.29 dB, respectively, for ×2 and ×4 upscaling, outperforming the state-of-the-art sequential approach. The codes and datasets are available at https://github.com/HamidFsian/DRDmSR.
... Evaluation Metrics. To verify the effectiveness of our Qffusion on portrait video animation, we use the average Peak Signal-to-Noise Ratio (PSNR) [24], Structural Similarity Index Measure (SSIM) [52], and Learned Perceptual Image Patch Similarity (LPIPS) [61]. Besides, we apply Warp Error [10,16] to measure the temporal consistency of generated videos. ...
Preprint
This paper presents Qffusion, a dual-frame-guided framework for portrait video editing. Specifically, we consider a design principle of "animation for editing", and train Qffusion as a general animation framework from two still reference images, while we can use it for portrait video editing easily by applying modified start and end frames as references during inference. Leveraging the generative power of Stable Diffusion, we propose a Quadrant-grid Arrangement (QGA) scheme for latent re-arrangement, which arranges the latent codes of two reference images and those of four facial conditions into a four-grid fashion, separately. Then, we fuse features of these two modalities and use self-attention for both appearance and temporal learning, where representations at different times are jointly modeled under QGA. Our Qffusion can achieve stable video editing without additional networks or complex training stages, where only the input format of Stable Diffusion is modified. Further, we propose a Quadrant-grid Propagation (QGP) inference strategy, which enjoys a unique advantage in stable arbitrary-length video generation by processing reference and condition frames recursively. Through extensive experiments, Qffusion consistently outperforms state-of-the-art techniques on portrait video editing.
... Three image quality evaluation indicators are adopted: SSIM [61], MS-SSIM [62], and PSNR [63], to assess the visual quality of the generated face images. Identity preservation is then evaluated by the face recognition (FR) rate, with ResNet-50 [64] and MobileFace [65] serving as the verifiers. ...
Article
Full-text available
Face sketch and photo synthesis is widely applied in industry and information fields, such as the entertainment business and heterogeneous face retrieval. The key challenge lies in completing a face transformation with both good visual effects and face identity preservation. However, existing methods still find it difficult to obtain a good synthesis due to the large gap between the two different face domains. Recently, diffusion models have achieved great success in image synthesis, which allows us to extend their application to such a face generation task. Thus, we propose IPDM, which constructs a mapping of latent representations for domain-adaptive face features. The other proposed module, IDP, utilizes auxiliary features to correct the latent features through their directions and supplementary identity information, so that the generation can keep the face identity unchanged. The various evaluation results show that our method is superior to state-of-the-art methods in both identity preservation and visual effects.
... All experiments were performed in MATLAB (R2021a); the CPU of the computer was an AMD Ryzen 7 5800H with Radeon Graphics (16 CPUs), and the memory was 16 GB. [Table of compared methods: TC with L+S priors: MF-TV [29], SPC-TV [30], TNN-TV [31], t-CTV [17]; TRPCA with L+S priors: LRTV [32], LRTDTV [30], TLRHTV [33], t-CTV.] For a quantitative comparison, three picture quality indices (PQIs) were employed, including the peak signal-to-noise ratio (PSNR [34]), structural similarity (SSIM [35]), and feature similarity (FSIM [36]). The larger the PSNR, SSIM, and FSIM, the better the performance of the recovery model. ...
Article
Full-text available
Tensor restoration finds applications in various fields, including data science, image processing, and machine learning, where the global low-rank property is a crucial prior. As the convex relaxation to the tensor rank function, the traditional tensor nuclear norm is used by directly adding all the singular values of a tensor. Considering the variations among singular values, nonconvex regularizations have been proposed to approximate the tensor rank function more effectively, leading to improved recovery performance. In addition, the local characteristics of the tensor could further improve detail recovery. Currently, the gradient tensor is explored to effectively capture the smoothness property across tensor dimensions. However, previous studies considered the gradient tensor only within the context of the nuclear norm. In order to better simultaneously represent the global low-rank property and local smoothness of tensors, we propose a novel regularization, the Tensor-Correlated Total Variation (TCTV), based on the nonconvex Geman norm and total variation. Specifically, the proposed method minimizes the nonconvex Geman norm on singular values of the gradient tensor. It enhances the recovery performance of a low-rank tensor by simultaneously reducing estimation bias, improving approximation accuracy, preserving fine-grained structural details and maintaining good computational efficiency compared to traditional convex regularizations. Based on the proposed TCTV regularization, we develop TC-TCTV and TRPCA-TCTV models to solve completion and denoising problems, respectively. Subsequently, the proposed models are solved by the Alternating Direction Method of Multipliers (ADMM), and the complexity and convergence of the algorithm are analyzed. Extensive numerical results on multiple datasets validate the superior recovery performance of our method, even in extreme conditions with high missing rates.
... Evaluation Metrics: We use PSNR [30] and SSIM [31] as the evaluation metrics. ...
Article
Full-text available
Image deraining holds great potential for enhancing the vision of autonomous vehicles in rainy conditions, contributing to safer driving. Previous works have primarily focused on employing a single network architecture to generate derained images. However, they often fail to fully exploit the rich prior knowledge embedded in the scenes. Particularly, most methods overlook the depth information that can provide valuable context about scene geometry and guide more robust deraining. In this work, we introduce a novel learning framework that integrates multiple networks: an AutoEncoder for deraining, an auxiliary network to incorporate depth information, and two supervision networks to enforce feature consistency between rainy and clear scenes. This multi-network design enables our model to effectively capture the underlying scene structure, producing clearer and more accurately derained images, leading to improved object detection for autonomous vehicles. Extensive experiments on three widely used datasets demonstrated the effectiveness of our proposed method.
... Undetectability in video steganography can be assessed from two perspectives: resistance to automated steganalysis and imperceptibility to the human eye (Huynh-Thu & Ghanbari, 2008; Wang et al., 2004). While much of the existing research has focused on developing methods that evade automated detection systems (Hossain et al., 2024), this study emphasizes human imperceptibility, particularly in scenarios where analog distortions or transformations render automated systems less effective (Anderson & Petitcolas, 1998; Katzenbeisser & Petitcolas, 2000). ...
Article
Full-text available
This paper presents a comprehensive computational system designed to evaluate the undetectability of video steganography from human perspective. The system assesses the perceptibility of steganographic modifications to the human eye while simultaneously determining the minimum encoding level required for successful automated decoding of hidden messages. The proposed architecture comprises four subsystems: steganogram database preparation, human evaluation, automated decoding, and comparative analysis. The system was tested using example steganographic techniques applied to a dataset of video files. Experimental results revealed the thresholds of human-level undetectability and automated decoding for each technique, enabling the identification of critical differences between human and algorithmic detection capabilities. This research contributes to the field of steganography by offering a novel framework for evaluating the trade-offs between human perception and automated decoding in video-based information hiding. The system serves as a tool for advancing the development of more secure and reliable video steganographic techniques.
... Evaluation metrics: We use PSNR [30] and SSIM [31] as the evaluation metrics. ...
Preprint
Image deraining holds great potential for enhancing the vision of autonomous vehicles in rainy conditions, contributing to safer driving. Previous works have primarily focused on employing a single network architecture to generate derained images. However, they often fail to fully exploit the rich prior knowledge embedded in the scenes. Particularly, most methods overlook the depth information that can provide valuable context about scene geometry and guide more robust deraining. In this work, we introduce a novel learning framework that integrates multiple networks: an AutoEncoder for deraining, an auxiliary network to incorporate depth information, and two supervision networks to enforce feature consistency between rainy and clear scenes. This multi-network design enables our model to effectively capture the underlying scene structure, producing clearer and more accurately derained images, leading to improved object detection for autonomous vehicles. Extensive experiments on three widely-used datasets demonstrated the effectiveness of our proposed method.
... Specifically, CDA is the performance on the clean test set, i.e., the ratio of trigger-free test images that are correctly predicted to their ground-truth labels, and the ASR is the performance on the poisoned test set, i.e., the ratio of poisoned images that are correctly classified as the target attack labels. For invisibility evaluation, we compare clean and poisoned images with PSNR [45]. ...
Article
Full-text available
Backdoor attacks aim to implant hidden backdoors into Deep Neural Networks (DNNs) so that the victim models perform well on clean images, whereas their predictions would be maliciously changed on poisoned images. However, most existing backdoor attacks lack the invisibility and robustness required for real-world applications, especially when it comes to resisting image compression techniques, such as JPEG and WEBP. To address these issues, in this paper, we propose a Backdoor Attack Method based on Trigger Generation (BATG). Specifically, a deep convolutional generative network is utilized as the trigger generation model to generate effective trigger images and an Invertible Neural Network (INN) is utilized as the trigger injection model to embed the generated trigger images into clean images to create poisoned images. Furthermore, a noise layer is used to simulate image compression attacks for adversarial training, enhancing the robustness against real-world image compression. Comprehensive experiments on benchmark datasets demonstrate the effectiveness, invisibility, and robustness of the proposed BATG.
... In recent years, one of the most actively investigated directions in research on steganographic techniques has been the domain of machine learning, including the use of neural networks. One of the latest examples of a novel approach advancing steganographic techniques can be found in the paper [8], which introduces an innovative invisible ...
• Undetectability: this metric evaluates a steganographic technique's ability to conceal information within a steganogram so that its presence remains imperceptible to both human perception and statistical detection methods [10]. High undetectability signifies that the alterations introduced into the cover medium are minimally noticeable and resistant to detection [11].
• Capacity: this metric measures the amount of information that can be embedded within a digital cover medium without causing noticeable degradation in its quality. ...
Article
Full-text available
This study presents a detailed characterization of iterative steganography, a unique class of information-hiding techniques, and proposes a formal mathematical model for their description. A novel quantitative measure, the Incremental Information Function (IIF), is introduced to evaluate the process of information gain in iterative steganographic methods. The IIF offers a comprehensive framework for analyzing the step-by-step process of embedding information into a cover medium, focusing on the cumulative effects of each iteration in the encoding and decoding cycles. The practical application and efficacy of the proposed method are demonstrated using detailed case studies in video steganography. These examples highlight the utility of the IIF in delineating the properties and characteristics of iterative steganographic techniques. The findings reveal that the IIF effectively captures the incremental nature of information embedding and serves as a valuable tool for assessing the robustness and capacity of steganographic systems. This research provides significant insights into the field of information hiding, particularly in the development and evaluation of advanced steganographic methods. The IIF emerges as an innovative and practical analytical tool for researchers, offering a quantitative approach to understanding and optimizing iterative steganographic techniques.
... Following previous works [49]- [51], we employ Peak Signal-to-Noise Ratio (PSNR) [70] and Structural Similarity (SSIM) [71] as our quantitative evaluation metrics. ...
Preprint
Recent efforts on image restoration have focused on developing "all-in-one" models that can handle different degradation types and levels within a single model. However, most mainstream Transformer-based models are confronted with a dilemma between model capability and computational burden, since the self-attention mechanism grows quadratically in computational complexity with respect to image size and has inadequacies in capturing long-range dependencies. Most Mamba-related models solely scan the feature map in the spatial dimension for global modeling, failing to fully utilize information in the channel dimension. To address the aforementioned problems, this paper proposes to fully utilize the complementary advantages of Mamba and Transformer without sacrificing computational efficiency. Specifically, the selective scanning mechanism of Mamba is employed to focus on spatial modeling, enabling the capture of long-range spatial dependencies under linear complexity. The self-attention mechanism of Transformer is applied to focus on channel modeling, avoiding the high computational burden that grows quadratically with the image's spatial dimensions. Moreover, to enrich informative prompts for effective image restoration, multi-dimensional prompt learning modules are proposed to learn prompt-flows from multi-scale encoder/decoder layers, which helps reveal the underlying characteristics of various degradations from both spatial and channel perspectives, thereby enhancing the capability of the "all-in-one" model to solve various restoration tasks. Extensive experimental results on several image restoration benchmark tasks, such as image denoising, dehazing, and deraining, demonstrate that the proposed method achieves new state-of-the-art performance compared with many popular mainstream methods. Related source code and pre-trained parameters will be made publicly available on GitHub at https://github.com/12138-chr/MTAIR.
... In addition, we also measure the frame consistency [9] by calculating the average CLIP cosine similarity of two consecutive frames. Meanwhile, for conditional video generation, we follow traditional pixel- or local-level video assessment metrics, including Peak Signal-to-Noise Ratio (PSNR) [17] and Structural Similarity Index (SSIM) [41], to assess the content and structural quality of the generated video against the ground truth. Furthermore, we include optical flow evaluation metrics, specifically the F1-epe and F1-all scores, to assess the consistency between the optical flow of the generated video and the given optical flow. ...
Preprint
Medical video generation has transformative potential for enhancing surgical understanding and pathology insights through precise and controllable visual representations. However, current models face limitations in controllability and authenticity. To bridge this gap, we propose SurgSora, a motion-controllable surgical video generation framework that uses a single input frame and user-controllable motion cues. SurgSora consists of three key modules: the Dual Semantic Injector (DSI), which extracts object-relevant RGB and depth features from the input frame and integrates them with segmentation cues to capture detailed spatial features of complex anatomical structures; the Decoupled Flow Mapper (DFM), which fuses optical flow with semantic-RGB-D features at multiple scales to enhance temporal understanding and object spatial dynamics; and the Trajectory Controller (TC), which allows users to specify motion directions and estimates sparse optical flow, guiding the video generation process. The fused features are used as conditions for a frozen Stable Diffusion model to produce realistic, temporally coherent surgical videos. Extensive evaluations demonstrate that SurgSora outperforms state-of-the-art methods in controllability and authenticity, showing its potential to advance surgical video generation for medical education, training, and research.
... In [69], the authors study the faithfulness of different privacy leakage metrics to human perception. Crowdsourcing revealed that hand-crafted metrics [64,58,87,76] have a weak correlation with, and contradict, human awareness, as do similar methods [87,30]. From this point of view, we reconsider the usage of the MSE metric for the evaluation of the defense against feature reconstruction attacks, i.e., the quality of reconstruction. ...
Preprint
Vertical Federated Learning (VFL) aims to enable collaborative training of deep learning models while maintaining privacy protection. However, the VFL procedure still has components that are vulnerable to attacks by malicious parties. In our work, we consider feature reconstruction attacks, a common risk targeting input data compromise. We theoretically claim that feature reconstruction attacks cannot succeed without knowledge of the prior distribution on data. Consequently, we demonstrate that even simple model architecture transformations can significantly impact the protection of input data during VFL. Confirming these findings with experimental results, we show that MLP-based models are resistant to state-of-the-art feature reconstruction attacks.
... To quantitatively evaluate the performance of the model, we use two metrics: PSNR (Peak Signal to Noise Ratio) [15] and SSIM (Structural Similarity Index) [46]. When showing the restoration results, we save them as PNG images for visualization purposes. ...
Article
Full-text available
The restoration of images affected by adverse weather conditions is hindered by two main challenges. The first is the restoration of fine details in severely degraded regions. The second is the interference between different types of degradation data during the model training process, which consequently reduces the restoration performance of the model on individual tasks. In this work, we propose a Transformer-based All-in-one image restoration model, called PDFormer, to alleviate the aforementioned issues. Initially, we designed an effective transformer network to capture the global contextual information in the image and utilize this information to restore the locally severely degraded regions better. Additionally, to alleviate the interference between different types of degraded data, we introduced two specialized modules: the Prompt-Guided Feature Refinement Module (RGRM) and the Degradation Mask Supervised Attention Module (MSAM). The former employs a set of learnable prompt parameters to generate prompt information, which interacts with the degraded feature through cross-attention, enhancing the discriminative ability of different degraded features in the latent space. The latter, under the supervision of the degraded mask prior, assists the model in differentiating between different degradation types and locating the regions and sizes of the degradations. The designs above permit greater flexibility in handling specific degradation scenarios, enabling the adaptive removal of different degradation artifacts to restore fine details in images. Performance evaluation on both synthetic and real data has demonstrated that our method surpasses existing approaches, achieving state-of-the-art (SOTA) performance.
... Specifically, for single-view based generation, models are evaluated using a view rendered at an elevation of 20° as an input, with the remaining 63 renderings used for evaluation. We utilize a variety of metrics to assess quantitative performance, including PSNR [15], LPIPS [54], CLIP [28], and SSIM [44]. ...
Preprint
Generating high-quality 3D content requires models capable of learning robust distributions of complex scenes and the real-world objects within them. Recent Gaussian-based 3D reconstruction techniques have achieved impressive results in recovering high-fidelity 3D assets from sparse input images by predicting 3D Gaussians in a feed-forward manner. However, these techniques often lack the extensive priors and expressiveness offered by Diffusion Models. On the other hand, 2D Diffusion Models, which have been successfully applied to denoise multiview images, show potential for generating a wide range of photorealistic 3D outputs but still fall short on explicit 3D priors and consistency. In this work, we aim to bridge these two approaches by introducing DSplats, a novel method that directly denoises multiview images using Gaussian Splat-based Reconstructors to produce a diverse array of realistic 3D assets. To harness the extensive priors of 2D Diffusion Models, we incorporate a pretrained Latent Diffusion Model into the reconstructor backbone to predict a set of 3D Gaussians. Additionally, the explicit 3D representation embedded in the denoising network provides a strong inductive bias, ensuring geometrically consistent novel view generation. Our qualitative and quantitative experiments demonstrate that DSplats not only produces high-quality, spatially consistent outputs, but also sets a new standard in single-image to 3D reconstruction. When evaluated on the Google Scanned Objects dataset, DSplats achieves a PSNR of 20.38, an SSIM of 0.842, and an LPIPS of 0.109.
... The regularization parameters (γ, σ) are varied and each choice results in a different indirectly registered image. Each of these is then matched against the target and the registration is quantitatively assessed using structural similarity (SSIM) [62] and peak signal-to-noise ratio (PSNR) [35] as figures of merit. ...
Preprint
Full-text available
The paper adapts the large deformation diffeomorphic metric mapping framework for image registration to the indirect setting where a template is registered against a target that is given through indirect noisy observations. The registration uses diffeomorphisms that transform the template through a (group) action. These diffeomorphisms are generated by solving a flow equation that is defined by a velocity field with certain regularity. The theoretical analysis includes a proof that indirect image registration has solutions (existence) that are stable and that converge as the data error tends to zero, so it becomes a well-defined regularization method. The paper concludes with examples of indirect image registration in 2D tomography with very sparse and/or highly noisy data.
... Three criteria were utilized to measure performance: (1) MPSNR [24]: the mean of the peak signal-to-noise ratio (PSNR) over all bands between the clean HSI and the recovered HSI. ...
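A minimal NumPy sketch of such a band-wise mean PSNR (MPSNR); the function name and the assumption of an (height, width, bands) array layout with data in [0, 1] are illustrative, not the cited paper's implementation.

import numpy as np

def mpsnr(clean_hsi, recovered_hsi, max_value=1.0):
    # Mean PSNR over all spectral bands of an HSI stored as (height, width, bands).
    values = []
    for b in range(clean_hsi.shape[2]):
        mse = np.mean((clean_hsi[:, :, b] - recovered_hsi[:, :, b]) ** 2)
        values.append(np.inf if mse == 0 else 10.0 * np.log10(max_value ** 2 / mse))
    return float(np.mean(values))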
Preprint
Hyperspectral image (HSI) denoising has been attracting much research attention in the remote sensing area due to its importance in improving HSI quality. Existing HSI denoising methods mainly focus on specific spectral and spatial prior knowledge in HSIs, and share a common underlying assumption that the noise embedded in an HSI is independent and identically distributed (i.i.d.). In real scenarios, however, the noise present in a natural HSI always has a much more complicated non-i.i.d. statistical structure, and underestimating this noise complexity often evidently degrades the robustness of current methods. To alleviate this issue, this paper makes the first attempt to model HSI noise using a non-i.i.d. mixture of Gaussians (NMoG) noise assumption, which is in close accordance with the noise characteristics possessed by a natural HSI and is thus capable of adapting to the various noise shapes encountered in real applications. We then integrate this noise modeling strategy into the low-rank matrix factorization (LRMF) model and propose an NMoG-LRMF model in the Bayesian framework. A variational Bayes algorithm is designed to infer the posterior of the proposed model. All involved parameters can be recursively updated in closed form. Compared with current techniques, the proposed method is more robust than the state of the art, as substantiated by our experiments on synthetic and real noisy HSIs.
... Reconstructions labelled "CFBP" (Consistent FBP) were computed with FBP on a sinogram pre-filtered by HLSF with an upsampling factor of 2; reconstructions labelled "IFBP" (Interpolation FBP) were computed with FBP on a sinogram upsampled by a factor of 2 by means of 1D cubic spline interpolation along the view direction; otherwise they are simply labelled "FBP". The standard peak-signal-to-noise ratio (PSNR) [19], calculated within the reconstruction circle, is used to score each reconstruction with respect to the corresponding phantom. The "sampling factor" (SF) is defined as the ratio between the number of projections of the considered sinogram and the number of projections of an optimally sampled sinogram. ...
Preprint
This work introduces and characterizes a fast parameterless filter based on the Helgason-Ludwig consistency conditions, used to improve the accuracy of analytical reconstructions of tomographic undersampled datasets. The filter, acting in the Radon domain, extrapolates intermediate projections between those existing. The resulting sinogram, doubled in views, is then reconstructed by a standard analytical method. Experiments with simulated data prove that the peak-signal-to-noise ratio of the results computed by filtered backprojection is improved up to 5-6 dB, if the filter is used prior to reconstruction.
... Tables 1, 2, and 3 present the quantitative comparison between JPEG, JPEG2000 (based on 1-level, 2-level, and 3-level decompositions) and the proposed method (based on 1-level, 2-level, and 3-level decompositions) using MAE for compression ratios of 2, 4, and 8. The results show that, across these compression ratios, the MAE values of the proposed method are less than half those of the JPEG and JPEG2000 methods. The PSNR is an approximation to human perception of reconstruction quality [32]. In Tables 4, 5, and 6 we see from the PSNR values that the proposed method outperforms the JPEG and JPEG2000 methods. ...
Preprint
With the development of human communications, the usage of visual communications has also increased. The advancement of image compression methods is one of the main reasons for this enhancement. This paper first presents the main modes of image compression methods, such as JPEG and JPEG2000, without mathematical details. The paper also describes gradient Haar wavelet transforms in order to construct a preliminary image compression algorithm. Then, a new image compression method is proposed based on the preliminary image compression algorithm that can improve image compression standards. The new method is compared with the original modes of JPEG and JPEG2000 (based on the Haar wavelet) using image quality measures such as MAE, PSNR, and SSIM. The image quality and statistical results confirm that the new method can boost image compression standards. It is suggested that the new method be used in part or all of an image compression standard.
... The recovered images of TNNR-ADMM are much clearer than those of NSA and SPCP, but still not as good as those of our method. We measure the recovery performance based on the RSE(L) defined in (4.1) and the PSNR (Peak Signal-to-Noise Ratio) [HTG08] value. Figure 5 plots the RSE and PSNR values on 50 tested images. ...
Preprint
In this paper, we propose a non-convex formulation to recover the authentic structure from corrupted real data. Typically, the specific structure is assumed to be low rank, which holds for a wide range of data, such as images and videos. Meanwhile, the corruption is assumed to be sparse. In the literature, such a problem is known as Robust Principal Component Analysis (RPCA), which usually recovers the low rank structure by approximating the rank function with a nuclear norm and penalizing the error by an \ell_1-norm. Although RPCA is a convex formulation and can be solved effectively, the introduced norms are not tight approximations, which may cause the solution to deviate from the authentic one. Therefore, we consider here a non-convex relaxation, consisting of a Schatten-p norm and an \ell_q-norm that promote low rank and sparsity respectively. We derive a proximal iteratively reweighted algorithm (PIRA) to solve the problem. Our algorithm is based on an alternating direction method of multipliers, where in each iteration we linearize the underlying objective function, which allows us to have a closed-form solution. We demonstrate that solutions produced by the linearized approximation always converge and have a tighter approximation than the convex counterpart. Experimental results on benchmarks show encouraging results of our approach.
... We evaluate both Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index Measure (SSIM) [20], as they are very common in the literature and the image quality assessments can be compared with other methods. Figure 6 presents the curves of the image quality assessments PSNR and SSIM (in line with proposals [21]) as a function of the quantization step ∆. We can see from this figure that the greater ∆ is, the worse the visual comprehension of the images. ...
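A minimal sketch of how such PSNR/SSIM-versus-∆ curves could be produced with scikit-image; the dictionary of watermarked images keyed by quantization step is an assumed input format, not the cited system's interface.

import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def quality_vs_delta(original, stego_by_delta):
    # Return (delta, PSNR, SSIM) tuples for 8-bit grayscale images, sorted by quantization step.
    curves = []
    for delta, stego in sorted(stego_by_delta.items()):
        p = peak_signal_noise_ratio(original, stego, data_range=255)
        s = structural_similarity(original, stego, data_range=255)
        curves.append((delta, p, s))
    return curves

# Hypothetical usage:
# curves = quality_vs_delta(cover_img, {4: stego_d4, 8: stego_d8, 16: stego_d16})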
Preprint
There is a great advantage to watermarking usage in the context of conventional authentication since it does not require additional storage space for supplementary metadata. However, JPEG compression, being a conventional method to compress images, leads to the breaking of exact authentication. We discuss a semi-fragile watermarking system for digital images tolerant to JPEG/JPEG2000 compression. Recently we published a selective authentication method based on Zernike moments. Unfortunately, it has large computational complexity and does not sufficiently detect small image modifications. In the current paper, in contrast to the Zernike moments approach, we propose the usage of image finite differences and 3-bit quantization as the main technique. In order to embed a watermark (WM) into the image, some areas of the Haar wavelet transform coefficients are used. Simulation results show a good resistance of this method to JPEG compression with CR ≤ 30% (Compression Ratio), a high probability of small image modification recognition, image quality assessments PSNR ≥ 40 dB (Peak Signal-to-Noise Ratio) and SSIM ≥ 0.98 (Structural Similarity Index Measure) after embedding, and lower computational complexity of WM embedding and extraction. All these properties qualify this approach as effective.
... where A and A' are the uncompressed and compressed images, respectively. The PSNR is a common measure in image processing (e.g., Huynh-Thu and Ghanbari, 2008). For the two cases presented herein, we use n = 100 and n = 150, respectively, resulting in PSNR values between 28 and 39 dB. ...
Preprint
A strategy is presented to incorporate prior information from conceptual geological models in probabilistic inversion of geophysical data. The conceptual geological models are represented by multiple-point statistics training images (TIs) featuring the expected lithological units and structural patterns. Information from an ensemble of TI realizations is used in two different ways. First, dominant modes are identified by analysis of the frequency content in the realizations, which drastically reduces the model parameter space in the frequency-amplitude domain. Second, the distributions of global, summary metrics (e.g. model roughness) are used to formulate a prior probability density function. The inverse problem is formulated in a Bayesian framework and the posterior pdf is sampled using Markov chain Monte Carlo simulation. The usefulness and applicability of this method is demonstrated on two case studies in which synthetic crosshole ground-penetrating radar traveltime data are inverted to recover 2-D porosity fields. The use of prior information from TIs significantly enhances the reliability of the posterior models by removing inversion artefacts and improving individual parameter estimates. The proposed methodology reduces the ambiguity inherent in the inversion of high-dimensional parameter spaces, accommodates a wide range of summary statistics and geophysical forward problems.
... The first one is MSE, which is used to evaluate the amount of deviation, or image variance; it is calculated by averaging the squared errors between the processed and reference images [32]. In Eq. 3, M and N correspond to the image size, while I1(s, t) and I2(s, t) correspond to the image pixel intensities at location (s, t). ...
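The usual per-pixel MSE consistent with the symbols described above (a standard formulation, not necessarily the exact Eq. 3 of the cited work) is:

\mathrm{MSE} = \frac{1}{MN}\sum_{s=1}^{M}\sum_{t=1}^{N}\bigl(I_1(s,t)-I_2(s,t)\bigr)^2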
Preprint
The image blurring artifact is the main challenge for any spatial denoising filter. This artifact is contributed by the heterogeneous intensities within the given neighborhood or window of fixed size. Selection of the most similar intensities (G-Neighbors) helps to adapt the window shape, which is of an edge-aware nature, and subsequently reduces this blurring artifact. The paper presents a memristive circuit design to implement this variable-pixel G-Neighbor filter. The memristive circuits offer parallel processing capabilities (near real-time) and neuromorphic architectures. The proposed design is demonstrated through simulations of both the algorithm (MATLAB) and the circuit (SPICE). The circuit design is evaluated for various parameters such as processing time, fabrication area used, and power consumption. Denoising performance is demonstrated using image quality metrics such as peak signal-to-noise ratio (PSNR), mean square error (MSE), and structural similarity index measure (SSIM). Combining the adaptive filtering method with a mean filter resulted in an average MSE reduction of about 65%, and increases in PSNR and SSIM of nearly 18% and 12%, respectively.
... DEGREE-1 has 10 layers and 64 channels, and DEGREE-2 has 20 layers and 64 channels. The quality of the HR images produced by different SR methods is measured by the Peak Signal-to-Noise Ratio (PSNR) [45] and the perceptual quality metric Structural SIMilarity (SSIM) [46], which are two widely used metrics in image processing. The results of our proposed DEGREE-1 and DEGREE-2 as well as the baselines are given in Table 1. ...
Preprint
In this work, we consider the image super-resolution (SR) problem. The main challenge of image SR is to recover the high-frequency details of a low-resolution (LR) image that are important for human perception. To address this essentially ill-posed problem, we introduce a Deep Edge Guided REcurrent rEsidual (DEGREE) network to progressively recover the high-frequency details. Different from most existing methods that aim at predicting high-resolution (HR) images directly, DEGREE investigates an alternative route to recover the difference between a pair of LR and HR images by recurrent residual learning. DEGREE further augments the SR process with an edge-preserving capability, namely the LR image and its edge map can jointly infer the sharp edge details of the HR image during the recurrent recovery process. To speed up its training convergence rate, by-pass connections across multiple layers of DEGREE are constructed. In addition, we offer an understanding of DEGREE from the viewpoint of sub-band frequency decomposition of the image signal and experimentally demonstrate how DEGREE can recover different frequency bands separately. Extensive experiments on three benchmark datasets clearly demonstrate the superiority of DEGREE over well-established baselines, and DEGREE also provides new state-of-the-art results on these datasets.
... To make a more objective and systematic assessment of the results, we turned to several metrics that are commonly used for image comparison, namely: structural similarity (SSIM) [47], which measures the perceived quality of an image with respect to another reference image; peak signal-to-noise ratio (PSNR) [48], which is frequently used to measure the quality of image compression; and normalized root mean square error (NRMSE), where the normalization factor was the Euclidean norm of the original image. • SSIM has a maximum value of 1.0 (only reached when the two images are equal). ...
Preprint
Deep learning is having a profound impact in many fields, especially those that involve some form of image processing. Deep neural networks excel in turning an input image into a set of high-level features. On the other hand, tomography deals with the inverse problem of recreating an image from a number of projections. In plasma diagnostics, tomography aims at reconstructing the cross-section of the plasma from radiation measurements. This reconstruction can be computed with neural networks. However, previous attempts have focused on learning a parametric model of the plasma profile. In this work, we use a deep neural network to produce a full, pixel-by-pixel reconstruction of the plasma profile. For this purpose, we use the overview bolometer system at JET, and we introduce an up-convolutional network that has been trained and tested on a large set of sample tomograms. We show that this network is able to reproduce existing reconstructions with a high level of accuracy, as measured by several metrics.
... Commonly used metrics are MSE, PSNR and SSIM. MSE is basically a weighted function of the deviations in images, i.e., the squared difference between the compared images [10]. In Eq. 1, M and N stand for the image size, while I1(s, t) and I2(s, t) denote the pixel values at location (s, t). ...
Preprint
The quality assessment of edges in an image is an important topic, as it helps to benchmark the performance of edge detectors and edge-aware filters that are used in a wide range of image processing tasks. The most popular image quality metrics, such as the Mean squared error (MSE), Peak signal-to-noise ratio (PSNR) and Structural similarity (SSIM) metrics, are used for assessing and justifying the quality of edges. However, they do not address the structural and functional accuracy of edges in images with a wide range of natural variabilities. In this review, we provide an overview of the most relevant performance metrics that can be used to benchmark the quality of edges in images. We identify four major groups of metrics and also provide a critical insight into the evaluation protocol and governing equations.
Article
In vivo fluorescence imaging, particularly indocyanine green (ICG)‐based imaging, has gained traction for cerebrovascular imaging due to its real‐time dynamics, freedom from radiation, and accessibility. However, the presence of the scalp and skull significantly hampers imaging quality, often necessitating invasive procedures or biotoxic probes to achieve adequate depth and resolution. This limitation restricts the broader clinical/preclinical application of fluorescence imaging techniques. To address this, a novel approach is introduced that utilizes deep learning techniques to enhance ICG‐based imaging, achieving high‐resolution cerebrovascular imaging without invasive methods or biotoxic probes. By leveraging diffusion models, a connection between trans‐scalp (TS) and trans‐cranial (TC) ICG fluorescence images is established in the latent space. This allows the transformation of blurred TS images into high‐resolution images resembling TC images. Notably, intracerebral vascular structures and microvascular branches are unambiguously observed, achieving an anatomical resolution of 20.1 µm and a 1.7‐fold improvement in spatial resolution. Validation in a mouse model of middle cerebral artery occlusion also demonstrates effective and sensitive identification of ischemic stroke sites. This advancement offers a non‐invasive, cost‐efficient alternative to current expensive imaging methods, paving the way for more advanced fluorescence imaging techniques.
Article
Background: Although computed tomography (CT) is widely employed in disease detection, X-ray radiation may pose a risk to the health of patients. Reducing the number of projection views is a common method; however, the reconstructed images often suffer from streak artifacts.
Purpose: In previous related works, it can be found that the convolutional neural network (CNN) is proficient in extracting local features, while the Transformer is adept at capturing global information. To suppress streak artifacts in sparse-view CT, this study aims to develop a method that combines the advantages of CNN and Transformer.
Methods: In this paper, we propose a Multi-Attention and Dual-Branch Feature Aggregation U-shaped Transformer network (MAFA-Uformer), which consists of two branches: CNN and Transformer. Firstly, with a coordinate attention mechanism, the Transformer branch can capture the overall structure and orientation information to provide a global context understanding of the image under reconstruction. Secondly, the CNN branch focuses on extracting crucial local features of images through channel spatial attention, thus enhancing detail recognition capabilities. Finally, through a feature fusion module, the global information from the Transformer and the local features from the CNN are integrated effectively.
Results: Experimental results demonstrate that our method achieves outstanding performance in terms of peak signal-to-noise ratio (PSNR), structural similarity (SSIM), and root mean square error (RMSE). Compared with Restormer, our model achieves significant improvements: PSNR increases by 0.76 dB, SSIM improves by 0.44%, and RMSE decreases by 8.55%.
Conclusion: Our method not only effectively suppresses artifacts but also better preserves details and features, thereby providing robust support for accurate diagnosis of CT images.
Article
This study presents the development and implementation of a Three-Wheeler Ride Service Application tailored to the unique needs of elderly individuals, people with physical disabilities, and women. The application aims to provide a safe, accessible, and user-friendly platform that addresses the mobility challenges faced by these vulnerable groups. Leveraging modern technologies like Flutter for a responsive interface, Firebase for real-time data management, and Google Maps API for route optimization, the app ensures seamless ride booking, live tracking, and affordability. Key features include a user-centric interface optimized for accessibility, real-time ride safety mechanisms such as driver verification and emergency contact integration, and specialized vehicle options for enhanced comfort and security. Additional functionalities, such as chatbot assistance, dynamic pricing based on distance, and multi-modal payment systems, further improve usability. Through the integration of advanced technological solutions and a focus on inclusivity, the application bridges gaps in the existing ride-hailing ecosystem. The platform's scalability and potential for future enhancements, such as voice-based assistance and eco-friendly vehicle options, underscore its broader social impact, offering a transformative approach to inclusive transportation services.
Article
Full-text available
Mid-infrared photoacoustic microscopy can capture biochemical information without staining. However, the long mid-infrared optical wavelengths make the spatial resolution of photoacoustic microscopy significantly poorer than that of conventional confocal fluorescence microscopy. Here, we demonstrate an explainable deep learning-based unsupervised inter-domain transformation of low-resolution unlabeled mid-infrared photoacoustic microscopy images into confocal-like virtually fluorescence-stained high-resolution images. The explainable deep learning-based framework is proposed for this transformation, wherein an unsupervised generative adversarial network is primarily employed and then a saliency constraint is added for better explainability. We validate the performance of explainable deep learning-based mid-infrared photoacoustic microscopy by identifying cell nuclei and filamentous actins in cultured human cardiac fibroblasts and matching them with the corresponding CFM images. The XDL ensures similar saliency between the two domains, making the transformation process more stable and more reliable than existing networks. Our XDL-MIR-PAM enables label-free high-resolution duplexed cellular imaging, which can significantly benefit many research avenues in cell biology.
Preprint
Image colorization methods have shown prominent performance on natural images. However, since humans are more sensitive to faces, existing methods are insufficient to meet the demands when applied to facial images, typically showing unnatural and uneven colorization results. In this paper, we investigate the facial image colorization task and find that the problems with facial images can be attributed to an insufficient understanding of facial components. As a remedy, by introducing facial component priors, we present a novel facial image colorization framework dubbed FCNet. Specifically, we learn a decoupled color representation for each face component (e.g., lips, skin, eyes, and hair) under the guidance of face parsing maps. A chromatic and spatial augmentation strategy is presented to facilitate the learning procedure, which requires only grayscale and color facial image pairs. After training, the presented FCNet can be naturally applied to facial image colorization with single or multiple reference images. To expand the application paradigms to scenarios with no reference images, we further train two alternative modules, which predict the color representations from the grayscale input or a random seed, respectively. Extensive experiments show that our method can perform favorably against existing methods in various application scenarios (i.e., no-, single-, and multi-reference facial image colorization). The source code and pre-trained models will be publicly available.
Article
An abstract is not available.
Article
A unified approach to the coder control of video coding standards such as MPEG-2, H.263, MPEG-4, and the draft video coding standard H.264/AVC (advanced video coding) is presented. The performance of the various standards is compared by means of PSNR and subjective testing results. The results indicate that H.264/AVC compliant encoders typically achieve essentially the same reproduction quality as encoders that are compliant with the previous standards while typically requiring 60% or less of the bit rate.