Conference Paper

Unsupervised Single Image Deraining with Self-Supervised Constraints

... Once the gap is much larger, the deraining results become highly unsatisfactory. Further, CycleGAN-based unsupervised methods have been proposed for real-world image deraining, avoiding paired training data [3,12,15,36,45]. But these methods are known to be difficult to train, due to their complex adversarial objectives. ...
... Such paired data is very difficult or even impossible to collect under real-world rainy scenes. Motivated by a popular image-to-image translation architecture, i.e., CycleGAN [46], recent works [3,15,35,45] attempt to exploit the improved CycleGAN architecture and constrained transfer learning. ... Non-diffusive Translation Branch (NTB) and Diffusive Translation Branch (DTB). In detail, NTB, only used in the model training phase, fully exploits a cycle-consistent architecture to generate initial clean/rainy image pairs. ...
Preprint
What will happen when unsupervised learning meets diffusion models for real-world image deraining? To answer this, we propose RainDiffusion, the first unsupervised image deraining paradigm based on diffusion models. Beyond the traditional unsupervised wisdom of image deraining, RainDiffusion introduces stable training on unpaired real-world data instead of weakly adversarial training. RainDiffusion consists of two cooperative branches: a Non-diffusive Translation Branch (NTB) and a Diffusive Translation Branch (DTB). NTB exploits a cycle-consistent architecture to bypass the difficulty of unpaired training of standard diffusion models by generating initial clean/rainy image pairs. DTB leverages two conditional diffusion modules to progressively refine the desired output with the initial image pairs and a diffusive generative prior, to obtain better generalization ability for deraining and rain generation. RainDiffusion is a non-adversarial training paradigm, serving as a new standard bar for real-world image deraining. Extensive experiments confirm the superiority of our RainDiffusion over un/semi-supervised methods and show its competitive advantages over fully-supervised ones.
... Once the distributions are far apart, the semi-supervised deraining result by SSIR [78] would be less satisfactory, as shown in Fig. 2(c). The unsupervised methods have attracted more attention for real rain removal, mainly including the CycleGAN-based unpaired image translation methods [14], [36], [80], [101] and the optimization-model-driven deep prior network [96]. In Fig. 3, we summarize the development of the single image deraining methods. ...
... Recently, unsupervised deraining methods have emerged [14], [36], [80], [94], [96], [101]. Most of the previous unsupervised works formulated unsupervised image deraining as an image generation task via generative adversarial learning. ...
Preprint
Full-text available
Most of the existing learning-based deraining methods are trained in a supervised manner on synthetic rainy-clean pairs. The domain gap between synthetic and real rain makes them less generalizable to complex real rainy scenes. Moreover, the existing methods mainly utilize the properties of the image or rain layers independently, while few of them have considered their mutually exclusive relationship. To solve the above dilemma, we explore the intrinsic intra-similarity within each layer and the inter-exclusiveness between the two layers, and propose an unsupervised non-local contrastive learning (NLCL) deraining method. The non-local self-similar image patches as the positives are tightly pulled together, while rain patches as the negatives are remarkably pushed away, and vice versa. On one hand, the intrinsic self-similarity knowledge within the positive/negative samples of each layer helps us discover a more compact representation; on the other hand, the mutually exclusive property between the two layers enriches the discriminative decomposition. Thus, the internal self-similarity within each layer (similarity) and the external exclusive relationship between the two layers (dissimilarity), serving as a generic image prior, jointly facilitate unsupervised differentiation of the rain from the clean image. We further discover that the intrinsic dimension of the non-local image patches is generally higher than that of the rain patches. This motivates us to design an asymmetric contrastive loss to precisely model the compactness discrepancy of the two layers for better discriminative decomposition. In addition, considering that the existing real rain datasets are of low quality, either small-scale or downloaded from the internet, we collect a real large-scale dataset captured under various kinds of rainy weather that contains high-resolution rainy images.
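As a concrete reading of the contrastive objective described above, the following is a minimal PyTorch sketch, not the paper's implementation: an InfoNCE-style loss over patch embeddings, in which a looser temperature for the image layer and a tighter one for the rain layer stand in for the asymmetric treatment of their compactness; the function names, temperatures, and embedding conventions are all illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def info_nce(anchor, positives, negatives, temperature):
    """InfoNCE-style loss: pull positives toward the anchor, push negatives away.
    anchor: (D,), positives: (P, D), negatives: (N, D) patch embeddings."""
    anchor = F.normalize(anchor, dim=-1)
    positives = F.normalize(positives, dim=-1)
    negatives = F.normalize(negatives, dim=-1)
    l_pos = (positives @ anchor) / temperature   # (P,) anchor-positive similarities
    l_neg = (negatives @ anchor) / temperature   # (N,) anchor-negative similarities
    # contrast every positive against the shared negative set
    logits = torch.cat([l_pos.unsqueeze(1),
                        l_neg.unsqueeze(0).expand(l_pos.size(0), -1)], dim=1)
    labels = torch.zeros(l_pos.size(0), dtype=torch.long, device=logits.device)
    return F.cross_entropy(logits, labels)

def asymmetric_contrastive_loss(img_patches, rain_patches,
                                t_image=0.10, t_rain=0.05):
    """Asymmetry via temperatures: looser for the image layer (higher intrinsic
    dimension), tighter for the compact rain layer. Inputs are (K, D) embeddings
    of non-local patches from each layer; values are illustrative."""
    loss_image = info_nce(img_patches[0], img_patches[1:], rain_patches, t_image)
    loss_rain = info_nce(rain_patches[0], rain_patches[1:], img_patches, t_rain)
    return loss_image + loss_rain
```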
... Inspired by [24], we further apply a Background Guidance Module (BGM) to provide additional reliable supervision. The BGM maintains the consistency of the background between the synthetic noisy image and the clean image, constraining their low-frequency contents to be similar. ...
Preprint
Deep learning methods have shown remarkable performance in image denoising, particularly when trained on large-scale paired datasets. However, acquiring such paired datasets for real-world scenarios poses a significant challenge. Although unsupervised approaches based on generative adversarial networks offer a promising solution for denoising without paired datasets, they struggle to surpass the performance limitations of conventional GAN-based unsupervised frameworks without significantly modifying existing structures or increasing the computational complexity of the denoisers. To address this problem, we propose a self-collaboration (SC) strategy for multiple denoisers. This strategy achieves significant performance improvement without increasing the inference complexity of the GAN-based denoising framework. Its basic idea is to iteratively replace the previous, less powerful denoiser in the filter-guided noise extraction module with the current, more powerful denoiser. This process generates better synthetic clean-noisy image pairs, leading to a more powerful denoiser for the next iteration. This baseline ensures the stability and effectiveness of the training network. The experimental results demonstrate the superiority of our method over state-of-the-art unsupervised methods.
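The iterative replacement idea reads naturally as a training loop. Below is a hedged sketch of one self-collaboration round under assumed helpers (`fit` for supervised training and `low_pass` for the guiding filter, both hypothetical); it is meant to show the data flow, not the paper's exact procedure.

```python
import copy
import torch

@torch.no_grad()
def extract_noise(noisy, denoiser, low_pass):
    """Filter-guided noise extraction (sketch): denoise, low-pass the result to
    keep only reliable structure, and read off the residual as noise."""
    return noisy - low_pass(denoiser(noisy))

def self_collaboration_round(denoiser, fit, clean_images, noisy_images, low_pass):
    """One SC iteration (illustrative): freeze the current denoiser, use it to
    build synthetic clean/noisy pairs, then train a stronger denoiser on them."""
    frozen = copy.deepcopy(denoiser).eval()      # previous-round denoiser
    pairs = []
    for clean, noisy in zip(clean_images, noisy_images):
        noise = extract_noise(noisy, frozen, low_pass)
        pairs.append((clean, clean + noise))     # synthetic noisy counterpart
    return fit(denoiser, pairs)                  # returns the improved denoiser

# Outer loop: each round replaces the denoiser inside the extraction module with
# the stronger one just trained, so the pseudo-pairs keep improving.
```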
... The task of enhancing images can also be formulated as an image-to-image translation task where the objective is to learn a mapping that transforms an image in the noisy domain into an image in the clean domain [42]. Models for unpaired image translation have also been utilized for unpaired image denoising tasks in natural scene images [13,19,41,51,53,55]. However, the application of these unsupervised methods for enhancing document images has not been extensively explored. ...
Preprint
Full-text available
The recognition performance of optical character recognition (OCR) models can be sub-optimal when document images suffer from various degradations. Supervised deep learning methods for image enhancement can generate high-quality enhanced images. However, these methods demand the availability of corresponding clean images or ground truth text. Sometimes this requirement is difficult to fulfill for real-world noisy documents. For instance, it can be challenging to create paired noisy/clean training datasets or obtain ground truth text for noisy point-of-sale receipts and invoices. Unsupervised methods have been explored in recent years to enhance images in the absence of ground truth images or text. However, these methods focus on enhancing natural scene images. In the case of document images, preserving the readability of text in the enhanced images is of utmost importance for improved OCR performance. In this work, we propose a modified architecture to the CycleGAN model to improve its performance in enhancing document images with better text preservation. Inspired by the success of CNN-BiLSTM combination networks in text recognition models, we propose modifying the discriminator network in the CycleGAN model to a combined CNN-BiLSTM network for better feature extraction from document images during classification by the discriminator network. Results indicate that our proposed model not only leads to better preservation of text and improved OCR performance over the CycleGAN model but also achieves better performance than the classical unsupervised image pre-processing techniques like Sauvola and Otsu.
... Since paired training data is nearly unreachable in practice, some researchers have recently shifted their focus to semi-supervised or unsupervised learning methods [15][16][17] that learn from a single image with unpaired training data in low-level computer vision fields. For image deblurring, the semi-supervised methods always use unpaired data to complement paired image training. ...
Article
Full-text available
Most motion deblurring methods require a large amount of paired training data, which is nearly unreachable in practice. To overcome this limitation, a domain translation network with a contrastive constraint for unpaired motion image deblurring is proposed. First, a domain translation network with two streams, a sharp domain translation stream and a blurred domain translation stream, is presented to handle unpaired sharp and blurred images from the real world. Second, a contrastive constraint loss at the deep intermediate level is proposed for the two streams to encourage the network to produce deblurring results close to the real sharp image. Third, distinct loss functions for the two streams are designed to preserve the edge and texture detail of the deblurred image. Extensive experiments on several benchmark datasets demonstrate that the proposed network achieves better visual performance than the current state-of-the-art methods for unpaired motion image deblurring.
Article
Accurate recognition of marine species in drone-captured images is essential for maintaining the stability of coastal zone ecosystems. Unmanned aerial vehicle (UAV) remote sensing images usually lack paired supervised signals and suffer from color distortion and blurring due to the interaction of ambient light with cross-medium transmission between air and water. However, current algorithms mainly focus on supervised training methods and ignore the interaction involved in the cross-medium transmission of light in water. In this article, for UAVs in coastal zones, we propose an unsupervised image water removal model, named DewaterGAN, which is trained solely on low-tide and high-tide images without paired supervised signals and preserves color and texture in the water removal process. Specifically, our approach involves two key steps: an unsupervised CycleGAN network accomplishes domain transitions from low-tide level to high-tide level, and a physics-based attention module guides image water removal and maintains authenticity. Additionally, we utilize the image restoration evaluation metrics peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM) to quantitatively analyze the performance of the model. We also employ several no-reference metrics (UIQM, UCIQE, NIQE, BRISQUE, LIQE, ILNIQE, and CLIPIQA) to evaluate the visual quality of the image de-watering process. Extensive experiments conducted on both our water removal dataset and public datasets validate the efficacy of our model. The code is at https://github.com/yfq-yy/Dewater.git .
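For reference, the two full-reference metrics named in the abstract can be computed with scikit-image as below; this is a generic sketch of PSNR/SSIM evaluation, not the authors' evaluation script.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_pair(restored, reference):
    """Full-reference quality metrics (PSNR/SSIM). Both inputs are float
    arrays in [0, 1] with shape (H, W, 3)."""
    psnr = peak_signal_noise_ratio(reference, restored, data_range=1.0)
    ssim = structural_similarity(reference, restored, data_range=1.0,
                                 channel_axis=-1)  # SSIM averaged over channels
    return psnr, ssim
```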
Article
CycleGAN has been proven to be an advanced approach for unsupervised image restoration. This framework consists of two generators: a denoising one for inference and an auxiliary one for modeling noise to fulfill cycle-consistency constraints. However, when applied to the infrared destriping task, it becomes challenging for the vanilla auxiliary generator to consistently produce vertical noise under unsupervised constraints. This poses a threat to the effectiveness of the cycle-consistency loss, leading to stripe noise residue in the denoised image. To address the above issue, we present a novel framework for single-frame infrared image destriping, named DestripeCycleGAN. In this model, the conventional auxiliary generator is replaced with a priori stripe generation model (SGM) to introduce vertical stripe noise into the clean data, and the gradient map is employed to re-establish cycle-consistency. Meanwhile, a Haar wavelet background guidance module (HBGM) has been designed to minimize the divergence of background details between the different domains. To preserve vertical edges, a multi-level wavelet U-Net (MWUNet) is proposed as the denoising generator, which utilizes the Haar wavelet transform as the sampler to reduce directional information loss. Moreover, it incorporates the group fusion block (GFB) into skip connections to fuse multi-scale features and build the context of long-distance dependencies. Extensive experiments on real and synthetic data demonstrate that our DestripeCycleGAN surpasses the state-of-the-art methods in terms of visual quality and quantitative evaluation. Our code is available at https://github.com/xdFai/DestripeCycleGAN.
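To make the stripe-noise setting concrete, here is a minimal stand-in for a stripe generation model: it injects column-wise biases, the defining property of vertical stripe noise in infrared detectors. The parameters and sampling choices are illustrative assumptions, not the paper's SGM.

```python
import numpy as np

def add_vertical_stripes(img, strength=0.05, density=0.5, seed=0):
    """Add column-wise offsets so every pixel in an affected column shares the
    same bias, mimicking fixed-pattern vertical stripe noise.
    img: float array in [0, 1] with shape (H, W)."""
    rng = np.random.default_rng(seed)
    h, w = img.shape
    offsets = rng.normal(0.0, strength, size=w)   # one bias per column
    mask = rng.random(w) < density                # which columns get striped
    striped = img + offsets * mask                # broadcasts over the rows
    return np.clip(striped, 0.0, 1.0)
```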
Article
Full-text available
Building facade completion is an important part of digitizing the structures of buildings using computer technology. Due to the intricate textures and structures in building facade images, existing image-completion algorithms cannot accurately restore the rich texture and detailed information. In response, this paper proposes a novel network to simultaneously recover the texture and semantic structural features of building facades. By incorporating dynamic convolutions into each layer of the feature encoder, the shallow layers of the completion network can create a global receptive field, thus enhancing the model’s feature-extraction capability. Additionally, a spatial attention branch is integrated into the dynamic convolution module to boost the correlation between the completion area and its surrounding edge area, resulting in improved edge clarity and accuracy of the completed facade image. Experimental results on multiple public image datasets demonstrate that the proposed model in this paper achieves state-of-the-art results when applied to real-world datasets.
Article
Full-text available
We propose a novel model-free unsupervised learning paradigm to tackle the challenging open problem of real-world image deraining, dubbed MUL-Derain. Beyond existing unsupervised deraining efforts, MUL-Derain leverages model-free Multiscale Attentive Filtering (MSAF) to handle multiscale rain streaks. Therefore, no formulation of any rain imaging model is necessary, and it requires neither iterative optimization nor progressive refinement operations. Meanwhile, MUL-Derain can efficiently compute spatial coherence and global interactions by modeling long-range dependencies, allowing MSAF to learn useful knowledge from a larger or even global rain region. Furthermore, we formulate a novel multi-loss function to constrain MUL-Derain to preserve both color and structure information of the rainy images. Extensive experiments on both synthetic and real-world datasets demonstrate that our MUL-Derain obtains state-of-the-art performance over un-/semi-supervised methods and exhibits competitive advantages over fully-supervised ones.
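The color-and-structure constraint can be illustrated with a small loss sketch. The abstract does not give the paper's exact terms, so the following combines a per-pixel color-direction term with a gradient-matching term as one plausible reading; the weights and formulation are assumptions.

```python
import torch
import torch.nn.functional as F

def color_structure_loss(pred, rainy, w_color=1.0, w_struct=1.0):
    """Illustrative multi-loss: a color term keeps per-pixel RGB directions
    close to the rainy input, and a structure term matches image gradients.
    pred, rainy: (B, 3, H, W) tensors in [0, 1]; weights are arbitrary."""
    # color: 1 - cosine similarity between the RGB vectors at each pixel
    color = (1.0 - F.cosine_similarity(pred, rainy, dim=1)).mean()
    # structure: L1 distance between horizontal and vertical gradients
    dx = lambda t: t[..., :, 1:] - t[..., :, :-1]
    dy = lambda t: t[..., 1:, :] - t[..., :-1, :]
    struct = F.l1_loss(dx(pred), dx(rainy)) + F.l1_loss(dy(pred), dy(rainy))
    return w_color * color + w_struct * struct
```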
Preprint
To tackle the difficulty of obtaining paired real-world data for single image deraining (SID), recent unsupervised methods have achieved notable success. However, these methods often struggle to generate high-quality, rain-free images due to a lack of attention to semantic representation and image content, resulting in ineffective separation of content from the rain layer. In this paper, we propose a novel cycle contrastive generative adversarial network for unsupervised SID, called CCLGAN. This framework combines cycle contrastive learning (CCL) and location contrastive learning (LCL). CCL improves image reconstruction and rain-layer removal by bringing similar features closer and pushing dissimilar features apart in both semantic and discriminative spaces. At the same time, LCL preserves content information by constraining mutual information at the same location across different exemplars. CCLGAN shows superior performance, and extensive experiments demonstrate the benefits of CCLGAN and the effectiveness of its components.
Article
Full-text available
Real hyperspectral images (HSIs) are ineluctably contaminated by diverse types of noise, which severely limits the image usability. Recently, transfer learning has been introduced in hyperspectral denoising networks to improve model generalizability. However, the current frameworks often rely on image priors and struggle to retain the fidelity of background information. In this article, an unsupervised adaptation learning (UAL)-based hyperspectral denoising network (UALHDN) is proposed to address these issues. The core idea is first learning a general image prior for most HSIs, and then adapting it to a real HSI by learning the deep priors and maintaining background consistency, without introducing hand-crafted priors. Following this notion, a spatial–spectral residual denoiser, a global modeling discriminator, and a hyperspectral discrete representation learning scheme are introduced in the UALHDN framework, and are employed across two learning stages. First, the denoiser and the discriminator are pretrained using synthetic noisy-clean ground-based HSI pairs. Subsequently, the denoiser is further fine-tuned on the real multiplatform HSI according to a spatial–spectral consistency constraint and a background consistency loss in an unsupervised manner. A hyperspectral discrete representation learning scheme is also designed in the fine-tuning stage to extract semantic features and estimate noise-free components, exploring the deep priors specific for real HSIs. The applicability and generalizability of the proposed UALHDN framework were verified through the experiments on real HSIs from various platforms and sensors, including unmanned aerial vehicle-borne, airborne, spaceborne, and Martian datasets. The UAL denoising scheme shows a superior denoising ability when compared with the state-of-the-art hyperspectral denoisers.
Article
Latest diffusion-based methods for many image restoration tasks outperform traditional models, but they encounter the long inference time problem. To tackle it, this paper proposes a Wavelet-Based Diffusion Model (WaveDM). WaveDM learns the distribution of clean images in the wavelet domain conditioned on the wavelet spectrum of degraded images after wavelet transform, which is more time-saving in each step of sampling than modeling in the spatial domain. To ensure restoration performance, a unique training strategy is proposed where the low-frequency and high-frequency spectrums are learned using distinct modules. In addition, an Efficient Conditional Sampling (ECS) strategy is developed from experiments, which reduces the number of total sampling steps to around 5. Evaluations on twelve benchmark datasets including image raindrop removal, rain streak removal, dehazing, defocus deblurring, demoiréing, and denoising demonstrate that WaveDM achieves state-of-the-art performance with efficiency that is comparable to traditional one-pass methods and over 100× faster than existing image restoration methods using vanilla diffusion models. The code is available at https://github.com/stayalive16/WaveDM
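The wavelet-domain conditioning is easy to picture with a one-level Haar transform, which the following sketch computes using PyWavelets; it shows only the sub-band decomposition a model like WaveDM would operate on, not the diffusion model itself.

```python
import pywt

def haar_spectrum(img):
    """One-level 2-D Haar transform: a low-frequency approximation plus three
    high-frequency detail bands, each a quarter of the input size.
    img: 2-D float array (one channel)."""
    ll, (lh, hl, hh) = pywt.dwt2(img, 'haar')
    return ll, lh, hl, hh

def haar_reconstruct(ll, lh, hl, hh):
    """Inverse transform: losslessly recombine the four sub-bands."""
    return pywt.idwt2((ll, (lh, hl, hh)), 'haar')
```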
Article
Full-text available
Single image deraining (SID) has shown its importance in many advanced computer vision tasks. Although many CNN-based image deraining methods have been proposed, how to effectively remove raindrops while maintaining the background structure remains a challenge to be overcome. Most deraining work focuses on removing rain streaks, but in heavy rain images, the dense accumulation of rainwater or the rain curtain effect significantly interferes with the effective removal of rain streaks, and often introduces artifacts that make the scene blurrier. In this paper, a novel network architecture, R-PReNet, is introduced for single image deraining with an emphasis on preserving the background structure. The framework effectively exploits the cyclic recursive structure inherent in PReNet. Additionally, the residual channel prior (RCP) and feature fusion modules have been incorporated, enhancing deraining performance by emphasizing background feature information. Compared with previous methods, this approach offers notable improvement on rainstorm images by reducing artifacts and restoring visual details.
Article
Most existing learning-based deraining methods are trained in a supervised manner on synthetic rainy-clean pairs. The domain gap between synthetic and real rain makes them less generalizable to complex real rainy scenes. Moreover, the existing methods mainly utilize the properties of the image or rain layers independently, while few of them have considered their mutually exclusive relationship. To solve the above dilemma, we explore the intrinsic intra-similarity within each layer and the inter-exclusiveness between the two layers, and propose an unsupervised non-local contrastive learning (NLCL) deraining method. The non-local self-similar image patches as the positives are tightly pulled together and the rain patches as the negatives are remarkably pushed away, and vice versa. On one hand, the intrinsic self-similarity knowledge within the positive/negative samples of each layer helps us discover a more compact representation; on the other hand, the mutually exclusive property between the two layers enriches the discriminative decomposition. Thus, the internal self-similarity within each layer (similarity) and the external exclusive relationship between the two layers (dissimilarity), serving as a generic image prior, jointly facilitate unsupervised differentiation of the rain from the clean image. We further discover that the intrinsic dimension of the non-local image patches is generally higher than that of the rain patches. This insight motivates us to design an asymmetric contrastive loss that precisely models the compactness discrepancy of the two layers, thereby improving the discriminative decomposition. In addition, recognizing the limited quality of existing real rain datasets, which are either small-scale or downloaded from the internet, we collect a large-scale real dataset captured under various rainy weather conditions that contains high-resolution rainy images. Extensive experiments conducted on different real rainy datasets demonstrate that the proposed method obtains state-of-the-art performance in real deraining. Both the code and the newly collected datasets will be available at https://owuchangyuo.github.io .
Article
The rain removal task is to restore a clean image from the contaminated image by separating the background. Since the rise of deep learning in 2016, the task of image deraining has also stepped into the era of deep learning. Numerous researchers have devoted themselves to the field of computer vision and pattern recognition. However, there is still a lack of comprehensive review papers focused on using deep learning to perform rain removal tasks. In this paper, we present a comprehensive review of single image deraining based on deep learning over the past ten years. Two categories of deraining methods are discussed: the data-driven approach and the data-model-based approach. For the first type, we compare the existing network structures and loss functions. For the second type, we analyze the combination of different deraining models with deep learning, and each branch method is introduced in detail. Additionally, we quantitatively investigate the performances of the existing state-of-the-art methods on both publicly synthetic and real datasets. The trend of image deraining is also discussed.
Article
Full-text available
Introduction: In the context of evolving societal preferences for deeper emotional connections in art, this paper explores the emergence of multimodal robot music performance art. It investigates the fusion of music and motion in robot performances to enhance expressiveness and emotional impact. The study employs Transformer models to combine audio and video signals, enabling robots to better understand music's rhythm, melody, and emotional content. Generative Adversarial Networks (GANs) are utilized to create lifelike visual performances synchronized with music, bridging auditory and visual perception. Multimodal reinforcement learning is employed to achieve harmonious alignment between sound and motion.
Methods: The study leverages Transformer models to process audio and video signals in robot performances. Generative Adversarial Networks are employed to generate visually appealing performances that align with the musical input. Multimodal reinforcement learning is used to synchronize robot actions with music. Diverse music styles and emotions are considered in the experiments. Performance evaluation metrics include accuracy, recall rate, and F1 score.
Results: The proposed approach yields promising results across various music styles and emotional contexts. Performance smoothness scores exceed 94 points, demonstrating the fluidity of robot actions. An accuracy rate of 95% highlights the precision of the system in aligning robot actions with music. Notably, there is a substantial 33% enhancement in performance recall rate compared to baseline modules. The collective improvement in F1 score emphasizes the advantages of the proposed approach in the realm of robot music performance art.
Discussion: The study's findings demonstrate the potential of multimodal robot music performance art in achieving heightened emotional impact. By combining audio and visual cues, robots can better interpret and respond to music, resulting in smoother and more precise performances. The substantial improvement in recall rate suggests that the proposed approach enhances the robots' ability to accurately mirror the emotional nuances of the music. These results signify the potential of this approach to transform the landscape of artistic expression through robotics, opening new avenues for emotionally resonant performances.
Article
Full-text available
Context. Removing the undesirable consequences of rain effects from single images is a pressing problem in many computer vision tasks, because rain streaks can significantly degrade the visual quality of images and seriously interfere with the operation of the various intelligent systems used for their processing and further analysis. Objective. The goal of this work is to develop a method for detecting and removing the undesirable effects of rain from single images, based on a convolutional neural network with a recurrent structure. Method. The main component of the proposed method is a convolutional neural network with a recurrent multi-stage structure. A feature of this network architecture is the use of repeated blocks (layers), at the output of which an intermediate result of “cleaning” the original image can be obtained. Moreover, at the output of each subsequent network layer we get an image with less influence of rain components than at the previous one. Each network layer contains two independent sub-networks (branches) for parallel image processing. The main branch is designed to detect and remove the effect of rain from the image, and the attention branch is used to improve and speed up the detection of undesirable rain components (forming a rain attention map). Results. An approach has been developed to automatically detect and remove the rain effect from single images. The process of “cleaning” the original image is based on a convolutional neural network with a recurrent structure, which was trained on the Rain100H and Rain100L datasets. The results of computer experiments, which testify to the effectiveness and expediency of using the proposed method for the practical task of pre-processing “contaminated” images, are presented. Conclusions. The advantage of the developed method for removing undesirable components of rain from images is that the recurrent multi-stage network architecture on which it is based allows it to be applied under conditions of limited computing resources. The proposed method can be successfully used in the development of intelligent systems for area monitoring with surveillance cameras, autonomous vehicle control, processing aerial photography results, etc. In the future, we will consider forming a separate sub-network to eliminate blurring in the image and training the network on datasets that contain image samples with different components of rain, which will make the method more “resistant” to different forms of the rain effect and increase the quality of image “cleaning”.
Article
Image rain removal is an essential and challenging low-level task in computer vision. In recent years, although great progress has been achieved in the field of image and video deraining, the restored details may be incomplete in many cases, making it difficult to recover clear images from diverse rain forms. Moreover, the features of layers may not be fully exploited, which may reduce the rain removal performance to some degree. To address the above issues, we propose a novel Mask-based Enhancement and Feature Aggregation Network (MEFA-Net). First, we develop a novel structure with an enhancement branch, which can greatly improve the feature representation capability. The enhancement branch is implemented based on the random masking technique, which can generate complementary features and improve robustness. Specifically, we design a novel adaptive weighting block in the MEFA-Net, which can adaptively assign weights to the enhanced features for each specific image, effectively improving the generalization ability of MEFA-Net. Second, we propose a feature aggregation sub-module (FAS) for comprehensive representation by integrating the features from different layers. Experimental results demonstrate that the proposed MEFA-Net can achieve better performance than most state-of-the-art image deraining methods on four synthetic datasets and one real-world dataset.
Article
Full-text available
Single image deraining is a fundamental task in computer vision, which can greatly improve the performance of subsequent high-level tasks under rainy conditions. Existing data-driven rain removal methods rely heavily on paired training data, which is expensive to collect. In this paper, we propose a new unsupervised method, called Cycle-Attention-Derain, which removes rain from single images in the absence of paired data. Our method is based on the CycleGAN framework with two major novelties. First, since rain removal is highly correlated with analyzing the texture features of an input image, we propose a novel attention fusion module (AFM) with complementary channel attention and spatial attention, which can effectively learn more discriminative features for rain removal. Second, to further improve the generalization ability of our model, we propose a global-local attention discriminator architecture with an attention mechanism to guide the network training, so that the rain removal results are realistic both globally and locally. Our proposed model is able to remove rain streaks and raindrops of varying degrees without paired training images. Extensive experiments on synthetic and real datasets demonstrate that the proposed method outperforms most state-of-the-art unsupervised rain removal methods in terms of both PSNR and SSIM on the Rain800 dataset, and achieves results close to those of popular supervised learning methods.
Article
Low-light image enhancement aims to recover normal-light images from images captured in dim environments. Most existing methods can only improve the light appearance globally, while failing to handle other degradations such as dense noise, color offset and extremely low light. Moreover, unsupervised methods proposed in recent years lack a reliable physical model as their basis, so their universality is greatly limited. To address these problems, we propose a novel low-light image enhancement method via a Retinex-inline cycle-consistent generative adversarial network named Cycle-Retinex, whose training depends entirely on unpaired datasets. Specifically, we organically combine Retinex theory with CycleGAN, by which we decouple the low-light image enhancement task into two sub-tasks, i.e., illumination map enhancement and reflectance map restoration. Retinex theory helps CycleGAN simplify the low-light image enhancement problem, and CycleGAN provides synthetic paired images to guide the training of the Retinex decomposition network. We further introduce a self-augmented method to address the color distortion and noise problem, thus making the network learn to enhance low-light images adaptively. Extensive experiments show that the proposed method achieves promising results. The source code is available at https://github.com/mummmml/Cycle-Retinex .
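The Retinex decomposition I = R ∘ L that the method builds on can be illustrated with a simple hand-crafted baseline: estimate a smooth illumination map and divide it out. This is a minimal sketch for intuition, not the paper's learned decomposition; the smoothing scale and epsilon are arbitrary.

```python
import cv2
import numpy as np

def retinex_decompose(img, sigma=15, eps=1e-4):
    """Hand-crafted Retinex-style split: take a Gaussian-smoothed max-over-
    channels map as the illumination L, and recover the reflectance R = I / L,
    so that I = R * L. img: float RGB array in (0, 1], shape (H, W, 3)."""
    illumination = img.max(axis=2)                     # initial brightness map
    illumination = cv2.GaussianBlur(illumination, (0, 0), sigma)
    reflectance = img / (illumination[..., None] + eps)
    return reflectance, illumination
```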
Article
Single-image deraining aims to restore an image degraded by rain streaks, where the long-standing bottleneck lies in how to disentangle the rain streaks from the given rainy image. Despite the progress made by substantial existing works, several crucial questions — e.g., how to distinguish rain streaks from the clean image, how to disentangle rain streaks from low-frequency pixels, and how to prevent blurry edges — have not been well investigated. In this paper, we attempt to solve all of them under one roof. Our observation is that rain streaks are bright stripes with higher pixel values that are evenly distributed in each color channel of the rainy image, and the disentanglement of the high-frequency rain streaks is equivalent to decreasing the standard deviation of the pixel distribution of the rainy image. To this end, we propose a self-supervised rain streak learning network to characterize the similar pixel distribution of the rain streaks from a macroscopic viewpoint over various low-frequency pixels of gray-scale rainy images, coupled with a supervised rain streak learning network to explore the specific pixel distribution of the rain streaks from a microscopic viewpoint between each paired rainy and clean image. Building on this, a self-attentive adversarial restoration network is introduced to prevent blurry edges. These networks compose an end-to-end Macroscopic-and-Microscopic Rain Streaks Disentanglement Network, named M²RSD-Net, to learn rain streaks, which are then removed for single image deraining. The experimental results validate its advantages on deraining benchmarks against the state-of-the-art. The code is available at: https://github.com/xinjiangaohfut/MMRSD-Net.
Article
Owing to the rapid development of deep networks, single image deraining tasks have achieved significant progress. Various architectures have been designed to recursively or directly remove rain, and most rain streaks can be removed by existing deraining methods. However, many of them cause a loss of details during deraining, resulting in visual artifacts. To resolve the detail-losing issue, we propose a novel unrolling rain-guided detail recovery network (URDRN) for single image deraining based on the observation that the most degraded areas of the background image tend to be the most rain-corrupted regions. Furthermore, to address the problem that most existing deep-learning-based methods trivialize the observation model and simply learn an end-to-end mapping, the proposed URDRN unrolls the single image deraining task into two subproblems: rain extraction and detail recovery. Specifically, first, a context aggregation attention network is introduced to effectively extract rain streaks, and then, a rain attention map is generated as an indicator to guide the detail-recovery process. For a detail-recovery sub-network, with the guidance of the rain attention map, a simple encoder–decoder model is sufficient to recover the lost details. Experiments on several well-known benchmark datasets show that the proposed approach can achieve a competitive performance in comparison with other state-of-the-art methods.
Article
Full-text available
Single-image super-resolution technology has made great progress with the development of convolutional neural networks, but most current super-resolution methods do not attempt high-magnification image super-resolution reconstruction; reconstruction is only carried out at ×2, ×3, or ×4 magnification for low-magnification down-sampled images without serious degradation. Based on this, this paper proposes a single-image high-magnification super-resolution method, which extends the scale factor of image super-resolution to high magnifications. By introducing the idea of multi-task learning, the high-magnification super-resolution process is decomposed into different super-resolution tasks. Different tasks are trained with different data, and network models for the different tasks can be obtained. Through the cascaded reconstruction of the different task network models, a low-resolution image accumulates reconstruction advantages layer by layer, and we obtain the final high-magnification super-resolution reconstruction results. The proposed method shows better performance in quantitative and qualitative comparisons on the benchmark dataset than other super-resolution methods.
Article
Image restoration is the problem of restoring a real degraded image. Previous studies mostly focused on single distortions. However, most real images experience multiple distortions, and single-distortion image restoration algorithms cannot effectively improve the image quality. Moreover, the few existing hybrid-distortion image restoration algorithms cannot deal with single distortions. Therefore, an end-to-end pipeline network based on stagewise training is proposed in this paper. Specifically, the network targets three typical image restoration tasks: denoising, inpainting, and super-resolution. The whole training process is divided into single-distortion training, hybrid-distortion training of two types, and hybrid-distortion training of three types. The design of the loss function draws on the idea of deep supervision. Experimental results prove that the proposed method is not only superior to other methods in hybrid-distortion image restoration, but also suitable for single-distortion image restoration.
Article
As a common weather phenomenon, rain streaks adversely degrade image quality and tend to negatively affect the performance of outdoor computer vision systems. Hence, removing rain from an image has become an important issue in the field. To handle such an ill-posed single image deraining task, in this article, we specifically build a novel deep architecture, called rain convolutional dictionary network (RCDNet), which embeds the intrinsic priors of rain streaks and has clear interpretability. Specifically, we first establish a rain convolutional dictionary (RCD) model for representing rain streaks and utilize the proximal gradient descent technique to design an iterative algorithm containing only simple operators for solving the model. By unfolding it, we then build the RCDNet, in which every network module has clear physical meaning and corresponds to an operation of the algorithm. This good interpretability greatly facilitates visualization and analysis of what happens inside the network and why it works well in the inference process. Moreover, taking into account the domain gap issue in real scenarios, we further design a novel dynamic RCDNet, where the rain kernels can be dynamically inferred corresponding to input rainy images and then help shrink the space for rain layer estimation with few rain maps, so as to ensure fine generalization performance in scenarios where rain types are inconsistent between training and testing data. By end-to-end training of such an interpretable network, all involved rain kernels and proximal operators can be automatically extracted, faithfully characterizing the features of both the rain and clean background layers, and thus naturally leading to better deraining performance. Comprehensive experiments on a series of representative synthetic and real datasets substantiate the superiority of our method, especially its good generality to diverse testing scenarios and good interpretability of all its modules, compared with state-of-the-art single image derainers both visually and quantitatively. Code is available at https://github.com/hongwang01/DRCDNet .
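The flavor of the unfolded algorithm, simple operators iterated and then turned into network layers, can be shown with a toy ISTA loop for a single-kernel convolutional rain model; the real RCDNet learns many kernels and proximal operators end to end, so everything below (kernel, step size, threshold) is an illustrative assumption.

```python
import numpy as np
from scipy.signal import convolve2d, correlate2d

def soft_threshold(x, tau):
    """Proximal operator of the L1 norm; unfolded networks turn exactly this
    kind of simple operator into learnable layers."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def ista_rain_map(hf, kernel, n_iter=30, step=0.2, tau=0.01):
    """Toy ISTA for a single-kernel convolutional rain model R = kernel * M:
    recover a sparse rain map M from the high-frequency part hf of a rainy
    image by minimizing 0.5*||kernel * M - hf||^2 + tau*||M||_1."""
    m = np.zeros_like(hf)
    for _ in range(n_iter):
        residual = convolve2d(m, kernel, mode='same') - hf
        grad = correlate2d(residual, kernel, mode='same')  # adjoint of conv
        m = soft_threshold(m - step * grad, tau)           # gradient + shrink
    return m
```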
Article
Full-text available
Due to the requirements of video surveillance, machine-learning-based single image deraining has become a research hotspot in recent years. In order to efficiently obtain rain-removed images that contain more detailed information, this paper proposes a novel frequency-aware single image deraining network via the separation of rain and background. For rainy images, most of the key background information belongs to the low-frequency components, while the high-frequency components mix background image details with rain streaks. This paper attempts to decouple background image details from the high-frequency components under the guidance of the restored low-frequency components. Compared with existing approaches, the proposed network has three major contributions. (1) A residual dense network based on the Discrete Wavelet Transform (DWT) is proposed to study the rainy image background information. (2) A frequency channel attention module is introduced for the adaptive decoupling of high-frequency image detail signals. (3) A fusion module with an attention mechanism is introduced to make full use of multi-receptive-field information through a two-branch structure, exploiting context information over a large area. The proposed approach was evaluated using several representative datasets. Experimental results show that the proposed approach outperforms other state-of-the-art deraining algorithms.
Article
Full-text available
This survey article is concerned with the emergence of vision augmentation AI tools for enhancing the situational awareness of first responders (FRs) in rescue operations. More specifically, the article surveys three families of image restoration methods serving the purpose of vision augmentation under adverse weather conditions: (a) deraining, (b) desnowing, and (c) dehazing methods. The contribution of this article is a survey of the recent literature on these three problem families, focusing on the utilization of deep learning (DL) models and on meeting the requirements of their application in rescue operations. A faceted taxonomy is introduced over past and recent literature, covering various DL architectures, loss functions and datasets. Although there are multiple surveys on recovering images degraded by natural phenomena, the literature lacks a comprehensive survey focused explicitly on assisting FRs. This paper aims to fill this gap by presenting existing methods in the literature, assessing their suitability for FR applications, and providing insights for future research directions.
Article
Full-text available
Rainy images typically contain heterogeneous rain distributions; however, many existing methods perform well in simple homogeneous rain and fail to handle complex heterogeneous rain effectively. In this paper, we try to solve this problem by fully exploiting the complementary contextual information in the manner of a Joint Feedback and Recurrent deraining scheme with Ensemble Learning (JFREL). First, the proposed JFREL is built on a recurrent multistage architecture, and the output of each stage is fused automatically via ensemble learning. Second, the feedback mechanism is utilized to refine information from inter- and intra-stages. Third, at each stage the residual dilated aggregation attention module is recursively adopted to adequately characterize complementary high-level contextual information in multiple receptive fields and adaptively aggregate beneficial details to achieve feature compensation. Extensive experiments demonstrate that the proposed JFREL can achieve a competitive performance over the state-of-the-art methods on both synthetic and real-world datasets.
Article
Full-text available
Deep learning models have been used in several domains; however, adjustments are still required before they can be applied in sensitive areas such as medical imaging. Technology is needed in the medical domain because of time limits, and a high level of accuracy is required to assure trustworthiness. Because of privacy concerns, machine learning applications in the medical field are often unable to use real medical data. For example, the lack of brain MRI images makes it difficult to classify brain tumors using image-based classification. The solution to this challenge was achieved through the application of Generative Adversarial Network (GAN)-based augmentation techniques. Deep Convolutional GAN (DCGAN) and Vanilla GAN are two examples of GAN architectures used for image generation. In this paper, a framework, denoted as BrainGAN, for generating and classifying brain MRI images using GAN architectures and deep learning models is proposed. The study also proposes an automatic way to check that generated images are satisfactory. It uses three models: CNN, MobileNetV2, and ResNet152V2. The deep transfer models are trained with images made by Vanilla GAN and DCGAN, and their performance is then evaluated on a test set composed of real brain MRI images. The experimental results show that the ResNet152V2 model outperformed the other two models. ResNet152V2 achieved 99.09% accuracy, 99.12% precision, 99.08% recall, 99.51% area under the curve (AUC), and 0.196 loss on the brain MRI images generated by the DCGAN architecture.
Chapter
Full-text available
Rain streaks can severely degrade visibility, which causes many current computer vision algorithms to fail. It is therefore necessary to remove rain from images. We propose a novel deep network architecture based on deep convolutional and recurrent neural networks for single image deraining. As contextual information is very important for rain removal, we first adopt the dilated convolutional neural network to acquire a large receptive field. To better fit the rain removal task, we also modify the network. In heavy rain, rain streaks have various directions and shapes, and can be regarded as the accumulation of multiple rain streak layers. We assign different alpha-values to the various rain streak layers according to their intensity and transparency by incorporating the squeeze-and-excitation block. Since rain streak layers overlap with each other, it is not easy to remove the rain in one stage. So we further decompose the rain removal into multiple stages. A recurrent neural network is incorporated to preserve the useful information from previous stages and benefit the rain removal in later stages. We conduct extensive experiments on both synthetic and real-world datasets. Our proposed method outperforms the state-of-the-art approaches under all evaluation metrics. Codes and supplementary material are available at our project webpage: https://xialipku.github.io/RESCAN.
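The squeeze-and-excitation block mentioned above is a standard component and can be sketched in a few lines of PyTorch; this is the generic SE formulation, with the reduction ratio as an assumed hyperparameter rather than the chapter's setting.

```python
import torch
import torch.nn as nn

class SqueezeExcitation(nn.Module):
    """Generic squeeze-and-excitation block: global-average-pool each channel,
    then gate the channels with a two-layer bottleneck and a sigmoid; here the
    per-channel gates play the role of the alpha-values assigned to layers."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):                       # x: (B, C, H, W)
        scale = self.fc(x.mean(dim=(2, 3)))     # (B, C) channel descriptors
        return x * scale[:, :, None, None]      # reweight each channel
```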
Article
Full-text available
Single image rain streak removal is an extremely challenging problem due to the presence of non-uniform rain densities in images. We present a novel density-aware multi-stream densely connected convolutional neural network-based algorithm, called DID-MDN, for joint rain density estimation and de-raining. The proposed method enables the network itself to automatically determine the rain-density information and then efficiently remove the corresponding rain-streaks guided by the estimated rain-density label. To better characterize rain-streaks with different scales and shapes, a multi-stream densely connected de-raining network is proposed which efficiently leverages features from different scales. Furthermore, a new dataset containing images with rain-density labels is created and used to train the proposed density-aware network. Extensive experiments on synthetic and real datasets demonstrate that the proposed method achieves significant improvements over the recent state-of-the-art methods. In addition, an ablation study is performed to demonstrate the improvements obtained by different modules in the proposed method. Code can be found at: https://github.com/hezhangsprinter
Article
Full-text available
Low-end and compact mobile cameras demonstrate limited photo quality mainly due to space, hardware and budget constraints. In this work, we propose a deep learning solution that translates photos taken by cameras with limited capabilities into DSLR-quality photos automatically. We tackle this problem by introducing a weakly supervised photo enhancer (WESPE) - a novel image-to-image Generative Adversarial Network-based architecture. The proposed model is trained by weakly supervised learning: unlike previous works, there is no need for strong supervision in the form of a large annotated dataset of aligned original/enhanced photo pairs. The sole requirement is two distinct datasets: one from the source camera, and one composed of arbitrary high-quality images that can be generally crawled from the Internet - the visual content they exhibit may be unrelated. Hence, our solution is repeatable for any camera: collecting the data and training can be achieved in a couple of hours. Our experiments on the DPED, Kitti and Cityscapes datasets as well as pictures from several generations of smartphones demonstrate that WESPE produces comparable qualitative results with state-of-the-art strongly supervised methods, while not requiring the tedious work to obtain aligned datasets.
Article
Full-text available
Image-to-image translation is a class of vision and graphics problems where the goal is to learn the mapping between an input image and an output image using a training set of aligned image pairs. However, for many tasks, paired training data will not be available. We present an approach for learning to translate an image from a source domain X to a target domain Y in the absence of paired examples. Our goal is to learn a mapping G: X -> Y such that the distribution of images from G(X) is indistinguishable from the distribution Y using an adversarial loss. Because this mapping is highly under-constrained, we couple it with an inverse mapping F: Y -> X and introduce a cycle consistency loss to push F(G(X)) ≈ X (and vice versa). Qualitative results are presented on several tasks where paired training data does not exist, including collection style transfer, object transfiguration, season transfer, photo enhancement, etc. Quantitative comparisons against several prior methods demonstrate the superiority of our approach.
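The cycle-consistency term F(G(X)) ≈ X described above translates directly into code. The sketch below shows the reconstruction loss only, with the adversarial terms left to separate discriminators; the weight lam = 10 is the commonly used setting, and the generator callables are placeholders.

```python
import torch.nn.functional as nnf

def cycle_consistency_loss(x, y, g_xy, f_yx, lam=10.0):
    """Cycle-consistency term: after a round trip through both mappings, each
    image should come back to itself. g_xy: X -> Y and f_yx: Y -> X are the two
    generators; the adversarial losses on g_xy(x) and f_yx(y) are trained
    separately with domain discriminators."""
    forward = nnf.l1_loss(f_yx(g_xy(x)), x)     # F(G(x)) ≈ x
    backward = nnf.l1_loss(g_xy(f_yx(y)), y)    # G(F(y)) ≈ y
    return lam * (forward + backward)
```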
Article
Full-text available
We investigate conditional adversarial networks as a general-purpose solution to image-to-image translation problems. These networks not only learn the mapping from input image to output image, but also learn a loss function to train this mapping. This makes it possible to apply the same generic approach to problems that traditionally would require very different loss formulations. We demonstrate that this approach is effective at synthesizing photos from label maps, reconstructing objects from edge maps, and colorizing images, among other tasks. As a community, we no longer hand-engineer our mapping functions, and this work suggests we can achieve reasonable results without hand-engineering our loss functions either.
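For the generator side, the conditional-GAN objective pairs the learned adversarial loss with a reconstruction term. The sketch below follows the widely used GAN-plus-weighted-L1 formulation associated with this line of work; the logits convention and lam = 100 are assumptions of this sketch.

```python
import torch
import torch.nn.functional as nnf

def cgan_generator_loss(disc_fake_logits, fake, target, lam=100.0):
    """Conditional-GAN generator objective: an adversarial term (the learned
    loss) plus a weighted L1 term that keeps the output near the ground truth.
    disc_fake_logits: discriminator logits on (input, fake) pairs."""
    adversarial = nnf.binary_cross_entropy_with_logits(
        disc_fake_logits, torch.ones_like(disc_fake_logits))
    return adversarial + lam * nnf.l1_loss(fake, target)
```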
Article
Full-text available
In this paper, we propose a novel explicit image filter called guided filter. Derived from a local linear model, the guided filter computes the filtering output by considering the content of a guidance image, which can be the input image itself or another different image. The guided filter can be used as an edge-preserving smoothing operator like the popular bilateral filter [1], but it has better behaviors near edges. The guided filter is also a more generic concept beyond smoothing: It can transfer the structures of the guidance image to the filtering output, enabling new filtering applications like dehazing and guided feathering. Moreover, the guided filter naturally has a fast and nonapproximate linear time algorithm, regardless of the kernel size and the intensity range. Currently, it is one of the fastest edge-preserving filters. Experiments show that the guided filter is both effective and efficient in a great variety of computer vision and computer graphics applications, including edge-aware smoothing, detail enhancement, HDR compression, image matting/feathering, dehazing, joint upsampling, etc.
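Since the paper spells out the local linear model, a compact gray-scale implementation is easy to give; the following follows the standard formulation (a = cov(I,p)/(var(I)+eps), b = mean(p) - a*mean(I)) with box filters, with radius and eps as illustrative parameters.

```python
import cv2

def guided_filter(guide, src, radius=8, eps=1e-3):
    """Gray-scale guided filter via the local linear model q = a*I + b.
    guide (I) and src (p): float32 arrays in [0, 1]. Every step is a box
    (mean) filter, so the cost is O(N) regardless of the radius."""
    ksize = (2 * radius + 1, 2 * radius + 1)
    mean = lambda m: cv2.boxFilter(m, -1, ksize)
    mean_i, mean_p = mean(guide), mean(src)
    cov_ip = mean(guide * src) - mean_i * mean_p
    var_i = mean(guide * guide) - mean_i * mean_i
    a = cov_ip / (var_i + eps)                  # per-pixel linear coefficients
    b = mean_p - a * mean_i
    return mean(a) * guide + mean(b)            # average overlapping windows
```

Calling guided_filter(img, img) smooths an image with itself as guide, which gives the edge-preserving behavior the abstract describes.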
Article
Full-text available
Rain removal from a video is a challenging problem and has been recently investigated extensively. Nevertheless, the problem of rain removal from a single image was rarely studied in the literature, where no temporal information among successive images can be exploited, making the problem very challenging. In this paper, we propose a single-image-based rain removal framework via properly formulating rain removal as an image decomposition problem based on morphological component analysis. Instead of directly applying a conventional image decomposition technique, the proposed method first decomposes an image into the low- and high-frequency (HF) parts using a bilateral filter. The HF part is then decomposed into a "rain component" and a "non-rain component" by performing dictionary learning and sparse coding. As a result, the rain component can be successfully removed from the image while preserving most original image details. Experimental results demonstrate the efficacy of the proposed algorithm.
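The first stage of this decomposition, the bilateral low/high-frequency split, can be reproduced with OpenCV in a few lines; the filter parameters below are arbitrary illustrative choices, and the dictionary-learning stage that follows in the paper is not shown.

```python
import cv2
import numpy as np

def split_frequencies(img, d=9, sigma_color=75, sigma_space=75):
    """Bilateral low/high-frequency split: the edge-preserving smooth output
    is the low-frequency part, and the residual high-frequency part is what a
    later stage would separate into rain and non-rain components.
    img: uint8 BGR image."""
    low = cv2.bilateralFilter(img, d, sigma_color, sigma_space)
    high = img.astype(np.float32) - low.astype(np.float32)
    return low, high
```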
Article
Recently emerged deep learning methods have achieved great success in single image rain streak removal. However, existing methods ignore an essential factor in the rain streak generation mechanism, i.e., the motion blur leading to the line-pattern appearances. Thus, they generally produce over-deraining or under-deraining results. In this article, inspired by the generation mechanism, we propose a novel rain streak removal framework using a kernel-guided convolutional neural network (KGCNN), achieving state-of-the-art performance with a simple network architecture. More precisely, our framework consists of three steps. First, we learn the motion blur kernel with a plain neural network, termed the parameter network, from the detail layer of a rainy patch. Then, we stretch the learned motion blur kernel into a degradation map with the same spatial size as the rainy patch. Finally, we use the stretched degradation map together with the detail patches to train a deraining network with a typical ResNet architecture, which produces the rain streaks with the guidance of the learned motion blur kernel. Experiments conducted on extensive synthetic and real data demonstrate the effectiveness of the proposed KGCNN, in terms of rain streak removal and image detail preservation.
Conference Paper
Single image rain streaks removal has recently witnessed substantial progress due to the development of deep convolutional neural networks. However, existing deep learning based methods either focus on the entrance and exit of the network by decomposing the input image into high and low frequency information and employing residual learning to reduce the mapping range, or focus on the introduction of cascaded learning scheme to decompose the task of rain streaks removal into multi-stages. These methods treat the convolutional neural network as an encapsulated end-to-end mapping module without deepening into the rationality and superiority of neural network design. In this paper, we delve into an effective end-to-end neural network structure for stronger feature expression and spatial correlation learning. Specifically, we propose a non-locally enhanced encoder-decoder network framework, which consists of a pooling indices embedded encoder-decoder network to efficiently learn increasingly abstract feature representation for more accurate rain streaks modeling while perfectly preserving the image detail. The proposed encoder-decoder framework is composed of a series of non-locally enhanced dense blocks that are designed to not only fully exploit hierarchical features from all the convolutional layers but also well capture the long-distance dependencies and structural information. Extensive experiments on synthetic and real datasets demonstrate that the proposed method can effectively remove rain-streaks on rainy image of various densities while well preserving the image details, which achieves significant improvements over the recent state-of-the-art methods.
Conference Paper
Single image rain streak removal is extremely important since rainy conditions adversely affect many computer vision systems. Deep learning based methods have achieved great success in image deraining tasks. In this paper, we propose a novel residual-guide feature fusion network, called ResGuideNet, for single image deraining that progressively predicts high-quality reconstructions while using fewer parameters than previous methods. Specifically, we propose a cascaded network and adopt residuals from shallower blocks to guide deeper blocks. With this strategy, we obtain a coarse-to-fine estimation of the negative residual as the blocks go deeper. The outputs of different blocks are merged into the final reconstruction. We adopt recursive convolution to build each block and apply supervision to intermediate derained results. ResGuideNet is detachable to meet different rainy conditions: for images with light rain streaks and limited computational resources at test time, decent performance can be obtained even with only a few building blocks. Experiments validate that ResGuideNet can also benefit other low- and high-level vision tasks.
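A hedged sketch of the residual-guide idea described above: each block refines an estimate of the negative rain residual, with all shallower blocks' outputs fed forward as guidance, and the intermediate estimates merged at the end. The block internals and fusion-by-averaging below are placeholders, not the paper's architecture.

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    def __init__(self, in_c: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_c, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 3, 3, padding=1))

    def forward(self, x):
        return self.body(x)

class ResGuideSketch(nn.Module):
    def __init__(self, n_blocks: int = 3):
        super().__init__()
        # Block i sees the rainy image plus all i earlier residual estimates.
        self.blocks = nn.ModuleList(Block(3 + 3 * i) for i in range(n_blocks))

    def forward(self, rainy):
        residuals = []
        for blk in self.blocks:
            inp = torch.cat([rainy] + residuals, dim=1)
            residuals.append(blk(inp))           # coarse-to-fine negative residual
        fused = torch.stack(residuals).mean(0)   # merge intermediate estimates
        return rainy + fused                     # derained = rainy + (-rain)

derained = ResGuideSketch()(torch.rand(1, 3, 64, 64))
```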
Conference Paper
We present a novel method for removing rain streaks from a single input image by decomposing it into a rain-free background layer B and a rain-streak layer R. A joint optimization process is used that alternates between removing rain-streak details from B and removing non-streak details from R. The process is assisted by three novel image priors. Observing that rain streaks typically span a narrow range of directions, we first analyze the local gradient statistics in the rain image to identify image regions that are dominated by rain streaks. From these regions, we estimate the dominant rain streak direction and extract a collection of rain-dominated patches. Next, we define two priors on the background layer B, one based on a centralized sparse representation and another based on the estimated rain direction. A third prior is defined on the rain-streak layer R, based on similarity of patches to the extracted rain patches. Both visual and quantitative comparisons demonstrate that our method outperforms the state-of-the-art.
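Schematically, the decomposition this abstract describes can be written as the generic layer-separation objective below, with the paper's three priors abstracted as penalty terms; the concrete forms of $\Psi_B$ (centralized sparse representation plus the estimated rain direction) and $\Psi_R$ (similarity to extracted rain patches) are not spelled out here, so this is a sketch rather than the exact loss.

```latex
\min_{B,\,R}\;\; \| I - B - R \|_F^2 \;+\; \Psi_B(B) \;+\; \Psi_R(R),
\qquad \text{alternately updating } B \text{ and } R
```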
Article
Rain streaks impair the visibility of an image and introduce undesirable interference that can severely affect the performance of computer vision and image analysis systems. Rain streak removal algorithms aim to recover a rain-free background scene. In this paper, we address the problem of rain streak removal from a single image by formulating it as a layer decomposition problem, with a rain streak layer superimposed on a background layer containing the true scene content. Existing decomposition methods that address this problem either employ sparse dictionary learning or impose a low-rank structure on the appearance of the rain streaks. While these methods can improve overall visibility, their performance is often unsatisfactory, as they tend either to over-smooth the background or to produce images that still contain noticeable rain streaks. To address these problems, we propose a method that imposes priors on both the background and rain streak layers. These priors are based on Gaussian mixture models learned on small patches, which can accommodate a variety of background appearances as well as the appearance of the rain streaks. Moreover, we introduce a structure residue recovery step to further separate the background residues and improve the decomposition quality. Quantitative evaluation shows that our method outperforms existing methods by a large margin, and we demonstrate its effectiveness over prior work on a number of examples.
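To make the patch-GMM idea concrete, here is a minimal sketch: fit one Gaussian mixture on background patches and one on rain-streak patches, then compare log-likelihoods to decide which layer a patch belongs to. The patch size, component count, and random stand-in data are illustrative assumptions; in the paper the mixtures come from external clean images and rain regions of the input.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

patch = 8  # 8x8 grayscale patches, flattened to 64-D vectors

# Stand-in training data, purely for illustration.
bg_patches = np.random.rand(2000, patch * patch)
rain_patches = np.random.rand(2000, patch * patch) * 0.2

gmm_bg = GaussianMixture(n_components=20).fit(bg_patches)
gmm_rain = GaussianMixture(n_components=20).fit(rain_patches)

test = np.random.rand(5, patch * patch)
# Positive scores favor assigning the patch content to the background layer.
print(gmm_bg.score_samples(test) - gmm_rain.score_samples(test))
```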
Article
In this paper, we propose an efficient algorithm to remove rain or snow from a single color image. Our algorithm takes advantage of two popular techniques in image processing, namely image decomposition and dictionary learning. First, a combination of rain/snow detection and a guided filter is used to decompose the input image into a complementary pair: (1) a low-frequency part that is almost completely free of rain or snow and (2) a high-frequency part that contains not only the rain/snow component but also some or even many image details. We then focus on extracting image details from the high-frequency part. To this end, we design a three-layer hierarchical scheme. In the first layer, an over-complete dictionary is trained and three classifications are carried out to separate the high-frequency part into rain/snow and non-rain/snow components, exploiting common characteristics of rain and snow. In the second layer, another combination of rain/snow detection and guided filtering is performed on the rain/snow component obtained in the first layer. In the third layer, the sensitivity of variance across color channels (SVCC) is computed to enhance the visual quality of the rain/snow-removed image. The effectiveness of our algorithm is verified through both subjective (visual quality) and objective (rendering rain/snow on ground-truth images) evaluations, which show its superiority over several state-of-the-art works.
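A sketch of the first decomposition step described above, using the guided filter from opencv-contrib (cv2.ximgproc); the input path and the radius/eps values are illustrative guesses rather than the paper's settings.

```python
import cv2
import numpy as np

# Requires the opencv-contrib-python package for cv2.ximgproc.
img = cv2.imread("snowy.png").astype(np.float32) / 255.0  # hypothetical path

# Self-guided filtering: a rain/snow-suppressed low-frequency part...
lf = cv2.ximgproc.guidedFilter(guide=img, src=img, radius=8, eps=0.02)
hf = img - lf  # ...and an HF part holding precipitation plus image detail

# The paper's three-layer hierarchy then pulls image detail back out of `hf`
# via dictionary learning and a second detection/guided-filtering pass.
```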
Article
We introduce a deep network architecture called DerainNet for removing rain streaks from an image. Based on the deep convolutional neural network (CNN), we directly learn the mapping relationship between rainy and clean image detail layers from data. Because we do not possess ground truth corresponding to real-world rainy images, we synthesize images with rain for training. To train the network effectively and efficiently, and unlike common strategies that simply increase the depth or breadth of the network, we use image processing domain knowledge to modify the objective function: we train DerainNet on the detail layer rather than in the image domain, which yields better results under the same network architecture. Though DerainNet is trained on synthetic data, we find that the learned network is very effective on real-world test images. Moreover, we augment the CNN framework with image enhancement to significantly improve the visual results. Compared with state-of-the-art single image deraining methods, our method achieves better rain removal and much faster computation after network training.
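A hedged sketch of the detail-layer trick described above: the CNN maps rainy detail to clean detail (image minus a low-pass base), and the prediction is added back to the base. The average-pooling low-pass is a crude stand-in for the paper's filtering, and the small CNN below only mimics DerainNet's three-layer shape; all sizes are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def detail_layer(x: torch.Tensor, k: int = 15) -> torch.Tensor:
    base = F.avg_pool2d(x, k, stride=1, padding=k // 2)  # crude low-pass stand-in
    return x - base

cnn = nn.Sequential(                      # rainy detail -> clean detail
    nn.Conv2d(3, 64, 9, padding=4), nn.ReLU(),
    nn.Conv2d(64, 32, 1), nn.ReLU(),
    nn.Conv2d(32, 3, 5, padding=2))

rainy = torch.rand(1, 3, 64, 64)
base = rainy - detail_layer(rainy)
derained = base + cnn(detail_layer(rainy))  # recombine base with clean detail
```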
Article
Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers---8x deeper than VGG nets but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.
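For reference, a minimal residual block illustrating the reformulation above: the block learns a residual function F(x) and outputs F(x) + x, so extra depth can default to the identity. This is a basic two-convolution block, not the exact ImageNet architecture.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return torch.relu(out + x)  # identity shortcut: learn only the residual
```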
Xia Li, Jianlong Wu, Zhouchen Lin, Hong Liu, and Hongbin Zha, "Recurrent squeeze-and-excitation context aggregation net for single image deraining," ECCV, 2018.
Ye-Tao Wang, Xi-Le Zhao, Tai-Xiang Jiang, Liang-Jian Deng, Yi Chang, and Ting-Zhu Huang, "Rain streak removal for single image via kernel guided CNN," arXiv preprint arXiv:1808.08545, 2018.
Jianxin Lin, Sen Liu, Yingce Xia, Shuxin Zhao, Tao Qin, and Zhibo Chen, "Unpaired image-to-image translation with domain supervision," arXiv preprint arXiv:1902.03782, 2019.