Article

Abstract

The fusion of infrared and visible images of the same scene aims to generate a composite image which can provide a more comprehensive description of the scene. In this paper, we propose an infrared and visible image fusion method based on convolutional neural networks (CNNs). In particular, a siamese convolutional network is applied to obtain a weight map which integrates the pixel activity information from two source images. This CNN-based approach can deal with two vital issues in image fusion as a whole, namely, activity level measurement and weight assignment. Considering the different imaging modalities of infrared and visible images, the merging procedure is conducted in a multi-scale manner via image pyramids and a local similarity-based strategy is adopted to adaptively adjust the fusion mode for the decomposed coefficients. Experimental results demonstrate that the proposed method can achieve state-of-the-art results in terms of both visual quality and objective assessment.
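To make the multi-scale merging step described above concrete, the sketch below blends two source images with a precomputed weight map via Laplacian and Gaussian pyramids. It is an illustrative reconstruction, not the authors' released code: the weight map is assumed to have already been produced by the siamese CNN, and the local similarity-based coefficient strategy mentioned in the abstract is omitted.

```python
# Minimal sketch of pyramid-based merging with a CNN-produced weight map.
# Inputs are assumed to be single-channel images in the 0-255 range and a
# weight map in [0, 1]; names and pyramid depth are illustrative only.
import cv2
import numpy as np

def gaussian_pyramid(img, levels):
    pyr = [img.astype(np.float32)]
    for _ in range(levels - 1):
        pyr.append(cv2.pyrDown(pyr[-1]))
    return pyr

def laplacian_pyramid(img, levels):
    gp = gaussian_pyramid(img, levels)
    lp = []
    for i in range(levels - 1):
        up = cv2.pyrUp(gp[i + 1], dstsize=(gp[i].shape[1], gp[i].shape[0]))
        lp.append(gp[i] - up)
    lp.append(gp[-1])  # coarsest level keeps the Gaussian residual
    return lp

def fuse_with_weight_map(ir, vis, weight, levels=4):
    """Blend Laplacian pyramids of the two sources with a Gaussian
    pyramid of the (assumed CNN-generated) weight map."""
    lp_ir = laplacian_pyramid(ir, levels)
    lp_vis = laplacian_pyramid(vis, levels)
    gp_w = gaussian_pyramid(weight, levels)
    fused_pyr = [w * a + (1.0 - w) * b
                 for a, b, w in zip(lp_ir, lp_vis, gp_w)]
    # Collapse the fused pyramid back into a single image.
    fused = fused_pyr[-1]
    for lvl in reversed(fused_pyr[:-1]):
        fused = cv2.pyrUp(fused, dstsize=(lvl.shape[1], lvl.shape[0])) + lvl
    return np.clip(fused, 0, 255).astype(np.uint8)
```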




... Inspired by the improved performance of CNN-based DL methods for various applications, researchers have used these methods for remote sensing image fusion in the last few years (Liu et al. 2018). The use of DL methods for image fusion offers distinct advantages over conventional methods owing to their ability to characterize the relationship between the target and input images (Liu et al. 2018). Huang et al. (2015) proposed a deep neural network-based pan-sharpening method to solve the problem of image fusion. ...
... A deep siamese CNN was trained on the ImageNet dataset for image fusion. The architecture proposed by Liu et al. (2018) was modified to fit the requirements, and a patch size of 32 with a gamma value of 0.3 (a factor by which the learning rate is multiplied) was used. In this study, each branch of the CNN-based siamese network consists of three convolutional layers, and the generated feature maps are used to measure activity levels. ...
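For illustration, a minimal PyTorch sketch of such a weight-shared branch is given below; the channel widths and kernel sizes are assumptions, since the excerpt only specifies three convolutional layers and 32x32 patches.

```python
import torch
import torch.nn as nn

class SiameseBranch(nn.Module):
    """One weight-shared branch with three convolutional layers, as the
    excerpt describes; channel counts and kernel sizes are assumptions."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.features(x)  # feature maps used to measure activity level

class SiameseFusionNet(nn.Module):
    """Both source patches pass through the *same* branch (shared weights)."""
    def __init__(self):
        super().__init__()
        self.branch = SiameseBranch()

    def forward(self, patch_a, patch_b):
        return self.branch(patch_a), self.branch(patch_b)

# 32x32 grayscale patches, as mentioned in the excerpt.
net = SiameseFusionNet()
feat_ir, feat_vis = net(torch.randn(8, 1, 32, 32), torch.randn(8, 1, 32, 32))
```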
Article
Deep learning (DL)-based methods have recently been extensively used for satellite image analysis due to their ability to automatically extract spatial-spectral features from images. Recent advancement in DL-based methods has also allowed the remote sensing community to utilize these methods in fusing the satellite images for enhanced land use/land cover (LULC) classification. Keeping this in view, the present study aims to evaluate the potential of SAR (Sentinel 1) and Optical (Sentinel 2) image fusion using pyramid-based DL methods over an agricultural area in India. In this study, three image fusion methods, i.e., pyramid-based fusion methods, pyramid-based fusion methods coupled with convolutional neural network (CNN), and a combination of two different pyramid decomposition methods concurrently with CNN were used. The performance of the fused images was evaluated in terms of fusion metrics, image quality, and overall classification accuracy by an optimized 2D-CNN-based DL classifier. Results from pyramid-based fusion methods with CNN and a combination of two different pyramid decomposition methods with CNN suggest that these methods were able to retain visual quality as well as the detailed structural information of input images in comparison to the pyramid-based fusion methods. Bayesian optimization method was used to optimize various hyper-parameters of the 2D-CNN-based DL classifier used in this study. Results with fused images obtained using pyramid-based methods coupled with CNN suggest an improved performance by VV (Vertical–Vertical) polarized images in terms of overall classification accuracy (99.23% and 99.33%).
... These algorithms are found to achieve better results because of their automatic feature extraction capabilities from image patches, thus providing a higher level of spatial information. Several researchers have used CNN-based siamese, pseudo-siamese and two-channel siamese networks for image fusion by extracting spatial-spectral features from remote sensing datasets (Liu et al. 2018, He et al. 2019, Hughes et al. 2019). CNN-based DL classifiers have also been extensively used for land cover classification (Guo et al. 2019, Shakya et al. 2021). ...
... CNN-based DL classifiers have also been extensively used for land cover classification (Guo et al. 2019, Shakya et al. 2021). The use of CNN with pyramid methods has also been reported for the fusion of infrared images, visible images, medical images, multi-focus images and remotely sensed images (Liu et al. 2018, Shakya et al. 2020). Various pyramid-based image decomposition methods: Laplacian Pyramid (LP), Gaussian Pyramid (GP), Contrast Pyramid (CP), Ratio of Laplacian Pyramid (ROLP), Gradient Pyramid (GRP), Filter Subtract Decimate Pyramid (FSDP) and ...
... Each branch of the twin network accepts one of the two input images at the same time, followed by the final layers. The network takes a pair of input images; therefore, it is necessary to differentiate them based on a similarity/dissimilarity measure, which can be calculated by a contrastive loss function using Euclidean distance instead of cross entropy, a binary-class loss function (Liu et al. 2018). These networks typically utilise the contrastive loss function during training since it improves the network's accuracy. ...
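A compact PyTorch sketch of the contrastive loss described in this excerpt is shown below. The similar/dissimilar label convention and the margin value are assumptions; only the use of Euclidean distance between branch embeddings follows the text.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(feat_a, feat_b, label, margin=1.0):
    """Contrastive loss over a pair of flattened branch embeddings (N, D).
    label = 1 for similar pairs, 0 for dissimilar pairs (convention assumed).
    Uses Euclidean distance rather than a binary cross-entropy loss."""
    dist = F.pairwise_distance(feat_a, feat_b)                    # Euclidean distance
    loss_similar = label * dist.pow(2)                            # pull similar pairs together
    loss_dissimilar = (1 - label) * F.relu(margin - dist).pow(2)  # push dissimilar pairs apart
    return 0.5 * (loss_similar + loss_dissimilar).mean()

# Toy usage with random embeddings and labels.
loss = contrastive_loss(torch.randn(8, 64), torch.randn(8, 64),
                        torch.randint(0, 2, (8,)).float())
```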
Article
Remote sensing image classification is difficult, especially for agricultural crops with identical phenological growth periods. In this context, multi-sensor image fusion allows a comprehensive representation of biophysical and structural information. Recently, Convolutional Neural Network (CNN)-based methods have been used for several applications due to their spatial-spectral interpretability. Hence, this study explores the potential of fused multi-temporal Sentinel 1 (S1) and Sentinel 2 (S2) images for Land Use/Land Cover classification over an agricultural area in India. For classification, Bayesian-optimised 2D CNN-based DL and pixel-based SVM classifiers were used. For fusion, a CNN-based siamese network with the Ratio-of-Laplacian pyramid method was used for the images acquired over the entire winter cropping period. This fusion strategy leads to better interpretability of results, and the 2D CNN-based DL classifier was found to perform well in terms of classification accuracy for both single-month (95.14% and 96.11%) and multi-temporal (99.87% and 99.91%) fusion, in comparison to the SVM with classification accuracies of 80.02% and 81.36% for single-month and 95.69% and 95.84% for multi-temporal fusion. Results indicate better performance by Vertical-Vertical polarised fused images than Vertical-Horizontal polarised fused images, implying the need to analyse the classified images obtained by DL classifiers along with the classification accuracy.
... Liu et al. first proposed a convolutional neural network (CNN)-based fusion algorithm for infrared and visible images [4], which provides better fusion results than traditional methods. Liu et al. [5] used a CNN as a feature extraction model to achieve the fusion of multi-focus images by rule-based fusion. Li and Wu [6] proposed an auto-encoder-based method for fusing infrared and visible images, which uses feature maps to obtain the fused image. ...
... The fusion methods used in this paper fall into two categories: general methods and methods based on deep learning. General methods include ADF [28], the guided filter algorithm (GFF) [29], the cross bilateral filter (CBF) [30], and VSMWLS [31]; methods based on deep learning include DenseFuse [6], CNN [5], ResNet [32], and our method. DenseFuse, CNN, ResNet, and our model are implemented with PyTorch and trained on two Tesla V100 GPUs with 16 GB of memory. ...
Article
Full-text available
To fuse infrared and visible images in wireless applications, the secure extraction and transmission of characteristic information is an important task. The fused image quality depends on the effectiveness of feature extraction and the transmission of image pair characteristics. However, most fusion approaches based on deep learning do not make effective use of the features for image fusion, which results in missing semantic content in the fused image. In this paper, a novel trustworthy image fusion method is proposed to address these issues, which applies convolutional neural networks for feature extraction and blockchain technology to protect sensitive information. The new method effectively reduces the loss of feature information by feeding the output of each convolutional layer of the feature extraction network to the next layer together with the output of the previous layer, and, to ensure similarity between the fused image and the original image, the feature map of the original input image is used as the input of the reconstruction network. Compared to other methods, the experimental results show that our proposed method achieves better quality and satisfies human perception.
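The dense forwarding of layer outputs described in this abstract can be sketched roughly as follows in PyTorch; the channel widths and depth are illustrative assumptions, and the blockchain-related part of the method is not represented.

```python
import torch
import torch.nn as nn

class DenselyConnectedEncoder(nn.Module):
    """Illustrative encoder where each convolutional layer receives the
    outputs of all previous layers, so feature information is carried
    forward as the abstract describes. Channel widths are assumptions."""
    def __init__(self, in_ch=1, growth=16, num_layers=3):
        super().__init__()
        self.layers = nn.ModuleList()
        ch = in_ch
        for _ in range(num_layers):
            self.layers.append(nn.Sequential(
                nn.Conv2d(ch, growth, kernel_size=3, padding=1),
                nn.ReLU(inplace=True)))
            ch += growth  # the next layer also sees everything produced so far

    def forward(self, x):
        feats = [x]
        for layer in self.layers:
            out = layer(torch.cat(feats, dim=1))  # previous outputs fed forward
            feats.append(out)
        return torch.cat(feats, dim=1)  # all features, for the reconstruction stage

features = DenselyConnectedEncoder()(torch.randn(1, 1, 64, 64))
```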
... The convolutional neural network (CNN) attracts much attention due to its powerful feature representation ability. Liu et al. utilize CNNs to perform the fusion of infrared and visible images [22]. However, CNN models usually require ground truth for the training images. ...
... Fifteen classical and state-of-the-art image fusion methods are chosen to evaluate the fusion performance of our proposed fusion framework, including: curvelet transform fusion method (CVT) [38], dual-tree complex wavelet transform fusion method (DTCWT) [10], multi-resolution singular value decomposition fusion method (MSVD) [39], cross bilateral filter fusion method (CBF) [40], guided filter fusion method (GFF) [41], gradient transfer and total variation minimization fusion method (GTF) [42], hybrid multi-scale decomposition with Gaussian and bilateral filters fusion method (HMSD-GF) [43], infrared feature extraction and visual information preservation fusion method (IFEVIP) [44], convolutional neural networks fusion method (FCNN) [22], gradient filter fusion method (GF) [45], visual saliency map and weighted least square optimization-based fusion method (WLS) [28], latent low-rank representation fusion method (LatLRR) [24], GAN based fusion method (FusionGAN) [4], dense block based fusion method (DenseFuse) [46], and Nest Connection and Spatial/Channel Attention fusion method (NestFuse) [47]. All above comparison methods are conducted based on their publicly available codes, and their parameters are set according to their papers. ...
Article
Full-text available
Infrared and visible image fusion is a hot topic due to the perfect complementarity of their information. There are two key problems in infrared and visible image fusion. One is how to extract significant target areas and rich texture details from the source images, and the other is how to integrate them to produce satisfactory fused images. To tackle these problems, we propose a novel fusion framework in this paper. A multi-level image decomposition method is used to obtain the base layer and detail layer of the source image. For the fusion of base layer, an ingenious fusion strategy guided by the saliency map of source image is designed to improve the intensity of salient targets and the visual quality of the fused image. For the fusion of detail layer, an efficient approach by introducing the enhanced gradient information is presented to boost the detail features and sharpen the edges of the fused image. Experimental results demonstrate that, compared with fifteen classical and advanced fusion methods, the proposed image fusion framework has better performance in both subjective and objective evaluation.
... Deep learning-based fusion methods specialized for infrared and visible image fusion include the convolutional neural network (CNN) [11], DenseFuse [12], the disentangled representation fusion (DRF) [13] and so on. The main drawback of deep learning-based methods is that they are difficult to train when the training data are insufficient, especially in infrared and visible image fusion tasks, and very little attention is paid to image decomposition in deep learning-based methods [14]. ...
... Furthermore, 9 recent and classical fusion methods are chosen to conduct the same experiment for comparison purposes. They include the nonsubsampled contourlet transform method (NSCT) [8], the convolutional neural networks based method (CNN) [11], disentangled representation (DRF) [13], MDLatLRR [14], the visual saliency map and weighted least square optimization method (VSM-WLS) [29], convolutional sparse representation method (CSR) [34], the curvelet transform method (CVT) [35], the parameter- [36], and the gradient transfer fusion method (GTF) [37]. For the purpose of a quantitative comparison between our method and the other selected comparison methods, 7 quality metrics are utilized. ...
Article
Full-text available
Latent low-rank representation has been applied to multi-level image decomposition for the fusion of infrared and visible images, with good results. However, when the original infrared and visible images are of low quality, the visual effect of the fused images is still unsatisfactory. To combat this challenge, this paper proposes an infrared and visible image fusion method based on multi-level latent low-rank representation combined with image enhancement and multiple visual weight information. First, the source images are decomposed into detail parts - including detail images and detail matrices - and base images using multi-level latent low-rank representation. Then a nuclear norm-based fusion strategy is used to fuse the detail matrices, and multiple visual weights determined by clarity, local contrast and edge-corner saliency are used to fuse the detail images. These two fusion results are weight-averaged to obtain a fused detail image. The base images are fused by an averaging strategy after Retinex-based enhancement. The final fused image is obtained by combining the fused detail image and the fused base image. Compared with other state-of-the-art fusion methods, the proposed algorithm displays better fusion performance in both subjective and objective evaluation.
... We optimized the model's structure and reduced the number of layers, relying more on the loss function to achieve high generation quality and fast fusion performance. (The CNN [30] is a Siamese network in which the weights of the two branches are constrained to be the same, and each branch consists of three convolutional layers and one max-pooling layer.) Table 5 shows the time taken for image fusion. ...
Article
Full-text available
The presence of fake pictures affects the reliability of visible face images under specific circumstances. This paper presents a novel adversarial neural network, named FTSGAN, for infrared and visible image fusion; we utilize the FTSGAN model to fuse infrared and visible face image features to improve face recognition. In the FTSGAN model design, the Frobenius norm (F), total variation norm (TV), and structural similarity index measure (SSIM) are employed. The F and TV terms are used to constrain the gray level and the gradient of the image, while the SSIM term is used to constrain the image structure. The FTSGAN fuses infrared and visible face images that contain bio-information for heterogeneous face recognition tasks. Experiments based on the FTSGAN using hundreds of face images demonstrate its excellent performance. Principal component analysis (PCA) and linear discriminant analysis (LDA) are involved in face recognition. The face recognition performance after fusion improved by 1.9% compared to that before fusion, and the final face recognition rate was 94.4%. The proposed method offers better quality, a faster rate, and more robustness than methods that only use visible images for face recognition.
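As a rough illustration of how the three constraints named above might be combined, the following PyTorch sketch builds a composite loss from a Frobenius-norm term, a total-variation term, and a coarse whole-image SSIM term. The weighting factors and the use of both source images as references are assumptions, not the FTSGAN authors' exact formulation.

```python
import torch

def frobenius_loss(fused, target):
    # F-norm term: constrains overall gray-level (intensity) agreement.
    return torch.norm(fused - target, p='fro') / fused.numel()

def total_variation_loss(img):
    # TV term: constrains the image gradient (smoothness of the result).
    dh = (img[..., 1:, :] - img[..., :-1, :]).abs().mean()
    dw = (img[..., :, 1:] - img[..., :, :-1]).abs().mean()
    return dh + dw

def global_ssim(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    # Coarse, whole-image SSIM (the published metric uses local windows);
    # inputs are assumed to be scaled to [0, 1].
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

def composite_loss(fused, ir, vis, w_f=1.0, w_tv=0.1, w_ssim=1.0):
    """Hypothetical weighting of the three terms named in the abstract."""
    loss_f = frobenius_loss(fused, ir) + frobenius_loss(fused, vis)
    loss_tv = total_variation_loss(fused)
    loss_ssim = 2.0 - global_ssim(fused, ir) - global_ssim(fused, vis)
    return w_f * loss_f + w_tv * loss_tv + w_ssim * loss_ssim
```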
... In recent years, deep learning has developed rapidly and has been widely applied in the field of image fusion. Liu et al. [18] used a pre-trained convolutional neural network to calculate weights for the fusion of infrared and visible images. Based on the work of Prabhakar et al. [19], Li et al. [20] proposed an infrared and visible image fusion model based on unsupervised learning and a densely connected network. ...
... In recent years, many researchers have applied convolutional neural networks to infrared and visible image fusion, and the commonly used loss function is the pixel loss (L_P). In our network, we also introduce the structural similarity index measure (SSIM) to construct a loss term (L_S), following [18]. The total loss function consists of the loss values of the three channels and is defined as L. ...
Article
Full-text available
Owing to the lack of labels in infrared and visible image fusion networks, an infrared and visible image fusion model based on a multi-channel unsupervised convolutional neural network (CNN) is proposed in this paper, in order to extract more detailed information through multi-channel inputs. In contrast to conventional unsupervised fusion networks, the proposed network contains three channels for extracting infrared features, visible features and common features of infrared and visible images, respectively. The square loss function is used to train the network. Pairs of infrared and visible images are input to DenseNet to extract as many useful features as possible. A fusion module is designed to fuse the extracted features for testing. Experimental results show that the proposed method can simultaneously preserve both the clear targets of infrared images and the detailed information of visible images. Experiments also demonstrate the superiority of the proposed method over state-of-the-art methods in objective metrics.
... To evaluate the merits of our proposed method, we compare the image fusion performance of our approach to that of several published methods in the literature. These methods are LPP [26], LP [27], CVT [28], DTCWT [29], GTF [30], CNN [31], GAN-McC [32], PMGI [33], FusionGAN, DDcGAN, RFN-Nest [34], and RCGAN [35]. We also provide an additional ablation experiment to compare the proposed DDGANSE with FusionGAN, GAN-McC, PMGI, and DDcGAN. ...
Article
Full-text available
Infrared images can provide clear contrast information to distinguish between the target and the background under any lighting conditions. In contrast, visible images can provide rich texture details and are compatible with the human visual system. The fusion of a visible image and infrared image will thus contain both comprehensive contrast information and texture details. In this study, a novel approach for the fusion of infrared and visible images is proposed based on a dual-discriminator generative adversarial network with a squeeze-and-excitation module (DDGANSE). Our approach establishes adversarial training between one generator and two discriminators. The goal of the generator is to generate images that are similar to the source images, and contain the information from both infrared and visible source images. The purpose of the two discriminators is to increase the similarity between the image generated by the generator and the infrared and visible images. We experimentally demonstrate that, using continuous adversarial training, DDGANSE outputs images that retain the advantages of both infrared and visible images, with significant contrast information and rich texture details. Finally, we compared the performance of our proposed method with previously reported techniques for fusing infrared and visible images using both quantitative and qualitative assessments. Our experiments on the TNO dataset demonstrate that our proposed method shows superior performance compared to other similar reported methods in the literature using various performance metrics.
... We select seven pairs of image samples, including low-visibility visible images. Other classic or newly released fusion algorithms, including the fusion method based on NSST [32], the fusion method based on GF multiscale decomposition (MGFF) [33], the fusion method based on infrared image structure extraction and visible image information retention (IFEVIP) [34], the multiscale fusion method using Gaussian and bilateral filters (HMSD) [35], the fusion method based on the convolutional neural network (CNN) [36], the fusion method based on target-enhanced multiscale transform decomposition (TE-MSD) [37], and the fusion method based on multiscale transformation and norm optimization (MST-NO) [38], are compared with the proposed method on these image samples. The experimental parameters of the proposed method in this paper are set as follows. ...
Article
Full-text available
To improve the fusion quality of infrared and visible images and highlight target and scene details, in this paper, a novel infrared and visible image fusion algorithm is proposed. First, a method for combining dynamic range compression and contrast restoration based on a guided filter is adopted to enhance the contrast of visible source images. Second, guided filter-based image multiscale decomposition is used to decompose images into base layers and detail layers. For base layer fusion, a fusion strategy based on the detail and energy measurements of the source image is proposed to determine the pixel value of the fused image base layer such that the energy loss of the fusion can be reduced and the texture detail features are highlighted to obtain more source image details. Finally, recursive separation and weighted histogram equalization methods are applied to optimize the fused image. Experimental results show that the fusion algorithm and fusion strategy proposed in this paper can effectively improve fusion image clarity, while more detailed target and scene information can still be retained.
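A minimal sketch of a guided-filter decomposition and an energy-based base-layer fusion rule in this spirit is given below, using OpenCV's cv2.ximgproc.guidedFilter (which requires the opencv-contrib-python package). The filter parameters, window size, and the simple local-energy measure are assumptions standing in for the paper's detail-and-energy measurement.

```python
import cv2
import numpy as np

def decompose(img, radius=8, eps=0.04):
    """Split an image into a base layer (guided-filter smoothing, with the
    image as its own guide) and a detail layer (residual)."""
    img = img.astype(np.float32) / 255.0
    base = cv2.ximgproc.guidedFilter(img, img, radius, eps)
    detail = img - base
    return base, detail

def fuse_base_by_energy(base_ir, base_vis, win=11):
    """Toy stand-in for an energy-based rule: pick, per pixel, the base
    layer with the larger box-filtered local energy."""
    e_ir = cv2.boxFilter(base_ir ** 2, -1, (win, win))
    e_vis = cv2.boxFilter(base_vis ** 2, -1, (win, win))
    mask = (e_ir >= e_vis).astype(np.float32)
    return mask * base_ir + (1.0 - mask) * base_vis
```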
... Recent advancements in neural networks, particularly the convolutional neural network (CNN), play a major role in image fusion. Liu et al. [26] made use of a Siamese convolutional neural network to generate a weight map. This weight map undergoes Gaussian pyramid decomposition, and the source images are decomposed into multiple scales by the Laplacian pyramid. ...
Article
Full-text available
Image fusion integrates several images from different modalities into a single image with higher spatial and spectral resolution. Artifacts, smoothing and ringing are major issues in convolutional neural network, edge-preserving filter and transform-based image fusion methods. An orthogonal-rectangular with column pivoting (QRCP) matrix factorization-based hybrid approach is proposed in this work to overcome the above issues and to improve the fusion of visible and infrared images. QRCP decomposition is an accurate matrix decomposition that separates the base layer and detail layer from the source images. The discrete cosine transform and the local spatial frequency concept are employed to fuse the base layers. Weight maps are utilized to transfer information into the detail layers, which are obtained directly from the base layers. The obtained fused image is a linear combination of the final base layer and the final detail layer. The proposed method outperforms existing methods in terms of performance measures such as entropy, spatial frequency, mutual information, normalized weight performance index, mean and standard deviation.
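To illustrate how a QR factorization with column pivoting can split an image into base and detail layers, here is a small NumPy/SciPy sketch based on a truncated (low-rank) reconstruction; the chosen rank and this exact usage are assumptions rather than the paper's precise formulation.

```python
import numpy as np
from scipy.linalg import qr

def qrcp_base_detail(img, rank=20):
    """Illustrative base/detail split via QR with column pivoting (QRCP):
    a rank-k reconstruction serves as the base layer and the residual as
    the detail layer."""
    A = img.astype(np.float64)
    Q, R, piv = qr(A, pivoting=True)           # A[:, piv] = Q @ R
    approx_perm = Q[:, :rank] @ R[:rank, :]    # rank-k approximation of A[:, piv]
    base = np.empty_like(A)
    base[:, piv] = approx_perm                 # undo the column permutation
    detail = A - base
    return base, detail
```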
... The first set contains 5990 frames, the second set contains 5081 frames, and each sequence was divided into training and testing sets. We performed experiments on two datasets, namely fusionA-22 and fusionC-22, which contain images obtained by fusing infrared and visible light images using the methods of [17] and [28], respectively [10]. An image obtained by fusing IR and visible light images has better visual quality, and it is easier to distinguish details such as characters in the image. ...
Preprint
Full-text available
Image super-resolution is important in many fields, such as surveillance and remote sensing. However, infrared (IR) images normally have low resolution since the optical equipment is relatively expensive. Recently, deep learning methods have dominated image super-resolution and achieved remarkable performance on visible images; however, IR images have received less attention. IR images have fewer patterns, and hence, it is difficult for deep neural networks (DNNs) to learn diverse features from IR images. In this paper, we present a framework that employs heterogeneous convolution and adversarial training, namely, heterogeneous kernel-based super-resolution Wasserstein GAN (HetSRWGAN), for IR image super-resolution. The HetSRWGAN algorithm is a lightweight GAN architecture that applies a plug-and-play heterogeneous kernel-based residual block. Moreover, a novel loss function that employs image gradients is adopted, which can be applied to an arbitrary model. The proposed HetSRWGAN achieves consistently better performance in both qualitative and quantitative evaluations. According to the experimental results, the whole training process is more stable.
... Deep learning uses the backpropagation algorithm to direct how the machine learns, so as to find complex structure in large data sets and overcome the limitations of hand-crafted features. Applications in the field of image processing mainly include CNNs [24], GANs [25], Siamese networks [26], and autoencoders [27]. ...
Article
Full-text available
Image fusion aims to effectively enhance the accuracy, stability, and comprehensiveness of information. Generally, infrared images lack enough background detail to provide an accurate description of the target scene, while visible images struggle to capture radiation under adverse conditions such as low light. It is hoped that the richness of image details can be improved by using effective fusion algorithms. In this paper, we propose an infrared and visible image fusion algorithm aiming to overcome some common defects in the image fusion process. Firstly, we use a fast approximate bilateral filter to decompose the infrared and visible images into small-scale layers, a large-scale layer, and a base layer. Then, the fused base layer is obtained based on local energy characteristics, which avoids the information loss of traditional fusion rules. The fused small-scale layers are acquired by selecting the absolute maximum, and the fused large-scale layer is obtained by a summation rule. Finally, the fused small-scale layers, large-scale layer, and base layer are merged to reconstruct the final fused image. Experimental results show that our method retains more detailed appearance information in the fused image and achieves good results in both qualitative and quantitative evaluations.
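A rough NumPy/OpenCV sketch of this three-layer decomposition and the stated fusion rules (local energy for the base layer, absolute maximum for the small-scale layer, summation for the large-scale layer) follows; the bilateral-filter parameters are assumptions, and OpenCV's standard bilateralFilter stands in for the fast approximate variant used in the paper.

```python
import cv2
import numpy as np

def three_layer_decompose(img, d=9, sigma_color=25, sigma_space=25):
    """Two-pass bilateral decomposition into small-scale, large-scale and
    base layers; parameters are illustrative only."""
    img = img.astype(np.float32)
    smooth1 = cv2.bilateralFilter(img, d, sigma_color, sigma_space)
    smooth2 = cv2.bilateralFilter(smooth1, d, 2 * sigma_color, 2 * sigma_space)
    return img - smooth1, smooth1 - smooth2, smooth2  # small, large, base

def fuse_layers(ir, vis):
    s_ir, l_ir, b_ir = three_layer_decompose(ir)
    s_vis, l_vis, b_vis = three_layer_decompose(vis)
    # Small-scale: keep the coefficient with the larger absolute value.
    small = np.where(np.abs(s_ir) >= np.abs(s_vis), s_ir, s_vis)
    # Large-scale: summation rule.
    large = l_ir + l_vis
    # Base: local-energy weighting via box-filtered squared intensity.
    e_ir = cv2.boxFilter(b_ir ** 2, -1, (11, 11))
    e_vis = cv2.boxFilter(b_vis ** 2, -1, (11, 11))
    w = e_ir / (e_ir + e_vis + 1e-8)
    base = w * b_ir + (1.0 - w) * b_vis
    return np.clip(small + large + base, 0, 255).astype(np.uint8)
```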
... We extracted 55 images from this dataset as the infrared image training datasets named IR55. We employed two datasets for testing, defined as result-A and result-C, which consisted of 22 images obtained by fusing the infrared images and visible light images utilizing the approaches proposed in [36,37], respectively. We used the peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) to measure the quality of the reconstructed images. ...
Article
Full-text available
Deep convolutional neural networks are capable of achieving remarkable performance in single-image super-resolution (SISR). However, due to the weak availability of infrared images, heavy network architectures for insufficient infrared images are confronted by excessive parameters and computational complexity. To address these issues, we propose a lightweight progressive compact distillation network (PCDN) with a transfer learning strategy to achieve infrared image super-resolution reconstruction with a few samples. We design a progressive feature residual distillation (PFDB) block to efficiently refine hierarchical features, and parallel dilation convolutions are utilized to expand PFDB’s receptive field, thereby maximizing the characterization power of marginal features and minimizing the network parameters. Moreover, the bil-global connection mechanism and the difference calculation algorithm between two adjacent PFDBs are proposed to accelerate the network convergence and extract the high-frequency information, respectively. Furthermore, we introduce transfer learning to fine-tune network weights with few-shot infrared images to obtain infrared image mapping information. Experimental results suggest the effectiveness and superiority of the proposed framework with low computational load in infrared image super-resolution. Notably, our PCDN outperforms existing methods on two public datasets for both ×2 and ×4 with parameters less than 240 k, proving its efficient and excellent reconstruction performance.
... By combining the fused base layer B_f (Equation (13)) and detail layer D_f (Equation (19)), the final fused result F is reconstructed. To verify the effectiveness and superiority of our MFGS fusion method, a significant number of experiments are conducted to compare the proposed method with nine state-of-the-art fusion methods, including NSCT [33], HyMSD (hybrid multiscale decomposition with Gaussian and bilateral filters) [23], CSR (convolutional sparse representation) [14], GTF (gradient transfer fusion) [15], VSMWLS (visual saliency map and weighted least square optimization) [24], CNN (convolutional neural networks) [55], DLVGG (deep learning framework using imagenet-vgg-verydeep-19) [40], ResNet (deep learning framework based on ResNet and zero-phase component analysis) [41], and TE (target-enhanced) [51]. The first scheme is a frequently used and representative MST-based method, while the latter schemes are state-of-the-art methods proposed in recent years. ...
Article
Full-text available
As a powerful technique to merge complementary information of original images, infrared (IR) and visible image fusion approaches are widely used in surveillance, target detecting, tracking, and biological recognition, etc. In this paper, an efficient IR and visible image fusion method is proposed to simultaneously enhance the significant targets/regions in all source images and preserve rich background details in visible images. The multi-scale representation based on the fast global smoother is firstly used to decompose source images into the base and detail layers, aiming to extract the salient structure information and suppress the halos around the edges. Then, a target-enhanced parallel Gaussian fuzzy logic-based fusion rule is proposed to merge the base layers, which can avoid the brightness loss and highlight significant targets/regions. In addition, the visual saliency map-based fusion rule is designed to merge the detail layers with the purpose of obtaining rich details. Finally, the fused image is reconstructed. Extensive experiments are conducted on 21 image pairs and a Nato-camp sequence (32 image pairs) to verify the effectiveness and superiority of the proposed method. Compared with several state-of-the-art methods, experimental results demonstrate that the proposed method can achieve more competitive or superior performances according to both the visual results and objective evaluation.
... In [32], Li et al. have used ResNet and zero-phase component analysis, which achieves good fusion performance. More generally, many convolutional neural network (CNN)-based methods have been proposed for IR and visible image fusion [33][34][35][36]. Unlike the CNN and ResNet models, Ma et al. [37] have adopted DDcGAN (Dual-discriminator Conditional Generative Adversarial Network) to attain fusion outputs with enhanced targets, which facilitates scene understanding for humans. ...
Article
Full-text available
An efficient method for the infrared and visible image fusion is presented using truncated Huber penalty function smoothing and visual saliency based threshold optimization. The method merges complementary information from multimodality source images into a more informative composite image in two-scale domain, in which the significant objects/regions are highlighted and rich feature information is preserved. Firstly, source images are decomposed into two-scale image representations, namely, the approximate and residual layers, using truncated Huber penalty function smoothing. Benefiting from the edge- and structure-preserving characteristics, the significant objects and regions in the source images are effectively extracted without halo artifacts around the edges. Secondly, a visual saliency based threshold optimization fusion rule is designed to fuse the approximate layers aiming to highlight the salient targets in infrared images and remain the high-intensity regions in visible images. The sparse representation based fusion rule is adopted to fuse the residual layers with the goal of acquiring rich detail texture information. Finally, combining the fused approximate and residual layers reconstructs the fused image with more natural visual effects. Sufficient experimental results demonstrate that the proposed method can achieve comparable or superior performances compared with several state-of-the-art fusion methods in visual results and objective assessments.
... In 2017, Liu et al. [42] proposed an image fusion method based on a convolutional neural network. In that work, infrared and visible images are used as source images, and the output contains visible details and infrared features. ...
Preprint
Deep learning-based image fusion approaches have obtained wide attention in recent years, achieving promising performance in terms of visual perception. However, the fusion module in the current deep learning-based methods suffers from two limitations, i.e., manually designed fusion function, and input-independent network learning. In this paper, we propose an unsupervised adaptive image fusion method to address the above issues. We propose a feature mutual mapping fusion module and dual-branch multi-scale autoencoder. More specifically, we construct a global map to measure the connections of pixels between the input source images. The found mapping relationship guides the image fusion. Besides, we design a dual-branch multi-scale network through sampling transformation to extract discriminative image features. We further enrich feature representations of different scales through feature aggregation in the decoding process. Finally, we propose a modified loss function to train the network with efficient convergence property. Through sufficient training on infrared and visible image data sets, our method also shows excellent generalized performance in multi-focus and medical image fusion. Our method achieves superior performance in both visual perception and objective evaluation. Experiments prove that the performance of our proposed method on a variety of image fusion tasks surpasses other state-of-the-art methods, proving the effectiveness and versatility of our approach.
... (a) The modal differences of source images are ignored. This type of network [14,15,18,20,27,28] is generally designed with an encoder to extract features and a fusion rule to fuse features. (b) Encoders adopted the same network structure [16,17], but the fusion rules are not specific to different encoders. ...
Preprint
Infrared and visible images, as multi-modal image pairs, show significant differences in how they depict the same scene. The image fusion task faces two problems: one is to maintain the unique features of different modalities, and the other is to maintain features at various levels, such as local and global features. This paper discusses the limitations of deep learning models in image fusion and the corresponding optimization strategies. Based on artificially designed structures and constraints, we divide models into explicit models, and implicit models that adaptively learn high-level features or can establish global pixel associations. Ten models were screened for comparison experiments on 21 test sets. The qualitative and quantitative results show that the implicit models have a more comprehensive ability to learn image features; at the same time, their stability needs to be improved. Considering the advantages and limitations of existing algorithms, we discuss the main problems of multi-modal image fusion and future research directions.
... The proposed fusion algorithm was compared with RGF_MDFB [32], NSCT-SR [14], Hybrid-MSD [40], CNN [16], GFF [12], VILS [32] and GTF [20] for multi-modal infrared image fusion and infrared and visual image fusion. Fig. 3 shows the infrared (IR) and visual (VIS) images and the fused images produced by the different fusion algorithms. ...
Article
Full-text available
The key problem of multi-modal image fusion is that the complementary features of the source images are easily lost in the fusion. In this paper, a fusion algorithm is proposed with hybrid ℓ0ℓ1 layer decomposition and multi-directional filter banks to extract edge, contour, and detail features from images and fuse the complementary features well. First, the hybrid ℓ0ℓ1 layer decomposition effectively overcomes halo artifacts and over-enhancement when the detail features are eliminated, so the low-frequency and detail features of the source images are separated. Then, visual saliency detection based on ant colony optimisation and local phase coherence is introduced to guide the fusion of the base-layer images. The detail image is then decomposed using multi-directional filter banks, the detail features in different directions are extracted, and a fusion rule based on multi-directional gradients and principal component analysis is adopted for the detail sub-band images to prevent the loss of detail features. Finally, the final fused image is reconstructed by the inverse transformation of the hybrid ℓ0ℓ1 layer decomposition. Experimental results demonstrate that the values of the spatial frequency, the standard deviation, the edge strength, the difference similarity index, and the difference structural similarity increase significantly, so the proposed fusion algorithm can effectively preserve the complementary information between images and improve the quality of fused images.
... All the experiments are implemented using MATLAB 2018a on a notebook PC. Five recent methods are compared in the same experimental environment for verification, such as image fusion with ResNet and zero-phase component analysis (ResNet) proposed by Li et al. [21], image fusion with the convolutional neural network (CNN) proposed by Liu et al. [22], the gradient transfer and total variation minimization-based image fusion method (GTF) proposed by Ma et al. [23], image ...
Article
Full-text available
To improve the fusion performance of infrared and visible images and effectively retain the edge structure information of the image, a fusion algorithm based on iterative control of anisotropic diffusion and regional gradient structure is proposed. First, the iterative control operator is introduced into the anisotropic diffusion model to effectively control the number of iterations. Then, the image is decomposed into a structure layer containing detail information and a base layer containing residual energy information. According to the characteristics of different layers, different fusion schemes are utilized. The structure layer is fused by combining the regional structure operator and the structure tensor matrix, and the base layer is fused through the Visual Saliency Map. Finally, the fusion image is obtained by reconstructing the structure layer and the energy layer. Experimental results show that the proposed algorithm can not only effectively deal with the fusion of infrared and visible images but also has high efficiency in calculation.
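For reference, a plain Perona-Malik anisotropic diffusion sketch is shown below to illustrate the base/structure split that such methods build on; the paper's iterative-control operator, which adapts the number of iterations, is not reproduced, and all parameter values are assumptions.

```python
import numpy as np

def anisotropic_diffusion_decompose(img, iterations=15, kappa=30.0, gamma=0.15):
    """Classic Perona-Malik diffusion with a fixed iteration count.
    Returns a smoothed base (energy) layer and a structure (detail) layer.
    Borders wrap via np.roll, which is acceptable for a sketch."""
    u = img.astype(np.float32).copy()
    for _ in range(iterations):
        # Finite differences toward the four neighbours.
        n = np.roll(u, -1, axis=0) - u
        s = np.roll(u, 1, axis=0) - u
        e = np.roll(u, -1, axis=1) - u
        w = np.roll(u, 1, axis=1) - u
        # Exponential conduction coefficients suppress diffusion across edges.
        cn, cs = np.exp(-(n / kappa) ** 2), np.exp(-(s / kappa) ** 2)
        ce, cw = np.exp(-(e / kappa) ** 2), np.exp(-(w / kappa) ** 2)
        u += gamma * (cn * n + cs * s + ce * e + cw * w)
    base = u
    structure = img.astype(np.float32) - u
    return base, structure
```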
Article
Image fusion is the process of amalgamating essential information from multiple images into a single image, resulting in a higher-resolution image. It is an essential step of image augmentation in satellite imagery. Various frequency bands hold feature information related to a certain frequency range. Image fusion enables the superposition of co-registered images captured from different sensors to yield a better image with features from both source images. Image fusion helps to obtain a more detailed single image than multiple images with different features. In our paper, we present an anisotropic diffusion-based fusion method for C-band and L-band AirSAR (Airborne Synthetic Aperture Radar) images. We found that the anisotropic diffusion method works better than older pan-sharpening techniques, runs much faster, and requires far fewer resources than convolutional neural networks.
Article
Most deep learning object detection methods based on multi-modal information fusion cannot directly control the quality of the fused images at present, because the fusion depends only on the detection results. This indirect control is, in principle, not conducive to the network's target detection. To address this problem, we propose a multimodal information cross-fusion detection method based on a generative adversarial network (CrossGAN-Detection), which is composed of a GAN and a target detection network; the target detection network acts as the second discriminator of the GAN during training. Through the content loss function and the dual discriminator, directly controllable guidance is provided for the generator, which is designed to learn the relationship between different modes adaptively through cross fusion. We conduct extensive experiments on the KITTI dataset, which is the prevalent dataset in the fusion-detection field. The experimental results show that the AP of the novel method for vehicle detection achieves 96.66%, 87.15%, and 78.46% in the easy, moderate, and hard categories respectively, an improvement of about 7% compared to state-of-the-art methods.
Article
For infrared and visible image fusion technology, it has always been a challenge to effectively select useful information from the source images and integrate it, because the imaging principles of infrared and visible images are widely different. To solve this problem, a novel infrared and visible image fusion algorithm is proposed, which includes the following contributions: (i) an infrared visual saliency extraction scheme using a global measurement is presented, (ii) a visible visual saliency measurement scheme using a local measurement strategy is proposed, and (iii) a fusion rule based on an orthogonal space is designed to combine the extracted saliency maps. Specifically, in order to draw attention to infrared targets, a coarse-scale decomposition is performed, and then a global measurement strategy is utilized to obtain saliency maps. In addition, since visible images have rich textures, a fine-scale decomposition makes the visual system attend to tiny details, and the visual saliency is measured by a local measurement strategy. Different from general fusion rules, an orthogonal space is constructed to integrate the saliency maps, which removes the correlation between the saliency maps and avoids mutual interference. Experiments on public databases demonstrate that the fusion results of the proposed algorithm are better than those of other comparison algorithms in qualitative and quantitative assessment.
Article
We propose an infrared and visible image fusion method based on an iterative differential thermal information filter to generate a fused image with the salient thermal targets of the infrared image and the detailed information of the visible image. Firstly, we enhance the thermal information of infrared images using a dynamic threshold thermal information filter. Then, we use a multiple-difference rolling guidance filter feature fusion method to separate and enhance the detailed information of the visible image. Finally, we obtain the fused image by a weighted-averaging strategy. The advantages and effectiveness of the proposed method are experimentally demonstrated by qualitative and quantitative comparisons with deep learning-based and non-deep-learning-based methods.
Article
Existing cross-modal image fusion methods pay limited research attention to image fusion efficiency and network architecture. However, the efficiency and accuracy of image fusion have an important impact on practical applications. To solve this problem, we propose a light-weight, efficient, and general cross-modal image fusion network, termed as AE-Netv2. Firstly, we analyze the influence of different network architectures (e.g., group convolution, depth-wise convolution, inceptionNet, squeezeNet, shuffleNet, and multi-scale module) on image fusion quality and efficiency, which provides a reference for the design of image fusion architecture. Secondly, we explore the commonness and characteristics of different image fusion tasks, which provides a research basis for further research on the continuous learning characteristics of the human brain. Finally, positive sample loss is added to the similarity loss to reduce the difference of data distribution of different cross-modal image fusion tasks. Comprehensive experiments demonstrate the superiority of our method compared to state-of-the-art methods in different fusion tasks at a real-time speed of 100+ FPS on GTX 2070. Compared with the fastest image fusion method based on deep learning, the efficiency of AE-Netv2 is improved by 2.14 times. Compared with the image fusion model with the smallest model size, the size of our model is reduced by 11.59 times.
Chapter
Image super-resolution is important in many fields, such as surveillance and remote sensing. However, infrared (IR) images normally have low resolution since the optical equipment is relatively expensive. Recently, deep learning methods have dominated image super-resolution and achieved remarkable performance on visible images; however, IR images have received less attention. IR images have fewer patterns, and hence, it is difficult for deep neural networks (DNNs) to learn diverse features from IR images. In this paper, we present a framework that employs heterogeneous convolution and adversarial training, namely, heterogeneous kernel-based super-resolution Wasserstein GAN (HetSRWGAN), for IR image super-resolution. The HetSRWGAN algorithm is a lightweight GAN architecture that applies a plug-and-play heterogeneous kernel-based residual block. Moreover, a novel loss function that employs image gradients is adopted, which can be applied to an arbitrary model. The proposed HetSRWGAN achieves consistently better performance in both qualitative and quantitative evaluations. According to the experimental results, the whole training process is more stable.
Article
Existing image fusion methods always use hand-crafted fusion rules due to the uninterpretability of deep feature maps, which restrict the performance of networks and result in distortion. To address these limitations, this paper for the first time realizes the interpretable importance evaluation of feature maps in a deep learning manner. This importance-oriented fusion rule helps preserve valuable feature maps and thus reduce distortion. In particular, we propose a pixel-wise classification saliency-based fusion rule. First, we employ a classifier to classify two types of source images which capture the differences and uniqueness between two classes. Then, the importance of each pixel is quantified as its contribution to the classification result. The importance is shown in the form of classification saliency maps. Finally, the feature maps are fused according to the saliency maps to generate fusion results. Moreover, because there is no need of manually deciding the characteristics to be retained, it is an unsupervised method with less human participation. Both qualitative and quantitative experiments demonstrate the superiority of our method over the state-of-the-art fusion methods even if using a simple network.
Article
In this study, a novel distillation paradigm named perceptual distillation is proposed to guide the training of image fusion networks without ground truths. In this paradigm, the student network, which we call the main autoencoder, takes in source images and produces a fused image, and the teacher network is a well-trained network exploited to compute teacher representations of images. Knowledge in the teacher representations of the source images is distilled and transferred to the student main autoencoder with the help of a perceptual saliency scheme. Meanwhile, a pixel-level compensation scheme is derived, which is combined with the source image to enhance the pixel intensity of the fused image. Moreover, a multi-autoencoder architecture is developed by assembling two auxiliary decoders behind the main autoencoder. The architecture is trained with self-supervision to consolidate fusion training against the limitations of the teacher network. Qualitative and quantitative experiments demonstrate that the proposed network achieves state-of-the-art performance on multi-source image fusion compared with existing fusion methods.
Article
The infrared and visible image fusion aims to integrate complementary information from the input image to generate an information-rich fused image, which has been widely used to improve the performance of various surveillance systems and high-level vision tasks. In this paper, we propose a novel infrared and visible image fusion via salient object extraction and low-light region enhancement. The proposed method can accurately extract salient objects from the source image and preserve the visual background. For infrared images, the most widely distributed pixels are utilized as seed points to measure intensity saliency. Not only that, since the spatial distribution of objects in images also affects visual attention, we design a central deviation model to measure spatial distribution saliency. The infrared salient object can be extracted by combining the intensity and spatial distribution saliency. For visible images, different from existing fusion methods, the direction uniformity instead of the gradient magnitude is utilized to extract salient objects. Finally, because the fusion task is always performed under low-light conditions or complex environments, we reduce the low-light region to improve the visual quality of visible images and then use the enhanced visible image as the background to reconstruct the fused image. Experimental results demonstrate that the proposed fusion method outperforms nine state-of-the-art image fusion methods in a series of qualitative and quantitative evaluations, and also has excellent visual effects.
Article
The Enhanced Flight Vision System (EFVS) plays a significant role in next-generation low-visibility aircraft landing technology, where the involvement of optical sensing systems increases the visual dimension for pilots. This paper focuses on deploying infrared and visible image fusion systems in civil flight, particularly generating integrated results to contend with registration deviation and adverse weather conditions. The existing enhancement methods push ahead with metrics-driven integration, while the dynamic distortion and the continuous visual scene are overlooked in the landing stage. Hence, the proposed visual enhancement scheme is divided into homography estimation and image fusion based on deep learning. A lightweight framework integrating hardware calibration and homography estimation is designed for image calibration before fusion and reduces the offset between image pairs. The transformer structure adopting the self-attention mechanism in distinguishing composite properties is incorporated into a concise autoencoder to construct the fusion strategy, and the improved weight allocation strategy enhances the feature combination. With these considerations, a flight verification platform assessing the performances of different algorithms is built to capture image pairs in the landing stage. Experimental results confirm the equilibrium of the proposed scheme in perception-inspired and feature-based metrics compared to other approaches.
Article
In order to address the fusion problem of infrared (IR) and visible images, this paper proposes a method using a local non-subsampled shearlet transform (LNSST) based on a generative adversarial network (GAN). We first decompose the source images into basic images and salient images by LNSST, then use two GANs to fuse the basic images and salient images, respectively. Lastly, we compose the fused basic and salient images by the inverse LNSST. We verify our method on public data sets by comparing eight objective evaluation metrics with those obtained by 10 other methods. It is demonstrated that our method is able to achieve better performance than the state of the art in preserving both texture details and thermal information.
Article
Multimodal medical image fusion is the most popular tool to integrate important information of multimodal medical images into a single complete informative image. Fusion provides an effective way for medical image diagnosis and treatment. However the acquired medical images may be corrupted with noise due to patient movement or faulty transmission, which misguides the image analysis and requires denoising. Therefore, in the suggested scheme non-subsampled contourlet transform (NSCT) is first used to extract features from the noisy source images. Then, a Siamese convolutional neural network (sCNN) is utilized for weighted fusion of important features from the two multimodal images. Simultaneously, a fractional order total generalized variation (FOTGV) is implemented for noise removal with improved degree of freedom. The image processing results demonstrate that the NSCT + sCNN + FOTGV scheme performs effectively for clean as well as noisy images in comparison to the other state-of-the-art conventional techniques on the basis of visual and quantitative analysis.
Article
The goal of infrared and visible image fusion is to generate an informative image which preserves texture details and infrared targets. Most generative adversarial network (GAN) based methods take a concatenation of two source images as the input, hence the extracted feature is insufficient for preserving background and detail information. To this end, we propose a novel GAN based fusion framework, termed as double-stream guided filter network (DSG-Fusion). Given the diverse modalities of infrared and visible images, the generator of DSG network extracts features of two images through two independent data flows. To address the problem of extracting representative background information and force the DSG network focus on details, we integrate guided filter into the generator to obtain base and detail layers of source images. The base layers are concatenated with the corresponding source images to extract deeper features, while detail layers participate in the fusion procedure directly. Thus, DSG-Fusion can extract texture and intensity information sufficiently, and more background and detail information are preserved. Furthermore, a DSG loss consisting of intensity and structural similarity (SSIM) is designed to impel the network to generate a better fused image. Extensive experimental results show that DSG-Fusion achieves better performance comparing with five representative methods.
Article
Multimodal image fusion is a contemporary branch of medical imaging that aims to increase the accuracy of clinical diagnosis of disease stage development. The fusion of different image modalities can be a viable medical imaging approach. It combines the best features to produce a composite image with higher quality than its predecessors and can significantly improve medical diagnosis. Recently, sparse representation (SR) and Siamese Convolutional Neural Network (SCNN) methods have been introduced independently for image fusion. However, some of the results from these approaches exhibit defects such as edge blur, reduced visibility, and blocking artifacts. To remedy these deficiencies, in this paper a smart blending approach based on a combination of SR and SCNN is introduced for image fusion, which comprises the following three steps. Firstly, entire source images are fed into classical orthogonal matching pursuit (OMP), where the SR-fused image is obtained using the max-rule, which aims to improve pixel localization. Secondly, a novel scheme of SCNN-based K-SVD dictionary learning is re-employed for each source image. The method has shown good non-linear behavior, contributing to increasing the fused output's sparsity characteristics and demonstrating better extraction and transfer of image details to the output fused image. Lastly, the fusion rule step employs a linear combination of steps 1 and 2 to obtain the final fused image. The results show that the proposed method is advantageous compared to previous methods, notably by suppressing the artifacts produced by the traditional SR and SCNN models.
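The OMP-with-max-rule step can be sketched as follows with scikit-learn's orthogonal_mp, operating on vectorised image patches over a given dictionary (for example one learned with K-SVD, as the abstract mentions). The l1-activity criterion used for the max-rule and the sparsity level are assumptions.

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp

def sr_fuse_patches(patches_a, patches_b, dictionary, n_nonzero=8):
    """Max-rule sparse-representation fusion of two sets of vectorised
    patches (columns) over a shared dictionary. Dictionary columns are
    assumed to be l2-normalised, as orthogonal_mp expects."""
    codes_a = orthogonal_mp(dictionary, patches_a, n_nonzero_coefs=n_nonzero)
    codes_b = orthogonal_mp(dictionary, patches_b, n_nonzero_coefs=n_nonzero)
    # Max-rule: per patch, keep the sparse code with the larger l1 activity.
    act_a = np.abs(codes_a).sum(axis=0)
    act_b = np.abs(codes_b).sum(axis=0)
    fused_codes = np.where(act_a >= act_b, codes_a, codes_b)
    return dictionary @ fused_codes            # reconstruct fused patches
```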
Article
For the past few years, image fusion technology has made great progress, especially in infrared and visible image fusion. However, fusion methods based on traditional or deep learning techniques have some disadvantages, such as indistinct structures or loss of texture detail. In this regard, a novel generative adversarial network named MSAt-GAN is proposed in this paper. It is based on multi-scale feature transfer and deep attention mechanism feature fusion, and is used for infrared and visible image fusion. First, this paper employs three different receptive fields to extract the multi-scale and multi-level deep features of multi-modality images in three channels, rather than artificially setting a single receptive field. In this way, the important features of the source images can be better obtained from different receptive fields and angles, and the extracted feature representation is also more flexible and diverse. Second, a multi-scale deep attention fusion mechanism is designed in this paper. It describes the important representations of the multi-level receptive-field features through both spatial and channel attention and merges them according to the level of attention. Doing so places more emphasis on the attention feature maps and extracts significant features of the multi-modality images, which eliminates noise to some extent. Third, the multi-level deep features in the encoder are concatenated with the deep features in the decoder to enhance feature transmission while making better use of the earlier features. Finally, this paper adopts a dual-discriminator generative adversarial network structure, which forces the generated image to simultaneously retain the intensity of the infrared image and the texture detail information of the visible image. Substantial qualitative and quantitative experimental analysis of infrared and visible image pairs on three public datasets shows that, compared with state-of-the-art fusion methods, the proposed MSAt-GAN network achieves outstanding fusion performance in both subjective perception and objective quantitative measurement.
Article
Existing studies on infrared and visible image fusion generally need to first decompose the images and then extract features beneficial to fusion from the decomposition results. However, they usually focus on decomposing single-modality images through various techniques such as latent low-rank representation (LatLRR), without considering the spatial consistency of the infrared and visible image modalities, which may fail to effectively capture inherent image features. In this paper, we propose a sparse consistency constrained latent low-rank representation (SccLatLRR) method to fuse infrared and visible images. Firstly, the infrared and visible images are simultaneously decomposed via low-rank representation as the inputs of different tasks. In the decomposition process, the L2,1 norm is used to constrain the rank to maintain sparse consistency, and the low-rank consensus representations of the infrared and visible images are obtained simultaneously. Secondly, the base information is further mined using the Very Deep Convolutional Network (VGG) and the non-subsampled contourlet transform (NSCT), respectively, to extract more effective fusion features. Finally, different fusion strategies are used for the base part and the salient part. An effective iterative algorithm is proposed to optimize the model. Experimental results on the public TNO dataset suggest the effectiveness of our method compared with several state-of-the-art methods.
Article
With the development of infrared imaging technology, the study of infrared polarization and infrared intensity image fusion has great potential for application. In this paper, we propose a fusion method based on multi-decomposition latent low-rank representation (LatLRR) combined with a dual-simplified pulse-coupled neural network (D-SPCNN) to improve detail-layer fusion. First, the source images are decomposed into a low-rank part and a saliency part using LatLRR, and the weight map of the low-rank part is calculated as the final base layer using a luminance weighting strategy. Then, the saliency part is decomposed by LatLRR again to obtain the detail layers after multiple decompositions, and the saliency part is used as the input of the D-SPCNN to output the activated detail weight map. After that, the final detail layer is obtained by summing the results of the detail weighting strategy applied at each decomposition. Finally, the detail layer and the base layer are combined and reconstructed to obtain the fused image. Our method is highly effective at improving brightness while retaining sufficient image detail. The experimental results show that the proposed method achieves better performance in both objective metrics and subjective evaluation when compared with other methods.
Article
Infrared and visible image fusion technology aims to integrate the heat-source information of the infrared image into the visible image to generate a more informative image. Many fusion methods proposed in recent years suffer from problems such as loss of detailed information and low contrast. In this paper, we propose a novel infrared and visible image fusion method based on visibility enhancement and hybrid multiscale decomposition. Firstly, we propose an effective image pre-processing method to increase the details and improve the quality of the source images. Then, the pre-processed images are decomposed by an ℓ1−ℓ0 decomposition model to obtain the base and detail layers. Thirdly, for the base-layer fusion, we propose a method based on the weight of the visual saliency illumination map (VSIM), which not only preserves the contrast information but also guarantees the overall structure of the fusion result. For the detail-layer fusion, we take advantage of a convolutional neural network (CNN) to obtain the decision map of the fused detail layer. Next, we employ Laplacian and Gaussian pyramids to decompose the detail layers and the decision map, respectively, and then fuse them with a synthetic detail fusion strategy. Finally, we reconstruct the base and detail layers to generate the final fusion result. Experimental results demonstrate that our results are more in line with the human visual system (HVS) and outperform some state-of-the-art methods on quantitative metrics.
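The detail-layer step, in which Laplacian pyramids of the two detail layers are blended under a Gaussian pyramid of the decision map, follows the classic pyramid-blending recipe. Below is a minimal OpenCV sketch; the pyramid depth and the way the decision map is produced are assumptions for illustration, not the paper's exact settings.

```python
import cv2
import numpy as np

def gaussian_pyramid(img, levels):
    pyr = [img]
    for _ in range(levels):
        pyr.append(cv2.pyrDown(pyr[-1]))
    return pyr

def laplacian_pyramid(img, levels):
    gp = gaussian_pyramid(img, levels)
    lp = []
    for i in range(levels):
        up = cv2.pyrUp(gp[i + 1], dstsize=(gp[i].shape[1], gp[i].shape[0]))
        lp.append(gp[i] - up)
    lp.append(gp[-1])
    return lp

def pyramid_blend(detail_a, detail_b, weight_map, levels=4):
    """Blend two detail layers with a per-pixel weight (decision) map in [0, 1]."""
    la = laplacian_pyramid(detail_a.astype(np.float32), levels)
    lb = laplacian_pyramid(detail_b.astype(np.float32), levels)
    gw = gaussian_pyramid(weight_map.astype(np.float32), levels)
    fused = [w * a + (1.0 - w) * b for a, b, w in zip(la, lb, gw)]
    # Collapse the fused Laplacian pyramid back into an image.
    out = fused[-1]
    for lev in reversed(fused[:-1]):
        out = cv2.pyrUp(out, dstsize=(lev.shape[1], lev.shape[0])) + lev
    return out
```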
Article
Infrared and visible image fusion can synthesize the complementary features of salient objects and texture details, which are important for all-weather detection and other tasks. Nowadays, deep learning based unsupervised fusion solutions are preferred and have obtained good results, since reference images for fusion tasks are not available. In existing methods, some prominent features are missing in the fused images and the visual vividness needs to be improved. Motivated by this, an attention mechanism is introduced into the fusion network. In particular, channel-dimension and spatial-dimension attention are combined to complement each other for feature extraction. Multiple attention branches emphasize multi-scale features to complete the encoding. Skip connections are added to learn residual information. A multi-layer perceptual loss, a structural similarity loss and a content loss together form strong constraints for training. Comparative experiments with subjective and objective evaluations against 4 traditional and 9 deep learning based methods demonstrate the advantages of the proposed model.
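The joint channel- and spatial-attention idea described here can be illustrated with a small CBAM-style block in PyTorch. This is a generic sketch of the mechanism, not the authors' exact architecture; the reduction ratio and the spatial kernel size are assumed.

```python
import torch
import torch.nn as nn

class ChannelSpatialAttention(nn.Module):
    """CBAM-style block: channel attention followed by spatial attention."""
    def __init__(self, channels, reduction=8, spatial_kernel=7):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels))
        self.spatial = nn.Conv2d(2, 1, spatial_kernel, padding=spatial_kernel // 2)

    def forward(self, x):
        b, c, _, _ = x.shape
        # Channel attention from global average- and max-pooled descriptors.
        avg = self.mlp(x.mean(dim=(2, 3)))
        mx = self.mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)
        # Spatial attention from channel-wise average and max maps.
        s = torch.cat([x.mean(dim=1, keepdim=True), x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))

# feat = torch.randn(1, 64, 128, 128)
# out = ChannelSpatialAttention(64)(feat)   # same shape, feature map re-weighted
```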
Article
In this paper, we propose a novel method for visible and infrared image fusion by decomposing feature information, which is termed as CUFD. It adopts two pairs of encoder–decoder networks to implement feature map extraction and decomposition, respectively. On the one hand, the shallow features of the image contain abundant information while the deep features focus more on extracting the thermal targets. Thus, we use an encoder–decoder network to extract both shallow and deep features. Unlike existing methods, both of the shallow and deep features are used for fusion and reconstruction with different emphases. On the other hand, the infrared and visible features of the same layer have both similarities and differences. Therefore, we train the other encoder–decoder network to decompose the feature maps into common and unique information based on their similarities and differences. After that, we apply different fusion rules according to the flexible requirements. This operation is more beneficial to retain the significant feature information in the fusion results. Qualitative and quantitative experiments on publicly available TNO and RoadScene datasets demonstrate the superiority of our CUFD over the state-of-the-art.
Article
Infrared and visible image fusion aims to integrate the prominent infrared target and visible texture details as much as possible. However, infrared images are susceptible to noise pollution during the transmission process, which could reduce the quality of the fused image. To solve this problem, a novel significant target analysis and detail preserving based scheme is proposed for infrared and visible image fusion tasks. Since the infrared image has high visual saliency, the noise pixels in the infrared image are often retained into the fused result by most traditional fusion methods. To better highlight the infrared targets while reducing interference from noise pixels, an infrared target detection model is proposed based on boundary-aware salient target detection network (BASNet) and significant target analysis. To fully preserve the visual details, we employ a weight mean curvature (WMC) based multiscale transform (MST) fusion scheme which can effectively suppress noise and preserve valuable details. Difference of Gaussian (DoG) is also applied to enhance the overall details in the proposed fusion scheme. Qualitative and quantitative experimental results demonstrate that the proposed method can generate fused images with abundant texture details and prominent infrared targets, which is superior to some of the existing methods in visual effect and objective quality evaluations.
Article
Techniques for the fusion of infrared and visible images have gradually become a popular research topic in the field of computer vision. In our paper, accelerated convergent convolutional dictionary learning (CDL) is first introduced for infrared-visible image fusion. The proposed method combines the advantages of CDL and convolutional sparse representation (CSR) while also compensating for model mismatches between the training and fusion stages. Each image is decomposed into a base layer and a detail layer, for which different fusion strategies are used. Unlike previous CSR/CDL-based fusion methods, we introduce a practical and convergent Fast Block Proximal Gradient Using a Diagonal Majorizer (FBPG-M) method with two-block and multiblock schemes into the detail layer. Considering the different imaging mechanisms, an ‘averaging’ fusion strategy is used for the base layer. Our method is evaluated and compared qualitatively and quantitatively with five typical fusion methods on 10 public datasets. The model is both subjectively and objectively evaluated, and the results show that the proposed method achieves notable success in terms of preserving details and focusing on targets.
Article
We aim to address the challenging task of infrared and visible image fusion. Existing fusion methods cannot achieve a balance between clear boundaries and rich details. In this paper, we propose a novel fusion model using a triple-discriminator generative adversarial network, which can achieve this balance. The difference image obtained by image subtraction can highlight the difference information, extract image details, and capture the target outlines in some scenes. Therefore, besides the visible discriminator and the infrared discriminator, a new difference-image discriminator is added to retain the difference between infrared and visible images, thereby improving the contrast of infrared targets and keeping the texture details of visible images. Multi-level features extracted by the discriminators are used for information measurement and, as a result, perceptual fusion weights are derived for adaptive fusion. An SSIM loss function and a target edge-enhancement loss are also introduced to improve the quality of the fused image. Compared with existing state-of-the-art fusion methods on public datasets, our model is demonstrated to perform better on quantitative metrics and qualitative effects.
Article
Full-text available
Infrared and visible image fusion aims to synthesize a single fused image that not only contains salient targets and abundant texture details but also facilitates high-level vision tasks. However, the existing fusion algorithms unilaterally focus on the visual quality and statistical metrics of fused images but ignore the demands of high-level vision tasks. To address these challenges, this paper bridges the gap between image fusion and high-level vision tasks and proposes a semantic-aware real-time image fusion network (SeAFusion). On the one hand, we cascade the image fusion module and semantic segmentation module and leverage the semantic loss to guide high-level semantic information to flow back to the image fusion module, which effectively boosts the performance of high-level vision tasks on fused images. On the other hand, we design a gradient residual dense block (GRDB) to enhance the description ability of the fusion network for fine-grained spatial details. Extensive comparative and generalization experiments demonstrate the superiority of our SeAFusion over state-of-the-art alternatives in terms of maintaining pixel intensity distribution and preserving texture detail. More importantly, the performance comparison of various fusion algorithms in task-driven evaluation reveals the natural advantages of our framework in facilitating high-level vision tasks. In addition, the superior running efficiency allows our algorithm to be effortlessly deployed as a real-time pre-processing module for high-level vision tasks. The source code will be released at https://github.com/Linfeng-Tang/SeAFusion.
Article
Full-text available
Remote sensing image fusion (RSIF) refers to restoring a high-resolution multispectral image from its corresponding low-resolution multispectral (LMS) image aided by the panchromatic (PAN) image. Most RSIF methods assume that the missing spatial details of the LMS image can be obtained from the high-resolution PAN image. However, distortions can be produced due to the large difference between the structural component of the LMS image and that of the PAN image. In fact, the LMS image can fully utilize its own spatial details to improve the resolution. In this paper, a novel two-stage RSIF algorithm is proposed, which makes full use of both the spatial details and the spectral information of the LMS image itself. In the first stage, convolutional neural network based super-resolution is used to increase the spatial resolution of the LMS image. In the second stage, the Gram–Schmidt transform is employed to fuse the enhanced MS and PAN images to further improve the resolution of the MS image. Owing to the spatial resolution enhancement in the first stage, the spectral distortions in the fused image are evidently decreased. Moreover, the spatial details can be preserved in the fused images. QuickBird satellite source images are used to test the performance of the proposed method. The experimental results demonstrate that the proposed method can achieve better spatial details and spectral information simultaneously compared with other well-known methods.
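The second-stage fusion can be approximated, at a very high level, by a component-substitution sketch: a synthetic low-resolution panchromatic band is built from the upsampled MS bands and the missing detail from the real PAN band is injected back with band-wise gains. This is a simplified illustration under my own assumptions, in the spirit of Gram–Schmidt pansharpening, not the paper's exact transform.

```python
import numpy as np

def gs_like_pansharpen(ms_up, pan):
    """Component-substitution sketch in the spirit of Gram-Schmidt pansharpening.

    ms_up : (H, W, B) multispectral image already upsampled to PAN resolution
    pan   : (H, W)    panchromatic band
    """
    synth = ms_up.mean(axis=2)                       # synthetic low-resolution pan
    # Histogram-match PAN to the synthetic band so that only detail is injected.
    pan_m = (pan - pan.mean()) / (pan.std() + 1e-8) * synth.std() + synth.mean()
    detail = pan_m - synth
    out = np.empty_like(ms_up, dtype=np.float64)
    for k in range(ms_up.shape[2]):
        band = ms_up[..., k]
        gain = np.cov(band.ravel(), synth.ravel())[0, 1] / (synth.var() + 1e-8)
        out[..., k] = band + gain * detail           # band-wise detail injection
    return out
```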
Article
Full-text available
In order to achieve perceptually better fusion of infrared (IR) and visible images than conventional pixel-level fusion algorithms based on multi-scale decomposition (MSD), we present a novel multi-scale fusion method based on a hybrid multi-scale decomposition (hybrid-MSD). The proposed hybrid-MSD transform decomposes the source images into multi-scale texture details and edge features by jointly using multi-scale Gaussian and bilateral filters. This transform makes it possible to better capture important multi-scale IR spectral features and to separate fine-scale texture details from large-scale edge features. As a result, we can use it to achieve better fusion results for human visual perception than those obtained from conventional multi-scale fusion methods, by injecting the multi-scale IR spectral features into the visible image while preserving (or properly enhancing) important perceptual cues of the background scenery and details from the visible image. In the decomposed information fusion process, three different combination algorithms are adaptively used in accordance with the different scale levels (i.e., the small-scale levels, the large-scale levels and the base level). A regularization parameter is introduced to control, in a soft manner, the relative amount of IR spectral information injected into the visible image, and it can be further adjusted according to user preferences. Moreover, by testing different settings of this parameter, we demonstrate that injecting a moderate amount of IR spectral information can actually make the fused images visually better for some infrared and visible source images. Experimental results of both objective assessment and subjective evaluation by human observers also prove the superiority of the proposed method compared with conventional MSD-based fusion methods.
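A rough sketch of the hybrid decomposition idea (joint Gaussian and bilateral filtering to separate fine texture from large-scale edges) is shown below; the filter parameters and the exact layer definitions are illustrative assumptions rather than the paper's settings.

```python
import cv2
import numpy as np

def hybrid_msd(img, levels=3, sigma0=2.0, sigma_color=0.1):
    """Each level: Gaussian smoothing gives the next coarse image, a bilateral filter
    at the same scale keeps edges, and their differences give texture/edge layers."""
    cur = img.astype(np.float32) / 255.0
    textures, edges = [], []
    for j in range(levels):
        sigma = sigma0 * (2 ** j)
        gauss = cv2.GaussianBlur(cur, (0, 0), sigma)
        bilat = cv2.bilateralFilter(cur, d=-1, sigmaColor=sigma_color, sigmaSpace=sigma)
        textures.append(cur - bilat)    # fine-scale texture removed by the edge-preserving filter
        edges.append(bilat - gauss)     # larger-scale edge structure
        cur = gauss                     # base image for the next level
    return textures, edges, cur         # cur is the final base level
```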
Article
Full-text available
Image fusion is a process of generating a more informative image from a set of source images. Major applications of image fusion are in navigation and the military, where infrared and visible sensors are used to capture complementary images of the targeted scene. The complementary information of these source images has to be integrated into a single image using a fusion algorithm. The aim of any fusion method is to transfer maximum information from the source images to the fused image with minimum information loss, while minimizing artifacts in the fused image. In this context, we propose a new edge-preserving image fusion method for infrared and visible sensor images. Anisotropic diffusion is used to decompose the source images into approximation and detail layers. The final detail and approximation layers are calculated with the help of the Karhunen–Loeve (KL) transform and weighted linear superposition, respectively. The fused image is generated from the linear combination of the final detail and approximation layers. The performance of the proposed algorithm is assessed with the help of the Petrovic metrics. Results of the proposed algorithm are compared with traditional and recent image fusion algorithms and reveal that the proposed method outperforms the existing methods.
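A compact sketch of this pipeline (Perona–Malik anisotropic diffusion for the base/detail split, an eigen-decomposition of the detail covariance standing in for the KL-transform weights, and simple averaging for the approximation layers) is given below; the diffusion parameters and the base-layer rule are assumptions for illustration.

```python
import numpy as np

def anisotropic_diffusion(img, n_iter=10, kappa=30.0, gamma=0.15):
    """Perona-Malik diffusion: smooths homogeneous regions while keeping edges.
    `img` is assumed to be a grayscale array roughly in [0, 255]."""
    u = img.astype(np.float32).copy()
    for _ in range(n_iter):
        grads = [np.roll(u, 1, 0) - u, np.roll(u, -1, 0) - u,
                 np.roll(u, 1, 1) - u, np.roll(u, -1, 1) - u]
        u = u + gamma * sum(np.exp(-(g / kappa) ** 2) * g for g in grads)
    return u

def adf_kl_fuse(ir, vis):
    ir = ir.astype(np.float32)
    vis = vis.astype(np.float32)
    base_ir, base_vis = anisotropic_diffusion(ir), anisotropic_diffusion(vis)
    d_ir, d_vis = ir - base_ir, vis - base_vis
    # KL-style weighting: leading eigenvector of the detail-layer covariance gives the weights.
    cov = np.cov(np.stack([d_ir.ravel(), d_vis.ravel()]))
    vec = np.linalg.eigh(cov)[1][:, -1]
    w = np.abs(vec) / np.abs(vec).sum()
    return 0.5 * (base_ir + base_vis) + w[0] * d_ir + w[1] * d_vis
```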
Article
Full-text available
Like the bilateral filter (BF), the cross bilateral filter (CBF) considers both the gray-level similarities and the geometric closeness of neighboring pixels without smoothing edges, but it uses one image for finding the kernel and the other for filtering, and vice versa. In this paper, it is proposed to fuse the source images by weighted averaging, using weights computed from the detail images that are extracted from the source images with the CBF. The performance of the proposed method has been verified on several pairs of multisensor and multifocus images and compared with existing methods visually and quantitatively. It is found that none of the methods shows consistent performance across all the performance metrics, but compared to them, the proposed method performs well in most cases. Further, the visual quality of the fused image produced by the proposed method is superior to that of the other methods.
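A naive (loop-based) cross bilateral filter makes the idea concrete: the range kernel is computed from one image while the other image is actually filtered. This is a slow reference sketch with assumed kernel parameters, suitable only for small images.

```python
import numpy as np

def cross_bilateral_filter(guide, src, radius=5, sigma_s=1.8, sigma_r=25.0):
    """Filter `src` with spatial weights plus range weights taken from `guide`."""
    guide = guide.astype(np.float32)
    src = src.astype(np.float32)
    gp = np.pad(guide, radius, mode='reflect')
    sp = np.pad(src, radius, mode='reflect')
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    spatial = np.exp(-(xs ** 2 + ys ** 2) / (2 * sigma_s ** 2))
    out = np.zeros_like(src)
    h, w = src.shape
    for i in range(h):
        for j in range(w):
            gwin = gp[i:i + 2 * radius + 1, j:j + 2 * radius + 1]
            swin = sp[i:i + 2 * radius + 1, j:j + 2 * radius + 1]
            rng = np.exp(-((gwin - guide[i, j]) ** 2) / (2 * sigma_r ** 2))
            wgt = spatial * rng
            out[i, j] = (wgt * swin).sum() / wgt.sum()
    return out

# detail_b = src_b - cross_bilateral_filter(src_a, src_b)   # detail image used to derive the weights
```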
Article
Full-text available
A method based on low-rank and sparse decomposition is proposed for moving object detection by the fusion of visual and infrared video. The visual and infrared image sequences are decomposed into a joint low-rank background term, an uncorrelated sparse moving non-object term, and a common sparse moving object term via a joint minimization cost composed of the nuclear norm, the Frobenius norm, and the ℓ1 norm. This method provides a flexible framework that can easily fuse information from visual and infrared video. Prior fusion strategies are not required, and the complementary information of the visual and infrared images is naturally fused in the object detection procedure. The experimental results show that the proposed algorithm is effective.
Article
Full-text available
Comparison of image processing techniques is critically important in deciding which algorithm, method, or metric to use for enhanced image assessment. Image fusion is a popular choice for various image enhancement applications such as overlay of two image products, refinement of image resolutions for alignment, and image combination for feature extraction and target recognition. Since image fusion is used in many geospatial and night vision applications, it is important to understand these techniques and provide a comparative study of the methods. In this paper, we conduct a comparative study on 12 selected image fusion metrics over six multiresolution image fusion algorithms for two different fusion schemes and input images with distortion. The analysis can be applied to different image combination algorithms, image processing methods, and over a different choice of metrics that are of use to an image processing expert. The paper relates the results to an image quality measurement based on power spectrum and correlation analysis and serves as a summary of many contemporary techniques for objective assessment of image fusion algorithms.
Article
Full-text available
With the development of numerous imaging sensors, many images can be captured simultaneously by various sensors. However, there are many scenarios where no single sensor can give the complete picture. Image fusion is an important approach to solve this problem; it produces a single image which preserves all relevant information from a set of different sensors. In this paper, we propose a new image fusion method using the support value transform, which uses support values to represent the salient features of an image. This is based on the fact that, in support vector machines (SVMs), data with larger support values have a physical meaning in the sense that they reveal the relative importance of the data points for contributing to the SVM model. The mapped least squares SVM (mapped LS-SVM) is used to efficiently compute the support values of the image. The support value analysis is developed by using a series of multiscale support value filters, which are obtained by filling zeros into the basic support value filter deduced from the mapped LS-SVM to match the resolution of the desired level. Compared with widely used image fusion methods, such as the Laplacian pyramid and discrete wavelet transform methods, the proposed method is an undecimated transform-based approach. Fusion experiments are undertaken on multisource images. The results demonstrate that the proposed approach is effective and is superior to the conventional image fusion methods in terms of the pertinent quantitative fusion evaluation indexes, such as the quality of visual information (Q(AB/F)), mutual information, etc.
Article
Full-text available
A measure for objectively assessing the pixel level fusion performance is defined. The proposed metric reflects the quality of visual information obtained from the fusion of input images and can be used to compare the performance of different image fusion algorithms. Experimental results clearly indicate that this metric is perceptually meaningful
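This metric is built on how well the edge (gradient) information of each input survives in the fused image. The sketch below computes a much-simplified gradient-preservation score in that spirit, not the published formula: per-pixel gradient-strength and orientation agreement between each source and the fused image, weighted by the source edge strength.

```python
import cv2
import numpy as np

def _grad(img):
    gx = cv2.Sobel(img, cv2.CV_32F, 1, 0, ksize=3)
    gy = cv2.Sobel(img, cv2.CV_32F, 0, 1, ksize=3)
    return np.hypot(gx, gy), np.arctan2(gy, gx)

def edge_preservation_score(src, fused, eps=1e-8):
    """Simplified per-source score: strength ratio times orientation agreement,
    averaged with the source edge strength as the weight."""
    gs, angs = _grad(src.astype(np.float32))
    gf, angf = _grad(fused.astype(np.float32))
    strength = np.minimum(gs, gf) / (np.maximum(gs, gf) + eps)
    orient = 1.0 - np.abs(np.arctan2(np.sin(angs - angf), np.cos(angs - angf))) / np.pi
    q = strength * orient
    return float((q * gs).sum() / (gs.sum() + eps))

def fusion_quality(src_a, src_b, fused):
    return 0.5 * (edge_preservation_score(src_a, fused) +
                  edge_preservation_score(src_b, fused))
```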
Article
Full-text available
Multilayer neural networks trained with the back-propagation algorithm constitute the best example of a successful gradient-based learning technique. Given an appropriate network architecture, gradient-based learning algorithms can be used to synthesize a complex decision surface that can classify high-dimensional patterns, such as handwritten characters, with minimal preprocessing. This paper reviews various methods applied to handwritten character recognition and compares them on a standard handwritten digit recognition task. Convolutional neural networks, which are specifically designed to deal with the variability of 2D shapes, are shown to outperform all other techniques. Real-life document recognition systems are composed of multiple modules including field extraction, segmentation, recognition, and language modeling. A new learning paradigm, called graph transformer networks (GTN), allows such multimodule systems to be trained globally using gradient-based methods so as to minimize an overall performance measure. Two systems for online handwriting recognition are described. Experiments demonstrate the advantage of global training and the flexibility of graph transformer networks. A graph transformer network for reading bank cheques is also described. It uses convolutional neural network character recognizers combined with global training techniques to provide record accuracy on business and personal cheques. It is deployed commercially and reads several million cheques per day.
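The convolutional architecture described here translates into a few lines of PyTorch. The following is a generic LeNet-5-like sketch for 32x32 single-channel inputs, not a faithful reproduction of the original model.

```python
import torch
import torch.nn as nn

class LeNetLike(nn.Module):
    """LeNet-5-style network: two conv/pool stages followed by three fully connected layers."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5), nn.Tanh(), nn.AvgPool2d(2),
            nn.Conv2d(6, 16, kernel_size=5), nn.Tanh(), nn.AvgPool2d(2))
        self.classifier = nn.Sequential(
            nn.Linear(16 * 5 * 5, 120), nn.Tanh(),
            nn.Linear(120, 84), nn.Tanh(),
            nn.Linear(84, num_classes))

    def forward(self, x):                      # x: (N, 1, 32, 32)
        x = self.features(x)
        return self.classifier(x.flatten(1))

# logits = LeNetLike()(torch.randn(4, 1, 32, 32))   # -> (4, 10)
```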
Article
The aim of multi-focus image fusion is to create a synthetic all-in-focus image from several images, each of which is obtained with different focus settings. However, if the resolution of the source images is low, the image fused with a traditional fusion method will also be of low quality, which hinders further image analysis even if the fused image is all-in-focus. This paper presents a novel joint multi-focus image fusion and super-resolution method via a convolutional neural network (CNN). The first-level network features of the different source images are fused under the guidance of the local clarity calculated from the source images. The final high-resolution fused image is obtained with the reconstruction network filters, which act like averaging filters. The experimental results demonstrate that the proposed approach can generate fused images with better visual quality and acceptable computational efficiency compared to other state-of-the-art works.
Article
Multifocus image fusion generates a single image by combining the redundant and complementary information of multiple images of the same scene. The combined image includes more information about the scene than any of the individual source images. In this paper, a novel multifocus image fusion method based on the extreme learning machine (ELM) and the human visual system (HVS) is proposed. Three visual features that reflect the clarity of a pixel are first extracted and used to train the ELM to judge which pixel is clearer. The clearer pixels are then used to construct an initial fused image. Next, we measure the similarity between the source images and the initial fused image and perform morphological opening and closing operations to obtain the focused regions. Lastly, the final fused image is obtained by employing a fusion rule in the focused regions and the initial fused image. Experimental results indicate that the proposed method is more effective than a series of existing popular fusion methods in terms of both subjective and objective evaluations.
Article
As is well known, activity level measurement and the fusion rule are two crucial factors in image fusion. For most existing fusion methods, whether in the spatial domain or in a transform domain such as wavelets, activity level measurement is essentially implemented by designing local filters to extract high-frequency details, and the calculated clarity information of the different source images is then compared using elaborately designed rules to obtain a clarity/focus map. Consequently, the focus map contains the integrated clarity information, which is of great significance to various image fusion problems, such as multi-focus image fusion, multi-modal image fusion, etc. However, these two tasks are usually difficult to design well enough to achieve satisfactory fusion performance. In this study, we address this problem with a deep learning approach, aiming to learn a direct mapping between the source images and the focus map. To this end, a deep convolutional neural network (CNN) trained on high-quality image patches and their blurred versions is adopted to encode the mapping. The main novelty of this idea is that the activity level measurement and the fusion rule can be jointly generated by learning a CNN model, which overcomes the difficulty faced by existing fusion methods. Based on this idea, a new multi-focus image fusion method is proposed in this paper. Experimental results demonstrate that the proposed method can obtain state-of-the-art fusion performance in terms of both visual quality and objective assessment. The computational speed of the proposed method using parallel computing is fast enough for practical usage. The potential of the learned CNN model for other types of image fusion problems is also briefly demonstrated in the experiments.
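The training-data idea (pairs of high-quality patches and their blurred versions standing in for focused/defocused examples) is easy to sketch. The patch size, blur levels, and the tiny classifier below are illustrative assumptions rather than the paper's settings.

```python
import cv2
import numpy as np
import torch
import torch.nn as nn

def focus_training_pairs(image, patch=32, sigmas=(2.0, 4.0)):
    """Cut patches from a sharp image; each clear patch (label 1) is paired with blurred copies (label 0)."""
    samples = []
    h, w = image.shape[:2]
    for i in range(0, h - patch + 1, patch):
        for j in range(0, w - patch + 1, patch):
            p = image[i:i + patch, j:j + patch]
            samples.append((p, 1))
            for s in sigmas:
                samples.append((cv2.GaussianBlur(p, (0, 0), s), 0))
    return samples

# A tiny focus/defocus classifier whose soft output can serve as a per-patch focus score.
focus_net = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 2))
```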
Article
As a popular signal modeling technique, sparse representation (SR) has achieved great success in image fusion over the last few years with a number of effective algorithms being proposed. However, due to the patch-based manner applied in sparse coding, most existing SR-based fusion methods suffer from two drawbacks, namely, limited ability in detail preservation and high sensitivity to mis-registration, while these two issues are of great concern in image fusion. In this letter, we introduce a recently emerged signal decomposition model known as convolutional sparse representation (CSR) into image fusion to address this problem, which is motivated by the observation that the CSR model can effectively overcome the above two drawbacks. We propose a CSR-based image fusion framework, in which each source image is decomposed into a base layer and a detail layer, for multi-focus image fusion and multi-modal image fusion. Experimental results demonstrate that the proposed fusion methods clearly outperform the SR-based methods in terms of both objective assessment and visual quality.
Article
It is important to preserve the information of infrared (IR) and visible images, including image regions and details, in a fused image. To this end, an algorithm for fusing IR and visible images based on the morphological center operator, feature extraction and the correlation coefficient is presented. The algorithm adopts a contrast-enlargement strategy for fusion. Firstly, the morphological center or anti-center operator identifies the bright and dim features of the IR and visible images, and these identified features are used for fusion based on the correlation coefficient. Secondly, multi-scale morphological theory is employed to extract multi-scale features through the correlation coefficient based strategy to form the final fusion features. Finally, the extracted fusion features are combined to form the final fused image by utilizing the contrast-enlargement strategy. Because the morphological center and anti-center operators identify features effectively, the proposed algorithm performs well for IR and visible image fusion. Experiments on different IR and visible images verify that the proposed algorithm performs effectively.
Article
In image fusion, the most desirable information is obtained from multiple images of the same scene and merged to generate a composite image. This resulting new image is more appropriate for human visual perception and further image-processing tasks. Existing methods typically use the same representations and extract the similar characteristics for different source images during the fusion process. However, it may not be appropriate for infrared and visible images, as the thermal radiation in infrared images and the appearance in visible images are manifestations of two different phenomena. To keep the thermal radiation and appearance information simultaneously, in this paper we propose a novel fusion algorithm, named Gradient Transfer Fusion (GTF), based on gradient transfer and total variation (TV) minimization. We formulate the fusion problem as an ℓ1-TV minimization problem, where the data fidelity term keeps the main intensity distribution in the infrared image, and the regularization term preserves the gradient variation in the visible image. We also generalize the formulation to fuse image pairs without pre-registration, which greatly enhances its applicability as high-precision registration is very challenging for multi-sensor data. The qualitative and quantitative comparisons with eight state-of-the-art methods on publicly available databases demonstrate the advantages of GTF, where our results look like sharpened infrared images with more appearance details.
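A plausible reading of the described ℓ1-TV objective, written from the abstract alone (fused image x, infrared image u, visible image v, trade-off parameter λ), is the following; the exact norms and weighting should be taken from the paper itself.

```latex
% Hedged reading of the objective: the data term pulls the fused image x toward the
% infrared intensities u, while the TV-style term transfers the visible gradients \nabla v.
\min_{x}\; \lVert x - u \rVert_{1} \;+\; \lambda \,\lVert \nabla x - \nabla v \rVert_{1}
```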
Article
This paper proposes a novel fusion method for infrared and visible images based on the accurate extraction of the target region. Firstly, a superpixel-based saliency analysis method is used to extract the salient regions of the infrared image and obtain the coarse contour of the infrared target. Then, multi-directional detection operators and an adaptive threshold algorithm are used to refine the boundary of the target region and obtain the fusion decision map. In order to capture the details of the visible image, the non-subsampled shearlet transform (NSST) is used to select the fusion coefficients of the background. Experimental results indicate that the proposed method is superior to other state-of-the-art methods in subjective visual quality and objective performance.
Article
Infrared and visible image fusion technique is a popular topic in image analysis because it can integrate complementary information and obtain reliable and accurate description of scenes. Multiscale transform theory as a signal representation method is widely used in image fusion. In this paper, a novel infrared and visible image fusion method is proposed based on spectral graph wavelet transform (SGWT) and bilateral filter. The main novelty of this study is that SGWT is used for image fusion. On the one hand, source images are decomposed by SGWT in its transform domain. The proposed approach not only effectively preserves the details of different source images, but also excellently represents the irregular areas of the source images. On the other hand, a novel weighted average method based on bilateral filter is proposed to fuse low- and high-frequency subbands by taking advantage of spatial consistency of natural images. Experimental results demonstrate that the proposed method outperforms seven recently proposed image fusion methods in terms of both visual effect and objective evaluation metrics.
Article
In image fusion literature, multi-scale transform (MST) and sparse representation (SR) are two most widely used signal/image representation theories. This paper presents a general image fusion framework by combining MST and SR to simultaneously overcome the inherent defects of both the MST- and SR-based fusion methods. In our fusion framework, the MST is firstly performed on each of the pre-registered source images to obtain their low-pass and high-pass coefficients. Then, the low-pass bands are merged with a SR-based fusion approach while the high-pass bands are fused using the absolute values of coefficients as activity level measurement. The fused image is finally obtained by performing the inverse MST on the merged coefficients. The advantages of the proposed fusion framework over individual MST- or SR-based method are first exhibited in detail from a theoretical point of view, and then experimentally verified with multi-focus, visible-infrared and medical image fusion. In particular, six popular multi-scale transforms, which are Laplacian pyramid (LP), ratio of low-pass pyramid (RP), discrete wavelet transform (DWT), dual-tree complex wavelet transform (DTCWT), curvelet transform (CVT) and nonsubsampled contourlet transform (NSCT), with different decomposition levels ranging from one to four are tested in our experiments. By comparing the fused results subjectively and objectively, we give the best-performed fusion method under the proposed framework for each category of image fusion. The effect of the sliding window’s step length is also investigated. Furthermore, experimental results demonstrate that the proposed fusion framework can obtain state-of-the-art performance, especially for the fusion of multimodal images.
Article
With the ongoing development of sensor technologies, more and more kinds of video sensors are being employed in video surveillance systems to improve robustness and monitoring performance. In addition, there is often a strong motivation to simultaneously observe the same scene by more than one kind of sensor. How to sufficiently and effectively utilize the information captured by these different sensors is thus of considerable interest. This can be realized using video fusion, by which multiple aligned videos from different sensors are merged into a composite. In this paper, a video fusion algorithm is presented based on the 3D Surfacelet Transform (3D-ST) and the higher order singular value decomposition (HOSVD). In the proposed method, input videos are first decomposed into many subbands using the 3D-ST. Then the relevant subbands from all of the input videos are merged to obtain the corresponding subbands of the intended fused video. Finally, the fused video is constructed by performing the inverse 3D-ST on the merged subband coefficients. Typically, the spatial information in the scene backgrounds and the temporal information related to moving objects are mixed together in each subband. In the proposed fusion method, the spatial and temporal information are actually first separated from each other and then merged using the HOSVD. This is different from the currently published fusion rules (e.g., spatio-temporal energy “maximum” or “matching”), which are usually just simple extensions of static image fusion rules. In these, the spatial and temporal information contained in the input videos are generally treated equally and merged by the same fusion strategy. In addition, we note that the so-called “scene noise” in an input video has been largely ignored by the current literature. We show that this noise can be distinguished from the spatio-temporal objects of interest in the scene and then suppressed using the HOSVD. Clearly, this would be very advantageous for a surveillance system, particularly one dealing with scenes of crowds. Experimental results demonstrate that the proposed fusion method exhibits a lower computational complexity than some existing published video fusion methods, such as the ones based on the structure tensor and the pulse-coupled neural network (PCNN). When the videos are noisy, a modified version of the proposed method is shown to perform better than specialized methods based on the Bivariate-Laplacian model and the PCNN.
Article
An appropriate fusion of infrared and visible images can integrate their complementary information and obtain a more reliable and better description of the environmental conditions. Compressed sensing theory, as a signal sampling and compression method that exploits the sparsity of a signal under a certain transform, is widely used in various fields. When applied to image fusion, only a part of the sparse coefficients needs to be fused, and the fused sparse coefficients can be used to accurately reconstruct the fused image. The CS-based fusion approach can greatly reduce the computational complexity and simultaneously enhance the quality of the fused image. In this paper, an improved image fusion scheme based on compressive sensing is presented. The proposed approach can preserve more detail information, such as edges, lines and contours, in comparison to conventional transform-based image fusion approaches. In the proposed approach, the sparse coefficients of the source images are obtained by the discrete wavelet transform. The low-frequency and high-frequency coefficients of the infrared and visible images are fused by an improved entropy-weighted fusion rule and a max-abs-based fusion rule, respectively. The fused image is reconstructed by a compressive sampling matched pursuit algorithm after local linear projection using a random Gaussian matrix. Several comparative experiments are conducted, and the experimental results show that the proposed image fusion scheme can achieve better image fusion quality than existing state-of-the-art methods.
Article
The effective measurement of a pixel's sharpness is a key factor in multi-focus image fusion. In this paper, a gray image is considered as a two-dimensional surface, and the neighbor distance deduced from the oriented distance in differential geometry is used as a measure of a pixel's sharpness, where the smooth image surface is restored by kernel regression. Based on the deduced neighbor distance filter, we construct a multi-scale image analysis framework and propose a multi-focus image fusion method based on the neighbor distance. The experiments demonstrate that the proposed method is superior to conventional image fusion methods in terms of several objective evaluation indexes, such as spatial frequency, standard deviation, and average gradient.
Article
Because subjective evaluation is not adequate for assessing work in an automatic system, using an objective image fusion performance metric is a common approach to evaluate the quality of different fusion schemes. In this paper, a multi-resolution image fusion metric using visual information fidelity (VIF) is presented to assess fusion performance objectively. This method has four stages: (1) source and fused images are filtered and divided into blocks; (2) visual information is evaluated with and without distortion information in each block; (3) the visual information fidelity for fusion (VIFF) of each sub-band is calculated; (4) the overall quality measure is determined by weighting the VIFF of each sub-band. In our experiment, the proposed fusion assessment method is compared with several existing fusion metrics using the subjective test dataset provided by Petrovic. We found that VIFF performs better in terms of both human perception matching and computational complexity. Highlights: a new image fusion metric (VIFF) based on visual information fidelity is proposed; what constitutes a fair performance comparison among image fusion metrics is discussed; VIFF is compared with 8 popular image fusion metrics on a database and shows the highest predictive performance; an approximate estimation of the fusion metrics' time complexity is given.
Article
The mutual information (MI) measure has become a popular metric to assess image fusion performance. However, despite its popularity, it provides a questionable result that consistently favours additive fusion (averaging) over multi-scale decomposition (MSD) fusion algorithms. Presented is a localised variation of MI to assess image fusion performance while preserving the importance of local structural similarity. The presented metric has been validated with extensive tests on popular image fusion test cases.
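A bare-bones version of the idea, computing mutual information inside local windows between each source and the fused image and then aggregating, is sketched below. The window size, histogram bins, and the plain averaging (rather than the saliency-weighted aggregation the letter proposes) are assumptions.

```python
import numpy as np

def mutual_information(x, y, bins=32):
    hist, _, _ = np.histogram2d(x.ravel(), y.ravel(), bins=bins)
    pxy = hist / hist.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

def local_mi_score(src_a, src_b, fused, win=16):
    """Average of per-window MI between each source and the fused image."""
    scores = []
    h, w = fused.shape
    for i in range(0, h - win + 1, win):
        for j in range(0, w - win + 1, win):
            fw = fused[i:i + win, j:j + win]
            scores.append(mutual_information(src_a[i:i + win, j:j + win], fw) +
                          mutual_information(src_b[i:i + win, j:j + win], fw))
    return float(np.mean(scores))
```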
Article
A novel image fusion algorithm based on the nonsubsampled contourlet transform (NSCT) is proposed in this paper, aiming at solving the fusion problem of multifocus images. The selection principles for the different subband coefficients obtained by the NSCT decomposition are discussed in detail. Based on the directional vector norm, a ‘selecting’ scheme combined with an ‘averaging’ scheme is presented for the lowpass subband coefficients. Based on the directional bandlimited contrast and the directional vector standard deviation, a selection principle is put forward for the bandpass directional subband coefficients. Experimental results demonstrate that the proposed algorithm can not only extract more important visual information from the source images, but also effectively avoid the introduction of artificial information. It significantly outperforms the traditional discrete wavelet transform-based and discrete wavelet frame transform-based image fusion methods in terms of both visual quality and objective evaluation, especially when the source images are not perfectly registered.
Article
As a novel multiscale geometric analysis tool, the contourlet has shown many advantages over conventional image representation methods. In this paper, a new fusion algorithm for multimodal medical images based on the contourlet transform is proposed. All fusion operations are performed in the contourlet domain. A novel contourlet contrast measurement is developed, which is shown to be more suitable for the human visual system. Other fusion rules, such as local energy, weighted average and selection, are combined with a "region" idea for coefficient selection in the lowpass and highpass subbands, which preserves more details of the source images and further improves the quality of the fused image. The final fused image is obtained by directly applying the inverse contourlet transform to the fused lowpass and highpass subbands. Extensive fusion experiments have been conducted on three groups of multimodality CT/MR data; both visual and quantitative analysis show that, compared with conventional image fusion algorithms, the proposed approach provides a more satisfactory fusion outcome.
Article
A number of pixel-based image fusion algorithms (using averaging, contrast pyramids, the discrete wavelet transform and the dual-tree complex wavelet transform (DT-CWT) to perform fusion) are reviewed and compared with a novel region-based image fusion method which facilitates increased flexibility with the definition of a variety of fusion rules. A DT-CWT is used to segment the features of the input images, either jointly or separately, to produce a region map. Characteristics of each region are calculated and a region-based approach is used to fuse the images, region-by-region, in the wavelet domain. This method gives results comparable to the pixel-based fusion methods as shown using a number of metrics. Despite an increase in complexity, region-based methods have a number of advantages over pixel-based methods. These include: the ability to use more intelligent semantic fusion rules; and for regions with certain properties to be attenuated or accentuated.
Article
A multiresolution image representation is presented in which iterative morphological filters of many scales but identical shape serve as basis functions. Structural pattern decomposition is achieved by subtracting successive layers in the multiresolution representation. The representation differs from established techniques in that the code elements have a well defined location and size. The resulting image description provides a useful basis for multiresolution shape analysis and is well suited for VLSI implementation.
Article
This paper presents an overview on image fusion techniques using multiresolution decompositions. The aim is twofold: (i) to reframe the multiresolution-based fusion methodology into a common formalism and, within this framework, (ii) to develop a new region-based approach which combines aspects of both object and pixel-level fusion. To this end, we first present a general framework which encompasses most of the existing multiresolution-based fusion schemes and provides freedom to create new ones. Then, we extend this framework to allow a region-based fusion approach. The basic idea is to make a multiresolution segmentation based on all different input images and to use this segmentation to guide the fusion process. Performance assessment is also addressed and future directions and open problems are discussed as well.
Article
The goal of image fusion is to integrate complementary information from multisensor data such that the new images are more suitable for the purpose of human visual perception and computer-processing tasks such as segmentation, feature extraction, and object recognition. This paper presents an image fusion scheme which is based on the wavelet transform. The wavelet transforms of the input images are appropriately combined, and the new image is obtained by taking the inverse wavelet transform of the fused wavelet coefficients. An area-based maximum selection rule and a consistency verification step are used for feature selection. The proposed scheme performs better than Laplacian pyramid-based methods due to the compactness, directional selectivity, and orthogonality of the wavelet transform. A performance measure using specially generated test images is suggested and is used in the evaluation of different fusion methods and in comparing the merits of different wavelet transform kernels. Extensive experimental results, including the fusion of multifocus images, Landsat and SPOT images, Landsat and Seasat SAR images, IR and visible images, and MRI and PET images, are presented in the paper.
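The selection scheme described here (an area-based maximum rule on the wavelet detail coefficients plus a consistency-verification pass over the binary decision map) can be sketched with PyWavelets. The wavelet, level count, and window sizes below are illustrative assumptions.

```python
import numpy as np
import pywt
from scipy.ndimage import median_filter, uniform_filter

def wavelet_fuse(img_a, img_b, wavelet='db2', levels=3, win=3):
    ca = pywt.wavedec2(img_a.astype(np.float32), wavelet, level=levels)
    cb = pywt.wavedec2(img_b.astype(np.float32), wavelet, level=levels)
    fused = [0.5 * (ca[0] + cb[0])]                      # approximation band: simple average
    for det_a, det_b in zip(ca[1:], cb[1:]):
        bands = []
        for xa, xb in zip(det_a, det_b):
            # Area-based activity: local mean of absolute coefficients.
            act_a = uniform_filter(np.abs(xa), size=win)
            act_b = uniform_filter(np.abs(xb), size=win)
            take_a = act_a >= act_b
            # Consistency verification: majority vote in a small neighbourhood.
            take_a = median_filter(take_a.astype(np.uint8), size=win).astype(bool)
            bands.append(np.where(take_a, xa, xb))
        fused.append(tuple(bands))
    return pywt.waverec2(fused, wavelet)
```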
Article
This paper presents an integrated image fusion and match score fusion of multispectral face images. The fusion of visible and long-wave infrared face images is performed using 2-Granular SVM (2-GSVM), which uses multiple SVMs to learn both the local and global properties of the multispectral face images at different granularity levels and resolutions. The 2-GSVM performs accurate classification, which is subsequently used to dynamically compute the weights of the visible and infrared images for generating a fused face image. 2D log polar Gabor transform and local binary pattern feature extraction algorithms are applied to the fused face image to extract global and local facial features, respectively. The corresponding match scores are fused using the Dezert–Smarandache theory of fusion, which is based on plausible and paradoxical reasoning. The efficacy of the proposed algorithm is validated using the Notre Dame and Equinox databases and is compared with existing statistical, learning, and evidence theory based fusion algorithms.
Article
Image fusion combines information from multiple images of the same scene to get a composite image that is more suitable for human visual perception or further image-processing tasks. In this paper, we compare various multi-resolution decomposition algorithms, especially the latest developed image decomposition methods, such as curvelet and contourlet, for image fusion. The investigations include the effect of decomposition levels and filters on fusion performance. By comparing fusion results, we give the best candidates for multi-focus images, infrared–visible images, and medical images. The experimental results show that the shift-invariant property is of great importance for image fusion. In addition, we also conclude that short filter usually provides better fusion results than long filter, and the appropriate setting for the number of decomposition levels is four.
Article
Comparative evaluation of fused images is a critical step in evaluating the relative performance of different image fusion algorithms. Human visual inspection is often used to assess the quality of fused images. In this paper, we propose some variants of a new image quality metric based on the human vision system (HVS). The proposed measures evaluate the quality of a fused image by comparing its visual differences with the source images and require no knowledge of the ground truth. First, the images are divided into different local regions. These regional images are then transformed to the frequency domain. Second, the difference between the local regional images in the frequency domain is weighted with a human contrast sensitivity function (CSF). The quality of a local regional image is obtained by computing the MSE of the weighted difference images obtained from the fused regional image and the source regional images. Finally, the quality of a fused image is the weighted summation of the local regional image quality measures. Our experimental results show that these metrics are consistent with perceptually obtained results.