An illustration of the receptive field for one dilated convolution with different dilation factors. A 3×3 convolution kernel is used in the example.

Source publication
Article
Full-text available
Jointly using spatial and spectral information has been widely applied to hyperspectral image (HSI) classification. In particular, convolutional neural networks (CNN) have gained attention in recent years due to their detailed representation of features. However, most CNN-based HSI classification methods mainly use patches as the classifier input. This...

Contexts in source publication

Context 1
... is constructed by inserting holes between adjacent pixels of the convolution kernel. The operation of one 3×3 dilated convolution with different dilation factors is shown in Figure 1. As the figure shows, the dilation factor determines the sampling distance of the convolution kernel. ...
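For illustration, here is a minimal PyTorch sketch (not the paper's code) showing that a single 3×3 convolution with dilation factor d covers a (2d+1)×(2d+1) receptive field, since d−1 holes are inserted between adjacent kernel taps:

import torch
import torch.nn as nn

# One 3x3 convolution per dilation factor; padding=d keeps the spatial size.
for d in (1, 2, 3):
    conv = nn.Conv2d(1, 1, kernel_size=3, dilation=d, padding=d, bias=False)
    y = conv(torch.randn(1, 1, 32, 32))
    rf = 2 * d + 1  # effective kernel size = receptive field of this single layer
    print(f"dilation={d}: output {tuple(y.shape)}, receptive field {rf}x{rf}")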
Context 2
... networks were designed based on the ResMRFF module: HyMSCN-A and HyMSCN-B (Figure 10). Both networks use a bottom-up pathway to compute hierarchical features at several feature levels. ...
Context 3
... first network (HyMSCN-A) is composed of several ResMRFF blocks with stride = 1. The spatial size of all feature maps is the same since it does not contain any spatial scaling structure (see Figure 10a). The network takes hyperspectral imagery as input and learns the high-level spatial-spectral features to produce classification results. ...
Context 4
... network takes hyperspectral imagery as input and learns the high-level spatial-spectral features to produce classification results. In comparison, the second network (HyMSCN-B) employs the feature pyramid structure to fully take advantage of the hierarchical features with different spatial scales as shown in Figure 10b. There are several stages in the network with different feature levels, where one stage refers to the blocks that produce features with the same spatial size. ...
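The two layouts can be sketched as follows. This is a hypothetical reconstruction: a plain residual block stands in for the paper's ResMRFF block, whose internals are defined in the article itself, and the channel counts and block counts are illustrative.

import torch
import torch.nn as nn

class ResBlockStandIn(nn.Module):
    """Plain residual block standing in for the paper's ResMRFF block."""
    def __init__(self, channels, stride=1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, stride=stride, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        self.skip = (nn.Identity() if stride == 1
                     else nn.Conv2d(channels, channels, 1, stride=stride))

    def forward(self, x):
        return torch.relu(self.body(x) + self.skip(x))

# HyMSCN-A style: every block uses stride=1, so all feature maps keep the
# input's spatial size (no spatial scaling structure).
net_a = nn.Sequential(*[ResBlockStandIn(64, stride=1) for _ in range(4)])

# HyMSCN-B style: stride-2 blocks open new stages; each stage produces
# features at one spatial scale, which a feature pyramid then fuses.
net_b = nn.Sequential(
    ResBlockStandIn(64, stride=1),  # stage 1: full resolution
    ResBlockStandIn(64, stride=2),  # stage 2: 1/2 resolution
    ResBlockStandIn(64, stride=2),  # stage 3: 1/4 resolution
)

x = torch.randn(1, 64, 32, 32)
print(net_a(x).shape)  # torch.Size([1, 64, 32, 32])
print(net_b(x).shape)  # torch.Size([1, 64, 8, 8])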
Context 5
... 20 spectral bands were discarded due to atmospheric absorption. The GSD was 20 m and the available samples contained 16 classes as shown in Figure 11. ...
Context 6
... GSD was 1.3 m. The color composite and reference data are displayed in Figure 12. ...
Context 7
... 154-167, and 224). The GSD was 3.7 m per pixel and the ground truth contained 16 classes shown in Figure 13. ...
Context 8
... 2 reports the individual classification results of different methods. Figure 14 shows the corresponding classification maps of different methods. A number of observations can be made based on the above data. ...
Context 9
... HyMSCN-B-128 provided the best results, thereby demonstrating that a well-designed network combining multiscale and multilevel features is suitable for HSIC. It should be noted that the classification map for HyMSCN-B-128 preserved clear boundaries for ground objects, as shown in Figure 14j. Although it does not achieve the best accuracy in every individual class, its overall classification accuracy is the highest. ...
Context 10
... 4 displays the overall accuracy, average accuracy, kappa statistic, and individual accuracies. Figure 15 displays the corresponding classification maps and the training and testing samples. From Table 4, we can draw conclusions similar to those of the Indian Pines experiment. ...
Context 11
... this experiment, we first randomly selected 30 samples per class with a total of 480 samples (approximately 0.88% of the labeled pixels) to form the training set. The classification map and results obtained from different methods are shown in Figure 16 and Table 6. The HyMSCN-B-128 network was observed to provide the best performance. ...
Context 12
... total of 50 samples per class were randomly selected as training samples for each dataset, and the total number of epochs was set to 1000. Figure 17 displays the overall accuracy over the 1000 training epochs. For these three datasets, HyMSCN-A-64 achieved results superior to HyMSCN-N by approximately 3%, 2.7%, and 1.1% for the Indian Pines, Pavia University, and Salinas data, respectively. ...
Context 13
... the performance of different numbers of dilation factors was evaluated for the MRFF block, with HyMSCN-B-64 used as the backbone. The classification results for the different numbers of dilation factors are shown in Figure 18. In this experiment, seven combinations of dilation factors were considered: (1), (1,2), (1,2,3), (1,2,3,4), (1,2,3,4,5), (1,2,3,4,5,6), and (1,2,3,4,5,6,7), and 30 samples per class were selected as training samples. ...
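A hedged approximation of such a block is sketched below: one parallel 3×3 branch per dilation factor in the chosen combination, fused here by summation. The paper's MRFF block may fuse branches differently; only the dilation-combination idea is illustrated.

import torch
import torch.nn as nn

class MultiDilationBlock(nn.Module):
    """One parallel 3x3 branch per dilation factor, fused by summation."""
    def __init__(self, channels, dilations=(1, 2, 3)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=d, dilation=d)
            for d in dilations
        )

    def forward(self, x):
        return sum(branch(x) for branch in self.branches)

# The seven dilation-factor combinations evaluated in the experiment above.
combos = [tuple(range(1, n + 1)) for n in range(1, 8)]
x = torch.randn(1, 64, 32, 32)
for c in combos:
    y = MultiDilationBlock(64, c)(x)
    assert y.shape == x.shape  # padding=d keeps spatial size in every branch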
Context 14
... training process for each network was repeated five times using the same training samples. Figure 19 displays the average overall accuracy. ...

Similar publications

Article
Full-text available
The temporal and spatial heterogeneity of mowing on intensively used grasslands can be an important factor influencing the survival of insects or small animals after harvest. High heterogeneity contributes to the preservation of biodiversity on these grasslands. In our study we use satellite image time-series to analyze the temporal distribution of...
Article
Full-text available
Depth estimation from a single image is a challenging task, especially inside the highly structured forest environment. In this paper, we propose a supervised deep learning model for monocular depth estimation based on forest imagery. We train our model on a new data set of forest RGB-D images that we collected using a terrestrial laser scanner. Al...

Citations

... The neural network selected to perform semantic segmentation is a typical FCN known as U-Net [9], which was originally intended for biological image segmentation but has been widely used for other segmentation tasks, such as precision agriculture [10] and aerial city recognition [3]. The idea of using an FCN is to combine the intrinsic spectral characteristics of the different classes with the spatial relationships that should be extracted by the convolution operations. ...
Preprint
Full-text available
Advanced Driver Assistance Systems (ADAS) are designed with the main purpose of increasing the safety and comfort of vehicle occupants. Most current computer vision-based ADAS perform detection and tracking tasks quite successfully under regular conditions, but are not completely reliable, particularly under adverse weather and changing lighting conditions, nor in complex situations with many overlapping objects. In this work we explore the use of hyperspectral imaging (HSI) in ADAS on the assumption that the distinct near infrared (NIR) spectral reflectances of different materials can help to better separate the objects in a driving scene. In particular, this paper describes some experimental results of the application of fully convolutional networks (FCN) to the image segmentation of HSI for ADAS applications. More specifically, our aim is to investigate to what extent the spatial features codified by convolutional filters can be helpful to improve the performance of HSI segmentation systems. With that aim, we use the HSI-Drive v1.1 dataset, which provides a set of labelled images recorded in real driving conditions with a small-size snapshot NIR-HSI camera. Finally, we analyze the implementability of such a HSI segmentation system by prototyping the developed FCN model together with the necessary hyperspectral cube preprocessing stage and characterizing its performance on an MPSoC.
... Intermediate skip connections enable the fusion of low- and high-level features. U-Net was originally intended for biological image segmentation [44] but has been successfully used in other segmentation tasks, such as precision agriculture [51], food quality assessment [46] and aerial city recognition [15]. The idea of using an FCN to process HSI is to combine the intrinsic spectral characteristics of the different classes with the spatial relationships extracted by the convolution operations. ...
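To make the skip-connection idea concrete, here is a minimal sketch (not the cited U-Net itself; the layer sizes and the 8-band input are illustrative assumptions): a decoder feature map is concatenated with the same-resolution encoder feature map, so low-level spatial detail is fused with high-level semantics.

import torch
import torch.nn as nn

enc = nn.Conv2d(8, 16, 3, stride=2, padding=1)  # encoder: halves resolution
dec = nn.ConvTranspose2d(16, 8, 2, stride=2)    # decoder: restores resolution
head = nn.Conv2d(8 + 8, 4, 1)                   # fuse skip + decoder features

x = torch.randn(1, 8, 64, 64)  # e.g. an 8-band hyperspectral patch
low = x                        # skip connection (low-level features)
high = dec(enc(x))             # high-level features, back at 64x64
out = head(torch.cat([low, high], dim=1))
print(out.shape)               # torch.Size([1, 4, 64, 64])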
Preprint
Full-text available
Most current computer vision-based advanced driver assistance systems (ADAS) perform detection and tracking of objects quite successfully under regular conditions. However, under adverse weather and changing lighting conditions, and in complex situations with many overlapping objects, these systems are not completely reliable. The spectral reflectance of the different objects in a driving scene beyond the visible spectrum can offer additional information to increase the reliability of these systems, especially under challenging driving conditions. Furthermore, this information may be significant enough to develop vision systems that allow for a better understanding and interpretation of the whole driving scene. In this work we explore the use of snapshot, video-rate hyperspectral imaging (HSI) cameras in ADAS on the assumption that the near infrared (NIR) spectral reflectance of different materials can help to better segment the objects in real driving scenarios. To do this, we have used the HSI-Drive 1.1 dataset to perform various experiments on spectral classification algorithms. However, the information retrieval of hyperspectral recordings in natural outdoor scenarios is challenging, mainly because of deficient colour constancy and other inherent shortcomings of current snapshot HSI technology, which poses some limitations to the development of pure spectral classifiers. In consequence, in this work we analyze to what extent the spatial features codified by standard, tiny fully convolutional network (FCN) models can improve the performance of HSI segmentation systems for ADAS applications. The abstract above is truncated due to submission limits. For the full abstract, please refer to the published article.
... Furthermore, it is imperative to employ a feature fusion strategy to extract multiangular features from images [45], [46]. Cui et al. [47] introduced a multiscale spatial-spectral convolutional network with an image-based framework, which incorporated dilated convolutions and fusion strategies and demonstrated impressive results. Yang et al. [48] devised an enhanced multiscale feature fusion network, emphasizing a multiscale feature fusion strategy in their approach. ...
Article
Full-text available
The rich and fine spectral information in hyperspectral images (HSI) benefits various downstream applications such as smart agriculture and environmental monitoring. In HSI classification, dual-stream convolutional networks have gained much attention and have been widely used. In patch-based hyperspectral classification tasks, however, merely using center-labeled patches can introduce increased unlabeled noise into the data. Moreover, when applying dual-stream network structures, heterogeneity exists at both the data and feature-semantic levels and must be exploited to capture more representative features. To tackle these challenges, we have devised a framework called the Cross-Semantic Heterogeneous Modeling Network (CreatingNet), which aligns more closely with the design principles of dual-stream networks by adjusting the input size. This framework introduces a distance metric attention mechanism (DMAM) based on spectral and spatial distances to strengthen the influence of the center pixel on the entire patch. Additionally, we present a fusion module named CrossViT, which combines features with diverse structures and characteristics, leveraging their complementarity. The proposed multi-scale heterogeneous fusion module allows for more effective integration of spatial and spectral features in the images. Extensive experiments on four well-known HSI datasets (Indian Pines, Pavia University, Salinas, and Houston 2013) demonstrate the superior classification performance of the proposed CreatingNet over several state-of-the-art methods. The effectiveness of the proposed model is further validated through ablation studies.
... Therefore, hyperspectral images (HSIs) [7] have been employed in various fields [8][9][10][11][12] in recent years. Hyperspectral technology contains many image processing tasks, such as change detection [13], classification [14,15], anomaly detection [16][17][18], fusion [5,19,20], band selection [21], and so on [22][23][24][25]. ...
Article
Full-text available
In a hyperspectral image, there is a close correlation between spectra and a certain degree of correlation in the pixel space. However, most existing low-rank representation (LRR) methods struggle to utilize these two characteristics simultaneously to detect anomalies. To address this challenge, a novel low-rank representation with dual graph regularization and an adaptive dictionary (DGRAD-LRR) is proposed for hyperspectral anomaly detection. To be specific, dual graph regularization, which combines spectral and spatial regularization, provides a new paradigm for LRR, and it can effectively preserve the local geometrical structure in the spectral and spatial information. To obtain a robust background dictionary, a novel adaptive dictionary strategy is utilized for the LRR model. In addition, extensive comparative experiments and an ablation study were conducted to demonstrate the superiority and practicality of the proposed DGRAD-LRR method.
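For orientation, a generic graph-regularized LRR objective takes the following form. This is a hedged sketch of the general family only, not necessarily the exact DGRAD-LRR formulation, and the symbols are assumptions for illustration:

\min_{Z,E} \; \|Z\|_* + \lambda \|E\|_{2,1} + \beta_1 \operatorname{tr}\!\left(Z L_{\mathrm{spe}} Z^{\top}\right) + \beta_2 \operatorname{tr}\!\left(Z^{\top} L_{\mathrm{spa}} Z\right) \quad \text{s.t.} \quad X = DZ + E

where X is the data matrix, D the (here adaptive) background dictionary, Z the low-rank representation, E the sparse anomaly part, and L_spe and L_spa are graph Laplacians encoding spectral and spatial neighborhood structure.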
... Based on the above information, two classification methods, SVM [11] and ResFPN [12], are selected to evaluate the fusion results and MSI. The evaluation results are shown in Figure 5, and the quantitative performance is shown in Table 2. ...
Conference Paper
Full-text available
In recent years, fusion methods based on unsupervised deep learning have achieved impressive performance in the fusion of hyperspectral image (HSI) and multispectral image (MSI). However, there are still some limitations in the current research. Most existing fusion methods only apply to simulated data and need more verification on real data sets. To solve these issues, this paper proposes an unsupervised dynamic convolutional neural network fusion model (UDCNN), which can adaptively learn the radiometric difference between HSI and MSI. This model achieves better performance on simulated data compared with related unsupervised deep learning methods, and achieves more accurate results on real data through classification-oriented application of the fusion results.
... The spatio-temporal fusion method based on a convolutional neural network proposed by Cui et al. [3] greatly improved fusion performance compared with other methods. However, the network uses few layers (3 hidden layers), and for MODIS-Landsat images with large spatial-scale differences, acquired by different satellite sensors, it is difficult for such shallow convolutional neural networks to accurately learn the nonlinear mapping relationship between them. ...
Article
Full-text available
With the rapid development of deep learning in recent years, it has shown excellent performance in various image and video processing tasks, and it has also greatly advanced the spatio-temporal fusion of remote sensing images, producing reconstructed images of good visual quality. This work proposes a remote sensing image fusion method based on an end-to-end progressive cascade deep residual network. Because the MSE loss function may cause over-smoothing of the fused image, a new joint loss function is defined to capture finer spatial information and improve the spatial resolution of the fused image. Resize-convolution is used to replace the transposed convolution to eliminate the checkerboard effect that the transposed convolution causes in the fused image. In experiments on remote sensing image fusion with simulated and real datasets from multiple satellites, the proposed algorithm outperforms the comparison algorithms by more than 5.25% in average quantitative metrics. Computation time and system resource occupation are also reduced, which has theoretical significance and application value in the fields of artificial intelligence and image processing and will help promote the theoretical study and application of remote sensing image fusion.
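A minimal sketch of the resize-convolution substitution mentioned above: nearest-neighbour upsampling followed by an ordinary convolution replaces a transposed convolution, avoiding the uneven kernel overlap that produces checkerboard artifacts. The channel counts and kernel sizes here are illustrative, not taken from the cited network.

import torch
import torch.nn as nn

# Transposed convolution: 2x upsampling, prone to checkerboard artifacts.
transposed_up = nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1)

# Resize-convolution: upsample first, then convolve; overlap is uniform.
resize_conv_up = nn.Sequential(
    nn.Upsample(scale_factor=2, mode="nearest"),
    nn.Conv2d(64, 32, kernel_size=3, padding=1),
)

x = torch.randn(1, 64, 16, 16)
print(transposed_up(x).shape)   # torch.Size([1, 32, 32, 32])
print(resize_conv_up(x).shape)  # torch.Size([1, 32, 32, 32])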
... Cui et al. 2019 [51] Hyperspectral Image ...
Article
Full-text available
In general, most of the existing convolutional neural network (CNN)-based deep-learning models suffer from spatial-information loss and inadequate feature-representation issues. This is due to their inability to capture multiscale-context information and the exclusion of semantic information throughout the pooling operations. In the early layers of a CNN, the network encodes simple semantic representations, such as edges and corners, while, in the latter part of the CNN, the network encodes more complex semantic features, such as complex geometric shapes. Theoretically, it is better for a CNN to extract features from different levels of semantic representation because tasks such as classification and segmentation work better when both simple and complex feature maps are utilized. Hence, it is also crucial to embed multiscale capability throughout the network so that the various scales of the features can be optimally captured to represent the intended task. Multiscale representation enables the network to fuse low-level and high-level features from a restricted receptive field to enhance the deep-model performance. The main novelty of this review is the comprehensive novel taxonomy of multiscale-deep-learning methods, which includes details of several architectures and their strengths that have been implemented in the existing works. Predominantly, multiscale approaches in deep-learning networks can be classed into two categories: multiscale feature learning and multiscale feature fusion. Multiscale feature learning refers to the method of deriving feature maps by examining kernels over several sizes to collect a larger range of relevant features and predict the input images’ spatial mapping. Multiscale feature fusion uses features with different resolutions to find patterns over short and long distances, without a deep network. Additionally, several examples of the techniques are also discussed according to their applications in satellite imagery, medical imaging, agriculture, and industrial and manufacturing systems.
... The neural network selected to perform semantic segmentation is a typical FCN known as U-Net [9], which was originally intended for biological image segmentation but has been widely used for other segmentation tasks, such as precision agriculture [10] and aerial city recognition [3]. The idea of using an FCN is to combine the intrinsic spectral characteristics of the different classes with the spatial relationships that should be extracted by the convolution operations. ...
Chapter
Full-text available
Advanced Driver Assistance Systems (ADAS) are designed with the main purpose of increasing the safety and comfort of vehicle occupants. Most current computer vision-based ADAS perform detection and tracking tasks quite successfully under regular conditions, but are not completely reliable, particularly under adverse weather and changing lighting conditions, nor in complex situations with many overlapping objects. In this work we explore the use of hyperspectral imaging (HSI) in ADAS on the assumption that the distinct near infrared (NIR) spectral reflectances of different materials can help to better separate the objects in a driving scene. In particular, this paper describes some experimental results of the application of fully convolutional networks (FCN) to the image segmentation of HSI for ADAS applications. More specifically, our aim is to investigate to what extent the spatial features codified by convolutional filters can be helpful to improve the performance of HSI segmentation systems. With that aim, we use the HSI-Drive v1.1 dataset, which provides a set of labelled images recorded in real driving conditions with a small-size snapshot NIR-HSI camera. Finally, we analyze the implementability of such a HSI segmentation system by prototyping the developed FCN model together with the necessary hyperspectral cube preprocessing stage and characterizing its performance on an MPSoC.
... The neural network selected to perform semantic segmentation is a typical FCN known as U-Net [9], which was originally intended for biological image segmentation but has been widely used for other segmentation tasks, such as precision agriculture [10] and aerial city recognition [3]. The idea of using an FCN is to combine the intrinsic spectral characteristics of the different classes with the spatial relationships that should be extracted by the convolution operations. ...
Conference Paper
Full-text available
Exploring fully convolutional networks for the segmentation of hyperspectral imaging applied to advanced driver assistance systems. Jon Gutiérrez-Zaballa, Koldo Basterretxea, Javier Echanobe, M. Victoria Martínez, and Inés del Campo. Abstract: Advanced Driver Assistance Systems (ADAS) are designed with the main purpose of increasing the safety and comfort of vehicle occupants. Most current computer vision-based ADAS perform detection and tracking tasks quite successfully under regular conditions, but are not completely reliable, particularly under adverse weather and changing lighting conditions, nor in complex situations with many overlapping objects. In this work we explore the use of hyperspectral imaging (HSI) in ADAS on the assumption that the distinct near infrared (NIR) spectral reflectances of different materials can help to better separate the objects in a driving scene. In particular, this paper describes some experimental results of the application of fully convolutional networks (FCN) to the image segmentation of HSI for ADAS applications. More specifically, our aim is to investigate to what extent the spatial features codified by convolutional filters can be helpful to improve the performance of HSI segmentation systems. With that aim, we use the HSI-Drive v1.1 dataset, which provides a set of labelled images recorded in real driving conditions with a small-size snapshot NIR-HSI camera. Finally, we analyze the implementability of such a HSI segmentation system by prototyping the developed FCN model together with the necessary hyperspectral cube preprocessing stage and characterizing its performance on an MPSoC.
... For water bodies, the kappa coefficient was 0.925 and the F1 score was 93.0%. Building on the idea of dilated convolution [122,123], ASPP introduced the dilation rate on the basis of SPP, which allows the neural network model to capture multiscale semantic information. Regarding attention mechanisms, Ren et al. [124] further introduced a dual-attention mechanism into the original U-Net, forming a dual-attention U-Net model (DAU-Net), which improves the feature representation ability of the model and achieves higher-accuracy water-body segmentation. ...
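A compact ASPP-style sketch, following the general idea cited above rather than any specific paper's configuration (the dilation rates and channel counts are illustrative): parallel 3×3 convolutions with different dilation rates capture multiscale context, and their outputs are concatenated and projected back with a 1×1 convolution.

import torch
import torch.nn as nn

class ASPP(nn.Module):
    """Parallel dilated 3x3 branches, concatenated and projected by 1x1 conv."""
    def __init__(self, in_ch, out_ch, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r) for r in rates
        )
        self.project = nn.Conv2d(out_ch * len(rates), out_ch, 1)

    def forward(self, x):
        return self.project(torch.cat([b(x) for b in self.branches], dim=1))

features = torch.randn(1, 256, 32, 32)
print(ASPP(256, 64)(features).shape)  # torch.Size([1, 64, 32, 32])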
Article
Full-text available
Synthetic Aperture Radar (SAR), as a microwave sensor that can sense a target all day or night under all-weather conditions, is of great significance for detecting water resources, such as coastlines, lakes and rivers. This paper reviews literature published in the past 30 years in the field of water body extraction in SAR images, and makes some proposals that the community working with SAR image waterbody extraction should consider. Firstly, this review focuses on the main ideas and characteristics of traditional water body extraction on SAR images, mainly focusing on traditional Machine Learning (ML) methods. Secondly, how Deep Learning (DL) methods are applied and optimized in the task of water-body segmentation for SAR images is summarized from the two levels of pixel and image. We also pay more attention to the most popular networks, such as U-Net and its modified models, and novel networks, such as the Cascaded Fully-Convolutional Network (CFCN) and River-Net. In the end, an in-depth discussion is presented, along with conclusions and future trends, on the limitations and challenges of DL for water-body segmentation.