Fig 2 - uploaded by Zhipeng Deng
Source publication
Automatic detection of multi-class objects in remote sensing images is a fundamental but challenging problem in remote sensing image analysis. Traditional methods are based on hand-crafted or shallow-learning-based features with limited representation power. Recently, deep learning algorithms, especially Faster region based convolutional neu...
Contexts in source publication
Context 8
... R-CNN divides the detection framework into two stages (see Fig. 2). In the first stage, called the RPN, images are processed by a feature extractor (e.g., the Zeiler and Fergus (ZF) model (Zeiler and Fergus, 2014) or the VGG16 model (Simonyan and Zisserman, 2014)), and the topmost feature maps are used to predict bounding box proposals. In the second stage, these proposals are used to crop features from the topmost feature maps, which are then fed to Fast R-CNN for classification and bounding box regression. Although Faster R-CNN is an order of magnitude faster than Fast R-CNN, its speed is limited by the CNN feature extraction in the first stage and the costly per-region computation in the second stage. PVANET (Kim et al., 2016) mainly redesigns the feature extraction part by combining recent technical innovations, achieving state-of-the-art accuracy in multi-category object detection while minimizing computational cost. R-FCN (Li et al., 2016) adopts the recent state-of-the-art Residual Nets (ResNets) to construct a fully convolutional object detector (see Fig. 3). On the one hand, ResNets have deep architectures (e.g., 50 or 101 layers) that are as translation-invariant as possible to effectively detect deformable objects (He et al., 2016). On the other hand, the fully connected layers in Fast R-CNN are removed and replaced by a set of position-sensitive score maps, which encode the spatial information needed for accurate object detection. With these improvements, R-FCN achieves accuracy comparable to Faster R-CNN at faster running times. However, all three detectors above struggle to detect small objects, mainly because the feature maps used for proposal generation or detection are coarse. HyperNet (Kong et al., 2016) and MS-CNN (Cai et al., 2016) conduct detection at multiple output layers, which provides an effective framework for multi-scale object detection but leads to heavy computational ...
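The second stage above crops features for each proposal from the topmost feature map before classification and box regression. In Fast/Faster R-CNN this cropping is RoI max pooling, which divides each proposal into a fixed grid of bins and max-pools each bin. The NumPy sketch below is a minimal illustration, not the paper's code: it assumes a single (C, H, W) feature map, one box at a time in image pixel coordinates, and a feature stride of 16 (the conv5 stride of VGG16); `roi_max_pool` and its defaults are hypothetical.

```python
import numpy as np


def roi_max_pool(features, roi, output_size=7, spatial_scale=1.0 / 16):
    """Max-pool one region of interest into an output_size x output_size grid.

    features: array of shape (C, H, W), the topmost feature map.
    roi: (x1, y1, x2, y2) in input-image pixel coordinates.
    spatial_scale: feature-map size / image size (1/16 is an assumption here).
    """
    c, h, w = features.shape
    # Project the box onto the feature map and clip it to valid indices.
    x1, y1, x2, y2 = [int(round(v * spatial_scale)) for v in roi]
    x1, x2 = np.clip([x1, x2], 0, w - 1)
    y1, y2 = np.clip([y1, y2], 0, h - 1)
    roi_w = max(x2 - x1 + 1, 1)
    roi_h = max(y2 - y1 + 1, 1)
    out = np.zeros((c, output_size, output_size), dtype=features.dtype)
    for i in range(output_size):
        # Bin boundaries along y; floor/ceil guarantees every bin is non-empty.
        ys = y1 + int(np.floor(i * roi_h / output_size))
        ye = y1 + int(np.ceil((i + 1) * roi_h / output_size))
        for j in range(output_size):
            xs = x1 + int(np.floor(j * roi_w / output_size))
            xe = x1 + int(np.ceil((j + 1) * roi_w / output_size))
            out[:, i, j] = features[:, ys:ye, xs:xe].max(axis=(1, 2))
    return out
```

Because every proposal yields the same output shape, the cropped features can be fed to fully connected layers regardless of box size. This per-region computation is exactly the cost that R-FCN's position-sensitive score maps are designed to reduce.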
Context 9
... order to compare the outputs of each algorithm and verify the effectiveness of detecting densely packed objects with large scale variability, we selected a subarea containing numerous densely packed aircraft of multiple sizes (e.g., a transporter is about five times larger than a fighter). Fig. 20 shows the comparison results for the different approaches. Our method correctly detects most of the small, densely packed fighters, whereas the other deep CNN based methods produce more wrong detections. These results demonstrate that our method is more effective for densely packed small objects. Furthermore, our method has fewer missed detections for the larger transporters and bombers than the other deep CNN based methods, which shows that our method is more effective for detecting objects with scales ...
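Judging detections as correct, wrong, or missed, as in the comparison above, requires matching predicted boxes to ground truth. A common rule, assumed here because the excerpt does not specify one, counts a detection as correct when its intersection-over-union (IoU) with a not-yet-matched ground-truth box reaches a threshold such as 0.5. The sketch below implements a simple greedy version of that rule; `iou` and `match_detections` are illustrative names.

```python
import numpy as np


def iou(box, boxes):
    """IoU between one box and an (N, 4) array of boxes, all as (x1, y1, x2, y2)."""
    ix1 = np.maximum(box[0], boxes[:, 0])
    iy1 = np.maximum(box[1], boxes[:, 1])
    ix2 = np.minimum(box[2], boxes[:, 2])
    iy2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(ix2 - ix1, 0, None) * np.clip(iy2 - iy1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)


def match_detections(dets, scores, gts, iou_thresh=0.5):
    """Greedily match detections (highest score first) to ground-truth boxes.

    Returns (tp_flags, num_missed): tp_flags[i] is True if the i-th detection in
    descending score order is correct; unmatched detections are false alarms and
    unmatched ground-truth boxes are missed detections.
    """
    dets = np.asarray(dets, dtype=float)
    gts = np.asarray(gts, dtype=float)
    order = np.argsort(-np.asarray(scores))
    matched = np.zeros(len(gts), dtype=bool)
    tp_flags = []
    for i in order:
        if len(gts) == 0:
            tp_flags.append(False)
            continue
        overlaps = iou(dets[i], gts)
        overlaps[matched] = -1.0  # each ground-truth box can be matched only once
        j = int(np.argmax(overlaps))
        if overlaps[j] >= iou_thresh:
            matched[j] = True
            tp_flags.append(True)
        else:
            tp_flags.append(False)
    return np.array(tp_flags, dtype=bool), int((~matched).sum())
```

Sweeping `iou_thresh` and computing recall as `(len(gts) - missed) / len(gts)` gives a Recall vs. IoU curve of the kind plotted in Figs. 21(a) and 23(a).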
Context 10
... vehicles in aerial images plays an important role in a wide range of applications. However, it is still a challenging problem due to the relatively small size of vehicles, their varying types, and the complex background. To test the robustness of our method, we trained a vehicle detector on ten aerial images and evaluated its detection performance on the other ten aerial images. For large-scale aerial images, we crop each image into small image blocks with an overlap of 50 pixels (larger than the average size of vehicles). The image blocks are then detected separately, and the adjacent image blocks with detection results are stitched together. Table 6 and Fig. 21 show the numerical comparison results, Recall vs. IoU overlap ratio curves, and PRCs of the eleven methods on the Aerial-Vehicle data set. We observe the following: (1) In Fig. 21(a), the recall of ACF drops more quickly than that of the deep CNN based methods. Ours and Ours-Fuse perform well and surpass the other deep CNN based methods by a significant margin. This further demonstrates that our method can generate candidate vehicle-like regions with relatively high recall, which is desirable. (2) In Fig. 21(b), FRCN-VGG and YOLO2 have higher precision than FRCN-ZF and YOLO1, respectively. This demonstrates that a deeper network and the added anchor boxes can improve detection performance. Furthermore, the detection performance of Ours-Fuse is significantly improved over Ours, which demonstrates that combining multiple feature maps is useful for improving detection performance. (3) In Table 6, Ours-Fuse obtains the best results in terms of F1 score and AP, both of which are comprehensive metrics combining recall and precision. The precision of PVANET reached 93.94%, the best among all eleven methods, but its recall is rather low. Ours-Fuse achieves the highest recall, 83.59%, although its precision is comparatively low. YOLO1 is the fastest method, but at some cost in detection accuracy. Our method achieves a better tradeoff between detection accuracy and speed. Fig. 22 displays the detection results of the Ours-Fuse method, where the green boxes indicate correct locations, and the red and blue boxes indicate false alarms and ground truth, respectively. To show the performance in more detail, we enlarged four cropped image blocks (marked by four rectangles). The figure shows that, although some vehicles are located in the shade, the proposed approach successfully detects most of the vehicles. In particular, in the parking area (shown in the top-right subfigure), where vehicles are densely parked, our method still achieves excellent detection results. This demonstrates that our method is effective for densely packed small ...
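The block-wise testing procedure described above can be sketched as follows. The 50-pixel overlap comes from the text; the 512-pixel block size, the function names, and the (x1, y1, x2, y2, score) box format are assumptions for illustration. Because the overlap exceeds the average object size, an average-sized object cut by one block boundary still lies entirely inside a neighbouring block.

```python
import numpy as np


def tile_origins(length, tile, overlap):
    """Start offsets along one axis so that neighbouring tiles share >= overlap pixels."""
    if length <= tile:
        return [0]
    step = tile - overlap
    starts = list(range(0, length - tile + 1, step))
    if starts[-1] + tile < length:
        starts.append(length - tile)  # last tile is flush with the image border
    return starts


def crop_blocks(image, tile=512, overlap=50):
    """Yield (x0, y0, block) for overlapping blocks of a large (H, W, ...) image."""
    h, w = image.shape[:2]
    for y0 in tile_origins(h, tile, overlap):
        for x0 in tile_origins(w, tile, overlap):
            yield x0, y0, image[y0:y0 + tile, x0:x0 + tile]


def stitch(block_results):
    """Shift per-block boxes (x1, y1, x2, y2, score) back to image coordinates.

    block_results: iterable of (x0, y0, boxes) where boxes are in block coordinates.
    """
    stitched = [
        (x1 + x0, y1 + y0, x2 + x0, y2 + y0, s)
        for x0, y0, boxes in block_results
        for (x1, y1, x2, y2, s) in boxes
    ]
    return np.array(stitched, dtype=float).reshape(-1, 5)
```

Objects inside the overlap strips can be detected in two blocks, so the stitched list may contain duplicates. The excerpt does not say how these are merged; a global non-maximum suppression pass is one common choice.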
Context 11
... order to illustrate the application of the proposed method to multi-modal remote sensing images, we explore the ship detection task in large-scale SAR images. Using the accurately annotated SAR ship dataset, we trained our method and the other nine baseline methods separately on ten large-scale SAR images. During testing, each large-scale SAR image was cropped into small image blocks with an overlap of 50 pixels (larger than the average size of ships). The image blocks are then detected separately, and the adjacent image blocks with detection results are stitched together. Table 7 and Fig. 23 show the numerical comparison results, Recall vs. IoU overlap ratio curves, and PRCs of the eleven methods on the SAR-Ship data set. We observe performance similar to that in vehicle detection. (1) In Fig. 23(a), Ours and Ours-Fuse perform well and surpass the other deep CNN based methods by a significant margin. This further demonstrates that our method is effective for ship detection in SAR images and can generate candidate ship-like regions with relatively high recall. (2) In Fig. 23(b), FRCN-VGG and YOLO2 achieve higher precision than FRCN-ZF and YOLO1, respectively, which demonstrates that these improvements remain useful in the ship detection task. Furthermore, the detection performance of Ours-Fuse is slightly improved over Ours, which demonstrates that combining multiple feature maps is useful for improving detection performance. (3) In Table 7, Ours-Fuse achieves the best performance in terms of both F1 score and AP. Ours-Fuse also obtains the highest recall, 81.44%. Although CFAR achieves the highest precision, its recall is rather low. Not surprisingly, YOLO1 is still the fastest method, although our method is also relatively fast.

Table 6 Performance comparison between different methods on the Aerial-Vehicle data set. The bold numbers denote the optimal values in each column. The italic underlined numbers denote the second-best values in each ...
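Tables 6 and 7 rank the methods by precision, recall, F1 score, and AP. F1 is the harmonic mean of precision and recall, and AP is the area under the precision-recall curve (PRC). The excerpt does not state which AP interpolation the paper uses, so the sketch below assumes the all-point interpolation used by PASCAL VOC since 2010; the function names are hypothetical.

```python
import numpy as np


def precision_recall_f1(tp, fp, fn):
    """Point metrics from counts of correct detections, false alarms, and misses."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def average_precision(tp_flags, num_gt):
    """Area under the PRC with all-point interpolation.

    tp_flags: booleans for detections sorted by descending confidence.
    num_gt: number of ground-truth objects.
    """
    tp_flags = np.asarray(tp_flags, dtype=bool)
    tp_cum = np.cumsum(tp_flags)
    fp_cum = np.cumsum(~tp_flags)
    recall = tp_cum / num_gt
    precision = tp_cum / np.maximum(tp_cum + fp_cum, 1)
    # Pad the curve, then make precision monotonically non-increasing in recall.
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    # Sum the rectangle areas at the points where recall changes.
    idx = np.where(mrec[1:] != mrec[:-1])[0]
    return float(np.sum((mrec[idx + 1] - mrec[idx]) * mpre[idx + 1]))
```

The backward pass that makes precision non-increasing turns the raw, zig-zagging PRC into a step function, and the area of that step function is what is reported as AP.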
Context 14
... Fig. 24 shows the qualitative result on one testing image from Hong Kong port, where the green boxes indicate correct locations, and the red and blue boxes represent false alarms and ground truth, respectively. To show the detection results more clearly, we enlarged four small regions marked by cyan rectangles. From this figure, we can briefly conclude that: (1) In both offshore and inshore areas, most of the ships have been successfully detected. In particular, ships in inland rivers or around small islands are also correctly detected, which shows that our method is effective and useful. Notably, our method takes a large-scale SAR image as input and outputs the ship detection results directly without land masking. As shown in this figure, there are almost no false alarms in land areas. Therefore, the method has great potential for wide-area applications. (2) In the inshore areas, where ships are small and densely packed against a complex background, our method still achieves satisfying detection performance, which demonstrates that it is effective for densely packed small objects. (3) Although there are some false alarms (the red boxes), they look very similar to ships. Without further information to determine whether they are true ships, we count them as false alarms, which reduces the precision of our method.

Table 7 Performance comparison between different methods on the SAR-Ship data set. The bold numbers denote the optimal values in each column. The italic underlined numbers denote the second-best values in each ...
Similar publications
Faster RCNN is a region proposal based object detection approach. It integrates the region proposal stage and classification stage into a single pipeline, which has both rapid speed and high detection accuracy. However, when the model is applied to the target detection of remote sensing imagery, faced with multi-scale targets, its performance is de...
Salient object detection is a hot topic in computer vision. The emergence of the convolutional neural network (CNN) greatly improves the existing detection methods. In this paper, we present 3MNet, which is based on the CNN, to make the most of various features of the image and utilize the contour detection task of the salient object to ex...
Vehicle detection, as a special case of object detection, has practical significance but faces challenges such as detecting vehicles in various orientations, severe occlusion, and background clutter. In addition, existing effective approaches, like deep-learning-based ones, demand a large amount of training t...
With increasing training data and improving hardware performance, object detection methods based on convolutional neural networks (CNNs) have become the mainstream in the field of object detection. However, due to complex backgrounds, occlusion, and low resolution, there are still problems in small object detection....
Geospatial object detection from high spatial resolution (HSR) remote sensing imagery is a hot and challenging problem in the field of automatic image interpretation. Although convolutional neural networks (CNNs) have advanced this domain, the computational efficiency for real-time applications and the accurate positioning...
Citations
... In our study, data augmentation required a long computation time, but it resulted in species- and site-specific performance improvements. Similarly, a slight improvement in the robustness of the detector to varying conditions has been reported in other studies [14,63]. Therefore, we believe that the added benefit of data augmentation for tree species detection should be explored further, and additional augmentation methods, such as brightness and contrast variations, should be considered. ...
Automatic identification and mapping of tree species is an essential task in forestry and conservation. However, applications that can geolocate individual trees and identify their species in heterogeneous forests on a large scale are lacking. Here, we assessed the potential of the Convolutional Neural Network algorithm, Faster R-CNN, which is an efficient end-to-end object detection approach, combined with open-source aerial RGB imagery for the identification and geolocation of tree species in the upper canopy layer of heterogeneous temperate forests. We studied four tree species, i.e., Norway spruce (Picea abies (L.) H. Karst.), silver fir (Abies alba Mill.), Scots pine (Pinus sylvestris L.), and European beech (Fagus sylvatica L.), growing in heterogeneous temperate forests. To fully explore the potential of the approach for tree species identification, we trained single-species and multi-species models. For the single-species models, the average detection accuracy (F1 score) was 0.76. Picea abies was detected with the highest accuracy, with an average F1 of 0.86, followed by A. alba (F1 = 0.84), F. sylvatica (F1 = 0.75), and Pinus sylvestris (F1 = 0.59). Detection accuracy increased in multi-species models for Pinus sylvestris (F1 = 0.92), while it remained the same or decreased slightly for the other species. Model performance was more influenced by site conditions, such as forest stand structure, and less by illumination. Moreover, the misidentification of tree species decreased as the number of species included in the models increased. In conclusion, the presented method can accurately map the location of four individual tree species in heterogeneous forests and may serve as a basis for future inventories and targeted management actions to support more resilient forests.
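The brightness and contrast variations suggested above are photometric augmentations: they change pixel values but not object positions, so bounding-box labels stay valid. A minimal NumPy sketch, assuming float images scaled to [0, 1]; the jitter ranges and the function name are hypothetical choices, not values from the cited study.

```python
import numpy as np


def jitter_brightness_contrast(image, rng, max_brightness=0.2, max_contrast=0.2):
    """Randomly perturb brightness and contrast of a float image in [0, 1].

    Contrast scales deviations from the image mean by a factor drawn from
    [1 - max_contrast, 1 + max_contrast]; brightness then adds an offset drawn
    from [-max_brightness, max_brightness]. The result is clipped to [0, 1].
    """
    img = np.asarray(image, dtype=float)
    alpha = rng.uniform(1.0 - max_contrast, 1.0 + max_contrast)
    beta = rng.uniform(-max_brightness, max_brightness)
    mean = img.mean()
    out = (img - mean) * alpha + mean + beta
    return np.clip(out, 0.0, 1.0)
```

Passing a seeded `np.random.default_rng(...)` keeps the augmentation reproducible across training runs.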
... The difference in brightness or colour that makes an item distinguishable is referred to as contrast (Deng et al. 2018). A higher contrast indicates greater visible colour differences. ...
Satellite Image Processing is a vital field of research and development that involves the processing of images of the Earth and satellites obtained by artificial satellites. Images are captured digitally and then analyzed by computers to extract information. Due to image format inadequacies and defects, data received from imaging sensors on satellite platforms contain deficiencies and errors, necessitating additional processing to improve image quality. The massive network of remote sensing satellites circling the Earth provides comprehensive and periodic coverage of the Earth, enabling a wide range of uses for human benefit. The image data are pre-processed before the segmentation step, which uses the kernel fuzzy C-means algorithm with spatial information and Penguin Search Optimization (SKFCM with PeSOA). To extract a collection of features from the segmented nucleus, hybrid feature extraction is performed. In this hybrid approach, the discrete wavelet transform with gray-level co-occurrence matrix (DWT with GLCM) algorithm was used. The segmented and extracted features are used to train the Enhanced Probabilistic Neural Network classifier. Metrics such as accuracy, f-measure, specificity, and sensitivity are used to assess classification efficiency. Compared with other classifiers, the Enhanced Probabilistic Neural Network classifier achieves 98.1 percent accuracy.
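The hybrid features above include a gray-level co-occurrence matrix (GLCM), which counts how often pairs of gray levels occur at a fixed pixel offset. The sketch below computes a normalised, non-symmetric GLCM for a single offset and the Haralick contrast statistic derived from it. The number of gray levels, the offset, and the choice of statistic are assumptions for illustration, since the abstract does not give the cited paper's settings.

```python
import numpy as np


def glcm(image, dx=1, dy=0, levels=8):
    """Normalised gray-level co-occurrence matrix for one pixel offset (dx, dy).

    image: 2-D array of integer gray levels in [0, levels).
    Entry [i, j] is the fraction of pixel pairs in which a pixel has level i and
    its neighbour at (row + dy, col + dx) has level j. Assumes dx, dy >= 0.
    """
    img = np.asarray(image, dtype=int)
    h, w = img.shape
    ref = img[0:h - dy, 0:w - dx]   # reference pixels
    nbr = img[dy:h, dx:w]           # their neighbours at the given offset
    m = np.zeros((levels, levels), dtype=float)
    np.add.at(m, (ref.ravel(), nbr.ravel()), 1.0)
    return m / m.sum()


def glcm_contrast(p):
    """Haralick contrast: sum over (i, j) of p[i, j] * (i - j) ** 2."""
    i, j = np.indices(p.shape)
    return float(np.sum(p * (i - j) ** 2))
```

In practice, several offsets and statistics (contrast, homogeneity, energy, and so on) are usually concatenated into one texture feature vector.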
... For instance, numerous CNNs have been implemented with various patch sizes and dimensions. To increase the accuracy of scene classification, Deng et al. [20] improved feature representations at various scales. Multi-scale CNNs were utilized by Yang et al. to recognize complex scenes (airports, homes, businesses, etc.) in remote sensing photos with greater accuracy than single-scale CNNs. ...
Digital image technology has developed rapidly, both in the models and algorithms used and in the quality and results of the processing carried out. Digital image processing can be used to classify the condition of vacant land in certain areas. A high level of urbanization causes an increase in population growth and uneven development in certain areas. Advanced technology has resulted in a vast constellation of satellites and aerial platforms. In general, many remote sensing images with very fine spatial resolution (VFSR) are commercially available to the general public, for example through Google Earth. This platform provides much information regarding spatial conditions, so its data can be used to analyze and classify the availability of vacant land in certain areas. To support good regional and city planning and overcome problems due to high levels of urbanization, a model is needed that can automatically classify vacant land using data openly available on Google Earth. This study therefore classifies vacant land in Google Earth images using a deep learning model, namely a Convolutional Neural Network (CNN). The CNN is used because of its strength in image classification. The experimental results show that the CNN algorithm achieves optimal image classification results.
... Therefore, this kind of method has high design cost, poor feature robustness and weak generalization ability. Compared with the method of manually designed features, object detection based on deep learning uses CNN to extract image features [23,24], which has automatic and powerful feature extraction ability, better robustness and higher detection accuracy. Therefore, the traditional object detection method has been gradually replaced by deep learning-based methods. ...
... Therefore, this method ultimately produces a deformation effect similar to expanding the attention region, and the distance measure k also prevents the point at the maximum of Grad-CAM in the original image from being selected at every sampling step. Finally, both the numerator and denominator in equation (24) can be realized by a convolution operation. In this case, k corresponds to one convolution operation (with one input and one output channel). ...
Remote sensing images contain important information such as airports, ports, and ships. By extracting remote sensing image features and learning the mapping between image features and text semantic features, the content of remote sensing images can be interpreted and described, which has wide application value in military and civil fields such as national defense security, land monitoring, urban planning, and disaster mitigation. Aiming at the complex backgrounds of remote sensing images, the lack of interpretability of existing target detection models, and the problems in feature extraction across different network structures and layers and in the accuracy of target classification, we propose an object detection and interpretation model based on gradient-weighted class activation mapping and reinforcement learning. First, ResNet is used as the backbone network to extract features from remote sensing images and generate feature maps. Then, a global average pooling layer is added to obtain the feature weight vector corresponding to the feature maps. The weighted feature maps are superimposed to output class activation maps. Reinforcement learning is used to optimize the region generation network, and we improve the reward function of reinforcement learning to improve the effect of the region generation network. Finally, network dissection analysis is used to obtain the interpretable semantic concepts in the model. In experiments, the average accuracy exceeds 85%. Experimental results on the public remote sensing image description dataset show that the proposed method has high detection accuracy and good description performance for remote sensing images in complex environments.
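The abstract above describes global average pooling producing a weight vector whose weighted combination of feature maps gives a class activation map. In Grad-CAM (Selvaraju et al., 2017), the weight of each channel is the spatial average of the gradient of the class score with respect to that channel. The NumPy sketch below shows only that combination step; it assumes the activations and gradients have already been obtained from a deep learning framework, and it is not the cited model.

```python
import numpy as np


def grad_cam(activations, gradients):
    """Grad-CAM heat map from one layer's activations and their gradients.

    activations: (K, H, W) feature maps A^k of the chosen layer.
    gradients: (K, H, W) gradients of the class score y^c with respect to A^k.
    Returns an (H, W) map scaled to [0, 1].
    """
    weights = gradients.mean(axis=(1, 2))             # global average pooling -> alpha_k
    cam = np.tensordot(weights, activations, axes=1)  # sum_k alpha_k * A^k
    cam = np.maximum(cam, 0.0)                        # keep only positive evidence
    peak = cam.max()
    return cam / peak if peak > 0 else cam
```

The resulting map has the spatial size of the chosen layer, so it is usually upsampled to the input resolution before being overlaid on the image.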
... So far, many scholars from the SAR community have applied them successfully to ship detection. For example, Faster R-CNN was improved by Li et al. [30], Zhang et al. [31,32], Kang et al. [33], Lin et al. [34], Deng et al. [35], and Zhao et al. [36]. Cui et al. [37], Yang et al. [38], Fu et al. [39], and Gao et al. [40] proposed various variants of FPN to boost multi-scale detection performance. ...
Ship instance segmentation in synthetic aperture radar (SAR) images can provide more detailed location and shape information, which is of great significance for port ship scheduling and traffic management. However, there is little research on SAR ship instance segmentation, and the accuracy is generally low because characteristics of the SAR ship task, such as multiple scales, ship aspect ratios, and noise interference, are not considered. To solve these problems, we propose the idea of scale in scale (SIS) for SAR ship instance segmentation. Its essence is to establish multi-scale modes within a single scale. Considering the characteristics of the SAR ship instance segmentation task, SIS is equipped with four tentative modes in this paper, i.e., an input mode, a backbone mode, an RPN mode (region proposal network), and an ROI mode (region of interest). The input mode establishes multi-scale inputs in a single scale. The backbone mode enhances the ability to extract multi-scale features. The RPN mode makes bounding boxes better match ship aspect ratios. The ROI mode expands the receptive field. Combining them, we report a SIS network (SISNet) dedicated to high-quality SAR ship instance segmentation on the basis of the prevailing Mask R-CNN framework. For Mask R-CNN, we also redesign (1) its feature pyramid network (FPN) for better small ship detection and (2) its detection head (DH) for more refined box regression. We conduct extensive experiments to verify the effectiveness of SISNet on the open SSDD and HRSID datasets. The experimental results reveal that SISNet surpasses the other nine competitive models. Specifically, its segmentation average precision (AP) exceeds that of the second-best model by 4.4% on SSDD and 2.5% on HRSID.
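The RPN mode described above adapts the region proposal network's boxes to ship aspect ratios. As background, Faster R-CNN-style RPNs place anchors of several sizes and height-to-width ratios at every feature-map position. The sketch below generates such anchors; the sizes, the stride, and adding an elongated ratio such as 4:1 for ships are illustrative assumptions, not SISNet's configuration.

```python
import numpy as np


def make_anchors(feat_h, feat_w, stride=16, sizes=(32, 64, 128), ratios=(0.5, 1.0, 2.0)):
    """Anchors (x1, y1, x2, y2) centred on every cell of a feat_h x feat_w feature map.

    ratios are height / width; each anchor keeps area size ** 2 for its size.
    Output order: for each cell (row-major), all size/ratio combinations.
    """
    base = []
    for s in sizes:
        for r in ratios:
            w = s / np.sqrt(r)
            h = s * np.sqrt(r)
            base.append((-w / 2, -h / 2, w / 2, h / 2))
    base = np.array(base)                                # (A, 4), centred at the origin
    cx = (np.arange(feat_w) + 0.5) * stride              # cell centres in image pixels
    cy = (np.arange(feat_h) + 0.5) * stride
    cx, cy = np.meshgrid(cx, cy)                         # (H, W) each
    shifts = np.stack([cx, cy, cx, cy], axis=-1).reshape(-1, 1, 4)
    return (shifts + base[None]).reshape(-1, 4)          # (H * W * A, 4)
```

Changing `ratios` is the simplest way to bias proposals toward elongated targets; the RPN then regresses offsets relative to these anchors.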
... In recent years, with the rapid development of deep learning, computer vision has been widely used in face detection [2,3], medical image analysis [4], traffic sign detection [5], remote sensing detection [6], and other fields. At the same time, many classical object detection methods have emerged, which can be roughly divided into two categories. Two-stage detection algorithms, represented by R-CNN [7] and Faster R-CNN [8], first extract features and generate candidate boxes, and then classify them with a convolutional neural network. ...
To address the inaccurate recognition and high missed detection rates of existing mask detection algorithms in real scenes, a novel mask detection algorithm based on the YOLO-GBC network is proposed. Specifically, the global attention mechanism (GAM) is integrated into the backbone network to improve the extraction of key information through cross-dimensional information interaction. A cross-layer cascade method is adopted to improve the feature pyramid structure, achieving effective bidirectional cross-scale connections and weighted feature fusion. The content-aware reassembly of features (CARAFE) upsampling method is integrated into the feature pyramid network to fully retain the semantic information and global features of the feature map. NMS is replaced with Soft-NMS, which improves the accuracy of the predicted boxes by decaying confidence scores. The experimental results show that the mean average precision (mAP) of YOLO-GBC reached 91.2% on the mask detection data set, 2.3% higher than the baseline YOLOv5, and the detection speed reached 64 FPS. The accuracy and recall have also improved to varying degrees, improving the detection of correctly worn masks.
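Soft-NMS (Bodla et al., 2017), mentioned in the abstract above, lowers the scores of boxes that overlap a higher-scoring box instead of discarding them outright. The sketch below uses the Gaussian decay variant; the abstract does not say which variant or parameters YOLO-GBC uses, so `sigma = 0.5` and the score threshold are assumed defaults.

```python
import numpy as np


def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian Soft-NMS over (x1, y1, x2, y2) boxes.

    Returns the indices of kept boxes in selection order and their final scores.
    """
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float).copy()
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    remaining = list(range(len(scores)))
    keep, kept_scores = [], []
    while remaining:
        # Select the highest-scoring box among those not yet processed.
        best = max(remaining, key=lambda k: scores[k])
        if scores[best] < score_thresh:
            break  # every other remaining score is lower still
        remaining.remove(best)
        keep.append(best)
        kept_scores.append(float(scores[best]))
        if not remaining:
            break
        rest = np.array(remaining)
        ix1 = np.maximum(boxes[best, 0], boxes[rest, 0])
        iy1 = np.maximum(boxes[best, 1], boxes[rest, 1])
        ix2 = np.minimum(boxes[best, 2], boxes[rest, 2])
        iy2 = np.minimum(boxes[best, 3], boxes[rest, 3])
        inter = np.clip(ix2 - ix1, 0, None) * np.clip(iy2 - iy1, 0, None)
        iou = inter / (areas[best] + areas[rest] - inter)
        # Decay overlapping scores smoothly instead of deleting the boxes.
        scores[rest] *= np.exp(-(iou ** 2) / sigma)
    return keep, kept_scores
```

For two identical boxes scored 0.9 and 0.8, plain NMS would delete the second; here it survives with its score decayed to 0.8 * exp(-2), about 0.108, so a genuinely adjacent object is not lost.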
... For example, the Markov random field approach is used for feature classification (Grinias et al. 2016). Also, the Discrete Cosine Transform (DCT) and Discrete Wavelet Transform (DWT) are utilized for different types of analysis, such as land-use change detection (Shihab et al. 2017), separation and detection of surface phenomena by material , identifying the behavior of the reflection spectrum related to natural phenomena (Deng et al. 2018; Cheng and Han 2016), predicting natural events and spatial density rates in spatio-temporal terms (Das and Ghosh 2016; Munawar et al. 2022), and optimal storage and querying of remote sensing data using metadata (Weipeng et al. 2018) and . ...
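To show what a DWT actually separates, here is a minimal sketch of one level of the orthonormal Haar wavelet transform on a 1-D signal. Haar is chosen only because it is the simplest wavelet. The cited works may use other wavelets, and for images the same step is normally applied along rows and then columns. The function names and the reflectance values are hypothetical.

```python
import math
from typing import List, Tuple


def haar_dwt(signal: List[float]) -> Tuple[List[float], List[float]]:
    """One level of the orthonormal Haar DWT on an even-length 1-D signal."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = 1 / math.sqrt(2)
    pairs = range(0, len(signal), 2)
    approx = [(signal[i] + signal[i + 1]) * s for i in pairs]  # low-pass: local average
    detail = [(signal[i] - signal[i + 1]) * s for i in pairs]  # high-pass: local change
    return approx, detail


def haar_idwt(approx: List[float], detail: List[float]) -> List[float]:
    """Invert haar_dwt exactly (up to floating-point rounding)."""
    s = 1 / math.sqrt(2)
    out: List[float] = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) * s, (a - d) * s])
    return out


# Hypothetical reflectance profile with a sharp jump between two land covers
profile = [0.2, 0.2, 0.2, 0.8, 0.8, 0.8]
approx, detail = haar_dwt(profile)
restored = haar_idwt(approx, detail)
```

The detail coefficients are non-zero only in the pair that straddles the jump. This localisation of abrupt change is why wavelet coefficients are attractive for tasks such as change detection.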
In recent years, one of researchers' biggest concerns has been environmental knowledge. This concern can be addressed by collecting and processing remote sensing data in the shortest possible time and at the lowest cost, with the highest accuracy and efficiency. In remote sensing, various types of satellite data are processed for different purposes and applications. In this process, data storage and processing methods, resource management, scalability, performance improvement, and efficiency are among the main challenges. This paper presents a service-oriented framework that uses big data and parallel processing in remote sensing to address these challenges. The proposed framework provides scalability, flexibility, and generality without depending on specific data or processing techniques. In addition, it achieves reasonable results on quality criteria such as response time, efficiency, and performance. The evaluation results show the effectiveness of the framework for various types of remote sensing data analysis with acceptable accuracy.
... 3) Target Detection: Research on target detection in remote sensing images has broad application prospects. It can monitor traffic conditions in important areas [191] such as roads, ports, and airports, for example by detecting aircraft at airports [192], vehicles on roads [193], and ships in ports [194]. However, owing to the complex information in remote sensing images and the small size of targets, detection methods designed for natural images do not perform well on remote sensing images. ...
Brain-inspired algorithms have become a new trend in next-generation artificial intelligence. Through research on brain science, the intelligence of remote sensing algorithms can be effectively improved. This paper summarizes and analyzes the essential properties of brain cognitive learning and recent advances in remote sensing interpretation. First, this paper introduces the structural composition and properties of the brain. Then, five representative brain-inspired algorithms are studied: multiscale geometric analysis, compressed sensing, attention mechanisms, reinforcement learning, and transfer learning. Next, the paper summarizes the data types of remote sensing, the development of typical remote sensing interpretation applications, and the implementation of remote sensing, including datasets, software, and hardware. Finally, the top ten open problems and future directions of brain-inspired remote sensing interpretation are discussed. This work aims to comprehensively review brain mechanisms and the development of remote sensing, and to motivate future research on brain-inspired remote sensing interpretation.
... In deep learning, there are a variety of multi-scale models, mainly on the input side [30] and in feature fusion [31]. To match the scale variations of different objects in remote sensing images, [32] proposed a multi-scale object detection model for remote sensing images that combines a multi-scale object proposal network (MS-OPN) and an accurate object detection network (AODN). [8] used a time-sequence coding network to extract rich and adaptive multi-scale spatiotemporal features of crops from multi-temporal images. ...
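Input-side multi-scale modelling usually means feeding the same image to a model at several resolutions. The excerpt does not say how [30] builds its scales, so this is only a minimal sketch of an image pyramid, assuming plain 2x2 average pooling as the downsampling step (real pipelines typically use bilinear resizing). The helper names are hypothetical.

```python
from typing import List

Image = List[List[float]]


def downsample2x(img: Image) -> Image:
    """Halve height and width by averaging non-overlapping 2x2 blocks."""
    h = len(img) // 2 * 2      # drop an odd last row, if any
    w = len(img[0]) // 2 * 2   # drop an odd last column, if any
    return [[(img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) / 4
             for x in range(0, w, 2)]
            for y in range(0, h, 2)]


def image_pyramid(img: Image, levels: int) -> List[Image]:
    """Return [img, img at 1/2, img at 1/4, ...] for multi-scale inference."""
    pyramid = [img]
    for _ in range(levels - 1):
        top = pyramid[-1]
        if len(top) < 2 or len(top[0]) < 2:
            break  # cannot halve a 1-pixel dimension any further
        pyramid.append(downsample2x(top))
    return pyramid


# Toy 4x4 "image" holding the values 0..15 in row-major order
img = [[float(x + 4 * y) for x in range(4)] for y in range(4)]
pyr = image_pyramid(img, levels=3)
```

Each level would be passed to the same detector, so small objects are seen at high resolution and large objects fit within the receptive field at lower resolution. The cost is one forward pass per level.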
As the basic unit of farmland, the parcel is crucial for remote sensing tasks such as urban management. Previous studies of farmland parcel extraction are based on boundary detection and instance segmentation methods. However, these methods perform poorly on parcels with complex shapes and fuzzy boundaries because of their insufficient feature extraction capability. Moreover, lacking multi-scale feature extraction and fusion, they struggle to accurately extract farmland parcels at different scales. To address these issues, we propose a Fuzzy-Boundary Enhanced Trident Network, named FBETNet, to enhance fuzzy-boundary features and generate multi-scale parcels. First, a semantic-guided multi-task strategy is introduced to enhance fuzzy-boundary features. Second, we design a multi-scale trident module to further improve multi-scale feature extraction. Finally, an adversarial data augmentation strategy is employed in the training phase to strengthen the robustness and stability of our proposed method. Experiments show that our proposed method significantly improves both accuracy and visualization, especially for parcels with fuzzy boundaries and complex shapes.
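A "trident" design, in the sense of TridentNet, runs parallel branches that share one set of convolution weights but use different dilation rates, so each branch sees a different receptive field. The abstract does not describe FBETNet's module in detail, so this is only a 1-D sketch of that idea under the assumption of weight sharing and dilation rates (1, 2, 3). The function names are hypothetical, and a real module would operate on 2-D feature maps.

```python
from typing import List, Sequence


def dilated_conv1d(x: List[float], kernel: List[float], dilation: int) -> List[float]:
    """Same-length 1-D cross-correlation with zero padding (odd kernel size)."""
    if len(kernel) % 2 == 0:
        raise ValueError("kernel size must be odd so it can be centred")
    half = (len(kernel) // 2) * dilation  # distance from centre to outermost tap
    out = []
    for i in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            pos = i - half + j * dilation
            if 0 <= pos < len(x):  # positions outside the signal count as zero
                acc += w * x[pos]
        out.append(acc)
    return out


def trident_branches(x: List[float], kernel: List[float],
                     dilations: Sequence[int] = (1, 2, 3)) -> List[List[float]]:
    """Apply ONE shared kernel at several dilation rates, one branch per rate."""
    return [dilated_conv1d(x, kernel, d) for d in dilations]


# An impulse reveals which input positions each branch can "see"
impulse = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
branches = trident_branches(impulse, [1.0, 1.0, 1.0])
```

All three branches use the same three weights, yet the impulse spreads over a span of 3, 5, and 7 positions, respectively. This is how shared-weight branches can respond to objects of different scales without tripling the parameter count.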
... Deng et al. [13] present a unified and effective approach to simultaneously detecting multi-class objects in RSIs with large scale variability. First, the researchers redesign the feature extractor by adopting Concatenated ReLU and Inception modules, which increases the variety of receptive field sizes. ...
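Concatenated ReLU (CReLU) keeps both the positive and the negative phase of a pre-activation by concatenating `relu(x)` and `relu(-x)`, which doubles the number of output channels. The following is a minimal sketch on a flat list of channel values. It shows only the activation itself, not how it is wired into the feature extractor of [13].

```python
from typing import List


def crelu(x: List[float]) -> List[float]:
    """Concatenated ReLU: [relu(x), relu(-x)], doubling the channel count."""
    positive = [max(0.0, v) for v in x]   # standard ReLU response
    negative = [max(0.0, -v) for v in x]  # ReLU of the negated input
    return positive + negative


out = crelu([1.0, -2.0, 0.0])
```

Because early convolution filters tend to appear in negated pairs, CReLU lets a layer compute only half of them and recover the other half by negation. This reduces computation in the early layers of the feature extractor.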