AFMSFFNet: An Anchor-Free-Based Feature Fusion Model for Ship Detection
Remote Sensing
Abstract and Figures
This paper aims to improve a small-scale object detection model to achieve detection accuracy matching or even surpassing that of complex models. Efforts are made in the module design phase to minimize the parameter count as much as possible, thereby providing the potential for rapid detection of maritime targets. This paper introduces an innovative Anchor-Free-based Multi-Scale Feature Fusion Network (AFMSFFNet), which mitigates missed detections and false positives, particularly in inshore or small-target scenarios. Leveraging YOLOX-tiny as the foundational architecture, the proposed AFMSFFNet incorporates a novel Adaptive Bidirectional Fusion Pyramid Network (AB-FPN) for efficient multi-scale feature fusion, enhancing the saliency representation of targets and reducing interference from complex backgrounds. Simultaneously, the designed Multi-Scale Global Attention Detection Head (MGAHead) utilizes a larger receptive field to learn object features, generating high-quality reconstructed features for enhanced semantic information integration. Extensive experiments conducted on publicly available Synthetic Aperture Radar (SAR) ship datasets demonstrate that AFMSFFNet outperforms traditional baseline models in detection performance. The results indicate an improvement of 2.32% in detection accuracy compared to the YOLOX-tiny model. Additionally, AFMSFFNet achieves 78.26 Frames Per Second (FPS) on SSDD, showing superior efficiency compared to well-established networks such as Faster R-CNN and CenterNet, with efficiency improvements ranging from 4.7 to 6.7 times. This research provides a valuable solution for efficient ship detection in complex backgrounds, demonstrating the efficacy of AFMSFFNet through quantitative improvements in accuracy and efficiency compared to existing models.
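The abstract does not specify the exact weighting scheme of the AB-FPN, but adaptive bidirectional fusion modules of this family typically combine lateral and top-down feature maps with learnable, non-negative normalized weights (as popularized by BiFPN's "fast normalized fusion"). The following sketch illustrates that general idea only; it is an assumption, not the paper's implementation.

```python
import numpy as np

def adaptive_fusion(features, weights, eps=1e-4):
    """Fast normalized fusion: out = sum_i (w_i / (sum_j w_j + eps)) * f_i,
    with weights kept non-negative via ReLU. A generic sketch of the kind
    of adaptive weighting an AB-FPN-style module might use."""
    w = np.maximum(np.asarray(weights, dtype=np.float64), 0.0)  # ReLU
    w = w / (w.sum() + eps)                                     # normalize
    return sum(wi * f for wi, f in zip(w, features))

# Fuse a lateral backbone feature with an upsampled deeper feature of the
# same spatial size (shapes are illustrative).
lateral = np.ones((1, 8, 8))
top_down = 3 * np.ones((1, 8, 8))
fused = adaptive_fusion([lateral, top_down], weights=[1.0, 1.0])
```

With equal weights the fused map is close to the plain average of the two inputs; during training the weights would be learned so that more informative scales dominate.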
Ship detection using remote-sensing images based on synthetic aperture radar (SAR) plays an important role in managing water transportation and marine safety. However, complex backgrounds, small ship sizes, and low focus on small ships result in difficulties in feature extraction and low detection accuracy. This study proposes a new small SAR ship-detection network. First, a transformer-based dynamic sparse attention module is used to improve the focus on and extraction of small-ship features. Second, the feature maps are fused with deep layers, and small-target-friendly detection heads are used to improve the processing of global information in the network. Third, a fused loss function better suited to small ships is used to ensure multi-scale detection capability. Experimental results on the publicly available datasets LS-SSDD_v1.0 and AIR-SARShip-1.0 show that the proposed method effectively improves the detection accuracy of small ships in SAR images without increasing the computational burden. Compared with other methods based on convolutional neural networks, the proposed method demonstrates better multiscale detection performance.
Target detection technology for synthetic aperture radar (SAR) images has improved greatly in recent years, owing to advances in the deep learning (DL) domain. However, because of clutter in SAR images, it remains a challenge to detect small targets with high accuracy and low computational complexity. To solve this problem, a detection algorithm based on a feature fusion and cross-layer connection (FFCLC) network is proposed in this paper. First, attention feature fusion (AFF) is applied to improve the feature-fusion ability for small targets by adaptively allocating weights to the various feature maps. Meanwhile, depthwise separable convolution (DW-Conv) is used to reduce the computational complexity caused by the increase in network layers. Then, a cross-layer connection (Cross-Connect) submodule is proposed to further fuse shallow features with deep features. Finally, a multi-scale target detection (Multi-Detect) submodule is designed to improve detection ability for small targets. We compare the proposed algorithm with other representative methods on the SAR-Ship-Dataset and SSDD; quantitative evaluations show that the proposed algorithm achieves the highest computational efficiency. Owing to its superior accuracy and efficiency, the algorithm proposed in this paper is well suited to detecting small targets in SAR images.
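The parameter savings behind the DW-Conv choice above can be made concrete. A depthwise separable convolution factors a standard k×k convolution into a per-channel depthwise k×k step plus a 1×1 pointwise step; the counts below are the standard formulas (ignoring biases), not figures from the paper.

```python
def conv_params(c_in, c_out, k):
    """Parameters of a standard k x k convolution (no bias)."""
    return c_in * c_out * k * k

def dwsep_params(c_in, c_out, k):
    """Depthwise separable: depthwise k x k (one filter per input channel)
    followed by a 1 x 1 pointwise convolution."""
    return c_in * k * k + c_in * c_out

std = conv_params(128, 128, 3)    # 147,456 parameters
dws = dwsep_params(128, 128, 3)   # 1,152 + 16,384 = 17,536 parameters
ratio = std / dws                 # roughly 8.4x fewer parameters
```

The reduction factor approaches k² + a small correction as channel counts grow, which is why depthwise separable convolutions are a common remedy when adding layers would otherwise inflate the model.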
One-shot neural architecture search (NAS) has achieved impressive results in the field of synthetic aperture radar (SAR) ship detection. However, it is a challenge to balance resource consumption and search speed. To address this issue, we propose a zero-shot NAS method, named ZeroSARNas, for searching the backbone of a SAR ship-detection model, implemented via a multi-characterization proxy and an integer linear programming (ILP) search algorithm. Specifically, we first design the multi-characterization proxy for network capacity prediction, which exploits the information entropy and local intrinsic dimensionality (LID) of feature maps, named the ELID proxy, to obtain a more comprehensive understanding of each candidate module in the search space. We then formulate the NAS problem as a 0-1 ILP problem that maximizes the ELID value under constraints such as parameter count, to quickly identify the optimal network. The experimental results show that the networks found by ZeroSARNas on the SSDD, HRSID, and LS-SSDD-v1.0 datasets reach 98.59%, 91.30%, and 75.11% mean average precision (mAP) with only 1.23M, 1.75M, and 1.29M parameters, respectively. The proposed method reduces the search time from several GPU days or hours to 10.0 seconds, achieving competitive search efficiency.
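A zero-shot proxy scores a candidate network from statistics of its feature maps without any training. The sketch below computes only the information-entropy half of such a score; the LID component and the exact way ELID combines the two are not described in the abstract, so this is an illustrative simplification, not the paper's proxy.

```python
import numpy as np

def activation_entropy(fmap, bins=32):
    """Shannon entropy (bits) of a feature map's activation histogram.
    Higher entropy suggests the layer produces more varied responses,
    one cheap signal of network capacity used by entropy-based proxies."""
    hist, _ = np.histogram(fmap, bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]                      # drop empty bins before log
    return float(-(p * np.log2(p)).sum())

rng = np.random.default_rng(0)
rich = activation_entropy(rng.standard_normal((64, 8, 8)))  # varied responses
flat = activation_entropy(np.zeros((64, 8, 8)))             # constant map
```

A constant feature map collapses into a single histogram bin and scores zero, while a richly varying map scores high; ranking candidate modules by such scores is what lets the search finish in seconds rather than GPU days.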
In recent years, transformer-based networks, known for their ability to model long-range dependencies, have been widely used in downstream computer vision tasks, surpassing certain neural network architectures. However, transformer-based networks suffer from large parameter counts, high computational complexity, and difficulty in extending spatial and channel features to the third or even higher orders, resulting in convergence challenges on small to medium-sized datasets and limited effectiveness in extracting high-order detailed features. In this paper, we propose a high-order spatial- and channel-controllable convolution module, named CSⁿ, which can replace standard convolutions in any convolutional network. In the context of remote sensing small object detection, CSⁿ demonstrates superior performance compared to Self-Attention structures embedded in neural networks. Moreover, it introduces long-range dependency relationships among pixels, similar to Self-Attention, and adopts a cascaded recursive approach to extend spatial and channel features to arbitrary higher orders without introducing significant additional computation. This extension captures crucial information from high-order spatial and channel dimensions, resulting in improved accuracy for small object detection. Additionally, we construct a novel, versatile CSⁿ-(FPN+PAN) structure for object detection networks, referred to as CSⁿNet. Finally, our proposed model exhibits significant advantages in remote sensing detection when compared to state-of-the-art methods on publicly available SAR datasets (SSDD, HRSID) and an optical remote sensing dataset (NWPU-10), achieving respective improvements of 3.4%, 4.5%, and 0.4% in mAP50 compared to baseline models.
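The cascaded recursive extension to higher interaction orders can be illustrated in a reduced form: each recursion step multiplies the running output element-wise by a new gate map, raising the order of pixel interactions by one. The projections and channel mixing a real CSⁿ-style module would include are omitted here; this is a conceptual sketch only.

```python
import numpy as np

def cascaded_gating(x, gates):
    """Recursive element-wise gating: y_0 = x, y_{k+1} = y_k * g_k.
    After n gates the output is an (n+1)-th-order product of feature maps,
    capturing higher-order interactions without attention."""
    y = x
    for g in gates:
        y = y * g   # each step raises the interaction order by one
    return y

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 4))
gates = [rng.standard_normal((4, 4)) for _ in range(2)]
y = cascaded_gating(x, gates)   # third-order interaction of 4x4 maps
```

Because each step is a cheap element-wise product, extending to higher orders adds only linear cost in the number of steps, which matches the abstract's claim of avoiding significant additional computation.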