Chao Hu’s research while affiliated with CHINA COMMUNICATIONS CONSTRUCTION and other places


Publications (14)


Multi-scale Feature Imitation for Unsupervised Anomaly Localization
  • Conference Paper
February 2024 · 2 Reads · Lecture Notes in Electrical Engineering
Chao Hu · Shengxin Lai
The unsupervised anomaly localization task is challenging due to the absence of abnormal samples during the training phase, the need to handle multiple kinds of anomalies on the same object, and the requirement to detect unseen anomalies. To address these problems, we propose a novel approach that consists of a separate teacher-student feature imitation network and a multi-scale processing strategy combining an image pyramid with a feature pyramid. Additionally, we design a side task that optimizes a weight for each student network block through a gradient descent algorithm. Experimental results demonstrate that, compared with anomaly localization methods based on feature modeling, our proposed method performs better on the MVTec dataset, a real industrial product detection dataset. Furthermore, our multi-scale strategy effectively improves performance over the benchmark method.
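A minimal sketch of the core idea (a frozen, pretrained teacher whose intermediate features a student network learns to imitate on normal images, combined with an image pyramid and a feature pyramid at test time) might look like the following. The ResNet-18 backbone, the chosen stages, and the averaging fusion are illustrative assumptions, not the authors' exact implementation:

```python
import torch
import torch.nn.functional as F
from torchvision.models import resnet18

# Assumed setup: a frozen, pretrained teacher and a student trained to
# imitate the teacher's features on anomaly-free images only.
teacher = resnet18(weights="IMAGENET1K_V1").eval()
student = resnet18(weights=None)

def feature_maps(model, x):
    """Collect intermediate feature maps from several ResNet stages."""
    x = model.maxpool(model.relu(model.bn1(model.conv1(x))))
    feats = []
    for stage in (model.layer1, model.layer2, model.layer3):
        x = stage(x)
        feats.append(x)
    return feats

def anomaly_map(image, scales=(1.0, 0.5)):
    """Image-pyramid + feature-pyramid anomaly localization (sketch).

    For each image scale, the per-pixel teacher-student feature distance is
    computed at several feature levels, upsampled to input resolution, and
    averaged. Large distances mark regions the student failed to imitate,
    i.e. likely anomalies.
    """
    h, w = image.shape[-2:]
    maps = []
    for s in scales:
        x = F.interpolate(image, scale_factor=s, mode="bilinear", align_corners=False)
        with torch.no_grad():
            t_feats = feature_maps(teacher, x)
            s_feats = feature_maps(student, x)
        for t, st in zip(t_feats, s_feats):
            d = torch.mean((F.normalize(t, dim=1) - F.normalize(st, dim=1)) ** 2,
                           dim=1, keepdim=True)
            maps.append(F.interpolate(d, size=(h, w), mode="bilinear", align_corners=False))
    return torch.mean(torch.cat(maps, dim=1), dim=1)  # (N, H, W) anomaly heatmap
```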


When SAM Meets Sonar Images

January 2024 · 27 Reads · 14 Citations
IEEE Geoscience and Remote Sensing Letters
Liqiang Zhu · [...] · Chao Hu
Segment anything model (SAM) has revolutionized the way of segmentation due to its remarkable capacity for generalized segmentation. However, SAM’s performance may decline when applied to tasks involving domains that differ from natural images. Nonetheless, by employing fine-tuning techniques, SAM exhibits promising capabilities in specific domains, such as medicine and planetary science. Notably, there is a lack of research on the application of SAM to sonar imaging. In this letter, we aim to address this gap by conducting a comprehensive investigation of SAM’s performance on sonar images. Specifically, we evaluate SAM with various settings on sonar images. Moreover, we fine-tune SAM for sonar images using effective methods both with prompts and for semantic segmentation. The experimental results reveal a substantial enhancement in the performance of the fine-tuned SAM, with mIoU increasing from 0.24 to 0.75. This underscores the promising potential of SAM for sonar image segmentation applications. Additionally, even when only 2 of the 11 categories are used for training, the model with box prompts sustains an mIoU of 0.69, showcasing its outstanding capability for general segmentation in sonar images. The code is available at https://github.com/wangsssky/SonarSAM.
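A rough sketch of the kind of fine-tuning described, freezing SAM's heavy image encoder and updating only the light-weight prompt encoder and mask decoder with box prompts, could look like the following. The checkpoint path, learning rate, loss choice, and tensor shapes are placeholders, and the letter's actual adaptation methods may differ:

```python
import torch
from segment_anything import sam_model_registry

# Load a SAM backbone (checkpoint path is a placeholder).
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b.pth")

# Freeze the heavy image encoder; fine-tune only the light-weight parts.
for p in sam.image_encoder.parameters():
    p.requires_grad = False

trainable = list(sam.prompt_encoder.parameters()) + list(sam.mask_decoder.parameters())
optimizer = torch.optim.AdamW(trainable, lr=1e-4)

def training_step(image_embeddings, box_prompts, gt_masks):
    """One fine-tuning step with box prompts.

    gt_masks is assumed to be resized to the decoder's low-resolution
    mask shape and given as float values in {0, 1}.
    """
    sparse, dense = sam.prompt_encoder(points=None, boxes=box_prompts, masks=None)
    low_res_masks, _ = sam.mask_decoder(
        image_embeddings=image_embeddings,
        image_pe=sam.prompt_encoder.get_dense_pe(),
        sparse_prompt_embeddings=sparse,
        dense_prompt_embeddings=dense,
        multimask_output=False,
    )
    loss = torch.nn.functional.binary_cross_entropy_with_logits(low_res_masks, gt_masks)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```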



Figure 2: Illustration of the SAM network structure (a) and the mainstream methods for adapting SAM to specific tasks (b): fine-tuning the image encoder with adaptation methods, fine-tuning the light-weight mask decoder and prompt encoder, or using a custom segmentation head.
Figure 3: The fine-tuning settings and the major structures of the adaptation methods for SAM.
Table: Performance of SAM on sonar images, evaluated by DICE score.
Table: Performance of models with different-scale backbones, evaluated by DICE score.
When SAM Meets Sonar Images
  • Preprint
  • File available
June 2023 · 347 Reads · 1 Citation
Segment Anything Model (SAM) has revolutionized the way of segmentation. However, SAM's performance may decline when applied to tasks involving domains that differ from natural images. Nonetheless, by employing fine-tuning techniques, SAM exhibits promising capabilities in specific domains, such as medicine and planetary science. Notably, there is a lack of research on the application of SAM to sonar imaging. In this paper, we aim to address this gap by conducting a comprehensive investigation of SAM's performance on sonar images. Specifically, we evaluate SAM using various settings on sonar images. Additionally, we fine-tune SAM using effective methods both with prompts and for semantic segmentation, thereby expanding its applicability to tasks requiring automated segmentation. Experimental results demonstrate a significant improvement in the performance of the fine-tuned SAM.
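For reference, the mean IoU used to quantify such improvements averages the per-class intersection-over-union over the classes present in the data. A minimal sketch follows; skipping classes absent from both prediction and ground truth is a common convention assumed here, not a detail taken from the paper:

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over classes present in the data.

    pred, target: integer label maps of identical shape.
    """
    ious = []
    for c in range(num_classes):
        pred_c = pred == c
        target_c = target == c
        union = np.logical_or(pred_c, target_c).sum()
        if union == 0:          # class absent in both maps; skip it
            continue
        inter = np.logical_and(pred_c, target_c).sum()
        ious.append(inter / union)
    return float(np.mean(ious)) if ious else 0.0
```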



Multi-scale Feature Imitation for Unsupervised Anomaly Localization

December 2022 · 7 Reads

The unsupervised anomaly localization task faces the challenges of training without anomalous samples, detecting multiple types of anomalies, and handling anomalies that occupy widely varying proportions of the image area. A separate teacher-student feature imitation network structure and a multi-scale processing strategy combining an image pyramid and a feature pyramid are proposed to solve these problems. A network module importance search method based on gradient descent optimization is proposed to simplify the network structure. The experimental results show that the proposed algorithm performs better than contemporaneous feature-modeling anomaly localization methods on a real industrial product detection dataset, and the multi-scale strategy effectively improves results compared with the benchmark method.
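One plausible reading of the module importance search is a set of learnable per-block weights on the student's imitation losses, optimized jointly by gradient descent so that low-weight blocks can later be pruned. The sketch below is written under that assumption; the softmax weighting and the class name are illustrative and not taken from the paper:

```python
import torch
import torch.nn.functional as F

class BlockWeights(torch.nn.Module):
    """Learnable importance weights, one per student network block."""

    def __init__(self, num_blocks):
        super().__init__()
        self.logits = torch.nn.Parameter(torch.zeros(num_blocks))

    def forward(self, per_block_losses):
        # per_block_losses: sequence of scalar imitation losses, one per block.
        weights = F.softmax(self.logits, dim=0)
        return torch.sum(weights * torch.stack(list(per_block_losses)))

# Usage sketch: the weighted loss is minimized together with the student,
# and blocks that end up with near-zero weight are candidates for removal,
# which simplifies the network structure.
block_weights = BlockWeights(num_blocks=3)
```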


IDMS: Instance Depth for Multi-scale Monocular 3D Object Detection

December 2022 · 5 Reads

Due to the lack of depth information in images and the resulting poor accuracy of monocular 3D object detection, we propose IDMS, an instance-depth, multi-scale monocular 3D object detection method. Firstly, to enhance the model's ability to handle targets of different scales, we design a multi-scale perception module based on dilated convolution, in which the depth features containing multi-scale information are re-refined in both the spatial and channel directions to account for the inconsistency between feature maps of different scales. Secondly, to give the model better 3D perception, we use instance depth information as an auxiliary learning task to enhance the spatial depth features of 3D targets, supervising the auxiliary task with sparse instance depth. Finally, evaluation of the proposed algorithm on the KITTI test and validation sets shows that, compared with the baseline method, the proposed method improves AP40 for the car category by 5.27%, effectively improving the detection performance of the monocular 3D object detection algorithm.
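As a rough illustration of these two ingredients, a dilated-convolution multi-scale perception block and an auxiliary depth loss supervised only at sparse instance locations, consider the following sketch. Channel counts, dilation rates, the smooth-L1 loss choice, and the loss weight are assumptions, not the paper's exact design:

```python
import torch
import torch.nn.functional as F

class MultiScalePerception(torch.nn.Module):
    """Parallel dilated convolutions capture context at several receptive fields."""

    def __init__(self, channels, dilations=(1, 2, 4)):
        super().__init__()
        self.branches = torch.nn.ModuleList(
            torch.nn.Conv2d(channels, channels, 3, padding=d, dilation=d)
            for d in dilations
        )
        self.fuse = torch.nn.Conv2d(channels * len(dilations), channels, 1)

    def forward(self, x):
        # Concatenate the multi-dilation branches and fuse back to `channels`.
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))

def instance_depth_loss(pred_depth, gt_depth, valid_mask):
    """Auxiliary depth loss supervised only where sparse instance depth exists.

    pred_depth: (N,) predicted depth per detected instance
    gt_depth:   (N,) ground-truth depth (undefined where invalid)
    valid_mask: (N,) bool, True where sparse supervision is available
    """
    if valid_mask.sum() == 0:
        return pred_depth.sum() * 0.0  # no supervision in this batch
    return F.smooth_l1_loss(pred_depth[valid_mask], gt_depth[valid_mask])

# Illustrative combined objective:
# total_loss = detection_loss + 0.5 * instance_depth_loss(pred, gt, mask)
```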



Data Augmentation Vision Transformer for Fine-grained Image Classification

November 2022 · 22 Reads

Recently, the vision transformer (ViT) has made breakthroughs in image recognition. Its multi-head self-attention mechanism (MSA) can extract discriminative token information from different pixel blocks to improve image classification accuracy. However, the classification tokens in its deep layers tend to ignore local features between layers. In addition, the fixed-size pixel blocks fed into the network at the embedding layer inevitably introduce additional image noise. To this end, this paper studies a data augmentation vision transformer (DAVT) and proposes an attention-cropping data augmentation method, which uses attention weights to guide image cropping and improves the network's ability to learn critical features. Secondly, this paper also proposes a hierarchical attention selection (HAS) method, which improves the learning of discriminative tokens between levels by filtering and fusing tokens across levels. Experimental results show that the accuracy of this method on two general datasets, CUB-200-2011 and Stanford Dogs, is better than that of existing mainstream methods, exceeding the original ViT by 1.4% and 1.6%, respectively.
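One way to picture the attention-cropping augmentation: take the attention from the class token to the image patches, keep the most attended patches, and crop the image to their bounding box before resizing it back. The sketch below assumes a 16-pixel patch size, head-averaged attention already extracted from the ViT, and a quantile threshold; none of these details are claimed to match the paper:

```python
import torch
import torch.nn.functional as F

def attention_crop(image, cls_attn, patch_size=16, keep_ratio=0.5):
    """Crop an image to the region most attended by the class token.

    image:    (C, H, W) tensor
    cls_attn: (num_patches,) attention weights from the class token to patches,
              averaged over heads (assumed already extracted from the ViT)
    """
    C, H, W = image.shape
    gh, gw = H // patch_size, W // patch_size
    attn = cls_attn.reshape(gh, gw)
    mask = attn >= attn.flatten().quantile(1.0 - keep_ratio)  # keep top fraction
    ys, xs = torch.nonzero(mask, as_tuple=True)
    if len(ys) == 0:                                          # fallback: no crop
        return image
    y0, y1 = ys.min().item() * patch_size, (ys.max().item() + 1) * patch_size
    x0, x1 = xs.min().item() * patch_size, (xs.max().item() + 1) * patch_size
    crop = image[:, y0:y1, x0:x1]
    return F.interpolate(crop.unsqueeze(0), size=(H, W), mode="bilinear",
                         align_corners=False).squeeze(0)
```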


Pedestrian Spatio-Temporal Information Fusion For Video Anomaly Detection

November 2022 · 7 Reads

To address the problems that current video anomaly detection methods cannot fully exploit temporal information and ignore the diversity of normal behaviors, an anomaly detection method that fuses pedestrian spatio-temporal information is proposed. Based on a convolutional autoencoder, the input frame is compressed and restored through the encoder and decoder, and anomalies are detected according to the difference between the output frame and the ground truth. To strengthen the feature connections between consecutive video frames, a residual temporal shift module and a residual channel attention module are introduced to improve the network's modeling of temporal and channel information, respectively. Because convolutional neural networks tend to over-generalize, memory enhancement modules are added to the skip connections of each encoder-decoder layer to limit the autoencoder's ability to reconstruct abnormal frames too well and thus improve the network's anomaly detection accuracy. In addition, the objective function is augmented with a feature discretization loss, which effectively distinguishes different normal behavior patterns. Experimental results on the CUHK Avenue and ShanghaiTech datasets show that the proposed method is superior to current mainstream video anomaly detection methods while meeting real-time requirements.
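Two pieces of this pipeline lend themselves to a short sketch: a TSM-style temporal shift that mixes a fraction of channels between neighbouring frames (a residual version adds the result back to the input), and a per-frame anomaly score derived from reconstruction error. The shift fraction, the MSE error, and the min-max normalization below are common conventions assumed for illustration, not necessarily the paper's exact choices:

```python
import torch

def temporal_shift(x, shift_div=8):
    """Shift a fraction of channels along the time axis (TSM-style sketch).

    x: (T, C, H, W) clip features. The first C//shift_div channels receive the
    previous frame's features, the next C//shift_div the next frame's, and the
    rest stay in place. A residual module would return x + temporal_shift(x).
    """
    T, C, H, W = x.shape
    fold = C // shift_div
    out = torch.zeros_like(x)
    out[1:, :fold] = x[:-1, :fold]                   # shift forward in time
    out[:-1, fold:2 * fold] = x[1:, fold:2 * fold]   # shift backward in time
    out[:, 2 * fold:] = x[:, 2 * fold:]              # untouched channels
    return out

def frame_anomaly_scores(autoencoder, frames):
    """Per-frame anomaly scores from reconstruction error.

    frames: (T, C, H, W) consecutive frames; the autoencoder is assumed to
    return a reconstruction of the same shape. Scores are min-max normalized
    over the video, with higher values indicating more anomalous frames.
    """
    autoencoder.eval()
    with torch.no_grad():
        recon = autoencoder(frames)
    err = torch.mean((recon - frames) ** 2, dim=(1, 2, 3))  # per-frame MSE
    return (err - err.min()) / (err.max() - err.min() + 1e-8)
```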


Citations (3)


... This is a great inspiration for us to abandon the use of Mask prompts in SAM and instead utilize collaborative dense prompt embedding and box prompts to enhance segmentation performance and generate more effective segmentation masks. 2) Prompt Difficulty: As shown in Fig. 1(a), shadow occlusion, high noise, and low signal-to-noise ratio surround the segmented object, resulting in unclear boundaries, while box prompts can degrade SAM performance due to unclear boundaries [12], [13]. 3) Interaction Difficulty: The SAM image encoder, trained on natural optical images, lacks the ability to effectively represent the boundaries and acoustic features in FLS images, as shown in Fig. 1(b). ...

Reference:

Fine-Tuning SAM for Forward-Looking Sonar With Collaborative Prompts and Embedding
When SAM Meets Sonar Images
  • Citing Article
  • January 2024

IEEE Geoscience and Remote Sensing Letters

... The core strength of the SAM is its zero-shot capability, which raises the question of whether this capability transfers to different use cases in environments not included during training. Multiple recent publications deal with surveying use cases utilizing the SAM in different domains, e.g., medical image analysis, geoinformation science / remote sensing [77][78][79][80][81], exogeology [82,83], construction [84], material science [76], biological imaging [85], sonar imaging [86], and agriculture [87,88]. For cross-domain surveys looking at the SAM, we refer readers to [54,89,90]. ...

When SAM Meets Sonar Images

... Telecommunication networks empower the massive amount of video streaming data collected from multiple Industrial Internet of Things (IIoT) [29] devices and can be utilized for various computer vision (CV) tasks. Few application areas include autonomous cars [60], robotic manipulation [31], medical image segmentation [14], surveillance system [27] [57], smart traffic management [23], smart home [61], and many more. Video salient object detection (VSOD) is a crucial pre-processing component of computer vision systems, which extracts visually distinctive objects in a video stream. ...

Efficient Unsupervised Video Object Segmentation Network Based on Motion Guidance
  • Citing Conference Paper
  • December 2022