Kai Yi’s scientific contributions

What is this page?


This page lists works by an author who doesn't have a ResearchGate profile or hasn't yet added these works to their profile. It is automatically generated from public (personal) data to further our legitimate goal of comprehensive and accurate scientific recordkeeping. If you are this author and want this page removed, please let us know.

Publications (1)


Figure 2. The architecture of our network with the proposed self-supervised meta-learning approach.
Figure 3. Illustration of the proposed DACB.
Figure 4. Comparison of visualization results on PASCAL-5i. Columns correspond to the query image with mask, support image with mask, MaskSplit results, and our results.
Figure 5. Ablation experiments on the values of parameters a and b.

4.5.2. The Architecture of MLDAC
As shown in Table 5, combinations of different schemes were validated to search for the optimal settings of MLDAC. The results confirm that the proposed learnable linear positional encoding and skip connection are indeed effective: the former enhances the connections between different features, and the latter strengthens the semantic information, making it easier to obtain the correlated region between the support and query images. Meanwhile, the 1/4- and 1/8-scale features accomplish the segmentation task more effectively.
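To make the two ablated components concrete, here is a minimal PyTorch sketch of how a learnable linear positional encoding and a feature skip connection might be wired into a dense attention block. The class name, head count, token shapes, and the use of nn.MultiheadAttention as a stand-in for the paper's dense attention computation are all assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class DenseAttentionBlock(nn.Module):
    """Hypothetical sketch of a DACB-style cross-attention block with a
    learnable linear positional encoding and a skip connection
    (assumed design, not the paper's actual code)."""

    def __init__(self, dim: int, num_tokens: int, num_heads: int = 4):
        super().__init__()
        # Learnable linear positional encoding: one learnable vector per
        # spatial token, added to both query and support features.
        self.pos = nn.Parameter(torch.zeros(1, num_tokens, dim))
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, query_feat: torch.Tensor, support_feat: torch.Tensor):
        # query_feat, support_feat: (B, N, C) flattened feature maps.
        q = query_feat + self.pos    # positional cues for the query tokens
        k = support_feat + self.pos  # ...and for the support tokens
        out, _ = self.attn(q, k, support_feat)  # pixel-level correspondence
        # Skip connection: keep the query's own semantics alongside the
        # support-correlated response.
        return self.norm(out + query_feat)

# Toy usage at an assumed 1/8 scale (28x28 feature map, 256 channels):
block = DenseAttentionBlock(dim=256, num_tokens=28 * 28)
q = torch.randn(2, 28 * 28, 256)
s = torch.randn(2, 28 * 28, 256)
print(block(q, s).shape)  # torch.Size([2, 784, 256])
```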
Comparison of results on PASCAL-5i between our method and other popular methods.


A Self-Supervised Few-Shot Semantic Segmentation Method Based on Multi-Task Learning and Dense Attention Computation
  • Article
  • Full-text available

July 2024 · 5 Reads · 1 Citation

Kai Yi · Weihang Wang · Yi Zhang

Autonomous driving technology has become widely prevalent, and intelligent vehicles are equipped with various sensors (e.g., vision sensors, LiDAR, and depth cameras). Among these, vision systems with tailored semantic segmentation and perception algorithms play a critical role in scene understanding. However, traditional supervised semantic segmentation requires a large number of pixel-level manual annotations for model training. Although few-shot methods reduce the annotation effort to some extent, they remain labor-intensive. In this paper, a self-supervised few-shot semantic segmentation method based on Multi-task Learning and Dense Attention Computation (dubbed MLDAC) is proposed. The salient part of an image is split into two parts: one serves as the support mask for few-shot segmentation, while cross-entropy losses are calculated between the predicted results and, separately, the other part and the entire salient region, as multi-task learning that improves the model's generalization ability. Swin Transformer is used as the backbone to extract feature maps at different scales. These feature maps are then fed into multiple levels of dense attention computation blocks to enhance pixel-level correspondence. The final prediction is obtained through inter-scale mixing and a feature skip connection. The experimental results indicate that MLDAC obtains 55.1% and 26.8% one-shot mIoU for self-supervised few-shot segmentation on the PASCAL-5i and COCO-20i datasets, respectively. In addition, it achieves 78.1% on the FSS-1000 few-shot dataset, demonstrating its efficacy.
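The abstract's self-supervised episode construction can be illustrated with a short PyTorch sketch: the salient region is split in two, one half acts as the support mask, and two cross-entropy losses are computed against the held-out half and the entire salient region. The function names, the midpoint splitting rule, and the tensor shapes below are assumptions for illustration, not the authors' code.

```python
import torch
import torch.nn.functional as F

def split_salient_mask(mask: torch.Tensor):
    """Split a binary saliency mask (H, W, values 0/1) at the horizontal
    midpoint of its salient region (hypothetical splitting rule)."""
    _, xs = torch.nonzero(mask, as_tuple=True)
    x_mid = int((xs.min() + xs.max()) // 2)
    support = mask.clone()
    support[:, x_mid:] = 0        # one half serves as the support mask
    query_half = mask - support   # the held-out half, one training target
    return support, query_half

def multitask_loss(logits_half, logits_full, half_target, full_target):
    """Two cross-entropy terms, as described in the abstract: one against
    the held-out half and one against the entire salient region.
    logits_*: (B, 2, H, W) background/foreground logits; targets: (B, H, W)."""
    return (F.cross_entropy(logits_half, half_target)
            + F.cross_entropy(logits_full, full_target))

# Toy usage with a synthetic saliency mask and stand-in network outputs:
mask = torch.zeros(64, 64, dtype=torch.long)
mask[16:48, 16:48] = 1                       # fake salient region
support, query_half = split_salient_mask(mask)
logits = torch.randn(1, 2, 64, 64)           # placeholder predictions
loss = multitask_loss(logits, logits,
                      query_half.unsqueeze(0), mask.unsqueeze(0))
print(loss.item())
```

In the paper the two predictions would come from the segmentation network conditioned on the support mask; sharing one set of random logits here is purely to keep the sketch self-contained.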
