Ganlin Cai’s research while affiliated with Minjiang University and other places

Publications (1)


The overall architecture of our implementation of 3D object detection consists of three main components: (a) Feature extraction module. (b) BEV fusion encoder module: the input radar and image features are encoded mainly through local image sampling attention and maximum adjacent radar sampling attention to generate BEV features. (c) Object decoder module: content encoding is initialized from heat maps, and position encoding is generated from dynamic reference points. In addition, noisy queries are added to the query set to help stabilize the matching process.
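As a rough sketch of how these three stages might compose, the following PyTorch-style skeleton is illustrative only; the module names, constructor arguments, and tensor shapes are assumptions rather than the authors' implementation.

```python
import torch.nn as nn

class IRBEVFQSketch(nn.Module):
    """Illustrative three-stage pipeline; the submodules are injected placeholders."""

    def __init__(self, img_backbone, radar_encoder, bev_fusion_encoder, object_decoder):
        super().__init__()
        self.img_backbone = img_backbone              # (a) multiscale image features
        self.radar_encoder = radar_encoder            # (a) per-point radar features
        self.bev_fusion_encoder = bev_fusion_encoder  # (b) fuses image + radar into BEV features
        self.object_decoder = object_decoder          # (c) query-based decoder (heat-map init, noisy queries)

    def forward(self, images, radar_points):
        img_feats = self.img_backbone(images)                        # e.g. a list of multiscale feature maps
        radar_feats = self.radar_encoder(radar_points)
        bev_feats = self.bev_fusion_encoder(img_feats, radar_feats)
        return self.object_decoder(bev_feats)                        # boxes, classes, scores
```
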
Details of the BEV fusion encoder. Three-dimensional space is discretized into uniformly distributed points, which are projected onto the multiscale feature maps; the points that fall inside a feature map are sampled. In addition, the distance from each point to the radar points is computed, and the closest radar feature is sampled for each point.
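A minimal sketch of the first step, assuming a uniform voxel-grid layout of reference points and a standard lidar-to-image projection matrix (the `lidar2img` name and 4x4 form are assumptions); the paper's exact discretization and projection details may differ.

```python
import torch

def make_reference_points(x_range, y_range, z_range, nx, ny, nz):
    """Uniformly discretize 3D space into reference points (assumed grid layout)."""
    xs = torch.linspace(*x_range, nx)
    ys = torch.linspace(*y_range, ny)
    zs = torch.linspace(*z_range, nz)
    grid = torch.stack(torch.meshgrid(xs, ys, zs, indexing="ij"), dim=-1)  # (nx, ny, nz, 3)
    return grid.reshape(-1, 3)                                             # (N, 3)

def project_to_image(points, lidar2img, img_hw):
    """Project 3D points to pixel coordinates and mask the points that fall inside the image."""
    homo = torch.cat([points, torch.ones_like(points[:, :1])], dim=-1)     # (N, 4) homogeneous coords
    cam = homo @ lidar2img.T                                               # (N, 4) camera-frame coords
    depth = cam[:, 2:3].clamp(min=1e-5)
    uv = cam[:, :2] / depth                                                # (N, 2) pixel coords
    h, w = img_hw
    valid = (cam[:, 2] > 0) & (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    return uv, valid
```
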
Local image feature sampling. The 3D reference point is projected onto the image to obtain the sampling position in the x and y directions. The feature value at the sampling point is obtained by interpolating the values around the sampling location.
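A hedged sketch of this local sampling step using bilinear interpolation via `torch.nn.functional.grid_sample`; whether IRBEVF-Q uses exactly this interpolation scheme is an assumption.

```python
import torch
import torch.nn.functional as F

def sample_image_features(feat_map, uv, img_hw):
    """Bilinearly sample features at projected (u, v) positions.
    feat_map: (1, C, Hf, Wf) feature map; uv: (N, 2) pixel coords in the original image."""
    h, w = img_hw
    # Normalize pixel coordinates to [-1, 1]; grid_sample maps them onto the feature map,
    # assuming the feature map covers the full image extent.
    grid = uv.clone()
    grid[:, 0] = grid[:, 0] / (w - 1) * 2 - 1
    grid[:, 1] = grid[:, 1] / (h - 1) * 2 - 1
    grid = grid.view(1, 1, -1, 2)                                  # (1, 1, N, 2)
    sampled = F.grid_sample(feat_map, grid, align_corners=True)    # (1, C, 1, N)
    return sampled.squeeze(2).squeeze(0).T                         # (N, C)
```
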
Maximum proximity radar sampling. The distances between the radar points and the 3D reference points are computed to obtain a 3D distance tensor. The features of the K radar points closest to each 3D reference point are taken as the sampled features.
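A small sketch of this k-nearest-neighbor gathering, assuming pairwise Euclidean distances computed with `torch.cdist`; the function name and the value of K are illustrative.

```python
import torch

def sample_nearest_radar_features(ref_points, radar_xyz, radar_feats, k=4):
    """For each 3D reference point, gather the features of its k closest radar points.
    ref_points: (N, 3); radar_xyz: (M, 3); radar_feats: (M, C)."""
    dist = torch.cdist(ref_points, radar_xyz)                    # (N, M) pairwise distances
    knn_dist, knn_idx = dist.topk(k, dim=1, largest=False)       # k smallest distances per point
    gathered = radar_feats[knn_idx]                              # (N, k, C) sampled radar features
    return gathered, knn_dist
```
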
Original position encoding and dynamic position encoding. Traditional position encoding uses randomly generated, fixed position encodings to generate the reference points. Dynamic position encoding first generates the reference points and then uses them to generate the position encoding. At the same time, each layer outputs position information that refines the reference points, so the position encoding also changes dynamically.
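A sketch, under assumptions, of how position encodings could be regenerated from reference points at each decoder layer; the MLP mapping and the offset-based refinement head are placeholders, not the paper's exact design.

```python
import torch.nn as nn

class DynamicPositionEncoding(nn.Module):
    """Position encodings derived from (and updated together with) reference points."""

    def __init__(self, embed_dim=256):
        super().__init__()
        self.pos_mlp = nn.Sequential(
            nn.Linear(3, embed_dim), nn.ReLU(), nn.Linear(embed_dim, embed_dim)
        )
        self.refine = nn.Linear(embed_dim, 3)   # predicts a per-layer offset for the reference points

    def forward(self, ref_points, query_content):
        # ref_points: (B, Q, 3) normalized reference points; query_content: (B, Q, D)
        pos_encoding = self.pos_mlp(ref_points)                 # encoding follows the current reference points
        ref_points = ref_points + self.refine(query_content)    # layer output moves the reference points
        return pos_encoding, ref_points
```
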

IRBEVF-Q: Optimization of Image–Radar Fusion Algorithm Based on Bird’s Eye View Features
  • Article
  • Full-text available

July 2024 · 33 Reads

Ganlin Cai · Feng Chen · Ente Guo

In autonomous driving, the fusion of multiple sensors is considered essential to improve the accuracy and safety of 3D object detection. Currently, a fusion scheme combining low-cost cameras with highly robust radars can counteract the performance degradation caused by harsh environments. In this paper, we propose the IRBEVF-Q model, which mainly consists of a BEV (Bird’s Eye View) fusion encoding module and an object decoder module. The BEV fusion encoding module solves the problem of unified representation of different modal information by fusing the image and radar features through 3D spatial reference points as a medium. The query, as a core component of the object decoder, plays an important role in detection. In this paper, Heat Map-Guided Query Initialization (HGQI) and Dynamic Position Encoding (DPE) are proposed in query construction to increase the a priori information of the query. The Auxiliary Noise Query (ANQ) then helps to stabilize the matching. The experimental results demonstrate that the proposed fusion model, IRBEVF-Q, achieves an NDS of 0.575 and an mAP of 0.476 on the nuScenes test set. Compared to recent state-of-the-art methods, our model shows significant advantages, indicating that our approach contributes to improving detection accuracy.
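For illustration, a hedged sketch of heat-map-guided query initialization in the spirit of HGQI: the top-scoring BEV heat-map locations are taken as initial content queries and reference points. The selection rule, query count, and tensor layout are assumptions, not the paper's specification.

```python
import torch

def heatmap_guided_query_init(bev_heatmap, bev_feats, num_queries=300):
    """Pick the top-scoring BEV locations as initial object queries.
    bev_heatmap: (C_cls, H, W) class heat map; bev_feats: (D, H, W) BEV features."""
    scores, _ = bev_heatmap.max(dim=0)                            # (H, W) class-agnostic peak score
    topk_scores, topk_idx = scores.flatten().topk(num_queries)    # strongest heat-map responses
    d, h, w = bev_feats.shape
    content_queries = bev_feats.flatten(1)[:, topk_idx].T         # (num_queries, D) content encoding
    ys = torch.div(topk_idx, w, rounding_mode="floor").float() / h
    xs = (topk_idx % w).float() / w
    reference_points = torch.stack([xs, ys], dim=-1)              # normalized BEV reference points
    return content_queries, reference_points, topk_scores
```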
