Figure - available from: Sensors
The overall architecture of our 3D object detection implementation consists of three main components: (a) Feature extraction module. (b) BEV fusion encoder module: the input radar and image features are encoded mainly through local image sampling attention and maximum adjacent radar sampling attention to generate BEV features. (c) Object decoder module: content encodings are initialized from heat maps, and position encodings are generated from dynamic reference points. In addition, noisy queries are added to the object queries to help stabilize the matching process.

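To make the three-stage data flow in the caption concrete, here is a minimal PyTorch sketch: per-modality feature extraction, BEV fusion via attention over image tokens combined with max-pooled radar features, and a decoder whose content queries are picked from a heat map and whose positional encoding comes from dynamic reference points. All module names, tensor shapes, and the simplified attention operators are illustrative assumptions, not the IRBEVF-Q implementation from the paper.

```python
# Minimal sketch of the three-module pipeline described in the caption.
# Shapes, module names, and the simplified attention operators are
# illustrative assumptions; they are NOT the paper's IRBEVF-Q implementation.
import torch
import torch.nn as nn


class FeatureExtraction(nn.Module):
    """(a) Extract per-modality features from camera images and radar points."""

    def __init__(self, img_channels=3, radar_channels=5, dim=64):
        super().__init__()
        self.img_backbone = nn.Sequential(            # stand-in for a real image backbone
            nn.Conv2d(img_channels, dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(dim, dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.radar_encoder = nn.Sequential(           # stand-in for a radar point encoder
            nn.Linear(radar_channels, dim), nn.ReLU(), nn.Linear(dim, dim),
        )

    def forward(self, images, radar_points):
        img_feat = self.img_backbone(images)           # (B, C, H/4, W/4)
        radar_feat = self.radar_encoder(radar_points)  # (B, N_radar, C)
        return img_feat, radar_feat


class BEVFusionEncoder(nn.Module):
    """(b) Fuse image and radar features into BEV features.

    Cross-attention from BEV queries to image tokens stands in for "local image
    sampling attention"; a max over radar features stands in for "maximum
    adjacent radar sampling attention".
    """

    def __init__(self, dim=64, bev_size=32):
        super().__init__()
        self.bev_queries = nn.Parameter(torch.randn(bev_size * bev_size, dim))
        self.img_attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.radar_proj = nn.Linear(dim, dim)

    def forward(self, img_feat, radar_feat):
        b = img_feat.size(0)
        img_tokens = img_feat.flatten(2).transpose(1, 2)          # (B, H*W, C)
        queries = self.bev_queries.unsqueeze(0).expand(b, -1, -1)
        bev, _ = self.img_attn(queries, img_tokens, img_tokens)   # simplified image sampling attention
        radar_max = self.radar_proj(radar_feat).max(dim=1, keepdim=True).values  # simplified radar max-pooling
        return bev + radar_max                                     # (B, bev_size^2, C)


class ObjectDecoder(nn.Module):
    """(c) Decode object queries against the BEV features.

    Content queries are initialized from a heat map over the BEV grid, the
    positional encoding is derived from dynamic reference points, and extra
    noisy (denoising) queries can be appended during training.
    """

    def __init__(self, dim=64, num_queries=50):
        super().__init__()
        self.heatmap_head = nn.Linear(dim, 1)     # scores BEV cells to select top-k content queries
        self.ref_point_head = nn.Linear(dim, 2)   # dynamic (x, y) reference points
        self.pos_encoder = nn.Linear(2, dim)      # position encoding from reference points
        self.decoder = nn.TransformerDecoderLayer(dim, nhead=4, batch_first=True)
        self.num_queries = num_queries

    def forward(self, bev, noisy_queries=None):
        scores = self.heatmap_head(bev).squeeze(-1)                # (B, bev_size^2)
        topk = scores.topk(self.num_queries, dim=1).indices
        content = torch.gather(bev, 1, topk.unsqueeze(-1).expand(-1, -1, bev.size(-1)))
        if noisy_queries is not None:                              # training-time denoising queries
            content = torch.cat([content, noisy_queries], dim=1)
        ref_points = self.ref_point_head(content).sigmoid()        # reference points in [0, 1]
        queries = content + self.pos_encoder(ref_points)
        return self.decoder(queries, bev)                          # decoded object queries


if __name__ == "__main__":
    extractor, encoder, decoder = FeatureExtraction(), BEVFusionEncoder(), ObjectDecoder()
    images = torch.randn(2, 3, 128, 128)
    radar = torch.randn(2, 200, 5)        # 200 radar points with 5 attributes each (assumed)
    img_feat, radar_feat = extractor(images, radar)
    bev = encoder(img_feat, radar_feat)
    print(decoder(bev).shape)             # torch.Size([2, 50, 64])
```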

Source publication
Article
Full-text available
In autonomous driving, the fusion of multiple sensors is considered essential to improve the accuracy and safety of 3D object detection. Currently, a fusion scheme combining low-cost cameras with highly robust radars can counteract the performance degradation caused by harsh environments. In this paper, we propose the IRBEVF-Q model, which mainly c...

Similar publications

Article
Full-text available
Fusing LiDAR and camera information is essential for achieving accurate and reliable 3D object detection in autonomous driving systems. Due to inherent differences between different modalities, seeking an efficient and accurate fusion method is of great importance. Recently, significant progress has been made in 3D object detection methods based on...
Conference Paper
Full-text available
This paper describes the implementation of an intelligent navigation system for the mobile robot GFS-X equipped with a manipulator, aimed at solving a range of tasks on the arena within a limited time frame. The approach to route planning and adjustment using a map obtained from an overhead camera is examined. YOLOv11 is utilized to gather information...
Preprint
Full-text available
Camera-to-robot (also known as eye-to-hand) calibration is a critical component of vision-based robot manipulation. Traditional marker-based methods often require human intervention for system setup. Furthermore, existing autonomous markerless calibration methods typically rely on pre-trained robot tracking models that impede their application on e...
Preprint
Full-text available
This research addresses the challenge of camera calibration and distortion parameter prediction from a single image using deep learning models. The main contributions of this work are: (1) demonstrating that a deep learning model, trained on a mix of real and synthetic images, can accurately predict camera and lens parameters from a single image, a...
Article
Full-text available
Compared to conventional single-aperture infrared cameras, the bio-inspired infrared compound eye camera integrates the advantages of infrared imaging technology with the benefits of multi-aperture systems, enabling simultaneous information acquisition from multiple perspectives. This enhanced detection capability demonstrates unique performance in...