Image Processing Pipeline: The objects in the frame (four cars), marked in different colors, are reflected in the BEV Cartesian and Polar pixel images. The origin is at the bottom center. The ground-truth polar coordinates, azimuth (θ) and range (r), are marked for reference. r denotes the distance from the objects to the ego vehicle (in meters); θ represents the angle at which the objects are located (in degrees).
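The caption's coordinate convention (ego vehicle at the bottom center of the BEV image, range in meters, azimuth in degrees) can be made concrete with a short sketch. The function name, image size, and metric resolution below are illustrative assumptions, not values from the paper:

```python
import numpy as np

def bev_pixel_to_polar(u, v, img_w, img_h, meters_per_pixel):
    """Convert a BEV Cartesian pixel (u, v) to polar (r, theta).

    Assumes the origin (ego vehicle) sits at the bottom center of the image,
    u grows rightward, v grows downward (standard image convention), and a
    uniform metric resolution `meters_per_pixel` -- all hypothetical choices
    made for illustration.
    """
    # Shift to metric coordinates with the ego vehicle at (0, 0):
    # x points right (lateral), y points forward (longitudinal).
    x = (u - img_w / 2.0) * meters_per_pixel
    y = (img_h - v) * meters_per_pixel

    r = np.hypot(x, y)                    # range in meters
    theta = np.degrees(np.arctan2(x, y))  # azimuth in degrees, 0° straight ahead
    return r, theta

# Example: a pixel near the top right of a 512x512 BEV image at 0.2 m/px.
print(bev_pixel_to_polar(400, 100, 512, 512, 0.2))
```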

Source publication
Preprint
Cameras can be used to perceive the environment around the vehicle, while affordable radar sensors are popular in autonomous driving systems as they can withstand adverse weather conditions, unlike cameras. However, radar point clouds are sparser, with low azimuth and elevation resolution, and lack the semantic and structural information of the scenes, r...

Contexts in source publication

Context 1
... we transform the camera image to an RA-like representation that entails less intensive computational requirements. This transformation involves two steps, as shown in Fig. 2. We emphasize that a taxonomy of algorithms is presented in a recent survey [42] that includes our inspiration, PolarFormer [43], which performs object detection in BEV Polar ...
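A minimal sketch of such a Cartesian-to-Polar warp is given below, using a nearest-neighbour inverse mapping with illustrative bin counts and field of view; the actual two-step transformation shown in Fig. 2 of the paper may differ:

```python
import numpy as np

def cartesian_bev_to_polar(bev, r_bins=128, az_bins=128,
                           max_range_px=None, fov_deg=90.0):
    """Resample a Cartesian BEV image onto a (range, azimuth) polar grid.

    Hypothetical sketch: the bin counts, field of view, and nearest-neighbour
    sampling are assumptions chosen for illustration.
    """
    h, w = bev.shape[:2]
    if max_range_px is None:
        max_range_px = h  # ego at bottom center; range limited by image height

    # Target polar grid: rows index range, columns index azimuth.
    r = np.linspace(0, max_range_px - 1, r_bins)
    az = np.radians(np.linspace(-fov_deg / 2, fov_deg / 2, az_bins))
    r_grid, az_grid = np.meshgrid(r, az, indexing="ij")

    # Inverse-map each polar cell back to a Cartesian BEV pixel.
    u = (w / 2.0 + r_grid * np.sin(az_grid)).round().astype(int)
    v = (h - 1 - r_grid * np.cos(az_grid)).round().astype(int)

    valid = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    polar = np.zeros((r_bins, az_bins) + bev.shape[2:], dtype=bev.dtype)
    polar[valid] = bev[v[valid], u[valid]]
    return polar
```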
Context 2
... feature extractor: As explained in Section III-B, the camera images are processed (refer to Fig. 2) to obtain a Bird's-Eye View RA Polar representation. This representation is the input to our camera-only CNN model. We have chosen this representation as it directly relates to the decoded features of the radar-only model, which in turn supplements the radar features upon fusion, as shown by a thick black fusion circle in Fig. ...
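The fusion of the camera's BEV Polar features with the radar decoder features can be sketched as a simple concatenate-then-convolve block. The module name, channel counts, and fusion scheme below are hypothetical, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class PolarFeatureFusion(nn.Module):
    """Fuse camera BEV-Polar features with radar decoder features.

    Illustrative sketch: channel counts and the concatenate-then-convolve
    design are assumptions, not taken from the paper.
    """
    def __init__(self, cam_ch=64, radar_ch=64, out_ch=64):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(cam_ch + radar_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, cam_feat, radar_feat):
        # Both feature maps are assumed to share the same (range, azimuth)
        # spatial grid, so they can be concatenated along channels.
        return self.fuse(torch.cat([cam_feat, radar_feat], dim=1))

# Example: fuse two 64-channel feature maps on a 128x128 polar grid.
cam = torch.randn(1, 64, 128, 128)
radar = torch.randn(1, 64, 128, 128)
fused = PolarFeatureFusion()(cam, radar)
print(fused.shape)  # torch.Size([1, 64, 128, 128])
```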

Similar publications

Article
Multi-camera 3D object detection for autonomous driving is a challenging problem that has garnered notable attention from both academia and industry. An obstacle encountered in vision-based techniques involves the precise extraction of geometry-conscious features from RGB images. Recent approaches have utilized geometric-aware image backbones pretr...
Article
LiDAR and camera are two key sensors that provide mutually complementary information for 3D detection in autonomous driving. Existing multimodal detection methods often decorate the original point cloud data with camera features to complete the detection, ignoring the mutual fusion between camera features and point cloud features. In addition, grou...
Article
Registering light detection and ranging (LiDAR) data with optical camera images enhances spatial awareness in autonomous driving, robotics, and geographic information systems. The current challenges in this field involve aligning 2D-3D data acquired from sources with distinct coordinate systems, orientations, and resolutions. This paper introduces...
Preprint
Cross-modal data registration has long been a critical task in computer vision, with extensive applications in autonomous driving and robotics. Accurate and robust registration methods are essential for aligning data from different modalities, forming the foundation for multimodal sensor data fusion and enhancing perception systems' accuracy and re...
Preprint
Surgical automation requires precise guidance and understanding of the scene. Current methods in the literature rely on bulky depth cameras to create maps of the anatomy, however this does not translate well to space-limited clinical applications. Monocular cameras are small and allow minimally invasive surgeries in tight spaces but additional proc...