About
193 Publications
75,505 Reads
9,891 Citations
Publications (193)
Predicting the future motion of surrounding agents is essential for autonomous vehicles (AVs) to operate safely in dynamic, human-robot-mixed environments. However, the scarcity of large-scale driving datasets has hindered the development of robust and generalizable motion prediction models, limiting their ability to capture complex interactions an...
Object-level mapping builds a 3D map of objects in a scene with detailed shapes and poses from multi-view sensor observations. Conventional methods struggle to build complete shapes and estimate accurate poses due to partial occlusions and sensor noise. They require dense observations to cover all objects, which is challenging to achieve in robotic...
We propose DistillNeRF, a self-supervised learning framework addressing the challenge of understanding 3D environments from limited 2D observations in autonomous driving. Our method is a generalizable feedforward model that predicts a rich neural scene representation from sparse, single-frame multi-view camera inputs, and is trained self-supervised...
Simultaneous localization and mapping (SLAM) in slowly varying scenes is important for long-term robot task completion. Failing to detect scene changes may lead to inaccurate maps and, ultimately, lost robots. Classical SLAM algorithms assume static scenes, and recent works take dynamics into account, but require scene changes to be observed in con...
3D object reconstruction is important for semantic scene understanding. It is challenging to reconstruct detailed 3D shapes from monocular images directly due to a lack of depth information, occlusion and noise. Most current methods generate deterministic object models without any awareness of the uncertainty of the reconstruction. We tackle this p...
The large-scale deployment of autonomous vehicles is yet to come, and one of the major remaining challenges lies in urban dense traffic scenarios. In such cases, it remains challenging to predict the future evolution of the scene and future behaviors of objects, and to deal with rare adverse events such as the sudden appearance of occluded objects....
Current LiDAR-based 3D object detectors for autonomous driving are almost entirely trained on human-annotated data collected in specific geographical domains with specific sensor setups, making it difficult to adapt to a different domain. MODEST is the first work to train 3D object detectors without any labels. Our work, HyperMODEST, proposes a uni...
Accurate 3D object detection in all weather conditions remains a key challenge to enable the widespread deployment of autonomous vehicles, as most work to date has been performed on clear weather data. In order to generalize to adverse weather conditions, supervised methods perform best if trained from scratch on all weather data instead of finetun...
We introduce ProPanDL, a family of networks capable of uncertainty-aware panoptic segmentation. Unlike existing segmentation methods, ProPanDL is capable of estimating full probability distributions for both the semantic and spatial aspects of panoptic segmentation. We implement and evaluate ProPanDL variants capable of estimating both parametric (...
In this work, we present a novel target-based lidar-camera extrinsic calibration methodology that can be used for non-overlapping field of view (FOV) sensors. Contrary to previous work, our methodology overcomes the non-overlapping FOV challenge using a motion capture system (MCS) instead of traditional simultaneous localization and mapping approac...
Robotic eye-in-hand calibration is the task of determining the rigid 6-DoF pose of the camera with respect to the robot end-effector frame. In this paper, we formulate this task as a non-linear optimization problem and introduce an active vision approach to strategically select the robot pose for maximizing calibration accuracy. Specifically, given...
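For context, eye-in-hand calibration is commonly written as a non-linear least-squares problem over the unknown end-effector-to-camera transform, using robot forward kinematics and calibration-board detections collected at several poses. The sketch below illustrates that generic formulation only; the pose conventions, the jointly estimated board pose, and the use of scipy are assumptions, and it is not the paper's implementation or its active viewpoint-selection strategy.

# Generic eye-in-hand calibration posed as non-linear least squares.
# A_i = base_T_ee from robot kinematics, B_i = cam_T_target from a board
# detector; X = ee_T_cam is the unknown; Z = base_T_target is jointly estimated.
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation as R

def to_T(rvec, t):
    # Build a 4x4 homogeneous transform from an axis-angle vector and a translation.
    T = np.eye(4)
    T[:3, :3] = R.from_rotvec(rvec).as_matrix()
    T[:3, 3] = t
    return T

def residuals(params, A_list, B_list):
    X = to_T(params[0:3], params[3:6])      # ee_T_cam (unknown hand-eye transform)
    Z = to_T(params[6:9], params[9:12])     # base_T_target (constant board pose)
    res = []
    for A, B in zip(A_list, B_list):
        # Consistency: base_T_target should equal base_T_ee @ ee_T_cam @ cam_T_target.
        E = np.linalg.inv(Z) @ A @ X @ B
        res.extend(R.from_matrix(E[:3, :3]).as_rotvec())   # rotation residual
        res.extend(E[:3, 3])                                # translation residual
    return np.asarray(res)

def calibrate(A_list, B_list):
    sol = least_squares(residuals, np.zeros(12), args=(A_list, B_list))
    return to_T(sol.x[0:3], sol.x[3:6])     # estimated ee_T_cam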
An effective framework for learning 3D representations for perception tasks is distilling rich self-supervised image features via contrastive learning. However, image-to-point representation learning for autonomous driving datasets faces two main challenges: 1) the abundance of self-similarity, which results in the contrastive losses pushing away s...
Object detection is a safety-critical aspect of autonomous driving, allowing vehicles to identify moving objects in the scene for tracking, prediction and decision making. Current detectors, however, tend to provide point estimates for detected objects, which lack information on the variability of the prediction and how well it fits the model that...
Estimating the uncertainty in deep neural network predictions is crucial for many real-world applications. A common approach to model uncertainty is to choose a parametric distribution and fit the data to it using maximum likelihood estimation. The chosen parametric form can be a poor fit to the data-generating distribution, resulting in unreliable...
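As background on the maximum-likelihood baseline referred to here, a common setup has the network output the parameters of the chosen family (typically a Gaussian mean and variance) and trains it with the negative log likelihood. The PyTorch sketch below shows that generic baseline under assumed toy shapes; it is not the paper's proposed alternative, and the network architecture is made up for illustration.

# Parametric-MLE uncertainty baseline: predict a Gaussian mean and variance and
# minimize the negative log likelihood. If the data-generating distribution is
# not Gaussian, the fitted uncertainty can be unreliable, as noted above.
import torch
import torch.nn as nn

class MeanVarianceNet(nn.Module):
    def __init__(self, in_dim=16, hidden=64):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.mean_head = nn.Linear(hidden, 1)
        self.logvar_head = nn.Linear(hidden, 1)   # log-variance keeps the variance positive

    def forward(self, x):
        h = self.backbone(x)
        return self.mean_head(h), self.logvar_head(h).exp()

model = MeanVarianceNet()
criterion = nn.GaussianNLLLoss()                  # Gaussian negative log likelihood
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

x, y = torch.randn(32, 16), torch.randn(32, 1)    # toy batch
mean, var = model(x)
loss = criterion(mean, y, var)
loss.backward()
optimizer.step()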
6D pose estimation of textureless objects is a valuable but challenging task for many robotic applications. In this work, we propose a framework to address this challenge using only RGB images acquired from multiple viewpoints. The core idea of our approach is to decouple 6D pose estimation into a sequential two-step process, first estimating the 3...
As autonomous decision-making agents move from narrow operating environments to unstructured worlds, learning systems must move from a closed-world formulation to an open-world and few-shot setting in which agents continuously learn new classes from small amounts of information. This stands in stark contrast to modern machine learning systems that...
3D multi-object tracking (MOT) is a key problem for autonomous vehicles, required to perform well-informed motion planning in dynamic environments. Particularly for densely occupied scenes, associating existing tracks to new detections remains challenging as existing systems tend to omit critical contextual information. Our proposed solution, Inter...
Maintaining an up-to-date map to reflect recent changes in the scene is very important, particularly in situations involving repeated traversals by a robot operating in an environment over an extended period. Undetected changes may cause a deterioration in map quality, leading to poor localization, inefficient operations, and lost robots. Volum...
The estimation of uncertainty in robotic vision, such as 3D object detection, is an essential component in developing safe autonomous systems aware of their own performance. However, the deployment of current uncertainty estimation methods in 3D object detection remains challenging due to timing and computational constraints. To tackle this issue,...
Maintaining an up-to-date map to reflect recent changes in the scene is very important, particularly in situations involving repeated traversals by a robot operating in an environment over an extended period. Undetected changes may cause a deterioration in map quality, leading to poor localization, inefficient operations, and lost robots. Volumetri...
LiDAR has become one of the primary 3D object detection sensors in autonomous driving. However, LiDAR's diverging point pattern with increasing distance results in a non-uniformly sampled point cloud ill-suited to discretized volumetric feature extraction. Current methods either rely on voxelized point clouds or use inefficient farthest point samplin...
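Farthest point sampling, cited above as a costly downsampling step, is straightforward to write down; the numpy sketch below is the standard textbook algorithm, shown only to make its cost concrete, and is not code from the paper.

# Standard farthest point sampling (FPS): repeatedly pick the point farthest
# from the set selected so far. The O(N * k) distance updates are what make it
# expensive on large driving point clouds.
import numpy as np

def farthest_point_sampling(points: np.ndarray, k: int) -> np.ndarray:
    # points: (N, 3) array of xyz coordinates; returns indices of k samples.
    n = points.shape[0]
    selected = np.zeros(k, dtype=np.int64)   # selected[0] = 0: start from an arbitrary point
    min_dist = np.full(n, np.inf)
    for i in range(1, k):
        diff = points - points[selected[i - 1]]
        min_dist = np.minimum(min_dist, np.einsum('ij,ij->i', diff, diff))
        selected[i] = int(np.argmax(min_dist))
    return selected

# Example: downsample a synthetic 100k-point cloud to 4,096 points.
cloud = np.random.rand(100_000, 3).astype(np.float32)
idx = farthest_point_sampling(cloud, 4096)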
Camera and LiDAR sensor modalities provide complementary appearance and geometric information useful for detecting 3D objects for autonomous vehicle applications. However, current fusion models underperform state-of-the-art LiDAR-only methods on 3D object detection benchmarks. Our proposed solution, Dense Voxel Fusion (DVF), is a sequential fusion metho...
Depth acquisition with the active stereo camera is a challenging task for highly reflective objects. When the setup permits, multi-view fusion can provide increased levels of depth completion. However, due to the slow acquisition speed of high-end active stereo cameras, collecting a large number of viewpoints for a single scene is generally not practic...
We propose a network architecture capable of reliably estimating uncertainty of regression-based predictions without sacrificing accuracy. The current state-of-the-art uncertainty algorithms either fall short of achieving prediction accuracy comparable to the mean square error optimization or underestimate the variance of network predictions. We pr...
Autonomous driving datasets are often skewed and in particular, lack training data for objects at farther distances from the ego vehicle. The imbalance of data causes a performance degradation as the distance of the detected objects increases. In this paper, we propose pattern-aware ground truth sampling, a data augmentation technique that downsamp...
In this work, we introduce a microscopic traffic flow model called Scalar Capacity Model (SCM) which can be used to study the formation of traffic on an airway link for autonomous Unmanned Aerial Vehicles (UAVs) as well as for the ground vehicles on the road. Given the 3D trajectory of UAV flights (as opposed to the 2D trajectory of ground vehicles...
Relying on monocular image data for precise 3D object detection remains an open problem, whose solution has broad implications for cost-sensitive applications such as traffic monitoring. We present UrbanNet, a modular architecture for long-range monocular 3D object detection with static cameras. Our proposed system combines commonly available urban...
Capturing uncertainty in object detection is indispensable for safe autonomous driving. In recent years, deep learning has become the de-facto approach for object detection, and many probabilistic object detectors have been proposed. However, there is no summary on uncertainty estimation in deep object detection, and existing methods are either bui...
As autonomous decision-making agents move from narrow operating environments to unstructured worlds, learning systems must move from a closed-world formulation to an open-world and few-shot setting in which agents continuously learn new classes from small amounts of information. This stands in stark contrast to modern machine learning systems that...
Monocular 3D object detection is a key problem for autonomous vehicles, as it provides a solution with simple configuration compared to typical multi-sensor systems. The main challenge in monocular 3D detection lies in accurately predicting object depth, which must be inferred from object and scene cues due to the lack of direct range measurement....
Successful visual navigation depends upon capturing images that contain sufficient useful information. In this paper, we explore a data-driven approach to account for environmental lighting changes, improving the quality of images for use in visual odometry (VO) or visual simultaneous localization and mapping (SLAM). We train a deep convolutional n...
Predictive uncertainty estimation is an essential next step for the reliable deployment of deep object detectors in safety-critical tasks. In this work, we focus on estimating predictive distributions for bounding box regression output with variance networks. We show that in the context of object detection, training variance networks with negative...
The Canadian Adverse Driving Conditions (CADC) dataset was collected with the Autonomoose autonomous vehicle platform, based on a modified Lincoln MKZ. The dataset, collected during winter within the Region of Waterloo, Canada, is the first autonomous driving dataset that focuses on adverse driving conditions specifically. It contains 7,000 frames...
Capturing uncertainty in object detection is indispensable for safe autonomous driving. In recent years, deep learning has become the de-facto approach for object detection, and many probabilistic object detectors have been proposed. However, there is no summary on uncertainty estimation in deep object detection, and existing methods are not only b...
Accurate and reliable 3D object detection is vital to safe autonomous driving. Despite recent developments, the performance gap between stereo-based methods and LiDAR-based methods is still considerable. Accurate depth estimation is crucial to the performance of stereo-based 3D object detection methods, particularly for those pixels associated with...
The Canadian Adverse Driving Conditions (CADC) dataset was collected with the Autonomoose autonomous vehicle platform, based on a modified Lincoln MKZ. The dataset, collected during winter within the Region of Waterloo, Canada, is the first autonomous vehicle dataset that focuses on adverse driving conditions specifically. It contains 7,000 frames...
We define a new problem called the Vehicle Scheduling Problem (VSP). The goal is to minimize an objective function, such as the number of tardy vehicles over a transportation network subject to maintaining safety distances, meeting hard deadlines, and maintaining speeds on each link between the allowed minimums and maximums. We prove VSP is an NP-h...
The use of mobile ground and aerial robotics presents a powerful means to augment current visual inspection practice by overcoming some common weaknesses related to accessibility, repeatability, hidden defect detection, and quantification. In this paper, a novel ground robotic bridge inspection platform consisting of a rugged mobile platform equipp...
Unmanned aerial vehicles (UAVs) have increasingly been adopted for safety, security, and rescue missions, for which they need precise and reliable pose estimates relative to their environment. To ensure mission safety when relying on visual perception, it is essential to have an approach to assess the integrity of the visual localization solution....
Safe autonomous driving requires reliable 3D object detection: determining the 6 DoF pose and dimensions of objects of interest. Using stereo cameras to solve this task is a cost-effective alternative to the widely used LiDAR sensor. The current state-of-the-art for stereo 3D object detection takes the existing PSMNet stereo matching network, with n...
In this work, we introduce a microscopic traffic flow model called Scalar Capacity Model (SCM) which can be used to study the formation of traffic on an airway link for autonomous Unmanned Aerial Vehicles (UAV) as well as for the ground vehicles on the road. Given the 3D nature of UAV flights, the main novelty in our model is to eliminate the commo...
Accurately estimating the orientation of pedestrians is an important and challenging task for autonomous driving because this information is essential for tracking and predicting pedestrian behavior. This paper presents a flexible Virtual Multi-View Synthesis module that can be adopted into 3D object detection methods to improve orientation estimat...
The University of Toronto is one of eight teams competing in the SAE AutoDrive Challenge -- a competition to develop a self-driving car by 2020. After placing first at the Year 1 challenge, we are headed to MCity in June 2019 for the second challenge. There, we will interact with pedestrians, cyclists, and cars. For safe operation, it is critical t...
We present MonoPSR, a monocular 3D object detection method that leverages proposals and shape reconstruction. First, using the fundamental relations of a pinhole camera model, detections from a mature 2D object detector are used to generate a 3D proposal per object in a scene. The 3D location of these proposals proves to be quite accurate, which gre...
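The pinhole relation mentioned above, which lets a mature 2D detection be lifted to a 3D proposal once a depth estimate is available, reduces to a one-line back-projection. The sketch below illustrates only that geometric relation; the intrinsics, pixel location, and depth value are placeholders, and it is not MonoPSR's code.

# Back-project a 2D detection centre to 3D with the pinhole camera model:
#   X = (u - cx) * z / fx,  Y = (v - cy) * z / fy,  Z = z
import numpy as np

def backproject(u: float, v: float, z: float, K: np.ndarray) -> np.ndarray:
    # u, v: pixel coordinates; z: estimated depth in metres; K: 3x3 intrinsic matrix.
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    return np.array([(u - cx) * z / fx, (v - cy) * z / fy, z])

# Placeholder intrinsics and a 2D box centre with a rough depth estimate.
K = np.array([[720.0, 0.0, 620.0],
              [0.0, 720.0, 180.0],
              [0.0, 0.0, 1.0]])
proposal_centroid = backproject(640.0, 200.0, z=15.0, K=K)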
One of the challenging aspects of incorporating deep neural networks into robotic systems is the lack of uncertainty measures associated with their output predictions. Recent work has identified aleatoric and epistemic as two types of uncertainty in the output of deep neural networks, and provided methods for their estimation. However, these method...
In this paper, we address the state initialization problem in recurrent neural networks (RNNs), which seeks proper values for the RNN initial states at the beginning of a prediction interval. The proposed methods employ various forms of neural networks (NNs) to generate proper initial state values for RNNs. A variety of RNNs are trained using the p...
In order to facilitate long-term localization using a visual simultaneous localization and mapping (SLAM) algorithm, careful feature selection is required such that reference points persist over long durations and the runtime and storage complexity of the algorithm remain consistent. We present SIVO (Semantically Informed Visual Odometry and Mappin...
In this paper, we present a convex-optimization-based method to solve speed planning problems over a fixed path for autonomous driving in both static and dynamic environments. Our contributions are twofold. First, we introduce a general, flexible and complete speed planning optimization which includes time efficiency, smoothness objectives and dynam...
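A standard way to make this kind of speed planning convex is to optimize the squared speed b(s) = v(s)^2 as a function of arc length along the fixed path, under which travel time and acceleration bounds become convex or affine in b. The cvxpy sketch below shows that generic reformulation with assumed limits and only a travel-time objective; the paper's full model (time efficiency, smoothness objectives, and dynamic constraints) is richer than this.

# Generic convex speed planning over a fixed, discretized path.
# Decision variable: b[i] = (speed at path sample i)^2. Travel time
# sum(ds / sqrt(b)) and the acceleration limits are convex/affine in b.
import cvxpy as cp
import numpy as np

n, ds = 100, 0.5                   # number of path samples, spacing in metres
v_max, a_max = 15.0, 2.0           # assumed speed and acceleration limits
b = cp.Variable(n)                 # squared-speed profile along the path

time_cost = ds * cp.sum(cp.power(b, -0.5))    # total travel time (convex in b)
accel = (b[1:] - b[:-1]) / (2.0 * ds)         # longitudinal acceleration (affine in b)

constraints = [
    b >= 1e-3,                 # keep speeds strictly positive
    b <= v_max ** 2,           # speed limit
    cp.abs(accel) <= a_max,    # acceleration limit
    b[0] == 1.0,               # assumed initial condition: start at 1 m/s
]
problem = cp.Problem(cp.Minimize(time_cost), constraints)
problem.solve()
speed_profile = np.sqrt(b.value)   # speed in m/s at each path sample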
Dynamic Camera Clusters (DCCs) are multi-camera systems where one or more cameras are mounted on actuated mechanisms such as a gimbal. Existing methods for DCC calibration rely on joint angle measurements to resolve the time-varying transformation between the dynamic and static camera. This information is usually provided by motor encoders, however...
Training 3D object detectors for autonomous driving has been limited to small datasets due to the effort required to generate annotations. Reducing both task complexity and the amount of task switching done by annotators is key to reducing the effort and time required to generate 3D bounding box annotations. This paper introduces a novel ground tru...
Traffic light and sign detectors on autonomous cars are integral for road scene perception. The literature is abundant with deep learning networks that detect either lights or signs, not both, which makes them unsuitable for real-life deployment due to the limited graphics processing unit (GPU) memory and power available on embedded systems. The ro...
Recurrent Neural Networks (RNNs) can encode rich dynamics which makes them suitable for modeling dynamic systems. To train an RNN for multi-step prediction of dynamic systems, it is crucial to efficiently address the state initialization problem, which seeks proper values for the RNN initial states at the beginning of each prediction interval. In t...
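One simple realization of the idea described here, a small feed-forward network producing the RNN's initial hidden state from a short warm-up window of observations rather than starting from zeros, is sketched below in PyTorch. The shapes, the GRU choice, and the warm-up length are assumptions for illustration; the paper studies several NN-based initialization schemes, not this particular toy model.

# Sketch: an MLP maps a short warm-up window of observations to the initial
# hidden state of a GRU, instead of initializing the state to zeros.
import torch
import torch.nn as nn

class InitializedGRU(nn.Module):
    def __init__(self, obs_dim=8, hidden=32, warmup=5):
        super().__init__()
        self.warmup = warmup
        self.init_net = nn.Sequential(            # produces h_0 from the warm-up window
            nn.Linear(obs_dim * warmup, 64), nn.Tanh(), nn.Linear(64, hidden))
        self.rnn = nn.GRU(obs_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, obs_dim)    # one-step-ahead output

    def forward(self, x):
        # x: (batch, time, obs_dim); the first `warmup` steps only set the initial state.
        batch = x.shape[0]
        h0 = self.init_net(x[:, :self.warmup].reshape(batch, -1)).unsqueeze(0)
        out, _ = self.rnn(x[:, self.warmup:], h0)
        return self.head(out)

model = InitializedGRU()
x = torch.randn(4, 20, 8)        # toy batch: 4 sequences, 20 time steps, 8 features
pred = model(x)                  # predictions for the steps after the warm-up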