June 2019 · 166 Reads · 740 Citations
December 2018 · 1,781 Reads · 2 Citations
Our goal is to train a policy for autonomous driving via imitation learning that is robust enough to drive a real vehicle. We find that standard behavior cloning is insufficient for handling complex driving scenarios, even when we leverage a perception system for preprocessing the input and a controller for executing the output on the car: 30 million examples are still not enough. We propose exposing the learner to synthesized data in the form of perturbations to the expert's driving, which creates interesting situations such as collisions and/or going off the road. Rather than purely imitating all data, we augment the imitation loss with additional losses that penalize undesirable events and encourage progress -- the perturbations then provide an important signal for these losses and lead to robustness of the learned model. We show that the ChauffeurNet model can handle complex situations in simulation, and present ablation experiments that emphasize the importance of each of our proposed changes and show that the model is responding to the appropriate causal factors. Finally, we demonstrate the model driving a car in the real world.
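The perturbation-plus-augmented-loss recipe can be made concrete with a short sketch. This is a minimal illustration of the idea rather than ChauffeurNet's implementation: the trajectory format, the triangular blending window, the loss weights, and all function names below are hypothetical stand-ins.

```python
import numpy as np

def perturb_trajectory(traj, max_offset=1.0, rng=None):
    """Displace the midpoint of an expert trajectory laterally, blending
    the offset back to zero at both ends so the start and end still match
    the expert's driving (hypothetical stand-in for the paper's
    perturbation scheme)."""
    rng = rng or np.random.default_rng()
    n = len(traj)
    offset = rng.uniform(-max_offset, max_offset)
    w = 1.0 - np.abs(np.linspace(-1.0, 1.0, n))   # 0 at ends, 1 at middle
    perturbed = traj.copy()
    perturbed[:, 1] += offset * w                  # lateral displacement
    return perturbed

def total_loss(pred, expert, collision_prob, offroad_prob,
               w_imit=1.0, w_coll=10.0, w_road=10.0):
    """Imitation loss augmented with penalties for undesirable events;
    the perturbed examples are what make these penalty terms fire."""
    imitation = np.mean((pred - expert) ** 2)
    return (w_imit * imitation
            + w_coll * np.mean(collision_prob)
            + w_road * np.mean(offroad_prob))

# Toy usage: a straight 10-waypoint expert path, perturbed for training.
expert = np.stack([np.linspace(0.0, 9.0, 10), np.zeros(10)], axis=1)
perturbed = perturb_trajectory(expert, rng=np.random.default_rng(0))
print(total_loss(perturbed, expert,
                 collision_prob=np.zeros(10), offroad_prob=np.zeros(10)))
```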
March 2017 · 268 Reads · 391 Citations
We describe a learning-based approach to hand-eye coordination for robotic grasping from monocular images. To learn hand-eye coordination for grasping, we trained a large convolutional neural network to predict the probability that task-space motion of the gripper will result in successful grasps, using only monocular camera images and independently of camera calibration or the current robot pose. This requires the network to observe the spatial relationship between the gripper and objects in the scene, thus learning hand-eye coordination. We then use this network to servo the gripper in real time to achieve successful grasps. To train our network, we collected over 800,000 grasp attempts over the course of two months, using between 6 and 14 robotic manipulators at any given time, with differences in camera placement and hardware. Our experimental evaluation demonstrates that our method achieves effective real-time control, can successfully grasp novel objects, and corrects mistakes by continuous servoing.
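A rough sketch of the servoing loop this abstract describes: sample candidate task-space motions, score each with the learned success predictor, and execute the most promising one. The paper optimizes candidates with the cross-entropy method over a trained CNN; here the network is replaced by a dummy placeholder score and a plain argmax over samples, so every name and shape below is an illustrative assumption.

```python
import numpy as np

def grasp_success_prob(image, motion):
    """Placeholder for the trained CNN g(image, motion) -> P(success).
    This dummy score simply favors short motions, purely so the sketch
    runs end to end."""
    return float(np.exp(-np.linalg.norm(motion)))

def servo_step(image, gripper_pose, n_candidates=64, scale=0.05, rng=None):
    """Sample candidate 3-D task-space motions, score each with the
    network, and return the pose reached by the best-scoring motion."""
    rng = rng or np.random.default_rng()
    candidates = rng.normal(0.0, scale, size=(n_candidates, 3))
    scores = [grasp_success_prob(image, m) for m in candidates]
    best = candidates[int(np.argmax(scores))]
    return gripper_pose + best

rng = np.random.default_rng(1)
pose = np.zeros(3)
for _ in range(5):                      # closed-loop, continuous servoing
    pose = servo_step(image=None, gripper_pose=pose, rng=rng)
print(pose)
```

Because the predictor is re-queried at every step, the loop can correct earlier mistakes, which is the behavior the abstract highlights.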
March 2016 · 860 Reads · 2,311 Citations
The International Journal of Robotics Research
June 2015 · 249 Reads · 83 Citations
Proceedings - IEEE International Conference on Robotics and Automation
Pedestrian detection is of crucial importance to autonomous driving applications. Methods based on deep learning have shown significant improvements in accuracy, which makes them particularly suitable for applications such as pedestrian detection, where reducing the miss rate is very important. Although they are accurate, their runtime has been at best in seconds per image, which makes them impractical for onboard applications. We present a Large-Field-Of-View (LFOV) deep network for pedestrian detection that can achieve high accuracy and is designed to make deep networks work faster for detection problems. The idea of the proposed Large-Field-of-View deep network is to learn to make classification decisions simultaneously and accurately at multiple locations. The LFOV network processes larger image areas at much faster speeds than typical deep networks have been able to, and can intrinsically reuse computations. Our pedestrian detection solution, which is a combination of an LFOV network and a standard deep network, runs at 280 ms per image on GPU and achieves a 35.85% average miss rate on the Caltech Pedestrian Detection Benchmark.
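The LFOV idea of deciding at many locations simultaneously can be sketched as a fully convolutional scorer: one pass over the whole image yields a grid of per-location pedestrian probabilities, instead of re-running a classifier crop by crop. This toy version uses a single linear template and plain Python loops for clarity; the real LFOV model is a deep network, and all names and sizes here are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lfov_scores(image, template, bias, stride=4):
    """Slide one learned template over the image and emit a grid of
    detection probabilities, making many classification decisions in a
    single pass over the full image."""
    kh, kw = template.shape
    H, W = image.shape
    out_h = (H - kh) // stride + 1
    out_w = (W - kw) // stride + 1
    grid = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i*stride:i*stride+kh, j*stride:j*stride+kw]
            grid[i, j] = sigmoid(np.sum(patch * template) + bias)
    return grid

rng = np.random.default_rng(0)
img = rng.random((64, 48))
probs = lfov_scores(img, template=rng.normal(size=(16, 16)), bias=-2.0)
print(probs.shape)   # (13, 9): one pass, 117 simultaneous decisions
```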
January 2015 · 253 Reads · 263 Citations
June 2014 · 20,725 Reads · 42,404 Citations
Journal of Machine Learning Research
Deep neural nets with a large number of parameters are very powerful machine learning systems. However, overfitting is a serious problem in such networks. Large networks are also slow to use, making it difficult to deal with overfitting by combining the predictions of many different large neural nets at test time. Dropout is a technique for addressing this problem. The key idea is to randomly drop units (along with their connections) from the neural network during training. This prevents units from co-adapting too much. During training, dropout samples from an exponential number of different "thinned" networks. At test time, it is easy to approximate the effect of averaging the predictions of all these thinned networks by simply using a single unthinned network that has smaller weights. This significantly reduces overfitting and gives major improvements over other regularization methods. We show that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.
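A minimal sketch of the scheme described in the abstract: units are dropped at random while training, and at test time the single unthinned network scales activations by the retain probability to approximate averaging the thinned networks. Modern frameworks usually implement the equivalent "inverted" variant that scales at training time instead.

```python
import numpy as np

def dropout_forward(x, p_retain=0.5, train=True, rng=None):
    """During training, keep each unit with probability p_retain (the
    dropped units' connections contribute nothing). At test time, keep
    everything but scale by p_retain, i.e. use the 'smaller weights'
    the abstract mentions."""
    if train:
        rng = rng or np.random.default_rng()
        mask = rng.random(x.shape) < p_retain
        return x * mask
    return x * p_retain

x = np.ones(8)
print(dropout_forward(x, train=True, rng=np.random.default_rng(0)))
print(dropout_forward(x, train=False))   # all units present, scaled
```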
April 2014 · 1,107 Reads · 854 Citations
I present a new way to parallelize the training of convolutional neural networks across multiple GPUs. The method scales significantly better than all alternatives when applied to modern convolutional neural networks.
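A toy serial simulation of the hybrid scheme this abstract refers to: data parallelism where the convolutional layers are cheap per parameter (split the batch across devices) and model parallelism where the fully connected layers hold most of the parameters (split the weight matrix across devices). No actual GPUs are involved here, and all names and shapes are illustrative.

```python
import numpy as np

def conv_part(x):
    """Stand-in for the convolutional tower; its batch shard runs
    independently on each device (data parallelism)."""
    return np.tanh(x)

def forward(batch, fc_weight_slices):
    n_dev = len(fc_weight_slices)
    shards = np.array_split(batch, n_dev)     # one batch shard per "GPU"
    feats = np.concatenate([conv_part(s) for s in shards])  # all-gather
    # Each device holds only a column slice of the big FC weight matrix
    # and computes its share of the output (model parallelism).
    outs = [feats @ w for w in fc_weight_slices]
    return np.concatenate(outs, axis=1)

rng = np.random.default_rng(0)
batch = rng.normal(size=(8, 16))
w_slices = [rng.normal(size=(16, 5)) for _ in range(2)]   # 2 "GPUs"
print(forward(batch, w_slices).shape)                     # (8, 10)
```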
July 2012 · 8,160 Reads · 6,537 Citations
When a large feedforward neural network is trained on a small training set, it typically performs poorly on held-out test data. This "overfitting" is greatly reduced by randomly omitting half of the feature detectors on each training case. This prevents complex co-adaptations in which a feature detector is only helpful in the context of several other specific feature detectors. Instead, each neuron learns to detect a feature that is generally helpful for producing the correct answer given the combinatorially large variety of internal contexts in which it must operate. Random "dropout" gives big improvements on many benchmark tasks and sets new records for speech and object recognition.
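The claim that a single network with scaled-down weights approximates the average over exponentially many thinned networks can be checked numerically on one sigmoid unit. This toy check is an illustration, not from the paper; the approximation is exact for a linear output and only approximate through the nonlinearity.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
w = rng.normal(size=20)    # one unit's incoming weights
x = rng.normal(size=20)    # one input vector

# Monte Carlo average over thinned networks: omit half the inputs.
samples = [sigmoid((x * (rng.random(20) < 0.5)) @ w) for _ in range(20000)]
print(np.mean(samples))          # averaged thinned-network prediction

# The single "mean network": same weights, halved.
print(sigmoid(x @ (0.5 * w)))    # close to the average above
```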
... A classic approach to tackle this problem is DAgger [23,25,35], which unrolls the policy and queries an expert to generate new demonstrations, but a queryable expert is not readily available for traffic simulation. Prior work has proposed closed-loop training using hand-crafted recovery controllers [1] or reinforcement learning [18,21,33]. However, it is inherently difficult to design rewards with high behavioral realism or recovery controllers that are robust to divergent modes. ...
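For reference, a minimal DAgger loop matching the description in this excerpt: unroll the current policy, have the expert label the states it actually visits, aggregate, and refit. The 1-D expert, fitting routine, and rollout below are toy stand-ins, and the loop presumes exactly the queryable expert the excerpt notes is unavailable for traffic simulation.

```python
import numpy as np

rng = np.random.default_rng(0)

def dagger(expert, fit, init_states, init_actions, rollout, n_iters=5):
    """DAgger: train on expert demos, roll out the learner, query the
    expert on the visited states, aggregate, and retrain."""
    S, A = list(init_states), list(init_actions)
    policy = fit(np.array(S), np.array(A))
    for _ in range(n_iters):
        visited = rollout(policy)            # states the learner reaches
        S += visited
        A += [expert(s) for s in visited]    # expert relabels them
        policy = fit(np.array(S), np.array(A))
    return policy

# Toy instantiation: scalar state, expert steers the state toward zero.
expert = lambda s: -0.5 * s
fit = lambda S, A: (lambda k: (lambda s: k * s))(float(S @ A / (S @ S)))
def rollout(pi, T=20, s=2.0):
    states = []
    for _ in range(T):
        states.append(s)
        s += pi(s) + rng.normal(0.0, 0.01)   # learner's own trajectory
    return states

print(dagger(expert, fit, [1.0, -1.0], [-0.5, 0.5], rollout)(2.0))
```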
June 2019
... LiDAR-based 3D object detection methods have gained widespread adoption in autonomous driving [1], [2] and robot navigation [3], [4], primarily because the point clouds generated by LiDAR sensors explicitly capture geometric information and are minimally influenced by weather conditions. Due to the sparsity and non-uniform distribution of point clouds, 3D object detection faces two challenges: establishing robust object representations from points with diverse distributions and designing high-performance detection networks. ...
December 2018
... Although deep neural networks (DNNs) and convolutional neural networks (CNNs) show unprecedented accuracy, surpassing human capabilities on tasks such as image classification, natural language processing, and speech recognition [27], they are compute- and resource-intensive owing to their huge number of multiply-accumulate (MAC) operations [28]. This restricts their deployment on resource-constrained IoT edge devices with conventional von Neumann computing engines. ...
January 2012
... Reinforcement learning (RL) is a promising machine learning method (Canese et al. 2021; Sutton and Barto 2018) that shows great potential in various industrial and research applications, such as autonomous vehicles (Kiran et al. 2022), manufacturing (Elsayed et al. 2022), robot control (Levine et al. 2017), and natural language processing (Ramamurthy et al. 2022). The core idea of RL is learning by trial and error: the agent continually adjusts its policy according to feedback from a stochastic environment so as to maximize the long-term reward and thereby obtain an optimal policy. ...
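The trial-and-error loop sketched in this excerpt is easiest to see in tabular Q-learning; the tiny chain environment below is an illustrative assumption, not taken from the cited works.

```python
import numpy as np

# A 5-state chain: action 0 moves left, action 1 moves right, and only
# the rightmost state pays reward. The agent improves its policy purely
# from environment feedback.
rng = np.random.default_rng(0)
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.9, 0.1

for _ in range(3000):                       # episodes of trial and error
    s = int(rng.integers(n_states))         # random start state
    for _ in range(20):
        greedy = int(np.argmax(Q[s]))
        a = int(rng.integers(n_actions)) if rng.random() < eps else greedy
        s2 = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
        r = 1.0 if s2 == n_states - 1 else 0.0
        # Feedback adjusts the policy estimate toward higher reward.
        Q[s, a] += alpha * (r + gamma * np.max(Q[s2]) - Q[s, a])
        s = s2

print(np.argmax(Q, axis=1))   # learned policy: always move right
```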
March 2017
... As precise modeling of contact or of irregularly shaped objects is infeasible, model-free methods are required to expand the workspace of robots. An example of a model-free method is reinforcement learning [1]. Although this method is completely model-free, it requires numerous attempts, which are difficult to carry out in practice. ...
March 2016
The International Journal of Robotics Research
... Han et al. (15) proposed a fusion system combining LiDAR and a color camera for detection, and improved detection accuracy by improving the YOLO algorithm. Kuang et al. (16) effectively improved pedestrian detection performance by extending the original YOLOv3 structure and newly defining the loss function. Sermanet et al. (17) proposed a convolutional sparse coding unsupervised model for pedestrian detection. Angelova et al. (18) merged the concept of fast cascades and deep networks to achieve pedestrian detection. Li et al. (19) detect pedestrians using multiple built-in subnet adaptive scales. Cai et al. (20) combined highly diverse features of varying complexity to design a complexity-aware cascade detector for pedestrians. Wang et al. (21) proposed the Repulsion loss function, which can be utilized for detecting occluded pedestrians. ...
January 2015
... The loss or error is then calculated, and the weights of the connections between neurons are adjusted in order to minimize the error, a process known as back-propagation. One pass over the entire training data is referred to as an epoch (an epoch consists of one or more batches, each of which is a part of the dataset used to train the neural network), and this is repeated iteratively until the weights have converged [38]. The inputs, and at times the inputs of the internal layers, are normalized (known as batch normalization) to stabilize the artificial neural network. ...
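A minimal training loop showing the terms this excerpt defines: batches, epochs (full passes over the shuffled data), and weight updates that reduce the error. The toy linear model and all names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                 # toy dataset
y = X @ np.array([1.0, -2.0, 0.5])            # targets of a known model
w = np.zeros(3)                               # weights to be learned
lr, batch_size = 0.1, 20

for epoch in range(50):                       # one epoch = one full pass
    order = rng.permutation(len(X))           # reshuffle every epoch
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size] # one batch of the dataset
        err = X[idx] @ w - y[idx]             # forward pass + error
        w -= lr * (X[idx].T @ err / len(idx)) # adjust weights to reduce it
print(w)   # converges toward [1, -2, 0.5]
```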
June 2014
Journal of Machine Learning Research
... Autonomous vehicles (AVs) face numerous technical and societal hurdles that must be addressed to ensure their successful integration into everyday traffic and acceptance by the public. An overview of the challenges follows [11], [12], [13]. Technical challenges: AVs must reliably detect obstacles at high speeds and over long distances, a crucial aspect of ensuring safety. The complexity of the software and the need for robust real-time data analytics pose significant challenges to ensuring reliable operation under diverse traffic conditions and environmental factors. ...
June 2015
Proceedings - IEEE International Conference on Robotics and Automation
... The advent of big data and big computing has enabled these networks to become deeper, and they are capable of learning and representing a wide range of nonlinear functions [29]. Deep learning has been a powerful tool for automating the extraction of meaningful information from large datasets and has resulted in remarkable progress in a number of areas, such as computer vision [30,31] and speech recognition [32,33]. We will overview the benefits and drawbacks of deep learning, then discuss the constituent parts of a deep NN, and finally review some of the networks used for deep materials informatics. ...
January 2012
Advances in Neural Information Processing Systems
... The Split CIFAR-100 [80] divides the CIFAR-100 dataset [100] into Q = 10 subsets (or tasks) {D_0, ..., D_9}. ...
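The split protocol is mechanical enough to sketch: partition the 100 class labels into 10 disjoint groups and carve the dataset accordingly. The labels below are simulated stand-ins; a real run would load CIFAR-100 itself.

```python
import numpy as np

rng = np.random.default_rng(0)
labels = rng.integers(0, 100, size=50_000)   # stand-in for train labels

Q = 10
tasks = np.array_split(np.arange(100), Q)    # classes 0-9, 10-19, ...
task_indices = [np.flatnonzero(np.isin(labels, t)) for t in tasks]
print([len(ix) for ix in task_indices])      # roughly 5,000 per task
```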
May 2012