Alex Krizhevsky’s research while affiliated with Google Inc. and other places


Publications (17)


ChauffeurNet: Learning to Drive by Imitating the Best and Synthesizing the Worst
  • Conference Paper

June 2019 · 167 Reads · 751 Citations

Mayank Bansal · Alex Krizhevsky · Abhijit Ogale

Figure 2: Training the driving model. (a) The core ChauffeurNet model with a FeatureNet and an AgentRNN, (b) Co-trained road mask prediction net and PerceptionRNN, and (c) Training losses are shown in blue, and the green labels depict the ground-truth data. The dashed arrows represent the recurrent feedback of predictions from one iteration to the next.
Figure 3: (a) Schematic of ChauffeurNet. (b) Memory updates over multiple iterations.
Figure 4: Software architecture for the end-to-end driving pipeline.
Figure 5: Trajectory Perturbation. (a) An original logged training example where the agent is driving along the center of the lane. (b) The perturbed example created by perturbing the current agent location (red point) in the original example away from the lane center and then fitting a new smooth trajectory that brings the agent back to the original target location along the lane center.
Figure 6: Visualization of predictions and loss functions on an example input. The top row is at the input resolution, while the bottom row shows a zoomed-in view around the current agent location.


ChauffeurNet: Learning to Drive by Imitating the Best and Synthesizing the Worst
  • Preprint
  • File available

December 2018 · 1,799 Reads · 4 Citations

Our goal is to train a policy for autonomous driving via imitation learning that is robust enough to drive a real vehicle. We find that standard behavior cloning is insufficient for handling complex driving scenarios, even when we leverage a perception system for preprocessing the input and a controller for executing the output on the car: 30 million examples are still not enough. We propose exposing the learner to synthesized data in the form of perturbations to the expert's driving, which creates interesting situations such as collisions and/or going off the road. Rather than purely imitating all data, we augment the imitation loss with additional losses that penalize undesirable events and encourage progress -- the perturbations then provide an important signal for these losses and lead to robustness of the learned model. We show that the ChauffeurNet model can handle complex situations in simulation, and present ablation experiments that emphasize the importance of each of our proposed changes and show that the model is responding to the appropriate causal factors. Finally, we demonstrate the model driving a car in the real world.


Learning Hand-Eye Coordination for Robotic Grasping with Large-Scale Data Collection

March 2017 · 269 Reads · 393 Citations

We describe a learning-based approach to hand-eye coordination for robotic grasping from monocular images. To learn hand-eye coordination for grasping, we trained a large convolutional neural network to predict the probability that task-space motion of the gripper will result in successful grasps, using only monocular camera images and independently of camera calibration or the current robot pose. This requires the network to observe the spatial relationship between the gripper and objects in the scene, thus learning hand-eye coordination. We then use this network to servo the gripper in real time to achieve successful grasps. To train our network, we collected over 800,000 grasp attempts over the course of two months, using between 6 and 14 robotic manipulators at any given time, with differences in camera placement and hardware. Our experimental evaluation demonstrates that our method achieves effective real-time control, can successfully grasp novel objects, and corrects mistakes by continuous servoing.
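
The servoing idea in the abstract above (score candidate gripper motions with a success predictor, then move along the best one and repeat) can be sketched roughly as follows. This is a toy illustration, not the paper's controller: `predict_success` is a hypothetical stand-in for the trained network, the `object_offset` field replaces a real camera image, and uniform random sampling of candidates is an assumed search strategy.

```python
import math
import random

def predict_success(image, motion):
    """Hypothetical stand-in for the trained CNN.

    Returns a score in (0, 1] that is higher when the candidate motion
    brings the gripper closer to the object. A real system would score
    the motion from camera pixels instead of a known offset.
    """
    target = image["object_offset"]
    dist2 = sum((m - t) ** 2 for m, t in zip(motion, target))
    return 1.0 / (1.0 + dist2)

def choose_motion(image, rng, num_candidates=64, max_step=1.0):
    """Sample candidate task-space motions and keep the best-scoring one."""
    candidates = [
        tuple(rng.uniform(-max_step, max_step) for _ in range(3))
        for _ in range(num_candidates)
    ]
    return max(candidates, key=lambda m: predict_success(image, m))

def servo(image, rng, steps=10):
    """Closed loop: re-observe, re-score, and move at every step."""
    for _ in range(steps):
        motion = choose_motion(image, rng)
        # Toy world model: moving the gripper shrinks the remaining offset.
        remaining = tuple(t - m for t, m in zip(image["object_offset"], motion))
        image = {"object_offset": remaining}
    return image

def norm(vec):
    return math.sqrt(sum(v * v for v in vec))

start = {"object_offset": (2.0, -1.5, 1.0)}
end = servo(start, random.Random(0))
```

Because the motion is re-selected from a fresh observation at each step, an imperfect step is corrected by later ones, which is the property the abstract attributes to continuous servoing.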


Learning Hand-Eye Coordination for Robotic Grasping with Deep Learning and Large-Scale Data Collection

March 2016 · 2 Reads · 1 Citation



Learning Hand-Eye Coordination for Robotic Grasping with Deep Learning and Large-Scale Data Collection

March 2016 · 866 Reads · 2,341 Citations

The International Journal of Robotics Research



Pedestrian detection with a Large-Field-Of-View deep network

June 2015 · 250 Reads · 83 Citations

Proceedings - IEEE International Conference on Robotics and Automation

Pedestrian detection is of crucial importance to autonomous driving applications. Methods based on deep learning have shown significant improvements in accuracy, which makes them particularly suitable for applications, such as pedestrian detection, where reducing the miss rate is very important. Although they are accurate, their runtime has been at best in seconds per image, which makes them impractical for onboard applications. We present a Large-Field-Of-View (LFOV) deep network for pedestrian detection that can achieve high accuracy and is designed to make deep networks work faster for detection problems. The idea of the proposed Large-Field-of-View deep network is to learn to make classification decisions simultaneously and accurately at multiple locations. The LFOV network processes larger image areas at much faster speeds than typical deep networks have been able to, and can intrinsically reuse computations. Our pedestrian detection solution, which is a combination of an LFOV network and a standard deep network, works at 280 ms per image on GPU and achieves 35.85 average miss rate on the Caltech Pedestrian Detection Benchmark.



Dropout: A Simple Way to Prevent Neural Networks from Overfitting

June 2014 · 20,861 Reads · 43,067 Citations

Journal of Machine Learning Research

Deep neural nets with a large number of parameters are very powerful machine learning systems. However, overfitting is a serious problem in such networks. Large networks are also slow to use, making it difficult to deal with overfitting by combining the predictions of many different large neural nets at test time. Dropout is a technique for addressing this problem. The key idea is to randomly drop units (along with their connections) from the neural network during training. This prevents units from co-adapting too much. During training, dropout samples from an exponential number of different "thinned" networks. At test time, it is easy to approximate the effect of averaging the predictions of all these thinned networks by simply using a single unthinned network that has smaller weights. This significantly reduces overfitting and gives major improvements over other regularization methods. We show that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets. © 2014 Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever and Ruslan Salakhutdinov.
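
The mechanism described in the abstract above can be illustrated with a minimal sketch. This is not the authors' code: the function names `dropout_train` and `dropout_test`, the retention probability `p_keep = 0.5`, and the toy activations are assumptions made for illustration. The sketch scales activations at test time rather than outgoing weights, which gives the same result for a following linear layer.

```python
import random

def dropout_train(activations, p_keep, rng):
    """Training-time pass: keep each unit with probability p_keep, else zero it.

    Each call samples one "thinned" network.
    """
    return [a if rng.random() < p_keep else 0.0 for a in activations]

def dropout_test(activations, p_keep):
    """Test-time pass: keep every unit but scale by p_keep.

    This matches the expected value of the training-time output, so one
    unthinned pass approximates averaging over all thinned networks.
    """
    return [a * p_keep for a in activations]

rng = random.Random(0)
hidden = [1.0, 2.0, 3.0, 4.0]  # toy hidden-layer activations (assumed)
p_keep = 0.5

# Monte Carlo average over many sampled thinned networks.
num_samples = 20000
sums = [0.0] * len(hidden)
for _ in range(num_samples):
    for i, value in enumerate(dropout_train(hidden, p_keep, rng)):
        sums[i] += value
mc_average = [s / num_samples for s in sums]

# Single deterministic pass with scaled activations.
test_output = dropout_test(hidden, p_keep)
```

Comparing `mc_average` with `test_output` shows the two agree closely, which is why a single scaled pass is a cheap substitute for explicitly averaging many thinned networks.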



Fig. 2: The frame classification error rate on the core test set of the TIMIT benchmark. Comparison of standard and dropout finetuning for different network architectures. Dropout of 50% of the hidden units and 20% of the input units improves classification.
Fig. 4: Some Imagenet test cases with the probabilities of the best 5 labels underneath. Many of the top 5 labels are quite plausible. 
Fig. 6: Frame classification error and cross-entropy on the (a) Training and (b) Validation set as learning progresses. The training error is computed using the stochastic nets.
Fig. 7: Classification error rate on the (a) training and (b) validation sets of the Reuters dataset as learning progresses. The training error is computed using the stochastic nets.
Improving neural networks by preventing co-adaptation of feature detectors

July 2012 · 8,235 Reads · 6,599 Citations

When a large feedforward neural network is trained on a small training set, it typically performs poorly on held-out test data. This "overfitting" is greatly reduced by randomly omitting half of the feature detectors on each training case. This prevents complex co-adaptations in which a feature detector is only helpful in the context of several other specific feature detectors. Instead, each neuron learns to detect a feature that is generally helpful for producing the correct answer given the combinatorially large variety of internal contexts in which it must operate. Random "dropout" gives big improvements on many benchmark tasks and sets new records for speech and object recognition.


Citations (17)


... For a neuron i in a hidden layer l, the error gradient with respect to the weight connecting it to neuron j in the previous layer l−1 is computed as $\partial E / \partial w_{ij}^{(l)} = \delta_i^{(l)} a_j^{(l-1)}$ (14), where $\delta_i^{(l)}$ is the error term for neuron i in layer l and $a_j^{(l-1)}$ is the activation from the previous layer. ...

Reference:

Self-Driving Car Navigation with Single-Beam LiDAR and Neural Networks Using JavaScript
Learning Hand-Eye Coordination for Robotic Grasping with Deep Learning and Large-Scale Data Collection
  • Citing Preprint
  • March 2016

... Traditionally, AD architectures have adopted a modular approach, with specialized components handling distinct aspects such as perception [5][6][7][8][9], mapping [6,10], prediction [11,12], and planning [13]. However, while this compartmentalization aids in debugging and optimizing individual modules, it often leads to scalability issues due to inter-module communication errors and rigid, predefined interfaces that struggle to adapt to new or unforeseen conditions [14,15,11,16]. ...

ChauffeurNet: Learning to Drive by Imitating the Best and Synthesizing the Worst
  • Citing Conference Paper
  • June 2019

... Famously demonstrated in dynamic or unpredictable scenarios such as AlphaGo and Atari games (Silver et al. 2016;Mnih et al. 2013), RL has attracted the interest of researchers looking to tackle real-world problems (Castelletti et al. 2013;Mahmud et al. 2018;Song et al. 2021;Degrave et al. 2022). Among them are studies applying RL techniques to control tasks such as contact-rich manipulation (Thomas et al. 2018;Luo et al. 2019;Wu et al. 2022;Elguea-Aguinaco et al. 2023), humanoid bipedal walking (García and Shafie 2020; Rodriguez and Behnke 2021;Li et al. 2021), and autonomous vehicle driving (Bansal et al. 2018;Yuan et al. 2019;Kiran et al. 2021). Previous research into RL in construction has for the most part focused on RL's potential to enable robots to conduct core construction activities. ...

ChauffeurNet: Learning to Drive by Imitating the Best and Synthesizing the Worst

... Deep learning, based on convolutional neural network (CNN), is a developed method in computer vision that can be applied for optimizing geospatial data. Since AlexNet [9] achieved satisfying results in the ImageNet large scale visual recognition challenge (ILSVRC) in 2012, CNN has gained popularity. Then, research on deep learning has significantly advanced, including in earth observation [10]. ...

Imagenet classification with deep convolutional neural networks
  • Citing Conference Paper
  • January 2012

... In recent years, with the rapid rise of deep learning and computer vision technologies, researchers have begun to introduce these advanced techniques into the field of robotic arm grasping. Deep learning-based grasping methods [3][4][5][6][7][8][9][10][11][12][13][14][15][16][17] have quickly become mainstream in robotic grasping due to their advantages, such as no need for manual feature matching, strong feature extraction capabilities, and good robustness (see Section 2 for details). Nevertheless, existing grasp detection algorithms [3][4][5][6][7][8][9][10][11][12][13][14]16] struggle to achieve a perfect balance between accuracy and speed. ...

Learning Hand-Eye Coordination for Robotic Grasping with Large-Scale Data Collection
  • Citing Conference Paper
  • March 2017

... To increase the chances of success of random movements, most of the approaches constrain the operational space to top-down movements (Yang et al. (2023)) or are limited to parallel grippers (Fang et al. (2020)). As the self-supervised acquisition of data is very expensive (Levine et al. (2018)), most of the recent works on grasping rely on human-provided demonstrations (Wang et al. (2021), Mosbach and Behnke (2023)). But manually collecting the demonstrations is time expensive. ...

Learning Hand-Eye Coordination for Robotic Grasping with Deep Learning and Large-Scale Data Collection
  • Citing Article
  • March 2016

The International Journal of Robotics Research

... Han et al. (15) proposed a fusion system combining LIDAR and a color camera for detection, and improved the detection accuracy by improving the YOLO algorithm. Kuang et al. (16) effectively improved the performance of pedestrian detection by extending the original YOLOv3 structure and newly defining the loss function. Sermanet et al. (17) proposed a convolutional sparse coding unsupervised model for pedestrian detection. Angelova et al. (18) merged the concept of fast cascades and depth networks to achieve pedestrian detection. Li et al. (19) detect pedestrians using multiple built-in subnet adaptive scales. Cai et al. (20) combined highly diverse features of complexity in order to design a complex perceptual cascade detector that could be used for detecting pedestrians. Wang et al. (21) suggested a definition for the Repulsion loss function, which could be utilized for detecting pedestrians who are occluded. ...

Real-Time Pedestrian Detection with Deep Network Cascades
  • Citing Conference Paper
  • January 2015

... The loss or errors are then calculated, and the weights of the connections between neurons are adjusted in order to minimize the error, a process known as back-propagation. One pass of the entire training data is referred to as an epoch (which consists of one or more batches and a part of the dataset is used to train the neural network) and it is repeated iteratively until the weights have converged [38]. Normalization of the inputs and at times the inputs of the internal layers are carried out (known as batch normalization) to stabilize the artificial neural networks. ...

Dropout: A Simple Way to Prevent Neural Networks from Overfitting
  • Citing Article
  • June 2014

Journal of Machine Learning Research

... Autonomous vehicles (AVs) face numerous technical and societal hurdles that must be addressed to ensure their successful integration into everyday traffic and acceptance by the public. Here's an overview of the challenges: [11], [12], [13] Technical Challenges: AVs must reliably detect obstacles at high speeds and over long distances, a crucial aspect for ensuring safety. The complexity of the software and the need for robust real-time data analytics pose significant challenges to ensuring reliable operations under diverse traffic conditions and environmental factors. ...

Pedestrian detection with a Large-Field-Of-View deep network
  • Citing Article
  • June 2015

Proceedings - IEEE International Conference on Robotics and Automation

... The advent of big data and big computing has enabled these networks to become deeper, and they are capable of learning and representing a wide selection of nonlinear functions [29]. Deep learning has been a powerful tool for automating the extraction of meaningful data from large datasets and has resulted in remarkable progress in a number of areas, such as computer vision [30,31] and speech recognition [32,33]. We will overview the deep learning's benefits and drawbacks, then discuss the constituent part of a deep NN, and at last, review some of the networks which are used for deep materials informatics. ...

ImageNet Classification with Deep Convolutional Neural Networks
  • Citing Article
  • January 2012

Advances in Neural Information Processing Systems