Alex Krizhevsky's research while affiliated with Google Inc. and other places

Publications (15)

Preprint
Full-text available
Our goal is to train a policy for autonomous driving via imitation learning that is robust enough to drive a real vehicle. We find that standard behavior cloning is insufficient for handling complex driving scenarios, even when we leverage a perception system for preprocessing the input and a controller for executing the output on the car: 30 milli...
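The standard behavior cloning the abstract calls insufficient is plain supervised regression from observations to expert actions. A minimal sketch under assumed shapes; the network, dimensions, and loss here are illustrative, not the paper's actual model:

```python
import torch
import torch.nn as nn

# Hypothetical feature/action sizes; the paper's real inputs come from a
# perception system, not raw tensors like these.
OBS_DIM, ACT_DIM = 64, 2  # e.g. pooled road features -> (steering, speed)

policy = nn.Sequential(
    nn.Linear(OBS_DIM, 128), nn.ReLU(),
    nn.Linear(128, ACT_DIM),
)
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

def bc_step(obs, expert_act):
    """One behavior-cloning step: regress the policy toward expert actions."""
    loss = nn.functional.mse_loss(policy(obs), expert_act)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Dummy batch standing in for logged expert driving data.
loss = bc_step(torch.randn(32, OBS_DIM), torch.randn(32, ACT_DIM))
```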
Conference Paper
We describe a learning-based approach to hand-eye coordination for robotic grasping from monocular images. To learn hand-eye coordination for grasping, we trained a large convolutional neural network to predict the probability that task-space motion of the gripper will result in successful grasps, using only monocular camera images and independentl...
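The predictor described maps a monocular image plus a candidate gripper motion to a grasp-success probability. A minimal sketch under assumed shapes; the paper's actual network is far deeper and trained on large-scale robot data:

```python
import torch
import torch.nn as nn

class GraspSuccessNet(nn.Module):
    """Predict P(successful grasp | image, candidate motion).
    Layer sizes are illustrative, not the paper's architecture."""
    def __init__(self, motion_dim=7):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Sequential(
            nn.Linear(32 + motion_dim, 64), nn.ReLU(),
            nn.Linear(64, 1),  # logit of grasp success
        )

    def forward(self, image, motion):
        feat = self.conv(image)
        return torch.sigmoid(self.head(torch.cat([feat, motion], dim=1)))

net = GraspSuccessNet()
p = net(torch.randn(4, 3, 64, 64), torch.randn(4, 7))  # 4 motion candidates
```

Scoring many candidate motions this way lets a controller pick the motion with the highest predicted success, which is the hand-eye coordination loop the abstract describes.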
Article
We describe a learning-based approach to hand-eye coordination for robotic grasping from monocular images. To learn hand-eye coordination for grasping, we trained a large convolutional neural network to predict the probability that task-space motion of the gripper will result in successful grasps, using only monocular camera images and independentl...
Article
Pedestrian detection is of crucial importance to autonomous driving applications. Methods based on deep learning have shown significant improvements in accuracy, which makes them particularly suitable for applications, such as pedestrian detection, where reducing the miss rate is very important. Although they are accurate, their runtime has been at...
Article
Deep neural nets with a large number of parameters are very powerful machine learning systems. However, overfitting is a serious problem in such networks. Large networks are also slow to use, making it difficult to deal with overfitting by combining the predictions of many different large neural nets at test time. Dropout is a technique for address...
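Dropout as described: randomly zero units during training, with rescaling so expected activations match at test time. A minimal inverted-dropout sketch in NumPy (the rate and shapes are illustrative):

```python
import numpy as np

def dropout(x, p=0.5, training=True, rng=np.random.default_rng()):
    """Inverted dropout: zero each unit with probability p during training,
    scale survivors by 1/(1-p) so no rescaling is needed at test time."""
    if not training or p == 0.0:
        return x
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)

h = np.ones((2, 4))
print(dropout(h, p=0.5))          # roughly half zeros, survivors doubled
print(dropout(h, training=False)) # identity at test time
```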
Article
I present a new way to parallelize the training of convolutional neural networks across multiple GPUs. The method scales significantly better than all alternatives when applied to modern convolutional neural networks.
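The abstract does not spell the scheme out here; the paper is known for combining data parallelism in the convolutional layers with model parallelism in the fully connected layers. A minimal sketch of plain data-parallel gradient averaging, the baseline such schemes build on:

```python
import torch

def average_gradients(replicas):
    """Data parallelism in miniature: average gradients across model
    replicas that each processed a different shard of the batch."""
    for params in zip(*(m.parameters() for m in replicas)):
        mean_grad = torch.stack([p.grad for p in params]).mean(dim=0)
        for p in params:
            p.grad = mean_grad.clone()

# Two CPU "replicas" standing in for GPU copies of the same model.
replicas = [torch.nn.Linear(8, 2) for _ in range(2)]
replicas[1].load_state_dict(replicas[0].state_dict())  # identical weights
for m, x in zip(replicas, torch.randn(2, 4, 8)):
    m(x).sum().backward()
average_gradients(replicas)
```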
Article
Full-text available
When a large feedforward neural network is trained on a small training set, it typically performs poorly on held-out test data. This "overfitting" is greatly reduced by randomly omitting half of the feature detectors on each training case. This prevents complex co-adaptations in which a feature detector is only helpful in the context of several oth...
Article
April 8, 2009. Groups at MIT and NYU have collected a dataset of millions of tiny colour images from the web. It is, in principle, an excellent dataset for unsupervised training of deep generative models, but previous researchers who have tried this have found it difficult to learn a good set of filters from the images. We show how to train a multi-layer...
Article
We describe how to train a two-layer convolutional Deep Belief Network (DBN) on the 1.6 million tiny images dataset. When training a convolutional DBN, one must decide what to do with the edge pixels of the images. As the pixels near the edge of an image contribute to the fewest convolutional filter outputs, the model may...
Conference Paper
We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0% which is considerably better than the previous state-of-the-art. The neural network, which has 60...
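The abstract gives the key numbers (60M parameters, 1000 classes, 17.0% top-5 error). A compact AlexNet-style sketch; layer sizes follow the published architecture, but this is a simplified single-device variant (the original split channels across two GPUs):

```python
import torch
import torch.nn as nn

class AlexNetSketch(nn.Module):
    """Simplified AlexNet: five conv layers, three fully connected,
    ReLU throughout, dropout before the classifier."""
    def __init__(self, num_classes=1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, 11, stride=4), nn.ReLU(), nn.MaxPool2d(3, 2),
            nn.Conv2d(96, 256, 5, padding=2), nn.ReLU(), nn.MaxPool2d(3, 2),
            nn.Conv2d(256, 384, 3, padding=1), nn.ReLU(),
            nn.Conv2d(384, 384, 3, padding=1), nn.ReLU(),
            nn.Conv2d(384, 256, 3, padding=1), nn.ReLU(), nn.MaxPool2d(3, 2),
        )
        self.classifier = nn.Sequential(
            nn.Dropout(0.5), nn.Linear(256 * 6 * 6, 4096), nn.ReLU(),
            nn.Dropout(0.5), nn.Linear(4096, 4096), nn.ReLU(),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        return self.classifier(torch.flatten(self.features(x), 1))

logits = AlexNetSketch()(torch.randn(1, 3, 227, 227))  # -> (1, 1000)
```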
Article
We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0% which is considerably better than the previous state-of-the-art. The neural network, which has 60 m...
Conference Paper
Full-text available
The artificial neural networks that are used to recognize shapes typically use one or more layers of learned feature detectors that produce scalar outputs. By contrast, the computer vision community uses complicated, hand-engineered features, like SIFT [6], that produce a whole vector of outputs including an explicit representation of the pose of t...
Conference Paper
We show how to learn many layers of features on color images and we use these features to initialize deep autoencoders. We then use the autoencoders to map images to short binary codes. Using semantic hashing [6], 28-bit codes can be used to retrieve images that are similar to a query image in a time that is independent of the size of the database....
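Query time independent of database size comes from treating the 28-bit code as a memory address and probing nearby codes. A minimal sketch of that lookup, assuming the binary codes have already been produced by the autoencoder:

```python
from collections import defaultdict
from itertools import combinations

# Hash table: 28-bit binary code -> list of image ids (codes assumed given).
index = defaultdict(list)

def add(code: int, image_id: int):
    index[code].append(image_id)

def query(code: int, radius: int = 1, bits: int = 28):
    """Return ids whose codes are within `radius` Hamming distance: probe
    the code itself plus every code reachable by flipping <= radius bits."""
    hits = list(index[code])
    for r in range(1, radius + 1):
        for flips in combinations(range(bits), r):
            probe = code
            for b in flips:
                probe ^= 1 << b
            hits.extend(index[probe])
    return hits

add(0b1011, 42)
print(query(0b1010, radius=1))  # [42]: one bit away from the stored code
```

The number of probes depends only on the code length and radius, never on how many images are stored, which is the source of the constant-time retrieval claim.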
Article
Deep belief nets have been successful in modeling handwritten characters, but it has proved more difficult to apply them to real images. The problem lies in the restricted Boltzmann machine (RBM) which is used as a module for learning deep belief nets one layer at a time. The Gaussian-Binary RBMs that have been used to model real-valued data are no...
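The module in question models real-valued visible units with binary hidden units. A minimal contrastive-divergence (CD-1) sketch for a Gaussian-binary RBM with unit-variance visibles; sizes, learning rate, and the omission of biases are simplifications for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n_vis, n_hid, lr = 16, 8, 0.01
W = rng.normal(0, 0.01, (n_vis, n_hid))  # weights; biases omitted for brevity

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0):
    """One CD-1 update: positive phase from data, negative phase from a
    single reconstruction."""
    h0_prob = sigmoid(v0 @ W)                       # P(h=1 | v)
    h0 = (rng.random(h0_prob.shape) < h0_prob) * 1.0
    v1 = h0 @ W.T                                   # mean of Gaussian P(v | h)
    h1_prob = sigmoid(v1 @ W)
    return lr * (v0.T @ h0_prob - v1.T @ h1_prob) / len(v0)

batch = rng.normal(size=(32, n_vis))  # stand-in for whitened image patches
W += cd1_step(batch)
```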

Citations

... CNNs usually contain a combination of convolutional (filtering) layers, pooling layers, and non-linear activations, which can learn a hierarchy of different feature representations, from broad in the initial layers to more task-specific in the later layers (LeCun et al. 2015). Indeed, it is standard practice in image processing to use the activations of different layers of so-called imageNets, CNNs trained on over a million images (Krizhevsky et al. 2012; Simonyan and Zisserman 2014), to perform feature extraction in other visual classification tasks. A somewhat recent fascinating result from computational speech analysis shows that imageNets can also produce meaningful speech and audio feature representations for tasks such as emotion recognition and bipolar mood state recognition (Cummins et al. 2017a; Ringeval et al. 2018), and audio features for tasks such as irregular heart sound detection (Ren et al. 2018). ...
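The "standard practice" this excerpt describes, reusing a pretrained ImageNet CNN's activations as generic features, looks roughly like this with torchvision (the model choice and layer cut are illustrative; weights download on first use):

```python
import torch
from torchvision import models

# Pretrained AlexNet as a fixed feature extractor: keep the conv stack,
# drop the 1000-way ImageNet classifier.
backbone = models.alexnet(weights=models.AlexNet_Weights.DEFAULT)
backbone.classifier = torch.nn.Identity()
backbone.eval()

with torch.no_grad():
    # One dummy 224x224 RGB image; real use needs the matching normalization.
    feats = backbone(torch.randn(1, 3, 224, 224))
print(feats.shape)  # torch.Size([1, 9216]) -> input to a downstream classifier
```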
... In recent years, there have been significant advances in robot manipulation: from grasping to pushing and pick/place tasks [2,36,23]; from manipulating a Rubik's Cube [1] to opening cabinet doors or makeshift doors [60,49]. While there has been substantial progress, most experiments in this area have still been restricted to simulation [37,4,71] or table-top experiments in the lab [31,11]. We ask a basic question: why hasn't this progress transferred to manipulation in real-world setups, and why do we still see most experiments in lab setups or simulations? ...
... Berscheid et al. [112] utilised model-based learning, taking RGBD images as the observation for a pick-and-move task. Levine et al. [113] studied learning-based real-time robotic grasping at a vast scale. Their case studies were conducted on two robotic lines, one with 14 robots and the other with eight. ...
... Convolutional neural networks (CNNs), which exhibit significant discriminative power in the color image domain [4], have been successfully applied to people detection. Angelova et al. [5] propose a cascade framework consisting of several CNNs for pedestrian detection, where proposals are obtained by a dense sliding window. However, each CNN in each cascade stage is applied repeatedly to each proposal, without sharing convolutional computations across proposals. ...
... To address the problem of data sparsity in deep-learning-based recommendation systems, the batch Adagrad algorithm is adopted as the optimizer rather than the naive SGD algorithm, because Adagrad adapts its learning rate during training and therefore converges faster. At the same time, dropout is used in the pooling operations to deal with over-fitting of the models [30]. ...
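Adagrad's per-parameter adaptive step, which the excerpt credits for the faster convergence over plain SGD, is simple to state. A minimal sketch (the learning rate and epsilon are conventional defaults, not values from the cited work):

```python
import numpy as np

def adagrad_update(param, grad, cache, lr=0.01, eps=1e-8):
    """Adagrad: accumulate squared gradients per parameter and shrink the
    effective step for frequently updated coordinates."""
    cache += grad ** 2
    param -= lr * grad / (np.sqrt(cache) + eps)
    return param, cache

w = np.zeros(3)
cache = np.zeros(3)
for g in [np.array([1.0, 0.1, 0.0])] * 3:   # sparse-ish toy gradients
    w, cache = adagrad_update(w, g, cache)
print(w)  # the large-gradient coordinate takes progressively smaller steps
```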
... Perception:
- Vehicle Detection [34-45]
- Traffic Sign and Light Recognition [46-59]
- Pedestrian Detection [60-78]
- Lane Detection and Tracking [44, ...]
- Traffic Scene Analysis [55, 102-120]
Decision Making:
- End-to-End Controlling and Prediction [144-163]
- Path and Motion Planning [164-175]
- AR-HUD [176-186]
To visualize the leading algorithms of each domain or subdomain, Figure 4 presents the distribution of algorithms, where the reviewed algorithm-centered approaches play a predominant role in AVS development. Figure 5 shows the clustering of the datasets used by the reviewed approaches. ...
... A DCNN takes advantage of connecting more layers than a traditional neural network, which makes it suitable for complicated tasks. Many pre-trained networks have been introduced to deal with different datasets, such as AlexNet [21], VGG [22], ResNet [23], GoogLeNet [24], and DenseNet [25]. As a result, transfer learning approaches, which transfer previously learned features to new tasks, are widely employed in many applications, with impressive outcomes in terms of training time and accuracy. ...
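Transfer learning as described usually means keeping a pretrained backbone frozen and retraining a small head for the new task. A minimal sketch reusing torchvision's pretrained AlexNet with a replaced classifier head; the 10-class task is an illustrative assumption:

```python
import torch
from torchvision import models

# Start from ImageNet weights, replace the final layer for a 10-class task,
# and freeze everything else so only the new head trains.
model = models.alexnet(weights=models.AlexNet_Weights.DEFAULT)
for p in model.parameters():
    p.requires_grad = False
model.classifier[6] = torch.nn.Linear(4096, 10)  # new, trainable head

opt = torch.optim.Adam(model.classifier[6].parameters(), lr=1e-4)
loss = torch.nn.functional.cross_entropy(
    model(torch.randn(4, 3, 224, 224)), torch.randint(0, 10, (4,)))
loss.backward()
opt.step()
```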
... In our empirical evaluation, we conduct extensive experiments on the CIFAR10 [26] and MNIST [29] datasets in different FL settings. In these experiments, we estimate the privacy leakage using a mutual information neural estimator [6] and evaluate the dependency of the leakage on different FL system parameters: number of users, local batch size and model size. ...
... Generative learning techniques are typically used to characterize the high-order correlation properties of features for pattern analysis or synthesis, as well as the joint statistical distributions of the visible data and their associated classes [29]. Commonly used deep neural network techniques for generative learning are the Autoencoder (AE) [55], the Deep Belief Network (DBN) [56], and the Self-Organizing Map (SOM) [57]. ...
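Of the modules listed, the autoencoder is the simplest to sketch: compress the input to a low-dimensional code, decode it back, and train on reconstruction error. Sizes below are illustrative:

```python
import torch
import torch.nn as nn

class TinyAutoencoder(nn.Module):
    """Encode inputs to a small code, decode back, train on reconstruction."""
    def __init__(self, dim=64, code=8):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(dim, 32), nn.ReLU(), nn.Linear(32, code))
        self.dec = nn.Sequential(nn.Linear(code, 32), nn.ReLU(), nn.Linear(32, dim))

    def forward(self, x):
        return self.dec(self.enc(x))

ae = TinyAutoencoder()
x = torch.randn(16, 64)
loss = nn.functional.mse_loss(ae(x), x)  # reconstruction objective
```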
... DenseNet-121 (Huang et al., 2018), ResNets (He et al., 2016), AlexNet (Krizhevsky, 2014), and VGGs (Simonyan, 2015). Performance of all models was compared on the test set and is presented in Table 2 in terms of the number of parameters and the average CPU and GPU inference times. ...