Xiangyu Zhang’s research while affiliated with Microsoft and other places

Publications (11)


Identity Mappings in Deep Residual Networks
  • Conference Paper

October 2016 · 2,973 Reads · 8,995 Citations

Lecture Notes in Computer Science

Kaiming He · Xiangyu Zhang · Shaoqing Ren · Jian Sun

Deep residual networks have emerged as a family of extremely deep architectures showing compelling accuracy and nice convergence behaviors. In this paper, we analyze the propagation formulations behind the residual building blocks, which suggest that the forward and backward signals can be directly propagated from one block to any other block, when using identity mappings as the skip connections and after-addition activation. A series of ablation experiments support the importance of these identity mappings. This motivates us to propose a new residual unit, which makes training easier and improves generalization. We report improved results using a 1001-layer ResNet on CIFAR-10 (4.62% error) and CIFAR-100, and a 200-layer ResNet on ImageNet. Code is available at: https://github.com/KaimingHe/resnet-1k-layers.
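
The pre-activation unit proposed above is easy to state in code. The sketch below is illustrative only (the class name and fixed channel count are assumptions, not the authors' released implementation, which lives at the URL above); it places BN and ReLU before each convolution and keeps the identity skip free of any after-addition activation.

```python
import torch
import torch.nn as nn

class PreActBlock(nn.Module):
    """Illustrative pre-activation residual unit: BN -> ReLU -> conv, twice,
    with the identity shortcut added after the second convolution."""
    def __init__(self, channels):
        super().__init__()
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)

    def forward(self, x):
        out = self.conv1(torch.relu(self.bn1(x)))
        out = self.conv2(torch.relu(self.bn2(out)))
        return out + x  # identity mapping: no activation after the addition
```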



Identity Mappings in Deep Residual Networks

March 2016 · 981 Reads · 5,709 Citations

Deep residual networks have emerged as a family of extremely deep architectures showing compelling accuracy and nice convergence behaviors. In this paper, we analyze the propagation formulations behind the residual building blocks, which suggest that the forward and backward signals can be directly propagated from one block to any other block, when using identity mappings as the skip connections and after-addition activation. A series of ablation experiments support the importance of these identity mappings. This motivates us to propose a new residual unit, which makes training easier and improves generalization. We report improved results using a 1001-layer ResNet on CIFAR-10/100, and a 200-layer ResNet on ImageNet.


Deep Residual Learning for Image Recognition

December 2015 · 8,325 Reads · 105,621 Citations

Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers---8x deeper than VGG nets but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.
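
The residual reformulation can be read as learning F(x) = H(x) − x and emitting F(x) + x as the block output. A minimal PyTorch sketch of a basic block in this original, post-activation form (layer sizes are illustrative, and downsampling/projection shortcuts are omitted):

```python
import torch.nn as nn

class BasicBlock(nn.Module):
    """Illustrative basic residual block in the post-activation form:
    conv -> BN -> ReLU -> conv -> BN, then add the identity and apply ReLU."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)  # the block learns F(x); the output is F(x) + x
```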



Accelerating Very Deep Convolutional Networks for Classification and Detection

May 2015 · 536 Reads · 929 Citations

IEEE Transactions on Pattern Analysis and Machine Intelligence

This paper aims to accelerate the test-time computation of convolutional neural networks (CNNs), especially very deep CNNs that have substantially impacted the computer vision community. Unlike existing methods that are designed for approximating linear filters or linear responses, our method takes the nonlinear units into account. We develop an effective solution to the resulting nonlinear optimization problem without the need of stochastic gradient descent (SGD). More importantly, while current methods mainly focus on optimizing one or two layers, our nonlinear method enables an asymmetric reconstruction that reduces the rapidly accumulated error when multiple (e.g., >=10) layers are approximated. For the widely used very deep VGG-16 model, our method achieves a whole-model speedup of 4x with merely a 0.3% increase of top-5 error in ImageNet classification. Our 4x accelerated VGG-16 model also shows a graceful accuracy degradation for object detection when plugged into the latest Fast R-CNN detector.
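
A rough picture of where the speedup comes from: a k × k convolution with d output filters is approximated by a k × k convolution with d′ ≪ d filters followed by a 1 × 1 convolution that restores d channels. The sketch below shows only that structural substitution in PyTorch; the function name and the example rank are hypothetical, and the paper's actual contribution is the solver that picks the decomposition by minimizing the nonlinear (post-ReLU) reconstruction error across layers.

```python
import torch.nn as nn

def low_rank_pair(in_ch, out_ch, rank, k=3):
    """Replace one k x k convolution with `out_ch` filters by a cheaper pair:
    a k x k convolution with only `rank` filters, then a 1 x 1 convolution
    back to `out_ch` channels. For rank << out_ch the multiply count drops
    roughly by rank / out_ch."""
    return nn.Sequential(
        nn.Conv2d(in_ch, rank, kernel_size=k, padding=k // 2, bias=False),
        nn.Conv2d(rank, out_ch, kernel_size=1, bias=True),
    )

# e.g. approximate a 256 -> 512 layer with rank 128 (illustrative numbers)
approx_layer = low_rank_pair(256, 512, rank=128)
```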


Object Detection Networks on Convolutional Feature Maps

April 2015 · 1,279 Reads · 461 Citations

IEEE Transactions on Pattern Analysis and Machine Intelligence

Most object detectors contain two important components: a feature extractor and an object classifier. The feature extractor has rapidly evolved with significant research efforts leading to better deep ConvNet architectures. The object classifier, however, has not received much attention and most state-of-the-art systems (like R-CNN) use simple multi-layer perceptrons. This paper demonstrates that carefully designing deep networks for object classification is just as important. We take inspiration from traditional object classifiers, such as DPM, and experiment with deep networks that have part-like filters and reason over latent variables. We discover that on pre-trained convolutional feature maps, even randomly initialized deep classifiers produce excellent results, while the improvement due to fine-tuning is secondary; on HOG features, deep classifiers outperform DPMs and produce the best HOG-only results without external data. We believe these findings provide new insight for developing object detection systems. Our framework, called Networks on Convolutional feature maps (NoC), achieves outstanding results on the PASCAL VOC 2007 (73.3% mAP) and 2012 (68.8% mAP) benchmarks.
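
The NoC idea can be pictured as replacing the usual MLP head with a small trainable network that sits on region-wise convolutional feature maps. The following PyTorch sketch is hypothetical (layer widths, the 7 × 7 pooled size, and the class count are illustrative, and RoI pooling is assumed to happen upstream):

```python
import torch.nn as nn

class ConvNoC(nn.Module):
    """Illustrative 'network on convolutional feature maps': a few conv layers
    followed by fully connected layers, applied to a pooled region feature map
    (e.g. 512 x 7 x 7) to produce per-class scores."""
    def __init__(self, in_ch=512, num_classes=21):
        super().__init__()
        self.convs = nn.Sequential(
            nn.Conv2d(in_ch, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True),
        )
        self.fc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(256 * 7 * 7, 4096), nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),
        )

    def forward(self, roi_feat):          # roi_feat: (N, 512, 7, 7)
        return self.fc(self.convs(roi_feat))
```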


Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification

February 2015 · 3,985 Reads · 13,383 Citations

Rectified activation units (rectifiers) are essential for state-of-the-art neural networks. In this work, we study rectifier neural networks for image classification from two aspects. First, we propose a Parametric Rectified Linear Unit (PReLU) that generalizes the traditional rectified unit. PReLU improves model fitting with nearly zero extra computational cost and little overfitting risk. Second, we derive a robust initialization method that particularly considers the rectifier nonlinearities. This method enables us to train extremely deep rectified models directly from scratch and to investigate deeper or wider network architectures. Based on our PReLU networks (PReLU-nets), we achieve 4.94% top-5 test error on the ImageNet 2012 classification dataset. This is a 26% relative improvement over the ILSVRC 2014 winner (GoogLeNet, 6.66%). To our knowledge, our result is the first to surpass human-level performance (5.1%, Russakovsky et al.) on this visual recognition challenge.
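
Both ingredients of the abstract correspond to standard building blocks today: a rectifier with a learnable negative slope, and a rectifier-aware weight initialization (now commonly called He or Kaiming initialization). A short, hedged PyTorch sketch with arbitrary layer sizes:

```python
import torch.nn as nn

# PReLU: like ReLU for positive inputs, but with a learned slope `a` for
# negative inputs, f(x) = max(0, x) + a * min(0, x).
layer = nn.Sequential(
    nn.Conv2d(64, 128, kernel_size=3, padding=1),
    nn.PReLU(num_parameters=128),   # one learnable slope per channel
)

# Rectifier-aware initialization: scale weights so variances stay stable
# through rectified layers, which lets very deep nets train from scratch.
for m in layer.modules():
    if isinstance(m, nn.Conv2d):
        nn.init.kaiming_normal_(m.weight, nonlinearity='relu')
```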


Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification

February 2015 · 1,275 Reads · 16,482 Citations

Rectified activation units (rectifiers) are essential for state-of-the-art neural networks. In this work, we study rectifier neural networks for image classification from two aspects. First, we propose a Parametric Rectified Linear Unit (PReLU) that generalizes the traditional rectified unit. PReLU improves model fitting with nearly zero extra computational cost and little overfitting risk. Second, we derive a robust initialization method that particularly considers the rectifier nonlinearities. This method enables us to train extremely deep rectified models directly from scratch and to investigate deeper or wider network architectures. Based on our PReLU networks (PReLU-nets), we achieve 4.94% top-5 test error on the ImageNet 2012 classification dataset. This is a 26% relative improvement over the ILSVRC 2014 winner (GoogLeNet, 6.66%). To our knowledge, our result is the first to surpass human-level performance (5.1%, Russakovsky et al.) on this visual recognition challenge.


Efficient and Accurate Approximations of Nonlinear Convolutional Networks

November 2014 · 172 Reads · 183 Citations

This paper aims to accelerate the test-time computation of deep convolutional neural networks (CNNs). Unlike existing methods that are designed for approximating linear filters or linear responses, our method takes the nonlinear units into account. We minimize the reconstruction error of the nonlinear responses, subject to a low-rank constraint which helps to reduce the complexity of filters. We develop an effective solution to this constrained nonlinear optimization problem. An algorithm is also presented for reducing the accumulated error when multiple layers are approximated. A whole-model speedup ratio of 4x is demonstrated on a large network trained for ImageNet, while the top-5 error rate is only increased by 0.9%. Our accelerated model runs about as fast as "AlexNet", but is 4.7% more accurate.
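
The stated objective, matching the nonlinear (post-ReLU) responses of the approximation to those of the original layer rather than the linear responses, can be written in a few lines. A toy NumPy sketch, assuming the nonlinearity is ReLU and using a plain truncated SVD of the weights just to have a low-rank factor to evaluate (the paper's solver is more sophisticated):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def nonlinear_recon_error(Y, W_low_rank, X):
    """Toy version of the objective: compare the ReLU'd responses of the
    original layer (Y) with those of a low-rank approximation applied to
    the same inputs X, rather than comparing linear responses."""
    Y_hat = X @ W_low_rank            # approximate linear responses
    return np.sum((relu(Y) - relu(Y_hat)) ** 2)

# illustrative shapes: 1000 samples, 512-dim inputs, 256 output channels
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 512))
W = rng.standard_normal((512, 256))
Y = X @ W
# a crude rank-64 factor via truncated SVD, purely for illustration
U, s, Vt = np.linalg.svd(W, full_matrices=False)
W_lr = (U[:, :64] * s[:64]) @ Vt[:64]
print(nonlinear_recon_error(Y, W_lr, X))
```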


Citations (11)


... The objective of the CNN branch is to utilize the capabilities of the CNN-based algorithms for local feature extraction. Specifically, an identity-mapping concept similar to that of the ResNet CNN model (He et al., 2016) was employed in the Scorch Mapper. In deep learning models, accuracy becomes saturated and then rapidly declines as network depth increases. ...

Reference:

High-resolution UAV-based blueberry scorch virus mapping utilizing a deep vision transformer algorithm
Identity Mappings in Deep Residual Networks
  • Citing Article
  • March 2016

... Hyperparameter search space and selected values from the citing work: units in input layer [16, 256]: 96; units in hidden layers [16, 256]: 240; number of hidden layers [1, 10]: 2; activation [35–38]: ReLU, Leaky ReLU, SiLU, PReLU, Tanh; optimizer [39–41]: Adam, SGD, RMSprop; learning rate [10⁻⁵, 10⁻¹]; scheduler type [42, 43]: Exponential, ReduceOnPlateau, CosineAnnealing, None; layer weight initialization [44, 45]: Xavier uniform, Xavier normal, orthogonal, normal, uniform; batch normalization [46]: True, False. ... use this metric as an additional cross-check for the feature importances computed for the FCNN model. ...

Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification
  • Citing Conference Paper
  • February 2015

... The advent of big data and big computing has enabled these networks to become deeper, and they are capable of learning and representing a wide selection of nonlinear functions [29]. Deep learning has been a powerful tool for automating the extraction of meaningful data from large datasets and has resulted in remarkable progress in a number of areas, such as computer vision [30,31] and speech recognition [32,33]. We will overview deep learning's benefits and drawbacks, then discuss the constituent parts of a deep NN, and finally review some of the networks used for deep materials informatics. ...

Deep Residual Learning for Image Recognition
  • Citing Conference Paper
  • June 2016

... However, achieving high performance through convolutional neural networks (CNNs) typically requires the use of a large number of channels [7][8][9][10][11], resulting in significant computational costs and long training times. Accordingly, there have been many studies focusing on modifying the convolutional layer to reduce its computational complexity [3,[12][13][14][15][16]. ...

Efficient and accurate approximations of nonlinear convolutional networks
  • Citing Conference Paper
  • June 2015

... This combination of structured data, unstructured text, and generated images creates a more robust framework for breast cancer metastasis detection, offering greater flexibility and adaptability in clinical scenarios. VGG-16 [7] and ViT [8] models were used to extract vision features from generated images. These features were then combined with text features extracted using the BERT model to form early fusion in a multi-modal context. ...

Accelerating Very Deep Convolutional Networks for Classification and Detection
  • Citing Article
  • May 2015

IEEE Transactions on Pattern Analysis and Machine Intelligence

... In the case of images, local combinations of edges form motifs which assemble into parts, and parts become objects. This way convolutional neural networks can transform data, like the images in the dataset used in this paper, into high-level representations, making detection or classification possible [41]. The models presented in this paper utilise transfer learning through pre-trained models to solve the classification problem. ...

Object Detection Networks on Convolutional Feature Maps
  • Citing Article
  • April 2015

IEEE Transactions on Pattern Analysis and Machine Intelligence

... For both Synth and qSynth, we also show results for models trained with a mix of the synthetic data and the real ATLAS MPRAGE images. Model Architecture and Training Details: The segmentation models were configured with PReLU activations [13] and one residual unit per block. For the qATLAS model, we predicted two classes: background and stroke lesion. ...

Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification
  • Citing Article
  • February 2015

... To demonstrate the effectiveness of the method in this paper, we conduct a large number of experiments on the CIFAR-10 [33] and ImageNet [34] datasets. CIFAR-10 consists of 50k training samples and 10k testing samples across 10 classes; each class contains 6,000 samples, and each sample is a 32 × 32 three-channel RGB image. ...

Efficient and Accurate Approximations of Nonlinear Convolutional Networks
  • Citing Article
  • November 2014