Xiangyu Zhang's research while affiliated with Microsoft and other places

Publications (11)

Conference Paper
Deep residual networks have emerged as a family of extremely deep architectures showing compelling accuracy and nice convergence behaviors. In this paper, we analyze the propagation formulations behind the residual building blocks, which suggest that the forward and backward signals can be directly propagated from one block to any other block, when...
Article
Deep residual networks have emerged as a family of extremely deep architectures showing compelling accuracy and nice convergence behaviors. In this paper, we analyze the propagation formulations behind the residual building blocks, which suggest that the forward and backward signals can be directly propagated from one block to any other block, when...
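In the paper's notation, with x_l the input to the l-th residual unit, \mathcal{F} the residual function, and E the loss, the propagation formulations it analyzes take the form

x_{l+1} = x_l + \mathcal{F}(x_l, \mathcal{W}_l), \qquad x_L = x_l + \sum_{i=l}^{L-1} \mathcal{F}(x_i, \mathcal{W}_i), \qquad \frac{\partial E}{\partial x_l} = \frac{\partial E}{\partial x_L}\Big(1 + \frac{\partial}{\partial x_l}\sum_{i=l}^{L-1} \mathcal{F}(x_i, \mathcal{W}_i)\Big),

so any deeper unit x_L receives x_l additively in the forward pass, and the gradient at x_l receives \partial E / \partial x_L directly in the backward pass.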
Article
Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide compreh...
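As a minimal sketch of this reformulation (channel widths and the absence of downsampling/projection shortcuts are illustrative, not the paper's exact configurations), a basic residual block in PyTorch looks like:

import torch
import torch.nn as nn

class BasicResidualBlock(nn.Module):
    """y = F(x) + x: the layers learn the residual F(x) with reference to the input x."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        residual = self.bn2(self.conv2(self.relu(self.bn1(self.conv1(x)))))
        return self.relu(residual + x)  # identity shortcut added before the final ReLU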
Article
This paper aims to accelerate the test-time computation of convolutional neural networks (CNNs), especially very deep CNNs that have substantially impacted the computer vision community. Unlike existing methods that are designed for approximating linear filters or linear responses, our method takes the nonlinear units into account. We develop an ef...
Article
Most object detectors contain two important components: a feature extractor and an object classifier. The feature extractor has rapidly evolved with significant research efforts leading to better deep ConvNet architectures. The object classifier, however, has not received much attention and most state-of-the-art systems (like R-CNN) use simple mult...
Article
Rectified activation units (rectifiers) are essential for state-of-the-art neural networks. In this work, we study rectifier neural networks for image classification from two aspects. First, we propose a Parametric Rectified Linear Unit (PReLU) that generalizes the traditional rectified unit. PReLU improves model fitting with nearly zero extra comp...
Conference Paper
Rectified activation units (rectifiers) are essential for state-of-the-art neural networks. In this work, we study rectifier neural networks for image classification from two aspects. First, we propose a Parametric Rectified Linear Unit (PReLU) that generalizes the traditional rectified unit. PReLU improves model fitting with nearly zero extra comp...
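A minimal PReLU sketch (equivalent in spirit to torch.nn.PReLU; the per-channel slope a is initialized to 0.25, following the paper):

import torch
import torch.nn as nn

class PReLU(nn.Module):
    """f(y) = max(0, y) + a * min(0, y), with the slope a learned per channel."""
    def __init__(self, num_channels, init=0.25):
        super().__init__()
        self.a = nn.Parameter(torch.full((num_channels,), init))

    def forward(self, y):
        # broadcast the per-channel slope over (N, C, H, W)
        a = self.a.view(1, -1, 1, 1)
        return torch.clamp(y, min=0) + a * torch.clamp(y, max=0)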
Article
This paper aims to accelerate the test-time computation of deep convolutional neural networks (CNNs). Unlike existing methods that are designed for approximating linear filters or linear responses, our method takes the nonlinear units into account. We minimize the reconstruction error of the nonlinear responses, subject to a low-rank constraint whi...
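The paper's solver additionally accounts for the nonlinear (ReLU) responses; the sketch below shows only the plain low-rank step that such approximations build on, a truncated SVD that splits one layer into two thinner ones (shapes are illustrative):

import numpy as np

def low_rank_factorize(W, rank):
    """Approximate a (d_out, d_in) filter matrix W by two rank-r factors,
    replacing one d_out x d_in layer with a d_out x r and an r x d_in layer."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * S[:rank]        # (d_out, r)
    B = Vt[:rank, :]                  # (r, d_in)
    return A, B

W = np.random.randn(256, 1152)        # e.g. 256 filters over 3x3x128 patches
A, B = low_rank_factorize(W, rank=64)
print(np.linalg.norm(W - A @ B) / np.linalg.norm(W))  # relative reconstruction error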
Article
Existing deep convolutional neural networks (CNNs) require a fixed-size (e.g. 224×224) input image. This requirement is “artificial” and may hurt the recognition accuracy for the images or sub-images of an arbitrary size/scale. In this work, we equip the networks with a more principled pooling strategy, “spatial pyramid pooling”, to eliminate the a...
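A minimal spatial-pyramid-pooling sketch in PyTorch (the pyramid levels here are illustrative, not the paper's exact configuration): the feature map is pooled at several fixed grid sizes and the results concatenated, so the output length no longer depends on the input image size.

import torch
import torch.nn as nn

class SpatialPyramidPooling(nn.Module):
    """Pool the conv feature map at several fixed grid sizes and concatenate."""
    def __init__(self, levels=(1, 2, 4)):
        super().__init__()
        self.levels = levels

    def forward(self, x):                      # x: (N, C, H, W), H and W arbitrary
        pooled = [nn.functional.adaptive_max_pool2d(x, l).flatten(1) for l in self.levels]
        return torch.cat(pooled, dim=1)        # (N, C * sum(l*l for l in levels))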

Citations

... The architecture of our proposed approach can be seen in Figure 1. We select a pretrained ResNet-152 [19,20] object-detection backbone for our network under the hypothesis that the semantic features extracted by such a network are relevant for complexity perception. We truncate the backbone before the final classification layer, obtaining an extracted feature tensor of shape H × W × C, where H, W, C are the height, width, and channels of the feature tensor; in this case, 7 × 7 × 2048. ...
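One way to obtain such a truncated backbone with torchvision (a sketch; the cited work may build it differently, and the weights enum assumes torchvision ≥ 0.13):

import torch
import torch.nn as nn
from torchvision import models

# Pretrained ResNet-152, truncated before global pooling and the classifier,
# so a 224x224 input yields a 7x7x2048 feature tensor as described above.
resnet = models.resnet152(weights=models.ResNet152_Weights.IMAGENET1K_V1)
backbone = nn.Sequential(*list(resnet.children())[:-2]).eval()

with torch.no_grad():
    features = backbone(torch.randn(1, 3, 224, 224))
print(features.shape)  # torch.Size([1, 2048, 7, 7])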
... Starting from the graph convolutional layers, batch normalization and dropout [75] are applied after each layer (dropout fraction of 0.2 for all layers except the last, where it is 0.1). All layers employ 256 trainable nodes and PReLU [76] activation functions, except the output layer, which uses a sigmoid or a linear activation function depending on the task (classification or regression). A schematic representation is given in Fig. 2. The same preprocessing steps are performed as for the FCN. ...
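A sketch of the per-layer pattern described above (a hypothetical reconstruction; nn.Linear stands in for the graph convolutional layer, which in practice would come from a graph learning library):

import torch.nn as nn

def hidden_block(in_dim, out_dim=256, p_drop=0.2):
    """One hidden block: (graph) convolution -> batch norm -> PReLU -> dropout."""
    return nn.Sequential(
        nn.Linear(in_dim, out_dim),   # stand-in for the graph convolution
        nn.BatchNorm1d(out_dim),
        nn.PReLU(out_dim),
        nn.Dropout(p_drop),
    )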
... Our research is divided into three sections and the workflow is depicted in Fig. 1. First, this study utilizes the ResNet50 [15] architecture, a supervised deep learning model, to separate photos with cracks from those without. After classification, the model utilizes the YOLOv5 [16] object detection algorithm to further analyze the damage, i.e., linear versus branching cracks. ...
... The number of floating-point operations (FLOPs) has increased dramatically with larger networks, and this has become an obstacle for CNNs being deployed on mobile and embedded devices. In this context, numerous methods for CNN compression and acceleration have been proposed, including network pruning [8,11,12,18,25], parameter quantization [7,32,36,39,40,44], low-rank decomposition [1,2,43] and knowledge distillation [14,33]. Network pruning has generally been categorized as non-structured or structured. ...
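The non-structured vs. structured distinction can be illustrated on a single weight matrix (a NumPy sketch for illustration only, not any of the cited methods):

import numpy as np

W = np.random.randn(64, 128)                      # one layer's weight matrix

# Non-structured (fine-grained) pruning: zero out individual small-magnitude weights.
thresh = np.quantile(np.abs(W), 0.5)
W_unstructured = np.where(np.abs(W) < thresh, 0.0, W)

# Structured pruning: remove whole rows (output channels) with small L1 norm,
# which shrinks the layer and gives speedups without sparse kernels.
keep = np.argsort(np.abs(W).sum(axis=1))[W.shape[0] // 2:]
W_structured = W[keep, :]
print(W_unstructured.shape, W_structured.shape)   # (64, 128) (32, 128)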
... Recent studies show that deepening network depth and widening network width can improve the performance of convolutional neural networks. In terms of depth, He et al. [6] proposed the 152-layer ResNet, which achieved state-of-the-art performance on multiple ILSVRC tasks in 2015. In terms of width, the WRN proposed by Zagoruyko et al. [7] reduced the depth and increased the width of ResNet and achieved good performance. ...
... This idea eliminated two major problems: (a) the overfitting that enlarged networks are prone to, and (b) the increased use of computational resources due to uniformly increasing network size. InceptionResNetV2, proposed by Christian Szegedy et al., was built on the structure of the Inception network, with residual connections [22] replacing the filter-concatenation stage of the Inception architecture. These residual connections not only overcame the degradation problem caused by increasing network depth but also reduced the training time [23]. ...
... This study employs the transfer learning technique using a modified VGG16 network, which has excellent potential for classification tasks (Zhang et al., 2016). VGG16 uses a CNN architecture to achieve high performance in image classification. ...
... We use the standard implementation of the ResNet-based Faster R-CNN network [38]. ResNet-50 and ResNet-101 [15] pretrained on ImageNet [39] are used as the backbones in different experimental settings, following [13,34]. ...
... The selected deep learning models in this experimental study are trained from scratch for 300 epochs with the 'he' uniform variance-scaling initializer (He et al., 2015) for weight initialization, so that the initial weights are neither too large nor too small. ...
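In Keras, for example, this initializer is exposed as 'he_uniform', which draws weights from U(−limit, limit) with limit = sqrt(6 / fan_in) (a sketch; the cited study's exact framework and layer configuration are assumptions):

import tensorflow as tf

# A convolutional layer whose kernel is initialized with the uniform variant
# of the scheme proposed in He et al. (2015).
layer = tf.keras.layers.Conv2D(
    64, kernel_size=3, activation="relu",
    kernel_initializer="he_uniform",
)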
... To prove the effectiveness of the method in this paper, we conduct a large number of experiments on the CIFAR-10 [33] and ImageNet [34] datasets. CIFAR-10 consists of 50k training samples and 10k testing samples across 10 classes; each class contains 6,000 samples, and each sample is a 32 × 32 three-channel RGB image. ...