Xiangyu Zhang's research while affiliated with Microsoft and other places
What is this page?
This page lists the scientific contributions of an author who either does not have a ResearchGate profile or has not yet added these contributions to their profile.
It was automatically generated by ResearchGate to create a record of this author's body of work. We create such pages to advance our goal of creating and maintaining the most comprehensive scientific repository possible. In doing so, we process publicly available (personal) data relating to the author as a member of the scientific community.
If you're a ResearchGate member, you can follow this page to keep up with this author's work.
If you are this author, and you don't want us to display this page anymore, please let us know.
Publications (11)
Deep residual networks have emerged as a family of extremely deep architectures showing compelling accuracy and nice convergence behaviors. In this paper, we analyze the propagation formulations behind the residual building blocks, which suggest that the forward and backward signals can be directly propagated from one block to any other block, when...
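One design that realizes this direct propagation is the pre-activation residual block. Below is a minimal sketch in PyTorch (the framework, class name, and channel sizes are assumptions for illustration; the abstract itself is framework-agnostic):

    import torch.nn as nn

    class PreActBlock(nn.Module):
        # Pre-activation ordering (BN -> ReLU -> conv, twice) keeps the shortcut
        # path an untouched identity, so forward and backward signals can pass
        # directly from one block to any other block.
        def __init__(self, channels):
            super().__init__()
            self.bn1 = nn.BatchNorm2d(channels)
            self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
            self.bn2 = nn.BatchNorm2d(channels)
            self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
            self.relu = nn.ReLU(inplace=True)

        def forward(self, x):
            out = self.conv1(self.relu(self.bn1(x)))
            out = self.conv2(self.relu(self.bn2(out)))
            return x + out  # identity mapping on the shortcut path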
Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide compreh...
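The core reformulation is y = F(x, {W_i}) + x: the stacked layers fit a residual F rather than the full mapping. A minimal post-activation residual block in PyTorch (assumed framework; names and sizes are illustrative), in contrast to the pre-activation variant sketched above:

    import torch.nn as nn

    class BasicResidualBlock(nn.Module):
        # Original (post-activation) formulation: add the identity shortcut
        # to F(x), then apply ReLU; the block only has to learn the residual.
        def __init__(self, channels):
            super().__init__()
            self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
            self.bn1 = nn.BatchNorm2d(channels)
            self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
            self.bn2 = nn.BatchNorm2d(channels)
            self.relu = nn.ReLU(inplace=True)

        def forward(self, x):
            out = self.relu(self.bn1(self.conv1(x)))
            out = self.bn2(self.conv2(out))
            return self.relu(out + x)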
This paper aims to accelerate the test-time computation of convolutional neural networks (CNNs), especially very deep CNNs that have substantially impacted the computer vision community. Unlike existing methods that are designed for approximating linear filters or linear responses, our method takes the nonlinear units into account. We develop an ef...
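As a rough illustration of the low-rank idea behind such acceleration methods, here is a plain SVD factorization of a layer's weight matrix in PyTorch (an assumption; the cited method goes further and fits the factors to the nonlinear responses, which this linear baseline omits):

    import torch

    def low_rank_factorize(weight, rank):
        # Split an (out x in) weight matrix into two thin factors so that
        # y = W x becomes y = W2 (W1 x), cutting multiply-adds when
        # rank << min(out, in).
        U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
        W1 = Vh[:rank, :]            # (rank, in)
        W2 = U[:, :rank] * S[:rank]  # (out, rank), columns scaled by singular values
        return W1, W2

    W = torch.randn(512, 1024)
    W1, W2 = low_rank_factorize(W, rank=64)
    x = torch.randn(1024)
    y, y_approx = W @ x, W2 @ (W1 @ x)
    print((y - y_approx).norm() / y.norm())  # relative reconstruction error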
Most object detectors contain two important components: a feature extractor and an object classifier. The feature extractor has rapidly evolved with significant research efforts leading to better deep ConvNet architectures. The object classifier, however, has not received much attention and most state-of-the-art systems (like R-CNN) use simple mult...
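A hedged sketch of the two-component split described above, with a small MLP acting as the region-wise object classifier on top of RoI-pooled backbone features (PyTorch assumed; the class name, feature sizes, and class count are illustrative):

    import torch
    import torch.nn as nn

    class RegionClassifierHead(nn.Module):
        # An MLP classifier applied to each RoI-pooled feature map
        # (num_rois, C, 7, 7) produced by the shared feature extractor.
        def __init__(self, in_channels, num_classes, hidden=1024, pool_size=7):
            super().__init__()
            self.mlp = nn.Sequential(
                nn.Flatten(),
                nn.Linear(in_channels * pool_size * pool_size, hidden),
                nn.ReLU(inplace=True),
                nn.Linear(hidden, hidden),
                nn.ReLU(inplace=True),
                nn.Linear(hidden, num_classes),
            )

        def forward(self, roi_features):
            return self.mlp(roi_features)

    head = RegionClassifierHead(in_channels=256, num_classes=21)
    scores = head(torch.randn(8, 256, 7, 7))  # 8 candidate regions -> (8, 21) class scores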
Rectified activation units (rectifiers) are essential for state-of-the-art neural networks. In this work, we study rectifier neural networks for image classification from two aspects. First, we propose a Parametric Rectified Linear Unit (PReLU) that generalizes the traditional rectified unit. PReLU improves model fitting with nearly zero extra comp...
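The PReLU formula is f(y) = max(0, y) + a · min(0, y), with the negative slope a learned during training. A from-scratch sketch in PyTorch (assumed; nn.PReLU is the built-in equivalent):

    import torch
    import torch.nn as nn

    class ParametricReLU(nn.Module):
        # f(y) = max(0, y) + a * min(0, y); "a" is a learned parameter
        # (one per channel), adding nearly zero extra computation.
        def __init__(self, num_channels, init=0.25):
            super().__init__()
            self.a = nn.Parameter(torch.full((num_channels,), init))

        def forward(self, x):              # x: (N, C, H, W)
            a = self.a.view(1, -1, 1, 1)
            return torch.clamp(x, min=0) + a * torch.clamp(x, max=0)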
This paper aims to accelerate the test-time computation of deep convolutional neural networks (CNNs). Unlike existing methods that are designed for approximating linear filters or linear responses, our method takes the nonlinear units into account. We minimize the reconstruction error of the nonlinear responses, subject to a low-rank constraint whi...
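A toy sketch of the distinction this abstract draws: fitting a rank-d factorization to the nonlinear (post-ReLU) responses rather than to the weights themselves (PyTorch assumed; sizes, rank, and optimizer settings are illustrative, not the paper's actual solver):

    import torch

    torch.manual_seed(0)
    X = torch.randn(1000, 256)   # layer inputs
    W = torch.randn(128, 256)    # original layer weights
    Y = torch.relu(X @ W.T)      # nonlinear responses to be reproduced

    d = 32                       # low-rank constraint: W is replaced by P @ Q
    P = torch.randn(128, d, requires_grad=True)
    Q = torch.randn(d, 256, requires_grad=True)
    opt = torch.optim.Adam([P, Q], lr=1e-2)

    for _ in range(500):
        opt.zero_grad()
        # minimize the reconstruction error of the *nonlinear* responses
        loss = ((torch.relu(X @ (P @ Q).T) - Y) ** 2).mean()
        loss.backward()
        opt.step()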
Existing deep convolutional neural networks (CNNs) require a fixed-size (e.g. 224×224) input image. This requirement is “artificial” and may hurt the recognition accuracy for the images or sub-images of an arbitrary size/scale. In this work, we equip the networks with a more principled pooling strategy, “spatial pyramid pooling”, to eliminate the a...
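A compact sketch of spatial pyramid pooling: pool the convolutional feature map into a fixed set of bins at several scales and concatenate, so the output length no longer depends on the input image size (PyTorch assumed; the pyramid levels are illustrative):

    import torch
    import torch.nn.functional as F

    def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
        # feature_map: (N, C, H, W) with arbitrary H and W.
        n = feature_map.size(0)
        pooled = []
        for size in levels:
            out = F.adaptive_max_pool2d(feature_map, output_size=size)
            pooled.append(out.reshape(n, -1))  # (N, C * size * size)
        return torch.cat(pooled, dim=1)        # fixed length C * sum(size^2)

    print(spatial_pyramid_pool(torch.randn(2, 256, 13, 17)).shape)  # torch.Size([2, 5376])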
Citations
... The architecture of our proposed approach can be seen in Figure 1. We select a pretrained ResNet-152 [19,20] object-detection backbone for our network under the hypothesis that the semantic features extracted by such a network are relevant for complexity perception. We truncate the backbone before the final classification layer, giving an extracted feature tensor R ∈ ℝ^(H×W×C), where H, W, C are the height, width, and channels of the feature tensor; in this case, 7×7×2048. ...
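A hedged sketch of the truncation described in this snippet, using torchvision's ResNet-152 as an assumed stand-in (the exact weight identifier depends on the torchvision version):

    import torch
    import torchvision

    backbone = torchvision.models.resnet152(weights="IMAGENET1K_V1")
    extractor = torch.nn.Sequential(*list(backbone.children())[:-2])  # drop avgpool and fc
    extractor.eval()

    with torch.no_grad():
        feats = extractor(torch.randn(1, 3, 224, 224))
    print(feats.shape)  # torch.Size([1, 2048, 7, 7]), the channels-first view of the 7x7x2048 tensor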
... Starting from the graph convolutional layers, batch normalization and dropout [75] are applied after each layer (a fraction of 0.2 for all layers except the last one, where the fraction is 0.1). All layers employ 256 trainable nodes and PReLU [76] activation functions, except the output layer, which uses a sigmoid or a linear activation function depending on the task (classification or regression). A schematic representation is given in Fig. 2. The same preprocessing steps are performed as for the FCN. ...
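A simplified sketch of one hidden block as described (256 units, batch normalization, PReLU, dropout 0.2), with a plain linear layer standing in for the graph-convolutional layer (PyTorch assumed; not the cited authors' implementation):

    import torch.nn as nn

    hidden_block = nn.Sequential(
        nn.Linear(256, 256),  # stand-in for a graph convolutional layer
        nn.BatchNorm1d(256),
        nn.PReLU(),
        nn.Dropout(p=0.2),    # 0.1 on the last hidden layer in the cited setup
    )

    # Output layer: sigmoid for classification, plain linear for regression.
    classification_head = nn.Sequential(nn.Linear(256, 1), nn.Sigmoid())
    regression_head = nn.Linear(256, 1)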
... Our research is divided into three sections and the workflow is depicted in Fig. 1. To begin, this study utilizes the ResNet-50 [15] architecture, a supervised machine learning technique, for separating photos with cracks from those without cracks. After classification, the model utilizes the Yolo v5 [16] object detection algorithm to further analyze the damage, i.e., linear and branching. ...
... The number of floating-point operations (FLOPs) has increased dramatically with larger networks, and this has become an obstacle for CNNs being developed for mobile and embedded devices. In this context, numerous methods for CNN compression and acceleration have been proposed, including network pruning [8,11,12,18,25], parameter quantization [7,32,36,39,40,44], low-rank decomposition [1,2,43] and knowledge distillation [14,33]. Network pruning has universally been categorized as non-structured or structured; a minimal sketch of the non-structured case follows the reference below. ...
Reference: Filter pruning via expectation-maximization
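For the non-structured category mentioned here, a minimal magnitude-pruning sketch (PyTorch assumed; the thresholding rule and sparsity level are illustrative). Structured pruning, by contrast, removes entire filters or channels so the remaining network stays dense:

    import torch

    def magnitude_prune(weight, sparsity=0.5):
        # Non-structured pruning: zero out the smallest-magnitude weights,
        # keeping the tensor shape but reducing the number of nonzeros.
        k = max(1, int(weight.numel() * sparsity))
        threshold = weight.abs().flatten().kthvalue(k).values
        mask = (weight.abs() > threshold).float()
        return weight * mask, mask

    w = torch.randn(64, 128)
    pruned, mask = magnitude_prune(w, sparsity=0.7)
    print(mask.mean().item())  # fraction of weights kept (about 0.3)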
... Recent studies show that deepening network depth and widening network width can improve the performance of convolutional neural networks. In terms of deepening network depth, He et al. [6] proposed a ResNet network with 152 layers, which achieved state-of-the-art performance across multiple ILSVRC tasks in 2015. In terms of widening network width, the WRN network proposed by Zagoruyko et al. [7] reduced the depth and increased the width of ResNet and achieved good performance. ...
... This idea eliminated two major problems: (a) the overfitting problem that enlarged networks are prone to, and (b) the increased use of computational resources due to uniformly increased network size. InceptionResNetV2, proposed by Christian Szegedy et al., was formulated based on the structure of the Inception network and the residual connections [22], which replaced the filter concatenation stage of the Inception architecture. These residual connections not only overcame the degradation problem caused by increasing the depth of structures but also reduced the training time [23]. ...
... This study employs the transfer learning technique using a modified VGG16 network, which has excellent potential for classification tasks (Zhang et al., 2016). VGG16 uses a CNN architecture to achieve high performance in image classification. ...
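A minimal transfer-learning sketch along these lines, using torchvision's VGG16 as an assumed stand-in for the modified network (the target class count and weight identifier are illustrative):

    import torch.nn as nn
    import torchvision

    model = torchvision.models.vgg16(weights="IMAGENET1K_V1")
    for p in model.features.parameters():
        p.requires_grad = False               # freeze the convolutional feature extractor
    model.classifier[6] = nn.Linear(4096, 5)  # new final layer for, e.g., 5 target classes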
... We use the standard implementation for the ResNet-based Faster R-CNN network [38]. ResNet-50 and ResNet-101 [15] pretrained on ImageNet [39] are used as the backbones in different experimental settings, following [13,34]. ...
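For reference, torchvision ships a ResNet-50 Faster R-CNN along these lines, shown here as an assumed stand-in for the standard implementation cited above (the weight identifier depends on the torchvision version):

    import torch
    import torchvision

    detector = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
    detector.eval()
    with torch.no_grad():
        outputs = detector([torch.rand(3, 480, 640)])  # list of images in, list of dicts out
    print(outputs[0].keys())                           # boxes, labels, scores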
... The selected deep learning models in this experimental study are trained from scratch for 300 epochs with the 'he' uniform variance-scaling initializer (He et al., 2015) for weight initialization, which sets the initial weights at a scale that is neither too large nor too small. ...
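A sketch of 'he' uniform (variance-scaling) initialization via PyTorch's Kaiming initializer (an assumption about the toolkit; the rule itself is bound = sqrt(6 / fan_in) for ReLU layers):

    import math
    import torch.nn as nn

    layer = nn.Linear(512, 256)
    # He/Kaiming uniform: draw weights from U(-b, b) with b = sqrt(6 / fan_in),
    # keeping activation variance roughly constant through ReLU layers.
    nn.init.kaiming_uniform_(layer.weight, mode="fan_in", nonlinearity="relu")
    nn.init.zeros_(layer.bias)

    fan_in = layer.weight.shape[1]
    print(math.sqrt(6.0 / fan_in))           # theoretical bound b
    print(layer.weight.abs().max().item())   # observed maximum, stays below b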
... To prove the effectiveness of the method in this paper, we conduct a large number of experiments on the CIFAR-10 [33] and ImageNet [34] datasets. CIFAR-10 consists of 50k training samples and 10k test samples across 10 classes; each class contains 6,000 samples, and each sample is a 32 × 32 three-channel RGB image. ...
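A loading sketch for the CIFAR-10 split described here, via torchvision (assumed; the root path is illustrative):

    import torchvision
    import torchvision.transforms as T

    transform = T.ToTensor()
    train_set = torchvision.datasets.CIFAR10(root="./data", train=True, download=True, transform=transform)
    test_set = torchvision.datasets.CIFAR10(root="./data", train=False, download=True, transform=transform)
    print(len(train_set), len(test_set), len(train_set.classes))  # 50000 10000 10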