Xiangyu Zhang's research while affiliated with Microsoft and other places

Publications (11)

Conference Paper
Deep residual networks have emerged as a family of extremely deep architectures showing compelling accuracy and nice convergence behaviors. In this paper, we analyze the propagation formulations behind the residual building blocks, which suggest that the forward and backward signals can be directly propagated from one block to any other block, when...
Article
Deep residual networks have emerged as a family of extremely deep architectures showing compelling accuracy and nice convergence behaviors. In this paper, we analyze the propagation formulations behind the residual building blocks, which suggest that the forward and backward signals can be directly propagated from one block to any other block, when...
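A compact sketch of the propagation formulation this abstract refers to, written as LaTeX (notation follows the paper as best recalled; F is the residual function, E the loss): with identity skip connections and identity after-addition mappings,

$$x_{l+1} = x_l + \mathcal{F}(x_l, \mathcal{W}_l), \qquad x_L = x_l + \sum_{i=l}^{L-1} \mathcal{F}(x_i, \mathcal{W}_i), \qquad \frac{\partial \mathcal{E}}{\partial x_l} = \frac{\partial \mathcal{E}}{\partial x_L}\Big(1 + \frac{\partial}{\partial x_l}\sum_{i=l}^{L-1} \mathcal{F}(x_i, \mathcal{W}_i)\Big).$$

The additive term 1 in the gradient means the signal from any deeper unit L reaches any shallower unit l directly, which is the sense in which forward and backward signals can propagate from one block to any other block.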
Article
Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide compreh...
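As an illustration of the residual learning idea, here is a minimal residual block sketch in PyTorch; the two-convolution shape, BatchNorm placement, and class name are illustrative assumptions, not the paper's exact configuration:

import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    # Learn a residual function F(x) with reference to the input x:
    # output = ReLU(F(x) + x), where the shortcut is the identity.
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        identity = x                               # the shortcut carries the input unchanged
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + identity)           # add the learned residual back onto the input

The identity shortcut adds no extra parameters, so stacking many such blocks deepens the network without changing the form of what each block has to learn.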
Article
This paper aims to accelerate the test-time computation of convolutional neural networks (CNNs), especially very deep CNNs that have substantially impacted the computer vision community. Unlike existing methods that are designed for approximating linear filters or linear responses, our method takes the nonlinear units into account. We develop an ef...
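For context, the linear low-rank step that this line of work starts from can be sketched as below; this is only the plain SVD baseline (with a hypothetical helper name low_rank_factorize), whereas the paper's method additionally takes the nonlinear units into account when fitting the approximation:

import torch

def low_rank_factorize(W, rank):
    # Approximate a weight matrix W (out_features x in_features) by a rank-`rank`
    # product W ~= U_r @ V_r, so one layer can be replaced by two thinner layers.
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    U_r = U[:, :rank] * S[:rank]   # (out_features, rank)
    V_r = Vh[:rank, :]             # (rank, in_features)
    return U_r, V_r

Replacing W with U_r @ V_r reduces multiply-adds whenever the rank is much smaller than the original dimensions; the paper's departure from this baseline is using the nonlinear responses, rather than the raw linear filters, as the reconstruction target.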
Article
Most object detectors contain two important components: a feature extractor and an object classifier. The feature extractor has rapidly evolved with significant research efforts leading to better deep ConvNet architectures. The object classifier, however, has not received much attention and most state-of-the-art systems (like R-CNN) use simple mult...
Article
Rectified activation units (rectifiers) are essential for state-of-the-art neural networks. In this work, we study rectifier neural networks for image classification from two aspects. First, we propose a Parametric Rectified Linear Unit (PReLU) that generalizes the traditional rectified unit. PReLU improves model fitting with nearly zero extra comp...
Conference Paper
Rectified activation units (rectifiers) are essential for state-of-the-art neural networks. In this work, we study rectifier neural networks for image classification from two aspects. First, we propose a Parametric Rectified Linear Unit (PReLU) that generalizes the traditional rectified unit. PReLU improves model fitting with nearly zero extra comp...
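A minimal sketch of the PReLU described here; PyTorch already ships torch.nn.PReLU, so this re-implementation only serves to make the learnable slope explicit (one parameter per channel, initialized to 0.25 as in the paper):

import torch
import torch.nn as nn

class PReLU(nn.Module):
    # f(y) = y for y > 0 and f(y) = a * y otherwise, where the slope `a` is
    # learned during training instead of fixed (ReLU) or hand-set (Leaky ReLU).
    def __init__(self, num_channels, init=0.25):
        super().__init__()
        self.a = nn.Parameter(torch.full((num_channels,), init))

    def forward(self, y):
        a = self.a.view(1, -1, 1, 1)   # broadcast one slope per channel over (N, C, H, W)
        return torch.where(y > 0, y, a * y)

Because `a` contributes only one scalar per channel, the extra parameter and computational cost is nearly zero, matching the abstract's claim.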
Article
This paper aims to accelerate the test-time computation of deep convolutional neural networks (CNNs). Unlike existing methods that are designed for approximating linear filters or linear responses, our method takes the nonlinear units into account. We minimize the reconstruction error of the nonlinear responses, subject to a low-rank constraint whi...
Article
Existing deep convolutional neural networks (CNNs) require a fixed-size (e.g. 224×224) input image. This requirement is “artificial” and may hurt the recognition accuracy for the images or sub-images of an arbitrary size/scale. In this work, we equip the networks with a more principled pooling strategy, “spatial pyramid pooling”, to eliminate the a...
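A minimal sketch of the "spatial pyramid pooling" idea, assuming a (4, 2, 1) pyramid; the levels are illustrative, and AdaptiveMaxPool2d stands in for the paper's pooling windows computed from the input size:

import torch
import torch.nn as nn

class SpatialPyramidPooling(nn.Module):
    # Max-pool a feature map of arbitrary spatial size into fixed grids
    # (here 4x4, 2x2, 1x1) and concatenate, yielding a fixed-length vector.
    def __init__(self, levels=(4, 2, 1)):
        super().__init__()
        self.pools = nn.ModuleList(nn.AdaptiveMaxPool2d(level) for level in levels)

    def forward(self, x):                          # x: (N, C, H, W) with any H, W
        batch = x.size(0)
        feats = [pool(x).reshape(batch, -1) for pool in self.pools]
        return torch.cat(feats, dim=1)             # (N, C * (16 + 4 + 1)) for (4, 2, 1)

Because the output length no longer depends on H and W, the fully connected layers after it can accept images or sub-images of arbitrary size/scale.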

Citations

... Although these models solve the issue of efficient data usage, they still depend on numerical weather models, which are computationally expensive for nowcasting purposes. In [31], a combination of satellite data recorded over Europe and lightning data recorded by ground-based lightning detection networks is used to train a residual U-Net model for lightning activity prediction [32][33][34][35]. In [36], GLM data in combination with aerosol features are used to train a gradient-boosted decision tree model, LightGBM, to perform hourly forecasts. ...
... The Parametric Rectified Linear Unit (PReLU) is a variant of Leaky ReLU. The slope parameter of PReLU [69] is not set by hand but is learned during training, and this learned parameter is key to improving classification performance. ...
... We implement DETisSeg in PyTorch on a workstation with an NVIDIA RTX A6000 GPU and 64 GB of memory. We use a pre-trained ResNet-50 [42] as our backbone and the AdamW optimizer to train the network. We employ a polynomial learning rate decay schedule to adjust the learning rate dynamically, which can be calculated by: ...
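The formula itself is cut off in the snippet above. The commonly used "poly" schedule has the form below; this is an assumption about the standard formulation, not necessarily the cited paper's exact equation:

def poly_lr(base_lr, cur_iter, max_iter, power=0.9):
    # 'poly' decay: the learning rate falls from base_lr toward 0 as
    # (1 - cur_iter / max_iter) ** power; power = 0.9 is a typical choice.
    return base_lr * (1 - cur_iter / max_iter) ** power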
... Weights obtained during training are removed by neural network pruning, which creates more compact and efficient networks that minimize parameters and computation while preserving performance [11][12][13][14][15][16]. Other neural network compression techniques include distillation [19,20], parameter quantization [17,18], and weight decomposition [7][8][9][10]. Pruning is the easiest of these methods to implement and can be compared against the others. ...
... ResNets (He et al., 2015, 2016) have been found to be effective deep learning models for road network pattern classification. We choose ResNet-34, which performs the best among the ResNet family in our preliminary experiments, as the final model architecture (see details in the Supplementary Note). ...
... As shown in Figure 10, we utilize Widar3 and WiSee as the common HGR models; they depend on the body-coordinate velocity profile (BVP) and Doppler frequency shift (DFS) features, respectively. For our sensing sub-model F, its gating model g uses a standard residual block [52] followed by 4 fully connected layers, with V = 8 for our current design. Each of the 8 encoders consists of 4 consecutive convolutional layers with kernel size 3, stride 1, and padding 1, and their output channels are 32, 64, 64, and 128, respectively. ...
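As described, each of those encoders can be sketched as four consecutive 3x3 convolutions; the input channel count and the ReLU activations are assumptions not stated in the snippet:

import torch.nn as nn

def make_encoder(in_channels):
    # Four consecutive conv layers with kernel size 3, stride 1, padding 1,
    # and output channels 32, 64, 64, 128, per the description above.
    channels = [in_channels, 32, 64, 64, 128]
    layers = []
    for c_in, c_out in zip(channels[:-1], channels[1:]):
        layers.append(nn.Conv2d(c_in, c_out, kernel_size=3, stride=1, padding=1))
        layers.append(nn.ReLU(inplace=True))       # activation assumed, not specified
    return nn.Sequential(*layers)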
... Dense Layers: These layers perform a final classification based on the features of the image. After each fully connected layer, the ReLU activation function is also used [36][37][38]. ...
... In spite of this, computer vision systems do have some limitations [17]. In challenging environments with poor visibility, they may struggle due to the lack of consistent lighting conditions [18][19][20]. Further, they require significant computational power, which can be a limitation in applications with limited resources [21][22][23][24]. ...
... The training process minimizes a given loss function, enabling the network to learn discriminative representations from the data. The depth of a CNN plays a crucial role in its performance on various tasks, and significant advancements have been achieved by employing very deep models (Simonyan and Zisserman, 2015; He et al., 2015). However, increasing the depth of the network by stacking more blocks and expanding the number of layers introduces challenges such as vanishing or exploding gradients and degradation problems. ...
... To demonstrate the effectiveness of the method in this paper, we conduct a large number of experiments on the CIFAR-10 [33] and ImageNet [34] datasets. CIFAR-10 consists of 50k training samples and 10k test samples across 10 classes; each class contains 6,000 samples, and each sample is a 32 × 32 three-channel RGB image. ...