Xiangyu Zhang's research while affiliated with Microsoft and other places
What is this page?
This page lists the scientific contributions of an author, who either does not have a ResearchGate profile, or has not yet added these contributions to their profile.
It was automatically created by ResearchGate to create a record of this author's body of work. We create such pages to advance our goal of creating and maintaining the most comprehensive scientific repository possible. In doing so, we process publicly available (personal) data relating to the author as a member of the scientific community.
If you're a ResearchGate member, you can follow this page to keep up with this author's work.
If you are this author, and you don't want us to display this page anymore, please let us know.
Publications (11)
Deep residual networks have emerged as a family of extremely deep architectures showing compelling accuracy and nice convergence behaviors. In this paper, we analyze the propagation formulations behind the residual building blocks, which suggest that the forward and backward signals can be directly propagated from one block to any other block, when...
Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide compreh...
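The core idea of learning residual functions with an identity shortcut can be illustrated with a minimal sketch. This is not the paper's convolutional block; it is a toy fully-connected version with hypothetical weight arguments, kept small to show the structure output = ReLU(F(x) + x):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """Toy residual block: F(x) is a two-layer residual function, and the
    identity shortcut x is added before the final activation. The block
    therefore learns the residual F(x) = H(x) - x rather than H(x) itself."""
    f = relu(x @ w1) @ w2   # the residual function F(x)
    return relu(f + x)      # identity shortcut, then activation
```

With zero weights the residual function vanishes and the block reduces to an identity (up to the final ReLU), which is what makes very deep stacks easy to optimize.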
This paper aims to accelerate the test-time computation of convolutional neural networks (CNNs), especially very deep CNNs that have substantially impacted the computer vision community. Unlike existing methods that are designed for approximating linear filters or linear responses, our method takes the nonlinear units into account. We develop an ef...
Most object detectors contain two important components: a feature extractor and an object classifier. The feature extractor has rapidly evolved with significant research efforts leading to better deep ConvNet architectures. The object classifier, however, has not received much attention and most state-of-the-art systems (like R-CNN) use simple mult...
Rectified activation units (rectifiers) are essential for state-of-the-art neural networks. In this work, we study rectifier neural networks for image classification from two aspects. First, we propose a Parametric Rectified Linear Unit (PReLU) that generalizes the traditional rectified unit. PReLU improves model fitting with nearly zero extra comp...
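The PReLU activation described above is simple to state: it is the identity for positive inputs and a learned linear slope for non-positive inputs. A minimal numpy sketch (the slope `a` would be a trained parameter in practice, here passed in directly):

```python
import numpy as np

def prelu(x, a):
    """Parametric ReLU: x for x > 0, a * x otherwise.
    Unlike Leaky ReLU, the slope `a` is learned during training
    rather than fixed by hand; at a = 0 it reduces to plain ReLU."""
    return np.where(x > 0, x, a * x)
```

Because `a` adds only one extra parameter per channel (or one shared scalar), the cost over ReLU is nearly zero, matching the abstract's claim.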
This paper aims to accelerate the test-time computation of deep convolutional neural networks (CNNs). Unlike existing methods that are designed for approximating linear filters or linear responses, our method takes the nonlinear units into account. We minimize the reconstruction error of the nonlinear responses, subject to a low-rank constraint whi...
Existing deep convolutional neural networks (CNNs) require a fixed-size (e.g. 224×224) input image. This requirement is “artificial” and may hurt the recognition accuracy for the images or sub-images of an arbitrary size/scale. In this work, we equip the networks with a more principled pooling strategy, “spatial pyramid pooling”, to eliminate the a...
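The key property of spatial pyramid pooling is that the output length depends only on the channel count and the pyramid levels, not on the input's height or width. A small numpy sketch (max-pooling variant, hypothetical level choice):

```python
import numpy as np

def spatial_pyramid_pool(feat, levels=(1, 2, 4)):
    """Max-pool a (C, H, W) feature map into an n x n grid of bins for each
    pyramid level n, then concatenate all bin maxima. The result has fixed
    length C * sum(n*n for n in levels) regardless of H and W, which removes
    the fixed-input-size requirement described in the abstract."""
    C, H, W = feat.shape
    out = []
    for n in levels:
        for i in range(n):
            for j in range(n):
                h0, h1 = (i * H) // n, -((-(i + 1) * H) // n)  # floor / ceil
                w0, w1 = (j * W) // n, -((-(j + 1) * W) // n)
                out.append(feat[:, h0:h1, w0:w1].max(axis=(1, 2)))
    return np.concatenate(out)
```

Feeding feature maps of different spatial sizes through this pooling yields vectors of identical length, so a fixed-size classifier can follow arbitrarily sized inputs.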
Citations
... Although these models solve the issue of the efficient usage of data, they still suffer from relying on numerical weather models, which are computationally expensive for nowcasting purposes. In [31], a combination of satellite data recorded over Europe and lightning data recorded by ground-based lightning detection networks are used to train a residual U-Net model for lightning activity prediction [32][33][34][35]. In [36], the GLM data in combination with aerosol features are used to train a Gradient-boosted decision tree model called LightGBM to perform hourly forecasts. ...
... Parametric rectified linear unit (PReLU) is a variant of Leaky ReLU. The parameter of PReLU [69] is not set manually but is learned during training, and this learned parameter is the key to improving classification performance. ...
... We implement DETisSeg in PyTorch on a workstation with an NVIDIA RTX A6000 GPU and 64 GB of memory. We use a pre-trained ResNet-50 [42] as our backbone and the AdamW optimizer to train the network. We employ a polynomial learning-rate decay schedule to adjust the learning rate dynamically, which can be calculated by: ...
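The excerpt truncates the actual formula, so it stays unspecified here; for illustration, the commonly used polynomial decay schedule (with a hypothetical power of 0.9) looks like this:

```python
def poly_lr(base_lr, step, max_steps, power=0.9):
    """Common polynomial learning-rate decay (illustrative, not necessarily
    the paper's exact formula): starts at base_lr and decays smoothly to
    zero at max_steps."""
    return base_lr * (1.0 - step / max_steps) ** power
```

The schedule keeps the rate near `base_lr` early on and shrinks it toward zero as training approaches `max_steps`.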
... Weights obtained during training are removed by neural network pruning, which creates more compact and efficient networks that minimize parameters and computation while maintaining performance [11][12][13][14][15][16]. Other neural network compression techniques include distillation [19,20], parameter quantization [17,18], and weight decomposition [7][8][9][10]. Pruning is the easiest of these methods to implement and can be compared against the others. ...
... ResNets (He et al., 2015, 2016) have been found to be effective deep learning models for road network pattern classification. We choose ResNet-34, which performs the best among the ResNet family in our preliminary experiments, as the final model architecture (see details in Supplementary Note. ...
... As shown in Figure 10, we utilize Widar3 and WiSee as the common HGR models; they depend on the body-coordinate velocity profile (BVP) and Doppler frequency shift (DFS) features, respectively. For our sensing sub-model F, its gating model g uses a standard residual block [52] followed by 4 fully-connected layers, with V = 8 for our current design. Each of the 8 encoders consists of 4 consecutive convolutional layers with kernel size 3, stride 1, padding 1, and output channels of 32, 64, 64 and 128, respectively. ...
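As a quick shape check on the encoder configuration quoted above, kernel 3 with stride 1 and padding 1 preserves spatial resolution at every layer, so only the channel count (32, 64, 64, 128) changes through the stack. A small sketch with a hypothetical input resolution:

```python
def conv_out_size(n, kernel=3, stride=1, padding=1):
    """Standard conv output-size formula: (n + 2p - k) // s + 1.
    With k=3, s=1, p=1 the spatial size n is preserved exactly."""
    return (n + 2 * padding - kernel) // stride + 1

size = 20                       # hypothetical input resolution
for channels in (32, 64, 64, 128):
    size = conv_out_size(size)  # unchanged at every layer
```

This is why the excerpt can specify the four output channel counts without mentioning any spatial downsampling.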
... Dense Layers: These layers perform a final classification based on the features of the image. After each fully connected layer, the ReLU activation function is also used [36][37][38]. ...
... In spite of this, computer vision systems do have some limitations [17]. In challenging environments with poor visibility, they may struggle due to the lack of consistent lighting conditions [18][19][20]. Further, they require significant computational power, which can be a limitation in applications with limited resources [21][22][23][24]. ...
... The training process minimizes a given loss function, enabling the network to learn discriminative representations from the data. The depth of a CNN plays a crucial role in its performance on various tasks and significant advancements have been achieved by employing very deep models (Simonyan and Zisserman, 2015;He et al., 2015). However, increasing the depth of the network by stacking more blocks and expanding the number of layers introduces challenges such as dealing with vanishing or exploding gradient and degradation problems. ...
... To prove the effectiveness of the method in this paper, we conduct a large number of experiments on the CIFAR-10 [33] and ImageNet [34] datasets. CIFAR-10 consists of 50k training samples and 10k testing samples across 10 classes; each class contains 6,000 samples, and each sample is a 32 × 32 three-channel RGB image. ...