Shaoqing Ren's research while affiliated with Microsoft and other places
What is this page?
This page lists the scientific contributions of an author, who either does not have a ResearchGate profile, or has not yet added these contributions to their profile.
It was automatically created by ResearchGate to create a record of this author's body of work. We create such pages to advance our goal of creating and maintaining the most comprehensive scientific repository possible. In doing so, we process publicly available (personal) data relating to the author as a member of the scientific community.
Publications (15)
Deep residual networks have emerged as a family of extremely deep architectures showing compelling accuracy and nice convergence behaviors. In this paper, we analyze the propagation formulations behind the residual building blocks, which suggest that the forward and backward signals can be directly propagated from one block to any other block, when...
Fully convolutional networks (FCNs) have been proven very successful for semantic segmentation, but the FCN outputs are unaware of object instances. In this paper, we develop FCNs that are capable of proposing instance-level segment candidates. In contrast to the previous FCN that generates one score map, our FCN is designed to compute a small set...
State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations. Advances like SPPnet [7] and Fast R-CNN [5] have reduced the running time of these detection networks, exposing region proposal computation as a bottleneck. In this work, we introduce a Region Proposal Network (RPN) that shares full-ima...
Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide compreh...
State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations. Advances like SPPnet and Fast R-CNN have reduced the running time of these detection networks, exposing region proposal computation as a bottleneck. In this work, we introduce a Region Proposal Network (RPN) that shares full-image convol...
Most object detectors contain two important components: a feature extractor and an object classifier. The feature extractor has rapidly evolved with significant research efforts leading to better deep ConvNet architectures. The object classifier, however, has not received much attention and most state-of-the-art systems (like R-CNN) use simple mult...
Rectified activation units (rectifiers) are essential for state-of-the-art neural networks. In this work, we study rectifier neural networks for image classification from two aspects. First, we propose a Parametric Rectified Linear Unit (PReLU) that generalizes the traditional rectified unit. PReLU improves model fitting with nearly zero extra comp...
This paper presents a highly efficient, very accurate regression approach for face alignment. Our approach has two novel components: a set of local binary features, and a locality principle for learning those features. The locality principle guides us to learn a set of highly discriminative local binary features for each facial landmark independent...
We present a new state-of-the-art approach for face detection. The key idea is to combine face alignment with detection, observing that aligned face shapes provide better features for face classification. To make this combination more effective, our approach learns the two tasks jointly in the same cascade framework, by exploiting recent advances i...
Existing deep convolutional neural networks (CNNs) require a fixed-size (e.g. 224×224) input image. This requirement is “artificial” and may hurt the recognition accuracy for the images or sub-images of an arbitrary size/scale. In this work, we equip the networks with a more principled pooling strategy, “spatial pyramid pooling”, to eliminate the a...
Citations
... The architecture of our proposed approach can be seen in Figure 1. We select a pretrained ResNet-152 [19,20] object-detection backbone for our network under the hypothesis that the semantic features extracted by such a network are relevant for complexity perception. We truncate the backbone before the final classification layer, with extracted feature tensor $R \in \mathbb{R}^{H \times W \times C}$, where H, W, C are the height, width, and channels of the feature tensor; in this case, 7×7×2048. ...
... Starting from the graph convolutional layers, after each layer batch normalization is applied as well as dropout [75] (fraction of 0.2 for all layers except the last one where the fraction is 0.1). All layers employ 256 trainable nodes and PReLU [76] activation functions, except the output layer using a sigmoid or a linear activation function depending on the task (classification or regression). A schematic representation is given in Fig. 2. The same preprocessing steps are performed as for the FCN. ...
... Multibox Detector (SSD) [123], You Only Look Once (YOLO) [91], Region-based Convolutional Neural Network (R-CNN) [92], Fast R-CNN [93], Faster R-CNN [94] and Mask R-CNN [95]. These person detection methods generate a bounding box around the detected person in the frame. ...
... Our research is divided into three sections and the workflow is depicted in Fig. 1. To begin, this study utilizes Resnet50 [15] Architecture, a supervised machine learning technique for separating photos with cracks from those without cracks. After classification, the model utilized the Yolo v5 [16] object detection algorithm to further analyze the damage i.e., linear and branching. ...
... Random forest regression. One of the most common machine learning methods is a random forest (RF) algorithm [49]. This is a controlled approach that employs a regression method for learning. ...
... Building on this approach, Yu et al. [22] used multiple atrous convolutional layers with different dilation rates to model the multi-scale context. In recent years, atrous convolution techniques have also been widely used in various deep learning tasks, such as object detection [24] and semantic segmentation [25]. In this paper, we introduce atrous convolution [26] into the VO task for the first time, and we use densely linked multi-layer atrous convolutions to capture multi-scale information in images. ...
... Recent studies show that deepening network depth and widening network width can improve the performance of convolutional neural networks. In terms of deepening network depth, He et al. [6] proposed a ResNet network with 152 layers, which achieved the most advanced performance in ILSVRC multi-task in 2015. In terms of widening network width, WRN network proposed by Zagoruyko et al. [7] reduced the depth and increased the width of ResNet and achieved good performance. ...
... The skeleton generator in SkelGAN output a font character skeleton with a one-pixel width structure and do not need any post-processing techniques. (Dai et al. [2016]). To solve this problem, some researchers transform pretrained deep classifiers into FCNs (Long et al. [2014]). ...
... This idea eliminated two major problems (a) the overfitting problem that enlarged networks are prone to; (b) increased use of computational resources due to uniformly increased network size. InceptionResNetV2 proposed by Christian Szegedy et al. was formulated based on the structure of the Inception network and the residual connections [22] that replaced the filter concatenation stage of the Inception architecture. These residual connections not only overcame the degradation problem caused due to increasing depth of structures but also reduced the training time [23]. ...
... The paper's limitation is that they cannot achieve more precise results without going through the process of systematic analysis. Chen et al. [9] collaborated on the alignment and detection of pixel value difference features using a random forest based on pixel value difference features. Multiple CNNs are used by Zhang et al. [10], but the performance of multi-view face identification is still restricted by the weak face detector's detection windows. ...