Shaoqing Ren's research while affiliated with Microsoft and other places

Publications (15)

Conference Paper
Deep residual networks have emerged as a family of extremely deep architectures showing compelling accuracy and nice convergence behaviors. In this paper, we analyze the propagation formulations behind the residual building blocks, which suggest that the forward and backward signals can be directly propagated from one block to any other block, when...
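The propagation property summarized above can be written out as follows. This is a sketch that assumes identity shortcut connections and an identity after-addition mapping; those conditions come from the full paper, not from the truncated abstract. Here $x_l$ is the input to block $l$, $\mathcal{F}$ is the residual function with weights $\mathcal{W}_l$, and $\mathcal{E}$ is the loss:

```latex
x_{l+1} = x_l + \mathcal{F}(x_l, \mathcal{W}_l)
\quad\Longrightarrow\quad
x_L = x_l + \sum_{i=l}^{L-1} \mathcal{F}(x_i, \mathcal{W}_i)

\frac{\partial \mathcal{E}}{\partial x_l}
  = \frac{\partial \mathcal{E}}{\partial x_L}
    \left( 1 + \frac{\partial}{\partial x_l} \sum_{i=l}^{L-1} \mathcal{F}(x_i, \mathcal{W}_i) \right)
```

The constant term 1 means the gradient at any deeper block $L$ reaches any shallower block $l$ directly, independently of the weight layers in between.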
Conference Paper
Fully convolutional networks (FCNs) have been proven very successful for semantic segmentation, but the FCN outputs are unaware of object instances. In this paper, we develop FCNs that are capable of proposing instance-level segment candidates. In contrast to the previous FCN that generates one score map, our FCN is designed to compute a small set...
Article
Fully convolutional networks (FCNs) have been proven very successful for semantic segmentation, but the FCN outputs are unaware of object instances. In this paper, we develop FCNs that are capable of proposing instance-level segment candidates. In contrast to the previous FCN that generates one score map, our FCN is designed to compute a small set...
Article
Deep residual networks have emerged as a family of extremely deep architectures showing compelling accuracy and nice convergence behaviors. In this paper, we analyze the propagation formulations behind the residual building blocks, which suggest that the forward and backward signals can be directly propagated from one block to any other block, when...
Conference Paper
State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations. Advances like SPPnet [7] and Fast R-CNN [5] have reduced the running time of these detection networks, exposing region proposal computation as a bottleneck. In this work, we introduce a Region Proposal Network (RPN) that shares full-ima...
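In the full Faster R-CNN paper, the RPN scores a fixed set of reference boxes ("anchors") centred at every position of the shared convolutional feature map. The paper's default is 3 scales × 3 aspect ratios, i.e. k = 9 anchors per position. The minimal sketch below only generates those anchors. The function names, the pixel scales (128, 256, 512), and the feature stride of 16 are assumptions chosen for illustration. The classification and box-regression heads are omitted.

```python
import itertools


def generate_anchors(center_x, center_y, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return (x1, y1, x2, y2) anchors centred at (center_x, center_y).

    Each anchor has area scale**2 and height / width == ratio (assumed convention).
    """
    anchors = []
    for scale, ratio in itertools.product(scales, ratios):
        w = scale / ratio ** 0.5  # width shrinks as the box gets taller
        h = scale * ratio ** 0.5  # so w * h stays equal to scale**2
        anchors.append((center_x - w / 2, center_y - h / 2,
                        center_x + w / 2, center_y + h / 2))
    return anchors


def anchors_for_feature_map(feat_h, feat_w, stride=16, **kwargs):
    """Tile anchors over every cell of a feat_h x feat_w feature map (hypothetical helper)."""
    boxes = []
    for row in range(feat_h):
        for col in range(feat_w):
            # Map the feature-map cell back to the centre of its receptive region in the image.
            cx = (col + 0.5) * stride
            cy = (row + 0.5) * stride
            boxes.extend(generate_anchors(cx, cy, **kwargs))
    return boxes
```

For a 2 × 3 feature map this yields 2 · 3 · 9 = 54 anchors. Because the anchors are defined relative to feature-map positions, the same set of reference boxes is shared by every image.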
Article
Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide compreh...
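The reformulation described above (stacked layers learn a residual $\mathcal{F}(x)$ and an identity shortcut adds $x$ back) can be illustrated with a tiny residual block. This is a minimal sketch on plain Python lists. It assumes a two-layer residual function $\mathcal{F}(x) = W_2\,\mathrm{ReLU}(W_1 x)$ with a ReLU after the addition. Biases and batch normalization are omitted, and the function names are hypothetical.

```python
def relu(v):
    """Element-wise rectifier."""
    return [max(0.0, x) for x in v]


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def residual_block(x, W1, W2):
    """Compute relu(F(x) + x) with F(x) = W2 @ relu(W1 @ x).

    The weight layers only have to learn the residual F; the shortcut
    passes x through unchanged and adds it to F's output.
    """
    f = matvec(W2, relu(matvec(W1, x)))
    return relu([fi + xi for fi, xi in zip(f, x)])
```

If the optimal mapping is close to identity, the block only needs to drive $W_2$ toward zero. For example, `residual_block([1.0, 2.0], [[1, 0], [0, 1]], [[0, 0], [0, 0]])` returns `[1.0, 2.0]`.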
Article
State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations. Advances like SPPnet and Fast R-CNN have reduced the running time of these detection networks, exposing region proposal computation as a bottleneck. In this work, we introduce a Region Proposal Network (RPN) that shares full-image convol...
Article
Most object detectors contain two important components: a feature extractor and an object classifier. The feature extractor has rapidly evolved with significant research efforts leading to better deep ConvNet architectures. The object classifier, however, has not received much attention and most state-of-the-art systems (like R-CNN) use simple mult...
Article
Rectified activation units (rectifiers) are essential for state-of-the-art neural networks. In this work, we study rectifier neural networks for image classification from two aspects. First, we propose a Parametric Rectified Linear Unit (PReLU) that generalizes the traditional rectified unit. PReLU improves model fitting with nearly zero extra comp...
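PReLU replaces the fixed zero slope of ReLU on negative inputs with a learnable coefficient $a$: $f(y) = \max(0, y) + a \min(0, y)$. The minimal sketch below shows the activation, its gradient with respect to $a$, and a channel-wise variant with one coefficient per channel. Plain Python lists and the function names are assumptions made for illustration.

```python
def prelu(y, a):
    """PReLU: identity for y > 0, slope a for y <= 0."""
    return max(0.0, y) + a * min(0.0, y)


def prelu_grad_a(y):
    """Derivative of prelu(y, a) with respect to a: 0 for y > 0, y otherwise."""
    return 0.0 if y > 0 else y


def prelu_channels(feature_map, slopes):
    """Apply PReLU channel-wise: feature_map is a list of channels, one slope per channel."""
    return [[prelu(v, a) for v in channel] for channel, a in zip(feature_map, slopes)]
```

Setting $a = 0$ recovers ReLU, and a small fixed $a$ gives Leaky ReLU. PReLU differs by learning $a$ through backpropagation. The channel-wise form adds only one parameter per channel, which is why the extra cost is nearly zero.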
Conference Paper
Rectified activation units (rectifiers) are essential for state-of-the-art neural networks. In this work, we study rectifier neural networks for image classification from two aspects. First, we propose a Parametric Rectified Linear Unit (PReLU) that generalizes the traditional rectified unit. PReLU improves model fitting with nearly zero extra comp...
Article
This paper presents a highly efficient, very accurate regression approach for face alignment. Our approach has two novel components: a set of local binary features, and a locality principle for learning those features. The locality principle guides us to learn a set of highly discriminative local binary features for each facial landmark independent...
Conference Paper
We present a new state-of-the-art approach for face detection. The key idea is to combine face alignment with detection, observing that aligned face shapes provide better features for face classification. To make this combination more effective, our approach learns the two tasks jointly in the same cascade framework, by exploiting recent advances i...
Article
Existing deep convolutional neural networks (CNNs) require a fixed-size (e.g. 224×224) input image. This requirement is “artificial” and may hurt the recognition accuracy for the images or sub-images of an arbitrary size/scale. In this work, we equip the networks with a more principled pooling strategy, “spatial pyramid pooling”, to eliminate the a...
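Spatial pyramid pooling removes the fixed-size requirement by max-pooling each feature map into a fixed number of bins at several grid levels. The output length therefore depends only on the number of channels and bins, not on the input height and width. The minimal sketch below assumes pyramid levels of 1 × 1, 2 × 2, and 4 × 4 bins. It uses floor/ceil bin boundaries in the style of adaptive pooling, which may differ slightly from the window and stride formula in the paper. Function names are hypothetical.

```python
import math


def max_pool_bins(channel, n):
    """Max-pool one H x W channel (list of rows) into an n x n grid of bins."""
    h, w = len(channel), len(channel[0])
    out = []
    for i in range(n):
        # floor/ceil boundaries guarantee every bin is non-empty, even when h < n
        r0, r1 = math.floor(i * h / n), math.ceil((i + 1) * h / n)
        for j in range(n):
            c0, c1 = math.floor(j * w / n), math.ceil((j + 1) * w / n)
            out.append(max(channel[r][c] for r in range(r0, r1) for c in range(c0, c1)))
    return out


def spatial_pyramid_pool(feature_maps, levels=(1, 2, 4)):
    """Concatenate bin maxima for all channels and levels into one fixed-length vector."""
    vec = []
    for n in levels:
        for channel in feature_maps:
            vec.extend(max_pool_bins(channel, n))
    return vec
```

With a single channel, a 5 × 7 map and a 13 × 10 map both produce 1 + 4 + 16 = 21 values. A fully connected layer placed after this pooling step therefore accepts images of any size.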

Citations

... The architecture of our proposed approach can be seen in Figure 1. We select a pretrained ResNet-152 [19,20] object-detection backbone for our network under the hypothesis that the semantic features extracted by such a network are relevant for complexity perception. We truncate the backbone before the final classification layer, which yields a feature tensor $R \in \mathbb{R}^{H \times W \times C}$, where H, W, and C are the height, width, and number of channels of the feature tensor; in this case, 7×7×2048. ...
... Starting from the graph convolutional layers, after each layer batch normalization is applied as well as dropout [75] (fraction of 0.2 for all layers except the last one where the fraction is 0.1). All layers employ 256 trainable nodes and PReLU [76] activation functions, except the output layer using a sigmoid or a linear activation function depending on the task (classification or regression). A schematic representation is given in Fig. 2. The same preprocessing steps are performed as for the FCN. ...
... Multibox Detector (SSD) [123], You Only Look Once (YOLO) [91], Region-based Convolutional Neural Network (R-CNN) [92], Fast R-CNN [93], Faster R-CNN [94] and Mask R-CNN [95]. These person detection methods generate a bounding box around the detected person in the frame. ...
... Our research is divided into three sections, and the workflow is depicted in Fig. 1. First, this study uses the ResNet50 [15] architecture, a convolutional neural network trained in a supervised manner, to separate photos with cracks from those without cracks. After classification, the YOLOv5 [16] object-detection algorithm is used to further analyze the damage, i.e., to distinguish linear from branching cracks. ...
... Random forest regression. One of the most common machine learning methods is the random forest (RF) algorithm [49]. It is a supervised approach that learns an ensemble of regression trees. ...
... Building on this approach, Yu et al. [22] used multiple atrous convolutional layers with different dilation rates to model multi-scale context. In recent years, atrous convolution has also been widely used in various deep learning tasks, such as object detection [24] and semantic segmentation [25]. In this paper, we introduce atrous convolution [26] into the VO task for the first time, using densely connected multi-layer atrous convolutions to capture multi-scale information in images. ...
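The passage above relies on atrous (dilated) convolution: the kernel taps are spaced `rate` samples apart, so a k-tap kernel covers $(k-1)\cdot\text{rate} + 1$ input samples without adding parameters. Running the same kernel at several rates therefore yields features at several scales. The minimal sketch below is 1-D with "valid" padding and uses cross-correlation, as most deep learning libraries do. The function names and rates are hypothetical. It does not reproduce the cited paper's 2-D, densely connected design.

```python
def atrous_conv1d(signal, kernel, rate):
    """'Valid' 1-D dilated convolution (cross-correlation) with dilation `rate`."""
    span = (len(kernel) - 1) * rate + 1  # receptive field of one output sample
    return [
        sum(w * signal[start + t * rate] for t, w in enumerate(kernel))
        for start in range(len(signal) - span + 1)
    ]


def multi_rate_features(signal, kernel, rates=(1, 2, 3)):
    """Apply one kernel at several dilation rates to gather multi-scale context."""
    return {r: atrous_conv1d(signal, kernel, r) for r in rates}
```

With a 3-tap kernel, rates 1, 2, and 3 give receptive fields of 3, 5, and 7 samples, while each layer still has only three weights.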
... Recent studies show that increasing network depth and width can improve the performance of convolutional neural networks. On the depth side, He et al. [6] proposed a 152-layer ResNet, which achieved state-of-the-art performance on multiple ILSVRC 2015 tasks. On the width side, the Wide Residual Network (WRN) proposed by Zagoruyko et al. [7] decreased the depth and increased the width of ResNet, and achieved good performance. ...
... The skeleton generator in SkelGAN outputs a font-character skeleton with a one-pixel-wide structure and does not need any post-processing. (Dai et al. [2016]). To solve this problem, some researchers transform pretrained deep classifiers into FCNs (Long et al. [2014]). ...
... This idea addressed two major problems: (a) the overfitting to which enlarged networks are prone, and (b) the increased use of computational resources caused by uniformly increasing network size. InceptionResNetV2, proposed by Christian Szegedy et al., builds on the Inception architecture, with residual connections [22] replacing its filter-concatenation stage. These residual connections not only overcame the degradation problem caused by increasing network depth but also reduced training time [23]. ...
... The limitation of that paper is that it cannot achieve more precise results without a systematic analysis. Chen et al. [9] performed face alignment and detection jointly, using a random forest based on pixel-value-difference features. Zhang et al. [10] used multiple CNNs, but the performance of their multi-view face detection is still limited by the detection windows of the weak face detector. ...