Shifeng Chen's research while affiliated with Chinese Academy of Sciences and other places

Publications (24)

Preprint
Neural radiance fields have made a remarkable breakthrough in novel view synthesis for static 3D scenes. However, in the 4D setting (e.g., dynamic scenes), the performance of existing methods is still limited by the capacity of the neural network, typically a multilayer perceptron (MLP). In this paper, we present the m...
Preprint
In this paper, a computationally efficient regression framework is presented for estimating the 6D pose of rigid objects from a single RGB-D image, which is applicable to handling symmetric objects. This framework is designed with a simple architecture that efficiently extracts point-wise features from RGB-D data using a fully convolutional network, call...
Preprint
One-class novelty detection is conducted to identify anomalous instances, with different distributions from the expected normal instances. In this paper, the Generative Adversarial Network based on the Encoder-Decoder-Encoder scheme (EDE-GAN) achieves state-of-the-art performance. The two factors below serve the above purpose: 1) The EDE-GAN calc...
Article
We propose a machine learning framework to solve the lithology classification problem from well log curves by incorporating an additional geological constraint. The constraint is a stratigraphic unit, and we use it as an additional feature. This method demonstrates the possibility of solving the lithology identification problem from a multi-scale da...
Preprint
The detection of traffic anomalies is a critical component of the intelligent city transportation management system. Previous works have proposed a variety of notable insights and taken a step forward in this field; however, dealing with the complex traffic environment remains a challenge. Moreover, the lack of high-quality data and the complexity...
Article
For portable devices with limited resources, it is often difficult to deploy deep networks due to the prohibitive computational overhead. Numerous approaches have been proposed to quantize weights and/or activations to speed up the inference. Loss-aware quantization has been proposed to directly formulate the impact of weight quantization on the mo...
Article
With the advantages of low storage cost and extremely fast retrieval speed, deep hashing methods have attracted much attention for image retrieval recently. However, large-scale face image retrieval with significant intra-class variations is still challenging. Neither existing pairwise/triplet labels-based nor softmax classification loss-based deep...
Preprint
One-class novelty detection aims to identify anomalous instances that do not conform to the expected normal instances. In this paper, Generative Adversarial Networks (GANs) based on an encoder-decoder-encoder pipeline are used for detection and achieve state-of-the-art performance. However, deep neural networks are too over-parameterized to deploy o...
Conference Paper
One-class novelty detection aims to identify anomalous instances that do not conform to the expected normal instances. In this paper, Generative Adversarial Networks (GANs) based on an encoder-decoder-encoder pipeline are used for detection and achieve state-of-the-art performance. However, deep neural networks are too over-parameterized to deploy o...
Article
Recently, convolutional neural networks (CNNs) have achieved great improvements in single image dehazing and attracted much research attention. Most existing learning-based dehazing methods are not fully end-to-end and still follow the traditional dehazing procedure: first estimate the medium transmission and the atmospheric light, then recove...
Preprint
Though action recognition in videos has achieved great success recently, it remains a challenging task due to the massive computational cost. Designing lightweight networks is a possible solution, but it may degrade the recognition performance. In this paper, we innovatively propose a general dynamic inference idea to improve inference efficiency b...
Preprint
Recently, convolutional neural networks (CNNs) have achieved great improvements in single image dehazing and attracted much research attention. Most existing learning-based dehazing methods are not fully end-to-end and still follow the traditional dehazing procedure: first estimate the medium transmission and the atmospheric light, then recove...
Article
Automated precise segmentation of glands from histological images plays an important role in glandular morphology analysis, which is a crucial criterion for cancer grading and treatment planning. However, it is non-trivial due to the diverse shapes of the glands under different histological grades and the presence of tightly connected glands...
Article
Cross-modal hashing has received intensive attention due to its low computational cost and storage efficiency in cross-modal retrieval tasks. Most previous cross-modal hashing methods mainly focus on extracting correlated binary codes from the pairwise label, but largely ignore the semantic categories of cross-modal data. On the other hand, human perceptio...
Preprint
Video recognition has drawn great research interest, and great progress has been made. A suitable frame sampling strategy can improve the accuracy and efficiency of recognition. However, mainstream solutions generally adopt hand-crafted frame sampling strategies for recognition. This could degrade the performance, especially in untrimmed videos, due t...
Article
Object detection methods based on neural networks have made considerable progress. However, methods like Faster RCNN and SSD adopt large neural networks as their base models, and it is still a challenge to deploy such large detection networks on mobile or embedded devices. In this paper, we propose a low bit-width weight optimization approach to trai...
Preprint
Convolutional neural networks have been widely used in content-based image retrieval. To better deal with large-scale data, the deep hashing model is proposed as an effective method, which maps an image to a binary code that can be used for hashing search. However, most existing deep hashing models only utilize fine-level semantic labels or convert...
Article
Deep supervised hashing has emerged as an influential solution to large-scale semantic image retrieval problems in computer vision. In the light of recent progress, convolutional neural network based hashing methods typically seek pair-wise or triplet labels to conduct the similarity preserving learning. However, complex semantic concepts of visual...
Article
Deep distance metric learning (DDML), which is proposed to learn image similarity metrics in an end-to-end manner based on the convolutional neural network, has achieved encouraging results in many computer vision tasks. $L_2$-normalization in the embedding space has been used to improve the performance of several DDML methods. However, the commonly us...

Citations

... The most relevant work is the trespassing detection at the grade crossing [34], the goal of which is to detect the unsafe trespassing of vehicles and pedestrians, although how to localize the anomalies has not been presented. Research works such as [37,38] mainly focus on anomalous vehicle behaviors and traffic accident detection at crossings or highways. They are different from the present research because they directly use surveillance videos as the model input, and their datasets only contain vehicles. ...
... Currently, manual logging by geologists and logging interpretation by geophysicists are the two main ways of lithology identification in the exploration of sandstone-type uranium deposits. In most cases, the logging interpretations are corrected by geological logging, or both are verified mutually [4,5]. Geological logging is commonly of high accuracy and strong reliability but low efficiency. ...
... However, deep neural networks with high computational costs and large storage prohibit their deployment to computation and memory resource limited systems. For tackling the above issue, neural network compression has been widely applied in recent years [8], [34], [35]. As one of the mainstream compression methods, Knowledge Distillation (KD) following a teacher-student paradigm transfers knowledge from a teacher network with higher performance to a student network. ...
... Medical images are particularly challenging due to variability in image acquisition protocols, anatomical variability and high dimensionality while biometric images are challenging due to various factors like illumination variation, sensor inter-operability, aging and many more. Very recently in [55] a deep center dual constrained hashing framework has been proposed to learn discriminative face feature vectors. The proposed approach [55] minimizes the intra-class variance by clustering within class samples into a learnable class center. ...
... Reducing spatio-temporal redundancy for efficient video analysis has recently been a popular research topic. The mainstream approaches mostly train an additional lightweight network to achieve: (i) adaptive frame selection [12]-[14], [16], [44], i.e., dynamically determining the relevant frames for the recognition networks; (ii) adaptive frame resolution [12], i.e., learning an optimal resolution for each frame online; (iii) early stopping [45], i.e., terminating the inference process before observing all frames; (iv) adaptive spatio-temporal regions [10], [11], i.e., localizing the most task-relevant spatio-temporal regions; (v) adaptive network architectures [15], [16], [46], i.e., adjusting the network architecture to save computation on less informative features. Another line is to manually define low redundant sampling rules, such as MGSampler [47], which selects frames containing rich motion information by the cumulative motion distribution. ...
... It is often prohibitive to collect a representative set of anomalous samples. As a result, many studies [8]-[10] have resorted to learning in the unsupervised setting, i.e., training with normal samples only. ...
... Yang et al. [32] proposed the DisentGAN which employed three generators to produce dehazing images, transmission maps, and ambient light from the hazy input, respectively. Dong et al. [33] proposed an end-to-end GAN with fusion-discriminator (FD-GAN) which takes the frequency information of the image as additional priors and integrates them into the discriminator. ...
... Most previous systems relied on preprocessing to extract features for deep learning structures [34][35][36][37]. Only a few of these systems used end-to-end learning, allowing automatic extraction of features from images without requiring expert feature detection [38][39][40][41]. However, the information essential for clinical decision-making based on these architectures is often hidden in high-dimensional spaces and is not comprehensible to humans. ...
... Attention-aware deep adversarial hashing [17] utilizes adversarial mechanisms to focus on important image areas and text fragments for generating hash codes. Dual supervised attention deep hashing [18] uses an attention mechanism to narrow the semantic gap between the image features and text features and focuses on the relevant correlations between them. Self-constraining and attention hashing [19] adopts an attention mechanism to merge the hash representations of different layers in the network to generate high-quality hash codes. ...