Chunhua Shen

Chunhua Shen
University of Adelaide · School of Computer Science

PhD

About

484
Publications
128,891
Reads
How we measure 'reads'
A 'read' is counted each time someone views a publication summary (such as the title, abstract, and list of authors), clicks on a figure, or views or downloads the full-text. Learn more
32,850
Citations

Publications

Publications (484)
Article
Monocular visual odometry (VO) is an important task in robotics and computer vision. Thus far, how to build accurate and robust monocular VO systems that can work well in diverse scenarios remains largely unsolved. In this article, we propose a framework to exploit monocular depth estimation for improving VO. The core of our framework is a monocula...
Article
Full-text available
In this paper, we propose to train binarized convolutional neural networks (CNNs) that are of significant importance for deploying deep learning to mobile devices with limited power capacity and computing resources. Previous works on quantizing CNNs often seek to approximate the floating-point information of weights and/or activations using a set o...
Preprint
Full-text available
Point annotations are considerably more time-efficient than bounding box annotations. However, how to use cheap point annotations to boost the performance of semi-supervised object detection remains largely unsolved. In this work, we present Point-Teaching, a weakly semi-supervised object detection framework to fully exploit the point annotations....
Preprint
Full-text available
The current state-of-the-art methods in 3D instance segmentation typically involve a clustering step, despite the tendency towards heuristics, greedy algorithms, and a lack of robustness to the changes in data statistics. In contrast, we propose a fully-convolutional 3D point cloud instance segmentation method that works in a per-point prediction f...
Preprint
Despite most existing anomaly detection studies assume the availability of normal training samples only, a few labeled anomaly examples are often available in many real-world applications, such as defect samples identified during random quality inspection, lesion images confirmed by radiologists in daily medical screening, etc. These anomaly exampl...
Preprint
Full-text available
Most existing real-time deep models trained with each frame independently may produce inconsistent results across the temporal axis when tested on a video sequence. A few methods take the correlations in the video sequence into account,e.g., by propagating the results to the neighboring frames using optical flow or extracting frame representations...
Article
Arbitrarily shaped scene text detection has witnessed great development in recent years, and text detection using segmentation has been proven to an effective approach. However, problems caused by the diverse attributes of text instances, such as shapes, scales, and presentation styles (dense or sparse), persist. In this paper, we propose a novel t...
Preprint
Full-text available
We propose a direct, regression-based approach to 2D human pose estimation from single images. We formulate the problem as a sequence prediction task, which we solve using a Transformer network. This network directly learns a regression mapping from images to the keypoint coordinates, without resorting to intermediate representations such as heatma...
Preprint
Full-text available
Deep image matting methods have achieved increasingly better results on benchmarks (e.g., Composition-1k/alphamatting.com). However, the robustness, including robustness to trimaps and generalization to images from different domains, is still under-explored. Although some works propose to either refine the trimaps or adapt the algorithms to real-wo...
Article
Region-level representation learning plays a key role in providing discriminative information for person retrieval. Current methods rely on heuristically coarse-grained region strips or directly borrow pixel-level annotations from pretrained human parsing models for region representation learning. How to learn a discriminative region representation...
Article
Full-text available
Ghosting artifacts caused by moving objects and misalignments are a key challenge in constructing high dynamic range (HDR) images. Current methods first register the input low dynamic range (LDR) images using optical flow before merging them. This process is error-prone, and often causes ghosting in the resulting merged image. We propose a novel du...
Article
Full-text available
Recently, much attention has been spent on neural architecture search (NAS), aiming to outperform those manually-designed neural architectures on high-level vision recognition tasks. Inspired by the success, here we attempt to leverage NAS techniques to automatically design efficient network architectures for low-level image restoration tasks. In p...
Article
Accurate gland segmentation in histology tissue images is a critical but challenging task. Although deep models have demonstrated superior performance in medical image segmentation, they commonly require a large amount of annotated data, which are hard to obtain due to the extensive labor costs and expertise required. In this paper, we propose an i...
Article
Monocular depth estimation using CNNs trained from unlabelled videos has shown significant promise. However, excellent results have mostly been obtained in street-scene driving scenarios, and such methods often fail in other settings, particularly indoor videos taken by handheld devices. In this work, we establish that the complex ego-motions exhib...
Preprint
Full-text available
Almost all scene text spotting (detection and recognition) methods rely on costly box annotation (e.g., text-line box, word-level box, and character-level box). For the first time, we demonstrate that training scene text spotting models can be achieved with an extremely low-cost annotation of a single-point for each instance. We propose an end-to-e...
Article
Full-text available
Neural Architecture Search (NAS) has shown great potential in effectively reducing manual effort in network design by automatically discovering optimal architectures. What is noteworthy is that as of now, object detection is less touched by NAS algorithms despite its significant importance in computer vision. To the best of our knowledge, most of t...
Article
Full-text available
Low-level details and high-level semantics are both essential to the semantic segmentation task. However, to speed up the model inference, current approaches almost always sacrifice the low-level details, leading to a considerable decrease in accuracy. We propose to treat these spatial details and categorical semantics separately to achieve high ac...
Preprint
Neural Architecture Search (NAS) has shown great potential in effectively reducing manual effort in network design by automatically discovering optimal architectures. What is noteworthy is that as of now, object detection is less touched by NAS algorithms despite its significant importance in computer vision. To the best of our knowledge, most of t...
Preprint
Few-shot learning aims to adapt knowledge learned from previous tasks to novel tasks with only a limited amount of labeled data. Research literature on few-shot learning exhibits great diversity, while different algorithms often excel at different few-shot learning scenarios. It is therefore tricky to decide which learning strategies to use under d...
Article
Compared to many other dense prediction tasks, e.g., semantic segmentation, it is the arbitrary number of instances that has made instance segmentation much more challenging. In order to predict a mask for each instance, mainstream approaches either follow the detect-then-segment strategy (e.g., Mask R-CNN), or predict embedding vectors first then...
Article
Full-text available
We propose a monocular depth estimation method SC-Depth, which requires only unlabelled videos for training and enables the scale-consistent prediction at inference time. Our contributions include: (i) we propose a geometry consistency loss, which penalizes the inconsistency of predicted depths between adjacent views; (ii) we propose a self-discove...
Article
Text-spotting, which aims to integrate detection and recognition in a unified framework, has attracted increasing attention due to its simplicity between the two tasks. Here, we address the arbitrarily-shaped scene text spotting by presenting Adaptive Bezier Curve Network v2 (ABCNet v2). Our contributions are four-fold: 1) For the first time, we ad...
Preprint
Existing anomaly detection paradigms overwhelmingly focus on training detection models using exclusively normal data or unlabeled data (mostly normal samples). One notorious issue with these approaches is that they are weak in discriminating anomalies from normal samples due to the lack of the knowledge about the anomalies. Here, we study the probl...
Preprint
Full-text available
Compared to many other dense prediction tasks, e.g., semantic segmentation, it is the arbitrary number of instances that has made instance segmentation much more challenging. In order to predict a mask for each instance, mainstream approaches either follow the 'detect-then-segment' strategy (e.g., Mask R-CNN), or predict embedding vectors first the...
Article
This paper tackles the problem of training a deep convolutional neural network of both low-bitwidth weights and activations. Optimizing a low-precision network is challenging due to the non-differentiability of the quantizer, which may result in substantial accuracy loss. To address this, we propose three practical approaches, including (i) progres...
Article
Full-text available
Multi-orientation scene text detection has recently gained significant research attention. Previous methods directly predict words or text lines, typically by using quadrilateral shapes. However, many of these methods neglect the significance of consistent labeling, which is important for maintaining a stable training process, especially when it co...
Preprint
Full-text available
We propose a fully convolutional multi-person pose estimation framework using dynamic instance-aware convolutions, termed FCPose. Different from existing methods, which often require ROI (Region of Interest) operations and/or grouping post-processing, FCPose eliminates the ROIs and grouping post-processing with dynamic instance-aware keypoint estim...
Preprint
Full-text available
We propose a monocular depth estimator SC-Depth, which requires only unlabelled videos for training and enables the scale-consistent prediction at inference time. Our contributions include: (i) we propose a geometry consistency loss, which penalizes the inconsistency of predicted depths between adjacent views; (ii) we propose a self-discovered mask...
Preprint
Scene flow in 3D point clouds plays an important role in understanding dynamic environments. Although significant advances have been made by deep neural networks, the performance is far from satisfactory as only per-point translational motion is considered, neglecting the constraints of the rigid motion in local regions. To address the issue, we pr...
Preprint
Full-text available
End-to-end text-spotting, which aims to integrate detection and recognition in a unified framework, has attracted increasing attention due to its simplicity of the two complimentary tasks. It remains an open problem especially when processing arbitrarily-shaped text instances. Previous methods can be roughly categorized into two groups: character-b...
Preprint
Full-text available
Recently, deep neural network models have achieved impressive results in various research fields. Come with it, an increasing number of attentions have been attracted by deep super-resolution (SR) approaches. Many existing methods attempt to restore high-resolution images from directly down-sampled low-resolution images or with the assumption of Ga...
Article
Full-text available
Scene text recognition is an important task in computer vision. Despite tremendous progress achieved in the past few years, issues such as varying font styles, arbitrary shapes and complex backgrounds etc. have made the problem very challenging. In this work, we propose to improve text recognition from a new perspective by separating the text conte...
Preprint
Full-text available
We propose a human pose estimation framework that solves the task in the regression-based fashion. Unlike previous regression-based methods, which often fall behind those state-of-the-art methods, we formulate the pose estimation task into a sequence prediction problem that can effectively be solved by transformers. Our framework is simple and dire...
Preprint
Full-text available
Convolutional neural networks (CNNs) have been the de facto standard for nowadays 3D medical image segmentation. The convolutional operations used in these networks, however, inevitably have limitations in modeling the long-range dependency due to their inductive bias of locality and weight sharing. Although Transformer was born to address this iss...
Article
Full-text available
Anomaly detection, a.k.a. outlier detection or novelty detection, has been a lasting yet active research area in various research communities for several decades. There are still some unique problem complexities and challenges that require advanced approaches. In recent years, deep learning enabled anomaly detection, i.e., deep anomaly detection ,...
Article
Person retrieval is an important vision task, aiming at matching the images of the same person under various camera views. The key challenge of person retrieval lies in the large intra-class variations among the person images. Therefore, how to learn discriminative feature representations becomes the core problem. In this paper, we propose a deep p...
Preprint
Full-text available
The control of traffic signals is fundamental and critical to alleviate traffic congestion in urban areas. However, it is challenging since traffic dynamics are complicated in real situations. Because of the high complexity of modelling the optimisation problem, experimental settings of current works are often inconsistent. Moreover, it is not triv...
Preprint
Full-text available
We present Automatic Bit Sharing (ABS) to automatically search for optimal model compression configurations (e.g., pruning ratio and bitwidth). Unlike previous works that consider model pruning and quantization separately, we seek to optimize them jointly. To deal with the resultant large designing space, we propose a novel super-bit model, a singl...
Preprint
Full-text available
Recently, much attention has been spent on neural architecture search (NAS) approaches, which often outperform manually designed architectures on highlevel vision tasks. Inspired by this, we attempt to leverage NAS technique to automatically design efficient network architectures for low-level image restoration tasks. In this paper, we propose a me...
Preprint
Full-text available
Person search aims to localize and identify a specific person from a gallery of images. Recent methods can be categorized into two groups, i.e., two-step and end-to-end approaches. The former views person search as two independent tasks and achieves dominant results using separately trained person detection and re-identification (Re-ID) models. The...
Preprint
Full-text available
Recently, hyperspectral image (HSI) classification approaches based on deep learning (DL) models have been proposed and shown promising performance. However, because of very limited available training samples and massive model parameters, DL methods may suffer from overfitting. In this paper, we propose an end-to-end 3-D lightweight convolutional n...
Chapter
3D point cloud semantic and instance segmentation are crucial and fundamental for 3D scene understanding. Due to the complex structure, point sets are distributed off-balance and diversely, appearing as both category and pattern imbalance. It has been proved that deep networks can easily forget the non-dominant cases during training, which influenc...
Chapter
We present a new, embarrassingly simple approach to instance segmentation. Compared to many other dense prediction tasks, e.g., semantic segmentation, it is the arbitrary number of instances that have made instance segmentation much more challenging. In order to predict a mask for each instance, mainstream approaches either follow the “detect-then-...
Preprint
Full-text available
We present a high-performance method that can achieve mask-level instance segmentation with only bounding-box annotations for training. While this setting has been studied in the literature, here we show significantly stronger performance with a simple design (e.g., dramatically improving previous best reported mask AP of 21.1% in Hsu et al. (2019)...
Preprint
Full-text available
We show that learning affinity in upsampling provides an effective and efficient approach to exploit pairwise interactions in deep networks. Second-order features are commonly used in dense prediction to build adjacent relations with a learnable module after upsampling such as non-local blocks. Since upsampling is essential, learning affinity in up...
Preprint
Full-text available
With the rising popularity of intelligent mobile devices, it is of great practical significance to develop accurate, realtime and energy-efficient image Super-Resolution (SR) inference methods. A prevailing method for improving the inference efficiency is model quantization, which allows for replacing the expensive floating-point operations with ef...
Article
Clusters of viral pneumonia occurrences over a short period may be a harbinger of an outbreak or pandemic. Rapid and accurate detection of viral pneumonia using chest X-rays can be of significant value for large-scale screening and epidemic prevention, particularly when other more sophisticated imaging modalities are not readily accessible. However...
Preprint
Full-text available
Previous top-performing approaches for point cloud instance segmentation involve a bottom-up strategy, which often includes inefficient operations or complex pipelines, such as grouping over-segmented components, introducing additional steps for refining, or designing complicated loss functions. The inevitable variation in the instance scales can l...
Article
Traditional portrait matting methods typically consist of a trimap estimation network and a matting network. Here, we propose a new light‐weight portrait matting approach, termed parameter‐sharing portrait matting (PSPM). Different from conventional portrait matting models where the encoder and decoder networks in two tasks are often separately des...
Preprint
Full-text available
It has been widely recognized that the success of deep learning in image segmentation relies overwhelmingly on a myriad amount of densely annotated training data, which, however, are difficult to obtain due to the tremendous labor and expertise required, particularly for annotating 3D medical images. Although self-supervised learning (SSL) has show...
Preprint
Watermarking is the procedure of encoding desired information into an image to resist potential noises while ensuring the embedded image has little perceptual perturbations from the original image. Recently, with the tremendous successes gained by deep neural networks in various fields, digital watermarking has attracted increasing number of attent...
Preprint
Full-text available
Due to the intensive cost of labor and expertise in annotating 3D medical images at a voxel level, most benchmark datasets are equipped with the annotations of only one type of organs and/or tumors, resulting in the so-called partially labeling issue. To address this, we propose a dynamic on-demand network (DoDNet) that learns to segment multiple o...
Preprint
Full-text available
To date, most existing self-supervised learning methods are designed and optimized for image classification. These pre-trained models can be sub-optimal for dense prediction tasks due to the discrepancy between image-level prediction and pixel-level prediction. To fill this gap, we aim to design an effective, dense self-supervised learning method t...
Chapter
Non-local operation is widely explored to model the long-range dependencies. However, the redundant computation in this operation leads to a prohibitive complexity. In this paper, we present a Representative Graph (RepGraph) layer to dynamically sample a few representative features, which dramatically reduces redundancy. Instead of propagating the...
Chapter
We formulate counting as a sequential decision problem and present a novel crowd counting model solvable by deep reinforcement learning. In contrast to existing counting models that directly output count values, we divide one-step estimation into a sequence of much easier and more tractable sub-decision problems. Such sequential decision nature cor...
Chapter
Vision-and-Language Navigation (VLN) requires an agent to find a specified spot in an unseen environment by following natural language instructions. Dominant methods based on supervised learning clone expert’s behaviours and thus perform better on seen environments, while showing restricted performance on unseen ones. Reinforcement Learning (RL) ba...
Chapter
We propose a simple yet effective instance segmentation framework, termed CondInst (conditional convolutions for instance segmentation). Top-performing instance segmentation methods such as Mask R-CNN rely on ROI operations (typically ROIPool or ROIAlign) to obtain the final instance masks. In contrast, we propose to solve instance segmentation fro...
Article
Vehicle instance retrieval (IR) often requires one to recognize the fine-grained visual differences between vehicles. Besides the holistic appearance of vehicles which is easily affected by the viewpoint variation and distortion, vehicle parts also provide crucial cues to differentiate near-identical vehicles. Motivated by these observations, we in...