# Yongsheng Liang's research while affiliated with Shenzhen University and other places

## Publications (40)

Article
In recent years, image compression methods based on deep learning have received extensive attention and research. Most methods focus on minimizing the mean squared error (MSE) to obtain reconstructed images with higher peak signal-to-noise ratio (PSNR). However, the ability of pixel-wise distortion to capture the perceptual differences between imag...
Article
Using an attention mechanism based on convolutional neural networks (CNNs) improves the performance of computer vision tasks by enhancing the representation of the features. The existing attention methods enhance the expression of the features by modeling the internal information of the features. However, due to the limited information flow of...
Article
Attention mechanisms are widely used for Convolutional Neural Networks (CNNs) when performing various visual tasks. Many methods introduce multi-scale information into attention mechanisms to improve their feature transformation performance; however, these methods do not take into account the potential importance of scale invariance. This paper pro...
Preprint
Recently, learned image compression methods have developed rapidly and exhibited excellent rate-distortion performance when compared to traditional standards, such as JPEG, JPEG2000 and BPG. However, the learning-based methods suffer from high computational costs, which is not beneficial for deployment on devices with limited resources. To this end...
Preprint
Visual object tracking acts as a pivotal component in various emerging video applications. Despite the numerous developments in visual tracking, existing deep trackers are still likely to fail when tracking objects with dramatic variation. These deep trackers usually do not perform online updates or update only a single sub-branch of the tracking m...
Article
Rank minimization-based subspace clustering methods have been widely developed in the past decades. Although some smooth surrogates, such as the nuclear norm and Schatten-p norm, mitigate the NP-hard issue to some extent, these existing methods may yield unsatisfactory results, due to the singular values of the coefficient matrix not being further s...
Preprint
In this paper, a unified transformation method in learned image compression (LIC) is proposed from the perspective of communication. Firstly, the quantization in LIC is considered as a generalized channel with additive uniform noise. Moreover, the LIC is interpreted as a particular communication system according to the consistency in structures and...
Preprint
Neural image compression methods have matched or outperformed traditional methods (such as JPEG, BPG, WebP). However, their sophisticated network structures with cascaded convolution layers impose a heavy computational burden on practical deployment. In this paper, we explore the structural sparsity in neural image compression networks to obtain real-time acc...
Article
In recent years, low-rank representation (LRR) has received increasing attention in subspace clustering. However, due to the inevitable matrix inversion and singular value decomposition in each iteration, most existing LRR algorithms suffer from high computational complexity and hence cannot cope well with large-scale sample data. T...
Article
Recently, remarkable progress has been made in learned image compression (LIC), in which nonlinear transforms (NTs) play a crucial role. Although there are many NT methods for improving the rate-distortion performance, all the existing methods do so at the cost of higher computational complexity and more transform parameters. This paper prov...
Article
Active learning maximizes the performance of the current learning model by soliciting benefits from unlabeled data. In its early stage with insufficient labels, finding representations that maintain a consistent hypothesis with the entire unlabeled pool is an intelligent paradigm. From a matrix perspective, this paradigm can be transformed into a s...
Article
In deep neural network compression, channel/filter pruning is widely used for compressing the pre-trained network by judging the redundant channels/filters. In this paper, we propose a two-step filter pruning method to judge the redundant channels/filters layer by layer. The first step is to design a filter selection scheme based on $\ell_{2,1}$...
Chapter
Gastrointestinal polyps are the main cause of colorectal cancer. Given the polyp variations in terms of size, color, texture and poor optical conditions brought by endoscopy, polyp segmentation is still a challenging problem. In this paper, we propose a Learnable Oriented-Derivative Network (LOD-Net) to refine the accuracy of boundary predictions f...
Article
Capsule endoscopy is a leading diagnostic tool for small bowel lesions, but it faces challenges such as time-consuming interpretation and the harsh optical environment inside the small intestine. Specialists unavoidably spend a great deal of time searching for images with a high degree of clearness for accurate diagnosis. However, current clearness degree cl...
Article
Attention mechanisms have achieved success in video-based person re-identification (re-ID). However, current global attentions tend to focus on the most salient parts, e.g., clothes, and ignore other subtle but valuable cues, e.g., hair, bag, and shoes. They still do not make full use of valuable information from diverse parts of human bodies. To t...
Article
The use of complementary information, namely depth or thermal information, has proven beneficial for salient object detection (SOD) in recent years. However, the RGB-D or RGB-T SOD problems are currently only solved independently, and most of them directly extract and fuse raw features from backbones. Such methods can be easily restricted by lo...
Article
As is well known, the low-rank representation (LRR) method has achieved promising performance for subspace clustering, and many LRR variants have been developed, which mainly address three problems in LRR: 1) mean calculation; 2) deviation from the true low-rank solution; 3) high computation cost on the large-sc...
Article
The rapid spread of the coronavirus disease (COVID-19) pandemic has had a devastating impact on global public health. Medical imaging has emerged as a useful tool for diagnosing the disease. However, the computed tomography (CT) diagnosis of COVID-19 requires experts’ extensive clinical experience. Therefore, it is essential to achieve rap...
Article
Nuclear segmentation of histopathological images is a crucial step in computer-aided image analysis. There are complex, diverse, dense, and even overlapping nuclei in these histopathological images, leading to a challenging task of nuclear segmentation. To overcome this challenge, this paper proposes a hybrid-attention nested UNet (Han-Net), which...
Article
Inspired by the mean calculation of RPCA_OM and the inductiveness of IRPCA, we first propose an inductive robust principal component analysis method that removes the optimal mean automatically, abbreviated as IRPCA_OM. Furthermore, IRPCA_OM is extended to the Schatten-$p$ norm and a more general framework (i.e., EIRPCA_OM) is presented. The objecti...
Chapter
Breast cancer is a major malignant tumor in women, and its incidence is rising. Detecting positive and negative tumor cells in the immunohistochemically stained sections of breast tissue to compute the Ki-67 index is an essential means to determine the degree of malignancy of breast cancer. However, there are scarcely any public datasets abo...
Preprint
Triplet loss is widely used for learning local descriptors from image patches. However, triplet loss only minimizes the Euclidean distance between matching descriptors and maximizes that between non-matching descriptors, which neglects the topological similarity between two descriptor sets. In this paper, we propose a topology measure besides Euclidea...
Conference Paper
Existing deep Thermal InfraRed (TIR) trackers usually use the feature models of RGB trackers for representation. However, these feature models learned on RGB images neither represent TIR objects effectively nor take fine-grained TIR information into consideration. To this end, we develop a multi-task framework to learn the TIR-specific di...
Article
One-shot action recognition aims at recognizing actions in unseen classes in cases where only one training video is provided. Compared with one-shot image recognition, one-shot learning on videos is more difficult due to the fact that the temporal dimension of video may lead to greater variation. To handle this variation, it is important to conduct...
Data
Multi-Task Driven Feature Models for Thermal Infrared Tracking--Supplementary Materials
Article
The bandwidth of a kernel function is a crucial parameter in the mean shift algorithm. This paper proposes a novel adaptive bandwidth strategy with three main contributions. (1) The differences among different adaptive bandwidths are analyzed. (2) A new mean shift vector based on bidirectional adaptive bandwidth is defined, which combines...
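
To illustrate why the bandwidth matters in mean shift, the following is a minimal sketch of the standard fixed-bandwidth procedure with a Gaussian kernel. It is not the bidirectional adaptive bandwidth strategy from the abstract above, whose details are truncated here. The function name `mean_shift_mode`, the synthetic two-cluster data, and all parameter values are assumptions chosen for illustration.

```python
import numpy as np

def mean_shift_mode(points, start, bandwidth, tol=1e-6, max_iter=500):
    """Climb from `start` toward a nearby density mode (Gaussian kernel).

    Illustrative fixed-bandwidth baseline; names and defaults are hypothetical.
    """
    pts = np.asarray(points, dtype=float)
    x = np.asarray(start, dtype=float)
    for _ in range(max_iter):
        # Squared distances from the current estimate to every sample.
        d2 = np.sum((pts - x) ** 2, axis=1)
        # Gaussian kernel weights; the bandwidth sets how local the average is.
        w = np.exp(-d2 / (2.0 * bandwidth ** 2))
        # The mean shift vector points from x to the weighted sample mean.
        new_x = (w[:, None] * pts).sum(axis=0) / w.sum()
        if np.linalg.norm(new_x - x) < tol:
            return new_x
        x = new_x
    return x

# Synthetic data (assumed for the example): two well-separated 2-D clusters.
rng = np.random.default_rng(0)
cluster_a = rng.normal(loc=[0.0, 0.0], scale=0.3, size=(200, 2))
cluster_b = rng.normal(loc=[5.0, 5.0], scale=0.3, size=(200, 2))
data = np.vstack([cluster_a, cluster_b])

mode_a = mean_shift_mode(data, start=[0.5, 0.5], bandwidth=0.5)
mode_b = mean_shift_mode(data, start=[4.5, 4.5], bandwidth=0.5)
```

With a small bandwidth each starting point converges to the mode of its own cluster. A much larger bandwidth would average over both clusters, which is the sensitivity that adaptive bandwidth strategies aim to reduce.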

## Citations

... Such an approach is meant to explore the potential of learning-based compression, rather than be used in practice as is. There has also been a considerable amount of work on variable-rate learning- Yin et al. (2022), where a single model is able to produce multiple rate-distortion points. However, in terms of rate-distortion performance, "fixed-rate" approaches such as Cheng et al. (2020); Guo et al. (2022) currently seem to have an advantage over variable-rate ones. ...
... Recently, Li et al. [14] have proposed AdderIC, which utilizes AdderNet to construct an image compression framework. However, a performance gap still exists between AdderIC and its CNN counterpart. ...
... after achieving the features, they applied the KNN classifier to group the activities. In [23], in addition to using auto-encoder, they normalized motion sequence based on spatial-temporal asynchronous method during the process of auto-encoder. They pruned the temporal information that had less effect on the learning process. ...
... As the newly-proposed methods, SANet [41] and MSNet [42] design the shallow attention module and subtraction unit, respectively, to achieve precise and efficient segmentation. Additionally, several works opt for introducing additional constraints via three main-stream manners: exerting explicit boundary supervision [43][44][45][46][47], introducing implicit boundary-aware representation [48][49][50], and exploring uncertainty for ambiguous regions [51]. 2) Transformer-based approaches. ...
... First, the design strategy of the models could be further improved. The above deep learning methods take advantage of targeting the classification task but fail to exploit the performance gained to the model from the facilitation of the classification task by the segmentation task [18,19], i.e., the performance and effectiveness of the classification task can be improved with segmentation features and lesion masks generated by the segmentation task [20]. Second, the model can fully use low-level and fine-grained shallow features. ...
... In order to make better use of feature information, [41] proposed Multiscale Siamese Model (SMSNet) to capture multi-scale ReID features. [36] proposed Co-segmentation based Attention Module (COSAM) to enhance common abstract features and suppress background features, and [34] proposed the diverse part attentive network (DPAN) to use differentiated and diverse body cues to obtain robust features. ...
... Zhang et al. [66] proposed a multi-interaction dual decoder to better fuse cross-modality features, multilevel features and global contextual features. Gao et al. [67] performed multiple stages and multiple scales feature fusion in RGB-T SOD. Wang et al. [68] proposed a novel cross-guided fusion network to perform adequate cross-modality fusion and took full advantage of high-level semantic information. ...
... Some of them simultaneously completed the two tasks of COVID-19 detection and lesion segmentation [14-16]. Gao et al. [14] proposed a dual-branch combined network for the diagnosis of COVID-19, which could realize classification and lesion segmentation simultaneously. The algorithm first used UNet to extract the accurate lung region, and then used the lesion attention (LA) module to complete the slice classification and segmentation of CT images at the same time. ...
... Chen et al. adopted a low rank approximation for eliminating the redundancy among filters [4]. [19] employed a two-step feature map reconstruction procedure to remove redundant filters and channels. The filter selection scheme was designed based on an $\ell_{2,1}$ norm. ...
... The network processed the input at a variety of resolutions, providing an accurate representation of nuclei of different sizes. To separate complex and diverse nuclear boundaries, a hybrid attention block in Han-Net [21] was used to explore attention information and build correlations between different pixels to further expand the U-Net. SONNET [22] was a self-guided ordinal regression neural network for nuclear segmentation, which exploits the intrinsic characteristics of nuclei and focuses on highly uncertain areas during training. ...
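
One of the citation contexts above mentions a filter selection scheme based on an ℓ2,1 norm. As a hedged illustration of that norm only, the sketch below computes it for a matrix (the sum of the Euclidean norms of its rows) and ranks convolution filters by their individual row norms. This is not the cited two-step pruning method. The helper names and the weight layout `(out_channels, in_channels, kh, kw)` are assumptions.

```python
import numpy as np

def l21_norm(matrix):
    """Sum of the Euclidean (l2) norms of the rows of a 2-D array."""
    return float(np.sum(np.linalg.norm(matrix, axis=1)))

def rank_filters_by_row_norm(conv_weights):
    """Return filter indices from smallest to largest l2 norm, plus the norms.

    Assumes the hypothetical layout (out_channels, in_channels, kh, kw);
    each filter is flattened into one row before its norm is taken.
    """
    flat = conv_weights.reshape(conv_weights.shape[0], -1)
    row_norms = np.linalg.norm(flat, axis=1)
    return np.argsort(row_norms), row_norms

# Toy weights: three filters with clearly different magnitudes.
weights = np.zeros((3, 2, 1, 2))
weights[0] = 3.0            # four entries of 3 -> row norm 6
weights[1, 0, 0, 0] = 4.0   # a single entry of 4 -> row norm 4
# Filter 2 stays all-zero -> row norm 0.

order, norms = rank_filters_by_row_norm(weights)
total = l21_norm(weights.reshape(3, -1))
```

Because the ℓ2,1 norm sums per-row magnitudes, penalizing it tends to drive whole rows (here, whole filters) toward zero, which is why it is a common choice for structured selection. Filters with the smallest row norms are the natural pruning candidates in this toy setting.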