About
198
Publications
10,643
Reads
2,296
Citations
Education
September 2004 - January 2009
Publications (198)
In this paper, we propose a new supervised linear dimensionality reduction method called discriminant projection embedding (DPE). DPE can preserve within-class neighboring geometry and extract between-class relevant structures for classification effectively. The proposed method is applied to face and palmprint recognition and is examined using the...
In this paper, a novel one-dimensional correlation filter based class-dependence feature analysis (1D-CFA) method is presented for robust face recognition. Compared with original CFA that works in the two dimensional (2D) image space, 1D-CFA encodes the image data as vectors. In 1D-CFA, a new correlation filter called optimal extra-class origin out...
This paper develops a novel Class-dependence Feature Analysis (CFA) method for robust face recognition. A new correlation filter called Optimal Origin Correlation output Tradeoff Filter (OOCTF) is designed in the two-dimensional (2-D) feature space obtained by Second-order Tensor Subspace Analysis (STSA). Designing correlation filters in the 2-D fe...
Video object detection is a challenging task in computer vision since it needs to handle the object appearance degradation problem that seldom occurs in the image domain. Off-the-shelf video object detection methods typically aggregate multiframe features at one stroke to alleviate appearance degradation. However, these existing methods do not take...
Effectively handling the co-occurrence of non-IID data and long-tailed distributions remains a critical challenge in federated learning. While fine-tuning vision-language models (VLMs) like CLIP has been shown to be promising in addressing non-IID data challenges, this approach leads to severe degradation of tail classes in federated long-tailed scenari...
Data heterogeneity, stemming from local non-IID data and global long-tailed distributions, is a major challenge in federated learning (FL), leading to significant performance gaps compared to centralized learning. Previous research found that poor representations and biased classifiers are the main problems and proposed neural-collapse-inspired syn...
Automatic X-ray prohibited item detection is vital for public safety. Existing deep learning-based methods all assume that the annotations of training X-ray images are correct. However, obtaining correct annotations is extremely hard if not impossible for large-scale X-ray images, where item overlapping is ubiquitous. As a result, X-ray images are e...
Hu Ding, Yan Yan, Yang Lu, [...], Hanzi Wang
Most facial expression recognition (FER) models are trained on large-scale expression data with centralized learning. Unfortunately, collecting a large amount of centralized expression data is difficult in practice due to privacy concerns of facial images. In this paper, we investigate FER under the framework of personalized federated learning, whi...
Existing state-of-the-art methods for few-shot action recognition (FSAR) achieve promising performance by spatial and temporal modeling. However, most current methods ignore the importance of edge information and motion cues, leading to inferior performance. For the few-shot task, it is important to effectively explore limited data. Additionally, e...
Automatic detection of prohibited items in X-ray images plays a crucial role in public security. However, existing methods rely heavily on labor-intensive box annotations. To address this, we investigate X-ray prohibited item detection under labor-efficient point supervision and develop an intra-inter objectness learning network (I2OL-Net). I2OL-Ne...
Existing pulmonary nodule detection methods often train models in a fully-supervised setting that requires strong labels (i.e., bounding box labels) as label information. However, manual annotation of bounding boxes in CT images is very time-consuming and labor-intensive. To alleviate the annotation burden, in this paper, we investigate pulmonary n...
In recent years, few-shot action recognition has achieved remarkable performance through spatio-temporal relation modeling. Although a wide range of spatial and temporal alignment modules have been proposed, they primarily address spatial or temporal misalignments at the video level, while the spatio-temporal relationships across different videos a...
Visible-infrared person re-identification (VIReID) has attracted increasing attention due to the requirements for 24-hour intelligent surveillance systems. In this task, one of the major challenges is the modality discrepancy between the visible (VIS) and infrared (NIR) images. Most conventional methods try to design complex networks or generative...
Video object detection is an important yet challenging task in the computer vision field. One limitation of off-the-shelf video object detection methods is that they only explore information from the visual modality, without considering the semantic knowledge of the textual modality due to the large inter-modality discrepancies, resulting in limite...
Existing caricature-visual face recognition methods train the models based on caricature-visual image pairs from the same identities. Unfortunately, in many real-world applications, facial caricatures and visual facial images are usually unpaired in the training set due to the difficulty of collecting facial caricatures drawn by artists. In this pa...
Recent Siamese trackers have taken advantage of transformers to achieve impressive advancements. However, existing transformer trackers ignore the positional and structural information between tokens, and traditional template update strategies easily introduce noise to the dynamic templates during tracking. In order to alleviate this i...
Most existing GAN inversion methods either achieve accurate reconstruction but lack editability or offer strong editability at the cost of fidelity. Hence, how to balance the distortion-editability trade-off is a significant challenge for GAN inversion. To address this challenge, we introduce a novel spatial-contextual discrepancy information compe...
Visible-infrared person re-identification (VI-ReID) aims to retrieve images of the same persons captured by visible (VIS) and infrared (IR) cameras. Existing VI-ReID methods ignore high-order structure information of features and have difficulty learning a reasonable common feature space due to the large modality discrepancy between VI...
Human emotions contain both basic and compound facial expressions. In many practical scenarios, it is difficult to access all the compound expression categories at one time. In this paper, we investigate comprehensive facial expression recognition (FER) in the class-incremental learning paradigm, where we define well-studied and easily-accessible b...
Few-shot action recognition aims to recognize new unseen categories with only a few labeled samples of each class. However, it still suffers from the limitation of inadequate data, which easily leads to the overfitting and low-generalization problems. Therefore, we propose a cross-modal contrastive learning network (CCLN), consisting of an adversar...
With the recent advance of deep learning, a large number of methods have been developed for prohibited item detection in X-ray security images. Generally, these methods train models on a single X-ray image dataset that may contain only limited categories of prohibited items. To detect more prohibited items, it is desirable to train a model on the m...
In this paper, we study facial expression recognition (FER) in the class-incremental learning (CIL) setting, which defines the classification of well-studied and easily-accessible basic expressions as an initial task while learning new compound expressions gradually. Motivated by the fact that compound expressions are meaningful combinations of bas...
The recent one-to-one label assignment plays a crucial role in removing the last non-differentiable component, i.e., Non-Maximum Suppression (NMS), used in the post-processing step of the one-to-many label assignment, thus building an efficient end-to-end detection system. However, due to the limited number of foreground samples, the one-to-one lab...
Due to the diversity of human emotions, it is often difficult to collect all the expression categories at once in many practical applications. In this paper, we investigate facial expression recognition (FER) under the class-incremental learning (CIL) paradigm, where we define easily-accessible basic expressions as an initial task and learn new com...
Few-shot action recognition aims to classify unseen action classes with limited labeled training samples. Most current works follow the metric learning paradigm to learn a good embedding and an appropriate comparison metric. Due to the limited labeled data, the generalization of embedding networks is limited when employing the meta-learning proce...
Knowledge distillation (KD), which aims at transferring the knowledge from a complex network (a teacher) to a simpler and smaller network (a student), has received considerable attention in recent years. Typically, most existing KD methods work on well-labeled data. Unfortunately, real-world data often inevitably involve noisy labels, thus leading...
Facial attribute recognition (FAR) is an important and yet challenging multi-label learning task in computer vision. Existing FAR methods have achieved promising performance with the development of deep learning. However, they usually suffer from prohibitive computational and memory costs. In this paper, we propose an identity-aware contrastive kno...
Video object detection aims at accurately localizing the objects in videos and correctly recognizing their categories. Off-the-shelf video object detection methods have made some progress in recent years but they still suffer from the problems of inaccurate object localization, incorrect object recognition or insufficient relation learning, resulti...
Most existing compound facial expression recognition (FER) methods rely on large-scale labeled compound expression data for training. However, collecting such data is labor-intensive and time-consuming. In this paper, we address the compound FER task in the cross-domain few-shot learning (FSL) setting, which requires only a few samples of compound...
Video object detection is a fundamental and important task in computer vision. One mainstay solution for this task is to aggregate features from different frames to enhance the detection on the current frame. Off-the-shelf feature aggregation paradigms for video object detection typically rely on inferring feature-to-feature (Fea2Fea) relations. Ho...
Visible-infrared person re-identification (VI-ReID), which aims to search identities across different spectra, is a challenging task due to large cross-modality discrepancy between visible and infrared images. The key to reducing the discrepancy is to filter out identity-irrelevant interference and effectively learn modality-invariant person represen...
Yan Yan, Ying Shu, Si Chen, [...], Hanzi Wang
Existing deep learning-based facial attribute recognition (FAR) methods rely heavily on large-scale labeled training data. Unfortunately, in many real-world applications, only limited labeled data are available, resulting in the performance deterioration of these methods. To address this issue, we propose a novel spatial-semantic patch learning net...
Video object detection has attracted increasing attention in recent years. Although great success has been achieved by off-the-shelf video object detection methods through delicately designing various types of feature aggregation, they overlook the class-aware supervision and thus still suffer from the problem of classification incapability, which...
Video-based person re-identification (re-ID) aims to match the same pedestrian in video sequences across non-overlapping cameras. Video re-ID methods generally adopt frame-level feature extraction for different video frames, but they still lack effective spatio-temporal interaction, easily leading to the multi-frame misalignment problem. In this pa...
This paper studies a new yet practical setting of semi-supervised semantic segmentation, i.e., hybrid-supervised semantic segmentation, where a small number of pixel-level (strong) annotations and a large number of image-level (weak) annotations are provided. It is a common practice to utilize pseudo labels to mitigate the issue of lacking strong a...
Recently, graph-based methods have been widely applied to model fitting. However, in these methods, association information is invariably lost when data points and model hypotheses are mapped to the graph domain. In this paper, we propose a novel model fitting method based on co-clustering on bipartite graphs (CBG) to estimate multiple model instan...
Real-time semantic segmentation, which aims to achieve high segmentation accuracy at real-time inference speed, has received substantial attention over the past few years. However, many state-of-the-art real-time semantic segmentation methods tend to sacrifice some spatial details or contextual information for fast inference, thus leading to degrad...
The prosperity of deep learning contributes to the rapid progress in scene text detection. Among all the methods with convolutional networks, segmentation-based ones have drawn extensive attention due to their superiority in detecting text instances of arbitrary shapes and extreme aspect ratios. However, the bottom-up methods are limited to the per...
Visual tracking is a core component of intelligent transportation systems and is crucial for reducing or avoiding traffic accidents. Recently, deep correlation filter (DCF) based trackers have exhibited good tracking performance. However, existing DCF based trackers are still ineffective at coping with large scale variations and severe distortions (e.g....
Recent methods in network pruning have indicated that a dense neural network involves a sparse subnetwork (called a winning ticket), which can achieve similar test accuracy to its dense counterpart with much fewer network parameters. Generally, these methods search for the winning tickets on well-labeled data. Unfortunately, in many real-world appl...
Recently, attention mechanisms have shown great potential in improving the performance of mobile networks. Typically, they involve 2D symmetric convolution operations or generate 2D attention maps. However, such designs usually incur high computational cost and large memory consumption, increasing the computational burden of mobile networks. To...
Human emotions involve basic and compound facial expressions. However, current research on facial expression recognition (FER) mainly focuses on basic expressions, and thus fails to address the diversity of human emotions in practical scenarios. Meanwhile, existing work on compound FER relies heavily on abundant labeled compound expression training...
Person attribute recognition (PAR) aims to simultaneously predict multiple attributes of a person. Existing deep learning-based PAR methods have achieved impressive performance. Unfortunately, these methods usually ignore the fact that different attributes have an imbalance in the number of noisy-labeled samples in the PAR training datasets, thus l...
Xi Weng, Yan Yan, Si Chen, [...], Hanzi Wang
Over the past few years, deep convolutional neural network-based methods have made great progress in semantic segmentation of street scenes. Some recent methods align feature maps to alleviate the semantic gap between them and achieve high segmentation accuracy. However, they usually adopt the feature alignment modules with the same network configu...
Visual tracking is a crucial research topic in computer vision, which aims to locate any object as precisely as possible over a sequence of image frames. However, the existing trackers often suffer from the object drifting problem due to the difficulty of adapting to complex environments. In this paper, we propose a novel multi-stage adaptation net...
In this paper, we propose a novel adaptive deep disturbance-disentangled learning (ADDL) method for effective facial expression recognition (FER). ADDL involves a two-stage learning procedure. First, a disturbance feature extraction model is trained to identify multiple disturbing factors on a large-scale face database involving disturbance label i...
Event-based approaches, which are based on bio-inspired asynchronous event cameras, have achieved promising performance on various computer vision tasks. However, the study of the fundamental event data association problem is still in its infancy. In this paper, we propose a novel Event Data Association approach (called EDA) to explicitly address t...
One of the main challenges in facial expression recognition (FER) is to address the disturbance caused by various disturbing factors, including common ones (such as identity, pose, and illumination) and potential ones (such as hairstyle, accessory, and occlusion). Recently, a number of FER methods have been developed to explicitly or implicitly all...
The small-loss criterion is widely used in recent label-noise learning methods. However, such a criterion only considers the loss of each training sample in a mini-batch but ignores the loss distribution in the whole training set. Moreover, the selection of clean samples depends on a heuristic clean data rate. As a result, some noisy-labeled sample...
Ying Shu, Yan Yan, Si Chen, [...], Hanzi Wang
Recent advances in deep learning have demonstrated excellent results for Facial Attribute Recognition (FAR), typically trained with large-scale labeled data. However, in many real-world FAR applications, only limited labeled data are available, leading to remarkable deterioration in performance for most existing deep learning-based FAR methods. To...
Existing learning-based dehazing methods are prone to excessive dehazing and to failure on dense haze, mainly because the global features of hazy images are not fully utilized, while the local features of hazy images are not discriminative enough. In this letter, we propose a Recurrent Context Aggregation Network (RCAN) to effectively dehaz...
In this paper, we propose a novel hierarchical representation via message propagation (HRMP) method for robust model fitting, which simultaneously takes advantage of both consensus analysis and preference analysis to estimate the parameters of multiple model instances from data corrupted by outliers. Instead of an...
In recent years, Convolutional Neural Network (CNN) based trackers have achieved state-of-the-art performance on multiple benchmark datasets. Most of these trackers train a binary classifier to distinguish the target from its background. However, they suffer from two limitations. Firstly, these trackers cannot effectively handle significant appeara...
In recent years, deep learning-based person re-identification (Re-ID) methods have made significant progress. However, the performance of these methods substantially decreases when dealing with occlusion, which is ubiquitous in realistic scenarios. In this paper, we propose a novel semantic-aware occlusion-robust network (SORN) that effectively exp...
To achieve effective facial expression recognition (FER), it is of great importance to address various disturbing factors, including pose, illumination, identity, and so on. However, a number of FER databases merely provide the labels of facial expression, identity, and pose, but lack the label information for other disturbing factors. As a result,...
In recent years, Discriminative Correlation Filter (DCF) based tracking methods have achieved impressive performance in visual tracking. However, their excellent performance usually comes at the cost of sacrificing the computational speed. Furthermore, training correlation filters using high dimensional raw features may introduce the risk of severe...
Object tracking is a challenging task in computer vision based intelligent transportation systems. Recently, Siamese based object tracking methods have attracted significant attention due to their highly efficient performance. These tracking methods usually train a Siamese network to match the initial target patch of the first frame with candidates...
Recently, some correlation filter based trackers with detection proposals have achieved state-of-the-art tracking results. However, a large number of redundant proposals given by the proposal generator may degrade the performance and speed of these trackers. In this paper, we propose an adaptive proposal selection algorithm which can generate a sma...
In recent years, deep learning based visual tracking methods have obtained great success owing to the powerful feature representation ability of Convolutional Neural Networks (CNNs). Among these methods, classification-based tracking methods exhibit excellent performance while their speeds are heavily limited by the expensive computation for massiv...
Deep Convolutional Neural Networks (DCNNs) have recently shown outstanding performance in semantic image segmentation. However, state-of-the-art DCNN-based semantic segmentation methods usually suffer from high computational complexity due to the use of complex network architectures. This greatly limits their applications in the real-world scenario...
Weakly-supervised object detection has recently attracted increasing attention since it only requires image-level annotations. However, the performance obtained by existing methods is still far from being satisfactory compared with fully-supervised object detection methods. To achieve a good trade-off between annotation cost and object detection perf...
Recently, some hypergraph-based methods have been proposed to deal with the problem of model fitting in computer vision, mainly due to the superior capability of hypergraphs to represent the complex relationships between data points. However, a hypergraph becomes extremely complicated when the input data include a large number of data points (usually...
Facial Attribute Classification (FAC) has attracted increasing attention in computer vision and pattern recognition. However, state-of-the-art FAC methods perform face detection/alignment and FAC independently. The inherent dependencies between these tasks are not fully exploited. In addition, most methods predict all facial attributes using the sa...
Recently, Siamese network based visual tracking methods have shown great potential in balancing tracking accuracy and computational efficiency. These methods use two-branch convolutional neural networks (CNNs) to generate a response map between the target exemplar and each of the candidate patches in the search region. However, since these met...
Recently, deep learning based facial expression recognition (FER) methods have attracted considerable attention and they usually require large-scale labelled training data. Nonetheless, the publicly available facial expression databases typically contain a small amount of labelled data. In this paper, to overcome the above issue, we propose a novel...