Xinbo Gao
Chongqing University of Posts and Telecommunications · School of Computer Science and Technology
Doctor of Philosophy
About
1,258
Publications
139,855
Reads
33,798
Citations
Introduction
Xinbo Gao (M'02-SM'07-F'24) received the B.Eng., M.Sc., and Ph.D. degrees in electronic engineering, signal and information processing from Xidian University, Xi'an, China, in 1994, 1997, and 1999, respectively. Since 2020, he has been a Professor of Computer Science and Technology at Chongqing University of Posts and Telecommunications. He is a Fellow of the Institution of Engineering and Technology, a Fellow of CIE, a Fellow of CCF, and a Fellow of CAAI.
Additional affiliations
December 1999 - May 2020
April 2000 - March 2001
December 1999 - present
Education
August 1996 - September 1999
August 1994 - March 1997
August 1990 - July 1994
Publications (1,258)
Compositional Zero-Shot Learning (CZSL) recognizes new combinations by learning from known attribute-object pairs. However, the main challenge of this task lies in the complex interactions between attributes and object visual representations, which lead to significant differences in images. In addition, the long-tail label distribution in the real...
Sketch-based image retrieval (SBIR) relies on free-hand sketches to retrieve natural photos within the same class. However, its practical application is limited by its inability to retrieve classes absent from the training set. To address this limitation, the task has evolved into Zero-Shot Sketch-Based Image Retrieval (ZS-SBIR), where model perfor...
Recently, AI-generated images (AIGIs) created by given prompts (initial prompts) have garnered widespread attention. Nevertheless, due to technical limitations, they often suffer from poor perception quality and Text-to-Image misalignment. Therefore, assessing the perception quality and alignment quality of AIGIs is crucial to improving the gene...
Continual Learning (CL) aims to equip AI models with the ability to learn a sequence of tasks over time, without forgetting previously learned knowledge. Recently, State Space Models (SSMs), particularly the Mamba model, have achieved notable success in computer vision. Building on the strengths of SSMs, this study explores leveraging the Mamba mod...
The local binary pattern (LBP) is an effective feature, describing the size relationship between the neighboring pixels and the current pixel. While individual LBP-based methods yield good results, co-occurrence LBP-based methods exhibit a better ability to extract structural information. However, most of the co-occurrence LBP-based methods excel m...
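The basic LBP idea mentioned in the abstract above can be sketched as follows. This is a minimal illustration of the classic 8-neighbour LBP only, not the co-occurrence method of the paper; the function names `lbp_code` and `lbp_image`, the clockwise bit ordering, and the `>=` comparison are assumptions chosen for the example.

```python
def lbp_code(image, row, col):
    """Return the 8-neighbour LBP code of the interior pixel at (row, col).

    `image` is a 2D list of grey levels. Each neighbour that is greater than
    or equal to the centre pixel sets one bit of the code.
    """
    center = image[row][col]
    # Assumed ordering: clockwise, starting at the top-left neighbour.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        if image[row + dr][col + dc] >= center:
            code |= 1 << bit
    return code


def lbp_image(image):
    """Compute LBP codes for all interior pixels (the border is skipped)."""
    rows, cols = len(image), len(image[0])
    return [[lbp_code(image, r, c) for c in range(1, cols - 1)]
            for r in range(1, rows - 1)]


patch = [
    [6, 5, 2],
    [7, 6, 1],
    [9, 8, 7],
]
print(lbp_code(patch, 1, 1))
```

A histogram of such codes over an image region is the usual texture descriptor; the sketch stops at the per-pixel codes.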
The proliferation of 2D foundation models has sparked research into adapting them for open-world 3D instance segmentation. Recent methods introduce a paradigm that leverages superpoints as geometric primitives and incorporates 2D multi-view masks from Segment Anything model (SAM) as merging guidance, achieving outstanding zero-shot instance segment...
Traditional fuzzy set methods, designed around the finest granularity of inputs (individual points and their membership degrees), often struggle with inefficiencies and label noise. To overcome these challenges, we introduce granular-ball computing into the fuzzy set, creating the new granular-ball fuzzy set framework. This approach uses granular-ball...
Deep learning systems typically suffer from catastrophic forgetting of old knowledge when learning from new data continually. Recently, various class incremental learning (CIL) methods have been proposed to address this issue, and some approaches achieve promising performances by relying on rehearsing the training data of previous tasks. However,...
Person search aims to locate target pedestrians from scene images, involving detection and re-identification. The former seeks to separate the background and focus on the commonality between pedestrians, while the latter aims to identify the target and focus on the difference between pedestrians. To address the paradox of detection and re-identific...
Remote sensing change detection aims to perceive changes occurring on the Earth's surface from remote sensing data in different periods, and feed these changes back to humans. However, most existing methods only focus on detecting change regions, lacking the ability to interact with users to identify changes that the users expect. In this paper, we...
Traditional clustering algorithms often focus on the most fine-grained information and achieve clustering by calculating the distance between each pair of data points or implementing other calculations based on points. This approach is inconsistent with the cognitive mechanism of "global precedence" in the human brain, resulting in those methods' bad pe...
Recent blind super-resolution (BSR) methods are explored to handle unknown degradations and achieve impressive performance. However, the prevailing assumption in most BSR methods is the spatial invariance of degradation kernels across the entire image, which leads to significant performance declines when faced with spatially variant degradations ca...
Accurate repetitive action counting has crucial applications in the era of AI-assisted universal fitness. Existing methods are prone to large errors in spatially fine-grained action counting scenarios. In this study, we propose a joint-wise temporal self-similarity periodic selection network (JTSPS-Net) with a human skeleton as its input. Periodic...
Recent CNN-driven face super-resolution (FSR) technologies have achieved excellent breakthroughs by incorporating facial prior knowledge. However, most of them suffer from some obvious limitations. They always estimate facial priors from input low-resolution (LR) faces or coarsely enhanced LR faces, obtaining unfaithful priors that cannot be adequa...
In the field of object detection, detecting small objects is an important and challenging task. However, most existing methods tend to focus on designing complex network structures, lack attention to global representation, and ignore redundant noise and dense distribution of small objects in complex networks. To address the above problems, this pap...
Cross-resolution person re-identification (ReID) is a challenging task that addresses the issue of matching individuals across different resolution conditions. Traditional person ReID methods often assume that images have sufficiently high resolution and overlook the practical scenarios involving low-resolution or blurry images. Existing cross-reso...
Recent advances indicate that diffusion models hold great promise in image super-resolution. While the latest methods are primarily based on latent diffusion models with convolutional neural networks, there are few attempts to explore transformers, which have demonstrated remarkable performance in image generation. In this work, we design an effect...
While numerous Video Violence Detection (VVD) methods have focused on representation learning in Euclidean space, they struggle to learn sufficiently discriminative features, leading to weaknesses in recognizing normal events that are visually similar to violent events (i.e., ambiguous violence). In contrast, hyperbolic representation learni...
The true label plays an important role in semi-supervised medical image segmentation (SSMIS) because it can provide the most accurate supervision information when the label is limited. The popular SSMIS method trains labeled and unlabeled data separately, and the unlabeled data cannot be directly supervised by the true label. This limits the contri...
Eye movement biometrics has received increasing attention thanks to its highly secure identification. Although deep learning (DL) models have recently been successfully applied to eye movement recognition, the DL architecture is still determined by human prior knowledge. Differentiable Neural Architecture Search (DARTS) automates the manual process...
Most few-shot learning methods employ either adaptive approaches or parameter amortization techniques. However, their reliance on pre-trained models presents a significant vulnerability. When an attacker’s trigger activates a hidden backdoor, it may result in the misclassification of images, profoundly affecting the model’s performance. In our rese...
Existing facial expression recognition (FER) methods typically fine-tune a pre-trained visual encoder using discrete labels. However, this form of supervision limits the ability to specify the emotional concept of different facial expressions. In this paper, we observe that the rich knowledge in text embeddings, generated by vision-language models, is a promis...
Image fusion combines images from multiple domains into one image, containing complementary information from source domains. Existing methods take pixel intensity, texture and high-level vision task information as the standards to determine preservation of information, lacking enhancement for human perception. We introduce an image fusion method, H...
The potential vulnerability of deep neural networks and the complexity of pedestrian images greatly limit the application of person re-identification techniques in the field of smart security. Current attack methods often focus on generating carefully crafted adversarial samples or only disrupting the metric distances between targets and similar...
Recent advancements in deep learning have greatly advanced the field of infrared small object detection (IRSTD). Despite their remarkable success, a notable gap persists between these IRSTD methods and generic segmentation approaches in natural image domains. This gap primarily arises from the significant modality differences and the limited availa...
Compared to single-modal knowledge distillation, cross-modal knowledge distillation faces more severe challenges due to domain gaps between modalities. Although many methods have been proposed to overcome these challenges, there is still limited research on how domain gaps affect cross-modal knowledge distillation. This paper provid...
Few-shot font generation (FFG) aims to preserve the underlying global structure of the original character while generating target fonts by referring to a few samples. It has been applied to font library creation, a personalized signature, and other scenarios. Existing FFG methods explicitly disentangle content and style of reference glyphs universa...
This article proposes a finite-time based sliding-mode controller (FTSMC) and disturbance observer (FTDO) for connected vehicle (CV) platoon with uncertain dynamics. In particular, a recursive structure consisting of first-level and second-level sliding mode surfaces (SMSs) is developed for the chattering of the conventional SMC. Herein, the first-...
Binary neural network (BNN) is an effective approach to reduce the memory usage and the computational complexity of full-precision convolutional neural networks (CNNs), which has been widely used in the field of deep learning. However, there are different properties between BNNs and real-valued models, making it difficult to draw on the experience...
Food image composition requires the use of existing dish images and background images to synthesize a natural new image, while diffusion models have made significant advancements in image generation, enabling the construction of end-to-end architectures that yield promising results. However, existing diffusion models face challenges in processing a...
Deep neural networks have recently achieved promising performance in the vein recognition task and have shown an increasing application trend. However, they are prone to adversarial attacks that add imperceptible perturbations to the input, resulting in incorrect recognition. To address this issue, we propose a novel defense mo...
Diffusion-based image super-resolution (SR) methods have shown promise in reconstructing high-resolution images with fine details from low-resolution counterparts. However, these approaches typically require tens or even hundreds of iterative samplings, resulting in significant latency. Recently, techniques have been devised to enhance the sampling...
Due to advantages such as high security, high privacy, and liveness recognition, vein recognition has received more and more attention in past years. Recently, deep learning models, e.g., Mamba, have shown robust feature representation with linear computational complexity and have been successfully applied to visual tasks. However, vision Mamba can c...
One-step Weakly Supervised Person Search (WSPS) jointly performs pedestrian detection and person Re-IDentification (ReID) only with bounding box annotations, which makes the traditional person ReID problem more suitable and efficient for real-world applications. However, this task is very challenging due to the following reasons: 1) large feature g...
The remarkable prowess of diffusion models in image generation has spurred efforts to extend their application beyond generative tasks. However, a persistent challenge exists in lacking a unified approach to apply diffusion models to visual perception tasks with diverse semantic granularity requirements. Our purpose is to establish a unified visual...
The emergence of face forgery has raised global concerns on social security, thereby facilitating the research on automatic forgery detection. Although current forgery detectors have demonstrated promising performance in determining authenticity, their susceptibility to adversarial perturbations remains insufficiently addressed. Given the nuanced d...
Due to the successful development of deep image generation technology, forgery detection plays a more important role in social and economic security. Racial bias has not been explored thoroughly in the deep forgery detection field. In the paper, we first contribute a dedicated dataset called the Fair Forgery Detection (FairFD) dataset, where we pro...
Mixup data augmentation approaches have been applied to various deep learning tasks to improve the generalization ability of deep neural networks. Some existing approaches, such as CutMix and SaliencyMix, randomly replace a patch in one image with patches from another to generate the mixed image. Similarly, the corresponding labels are linearly combin...
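The patch-replacement and label-mixing step described in the abstract above can be sketched as follows. This is a minimal CutMix-style illustration under stated assumptions, not the method proposed in the paper: the function name `cutmix`, the `(top, left, height, width)` box format, and the use of plain 2D lists are all hypothetical choices, and the box is passed in explicitly rather than sampled at random.

```python
def cutmix(image_a, image_b, label_a, label_b, box):
    """Paste a rectangular patch of image_b into a copy of image_a.

    `box` is (top, left, height, width) and is clipped to the image bounds.
    Labels are one-hot (or soft) lists, mixed in proportion to pixel area.
    """
    top, left, height, width = box
    rows, cols = len(image_a), len(image_a[0])
    bottom = min(top + height, rows)
    right = min(left + width, cols)

    mixed = [row[:] for row in image_a]  # copy so image_a is not modified
    for r in range(top, bottom):
        for c in range(left, right):
            mixed[r][c] = image_b[r][c]

    # lam is the fraction of pixels that still come from image_a.
    pasted = max(0, bottom - top) * max(0, right - left)
    lam = 1 - pasted / (rows * cols)
    mixed_label = [lam * a + (1 - lam) * b for a, b in zip(label_a, label_b)]
    return mixed, mixed_label


image_a = [[0] * 4 for _ in range(4)]
image_b = [[1] * 4 for _ in range(4)]
mixed, mixed_label = cutmix(image_a, image_b, [1, 0], [0, 1], (0, 0, 2, 2))
print(mixed_label)
```

In practice the box would be drawn at random for each training pair; fixing it here keeps the example deterministic.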
The recent Segment Anything Model (SAM) is a significant advancement in natural image segmentation, exhibiting potent zero-shot performance suitable for various downstream image segmentation tasks. However, directly utilizing the pretrained SAM for Infrared Small Target Detection (IRSTD) task falls short in achieving satisfying performance due to a...
Granular-ball support vector machine (GBSVM) is a significant attempt to construct a classifier using the coarse-to-fine granularity of a granular ball as input, rather than a single data point. It is the first classifier whose input contains no points. However, the existing model has some errors, and its dual model has not been derived. As a resul...
In recent years, vein biometrics has gained significant attention due to its high security and privacy features. While deep neural networks have become the predominant classification approaches for their ability to automatically extract discriminative vein features, they still face certain drawbacks: 1) Existing transformer-based vein classifiers s...
Magnetic resonance imaging (MRI)-based deep neural networks (DNN) have been widely developed to perform prostate cancer (PCa) classification. However, in real-world clinical situations, prostate MRIs can be easily impacted by rectal artifacts, which have been found to lead to incorrect PCa classification. Existing DNN-based methods typically do not...
In response to the worldwide COVID-19 pandemic, advanced automated technologies have emerged as valuable tools to aid healthcare professionals in managing an increased workload by improving radiology report generation and prognostic analysis. This study proposes a Multi-modality Regional Alignment Network (MRANet), an explainable model for radiolog...
Hand gesture recognition is pivotal in facilitating human–machine interaction within the Internet of Things. Nevertheless, it encounters challenges, including labeling expenses and robustness. To tackle these issues, we propose a semi-supervised learning framework guided by pseudo-label consistency. This framework utilizes a dual-branch structure w...
Visible infrared person re-identification (VI-ReID) exposes considerable challenges because of the modality gaps between the person images captured by daytime visible cameras and nighttime infrared cameras. Several fully-supervised VI-ReID methods have improved the performance with extensive labeled heterogeneous images. However, the identity of th...
Deep neural networks are proven to be vulnerable to fine-designed adversarial examples, and adversarial defense algorithms draw more and more attention nowadays. Pre-processing based defense is a major strategy, and learning robust feature representations has been proven an effective way to boost generalization. However, existing defense work...
With the great development of generative model techniques, face forgery detection draws more and more attention in the related field. Researchers find that existing face forgery models are still vulnerable to adversarial examples with generated pixel perturbations in the global image. These generated adversarial samples still can't achieve satisfac...