Xiaohua Xie

Sun Yat-Sen University | SYSU · School of Information Science and Technology

Ph.D

About

125
Publications
7,868
Reads
2,507
Citations
Additional affiliations
January 2011 - July 2015
Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences
Position
  • Professor (Associate)


Publications (125)
Article
Full-text available
Anomaly detectors are widely used in industrial manufacturing to detect and localize unknown defects in query images. These detectors are trained on anomaly-free samples and have successfully distinguished anomalies from most normal samples. However, hard-normal examples are scattered and far apart from most normal samples, and thus they are often...
Preprint
3D Gaussian Splatting (3DGS) has recently created impressive assets for various applications. However, the copyright of these assets is not well protected as existing watermarking methods are not suited for 3DGS considering security, capacity, and invisibility. Besides, these methods often require hours or even days for optimization, limiting the a...
Article
Deep learning techniques have achieved superior performance in computer-aided medical image analysis, yet they are still vulnerable to imperceptible adversarial attacks, resulting in potential misdiagnosis in clinical practice. Conversely, recent years have also witnessed remarkable progress in defense against these tailored adversarial examples in...
Preprint
Recent vision foundation models can extract universal representations and show impressive abilities in various tasks. However, their application on object detection is largely overlooked, especially without fine-tuning them. In this work, we show that frozen foundation models can be a versatile feature enhancer, even though they are not pre-trained...
Preprint
Full-text available
Generative Steganography (GS) is a novel technique that utilizes generative models to conceal messages without relying on cover images. Contemporary GS algorithms leverage the powerful generative capabilities of Diffusion Models (DMs) to create high-fidelity stego images. However, these algorithms, while yielding relatively satisfactory generation...
Preprint
Text-to-3D generation aims to create 3D assets from text-to-image diffusion models. However, existing methods face an inherent bottleneck in generation quality because the widely-used objectives such as Score Distillation Sampling (SDS) inappropriately omit U-Net jacobians for swift generation, leading to significant bias compared to the "true" gra...
Preprint
Sparse-view computed tomography (SVCT) reconstruction aims to acquire CT images based on sparsely-sampled measurements. It allows subjects to be exposed to less ionizing radiation, reducing the lifetime risk of developing cancers. Recent research employs implicit neural representation (INR) techniques to reconstruct CT images from a single SV sinogr...
Article
Few-shot image classification (FSIC) is beneficial for a variety of real-world scenarios, aiming to construct a recognition system with limited training data. In this article, we extend the original FSIC task by incorporating defense against malicious adversarial examples. This can be an arduous challenge because numerous deep learning-based approa...
Article
Universal domain adaptation (UniDA) is a practical but challenging problem, in which information about the relation between the source and the target domains is not given for knowledge transfer. Existing UniDA methods may suffer from the problems of overlooking intra-domain variations in the target domain and difficulty in separating between the si...
Article
Group re-identification (G-ReID) aims to correctly associate groups with the same members captured by different cameras. However, supervised approaches for this task often suffer from the high cost of cross-camera sample labeling. Unsupervised methods based on clustering can avoid sample labeling, but the problem of member variations often makes cl...
Article
Full-text available
Group re-identification (GReID) aims to correctly associate images containing the same group members captured with non-overlapping camera networks, which has important applications in video surveillance. Unlike the person re-identification, the unique challenge of GReID lies in variations of group structure, including the number and layout of membe...
Article
Recent defect instance segmentation methods heavily rely on pixel-level annotated images. However, acquiring labeled defect data from modern manufacturing industries takes significant time and effort. In this paper, we propose a novel semi-supervised approach for defect instance segmentation via Teacher-Student model Collaboration (TSC) to address...
Article
Effectively and efficiently mining valuable clustering patterns is a challenging problem when handling large-scale data from diverse sources. Existing approaches adopt anchor graph learning or binary representation embedding to reduce computational complexity. Normally, anchor graph learning cannot directly obtain the clustering assignment except...
Chapter
RGB-Infrared object detection in aerial images has gained significant attention due to its effectiveness in mitigating the challenges posed by illumination restrictions. Existing methods often focus heavily on enhancing the fusion of two modalities while ignoring the optimization imbalance caused by inherent differences between modalities. In this...
Chapter
Multi-modal tracking has increasingly gained attention due to its superior accuracy and robustness in complex scenarios. The primary challenges in this field lie in effectively extracting and fusing multi-modal data that inherently contain gaps. To address the above issues, we propose a novel regularized single-stream multi-modal tracking framework...
Preprint
Full-text available
Recent sparse detectors with multiple (e.g., six) decoder layers achieve promising performance but require much inference time due to complex heads. Previous works have explored using dense priors as initialization and built one-decoder-layer detectors. Although they gain remarkable acceleration, their performance still lags behind their six-decoder-layer c...
Conference Paper
Spiking Neural Networks (SNNs) are promising models for neuromorphic vision recognition. The mean square error (MSE) and cross-entropy (CE) losses are widely applied to supervise the training of SNNs on neuromorphic datasets. However, the relevance between the output spike counts and predictions is not well modeled by the existing loss functions...
Conference Paper
Full-text available
Raven's Progressive Matrices (RPM), one of the standard intelligence tests in human psychology, has recently emerged as a powerful tool for studying abstract visual reasoning (AVR) abilities in machines. Although existing computational models for RPM problems achieve good performance, they require a large number of labeled training examples for sup...
Article
Surface-defect detection aims to accurately locate and classify defect areas in images via pixel-level annotations. Different from the objects in traditional image segmentation, defect areas comprise a small group of pixels with random shapes, characterized by uncommon textures and edges that are inconsistent with the normal surface patterns of ind...
Preprint
Sparse-view Computed Tomography (SVCT) reconstruction is an ill-posed inverse problem in imaging that aims to acquire high-quality CT images based on sparsely-sampled measurements. Recent works use Implicit Neural Representations (INRs) to build the coordinate-based mapping between sinograms and CT images. However, these methods have not considered...
Article
Modern deep neural networks have made numerous breakthroughs in real-world applications, yet they remain vulnerable to some imperceptible adversarial perturbations. These tailored perturbations can severely disrupt the inference of current deep learning-based methods and may induce potential security hazards to artificial intelligence applications....
Article
We are concerned with retrieving a query person from multiple videos captured by a non-overlapping camera network. Existing methods often rely on purely visual matching or consider temporal constraints but ignore the spatial information of the camera network. To address this issue, we propose a pedestrian retrieval framework based on cross-camera t...
Article
We focus on addressing the problem of shadow removal for an image, and attempt to make a weakly supervised learning model that does not depend on the pixelwise-paired training samples, but only uses the samples with image-level labels that indicate whether an image contains shadow or not. To this end, we propose a deep reciprocal learning model tha...
Article
Pose Guided Person Image Generation (PGPIG) is the task of transforming a person's image from the source pose to a target pose. Existing PGPIG methods often tend to learn an end-to-end transformation between the source image and the target image, but do not seriously consider two issues: 1) the PGPIG is an ill-posed problem, and 2) the texture mapp...
Preprint
Since adversarial examples appeared and showed the catastrophic degradation they brought to DNNs, many adversarial defense methods have been devised, among which adversarial training is considered the most effective. However, a recent work showed the inequality phenomena in $l_{\infty}$-adversarial training and revealed that the $l_{\infty}$-adversa...
Article
Attribute-based person search aims to find the target person from the gallery images based on the given query text. It often plays an important role in surveillance systems when visual information is not reliable, such as identifying a criminal from a few witnesses. Although recent works have made great progress, most of them neglect the attribute...
Preprint
Anomaly detectors are widely used in industrial production to detect and localize unknown defects in query images. These detectors are trained on nominal images and have shown success in distinguishing anomalies from most normal samples. However, hard-nominal examples are scattered and far apart from most normalities, so they are often mistaken for an...
Preprint
Medical image arbitrary-scale super-resolution (MIASSR) has recently gained widespread attention, aiming to super sample medical volumes at arbitrary scales via a single model. However, existing MIASSR methods face two major limitations: (i) reliance on high-resolution (HR) volumes and (ii) limited generalization ability, which restricts their appl...
Preprint
Deep learning techniques have achieved superior performance in computer-aided medical image analysis, yet they are still vulnerable to imperceptible adversarial attacks, resulting in potential misdiagnosis in clinical practice. Conversely, recent years have also witnessed remarkable progress in defense against these tailored adversarial examples in...
Article
DeepFake face swapping presents a significant threat to online security and social media, which can replace the source face in an arbitrary photo/video with the target face of an entirely different person. In order to prevent this fraud, some researchers have begun to study the adversarial methods against DeepFake or face manipulation. However, exi...
Article
Personalized Fixation-based Object Segmentation (PFOS) aims at segmenting the gazed objects in images conditioned on personalized fixations. However, the performances of existing PFOS methods are degraded when facing anomalous fixation maps (some fixations fall in the background) or enormous objects because of their poor localization ability. In th...
Preprint
Although current deep learning techniques have yielded superior performance on various computer vision tasks, they are still vulnerable to adversarial examples. Adversarial training and its variants have been shown to be the most effective approaches to defend against adversarial examples. These methods usually regularize the difference between...
Preprint
Existing Deep-Learning-based (DL-based) Unsupervised Salient Object Detection (USOD) methods learn saliency information in images based on the prior knowledge of traditional saliency methods and pretrained deep networks. However, these methods employ a simple learning strategy to train deep networks and therefore cannot properly incorporate the "hi...
Article
Group re-identification (G-ReID) focuses on associating the group images containing the same persons under different cameras. The key challenge of G-ReID is that all the cases of the intra-group member and layout variations are hard to exhaust. To this end, we propose a novel uncertainty modeling, which treats each image as a distribution depending...
Preprint
Spiking neural networks are efficient computation models for low-power environments. Spike-based BP algorithms and ANN-to-SNN (ANN2SNN) conversions are successful techniques for SNN training. Nevertheless, spike-based BP training is slow and requires large memory costs. Though ANN2SNN provides a low-cost way to train SNNs, it requires many infere...
Article
Spiking neural networks (SNNs) are a promising learning model due to their computational efficiency for discrete spike events. However, because of the binary output of spiking neurons, the standard backpropagation (BP) method is not suitable for deep SNN training. In this paper, we design a gradient-based spiking neuron named Relaxation Leaky Integrat...
Preprint
Weakly Supervised Semantic Segmentation (WSSS) based on image-level labels has attracted much attention due to low annotation costs. Existing methods often rely on Class Activation Mapping (CAM) that measures the correlation between image pixels and classifier weight. However, the classifier focuses only on the discriminative regions while ignoring...
Preprint
Full-text available
Pose Guided Person Image Generation (PGPIG) is the task of transforming a person image from the source pose to a given target pose. Most of the existing methods only focus on the ill-posed source-to-target task and fail to capture reasonable texture mapping. To address this problem, we propose a novel Dual-task Pose Transformer Network (DPTN), whic...
Preprint
In recent years, deep network-based methods have continuously refreshed state-of-the-art performance on the Salient Object Detection (SOD) task. However, the performance discrepancy caused by different implementation details may conceal the real progress in this task. Making an impartial comparison is required for future research. To meet this need,...
Article
This paper focuses on the Unsupervised Salient Object Detection (USOD) issue. We come up with a two-stage Activation-to-Saliency (A2S) framework that effectively excavates saliency cues to train a robust saliency detector. It is worth noting that our method does not require any manual annotation in the whole process. In the first stage, we transfor...
Preprint
Unsupervised Salient Object Detection (USOD) is of paramount significance for both industrial applications and downstream tasks. Existing deep-learning (DL) based USOD methods utilize some low-quality saliency predictions extracted by several traditional SOD methods as saliency cues, which mainly capture some conspicuous regions in images. Furtherm...
Article
Pose Guided Person Image Generation (PGPIG) is a popular task in deepfake, which aims at generating a person image with the given pose based on the source image. However, existing methods cannot comprehensively model the correlation between the source and the target domain. Most of them only focus on the correlation of the keypoints but ignore deta...
Article
Learning discriminative and rich features is an important research task for person re-identification. Previous studies have attempted to capture global and local features at the same time and layer of the model in a non-interactive manner, which is called synchronous learning. However, synchronous learning leads to high similarity, and further def...
Chapter
Recently, Salient Object Detection (SOD) has witnessed remarkable advancements owing to the introduction of Convolutional Neural Networks (CNNs). However, when faced with complex situations, such as salient objects of different sizes appearing in the same image, the detection results are far from satisfactory. In this paper, we propose a sc...
Article
Minimizing the computation complexity is essential for the popularization of deep networks in practical applications. Nowadays, most research attempts to accelerate deep networks by designing new network structures or compressing the network parameters. Meanwhile, transfer learning techniques such as knowledge distillation are utilized to keep the...
Article
RGB-Infrared (RGB-IR) cross-modality person re-identification (re-ID) is attracting more and more attention due to requirements for 24-h scene surveillance. However, the high cost of labeling person identities of an RGB-IR dataset largely limits the scalability of supervised models in real-world scenarios. In this paper, we study the unsupervised R...
Article
Face illumination perception and processing is a significantly difficult issue especially due to asymmetric shadings, local highlights, and local shadows. This study focuses on the face illumination transfer problem, which is to transfer the illumination style from a reference face image to a target face image while preserving other attributes. Suc...
Article
We present a learning model that makes full use of boundary information for salient object segmentation. Specifically, we come up with a novel loss function, i.e., Contour Loss, which leverages object contours to guide models to perceive salient object boundaries. Such a boundary-aware network can learn boundary-wise distinctions between salient ob...
Chapter
Current leading algorithms of the image-based virtual try-on systems mainly model the deformation of clothes as a whole. However, the deformation of different clothes parts can change drastically. Thus the existing algorithms fail to transfer the clothes to the proper shape in cases, such as self-occlusion, complex pose, and sophisticated textures....
Article
In this paper, we address a problem of view-disturbing raindrop removal on a single image. In existing methods to tackle this problem, machine learning based ones seem promising but require elaborate pairwise images, i.e., the raindrop-degraded image and the corresponding clean image of the same scene, for training. To overcome this drawback, we pr...
Article
In many public security applications such as anomaly detection, it is important to re-identify a group of pedestrians across other surveillance cameras, which is referred to as the group retrieval problem. Most previous studies focus on single-person re-identification (re-id) and ignore the correlations among group members, and they lack a large and comprehe...
Article
Accurate optical flow estimation with the frequency-domain regularization is a challenging problem in computer vision. In this paper, we solve this issue by introducing a novel optical flow method related to the frequency domain that uses TV-wavelet regularization. Specifically, we regard TV-wavelet regularization as a filtering process. After wave...
Chapter
Person re-identification (ReID) is a challenging task in the computer vision area due to the dramatic changes across different non-overlapping camera views, e.g., lighting, view angle, and pose, among which occlusion is one of the hardest challenges. Recently, occluded person re-identification (Occluded-ReID) has been proposed to address this problem. Nevert...
Chapter
The group refers to several pedestrians gathering together with a high motion collectiveness for a sustained period of time. The existing person re-identification (re-id) approaches focus on extracting individual appearance cues, but ignore the correlations of different persons in a group. In this paper, we propose a group-guided re-id method name...
Chapter
Low-resolution person re-identification (LR-REID) refers to matching cross-view pedestrians from varying resolutions, which is common and challenging for realistic surveillance systems. LR-REID is largely understudied, and the performance of the existing methods drops dramatically in the LR domain. In this paper, we propose a novel adaptive dual-bran...
Chapter
In many artificial intelligence applications, such as those in the security field, it is important to identify whether a specific group of pedestrians has been observed over a network of other surveillance cameras, which is referred to as the pedestrian group retrieval problem. To address this issue, this paper contributes a novel dataset for the pedestrian group retrieval...
Chapter
Person Re-identification (re-id) needs to tackle the problem of changing resolutions because pedestrian images from surveillance systems or public datasets suffer from the low-resolution problem (LR-REID), including low quality, blurry textures, and so on, which makes it difficult to extract identity information under various resolutions. H...
Chapter
Narrowing the modal gap in person re-identification between visible domain and near infrared domain (VIS-NIR Re-ID) is a challenging problem. In this paper, we propose the deep heterogeneous metric learning (DHML) for VIS-NIR Re-ID. Our method explicitly learns a specific projection transformation for each modality. Furthermore, we design a heterog...
Preprint
We present a learning model that makes full use of boundary information for salient object segmentation. Specifically, we come up with a novel loss function, i.e., Contour Loss, which leverages object contours to guide models to perceive salient object boundaries. Such a boundary-aware network can learn boundary-wise distinctions between salient ob...
Article
Traditional person re-identification (re-id) methods perform poorly under changing illuminations. This situation can be addressed by using dual-cameras that capture visible images in a bright environment and infrared images in a dark environment. Yet, this scheme needs to solve the visible-infrared matching issue, which is largely under-studied. Ma...
Article
Most current person re-identification (ReID) methods neglect a spatial-temporal constraint. Given a query image, conventional methods compute the feature distances between the query image and all the gallery images and return a similarity ranked table. When the gallery database is very large in practice, these approaches fail to obtain a good pe...
Article
Many recent variational optical flow methods are not robust to illumination variance, and they only consider local image relation in terms of illumination. In this paper, we propose a new efficient illumination-invariance total variation optical flow method called the weighted regularization transform (WRT), which uses and optimizes the Weber’s La...
Article
Full-text available
Intrinsic image decomposition from a single image or image sequences is always a challenging task in computer vision and image processing due to the ill-posed characteristics. In order to attain a reasonable estimation of intrinsic images, in this paper we present a low-rank sparse model (LRSM) to derive intrinsic images from an image sequence of t...
Article
Efficient optical flow estimation with high accuracy is a challenging problem in computer vision. In this paper, we present a simple but efficient segmentation-based PatchMatch framework to address this issue. Specifically, it firstly generates sparse seeds without losing important motion information by oversegmentation, and then yields sparse matc...
Chapter
Currently, most methods of object detection are monocular-based. However, due to the sensitivity to color, these methods cannot handle many hard samples. With the depth information, disparity maps are helpful to overcome this problem. In this paper, we propose the asymmetric two-stream networks for RGB-Disparity based object detection. Our method...
Chapter
Human body detection is a key technology in the field of biometric recognition, and the detection in a depth image is rather challenging due to serious noise effects and lack of texture information. For addressing this issue, we propose the feature visualization based stacked convolutional neural network (FV-SCNN), which can be trained by a two-la...
Chapter
The ability to extract the discriminative features remains a fundamental task of object detection, especially for small objects. Many mainstream object detection models use the feature pyramid structure, a kind of fusion approach, to predict objects of different scales. This traditional fusion strategy aims to merge different feature maps by li...
Chapter
The state-of-the-art performance for object detection has been significantly improved over the past two years. Despite the effectiveness on still images, something stands in the way of transferring the powerful detection networks to video object detection. In this work, we present a fast and accurate framework for video object detection that incor...