About
221 Publications
31,424 Reads
4,849 Citations
Introduction
Zheng Yuhui currently works at the School of Computer and Software, Nanjing University of Information Science & Technology (NUIST). Zheng does research in Artificial Intelligence, Computing in Mathematics, Natural Science, Engineering and Medicine and Information Science.
Current institution
Additional affiliations
July 2018 - present
April 2013 - June 2018
December 2014 - December 2015
Publications (221)
The recent widespread utilization of facial expression recognition (FER) has garnered significant attention in the affective computing field. To address the issue of dominant features being suppressed during feature fusion in FER, this study proposes a self‐learning weight network based on label distribution training (SLW‐LDT). First, based on the...
Image-sentence matching, which aims to understand the correspondence between vision and language, has achieved significant progress with various deep methods trained under large-scale supervision. Unlike natural images taken by a camera, diagrams in textbooks contain more graphic objects, drawings, and natural objects, and the diagram-sente...
Vision Transformer (ViT), known for capturing non-local features, is an effective tool for hyperspectral image classification (HSIC). However, ViT’s multi-head self-attention (MHSA) mechanism often struggles to balance local details and long-range relationships for complex high-dimensional data, leading to a loss in spectral-spatial information rep...
Occluded person re-identification (Re-ID) is a challenging problem due to the absence of notable discriminative features resulting from incomplete body part images and interference from occluded regions. Recently, some transformer-based methods have demonstrated excellent capabilities in resolving this problem; however, these methods are not able to...
The main purpose of image restoration is to recover high‐quality image content from degraded versions. However, current mainstream models tend to focus solely on spatial details or contextual semantics, resulting in poor repair effects. To address this issue, a multi‐task image repair network based on spatial aggregation attention and multi‐feature...
Group re-identification (re-ID) aims to match groups with the same people under different cameras, and mainly involves the challenges of changes in group membership and layout. Most existing methods use the k-nearest neighbor algorithm to update node features to consider changes in group membership, but these methods cannot solve the problem of...
Underwater imagery frequently exhibits a multitude of degradation phenomena, including chromatic aberrations, optical blurring, and diminished contrast, thereby exacerbating the complexity of underwater endeavors. Among the existing underwater image enhancement (UIE) methods, cycle-consistent generative adversarial network (CycleGAN)-based methods...
Hashing utilizes hash code as a compact image representation, offering excellent performance in large-scale image retrieval due to its computational and storage advantages. However, the prevalence of degraded images on social media platforms, resulting from imperfections in the image capture process, poses new challenges for conventional image retr...
As video-based social networks continue to grow exponentially, there is a rising interest in video retrieval using natural language. Cross-modal hashing, which learns compact hash code for encoding multi-modal data, has proven to be widely effective in large-scale cross-modal retrieval, e.g., image-text retrieval, primarily due to its computation a...
Few-shot semantic segmentation (FSS) aims to generate a model for segmenting novel classes using a limited number of annotated samples. Previous FSS methods have shown sensitivity to background noise due to inherent bias, attention bias, and spatial-aware bias. In this study, we propose a Transformer-Based Adaptive Prototype Matching Network to est...
Transformers have shown remarkable success in modeling sequential data and capturing intricate patterns over long distances. Their self-attention mechanism allows for efficient parallel processing and scalability, making them well-suited for the high-dimensional data in hyperspectral and LiDAR imagery. However, further research is needed on how to...
Cross-modal hashing encodes different modalities of multimodal data into low-dimensional Hamming space for fast cross-modal retrieval. In multi-label cross-modal retrieval, multimodal data are often annotated with multiple labels, and some labels, e.g., “ocean” and “cloud”, often co-occur. However, existing cross-modal hashing methods overlook labe...
Conventional image set methods typically learn from small to medium-sized image set datasets. However, when applied to large-scale image set applications such as classification and retrieval, they face two primary challenges: 1) effectively modeling complex image sets, and 2) efficiently performing tasks. To address the above issues, we propose a n...
Conventional image set methods typically learn from image sets stored in one location. However, in real-world applications, image sets are often distributed or collected across different positions. Learning from such distributed image sets presents a challenge that has not been studied thus far. Moreover, efficiency is seldom addressed in large-sca...
Existing unsupervised video object segmentation methods typically suffer severe performance degradation when tested in out-of-distribution scenarios. The primary reason is that real-world test data may not follow the independent and identically distributed (i.i.d.) assumption, leading to domain shift...
Few-shot semantic segmentation aims to segment novel-class objects in a query image with only a few annotated examples in support images. Although progress has been made recently by combining prototype-based metric learning, existing methods still face two main challenges. First, various intra-class objects between the support and query images or s...
Image dehazing is a typical low-level vision task that aims at restoring haze-free images from hazy images. Recently, some methods have adopted deep learning techniques to rebuild haze-free images. However, in real-world scenarios, complex degradation of captured images and non-uniform spatial distributions of haze will significantly weaken the gen...
The effective combination of hyperspectral image (HSI) and light detection and ranging (LiDAR) data can be used for land cover classification. Recently, deep-learning-based classification methods, especially those using transformer networks, have achieved remarkable success. However, deep learning classification methods for multisource data still e...
As a tool for near-earth remote sensing, unmanned aerial vehicle (UAV) can be used to acquire images and data of the earth's surface. This provides a powerful support for Earth observation and resource management. Object tracking in UAV videos has been a topic of much interest in recent years. A large number of algorithms have been proposed. Among...
In recent years, convolutional neural networks (CNNs) have achieved remarkable success in hyperspectral image (HSI) classification tasks, primarily due to their outstanding spatial feature extraction capabilities. However, CNNs struggle to capture the diagnostic spectral information inherent in HSI. In contrast, vision transformers (ViTs) exhibit f...
Recently, large-scale synthetic datasets have effectively alleviated the issue of insufficient person re-identification (Re-ID) datasets. However, synthetic datasets grapple with inherent challenges, including the subpar quality of synthetic pedestrians and single data collection. This paper presents InfinitePerson, a costless pipeline that fully u...
Effectively evaluating the perceptual quality of dehazed images remains an under-explored research issue. In this paper, we propose a no-reference complex-valued convolutional neural network (CV-CNN) model to conduct automatic dehazed image quality evaluation. Specifically, a novel CV-CNN is employed that exploits the advantages of complex-valued r...
Image-to-image translation (I2IT) is an important visual task that aims to learn a mapping of images from one domain to another while preserving the representation of the content. The phenomenon known as mode collapse makes this task challenging. Most existing methods usually learn the relationship between the data and latent distributions to train...
In the hyperspectral image (HSI) classification task, every HSI pixel is labeled as a specific land cover category. Although convolutional neural network (CNN)-based HSI classification methods have made significant progress in enhancing classification performance in recent years, they still have limitations in acquiring deep semantic features and f...
Video hashing learns a compact representation by mapping video into low-dimensional Hamming space and has achieved promising performance in large-scale video retrieval. It is challenging to effectively exploit temporal and spatial structure in an unsupervised setting. To fill this gap, this paper proposes Contrastive Transformer Hashing (CTH) for...
Adverse weather conditions such as haze and snowfall can degrade the quality of captured images and affect performance of drone detection. Therefore, it is challenging to locate and identify targets in adverse weather scenarios. In this paper, a novel model called Object Detection in a Foggy Condition with YOLO (ODFC-YOLO) is proposed, which perfor...
Object tracking is aimed at locating a specific object in the image sequence, such as pedestrians, vehicles, and so on. The existing algorithms based on siamese neural network predict the target through similarity matching. Although these algorithms have achieved satisfactory performance, in the process of similarity calculation between template im...
In recent years, the development of instance segmentation has garnered significant attention in a wide range of applications. However, the training of a fully-supervised instance segmentation model requires costly instance-level and pixel-level annotations. In contrast, weakly-supervised instance segmentation methods (i.e., with image-level cl...
The existing deep-learning-based image inpainting algorithms often suffer from local structure disconnections and blurring when dealing with large irregular defective images. To solve these problems, an image structure-induced semantic pyramid network for inpainting is proposed. The model consists of two parts: the edge inpainting network and the c...
Due to the increasing maturity of deep learning and remote sensing technology, the performance of object detection in satellite images has significantly improved and plays an important role in military reconnaissance, urban planning, and agricultural monitoring. However, satellite images have challenges such as small objects, multiscale objects, an...
Single image super-resolution (SISR) aims to recover a high-resolution image from a single low-resolution image. In recent years, SISR methods based on deep convolutional neural networks have achieved remarkable success, and some methods further improve the performance of the SISR model by introducing nonlocal attention into the model. However, mos...
Existing deep learning-based interactive image segmentation methods have significantly reduced the user's interaction burden with simple click interactions. However, they still require excessive numbers of clicks to continuously correct the segmentation for satisfactory results. This article explores how to harvest accurate segmentation of interest...
Cross-modality visible-infrared person re-identification (VI-ReID) aims to recognize images with the same identity between visible modality and infrared modality, which is a very challenging task because it not only includes the troubles of variations between cross-cameras in traditional person ReID, but also suffers from the huge differences betwe...
Due to their excellent performance in aggregating global features, Transformer structures have recently been widely employed in deep learning-based visual object tracking algorithms. Nevertheless, existing Transformer-based trackers still fail to handle occlusion problems due to drift in feature distributions. To address this issue, we introduce d...
In hyperspectral images (HSIs), mixed noise (e.g., Gaussian noise, impulse noise, stripe noise, and deadlines) contamination is a common phenomenon that greatly reduces the visual quality of the image. In recent years, methods combining global and non-local low-rankness have been widely used in the field of HSI denoising. However, most methods a...
The purpose of object tracking is to locate a given target in image sequence, such as people and vehicles. In recent years, with the development of UAV technology, object tracking in UAV videos has engaged many scholars. It has been widely used in traffic control, water quality inspection, wildlife census and other fields. However, low resolution,...
Yu Fang, Qiaolin Ye, Le Sun, [...], Zebin Wu
Hyperspectral image (HSI) classification is currently a hot topic in the field of remote sensing. The goal is to utilize the spectral and spatial information from HSI to accurately identify land covers. Convolution neural network (CNN) is a powerful approach for HSI classification. However, CNN has limited ability to capture non-local information t...
Despite the remarkable progress made by the salient object detection of natural sensing images (NSI-SOD), the complex background and scale diversity issues of remote sensing images (RSIs) still pose a substantial obstacle. In this study, we build an end-to-end channel-enhanced remodeling-based network (CRNet) for optical RSIs (ORSIs) to highlight s...
Person re-identification (re-ID) aims to match the same person across different cameras. However, most existing re-ID methods assume that people wear the same clothes in different views, which limits their performance in identifying target pedestrians who change clothes. Cloth-changing re-ID is a quite challenging problem as clothes occupying a larg...
Recently, memory-based methods have exhibited remarkable performance in Video Object Segmentation (VOS) by employing non-local pixel-wise matching between the query and memory. Nevertheless, these methods suffer from two limitations: 1) Non-local pixel-wise matching can result in the incorrect segmentation of background distractor objects, and 2) m...
Source-free unsupervised domain adaptation (SFUDA) aims to conduct prediction on the target domain by leveraging knowledge from the well-trained source model. Due to the absence of source data in the SFUDA setting, the existing methods mainly build the target classifier by fine-tuning the source model incorporated with empirical adaptation losses....
Low-rank tensor representation philosophy has enjoyed a reputation in many hyperspectral image (HSI) low-level vision applications, but previous studies often failed to comprehensively exploit the low-rank nature of HSI along different modes in low-dimensional subspace, and unsurprisingly handled only one specific task. To address these challenges,...
The goal of unsupervised person re-identification (Re-ID) is to use unlabeled person images to learn discriminative features. In recent years, many approaches have adopted clustered pseudo labels to construct proxies for contrastive learning, and have thereby achieved great success. However, existing methods of this kind only utilize local structur...
Multi-view hashing (MvH) learns compact hash code by efficiently integrating multi-view data, and has achieved promising performance in large-scale retrieval task. In real-world applications, multi-view data is often stored or collected in different locations, and learning hash code in such case is more challenging yet less studied. In addition, un...
Network embedding has shown promising performance in real-world applications. The network embedding typically lies in a continuous vector space, where storage and computation costs are high, especially in large-scale applications. This paper proposes a more compact representation to fill this gap. The proposed discrete network embedding (DNE) lever...
The performance of person re-identification (re-ID) is easily affected by illumination variations caused by different shooting times, places and cameras. Existing illumination-adaptive methods usually require annotating cross-camera pedestrians on each illumination scale, which is unaffordable for a long-term person retrieval system. The cross-illu...
Many previous occluded person re-identification (re-ID) methods try to use additional clues (pose estimation or semantic parsing models) to focus on non-occluded regions. However, these methods rely heavily on the performance of additional clues and often capture pedestrian features by designing complex modules. In this work, we propose a simple F...
The correlation filter method is effective in visual tracking tasks, but it suffers from the boundary effect and filter degradation in complex situations, which can result in suboptimal performance. To solve the above problems, this study proposes an object tracking method with a discriminant correlation filter, which combines an adapti...
As a novel method of earth observation, video satellites can observe dynamic changes in ground targets in real time. To make use of satellite videos, target tracking in satellite videos has received extensive interest. However, this also faces a variety of new challenges such as global occlusion, low resolution, and insufficient information compare...
Recently, most dehazed image quality assessment (DQA) methods have focused on estimating remaining haze and omitting distortion impact from the side effect of dehazing algorithms, which leads to their limited performance. Addressing this problem, we propose a method for learning both visibility and distortion-aware features no-reference (NR) dehaze...
Text-based person search is a sub-task in the field of image retrieval, which aims to retrieve target person images according to a given textual description. The significant feature gap between two modalities makes this task very challenging. Many existing methods attempt to utilize local alignment to address this problem in the fine-grained level....
Cross-modality Visible-Infrared Person Re-identification (VI-ReID) aims to recognize images with the same identity between visible modality and infrared modality, which is a very challenging task because it not only includes the trouble of variations between cross-cameras in traditional person re-identification, but also suffers from the huge diff...
With the rapid development of deep learning techniques, new breakthroughs have been made in deep learning-based object tracking methods. Although many approaches have achieved state-of-the-art results, existing methods still cannot fully satisfy practical needs. A robust tracker should perform well in three aspects: tracking accuracy, speed, and re...
Impressive progress in image segmentation has been witnessed recently. In this paper, an improved model introducing frequency-tuned salient region detection into the Gaussian mixture model (GMM) is proposed, which is named FTGMM. Frequency-tuned salient region detection is added to obtain the saliency map of the original image, which is combined w...
Existing image inpainting methods based on deep learning have made great progress. These methods either generate contextually semantically consistent images or visually excellent images, ignoring that both semantic and visual effects should be appreciated. In this article, we propose a Semantic Residual Pyramid Network (SRPNet) based on a deep gene...
The purpose of person re-identification (ReID) is to find the same person under different cameras and the basic difficulty lies in the need for large amounts of cross-camera pedestrian annotations. In reality, annotating cross-camera pedestrians is time-consuming especially in large-scale surveillance camera networks. This paper focuses on addressi...
In hyperspectral image (HSI) classification, each pixel sample is assigned to a land-cover category. In the recent past, convolutional neural networks (CNNs) based HSI classification methods have greatly improved performance, owing to their superior ability to represent features. However, these methods have limited ability to obtain deep semantic f...
In the convolutional neural network, the precise segmentation of small-scale objects and object boundaries in remote sensing images is a great challenge. As the model gets deeper, low-level features with geometric information and high-level features with semantic information cannot be obtained simultaneously. To alleviate this problem, a successive...
Recognizing the unseen combinations of action and different objects, namely (zero-shot) compositional action recognition, is extremely challenging for conventional action recognition algorithms in real-world applications. Previous methods focus on enhancing the dynamic clues of objects that appear in the scene by building region features or trackle...
Hyperspectral compressive imaging has taken advantage of compressive sensing theory to capture spectral information of the dynamic world in recent decades, where an optical encoder is employed to compress high dimensional signals into a single 2-D measurement. The core issue is how to reconstruct the underlying hyperspectral image (HSI), a...
Fusing a low spatial resolution hyperspectral image (LR-HSI) with a high spatial but low spectral resolution multispectral image (HR-MSI) has been regarded as an effective approach to obtain high resolution HSI (HR-HSI). While matrix factorization based approaches obtained promising performance for HSI-MSI fusion, the mixed noise introduced into th...
Single image dehazing has great significance in computer vision. In this paper, we propose a novel unsupervised Dark Channel Attention optimized CycleGAN (DCA-CycleGAN) to deal with challenging scenes with uneven and dense haze concentration. Firstly, the DCA-CycleGAN adopts the dark channel as input and then generates attention through a DCA sub...
Stereoscopic imaging is widely used in many fields. To guarantee the best quality of experience, it’s necessary to design a robust and accurate quality assessment model for stereoscopic content. In this paper, we proposed a no-reference stereoscopic image quality assessment (NR-SIQA) model using both complex contourlet and spatial domain features o...
Person re-identification (re-ID) tackles the problem of matching person images with the same identity from different cameras. In practical applications, due to the differences in camera performance and distance between cameras and persons of interest, captured person images usually have various resolutions. This problem, named Cross-Resolution Pers...
Deep supervised hashing has greatly improved retrieval performance with the powerful learning capability of deep neural network. In multi-label image retrieval, existing deep hashing simply indicates whether two images are similar by constructing a similarity matrix. However, it ignores the dependency among multiple labels that has been shown impor...
Pansharpening aims to fuse the abundant spectral information of multispectral (MS) images and the spatial details of panchromatic (PAN) images, yielding a high-spatial-resolution MS (HRMS) image. Traditional methods only focus on the linear model, ignoring the fact that degradation process is a nonlinear inverse problem. Due to convolutional neural...
In this paper, we propose a new pansharpening architecture called Sub-Pixel Convolutional Residual Network to obtain high-resolution multispectral (MS) images. Different from previous works, we extract features from MS images in a low-resolution space and pay more attention to the balance of spectral and spatial information. Our architecture consi...
In recent years, generative adversarial networks (GANs) have been widely used to generate realistic fake face images, which can easily deceive human beings. To detect these images, some methods have been proposed. However, their detection performance will be degraded greatly when the testing samples are post-processed. In this paper, some experimen...
Multiview learning (MVL), which enhances the learners' performance by coordinating complementarity and consistency among different views, has attracted much attention. The multiview generalized eigenvalue proximal support vector machine (MvGSVM) is a recently proposed effective binary classification method, which introduces the concept of MVL into...
Joint photographic experts group (JPEG) compression is widely used in image processing and computer vision. Detecting double compressed JPEG images is a common problem in forensics and detecting compressed images with the same quantization matrix remains a challenging task. However, most existing methods were designed for detection in grayscale ima...
Correlation filter (CF) has drawn extensive interest in aerial object tracking due to its remarkable performance. Recently, the popular CF methods based on temporal-spatial regularization have been proved to be able to effectively improve the tracking results. However, the boundary effect and filter template degradation still influence the speed an...
Fusing a spatially low-resolution hyperspectral image (LR-HSI) with a spectrally low-resolution multispectral image (HR-MSI) to produce a high spatial-spectral resolution HSI (HR-HSI), also known as hyperspectral super-resolution, has become a popular approach for enhancing the spatial-spectral resolution of HSI in recent years. In this...
Extracting effective and discriminative features is highly important for addressing the challenges of person re-identification (re-ID). At present, deep convolutional neural networks typically use high-level features to identify pedestrians. However, some essential spatial information contained in low-level features will be lost during the learning...
Image super-resolution is an important field of computer research. The current mainstream approach to image super-resolution uses deep learning to mine deeper image features, which are then used for image restoration. However, most of these models are trained only on images at a specific scale and do not consider the relatio...
Person re-identification (re-ID) tackles the problem of matching person images with the same identity from different cameras. In practical applications, due to the differences in camera performance and distance between cameras and persons of interest, captured person images usually have various resolutions. We name this problem as Cross-Resolution...
To establish robust semantic correspondence between images covering different objects belonging to the same category, there are three important types of information including inter-image relationship, intra-image relationship and cycle consistency. Most existing methods only exploit one or two types of the above information and cannot make them enh...
Currently, person re-identification (re-ID) has been applied in many public security applications. Yet owing to the large changes in visual appearance of the same identity under different views, re-ID still faces many challenges. To reduce the intra-person discrepancy, extracting more powerful feature representations from pedestrian images is a reasonable...