Yuhui Zheng

Verified
Yuhui verified their affiliation via an institutional email.
  • Ph.D. in Pattern Recognition and Intelligent Systems
  • Professor (Full) at Nanjing University of Information Science and Technology

About

221
Publications
31,424
Reads
4,849
Citations
Introduction
Zheng Yuhui currently works at the School of Computer and Software, Nanjing University of Information Science & Technology (NUIST). Zheng does research in Artificial Intelligence, Computing in Mathematics, Natural Science, Engineering and Medicine, and Information Science.
Current institution
Nanjing University of Information Science and Technology
Current position
  • Professor (Full)
Additional affiliations
July 2018 - present
Nanjing University of Information Science and Technology
Position
  • Professor (Full)
April 2013 - June 2018
Nanjing University of Information Science and Technology
Position
  • Professor (Associate)
December 2014 - December 2015
Sungkyunkwan University
Position
  • Professor
Education
October 2008 - December 2008
Ecole Des Mines D’Ales
Field of study
  • Image Processing
September 2004 - June 2009
Nanjing University of Science and Technology
Field of study
  • Artificial Intelligence and Pattern Recognition

Publications

Publications (221)
Article
Full-text available
The recent widespread utilization of facial expression recognition (FER) has garnered significant attention in the affective computing field. To address the issue of dominant features being suppressed during feature fusion in FER, this study proposes a self‐learning weight network based on label distribution training (SLW‐LDT). First, based on the...
Article
Image-sentence matching, which aims to understand the correspondence between vision and language, has achieved significant progress with various deep methods trained under large-scale supervision. Unlike natural images taken by a camera, diagrams in textbooks contain more graphic objects, drawings, and natural objects, and the diagram-sente...
Article
Vision Transformer (ViT), known for capturing non-local features, is an effective tool for hyperspectral image classification (HSIC). However, ViT’s multi-head self-attention (MHSA) mechanism often struggles to balance local details and long-range relationships for complex high-dimensional data, leading to a loss in spectral-spatial information rep...
Article
Occluded person re-identification (Re-ID) is a challenging problem due to the absence of notable discriminative features resulting from incomplete body part images and interference from occluded regions. Recently, some transformer-based methods have demonstrated excellent capabilities in resolving this problem; however, these methods are not able to...
Article
Full-text available
The main purpose of image restoration is to recover high‐quality image content from degraded versions. However, current mainstream models tend to focus solely on spatial details or contextual semantics, resulting in poor repair effects. To address this issue, a multi‐task image repair network based on spatial aggregation attention and multi‐feature...
Preprint
Group re-identification (re-ID) aims to match groups with the same people under different cameras, and mainly involves the challenges of changes in group membership and layout. Most existing methods use the k-nearest neighbor algorithm to update node features to consider changes in group membership, but these methods cannot solve the problem of...
Article
Underwater imagery frequently exhibits a multitude of degradation phenomena, including chromatic aberrations, optical blurring, and diminished contrast, thereby exacerbating the complexity of underwater endeavors. Among the existing underwater image enhancement (UIE) methods, cycle-consistent generative adversarial network (CycleGAN)-based methods...
Conference Paper
Hashing utilizes hash code as a compact image representation, offering excellent performance in large-scale image retrieval due to its computational and storage advantages. However, the prevalence of degraded images on social media platforms, resulting from imperfections in the image capture process, poses new challenges for conventional image retr...
Conference Paper
As video-based social networks continue to grow exponentially, there is a rising interest in video retrieval using natural language. Cross-modal hashing, which learns compact hash code for encoding multi-modal data, has proven to be widely effective in large-scale cross-modal retrieval, e.g., image-text retrieval, primarily due to its computation a...
Conference Paper
Few-shot semantic segmentation (FSS) aims to generate a model for segmenting novel classes using a limited number of annotated samples. Previous FSS methods have shown sensitivity to background noise due to inherent bias, attention bias, and spatial-aware bias. In this study, we propose a Transformer-Based Adaptive Prototype Matching Network to est...
Article
Full-text available
Transformers have shown remarkable success in modeling sequential data and capturing intricate patterns over long distances. Their self-attention mechanism allows for efficient parallel processing and scalability, making them well-suited for the high-dimensional data in hyperspectral and LiDAR imagery. However, further research is needed on how to...
Article
Cross-modal hashing encodes different modalities of multimodal data into low-dimensional Hamming space for fast cross-modal retrieval. In multi-label cross-modal retrieval, multimodal data are often annotated with multiple labels, and some labels, e.g., “ocean” and “cloud”, often co-occur. However, existing cross-modal hashing methods overlook labe...
Article
Conventional image set methods typically learn from small to medium-sized image set datasets. However, when applied to large-scale image set applications such as classification and retrieval, they face two primary challenges: 1) effectively modeling complex image sets, and 2) efficiently performing tasks. To address the above issues, we propose a n...
Article
Conventional image set methods typically learn from image sets stored in one location. However, in real-world applications, image sets are often distributed or collected across different positions. Learning from such distributed image sets presents a challenge that has not been studied thus far. Moreover, efficiency is seldom addressed in large-sca...
Article
Existing unsupervised video object segmentation methods typically suffer from severe performance degradation when tested in out-of-distribution scenarios. The primary reason is that real-world test data may not follow the independent and identically distributed (i.i.d.) assumption, leading to domain shift...
Article
Few-shot semantic segmentation aims to segment novel-class objects in a query image with only a few annotated examples in support images. Although progress has been made recently by combining prototype-based metric learning, existing methods still face two main challenges. First, various intra-class objects between the support and query images or s...
Article
Image dehazing is an emblematic low-level vision task that aims at restoring haze-free images from hazy images. Recently, some methods have adopted deep learning techniques to rebuild haze-free images. However, in real-world scenarios, complex degradation of captured images and non-uniform spatial distributions of haze will significantly weaken the gen...
Article
The effective combination of hyperspectral image (HSI) and light detection and ranging (LiDAR) data can be used for land cover classification. Recently, deep-learning-based classification methods, especially those using transformer networks, have achieved remarkable success. However, deep learning classification methods for multisource data still e...
Article
Full-text available
As tools for near-earth remote sensing, unmanned aerial vehicles (UAVs) can be used to acquire images and data of the earth's surface. This provides powerful support for Earth observation and resource management. Object tracking in UAV videos has been a topic of much interest in recent years. A large number of algorithms have been proposed. Among...
Article
In recent years, convolutional neural networks (CNNs) have achieved remarkable success in hyperspectral image (HSI) classification tasks, primarily due to their outstanding spatial feature extraction capabilities. However, CNNs struggle to capture the diagnostic spectral information inherent in HSI. In contrast, vision transformers (ViTs) exhibit f...
Article
Recently, large-scale synthetic datasets have effectively alleviated the issue of insufficient person re-identification (Re-ID) datasets. However, synthetic datasets grapple with inherent challenges, including the subpar quality of synthetic pedestrians and single data collection. This paper presents InfinitePerson, a costless pipeline that fully u...
Article
Effectively evaluating the perceptual quality of dehazed images remains an under-explored research issue. In this paper, we propose a no-reference complex-valued convolutional neural network (CV-CNN) model to conduct automatic dehazed image quality evaluation. Specifically, a novel CV-CNN is employed that exploits the advantages of complex-valued r...
Article
Full-text available
Image-to-image translation (I2IT) is an important visual task that aims to learn a mapping of images from one domain to another while preserving the representation of the content. The phenomenon known as mode collapse makes this task challenging. Most existing methods usually learn the relationship between the data and latent distributions to train...
Article
Full-text available
In the hyperspectral image (HSI) classification task, every HSI pixel is labeled as a specific land cover category. Although convolutional neural network (CNN)-based HSI classification methods have made significant progress in enhancing classification performance in recent years, they still have limitations in acquiring deep semantic features and f...
Article
Video hashing learns compact representation by mapping video into a low-dimensional Hamming space and has achieved promising performance in large-scale video retrieval. It is challenging to effectively exploit temporal and spatial structure in an unsupervised setting. To fill this gap, this paper proposes Contrastive Transformer Hashing (CTH) for...
Article
Full-text available
Adverse weather conditions such as haze and snowfall can degrade the quality of captured images and affect performance of drone detection. Therefore, it is challenging to locate and identify targets in adverse weather scenarios. In this paper, a novel model called Object Detection in a Foggy Condition with YOLO (ODFC-YOLO) is proposed, which perfor...
Article
Object tracking aims to locate a specific object, such as a pedestrian or a vehicle, in an image sequence. Existing algorithms based on Siamese neural networks predict the target through similarity matching. Although these algorithms have achieved satisfactory performance, in the process of similarity calculation between template im...
Preprint
In recent years, the development of instance segmentation has garnered significant attention in a wide range of applications. However, the training of a fully-supervised instance segmentation model requires costly instance-level and pixel-level annotations. In contrast, weakly-supervised instance segmentation methods (i.e., with image-level cl...
Article
Full-text available
The existing deep-learning-based image inpainting algorithms often suffer from local structure disconnections and blurring when dealing with large irregular defective images. To solve these problems, an image structure-induced semantic pyramid network for inpainting is proposed. The model consists of two parts: the edge inpainting network and the c...
Article
Full-text available
Due to the increasing maturity of deep learning and remote sensing technology, the performance of object detection in satellite images has significantly improved and plays an important role in military reconnaissance, urban planning, and agricultural monitoring. However, satellite images have challenges such as small objects, multiscale objects, an...
Article
Full-text available
Single image super-resolution (SISR) aims to recover a high-resolution image from a single low-resolution image. In recent years, SISR methods based on deep convolutional neural networks have achieved remarkable success, and some methods further improve the performance of the SISR model by introducing nonlocal attention into the model. However, mos...
Article
Existing deep learning-based interactive image segmentation methods have significantly reduced the user's interaction burden with simple click interactions. However, they still require excessive numbers of clicks to continuously correct the segmentation for satisfactory results. This article explores how to harvest accurate segmentation of interest...
Article
Full-text available
Cross-modality visible-infrared person re-identification (VI-ReID) aims to recognize images with the same identity between visible modality and infrared modality, which is a very challenging task because it not only includes the troubles of variations between cross-cameras in traditional person ReID, but also suffers from the huge differences betwe...
Article
Due to their excellent performance in aggregating global features, Transformer structures have recently been widely employed in deep learning-based visual object tracking algorithms. Nevertheless, existing Transformer-based trackers still fail to handle occlusion problems due to drift in feature distributions. To address this issue, we introduce d...
Article
In hyperspectral images (HSIs), mixed noise (e.g., Gaussian noise, impulse noise, stripe noise, and deadlines) contamination is a common phenomenon that greatly reduces the visual quality of the image. In recent years, methods combining global and non-local low-rankness have been widely used in the field of HSI denoising. However, most methods a...
Article
Full-text available
The purpose of object tracking is to locate a given target, such as a person or a vehicle, in an image sequence. In recent years, with the development of UAV technology, object tracking in UAV videos has attracted many scholars. It has been widely used in traffic control, water quality inspection, wildlife census and other fields. However, low resolution,...
Article
Hyperspectral image (HSI) classification is currently a hot topic in the field of remote sensing. The goal is to utilize the spectral and spatial information from HSI to accurately identify land covers. The convolutional neural network (CNN) is a powerful approach for HSI classification. However, CNN has limited ability to capture non-local information t...
Article
Despite the remarkable progress made by the salient object detection of natural sensing images (NSI-SOD), the complex background and scale diversity issues of remote sensing images (RSIs) still pose a substantial obstacle. In this study, we build an end-to-end channel-enhanced remodeling-based network (CRNet) for optical RSIs (ORSIs) to highlight s...
Article
Person re-identification (re-ID) aims to match the same person across different cameras. However, most existing re-ID methods assume that people wear the same clothes in different views, which limits their performance in identifying target pedestrians who change clothes. Cloth-changing re-ID is a quite challenging problem as clothes occupying a larg...
Article
Recently, memory-based methods have exhibited remarkable performance in Video Object Segmentation (VOS) by employing non-local pixel-wise matching between the query and memory. Nevertheless, these methods suffer from two limitations: 1) Non-local pixel-wise matching can result in the incorrect segmentation of background distractor objects, and 2) m...
Article
Source-free unsupervised domain adaptation (SFUDA) aims to conduct prediction on the target domain by leveraging knowledge from the well-trained source model. Due to the absence of source data in the SFUDA setting, the existing methods mainly build the target classifier by fine-tuning the source model incorporated with empirical adaptation losses....
Article
Low-rank tensor representation philosophy has enjoyed a reputation in many hyperspectral image (HSI) low-level vision applications, but previous studies often failed to comprehensively exploit the low-rank nature of HSI along different modes in low-dimensional subspace, and unsurprisingly handled only one specific task. To address these challenges,...
Article
The goal of unsupervised person re-identification (Re-ID) is to use unlabeled person images to learn discriminative features. In recent years, many approaches have adopted clustered pseudo labels to construct proxies for contrastive learning, and have thereby achieved great success. However, existing methods of this kind only utilize local structur...
Article
Multi-view hashing (MvH) learns compact hash codes by efficiently integrating multi-view data, and has achieved promising performance in large-scale retrieval tasks. In real-world applications, multi-view data is often stored or collected in different locations, and learning hash codes in such cases is more challenging yet less studied. In addition, un...
Article
Network embedding has shown promising performance in real-world applications. The network embedding typically lies in a continuous vector space, where storage and computation costs are high, especially in large-scale applications. This paper proposes a more compact representation to fill this gap. The proposed discrete network embedding (DNE) lever...
Article
The performance of person re-identification (re-ID) is easily affected by illumination variations caused by different shooting times, places and cameras. Existing illumination-adaptive methods usually require annotating cross-camera pedestrians on each illumination scale, which is unaffordable for a long-term person retrieval system. The cross-illu...
Article
Many previous occluded person re-identification (re-ID) methods try to use additional clues (pose estimation or semantic parsing models) to focus on non-occluded regions. However, these methods rely heavily on the performance of additional clues and often capture pedestrian features by designing complex modules. In this work, we propose a simple F...
Article
Full-text available
The correlation filter method is effective in visual tracking tasks, but it suffers from the boundary effect and filter degradation in complex situations, which can result in suboptimal performance. To solve the above problem, this study proposes an object tracking method with a discriminant correlation filter, which combines an adapti...
Article
Full-text available
As a novel method of earth observation, video satellites can observe dynamic changes in ground targets in real time. To make use of satellite videos, target tracking in satellite videos has received extensive interest. However, this also faces a variety of new challenges such as global occlusion, low resolution, and insufficient information compare...
Article
Full-text available
Recently, most dehazed image quality assessment (DQA) methods have focused on estimating remaining haze while omitting the impact of distortions introduced as side effects of dehazing algorithms, which leads to their limited performance. To address this problem, we propose a method for learning both visibility and distortion-aware features no-reference (NR) dehaze...
Article
Text-based person search is a sub-task in the field of image retrieval, which aims to retrieve target person images according to a given textual description. The significant feature gap between two modalities makes this task very challenging. Many existing methods attempt to utilize local alignment to address this problem in the fine-grained level....
Preprint
Full-text available
Cross-modality Visible-Infrared Person Re-identification (VI-ReID) aims to recognize images with the same identity between visible modality and infrared modality, which is a very challenging task because it not only includes the trouble of variations between cross-cameras in traditional person re-identification, but also suffers from the huge diff...
Article
Full-text available
With the rapid development of deep learning techniques, new breakthroughs have been made in deep learning-based object tracking methods. Although many approaches have achieved state-of-the-art results, existing methods still cannot fully satisfy practical needs. A robust tracker should perform well in three aspects: tracking accuracy, speed, and re...
Article
Full-text available
Impressive progress on image segmentation has been witnessed recently. In this paper, an improved model introducing frequency-tuned salient region detection into the Gaussian mixture model (GMM) is proposed, which is named FTGMM. Frequency-tuned salient region detection is added to achieve the saliency map of the original image, which is combined w...
Article
Full-text available
Existing image inpainting methods based on deep learning have made great progress. These methods either generate contextually semantically consistent images or visually excellent images, ignoring that both semantic and visual effects should be appreciated. In this article, we propose a Semantic Residual Pyramid Network (SRPNet) based on a deep gene...
Article
The purpose of person re-identification (ReID) is to find the same person under different cameras and the basic difficulty lies in the need for large amounts of cross-camera pedestrian annotations. In reality, annotating cross-camera pedestrians is time-consuming especially in large-scale surveillance camera networks. This paper focuses on addressi...
Article
In hyperspectral image (HSI) classification, each pixel sample is assigned to a land-cover category. In the recent past, convolutional neural network (CNN)-based HSI classification methods have greatly improved performance, owing to their superior ability to represent features. However, these methods have limited ability to obtain deep semantic f...
Article
Full-text available
In the convolutional neural network, the precise segmentation of small-scale objects and object boundaries in remote sensing images is a great challenge. As the model gets deeper, low-level features with geometric information and high-level features with semantic information cannot be obtained simultaneously. To alleviate this problem, a successive...
Article
Recognizing the unseen combinations of action and different objects, namely (zero-shot) compositional action recognition, is extremely challenging for conventional action recognition algorithms in real-world applications. Previous methods focus on enhancing the dynamic clues of objects that appear in the scene by building region features or trackle...
Article
Full-text available
Hyperspectral compressive imaging has taken advantage of compressive sensing theory to capture spectral information of the dynamic world in recent decades, where an optical encoder is employed to compress high dimensional signals into a single 2-D measurement. The core issue is how to reconstruct the underlying hyperspectral image (HSI), a...
Article
Full-text available
Fusing a low spatial resolution hyperspectral image (LR-HSI) with a high spatial but low spectral resolution multispectral image (HR-MSI) has been regarded as an effective approach to obtain high resolution HSI (HR-HSI). While matrix factorization based approaches obtained promising performance for HSI-MSI fusion, the mixed noise introduced into th...
Article
Single image dehazing has great significance in computer vision. In this paper, we propose a novel unsupervised Dark Channel Attention optimized CycleGAN (DCA-CycleGAN) to deal with the challenging scene with uneven and dense haze concentration. Firstly, the DCA-CycleGAN adopts the dark channel as input and then generates attention through a DCA sub...
Article
Stereoscopic imaging is widely used in many fields. To guarantee the best quality of experience, it is necessary to design a robust and accurate quality assessment model for stereoscopic content. In this paper, we propose a no-reference stereoscopic image quality assessment (NR-SIQA) model using both complex contourlet and spatial domain features o...
Article
Person re-identification (re-ID) tackles the problem of matching person images with the same identity from different cameras. In practical applications, due to the differences in camera performance and distance between cameras and persons of interest, captured person images usually have various resolutions. This problem, named Cross-Resolution Pers...
Article
Deep supervised hashing has greatly improved retrieval performance with the powerful learning capability of deep neural network. In multi-label image retrieval, existing deep hashing simply indicates whether two images are similar by constructing a similarity matrix. However, it ignores the dependency among multiple labels that has been shown impor...
Article
Full-text available
Pansharpening aims to fuse the abundant spectral information of multispectral (MS) images and the spatial details of panchromatic (PAN) images, yielding a high-spatial-resolution MS (HRMS) image. Traditional methods only focus on the linear model, ignoring the fact that degradation process is a nonlinear inverse problem. Due to convolutional neural...
Article
Full-text available
In this paper, we propose a new pansharpening architecture called Sub-Pixel Convolutional Residual Network to obtain high-resolution multispectral (MS) images. Different from previous works, we extract features from MS images in a low-resolution space and pay more attention to the balance of spectral and spatial information. Our architecture consi...
Article
In recent years, generative adversarial networks (GANs) have been widely used to generate realistic fake face images, which can easily deceive human beings. To detect these images, some methods have been proposed. However, their detection performance will be degraded greatly when the testing samples are post-processed. In this paper, some experimen...
Article
Multiview learning (MVL), which enhances the learners' performance by coordinating complementarity and consistency among different views, has attracted much attention. The multiview generalized eigenvalue proximal support vector machine (MvGSVM) is a recently proposed effective binary classification method, which introduces the concept of MVL into...
Article
Joint photographic experts group (JPEG) compression is widely used in image processing and computer vision. Detecting double compressed JPEG images is a common problem in forensics and detecting compressed images with the same quantization matrix remains a challenging task. However, most existing methods were designed for detection in grayscale ima...
Article
Correlation filter (CF) has drawn extensive interest in aerial object tracking due to its remarkable performance. Recently, the popular CF methods based on temporal-spatial regularization have been proved to be able to effectively improve the tracking results. However, the boundary effect and filter template degradation still influence the speed an...
Article
Full-text available
Fusing a spatially low resolution hyperspectral image (LR-HSI) and a spectrally low resolution multispectral image (HR-MSI) to produce a high spatial-spectral resolution HSI (HR-HSI), also known as hyperspectral super-resolution, has become a popular approach for enhancing the spatial-spectral resolution of HSI in recent years. In this...
Article
Extracting effective and discriminative features is highly important for addressing the challenges of person re-identification (re-ID). At present, deep convolutional neural networks typically use high-level features to identify pedestrians. However, some essential spatial information contained in low-level features will be lost during the learning...
Chapter
Image super-resolution is an important field of computer vision research. The current mainstream image super-resolution technology uses deep learning to mine the deeper features of the image, and then uses them for image restoration. However, most of these models are trained only on images at a specific scale and do not consider the relatio...
Preprint
Person re-identification (re-ID) tackles the problem of matching person images with the same identity from different cameras. In practical applications, due to the differences in camera performance and distance between cameras and persons of interest, captured person images usually have various resolutions. We name this problem as Cross-Resolution...
Preprint
Text-based person search is a sub-task in the field of image retrieval, which aims to retrieve target person images according to a given textual description. The significant feature gap between two modalities makes this task very challenging. Many existing methods attempt to utilize local alignment to address this problem in the fine-grained level....
Article
To establish robust semantic correspondence between images covering different objects belonging to the same category, there are three important types of information including inter-image relationship, intra-image relationship and cycle consistency. Most existing methods only exploit one or two types of the above information and cannot make them enh...
Article
Full-text available
Currently, person re-identification (re-ID) has been applied in many public security applications. Yet owing to the large visual appearance changes of the same identity under different views, re-ID still faces many challenges. To reduce the intra-person discrepancy, extracting more powerful feature representations from pedestrian images is a reasonable...
