Jie Zhou’s research while affiliated with Shenzhen University and other places


Publications (788)


VesselDiffusion: 3D Vascular Structure Generation Based on Diffusion Model
  • Article

May 2025

IEEE Transactions on Medical Imaging

Zhanqiang Guo · [...] · Jianjiang Feng · Jie Zhou

3D vascular structure models are pivotal in disease diagnosis, surgical planning, and medical education. The intricate nature of the vascular system presents significant challenges in generating accurate vascular structures. Constrained by the complex connectivity of the overall vascular structure, existing methods primarily focus on generating local or individual vessels. In this paper, we introduce a novel two-stage framework termed VesselDiffusion for the generation of detailed vascular structures, which is more valuable for medical analysis. Given that training data for a specific vascular structure are often limited, direct generation of 3D data often results in inadequate detail and insufficient diversity. To this end, we initially train a 2D vascular generation model utilizing extensively available generic 2D vascular datasets. Taking the generated 2D images as input, a conditional diffusion model, integrating a dual-stream feature extraction (DSFE) module, is proposed to extrapolate 3D vascular systems. The DSFE module, comprising a Vision Transformer and a Graph Convolutional Network, synergistically captures visual features of global connection rationality and structural features of local vascular details, ensuring the authenticity and diversity of the generated 3D data. To the best of our knowledge, VesselDiffusion is the first model designed for generating comprehensive and realistic vascular networks with a diffusion process. Comparative analyses with other generation methodologies demonstrate that the proposed framework achieves superior accuracy and diversity. Our code is available at: https://github.com/gzq17/VesselDiffusion.
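The abstract describes the DSFE module only at a high level (a Vision Transformer branch for global visual features plus a Graph Convolutional Network branch for local vascular structure). Below is a minimal PyTorch sketch of how such a dual-stream extractor could be wired; the module names, dimensions, the toy single-layer graph convolution, and the concatenation-based fusion are all illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a dual-stream feature extractor (visual + graph branch).
import torch
import torch.nn as nn

class SimpleGCNLayer(nn.Module):
    """One graph-convolution step: aggregate neighbor features via a normalized adjacency."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):            # x: (B, N, C), adj: (B, N, N)
        deg = adj.sum(-1, keepdim=True).clamp(min=1.0)
        return torch.relu(self.linear((adj @ x) / deg))

class DualStreamExtractor(nn.Module):
    def __init__(self, patch=8, img_ch=1, dim=128):
        super().__init__()
        # Visual stream: patch embedding + a small transformer encoder.
        self.patch_embed = nn.Conv2d(img_ch, dim, kernel_size=patch, stride=patch)
        enc_layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        # Structural stream: two graph-convolution layers over centerline nodes.
        self.gcn1 = SimpleGCNLayer(3, dim)
        self.gcn2 = SimpleGCNLayer(dim, dim)
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, img, node_xyz, adj):
        tokens = self.patch_embed(img).flatten(2).transpose(1, 2)   # (B, P, dim)
        vis = self.encoder(tokens).mean(dim=1)                      # global visual feature
        g = self.gcn2(self.gcn1(node_xyz, adj), adj).mean(dim=1)    # global structural feature
        return self.fuse(torch.cat([vis, g], dim=-1))               # fused conditioning vector

# Usage with dummy tensors:
model = DualStreamExtractor()
feat = model(torch.randn(2, 1, 64, 64), torch.randn(2, 64, 3), torch.ones(2, 64, 64))
print(feat.shape)  # torch.Size([2, 128])
```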


Examining the Source of Defects from a Mechanical Perspective for 3D Anomaly Detection
  • Preprint
  • File available

May 2025 · 5 Reads

Hanzhe Liang · Aoran Wang · Jie Zhou · [...] · Jinbao Wang

In this paper, we go beyond identifying anomalies purely in structural terms and approach anomaly detection from the perspective of anomaly causes. Most anomalies can be regarded as the result of unpredictable defective forces from internal and external sources, and opposing forces can be sought to correct them. We introduce a Mechanics Complementary framework for 3D anomaly detection (MC4AD) that generates internal and external corrective forces for each point. A Diverse Anomaly-Generation (DA-Gen) module is first proposed to simulate various anomalies. We then present a Corrective Force Prediction Network (CFP-Net) with complementary point-level representations to model the different contributions of internal and external corrective forces. A combined loss, including a new symmetric loss and an overall loss, is proposed to properly constrain the corrective forces. As a highlight, we consider industrial 3D anomaly detection more comprehensively, creating a hierarchical quality control strategy based on a three-way decision and contributing a dataset named Anomaly-IntraVariance with intraclass variance to evaluate the model. On the proposed and existing five datasets, we obtain nine state-of-the-art results with the fewest parameters and the fastest inference speed. The source code is available at https://github.com/hzzzzzhappy/MC4AD
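As a rough illustration of the corrective-force idea described above, the sketch below predicts an internal and an external force vector for every point of a point cloud and scores anomalies by the residual force; the two-head MLP, the zero-sum constraint on normal points, and the resulting "symmetric" loss are assumptions made for demonstration and are not the paper's CFP-Net or its exact losses.

```python
# Hypothetical point-level corrective-force predictor with a simple "symmetric" loss.
import torch
import torch.nn as nn

class ForcePredictor(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(3, dim), nn.ReLU(),
                                      nn.Linear(dim, dim), nn.ReLU())
        self.internal_head = nn.Linear(dim, 3)   # internal corrective force per point
        self.external_head = nn.Linear(dim, 3)   # external corrective force per point

    def forward(self, pts):                      # pts: (B, N, 3)
        h = self.backbone(pts)
        return self.internal_head(h), self.external_head(h)

def symmetric_loss(f_int, f_ext):
    # On normal points the two forces should cancel, so penalize their sum.
    return (f_int + f_ext).norm(dim=-1).mean()

model = ForcePredictor()
pts = torch.randn(4, 1024, 3)
f_int, f_ext = model(pts)
anomaly_score = (f_int + f_ext).norm(dim=-1)     # per-point score: residual corrective force
loss = symmetric_loss(f_int, f_ext)
```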


Fixed-Length Dense Fingerprint Representation

May 2025 · 12 Reads

Fixed-length fingerprint representations, which map each fingerprint to a compact and fixed-size feature vector, are computationally efficient and well-suited for large-scale matching. However, designing a robust representation that effectively handles diverse fingerprint modalities, pose variations, and noise interference remains a significant challenge. In this work, we propose a fixed-length dense descriptor of fingerprints, and introduce FLARE, a fingerprint matching framework that integrates the Fixed-Length dense descriptor with pose-based Alignment and Robust Enhancement. This fixed-length representation employs a three-dimensional dense descriptor to effectively capture spatial relationships among fingerprint ridge structures, enabling robust and locally discriminative representations. To ensure consistency within this dense feature space, FLARE incorporates pose-based alignment using complementary estimation methods, along with dual enhancement strategies that refine ridge clarity while preserving the original fingerprint modality. The proposed dense descriptor supports fixed-length representation while maintaining spatial correspondence, enabling fast and accurate similarity computation. Extensive experiments demonstrate that FLARE achieves superior performance across rolled, plain, latent, and contactless fingerprints, significantly outperforming existing methods in cross-modality and low-quality scenarios. Further analysis validates the effectiveness of the dense descriptor design, as well as the impact of alignment and enhancement modules on the accuracy of dense descriptor matching. Experimental results highlight the effectiveness and generalizability of FLARE as a unified and scalable solution for robust fingerprint representation and matching. The implementation and code will be publicly available at https://github.com/Yu-Yy/FLARE.
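For intuition on how a fixed-length dense descriptor can still be compared cell by cell, here is a small sketch that scores two aligned descriptors with a foreground-masked cosine similarity; the (C, H, W) layout, the mask handling, and the scoring rule are assumed for illustration rather than taken from FLARE.

```python
# Minimal sketch of comparing two fixed-length "dense" descriptors: each fingerprint is
# assumed to be an (C, H, W) feature grid plus an (H, W) validity mask.
import torch
import torch.nn.functional as F

def dense_similarity(desc_a, desc_b, mask_a, mask_b, eps=1e-6):
    """desc_*: (C, H, W) aligned descriptors; mask_*: (H, W) foreground masks in [0, 1]."""
    a = F.normalize(desc_a, dim=0)
    b = F.normalize(desc_b, dim=0)
    cos = (a * b).sum(dim=0)                  # per-cell cosine similarity, (H, W)
    overlap = mask_a * mask_b                 # only score cells valid in both prints
    return (cos * overlap).sum() / (overlap.sum() + eps)

# Usage with dummy descriptors:
score = dense_similarity(torch.randn(128, 32, 32), torch.randn(128, 32, 32),
                         torch.ones(32, 32), torch.ones(32, 32))
```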


Finger Pose Estimation for Under-screen Fingerprint Sensor

May 2025 · 2 Reads

Two-dimensional pose estimation plays a crucial role in fingerprint recognition by facilitating global alignment and reducing pose-induced variations. However, existing methods are still unsatisfactory when handling large-angle or small-area inputs. These limitations are particularly pronounced on fingerprints captured by under-screen fingerprint sensors in smartphones. In this paper, we present a novel dual-modal input based network for under-screen fingerprint pose estimation. Our approach effectively integrates two distinct yet complementary modalities: texture details extracted from ridge patches through the under-screen fingerprint sensor, and rough contours derived from capacitive images obtained via the touch screen. This collaborative integration endows our network with more comprehensive and discriminative information, substantially improving the accuracy and stability of pose estimation. A decoupled probability distribution prediction task is designed, instead of the traditional supervised forms of numerical regression or heatmap voting, to facilitate the training process. Additionally, we incorporate a Mixture of Experts (MoE) based feature fusion mechanism and a relationship-driven cross-domain knowledge transfer strategy to further strengthen feature extraction and fusion capabilities. Extensive experiments are conducted on several public datasets and two private datasets. The results indicate that our method is significantly superior to previous state-of-the-art (SOTA) methods and remarkably boosts the recognition ability of fingerprint recognition algorithms. Our code is available at https://github.com/XiongjunGuan/DRACO.
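The "decoupled probability distribution prediction" formulation is only named in the abstract; one plausible reading, sketched below, predicts separate categorical distributions over discretized x, y, and rotation bins and recovers continuous values as expectations. The bin counts, value ranges, and linear heads are hypothetical choices, not the DRACO architecture.

```python
# Illustrative decoupled distribution head for 2D pose (x, y, angle).
import torch
import torch.nn as nn

class DecoupledPoseHead(nn.Module):
    def __init__(self, feat_dim=256, xy_bins=64, angle_bins=180):
        super().__init__()
        self.x_head = nn.Linear(feat_dim, xy_bins)
        self.y_head = nn.Linear(feat_dim, xy_bins)
        self.angle_head = nn.Linear(feat_dim, angle_bins)

    def forward(self, feat):                          # feat: (B, feat_dim)
        return self.x_head(feat), self.y_head(feat), self.angle_head(feat)

def expectation(logits, low, high):
    # Convert a categorical distribution over bins to a continuous estimate.
    bins = torch.linspace(low, high, logits.shape[-1], device=logits.device)
    return (logits.softmax(dim=-1) * bins).sum(dim=-1)

head = DecoupledPoseHead()
x_logits, y_logits, a_logits = head(torch.randn(8, 256))
angle_deg = expectation(a_logits, -180.0, 180.0)      # (8,) predicted rotation
```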


MC3D-AD: A Unified Geometry-aware Reconstruction Model for Multi-category 3D Anomaly Detection

May 2025 · 2 Reads

3D Anomaly Detection (AD) is a promising means of controlling the quality of manufactured products. However, existing methods typically require carefully training a task-specific model for each category independently, leading to high cost, low efficiency, and weak generalization. Therefore, this paper presents a novel unified model for Multi-Category 3D Anomaly Detection (MC3D-AD) that aims to utilize both local and global geometry-aware information to reconstruct normal representations of all categories. First, to learn robust and generalized features of different categories, we propose an adaptive geometry-aware masked attention module that extracts geometry variation information to guide mask attention. Then, we introduce a local geometry-aware encoder reinforced by the improved mask attention to encode group-level feature tokens. Finally, we design a global query decoder that utilizes point cloud position embeddings to improve the decoding process and reconstruction ability. This leads to local and global geometry-aware reconstructed feature tokens for the AD task. MC3D-AD is evaluated on two publicly available datasets, Real3D-AD and Anomaly-ShapeNet, and exhibits significant superiority over current state-of-the-art single-category methods, achieving 3.1% and 9.3% improvements in object-level AUROC on Real3D-AD and Anomaly-ShapeNet, respectively. The source code will be released upon acceptance.
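Geometry-aware masked attention is described only conceptually; the sketch below shows one plausible reading in which pairwise distances between point-group centers are converted into an additive attention bias. The learnable scale, the distance-based bias, and the use of nn.MultiheadAttention are assumptions for illustration, not the MC3D-AD module.

```python
# Hypothetical geometry-biased ("masked") self-attention over point-group tokens.
import torch
import torch.nn as nn

class GeometryBiasedAttention(nn.Module):
    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.scale = nn.Parameter(torch.tensor(1.0))   # learnable strength of the geometric bias

    def forward(self, tokens, centers):                # tokens: (B, N, dim), centers: (B, N, 3)
        dist = torch.cdist(centers, centers)           # (B, N, N) pairwise distances
        bias = -self.scale * dist                      # nearer groups receive stronger attention
        bias = bias.repeat_interleave(self.attn.num_heads, dim=0)  # (B*heads, N, N)
        out, _ = self.attn(tokens, tokens, tokens, attn_mask=bias)
        return out

layer = GeometryBiasedAttention()
out = layer(torch.randn(2, 32, 128), torch.rand(2, 32, 3))   # (2, 32, 128)
```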




InstaRevive: One-Step Image Enhancement via Dynamic Score Matching

April 2025

Image enhancement finds wide-ranging applications in real-world scenarios due to complex environments and the inherent limitations of imaging devices. Recent diffusion-based methods yield promising outcomes but necessitate prolonged and computationally intensive iterative sampling. In response, we propose InstaRevive, a straightforward yet powerful image enhancement framework that employs score-based diffusion distillation to harness potent generative capability and minimize the sampling steps. To fully exploit the potential of the pre-trained diffusion model, we devise a practical and effective diffusion distillation pipeline using dynamic control to address inaccuracies in updating direction during score matching. Our control strategy enables a dynamic diffusing scope, facilitating precise learning of denoising trajectories within the diffusion model and ensuring accurate distribution matching gradients during training. Additionally, to enrich guidance for the generative power, we incorporate textual prompts via image captioning as auxiliary conditions, fostering further exploration of the diffusion model. Extensive experiments substantiate the efficacy of our framework across a diverse array of challenging tasks and datasets, unveiling the compelling efficacy and efficiency of InstaRevive in delivering high-quality and visually appealing results. Code is available at https://github.com/EternalEvan/InstaRevive.
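To make the one-step distillation idea concrete, the toy loop below diffuses the output of a single-pass generator to a random noise level and regresses it toward a frozen teacher's denoised prediction; both networks are stand-in MLPs on flattened toy inputs, and the objective is a generic distillation illustration rather than InstaRevive's dynamic score-matching loss.

```python
# Toy sketch of one-step diffusion distillation with a frozen teacher denoiser.
import torch
import torch.nn as nn

teacher = nn.Sequential(nn.Linear(64, 64), nn.Tanh(), nn.Linear(64, 64)).eval()  # frozen x0-predictor (stand-in)
generator = nn.Sequential(nn.Linear(64, 64), nn.Tanh(), nn.Linear(64, 64))       # one-step enhancer (stand-in)
opt = torch.optim.Adam(generator.parameters(), lr=1e-4)

degraded = torch.randn(16, 64)                      # degraded inputs (flattened toy "images")
for _ in range(10):                                 # a few toy training steps
    output = generator(degraded)                    # single forward pass, no iterative sampling
    t = torch.rand(16, 1)                           # noise level drawn from a (dynamic) range
    noisy = (1 - t) * output + t * torch.randn_like(output)
    with torch.no_grad():
        target = teacher(noisy)                     # teacher's clean-image estimate at this level
    loss = ((output - target) ** 2).mean()          # pull the one-step output toward the teacher
    opt.zero_grad(); loss.backward(); opt.step()
```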


Learning with Open-world Noisy Data via Class-independent Margin in Dual Representation Space

April 2025 · 5 Reads

Proceedings of the AAAI Conference on Artificial Intelligence

Learning with Noisy Labels (LNL) aims to improve model generalization when facing data with noisy labels, and existing methods generally assume that noisy labels come from known classes, called closed-set noise. However, in real-world scenarios, noisy labels from similar unknown classes, i.e., open-set noise, may occur during the training and inference stages. Such open-world noisy labels may significantly impact the performance of LNL methods. In this study, we propose a novel dual-space joint learning method to robustly handle the open-world noise. To mitigate model overfitting on closed-set and open-set noises, a dual representation space is constructed by two networks. One is a projection network that learns shared representations in the prototype space, while the other is a One-Vs-All (OVA) network that makes predictions using unique semantic representations in the class-independent space. Then, bi-level contrastive learning and consistency regularization are introduced in the two spaces to enhance the detection capability for data with unknown classes. To benefit from the memorization effects across different types of samples, class-independent margin criteria are designed for sample identification, which select clean samples, weight closed-set noise, and filter open-set noise effectively. Extensive experiments demonstrate that our method outperforms the state-of-the-art methods and achieves an average accuracy improvement of 4.55% and an AUROC improvement of 6.17% on CIFAR80N.
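To illustrate how a One-Vs-All head and a class-independent margin can separate clean, closed-set-noisy, and open-set samples, the sketch below computes per-class inlier probabilities and a margin between the labeled class and the best competing class; the head design, margin definition, and thresholds are assumptions for demonstration, not the paper's exact criteria.

```python
# Illustrative One-Vs-All (OVA) head plus a class-independent margin for sample sorting.
import torch
import torch.nn as nn

class OVAHead(nn.Module):
    def __init__(self, feat_dim=128, num_classes=10):
        super().__init__()
        self.fc = nn.Linear(feat_dim, 2 * num_classes)   # (negative, positive) logits per class
        self.num_classes = num_classes

    def forward(self, feat):                              # feat: (B, feat_dim)
        logits = self.fc(feat).view(-1, self.num_classes, 2)
        return logits.softmax(dim=-1)[..., 1]             # per-class "inlier" probability, (B, C)

head = OVAHead()
probs = head(torch.randn(32, 128))                        # (32, 10)
labels = torch.randint(0, 10, (32,))

p_label = probs.gather(1, labels[:, None]).squeeze(1)     # inlier prob. of the given label
p_other = probs.scatter(1, labels[:, None], 0.0).max(dim=1).values
margin = p_label - p_other                                # class-independent margin per sample

clean = margin > 0.5                                      # confident samples: keep with full weight
open_set = probs.max(dim=1).values < 0.5                  # no class claims the sample: filter out
```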



Citations (30)


... Building upon fusion-based approaches, Liu et al. [20] proposed low-rank decomposition techniques incorporating structural constraints to distinguish physical defects from normal patterns. To enhance foreground-background feature discrimination, feature contrast interference suppression [21] addresses background sensitivity by employing contrastive learning and knowledge distillation. Recent advances in data selection methods like PRISM [22], [23] have also shown promise for efficient training of multimodal defect detection systems. ...

Reference:

Differentiable NMS via Sinkhorn Matching for End-to-End Fabric Defect Detection
Enhanced Fabric Defect Detection With Feature Contrast Interference Suppression
  • Citing Article
  • January 2025

IEEE Transactions on Instrumentation and Measurement

... In summary, GB computing improves the model's learning efficiency by efficiently generating GBs and using them as input instead of individual instances. Current research on GB computing primarily focuses on the efficient generation of GBs [20], [22], [23] and their applications to various tasks, such as classification [24], [25], feature selection [26], [27], sampling [28], clustering [13], [15], and outlier detection [29], [30]. ...

Fuzzy Granule Density-Based Outlier Detection With Multi-Scale Granular Balls
  • Citing Article
  • March 2025

IEEE Transactions on Knowledge and Data Engineering

... Furthermore, the integration of multiple alignment and enhancement strategies is empirically shown to further improve descriptor-level matching effectiveness. This work extends our previous conference paper FDD [32] by incorporating complementary pose estimation methods and introducing a set of newly designed fingerprint enhancement modules, along with a more comprehensive analysis of the fixed-length dense descriptor through additional experiments and ablation studies. Concretely, the contributions of this research are as follows: ...

Fixed-length Dense Descriptor for Efficient Fingerprint Matching
  • Citing Conference Paper
  • December 2024

... The key innovation was putting the transformer architecture to work in computer vision tasks, and the ViT architecture has since been applied in a variety of vision tasks with excellent performance. 5) Dai et al. [5] designed a unified architecture to provide a fair comparison for traditional and modern spatial token mixers. ...

Demystify Transformers & Convolutions in Modern Image Deep Networks
  • Citing Article
  • December 2024

IEEE Transactions on Pattern Analysis and Machine Intelligence

... We categorize fingerprint matching methods based on deep learning into three types. The first is pairwise matching networks, which take two fingerprint images as input and directly produce a similarity score by jointly processing their features [8]- [10]. While this enables fine-grained comparisons and high accuracy, the joint processing is computationally expensive and unsuitable for large-scale identification. ...

Joint Identity Verification and Pose Alignment for Partial Fingerprints
  • Citing Article
  • January 2024

IEEE Transactions on Information Forensics and Security

... While this enables fine-grained comparisons and high accuracy, the joint processing is computationally expensive and unsuitable for large-scale identification. The second category is local representation matching, where local descriptors are extracted from patches centered at detected minutiae or estimated orientation fields [11]- [13]. Each fingerprint is represented by a variable-length set of descriptors, and matching is performed by computing pairwise similarities. ...

Latent Fingerprint Matching via Dense Minutia Descriptor
  • Citing Conference Paper
  • September 2024

... Meanwhile, techniques like VastGaussians [12], PyGS [13], and GS-LRM [14] explore the application of 3DGS for large-scale urban scene reconstruction. Other research efforts, such as SplaTAM [15], RTG-SLAM [16] and MonoGS [17], incorporate 3DGS into simultaneous localization and mapping (SLAM) frameworks, while DrivingGaussian [18], GaussianBEV [19] and GaussianFormer [20] investigate the use of 3DGS in autonomous driving scenarios. Despite these promising developments, unique challenges persist in extending 3DGS to outdoor, unconstrained datasets, particularly in terms of scalability and robustness. ...

GaussianFormer: Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction
  • Citing Chapter
  • November 2024

... Training-based methods rely on distillation [4,6], which uses a student model with smaller time steps to distill the original diffusion models. Training-free methods [7,8,9] aim to decrease the sampling steps by proposing a fast solver. This paper focuses on the training-free methods to accelerate FDMs [1,2]. ...

DC-Solver: Improving Predictor-Corrector Diffusion Sampler via Dynamic Compensation
  • Citing Chapter
  • November 2024

... This study compares the proposed improved algorithm with eight classical detection methods, including SSD [13], Faster RCNN [14], YOLOv3, YOLOv4, YOLOv5s, YOLOv7, YOLOv8n, YOLOv9 [15], and the recent small-object detection methods [16] and [17], under identical conditions. In the experiments, YOLOv3, YOLOv4, and YOLOv7 were compared using their respective tiny versions. ...

DSPDet3D: 3D Small Object Detection with Dynamic Spatial Pruning
  • Citing Chapter
  • October 2024

... Monocular depth estimation [24,44,69,70] has emerged as a powerful tool for generating dense depth maps from single RGB images, making it particularly valuable in scenarios where dense depth annotations are sparse or unavailable [17,18,30]. Notable models, such as MiDaS [44] and Depth-Anything [69,70], have demonstrated impressive capabilities in predicting dense depth maps across a wide range of scenes. ...

Camera-LiDAR Cross-Modality Gait Recognition
  • Citing Chapter
  • October 2024