Lei Zhu

Lei Zhu
  • Doctor of Philosophy
  • Assistant professor at The Hong Kong University of Science and Technology (Guangzhou)

About

445
Publications
39,336
Reads
How we measure 'reads'
A 'read' is counted each time someone views a publication summary (such as the title, abstract, and list of authors), clicks on a figure, or views or downloads the full-text. Learn more
16,267
Citations
Introduction
computer vision, computer graphics, medical imaging, and deep learning
Current institution
The Hong Kong University of Science and Technology (Guangzhou)
Current position
  • Assistant professor

Publications

Publications (445)
Article
Full-text available
Defocus blur detection (DBD) plays a pivotal role in computer vision, serving as a fundamental step to enhance the performance of various downstream applications, such as image refocusing, depth estimation, and saliency detection. Despite recent advancements, existing methods often struggle in complex scenes with homogeneous regions, subtle blur tr...
Article
Full-text available
The pathologists grade cancers by identifying the morphology of tumor areas, whereas this manual implementation is time-consuming and expensive. The most recent methods deploy deep-learning architectures (like CNNs or Transformers) to conduct pathology image segmentation automatically. However, these implementations are not satisfactory for this mo...
Article
Expert surgeons often have heavy workloads and cannot promptly respond to queries from medical students and junior doctors about surgical procedures. Thus, research on Visual Question Localized-Answering in Surgery (Surgical-VQLA) is essential to assist medical students and junior doctors in understanding surgical scenarios. Surgical-VQLA aims to g...
Article
Recently, Denoising Diffusion Models have achieved outstanding success in generative image modeling and attracted significant attention in the computer vision community. Although a substantial amount of diffusion-based research has focused on generative tasks, few studies apply diffusion models to medical diagnosis. In this paper, we propose a diff...
Article
Microvascular invasion (MVI) of hepatocellular carcinoma (HCC) is a crucial histopathologic prognostic factor associated with cancer recurrence after liver transplantation or hepatectomy. Recently, clinicoradiologic characteristics are combined with medical images to enhance the HCC prediction. However, compared to medical imaging data, the clinico...
Preprint
Full-text available
Defocus blur detection (DBD) plays a pivotal role in computer vision, serving as a fundamental step to enhance the performance of various downstream applications, such as image refocusing, depth estimation, and saliency detection. Despite recent advancements, existing methods often struggle in complex scenes with homogeneous regions, subtle blur tr...
Preprint
Decision-making and motion planning are pivotal in ensuring the safety and efficiency of Autonomous Vehicles (AVs). Existing methodologies typically adopt two paradigms: decision then planning or generation then scoring. However, the former often struggles with misalignment between decisions and planning, while the latter encounters significant cha...
Article
Natural images often contain multiple shadow regions, and existing video shadow detection methods tend to fail in fully identifying all shadow regions, since they mainly learned temporal features at single-scale and single memory. In this work, we develop a novel convolutional neural network (CNN) to learn motion-guided multi-scale memory features...
Preprint
How can we test AI performance? This question seems trivial, but it isn't. Standard benchmarks often have problems such as in-distribution and small-size test sets, oversimplified metrics, unfair comparisons, and short-term outcome pressure. As a consequence, good performance on standard benchmarks does not guarantee success in real-world scenarios...
Preprint
Implicit Neural Representations (INRs) have emerged as a paradigm in knowledge representation, offering exceptional flexibility and performance across a diverse range of applications. INRs leverage multilayer perceptrons (MLPs) to model data as continuous implicit functions, providing critical advantages such as resolution independence, memory effi...
Article
Question-Driven Sign Language Translation (QSLT) addresses the challenge of translating sign language using pertinent questions in question-answering contexts. However, the pronounced modality complexity between question text and sign video poses a predicament: the model tends to overly depend on questions to generate translations, thereby neglecti...
Article
Magnetic Resonance Imaging (MRI) is a pivotal clinical diagnostic tool, yet its extended scanning times often compromise patient comfort and image quality, especially in volumetric, temporal and quantitative scans. This review elucidates recent advances in MRI acceleration via data and physics-driven models, leveraging techniques from algorithm unr...
Preprint
Fundus imaging is a pivotal tool in ophthalmology, and different imaging modalities are characterized by their specific advantages. For example, Fundus Fluorescein Angiography (FFA) uniquely provides detailed insights into retinal vascular dynamics and pathology, surpassing Color Fundus Photographs (CFP) in detecting microvascular abnormalities and...
Article
Full-text available
Existing video instance-level lane detection (ILD) methods often assume that the input videos are clear. However, videos captured in hazy weather conditions are inevitably corrupted by the haze, thereby degrading the video instance-level lane detection accuracy. In this work, we address the hazy video ILD task by fusing masked frequency-level phase...
Preprint
Improving the fairness of federated learning (FL) benefits healthy and sustainable collaboration, especially for medical applications. However, existing fair FL methods ignore the specific characteristics of medical FL applications, i.e., domain shift among the datasets from different hospitals. In this work, we propose Fed-LWR to improve performan...
Preprint
Snow degradations present formidable challenges to the advancement of computer vision tasks by the undesirable corruption in outdoor scenarios. While current deep learning-based desnowing approaches achieve success on synthetic benchmark datasets, they struggle to restore out-of-distribution real-world snowy videos due to the deficiency of paired r...
Preprint
Full-text available
Diffusion models, such as Stable Diffusion, have made significant strides in visual generation, yet their paradigm remains fundamentally different from autoregressive language models, complicating the development of unified language-vision models. Recent efforts like LlamaGen have attempted autoregressive image generation using discrete VQVAE token...
Article
Computer-aided ultrasound (US) imaging is an important prerequisite for early clinical diagnosis and treatment. Due to the harsh ultrasound (US) image quality and the blurry tumor area, recent memory-based video object segmentation models (VOS) achieve frame-level segmentation by performing intensive similarity matching among the past frames which...
Preprint
Recent advancements in adverse weather restoration have shown potential, yet the unpredictable and varied combinations of weather degradations in the real world pose significant challenges. Previous methods typically struggle with dynamically handling intricate degradation combinations and carrying on background reconstruction precisely, leading to...
Article
The Magnetically Controlled Capsule Endoscopy (MCCE) has a limited shooting range, resulting in capturing numerous fragmented images and an inability to precisely locate and examine the region of interest (ROI) as traditional endoscopy can. Addressing this issue, image stitching around the ROI can be employed to aid in the diagnosis of gastrointest...
Preprint
Multi-modal magnetic resonance imaging (MRI) provides rich, complementary information for analyzing diseases. However, the practical challenges of acquiring multiple MRI modalities, such as cost, scan time, and safety considerations, often result in incomplete datasets. This affects both the quality of diagnosis and the performance of deep learning...
Preprint
Diffusion Probabilistic Models have recently attracted significant attention in the community of computer vision due to their outstanding performance. However, while a substantial amount of diffusion-based research has focused on generative tasks, no work introduces diffusion models to advance the results of polyp segmentation in videos, which is f...
Preprint
Ultra-Wide-Field Scanning Laser Ophthalmoscopy (UWF-SLO) images capture high-resolution views of the retina with typically 200 spanning degrees. Accurate segmentation of vessels in UWF-SLO images is essential for detecting and diagnosing fundus disease. Recent studies have revealed that the selective State Space Model (SSM) in Mamba performs well i...
Preprint
Full-text available
Video Shadow Detection (VSD) aims to detect the shadow masks with frame sequence. Existing works suffer from inefficient temporal learning. Moreover, few works address the VSD problem by considering the characteristic (i.e., boundary) of shadow. Motivated by this, we propose a Timeline and Boundary Guided Diffusion (TBGDiff) network for VSD where w...
Preprint
Traditional shadow detectors often identify all shadow regions of static images or video sequences. This work presents the Referring Video Shadow Detection (RVSD), which is an innovative task that rejuvenates the classic paradigm by facilitating the segmentation of particular shadows in videos based on descriptive natural language prompts. This nov...
Article
Full-text available
Video dehazing is a critical research area in computer vision that aims to enhance the quality of hazy frames, which benefits many downstream tasks, e.g. semantic segmentation. Recent work devise CNN-based structure or attention mechanism to fuse temporal information, while some others utilize offset between frames to align frames explicitly. Anoth...
Preprint
The outdoor vision systems are frequently contaminated by rain streaks and raindrops, which significantly degenerate the performance of visual tasks and multimedia applications. The nature of videos exhibits redundant temporal cues for rain removal with higher stability. Traditional video deraining methods heavily rely on optical flow estimation an...
Preprint
Natural images captured by mobile devices often suffer from multiple types of degradation, such as noise, blur, and low light. Traditional image restoration methods require manual selection of specific tasks, algorithms, and execution sequences, which is time-consuming and may yield suboptimal results. All-in-one models, though capable of handling...
Article
Non-autoregressive video captioning methods generate visual words in parallel but often overlook semantic correlations among them, especially regarding verbs, leading to lower caption quality. To address this, we integrate action information of highlighted objects to enhance semantic connections among visual words. Our proposed Action-aware Languag...
Preprint
Existing low-light image enhancement (LIE) methods have achieved noteworthy success in solving synthetic distortions, yet they often fall short in practical applications. The limitations arise from two inherent challenges in real-world LIE: 1) the collection of distorted/clean image pairs is often impractical and sometimes even unavailable, and 2)...
Article
Surgical instrument segmentation is fundamentally important for facilitating cognitive intelligence in robot-assisted surgery. Although existing methods have achieved accurate instrument segmentation results, they simultaneously generate segmentation masks of all instruments, which lack the capability to specify a target object and allow an interac...
Preprint
Full-text available
Regular screening and early discovery of uterine fibroid are crucial for preventing potential malignant transformations and ensuring timely, life-saving interventions. To this end, we collect and annotate the first ultrasound video dataset with 100 videos for uterine fibroid segmentation (UFUV). We also present Local-Global Reciprocal Network (LGRN...
Preprint
Ultra-high-resolution image generation poses great challenges, such as increased semantic planning complexity and detail synthesis difficulties, alongside substantial training resource demands. We present UltraPixel, a novel architecture utilizing cascade diffusion models to generate high-quality images at multiple resolutions (\textit{e.g.}, 1K to...
Article
Shadow removal in a single image has received increasing attention in recent years. However, removing shadows over dynamic scenes remains largely under-explored. In this paper, we propose the first data-driven video shadow removal model, termed PSTNet, by exploiting three essential characteristics of video shadows, i.e., physical property, spatio r...
Preprint
Accurate vessel segmentation in Ultra-Wide-Field Scanning Laser Ophthalmoscopy (UWF-SLO) images is crucial for diagnosing retinal diseases. Although recent techniques have shown encouraging outcomes in vessel segmentation, models trained on one medical dataset often underperform on others due to domain shifts. Meanwhile, manually labeling high-reso...
Preprint
With the rapid development of depth sensor, more and more RGB-D videos could be obtained. Identifying the foreground in RGB-D videos is a fundamental and important task. However, the existing salient object detection (SOD) works only focus on either static RGB-D images or RGB videos, ignoring the collaborating of RGB-D and video information. In thi...
Article
Full-text available
With the rapid development of depth sensor, more and more RGB-D videos could be obtained. Identifying the foreground in RGB-D videos is a fundamental and important task. However, the existing salient object detection (SOD) works only focus on either static RGB-D images or RGB videos, ignoring the collaborating of RGB-D and video information. In thi...
Preprint
Structure from Motion (SfM) and visual localization in indoor texture-less scenes and industrial scenarios present prevalent yet challenging research topics. Existing SfM methods designed for natural scenes typically yield low accuracy or map-building failures due to insufficient robust feature extraction in such settings. Visual markers, with thei...
Article
Federated learning enables multiple hospitals to cooperatively learn a shared model without privacy disclosure. Existing methods often take a common assumption that the data from different hospitals have the same modalities. However, such a setting is difficult to fully satisfy in practical applications, since the imaging guidelines may be differen...
Article
Masked image modeling (MIM) with transformer backbones has recently been exploited as a powerful self-supervised pre-training technique. The existing MIM methods adopt the strategy to mask random patches of the image and reconstruct the missing pixels, which only considers semantic information at a lower level, and causes a long pre-training time....
Article
The widely deployed ways to capture a set of unorganized points, e.g., merged laser scans, fusion of depth images, and structure-from- $x$ , usually yield a 3-D noisy point cloud. Accurate normal estimation for the noisy point cloud makes a crucial contribution to the success of various applications. However, the existing normal estimation wisdoms...
Article
Full-text available
Accurate retinal vessel segmentation is a crucial step in the clinical diagnosis and treatment of fundus diseases. Although many efforts have been presented to address the task, the segmentation performance in challenging regions (e.g., collateral vessels) is still not satisfactory, due to their thin morphology or the low contrast between foregroun...
Article
This paper studies a practical Source-free unsupervised domain adaptation (SFUDA) problem, which transfers knowledge of source-trained models to the target domain, without accessing the source data. It has received increasing attention in recent years, while the prior arts focus on designing adaptation strategies, ignoring that different target sam...
Chapter
Line drawing colorization is an indispensable stage in the image painting process, however, traditional manual coloring requires a lot of time and energy from professional artists. With the development of deep learning techniques, attempts have been made to colorize line drawings by means of user prompts, text, etc, but these methods also seem to r...
Chapter
Nasopharyngeal carcinoma (NPC) is a common and significant malignancy that primarily affects the head and neck region. Accurate delineation of Gross Tumor Volume (GTV) is crucial for effective radiotherapy in NPC. Although Magnetic Resonance Imaging (MRI) allows precise visualization of tumor characteristics, challenges arise due to the complex ana...
Chapter
Breast lesion segmentation in ultrasound (US) videos is essential for diagnosing and treating axillary lymph node metastasis. However, the lack of a well-established and large-scale ultrasound video dataset with high-quality annotations has posed a persistent challenge for the research community. To overcome this issue, we meticulously curated a US...
Chapter
Diffusion Probabilistic Models have recently shown remarkable performance in generative image modeling, attracting significant attention in the computer vision community. However, while a substantial amount of diffusion-based research has focused on generative tasks, few studies have applied diffusion models to general medical image classification....
Article
Full-text available
Placental maturity grading (PMG) is often utilized for evaluating fetal growth and maternal health. Currently, PMG often relied on the subjective judgment of the clinician, which is time-consuming and tends to incur a wrong estimation due to redundancy and repeatability of the process. The existing methods often focus on designing diverse hand-craf...
Preprint
In the real world, image degradations caused by rain often exhibit a combination of rain streaks and raindrops, thereby increasing the challenges of recovering the underlying clean image. Note that the rain streaks and raindrops have diverse shapes, sizes, and locations in the captured image, and thus modeling the correlation relationship between i...
Article
Full-text available
Specular highlight detection is an essential task with various applications in computer vision. This paper aims to detect specular highlights in single high-resolution images using deep learning while avoiding excessive GPU memory consumption. To achieve this, we present a high-resolution specular highlight detection dataset with manual annotations...
Preprint
Federated Learning (FL) enables multiple clients to collaboratively learn in a distributed way, allowing for privacy protection. However, the real-world non-IID data will lead to client drift which degrades the performance of FL. Interestingly, we find that the difference in logits between the local and global models increases as the model is conti...
Preprint
While multi-modal learning has been widely used for MRI reconstruction, it relies on paired multi-modal data which is difficult to acquire in real clinical scenarios. Especially in the federated setting, the common situation is that several medical institutions only have single-modal data, termed the modality missing issue. Therefore, it is infeasi...
Preprint
Robot-assisted surgery has made significant progress, with instrument segmentation being a critical factor in surgical intervention quality. It serves as the building block to facilitate surgical robot navigation and surgical education for the next generation of operating intelligence. Although existing methods have achieved accurate instrument seg...
Article
Generalized zero-shot learning (GZSL) aims to recognize both seen and unseen samples by leveraging the connections between semantic and visual representations. Recently, a majority of GZSL methods focus on generating visual features for unseen categories conditioned on category-level semantic attributes. However, there is a considerable gap between...
Article
The bi-classifier paradigm is a common practice in unsupervised domain adaptation (UDA), where two classifiers are leveraged to guide the model to learn domain invariant features. Previous approaches only focused on the consistency of the outputs between classifiers, but ignored the classification certainty of each classifier. Therefore, existing m...
Article
Full-text available
Sign language recognition (SLR) is a challenging task, which requires a thorough understanding of spatial-temporal visual features for translating it into comprehensible written or spoken language. However, existing SLR methods ignore the importance of key spatial-temporal representation due to its sparsity and inconsistency in space and time. To s...
Article
Recently neural architecture (NAS) search has attracted great interest in academia and industry. It remains a challenging problem due to the huge search space and computational costs. Recent studies in NAS mainly focused on the usage of weight sharing to train a SuperNet once. However, the corresponding branch of each subnetwork is not guaranteed t...
Article
Pancreatic cancer has the worst prognosis of all cancers. The clinical application of endoscopic ultrasound (EUS) for the assessment of pancreatic cancer risk and of deep learning for the classification of EUS images have been hindered by inter-grader variability and labeling capability. One of the key reasons for these difficulties is that EUS ima...
Article
Accurate estimation of customer lifetime value (LTV), which reflects the potential consumption of a user over a period of time, is crucial for the revenue management of online advertising platforms. However, predicting LTV in real-world applications is not an easy task since the user consumption data is usually insufficient within a specific domain...
Article
Full-text available
Significance: Robust segmentations of neurons greatly improve neuronal population reconstruction, which could support further study of neuron morphology for brain research. Aim: Precise segmentation of 3D neuron structures from optical microscopy (OM) images is crucial to probe neural circuits and brain functions. However, the high noise and low...
Preprint
Federated learning (FL) facilitates collaborative learning among multiple clients in a distributed manner, while ensuring privacy protection. However, its performance is inevitably degraded as suffering data heterogeneity, i.e., non-IID data. In this paper, we focus on the feature distribution skewed FL scenario, which is widespread in real-world a...
Article
Full-text available
Recent learning-based shadow removal methods have achieved remarkable performance. However, they basically require massive paired shadow and shadow-free images for model training, which limits their generalization capability since these data are often cumbersome to obtain and lack of diversity. To address the problem, we present Self-ShadowGAN, a n...
Preprint
Federated learning enables multiple hospitals to cooperatively learn a shared model without privacy disclosure. Existing methods often take a common assumption that the data from different hospitals have the same modalities. However, such a setting is difficult to fully satisfy in practical applications, since the imaging guidelines may be differen...

Network

Cited By