Wei Lu’s research while affiliated with Tianjin University and other places

Publications (15)


Application of Multilayer Information Fusion and Optimization Network Combined With Attention Mechanism in Polyp Segmentation
  • Article

January 2025 · 3 Reads · IEEE Transactions on Instrumentation and Measurement

Jinghui Chu · Yongpeng Wang · Qi Tian · Wei Lu

Colorectal cancer is a multifaceted disease, but it can be effectively prevented through colonoscopy for the detection of polyps. In clinical practice, automatic polyp segmentation techniques for colonoscopy images can significantly enhance the efficiency and accuracy of polyp detection and help clinicians precisely localize polyps. However, existing segmentation methods have several obvious limitations: (1) inadequate utilization of the multi-level features extracted by feature encoders, (2) ineffective aggregation of high-level and low-level features, and (3) unclear delineation of polyp boundaries. To address these challenges while enhancing the clarity of polyp boundaries in segmentation, we propose a novel Multi-layer Information Fusion and Optimization Network (MIFONet) consisting of the following components: (1) a Contextual and Fine Feature Processing (CFFP) module, which effectively extracts both local and global contextual information, (2) a Hierarchical Feature Integration Module (HFIM), which efficiently aggregates processed high-level and low-level features and strengthens the association between contextual features, (3) a Multi-Scale Contextual Attention (MSCA) module, which deeply integrates aggregated high-level features with low-level features, and (4) a novel refinement module composed of an Adaptive Channel Attention Pyramid (ACAP) part and a Skip-Reverse Attention (SRA) part, capable of capturing fine-grained information and refining feature representations. We conducted extensive experiments comparing our proposed model with 19 popular or state-of-the-art (SOTA) methods on five renowned polyp benchmark datasets, and designed three cross-dataset experiments to further validate the model's generalization performance. Experimental results demonstrate that MIFONet consistently achieves excellent segmentation performance across most datasets. In particular, MIFONet achieves a 94.6% mean Dice score on the CVC-ClinicDB dataset, outperforming SOTA methods.
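The Skip-Reverse Attention (SRA) part named above builds on the reverse-attention idea common in polyp segmentation: down-weight regions the coarse prediction is already confident about, so refinement concentrates on ambiguous boundary pixels. The abstract gives no implementation details, so the sketch below (function names are our own, plain Python on small nested lists) only illustrates the core reverse weighting, not the paper's actual module.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def reverse_attention(logits, features):
    """Weight each feature by (1 - sigmoid(logit)): pixels where the
    coarse prediction is confident (large |logit| toward 'polyp')
    contribute little, while uncertain boundary pixels dominate."""
    return [
        [(1.0 - sigmoid(l)) * f for l, f in zip(lrow, frow)]
        for lrow, frow in zip(logits, features)
    ]

# A confident polyp pixel (logit 4), a confident background pixel (-4),
# and undecided pixels (0) over a uniform feature map.
out = reverse_attention([[4.0, 0.0], [-4.0, 0.0]],
                        [[1.0, 1.0], [1.0, 1.0]])
```

Here `out[0][0]` is nearly suppressed (the prediction already covers it), while `out[1][0]` stays near full strength, which is exactly the behavior a reverse-attention refinement stage exploits.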


Multimodal Dual-Graph Collaborative Network With Serial Attentive Aggregation Mechanism for Micro-Video Multi-Label Classification

January 2025 · IEEE Transactions on Multimedia

Yu Qiao · Wei Lu · Peiguang Jing · [...] · Yuting Su

The increasing commercial value of micro-videos has spurred a rising demand for grasping their contents. The abundant multimodal cues in micro-videos exhibit substantial potential for enhancing content comprehension. However, effectively harnessing the collaborative characteristics across different modalities remains a significant challenge, especially in multi-label scenarios, due to inconsistent behaviors regarding label correlations. To tackle this issue, we introduce a multimodal dual-graph collaborative network with a serial attentive aggregation mechanism (MDGCN) for micro-video multi-label classification. In MDGCN, we exploit an asymmetric encoder-decoder framework, which incorporates multiple parallel encoders with complementary representations and a decoder to ensure the completeness of the encoded results. Meanwhile, an adversarial constraint ensures that individual differences remain prominently featured within each modality. Furthermore, considering the inconsistency of label correlations across modalities, we construct a serial attentive graph convolutional network that employs an interactive dual-graph attention paradigm to sequentially integrate multimodal representations and dynamically explore label correlations. Experiments conducted on two datasets demonstrate that our proposed method outperforms state-of-the-art approaches.
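The label-correlation graph at the heart of approaches like this one is usually propagated with a graph-convolution step: each label embedding becomes a weighted average of the embeddings of its correlated labels. The paper's exact layer is not given here; the following is a minimal stdlib-Python sketch of that propagation step (names are ours), with the adjacency row-normalized so weights sum to one.

```python
def gcn_layer(adj, feats):
    """One graph-convolution step over a label graph: each label's
    embedding becomes the adjacency-weighted average of its
    neighbours' embeddings (row-normalized adjacency)."""
    out = []
    for row in adj:
        s = sum(row) or 1.0          # guard against isolated labels
        agg = [0.0] * len(feats[0])
        for w, f in zip(row, feats):
            for j, v in enumerate(f):
                agg[j] += (w / s) * v
        out.append(agg)
    return out

# Two fully-correlated labels: propagation averages their embeddings.
adj = [[1.0, 1.0], [1.0, 1.0]]
feats = [[2.0, 0.0], [0.0, 2.0]]
mixed = gcn_layer(adj, feats)
```

Stacking such layers lets information about one label's presence flow into correlated labels, which is how label-correlation modeling improves multi-label prediction.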


PFPRNet: A Phase-Wise Feature Pyramid With Retention Network for Polyp Segmentation

November 2024 · 4 Reads · IEEE Journal of Biomedical and Health Informatics

Early detection of colonic polyps is crucial for the prevention and diagnosis of colorectal cancer. Deep learning-based polyp segmentation methods have become mainstream and achieved remarkable results, but acquiring large amounts of labeled data is time-consuming and labor-intensive, and the presence of numerous similar wrinkles in polyp images further hampers model prediction performance. In this paper, we propose a novel approach called Phase-wise Feature Pyramid with Retention Network (PFPRNet), which leverages a pre-trained Transformer-based encoder to obtain multi-scale feature maps. A Phase-wise Feature Pyramid with Retention Decoder is designed to gradually integrate global features into local features and guide the model's attention towards key regions. Additionally, our custom Enhance Perception module captures image information from a broader perspective. Finally, we introduce an innovative Low-layer Retention module as an alternative to the Transformer for more efficient global attention modeling. Evaluation results on several widely used polyp segmentation datasets demonstrate that our proposed method has strong learning ability and generalization capability, and outperforms state-of-the-art approaches.
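Retention, as popularized by RetNet, replaces softmax attention with a fixed exponential decay over past positions, which is why it can be cheaper than a Transformer for global modeling. Whether the paper's Low-layer Retention module matches this form is not stated in the abstract, so the one-dimensional sketch below (our own simplification, scalar queries/keys/values) shows only the generic retention recurrence.

```python
def retention(q, k, v, gamma=0.9):
    """Retention-style causal mixing for scalar sequences: position n
    attends to every earlier position m with weight gamma**(n - m),
    so no softmax normalization is needed."""
    out = []
    for n in range(len(q)):
        acc = 0.0
        for m in range(n + 1):
            acc += (gamma ** (n - m)) * q[n] * k[m] * v[m]
        out.append(acc)
    return out

# With unit queries/keys/values and gamma=0.5, the second position
# receives the first position's contribution decayed by 0.5.
out = retention([1.0, 1.0], [1.0, 1.0], [1.0, 1.0], gamma=0.5)
```

Because the decay is fixed, the same computation can be expressed as a recurrence (state = gamma * state + k[n] * v[n]), giving O(1) per-step cost at inference, which is the usual efficiency argument for retention over attention.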


Restoration results on old photos of the 1927 Solvay Conference. Only some face images are displayed. Our method can restore the details of the original images to a large extent and avoid excessive fantasy. Please zoom in for the best view.
The framework of our proposed method. It is mainly composed of an asymmetric codec and a StyleGAN2 prior network. There is only one MMRB layer for each convolution scale, six in total, except for the encoder's first and last convolution scales. The inputs of the GAN prior network include latent codes W, a learned 4×4×512 constant tensor C, and noise branches. The prior network then applies style blocks to restore high-quality face images gradually from coarse to fine.
Overview of the mixed multi-path residual block (MMRB).
Qualitative comparisons with several state-of-the-art face restoration methods on the CelebA dataset. To make the performance difference of each method easy to perceive, we enlarge and display local areas. Zoom in for the best view.
Qualitative comparisons of 4× face super-resolution by different methods. We do not apply local magnification to these results, so the differences can be viewed across the whole picture. Zoom in for the best view.


Enhancing quality of pose-varied face restoration with local weak feature sensing and GAN prior
  • Article
  • Publisher preview available

October 2023 · 152 Reads · 4 Citations · Neural Computing and Applications

Facial semantic guidance (including facial landmarks, facial heatmaps, and facial parsing maps) and facial generative adversarial network (GAN) priors have been widely used in blind face restoration (BFR) in recent years. Although existing BFR methods achieve good performance in ordinary cases, they have limited resilience when applied to seriously degraded, pose-varied face images (e.g., looking right, looking left, laughing) in real-world scenarios. In this work, we propose a well-designed blind face restoration network with a generative facial prior. The proposed network mainly comprises an asymmetric codec and a StyleGAN2 prior network. In the asymmetric codec, we adopt a mixed multi-path residual block (MMRB) to gradually extract weak texture features of input images, which better preserves the original facial features and avoids excessive fantasy. The MMRB can also be used as a plug-and-play component in other networks. Furthermore, thanks to the affluent and diverse facial priors of the StyleGAN2 model, we adopt it as the primary generator network and specially design a novel self-supervised training strategy to fit the distribution closer to the target and flexibly restore natural, realistic facial details. Extensive experiments on synthetic and real-world datasets demonstrate that our model outperforms prior art on face restoration and face super-resolution tasks.


Confidence-guided mask learning for semi-supervised medical image segmentation

September 2023 · 19 Reads · 4 Citations · Computers in Biology and Medicine

Semi-supervised learning aims to train a high-performance model with a minority of labeled data and a majority of unlabeled data. Existing methods mostly adopt a prediction-task mechanism to obtain precise segmentation maps under consistency or pseudo-label constraints, but this mechanism usually fails to overcome confirmation bias. To address this issue, in this paper we propose a novel Confidence-Guided Mask Learning (CGML) method for semi-supervised medical image segmentation. Specifically, on top of the prediction task, we introduce an auxiliary generation task with mask learning, which reconstructs the masked images to greatly facilitate the model's capability of learning feature representations. Moreover, a confidence-guided masking strategy is developed to enhance model discrimination in uncertain regions. In addition, we introduce a triple-consistency loss to enforce consistent predictions for the masked unlabeled image, the original unlabeled image, and the reconstructed unlabeled image, generating more reliable results. Extensive experiments on two datasets demonstrate that our proposed method achieves remarkable performance.
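The core of a confidence-guided masking strategy is the selection rule: mask the regions where the model is least confident, so the reconstruction task forces it to learn exactly those regions. The paper's selection criterion is not spelled out in the abstract; the sketch below is our own minimal interpretation, masking a fixed ratio of the lowest-confidence patches.

```python
def confidence_guided_mask(confidences, ratio=0.5):
    """Pick the indices of the lowest-confidence patches to mask.
    The reconstruction task then targets the regions the model is
    most uncertain about, instead of masking at random."""
    n_mask = int(len(confidences) * ratio)
    order = sorted(range(len(confidences)), key=lambda i: confidences[i])
    return set(order[:n_mask])

# Four patches with per-patch prediction confidence; mask half of them.
masked = confidence_guided_mask([0.9, 0.2, 0.8, 0.1], ratio=0.5)
```

Compared with the random masking used in generic masked-image modeling, this biases the auxiliary task toward the uncertain regions where confirmation bias is most likely to arise.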



LACINet: A Lesion-Aware Contextual Interaction Network for Polyp Segmentation

January 2023 · 4 Reads · 12 Citations · IEEE Transactions on Instrumentation and Measurement

Automatic polyp segmentation is critical for the early prevention and diagnosis of colorectal cancer. However, diverse foreground appearance and complicated background interference severely degrade the performance of pixel-level prediction, and excessive computational overheads further hinder the practical clinical application of existing methods. In this paper, we propose a novel Lesion-Aware Contextual Interaction Network (LACINet), which explores long-range dependencies and global contexts for polyp segmentation at modest computational cost. Specifically, we present a Lesion-aware Pyramid Mechanism (LPM) to weaken the influence of background noise and refine lesion-related features. We also develop a robust Representation Enhancement Decoder (RED) to learn global feature representations and aggregate multi-level contexts. In RED, we first build a Non-local Contextual Lesion Interaction Module (NCLIM) to integrate cross-level contextual information and obtain intrinsic feature representations, and then design a Tri-branching Multi-scale Perceptual Self-attention Module (TMPSM) to fully exploit global features. Notably, we introduce an asymmetric multi-branch strategy to alleviate the computational burden. Experimental results on several widely used benchmark datasets demonstrate the superior performance of our proposed LACINet in comparison with state-of-the-art methods.


VMemNet: A Deep Collaborative Spatial-Temporal Network With Attention Representation for Video Memorability Prediction

January 2023 · 3 Reads · 1 Citation · IEEE Transactions on Multimedia

Video memorability measures the degree to which a video is remembered by different viewers and has shown great potential in various contexts, including advertising, education, and health care. While extensive research has been conducted on image memorability, the study of video memorability is still in its early stages. Existing methods primarily focus on coarse-grained spatial feature representation and decision fusion strategies, overlooking the crucial interactions between the spatial and temporal domains. We therefore propose an end-to-end collaborative spatial-temporal network, VMemNet, which incorporates targeted attention mechanisms and intermediate fusion strategies. This enables VMemNet to capture the intricate relationships between spatial and temporal information and uncover more elements of memorability within video visual features. VMemNet integrates spatially and semantically guided attention modules into a dual-stream network architecture, allowing it to simultaneously capture static local cues and dynamic global cues in videos. Specifically, the spatial attention module aggregates more memorable elements from spatial locations, and the semantically guided attention module achieves semantic alignment and intermediate fusion of the local and global cues. In addition, two types of loss functions with complementary decision rules are associated with the corresponding attention modules to guide the training of the proposed network. Experimental results on a publicly available dataset verify that VMemNet outperforms all current single- and multi-modal methods for video memorability prediction.


A Multimodal Aggregation Network With Serial Self-Attention Mechanism for Micro-Video Multi-Label Classification

January 2023 · 16 Reads · 10 Citations · IEEE Signal Processing Letters

Currently, micro-videos have attracted increasing attention due to their unique properties and great commercial value. Because micro-videos naturally incorporate multimodal information, a powerful method for learning distinct joint multimodal representations is essential for real applications. Inspired by the success of attention-based neural network architectures across various tasks, we propose a multimodal aggregation network (MANET) with a serial self-attention mechanism for micro-video multi-label classification. Specifically, we first propose a parallel content-dependent graph neural network (CDGNN) module, which explores category-related embeddings of micro-videos by disentangling category relations into modality-specific and modality-shared category dependency patterns. We then introduce a serial self-attention (SSA) module to transmit the multimodal information in sequential order, in which an aggregation bottleneck is incorporated to better collect and condense the significant information. Experiments conducted on a large-scale multi-label micro-video dataset demonstrate that our proposed method achieves competitive results compared with several state-of-the-art methods.
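The attention read underlying a serial self-attention stack is the standard scaled dot-product: a query scores every key, and the softmax of the scores weights the values; in a serial arrangement, the output of one modality's block can serve as the query for the next. The sketch below is a generic stdlib-Python version of that read (names are ours), not the paper's SSA module.

```python
import math

def softmax(xs):
    m = max(xs)                      # subtract max for stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attend(query, keys, values):
    """One scaled dot-product attention read: score each key against
    the query, softmax the scores, and return the weighted sum of
    the values."""
    d = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / d for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]

# The query aligns with the first key, so the first value dominates.
fused = attend([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]],
               [[1.0, 0.0], [0.0, 1.0]])
```

An aggregation bottleneck, in this framing, simply restricts how many such reads carry information between modalities, forcing the stack to condense what it passes along.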


A two-stage frequency-time dilated dense network for speech enhancement

December 2022 · 9 Reads · 3 Citations · Applied Acoustics

Speech enhancement systems are used in many devices, such as hearing aids. To improve the quality of speech retrieved from noisy observations, this paper proposes a two-stage network built on a frequency-time dilated dense network (FTDDN). The improvement lies in three aspects. First, both frequency and temporal modeling are considered when optimizing a time-frequency mask. Second, to acquire a large receptive field, dilated convolution is incorporated into three basic processing units: the frequency-dilated convolutional unit (FDCU), the time-dilated convolutional unit (TDCU), and the frequency-time dilated convolutional unit (FTDCU). Third, twelve units of each type are densely connected to assemble a frequency-dilated dense block (FDDB), a time-dilated dense block (TDDB), or a frequency-time dilated dense block (FTDDB), which are combined with feature mapping operators to build up an FTDDN. With these components, high-quality speech can be retrieved by applying information reuse and feature fusion operations across two FTDDNs in a two-stage model. Using the LibriSpeech and VCTK datasets, we conducted several experimental comparisons between our method and state-of-the-art speech enhancement methods, showing that our proposed model outperforms these baselines.
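The reason dilated convolution yields a large receptive field with few layers is simple arithmetic: each layer adds (kernel_size - 1) * dilation positions to the field. The sketch below computes this for a 1-D stack; the kernel size and dilation schedule are illustrative assumptions, not the FDCU/TDCU configurations from the paper.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field of stacked 1-D dilated convolutions:
    start at 1 and grow by (kernel_size - 1) * dilation per layer."""
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf

# Four kernel-3 layers with exponentially growing dilation cover
# 31 positions, versus only 9 for four undilated kernel-3 layers.
wide = receptive_field(3, [1, 2, 4, 8])
plain = receptive_field(3, [1, 1, 1, 1])
```

This exponential growth is why a dense block of a dozen dilated units can model long-range time or frequency context without pooling or very deep stacks.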


Citations (10)


... The SA module uses dilated convolutions to expand the receptive field and reduce computational complexity. It minimizes the number of feature channels through 1 × 1 convolutions and generates SA weights using the sigmoid function [25,26]. The dual-attention mechanism used in this study enhances the network's attention to key features in images through adaptive weighted allocation while suppressing unimportant information, thereby improving feature expression ability and optimizing segmentation results when dealing with small targets and edge details. ...

Reference:

Packaging Design Image Segmentation Based on Improved Full Convolutional Networks
VMemNet: A Deep Collaborative Spatial-Temporal Network With Attention Representation for Video Memorability Prediction
  • Citing Article
  • January 2023

IEEE Transactions on Multimedia

... This model can generate high-quality and realistic face images, and it has significantly improved performance in face image synthesis tasks. Hu et al. [2] proposed a method for face restoration with attitude changes. Through determining which weak local features of a face image had attitude changes and using prior knowledge obtained from generative adversarial networks, the method achieved a significant performance improvement in face restoration tasks regarding attitude changes and the generation of high-quality restored images. ...

Enhancing quality of pose-varied face restoration with local weak feature sensing and GAN prior

Neural Computing and Applications

... They designed dual-scale encoding to enhance semantic segmentation by coordinating context with multiscales and nonlocal dependencies. Li et al. [35] aimed to segment polyps via a novel transformer network. A novel lesion-aware contextual interaction network was proposed to capture global contexts and long-range dependencies. ...

LACINet: A Lesion-Aware Contextual Interaction Network for Polyp Segmentation
  • Citing Article
  • January 2023

IEEE Transactions on Instrumentation and Measurement

... Recently, numerous methods have been proposed to tackle the scarcity of annotations Zhao et al. 2024), especially in the medical field Li et al. 2024b;Wang et al. 2022a;Xia et al. 2024;Song et al. 2024). SSL includes pseudo-labels (Lee et al. 2013;Xie et al. 2020;Zhai et al. 2019) and consistency regularization (Rasmus et al. 2015;Laine and Aila 2017;Tarvainen and Valpola 2017;Ke et al. 2019;Li et al. 2023a). Pseudo-labels-based methods assign labels to unlabeled data based on their confidence scores compared to a predefined threshold value. ...

Confidence-guided mask learning for semi-supervised medical image segmentation
  • Citing Article
  • September 2023

Computers in Biology and Medicine

... For example, these studies utilized a compact module called Squeeze and Excite (SEBlock) to compute the channel relationships. They regarded the scaled value of channel attention as the filter's importance score, using a uniform pruning rate throughout all the layers [23][24][25][26][27]. Chen et al. proposed a new filter pruning method called filter pruning by attention and ranking (FPAR), which calculates and ranks channel importance on the basis of the channel attention mechanism [28]. ...

A Novel Channel Pruning Approach based on Local Attention and Global Ranking for CNN Model Compression
  • Citing Conference Paper
  • July 2023

... In order to focus the scene information of micro-videos more effectively, they [28] proposed VT-ResNet to extract visual features and achieved good performance. Lu et al. [29] proposed MANET with a serial self-attention mechanism to perform tasks of micro-video multi-label classification. Liu et al. [30] propose Multi-Modal and Multi-Granularity Object Relations (M2ORE) to address the above issues, which learns multigranularity interactive semantics between venues and multimodal semantic objects to help understand venues. ...

A Multimodal Aggregation Network With Serial Self-Attention Mechanism for Micro-Video Multi-Label Classification
  • Citing Article
  • January 2023

IEEE Signal Processing Letters

... In practical applications such as digital transmission or GNSS positioning, each signal frame has a limited length. Therefore, the sampling time cannot exceed the frame length, imposing constraints on how low the subsampling frequency can practically be set, which is not infinitely small [42]-[44]. ...

Joint frequency and DOA estimation of sub-Nyquist sampling multi-band sources with unfolded coprime arrays

Multidimensional Systems and Signal Processing

... Wang et al. [23] proposed a pruning method that treated filter pruning as a clustering task and removed filters with similar outputs in order to avoid reducing feature diversity in traditional pruning. Fan et al. [24] pruned the filters by means of rank constraints and clustering. The global contribution of the parameters to the loss function was used for model compression [25]. ...

A Dual Rank-Constrained Filter Pruning Approach for Convolutional Neural Networks
  • Citing Article
  • January 2021

IEEE Signal Processing Letters

... This algorithm is claimed to be much faster than other contemporary algorithms and yields high-quality visible images with clearly highlighted IR objects. Using the level of illumination as a guide for an adaptive and intelligent fusion of features from RGB and thermal images is proposed in [103]. Instead of using the 1:1 fusion strategy that is commonly used, this method uses a proposed illumination score (IAN-score) representing the illumination condition to guide the proportions of information from RGB and Thermal images for fusion. ...

Illumination-based adaptive saliency detection network through fusion of multi-source features
  • Citing Article
  • June 2021

Journal of Visual Communication and Image Representation

... Low-rank learning [33][34][35][36] has emerged as a promising technique across various tasks. For example, FLMSC [37] introduced a multi-view clustering method, employing subspace learning to preserve latent low-rank structures in individual views while simultaneously exploring cross-view consistency. ...

Wearable Computing for Internet of Things: A Discriminant Approach for Human Activity Recognition
  • Citing Article
  • October 2018

IEEE Internet of Things Journal