Zhe-Ming Lu’s research while affiliated with Zhejiang University and other places


Publications (282)


Prompt-Based Test-Time Real Image Dehazing: A Novel Pipeline
  • Chapter · October 2024 · 1 Citation

Zixuan Chen · Zewei He · Ziqian Lu · [...] · Zhe-Ming Lu

TDEM: Table Data Extraction Model Based on Cell Segmentation

October 2024 · 2 Reads
IEICE Transactions on Information and Systems

To accurately extract tabular data, we propose a novel cell-based table data extraction model (TDEM). The key idea of TDEM is to use the grayscale projection of row separation lines, together with the table masks and column masks generated by a VGG-19 neural network, to segment each individual cell from the input table image. In this way, the text content of the table is extracted cell by cell, which greatly improves the accuracy of table recognition.
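As a rough illustration of the grayscale-projection step mentioned in the abstract, the sketch below flags image rows whose fraction of dark pixels is high enough to be a separator line. The VGG-19 table/column masks come from the paper itself, but the thresholds, the NumPy implementation, and the synthetic test image here are illustrative assumptions, not the authors' code.

```python
import numpy as np

def find_row_separators(table_img_gray, dark_thresh=128, line_ratio=0.8):
    """Locate candidate row-separation lines via horizontal grayscale projection.

    A row is treated as a separator when the fraction of dark pixels in that
    row exceeds `line_ratio`.  Both threshold values are illustrative.
    """
    dark = (table_img_gray < dark_thresh).astype(np.uint8)
    # Horizontal projection: fraction of dark pixels in each image row.
    row_profile = dark.sum(axis=1) / dark.shape[1]
    return np.where(row_profile >= line_ratio)[0]

# Tiny synthetic check: a white 100x100 "table" with dark lines at rows 0, 50, 99.
demo = np.full((100, 100), 255, np.uint8)
demo[[0, 50, 99], :] = 0
print(find_row_separators(demo))   # -> [ 0 50 99]
```

Combined with table and column masks from a segmentation network, these separator rows bound the individual cells that are then cropped and passed to OCR.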


An edge-preserving stripe noise removal method for infrared images

July 2024 · 6 Reads
IEICE Transactions on Fundamentals of Electronics Communications and Computer Sciences

In this letter, we propose a single-frame method to remove stripe noise while preserving vertical details. The key idea is to employ the side-window filter to perform edge-preserving smoothing, and then accurately separate the stripe noise via a 1D column guided filter. Experimental results demonstrate the effectiveness and efficiency of our method.
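Below is a minimal sketch of a 1D guided filter applied along image columns, one plausible way to pull a column-smooth stripe estimate out of the residual between the noisy frame and an edge-preserving smoothed frame. The abstract only names the building blocks, so the way they are combined here, plus the radius and epsilon values and the use of SciPy, are assumptions rather than the letter's method.

```python
import numpy as np
from scipy.ndimage import uniform_filter1d

def guided_filter_1d(guide, src, radius=7, eps=1e-3):
    """Standard guided filter applied independently along each column (axis 0)."""
    size = 2 * radius + 1
    mean_g = uniform_filter1d(guide, size, axis=0)
    mean_s = uniform_filter1d(src, size, axis=0)
    var_g = uniform_filter1d(guide * guide, size, axis=0) - mean_g * mean_g
    cov_gs = uniform_filter1d(guide * src, size, axis=0) - mean_g * mean_s
    a = cov_gs / (var_g + eps)
    b = mean_s - a * mean_g
    return uniform_filter1d(a, size, axis=0) * guide + uniform_filter1d(b, size, axis=0)

def remove_stripes(noisy, smoothed):
    """Subtract a column-smooth stripe estimate from the noisy frame.

    `smoothed` is assumed to come from an edge-preserving filter (the letter
    uses a side-window filter); the residual holds stripe noise plus fine
    detail, and the column-wise self-guided filter keeps the part that varies
    smoothly along each column as the stripe estimate.
    """
    residual = noisy.astype(np.float64) - smoothed.astype(np.float64)
    stripe = guided_filter_1d(residual, residual, radius=15, eps=1e-2)
    return noisy.astype(np.float64) - stripe
```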



A Monkey Swing Counting Algorithm Based on Object Detection

April 2024 · 5 Reads
IEICE Transactions on Information and Systems

This Letter focuses on the deep learning-based problem of counting monkeys' head swings. There are currently very few papers on monkey detection, and even fewer on counting monkeys' head swings. This research tries to fill that gap by estimating the head swing frequency of monkeys through deep learning, extending the traditional object detection algorithm. After analyzing the object detection results, we localize the monkey's actions over a period of time. This Letter analyzes the task of counting monkeys' head swings and proposes a standard that accurately describes a monkey's head swing. Under the guidance of this standard, the head swing counting accuracy on 50 test videos reaches 94.23%.
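The counting standard itself is defined in the Letter; purely as an illustration of turning per-frame detections into a count, the sketch below counts direction reversals of the detected head-box centre that exceed a pixel threshold. Both the interface (a list of per-frame horizontal centres) and the threshold are hypothetical, not the Letter's criterion.

```python
def count_head_swings(centers_x, min_amplitude=15):
    """Count left/right direction reversals of the detected head-box centre.

    `centers_x` holds the horizontal centre (in pixels) of the head box in
    each frame; a swing is counted when the motion direction reverses after
    the centre has moved at least `min_amplitude` pixels.
    """
    swings = 0
    direction = 0            # +1 moving right, -1 moving left, 0 unknown
    anchor = centers_x[0]    # last position where a significant move ended
    for x in centers_x[1:]:
        delta = x - anchor
        if abs(delta) < min_amplitude:
            continue                     # ignore small jitter
        new_dir = 1 if delta > 0 else -1
        if direction != 0 and new_dir != direction:
            swings += 1                  # direction reversal -> one swing
        direction, anchor = new_dir, x
    return swings
```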


DEA-Net: Single Image Dehazing Based on Detail-Enhanced Convolution and Content-Guided Attention

January 2024 · 60 Reads · 105 Citations
IEEE Transactions on Image Processing

Single image dehazing is a challenging ill-posed problem which estimates latent haze-free images from observed hazy images. Some existing deep learning based methods are devoted to improving model performance by increasing the depth or width of convolution, while the learning ability of the Convolutional Neural Network (CNN) structure is still under-explored. In this paper, a Detail-Enhanced Attention Block (DEAB) consisting of Detail-Enhanced Convolution (DEConv) and Content-Guided Attention (CGA) is proposed to boost feature learning and thereby improve dehazing performance. Specifically, DEConv contains difference convolutions which integrate prior information to complement the vanilla convolution and enhance the representation capacity. Then, using the re-parameterization technique, DEConv is equivalently converted into a vanilla convolution to reduce parameters and computational cost. By assigning a unique Spatial Importance Map (SIM) to every channel, CGA can attend to more useful information encoded in features. In addition, a CGA-based mixup fusion scheme is presented to effectively fuse the features and aid the gradient flow. By combining the above-mentioned components, we propose our Detail-Enhanced Attention Network (DEA-Net) for recovering high-quality haze-free images. Extensive experimental results demonstrate the effectiveness of our DEA-Net, which outperforms the state-of-the-art (SOTA) methods, boosting the PSNR index above 41 dB with only 3.653 M parameters. (The source code of our DEA-Net is available at https://github.com/cecret3350/DEA-Net .)
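The re-parameterization step in the abstract, folding difference convolutions into one vanilla convolution, can be illustrated with a generic PyTorch sketch: a central-difference convolution (CDC, with theta = 1) is rewritten as an equivalent 3x3 kernel and summed with a parallel vanilla branch. The released DEA-Net code is at the GitHub link above; this sketch assumes both branches are plain 3x3 convolutions with bias and is not that implementation.

```python
import torch
import torch.nn as nn

def cdc_to_vanilla_kernel(weight: torch.Tensor) -> torch.Tensor:
    """Rewrite a central-difference convolution, sum_k w_k * (x_k - x_c),
    as an equivalent vanilla 3x3 kernel: w'_k = w_k for k != c and
    w'_c = w_c - sum_k w_k."""
    eq = weight.clone()
    eq[:, :, 1, 1] -= weight.sum(dim=(2, 3))
    return eq

@torch.no_grad()
def reparameterize(conv_vanilla: nn.Conv2d, conv_cdc: nn.Conv2d) -> nn.Conv2d:
    """Fold two parallel 3x3 branches (vanilla + CDC), whose outputs are
    summed, into a single vanilla convolution."""
    fused = nn.Conv2d(conv_vanilla.in_channels, conv_vanilla.out_channels,
                      kernel_size=3, padding=1, bias=True)
    fused.weight.copy_(conv_vanilla.weight + cdc_to_vanilla_kernel(conv_cdc.weight))
    fused.bias.copy_(conv_vanilla.bias + conv_cdc.bias)
    return fused

# At inference, the fused conv reproduces the summed branch outputs up to
# floating-point error, with the parameter count of a single convolution.
```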


Covert Communication through Robust Fragment Hiding in a Large Number of Images
  • Article · Full-text available

January 2024 · 13 Reads · 1 Citation
Sensors

For covert communication in lossy channels, it is necessary to consider that the carrier of the hidden watermark will undergo multiple image-processing attacks. To ensure that secret information can be extracted without distortion from watermarked images that have undergone attacks, in this paper we design a novel fragmented secure communication system. The sender fragments the secret data to be transmitted and redundantly hides it in a large number of multimodal carriers belonging to messenger accounts on multiple social platforms. The receiver collects enough covert carriers, extracts each fragment, and concatenates them into the transmitted secret data. As an example, this article uses image carriers: the text file intended for transmission is fragmented and embedded into a large number of images, with each fragment redundantly embedded into multiple images. In this way, the receiving end only needs enough stego images to extract the fragment in each image and then concatenate the final secret file. To resist the various attacks possible during image transmission, we propose a strongly robust image watermarking method. It first adopts a DFT-based watermark layer, which has high embedding and detection efficiency and good invisibility. Secondly, it adopts a DCT-based watermark layer, which can resist translation attacks, JPEG attacks, and other common attacks. Experiments show that our watermarking method is very fast; both the embedding time and the extraction time are less than 0.15 s for images no larger than 2000×2000. It has very good invisibility, with 41 dB PSNR on average, and it is more robust than existing schemes, resisting nearly all kinds of attacks. Based on this strongly robust watermarking method, the scheme of fragmenting the transmitted content and redundantly hiding it in a large number of images is effective and practical. Our scheme can completely restore the secret file under different RST or hybrid attacks, such as rotation by 1 degree and 5 degrees, scaling by 1.25 and 0.8, and cropping by 10% and 25%. It can still restore the secret file completely even if 30% of the received images are lost; when 80% of the received images are lost, it can still restore 61.1% of the secret file. If all stego images can be obtained, the original text file can be completely restored.
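As a toy illustration of the fragment-and-redundantly-embed idea (not of the paper's DFT/DCT watermark layers), the sketch below splits a byte string into indexed fragments, assigns each fragment to several carriers, and reassembles the file from whatever subset of carriers is received. The fragment size, redundancy factor, and round-robin assignment are arbitrary choices for the example.

```python
import itertools

def fragment(data: bytes, frag_size: int = 64, redundancy: int = 3, n_carriers: int = 100):
    """Split `data` into indexed fragments and assign each fragment to
    `redundancy` different carriers (round-robin)."""
    n_frags = (len(data) + frag_size - 1) // frag_size
    frags = [(i, data[i * frag_size:(i + 1) * frag_size]) for i in range(n_frags)]
    carriers = {c: [] for c in range(n_carriers)}
    slot = itertools.cycle(range(n_carriers))
    for idx, payload in frags:
        for _ in range(redundancy):
            carriers[next(slot)].append((idx, payload))
    return n_frags, carriers

def reassemble(n_frags: int, received_carriers):
    """Rebuild the file from any subset of carriers; fails if a fragment is missing."""
    parts = [None] * n_frags
    for fragments in received_carriers:
        for idx, payload in fragments:
            parts[idx] = payload
    if any(p is None for p in parts):
        raise ValueError("not enough carriers received to restore the file")
    return b"".join(parts)

# Usage: each (idx, payload) pair would be embedded into one image with a
# robust watermark (the embedding step is elided here).
n, carriers = fragment(b"secret report " * 50)
restored = reassemble(n, list(carriers.values())[:70])   # survives losing 30% of carriers
```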


A Classification Method for Helmet Wearing State Based on Progressive Multi-Granularity Training Strategy

January 2024 · 1 Read
IEEE Access

On many construction sites, whether workers wear safety helmets directly affects their safety, so monitoring the wearing state of safety helmets has become an important auxiliary means of construction safety. However, most current monitoring algorithms only distinguish workers who are wearing safety helmets from those who are not, which is a significant limitation, and their performance needs to be improved. In this paper, we apply fine-grained classification to the helmet wearing state and propose PMG-Helmet (Progressive Multi-Granularity for Helmet), a classification algorithm based on a progressive multi-granularity training strategy for a six-class helmet wearing state dataset. The algorithm achieves multi-granularity classification through a puzzle generator and a progressive training strategy, and introduces the MC-Loss (Mutual Channel Loss) function, designed specifically for fine-grained classification tasks, to improve performance. In the inference stage, the outputs of the individual stages of PMG-Helmet are combined with normalized weights, yielding better combined accuracy. Experimental results show that the algorithm reaches 93.36% accuracy on the six-class dataset. To further investigate its effectiveness, we also conducted a separate study on the finer subcategories "wearing the helmet correctly" and "wearing the helmet but not fastening the chin strap", achieving an accuracy of 90.11%.
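The puzzle generator used by progressive multi-granularity training can be sketched as a simple patch-shuffling transform, shown below in PyTorch. The per-stage patch counts in the closing comment (8, 4, 2, then the full image) follow the common PMG recipe and are an assumption about, not a statement of, the exact PMG-Helmet settings.

```python
import torch

def jigsaw_generator(images: torch.Tensor, n: int) -> torch.Tensor:
    """Split each image of a (B, C, H, W) batch into an n x n grid of patches
    and shuffle the patches, producing the granularity-n training input.
    Assumes H and W are divisible by n."""
    b, c, h, w = images.shape
    ph, pw = h // n, w // n
    # (B, C, n, ph, n, pw) -> (B, C, n*n, ph, pw)
    patches = images.reshape(b, c, n, ph, n, pw).permute(0, 1, 2, 4, 3, 5)
    patches = patches.reshape(b, c, n * n, ph, pw)
    shuffled = patches[:, :, torch.randperm(n * n)]
    # Reassemble the shuffled patches back into a full image.
    shuffled = shuffled.reshape(b, c, n, n, ph, pw).permute(0, 1, 2, 4, 3, 5)
    return shuffled.reshape(b, c, h, w)

# Progressive training: feed jigsaw_generator(x, 8), (x, 4), (x, 2) and the
# original x to successive stages, each supervised with a loss such as MC-Loss.
```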


Figures: pseudo code of the DCT embedding process for a given I frame · pseudo code of the DCT extraction process for a given I frame · block diagram of the proposed algorithm · (a) embedded Lena image, (b) DFT magnitude with an embedded ring · (a) image block of 96×96 divided into 12×12 grids, (b) positions of DCT coefficients within an image block · (+6 more figures)

Multipurpose video watermarking algorithm for copyright protection and tamper detection

November 2023 · 73 Reads

With the rise of short video and the development of self-media, video has become a frequently used type of data, so the protection of video data is particularly important. Digital watermarking is a very effective digital authentication and copyright protection technology that hides a piece of data called a watermark in digital multimedia content. However, most existing video watermarking algorithms fail to deal with videos under geometric distortions, and they focus only on copyright protection, ignoring the task of tamper detection. This paper proposes a multipurpose video watermarking algorithm that is resistant to rotation and scaling attacks while providing both copyright protection and tamper detection. We design a DFT template embedding method that makes the watermarking algorithm efficiently resistant to rotation and scaling attacks: during watermark extraction it detects the rotation and scaling that the video has undergone, and the copyright watermark is then extracted after geometric correction. In addition, we propose a robust watermark embedding method with a tamper detection function, which embeds copyright watermark information with validation bits in the DCT coefficients of I frames. By embedding the copyright watermark block by block, the algorithm not only improves robustness but also enables the system to detect tampering. Experiments show that the proposed algorithm has good robustness to attacks such as compression, filtering, rotation, scaling, frame-rate changes and other inter-frame operations, and it outperforms a previous geometrically robust algorithm by an average of 12.8% in normalized correlation values under geometric attacks. We also show that our algorithm is effective in tamper detection by localizing the distorted areas in the video frames.
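A generic sketch of blockwise DCT embedding in a grayscale I-frame is given below, using quantization-index modulation on one mid-frequency coefficient per 8×8 block. The coefficient position, step size, and the idea that some of the embedded bits serve as validation (parity) bits are illustrative assumptions, not the paper's exact layout.

```python
import numpy as np
from scipy.fft import dctn, idctn

def embed_bits_dct(frame: np.ndarray, bits, coeff=(3, 2), step=24.0) -> np.ndarray:
    """Embed one bit per 8x8 block by quantizing a mid-frequency DCT
    coefficient so that its quantized parity equals the bit (QIM)."""
    out = frame.astype(np.float64).copy()
    h, w = out.shape
    it = iter(bits)
    for y in range(0, h - 7, 8):
        for x in range(0, w - 7, 8):
            try:
                bit = next(it)
            except StopIteration:
                return np.clip(out, 0, 255).astype(np.uint8)
            block = dctn(out[y:y + 8, x:x + 8], norm="ortho")
            q = int(np.round(block[coeff] / step))
            if q % 2 != bit:             # force parity of the quantized value
                q += 1
            block[coeff] = q * step
            out[y:y + 8, x:x + 8] = idctn(block, norm="ortho")
    return np.clip(out, 0, 255).astype(np.uint8)

def extract_bits_dct(frame: np.ndarray, n_bits, coeff=(3, 2), step=24.0):
    """Recover embedded bits from the parity of the quantized DCT coefficient."""
    bits = []
    h, w = frame.shape
    for y in range(0, h - 7, 8):
        for x in range(0, w - 7, 8):
            if len(bits) == n_bits:
                return bits
            block = dctn(frame[y:y + 8, x:x + 8].astype(np.float64), norm="ortho")
            bits.append(int(np.round(block[coeff] / step)) % 2)
    return bits
```

In a scheme like the one described above, the geometric correction recovered from a DFT template would be applied before this extraction step so that block boundaries line up again.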


A DFT and IWT-DCT Based Image Watermarking Scheme for Industry

November 2023 · 15 Reads
IEICE Transactions on Information and Systems

To date, digital image watermarking has not been used at large scale in industry. The first reason is that watermarking efficiency is low and real-time requirements cannot be satisfied. The second reason is that existing watermarking schemes cannot cope with various attacks. To solve the above problems, this paper presents a multi-domain digital image watermarking scheme, in which a fast DFT (Discrete Fourier Transform) based watermarking method is proposed for synchronization correction and an IWT-DCT (Integer Wavelet Transform-Discrete Cosine Transform) based watermarking method is proposed for information embedding. The proposed scheme has high efficiency during embedding and extraction. Compared with five existing schemes, the robustness of our scheme is very strong; it can cope with many common and compound attacks, and thus can be used in a wide range of application scenarios.
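For readers unfamiliar with the IWT part of such a scheme, a one-level integer Haar (S-transform) lifting step can be sketched as below; embedding would then typically modify DCT coefficients computed on the low-frequency subband. The abstract only names the IWT-DCT pairing, so that combination, the even-dimension assumption, and the lifting variant chosen here are illustrative.

```python
import numpy as np

def haar_iwt_1level(img: np.ndarray):
    """One-level 2-D integer Haar wavelet transform via lifting (S-transform).
    Returns the LL, LH, HL, HH subbands; exactly invertible in integers.
    Assumes even image height and width."""
    x = img.astype(np.int64)

    def lift(a, b):
        d = a - b                # detail (difference)
        s = b + (d >> 1)         # smooth (b + floor(d / 2)), invertible
        return s, d

    # Horizontal lifting over column pairs.
    s_h, d_h = lift(x[:, 0::2], x[:, 1::2])
    # Vertical lifting over row pairs of both the smooth and detail parts.
    LL, LH = lift(s_h[0::2, :], s_h[1::2, :])
    HL, HH = lift(d_h[0::2, :], d_h[1::2, :])
    return LL, LH, HL, HH
```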


Citations (65)


... Action-CLIP [38] models temporal information at multiple levels including frame and video level, while X-CLIP [26] introduces a cross-frame attention mechanism for temporal information modulation. Open-VCLIP [41] is designed for tackling open-vocabulary zero-shot tasks [25], involving fine-tuning all parameters of CLIP models. Others [40,45,27,22] utilize PEFT, e.g., adapter structures or learnable prompts, to inject temporal learning ability into CLIP. ...

Reference:

OmniCLIP: Adapting CLIP for Video Recognition with Spatial-Temporal Omni-Scale Feature Learning
Learning Multiple Criteria Calibration for Generalized Zero-shot Learning
  • Citing Article
  • June 2024

Knowledge-Based Systems

... This method significantly enhances dehazing performance, but the transformer requires extensive training time. Recently, Zixuan Chen et al. [12] introduced a novel method called Detail-Enhanced Attention Network (DEA-Net), which leverages Detail-Enhanced Convolution (DEConv) and Content-Guided Attention (CGA) mechanisms to enhance feature learning capabilities, thereby improving dehazing performance. Existing deep learning based dehazing methods mostly rely on paired hazy and clean images, which need extensive hazy and corresponding clean images for training. ...

DEA-Net: Single Image Dehazing Based on Detail-Enhanced Convolution and Content-Guided Attention
  • Citing Article
  • January 2024

IEEE Transactions on Image Processing

... Thus, single-carrier DSSS signal detection and multi-carrier cases can be distinguished from each other in a simple way. Single carrier DSSS signal detection with cyclostationary signal analysis has been extensively studied in the literature and [23] can be consulted for detailed discussion. In Figure 2, cdma2000 chip rate of 1.2288 mega-chips per second, second carrier at the 1.25 MHz carrier spacing, third carrier at the 2.5 MHz, and fourth carrier at the 3.75 MHz are distinguished particularly as the expected results of the proposed method, beyond the zero-lag correlations. ...

DSSS Signal Detection Based on CNN

Sensors

... Several well-known deep-learning techniques for image de-hazing are there such as DC-Net [1], and FFA-Net [2]. This paper presents a U-Net type of architecture which is an extension of the DEA-Net [3] model. Additionally, pixel and channel attention blocks are included at the encoder side and difference convolution is utilized in DEB for color restoration and effective image de-hazing. ...

DEA-Net: Single image dehazing based on detail-enhanced convolution and content-guided attention
  • Citing Preprint
  • January 2023

... J. Xu et al. [29] presented the innovative approach for single image dehazing which integrates the transformer and convolution neural network architectures. For improving the dehazing capability, the network captures both the global and local features using transformer-convolution hybrid layer. ...

An Efficient Dehazing Algorithm Based on the Fusion of Transformer and Convolutional Neural Network

Sensors

... Although the detection accuracy improved, the increased model complexity made it less suitable for deployment on edge devices. Tan et al. [12] and Zhang et al. [13] utilized YOLOv5 as the baseline model and added an additional detection head layer to capture small target features, thereby enhancing small target recognition. However, this also increased the number of model parameters. ...

Helmet Wearing State Detection Based on Improved Yolov5s

Sensors

... The CBAM module can be seamlessly integrated into any CNN architecture and leads to improved overall performance. Hu et al. 27 introduced a compressive excitation module called SENet, which effectively adjusts the channel feature responses by explicitly modeling the relationships between the channels. When integrated into a backbone network, this module demonstrates notable performance enhancements. Feng et al. 28 enhanced the compression excitation module by transforming it into a dual compression excitation module (S2ENet) and integrated it into a ResNet backbone network. This modification resulted in superior performance compared to the single compression excitation module. Jin et al. 29 introduced a two-stage spatial pooling design to enhance descriptor extraction and information fusion in SENet. ...

Optimized S2E Attention Block based Convolutional Network for Human Pose Estimation

IEEE Access

... The YOLOv4 model improves the Resblock-body structure in YOLOv3 [24], and the cross-stage partial connections (CSP) [25] approach is applied to the backbone network. It should be noted that the YOLOv4 algorithm adds an improved spatial pyramid module between the backbone network and the neck to enhance the receptive field, and uses the path aggregation network (PANet) [22] as the neck to fuse the feature layers, which is suitable for the detection of targets of different sizes. For the loss function, some researchers proposed IoU loss, which considers the coverage of the predicted BBox area and the ground truth BBox area [22]. ...

A Real-Time Cup-Detection Method Based on YOLOv3 for Inventory Management

Sensors

... For a support image and a given weak label, it generates CAMs for a set of seen classes using a pre-trained network, then performs a weighted summation of these with the weights proportional to similarities of the textual features obtained by word2vec. Similarly, [57] first proposed the setting of WZSS using only image labels for seen classes as supervision. Another line of work is open-world segmentation [7] where models are trained using large-scale image captioning datasets without a need for dense pixel annotations. ...

Dual semantic-guided model for weakly-supervised zero-shot semantic segmentation