Chunli Lv’s research while affiliated with China Agricultural University and other places


Publications (48)


Overview of the proposed DRL framework for pricing-strategy generation. It includes a state encoder for multi-level feature extraction, a transformer-based policy optimizer with exploration mechanisms, a value estimator for delayed-reward modeling, and a sparse-feedback enhancer to improve training robustness.
Architecture of the policy-learning module. The design includes feature embedding, a transformer encoder with multi-head attention, a strategy head for action distribution, and dual exploration and intrinsic curiosity.
Illustration of the sparse-feedback-enhancement module, which enhances policy robustness through data augmentation, clustering-based reward smoothing, adaptive noise scaling, and dynamic feedback integration.
Visualization of the learned pricing-policy behavior across different models.
R² curves of different models.
A Transformer-Based Reinforcement Learning Framework for Sequential Strategy Optimization in Sparse Data
  • Article
  • Full-text available

May 2025


Zizhe Zhou · Liman Zhang · Xuran Liu · [...] · Chunli Lv

A deep reinforcement learning framework is presented for strategy generation and profit forecasting based on large-scale economic behavior data. By integrating perturbation-based augmentation, backward return estimation, and policy-stabilization mechanisms, the framework facilitates robust modeling and optimization of complex, dynamic behavior sequences. Experimental evaluations on four distinct behavior data subsets indicate that the proposed method achieved consistent performance improvements over representative baseline models across key metrics, including total profit gain, average reward, policy stability, and profit–price correlation. On the sales feedback dataset, the framework achieved a total profit gain of 0.37, an average reward of 4.85, a low action standard deviation of 0.37, and a correlation score of R² = 0.91. In the overall benchmark comparison, the model attained a precision of 0.92 and a recall of 0.89, reflecting reliable strategy response and predictive consistency. These results suggest that the proposed method can handle decision-making scenarios involving sparse feedback, heterogeneous behavior, and temporal volatility, with demonstrable generalization potential and practical relevance.


Illustration of the structure of multimodal feedback data composed of text, image, and voice signals that are collected and integrated to form the basis of sensor-enhanced intelligent education systems. The “+” symbol denotes the semantic integration of heterogeneous modalities, aiming to enrich learner understanding through complementary cues; the “−” symbol represents modal inconsistencies or interference, which the system resolves via alignment or correction strategies during multimodal fusion.
Overall architecture of the proposed cross-modal legal English question-answering system. The system sequentially integrates a visual-language encoding module, a speech recognition and semantic parsing module, and a question–answer matching and feedback module.
Illustration of the visual-language encoding module. The module extracts regional image features using a CNN and encodes textual semantics with a Sentence-BERT (SBERT) model to achieve deep multimodal correlation.
Illustration of the speech recognition and semantic parsing module. The module initially transcribes speech signals into audio features using an ASR model, incorporating inputs from case images, body keypoints, and hand coefficients to extract multisource features.
Illustration of the question–answer matching and feedback module. The module aims to compute the semantic alignment between learner responses and reference answers by integrating multimodal inputs, including case images, legal questions, spoken responses, and instructional prompts. Case images are processed via a Vision Transformer Encoder, while textual and speech features are embedded into a shared space through a projection mechanism. The resulting fused vector is fed into a large language model for contextual inference. A normalized similarity score is then calculated against the reference vector, guiding the generation of interpretable, template-based feedback informed by attention activations.
A Multimodal Deep Learning Approach for Legal English Learning in Intelligent Educational Systems

With the development of artificial intelligence and intelligent sensor technologies, traditional legal English teaching approaches have faced numerous challenges in handling multimodal inputs and complex reasoning tasks. In response to these challenges, a cross-modal legal English question-answering system based on visual and acoustic sensor inputs was proposed, integrating image, text, and speech information and adopting a unified vision–language–speech encoding mechanism coupled with dynamic attention modeling to effectively enhance learners’ understanding and expressive abilities in legal contexts. The system exhibited superior performance across multiple experimental evaluations. In the assessment of question-answering accuracy, the proposed method achieved the best results across BLEU, ROUGE, Precision, Recall, and Accuracy, with an Accuracy of 0.87, Precision of 0.88, and Recall of 0.85, clearly outperforming the traditional ASR+SVM classifier, image-retrieval-based QA model, and unimodal BERT QA system. In the analysis of multimodal matching performance, the proposed method achieved optimal results in Matching Accuracy, Recall@1, Recall@5, and MRR, with a Matching Accuracy of 0.85, surpassing mainstream cross-modal models such as VisualBERT, LXMERT, and CLIP. The user study further verified the system’s practical effectiveness in real teaching environments, with learners’ understanding improvement reaching 0.78, expression improvement reaching 0.75, and satisfaction score reaching 0.88, significantly outperforming traditional teaching methods and unimodal systems. The experimental results fully demonstrate that the proposed cross-modal legal English question-answering system not only exhibits significant advantages in multimodal feature alignment and deep reasoning modeling but also shows substantial potential in enhancing learners’ comprehensive capabilities and learning experiences.


View-Aware Contrastive Learning for Incomplete Tabular Data with Low-Label Regimes

To address the challenges of label sparsity and feature incompleteness in structured data, a self-supervised representation learning method based on multi-view consistency constraints is proposed in this paper. Robust modeling of high-dimensional sparse tabular data is achieved through integration of a view-disentangled encoder, intra- and cross-view contrastive mechanisms, and a joint loss optimization module. The proposed method incorporates feature clustering-based view partitioning, multi-view consistency alignment, and masked reconstruction mechanisms, thereby enhancing the model’s representational capacity and generalization performance under weak supervision. Across multiple experiments conducted on four types of datasets, including user rating data, platform activity logs, and financial transactions, the proposed approach maintains superior performance even under extreme conditions of up to 40% feature missingness and only 10% label availability. The model achieves an accuracy of 0.87, F1-score of 0.83, and AUC of 0.90 while reducing the normalized mean squared error to 0.066. These results significantly outperform mainstream baseline models such as XGBoost, TabTransformer, and VIME, demonstrating the proposed method’s robustness and broad applicability across diverse real-world tasks. The findings suggest that the proposed method offers an efficient and reliable paradigm for modeling sparse structured data.


A Multimodal Parallel Transformer Framework for Apple Disease Detection and Severity Classification with Lightweight Optimization

One of the world’s most important economic crops, apples face numerous disease threats during their production process, posing significant challenges to orchard management and yield quality. To address the impact of complex disease characteristics and diverse environmental factors on detection accuracy, this study proposes a multimodal parallel transformer-based approach for apple disease detection and classification. By integrating multimodal data fusion and lightweight optimization techniques, the proposed method significantly enhances detection accuracy and robustness. Experimental results demonstrate that the method achieves an accuracy of 96%, precision of 97%, and recall of 94% in disease classification tasks. In severity classification, the model achieves a maximum accuracy of 94% for apple scab classification. Furthermore, the continuous frame diffusion generation module enhances the global representation of disease regions through high-dimensional feature modeling, with generated feature distributions closely aligning with real distributions. Additionally, by employing lightweight optimization techniques, the model is successfully deployed on mobile devices, achieving a frame rate of 46 FPS for efficient real-time detection. This research provides an efficient and accurate solution for orchard disease monitoring and lays a foundation for the advancement of intelligent agricultural technologies.


A Deep Learning-Based Approach to Apple Tree Pruning and Evaluation with Multi-Modal Data for Enhanced Accuracy in Agricultural Practices

A deep learning-based tree pruning evaluation system is proposed in this study, which integrates hyperspectral images, sensor data, and expert system rules. The system aims to enhance the accuracy and robustness of tree pruning tasks through multimodal data fusion and online learning strategies. Various models, including Mask R-CNN, SegNet, Tiny-Segformer, Box2Mask, CS-Net, SVM, MLP, and Random Forest, were used in the experiments to perform tree segmentation and pruning evaluation, with comprehensive performance assessments conducted. The experimental results demonstrate that the proposed model excels in the tree segmentation task, achieving a precision of 0.94, recall of 0.90, F1 score of 0.92, and mAP@50 and mAP@75 of 0.91 and 0.90, respectively, outperforming other comparative models. These results confirm the effectiveness of multimodal data fusion and dynamic optimization strategies in improving the accuracy of tree pruning evaluation. The experiments also highlight the critical role of sensor data in pruning evaluation, particularly when combined with the online learning strategy, as the model can progressively optimize pruning decisions and adapt to environmental changes. Through this work, the potential and prospects of the deep learning-based tree pruning evaluation system in practical applications are demonstrated.


Integrating Stride Attention and Cross-Modality Fusion for UAV-Based Detection of Drought, Pest, and Disease Stress in Croplands

Timely and accurate detection of agricultural disasters is crucial for ensuring food security and enhancing post-disaster response efficiency. This paper proposes a deployable UAV-based multimodal agricultural disaster detection framework that integrates multispectral and RGB imagery to simultaneously capture the spectral responses and spatial structural features of affected crop regions. To this end, we design an innovative stride–cross-attention mechanism, in which stride attention is utilized for efficient spatial feature extraction, while cross-attention facilitates semantic fusion between heterogeneous modalities. The experimental data were collected from representative wheat and maize fields in Inner Mongolia, using UAVs equipped with synchronized multispectral (red, green, blue, red edge, near-infrared) and high-resolution RGB sensors. Through a combination of image preprocessing, geometric correction, and various augmentation strategies (e.g., MixUp, CutMix, GridMask, RandAugment), the quality and diversity of the training samples were significantly enhanced. The model trained on the constructed dataset achieved an accuracy of 93.2%, an F1 score of 92.7%, a precision of 93.5%, and a recall of 92.4%, substantially outperforming mainstream models such as ResNet50, EfficientNet-B0, and ViT across multiple evaluation metrics. Ablation studies further validated the critical role of the stride attention and cross-attention modules in performance improvement. This study demonstrates that the integration of lightweight attention mechanisms with multimodal UAV remote sensing imagery enables efficient, accurate, and scalable agricultural disaster detection under complex field conditions.



A Spatiotemporal Attention-Guided Graph Neural Network for Precise Hyperspectral Estimation of Corn Nitrogen Content

April 2025


A hyperspectral maize nitrogen content prediction model is proposed, integrating a dynamic spectral–spatiotemporal attention mechanism with a graph neural network, with the aim of enhancing the accuracy and stability of nitrogen estimation. Across multiple experiments, the proposed method achieved outstanding performance on the test set, with R² = 0.93, RMSE of 0.35, and MAE of 0.48, significantly outperforming comparative models including SVM, RF, ResNet, and ViT. In experiments conducted across different growth stages, the best performance was observed during the grain-filling stage, where R² reached 0.96. In terms of accuracy, recall, and precision, the proposed model exhibited an average improvement exceeding 15%, demonstrating strong adaptability to temporal variation and generalization across spatial conditions. These results provide robust technical support for large-scale, nondestructive nitrogen monitoring in agricultural applications.


As illustrated in the figure, a time-evolving adjacency structure is constructed by computing correlation coefficients between historical signal sequences over a sliding window. Each node represents an independent signal, and the edge weights reflect the degree of inter-signal correlation. This dynamic graph is then input into a graph attention module to extract structure-aware embeddings, which enhance the capacity of the model to capture interactions across heterogeneous signal modalities.
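The sliding-window construction described in this caption can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `rolling_correlation_adjacency`, the use of Pearson correlation, taking absolute values as edge weights, and zeroing the diagonal are all assumptions made for the example.

```python
import numpy as np

def rolling_correlation_adjacency(signals, window):
    """Build one correlation-based adjacency matrix per sliding window.

    signals: array of shape (T, N), one column per signal (node), N >= 2.
    window:  number of time steps in each window (2 <= window <= T).
    Returns an array of shape (T - window + 1, N, N).
    """
    signals = np.asarray(signals, dtype=float)
    T, N = signals.shape
    if window < 2 or window > T:
        raise ValueError("window must satisfy 2 <= window <= T")

    graphs = []
    for end in range(window, T + 1):
        segment = signals[end - window:end]            # (window, N)
        corr = np.corrcoef(segment, rowvar=False)      # (N, N) Pearson matrix
        corr = np.nan_to_num(corr)                     # constant signals give NaN
        np.fill_diagonal(corr, 0.0)                    # assumption: no self-edges
        graphs.append(np.abs(corr))                    # assumption: |r| as edge weight
    return np.stack(graphs)
```

Each slice of the result is a symmetric weighted adjacency matrix for one time step, which could then be passed to a graph attention module.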
The figure illustrates the construction of a graph based on inter-signal correlation and the computation process of attention weights within the Graph Attention Network. Each node receives an initial feature vector X_i, which is transformed via a learnable weight matrix W to obtain embeddings. Attention scores e_ij are computed using a shared parameter vector a for each pair of neighboring nodes, then normalized to produce attention weights λ_ij. The updated embedding h_i′ is obtained through a weighted sum of neighboring node features, modulated by λ_ij.
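The attention computation in this caption can be sketched as a single-head layer. The caption does not state the scoring nonlinearity or the normalization, so this sketch assumes the standard Graph Attention Network choices (LeakyReLU on the score, softmax over neighbors) and adds self-loops so every node has at least one neighbor. The function name `gat_attention_layer` and all shapes are hypothetical.

```python
import numpy as np

def gat_attention_layer(X, adj, W, a, negative_slope=0.2):
    """Single-head graph attention (sketch under the assumptions above).

    X:   (N, F) initial node features.
    adj: (N, N) adjacency; a nonzero entry marks an edge.
    W:   (F, D) learnable weight matrix.
    a:   (2 * D,) shared attention parameter vector.
    Returns (updated embeddings of shape (N, D), attention weights (N, N)).
    """
    H = X @ W                                   # transformed embeddings, (N, D)
    D = H.shape[1]
    # a^T [h_i || h_j] splits into a[:D] . h_i + a[D:] . h_j
    scores = (H @ a[:D])[:, None] + (H @ a[D:])[None, :]
    scores = np.where(scores > 0, scores, negative_slope * scores)  # LeakyReLU

    # Keep only neighbors (plus self-loops); others get zero weight.
    mask = (adj != 0) | np.eye(len(X), dtype=bool)
    scores = np.where(mask, scores, -np.inf)

    # Row-wise softmax gives the normalized attention weights.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)

    return weights @ H, weights                 # weighted sum of neighbor features
```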
The module consists of three parallel Transformer encoder branches, each dedicated to processing input sequences of different temporal lengths: short-term (5 steps), mid-term (20 steps), and long-term (60 steps). Each branch includes two stacked Transformer layers for extracting temporal features at its respective resolution. The internal computation of the multi-head self-attention mechanism is illustrated, including the linear transformations for query (Q), key (K), and value (V), and the computation of attention weights a_ij. Outputs from the three branches are pooled into vectors h_s, h_m, and h_l, which are then fused through a gating mechanism to form the final temporal representation h_temporal.
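The caption does not specify the form of the gating mechanism, so the sketch below shows only one plausible variant: a softmax gate computed from the concatenated branch vectors, used to form a convex combination of the three. The function name `gated_fusion` and the parameters `W_g` and `b_g` are hypothetical.

```python
import numpy as np

def gated_fusion(h_s, h_m, h_l, W_g, b_g):
    """Fuse short-, mid-, and long-term vectors with a softmax gate.

    h_s, h_m, h_l: pooled branch outputs, each of shape (D,).
    W_g: (3, 3 * D) gate weights; b_g: (3,) gate bias (both assumed learnable).
    Returns (fused vector of shape (D,), gate values of shape (3,)).
    """
    branches = np.stack([h_s, h_m, h_l])            # (3, D)
    logits = W_g @ branches.reshape(-1) + b_g       # one logit per branch
    logits = logits - logits.max()                  # numerical stability
    gates = np.exp(logits) / np.exp(logits).sum()   # softmax, sums to 1
    return gates @ branches, gates                  # convex combination
```

With zero gate weights and bias the three gates are equal, so the fused vector reduces to the plain average of the branch outputs.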
A Deep Learning Framework for High-Frequency Signal Forecasting Based on Graph and Temporal-Macro Fusion

April 2025


With the increase in trading frequency and the growing complexity of data structures, traditional quantitative strategies have gradually encountered bottlenecks in modeling capacity, real-time responsiveness, and multi-dimensional information integration. To address these limitations, a high-frequency signal generation framework is proposed, which integrates graph neural networks, cross-scale Transformer architectures, and macro factor modeling. This framework enables unified modeling of structural dependencies, temporal fluctuations, and macroeconomic disturbances. In predictive validation experiments, the framework achieved a precision of 92.4%, a recall of 91.6%, and an F1-score of 92.0% on classification tasks. For regression tasks, the mean squared error (MSE) and mean absolute error (MAE) were reduced to 1.76×10⁻⁴ and 0.96×10⁻², respectively. These results significantly outperformed several mainstream models, including LSTM, FinBERT, and StockGCN, demonstrating superior stability and practical applicability.
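As a quick consistency check on the classification figures quoted above, the F1-score can be recomputed from precision and recall, assuming the usual definition as their harmonic mean. The helper name `f1_score` is introduced only for this example.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Precision 92.4% and recall 91.6%, as reported in the abstract.
f1 = f1_score(0.924, 0.916)
print(f"{f1:.3f}")  # prints 0.920
```

The result matches the reported F1-score of 92.0%.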


A Reinforcement Learning-Driven UAV-Based Smart Agriculture System for Extreme Weather Prediction

April 2025


Extreme weather prediction plays a crucial role in agricultural production and disaster prevention. This study proposes a lightweight extreme weather early warning model based on UAV cruise monitoring, a density-aware attention mechanism, and edge computing. Reinforcement learning is utilized to optimize UAV cruise paths, while a Transformer-based model is employed for weather prediction. Experimental results demonstrate that the proposed method achieves an overall prediction accuracy of 0.91, a precision of 0.93, a recall of 0.88, and an F1-score of 0.91. In the prediction of different extreme weather events, the proposed method attains an accuracy of 0.89 for strong wind conditions, 0.92 for hail, and 0.89 for late spring cold, all outperforming state-of-the-art methods. These results validate the effectiveness and applicability of the proposed approach in extreme weather forecasting.


Citations (25)


... Another critical future direction involves the creation of closed-loop systems that not only detect diseases but also automatically trigger appropriate interventions, such as targeted antifungal treatments or adjusted storage conditions (Silva et al., 2025). Perhaps most importantly, future research must focus on making these technologies more accessible through cost-reduction strategies, simplified user interfaces, and localized training programs to ensure they reach the stakeholders who need them most (Orchi et al., 2023; He et al., 2025). As these innovations mature, they hold the potential to transform post-harvest management from a reactive process to a predictive, precision-based system capable of dramatically reducing global food waste while improving food safety and quality throughout the supply chain (Nturambirwe et al., 2021). ...

Reference:

Innovations in post-harvest disease detection: From molecular diagnostics to AI-based imaging
Passion Fruit Disease Detection Using Sparse Parallel Attention Mechanism and Optical Sensing

... Liu et al. [27] proposed an end-to-end pest object detection method based on the transformer, which achieved 95.3% mAP in leaf-surface pest detection tasks by incorporating FRC and RPSA mechanisms. To achieve accurate pest detection in cistanche, Zhang et al. [28] proposed a transformer-based target identification module. Enhanced by a bridge attention mechanism and loss function, the network achieved 92% average precision, demonstrating excellent results in complicated agricultural scenes. ...

A Transformer-Based Detection Network for Precision Cistanche Pest and Disease Management in Smart Agriculture

... This research adds to the literature by presenting a scheme for balancing usability, efficiency, and security in protecting data in the cloud. Cheng et al. (2025) [14] present a new data obfuscation system that combines probability density functions and information entropy to better preserve privacy. Classic anonymization and encryption techniques tend to impair data utility, which hinders the ability to derive useful insights from protected datasets. ...

A Novel Data Obfuscation Framework Integrating Probability Density and Information Entropy for Privacy Preservation

... For example, a study by Mori et al. [20] proposed using a latent diffusion model to generate images of diseased apple leaves employing an image-to-image generation approach. Another study by Zhou et al. [41] addressed data scarcity by integrating the diffusion model with a few-shot learning technique to implement an end-to-end pipeline for plant disease identification. A regression-conditional based image-to-image diffusion model was developed by Egusquiza et al. [4] to create graded synthetic data to quantify plant disease severity. ...

A Novel Few-Shot Learning Framework Based on Diffusion Models for High-Accuracy Sunflower Disease Detection and Classification

... Similarly, YOLOv8, a DL model, was utilized for weed seed quantification, achieving over 80% accuracy and addressing the challenges in traditional seedbank assessments [79]. Advanced architectures, such as the latent diffusion transformer, have shown remarkable performance in agricultural image analysis, obtaining 92% precision, 91% accuracy, and a 0.90 F1 score [80]. Furthermore, semisupervised learning frameworks may achieve accuracies of up to 96% with only 10% labeled data, showing the potential of generalized student-teacher approaches for reducing annotation costs while maintaining performance [81]. ...

An Efficient Weed Detection Method Using Latent Diffusion Transformer for Enhanced Agricultural Image Analysis and Mobile Deployment

... The diagonal elements of the confusion matrix indicate that the samples are predicted correctly. In addition, the confusion matrix can provide several metrics of the model, such as accuracy, precision, recall, and F1-score [31]. ...

A Deep Learning Model for Accurate Maize Disease Detection Based on State-Space Attention and Feature Fusion

... Although disease prediction models for grapes have shown high accuracy in controlled studies, their practical application can vary significantly depending on regional and environmental conditions. For instance, a novel deep learning model for grape disease detection that integrates multimodal data such as soil type, climate, and grape variety has been successful in specific locations, but its effectiveness may be limited when applied to different regions without further adaptation [44]. Factors such as local climate variability, different disease prevalence, and varying agricultural practices can affect the precision of these models. ...

High-Performance Grape Disease Detection Method Using Multimodal Data and Parallel Activation Functions

... Ref. [25] proposed a method that performs excellently in segmentation tasks that heavily rely on other tasks, such as defining the edge regions based on leaf shapes. Another study [26] proposed a deep learning approach that combines knowledge graphs and diffusion transformers for cucumber disease detection. This approach enhances the model's ability to recognize complex agricultural disease features and effectively addresses the issue of class imbalance. ...

Integration of Diffusion Transformer and Knowledge Graph for Efficient Cucumber Disease Detection in Agriculture

... While their system achieved 90.19% accuracy, outperforming other models, its reliance on high-end hardware, specifically the V100S GPU server, may limit its accessibility in low-resource environments. A notable hybrid approach by Zhao et al. (2024) proposed a deep learning system for Elaeagnus angustifolia disease detection in smart agriculture, integrating Large Language Models (LLMs), Agricultural Knowledge Graphs (KGs), and Graph Neural Networks (GNNs) with a graph attention mechanism. The framework achieved superior performance (precision: 0.94, recall: 0.92, accuracy: 0.93) by optimizing loss functions ...

Implementation of Large Language Models and Agricultural Knowledge Graphs for Efficient Plant Disease Detection

... Despite the individual advancements in federated learning and multimodal learning, relatively few studies have explored their combined application in cybersecurity. Existing research provides valuable insights into their potential integration, with some studies demonstrating the effectiveness of federated learning-based security systems that incorporate multimodal data (such as traffic logs and security alerts) to improve attack detection accuracy in distributed environments [6]. However, most of these works focus on isolated security problems and employ relatively simplistic multimodal fusion techniques. ...

A User-Centered Framework for Data Privacy Protection Using Large Language Models and Attention Mechanisms