Licheng Jiao’s research while affiliated with Xidian University and other places


Publications (1,019)


[Figure previews: spatial location and scope of the Weihe River Basin study area in the People’s Republic of China — (a) administrative divisions of China, (b) true-color Landsat 8 OLI image of the study area, (c) a randomly selected area before pansharpening, (d) the same area after pansharpening; EDWNet model structure; CFF module structure; DAM module structure; GAM module structure.]


EDWNet: A Novel Encoder–Decoder Architecture Network for Water Body Extraction from Optical Images
  • Article
  • Full-text available

November 2024 · 10 Reads · Wenbo Ji · Weibin Li · [...] · Licheng Jiao

Automated water body (WB) extraction is a hot research topic in remote sensing image processing. To address the challenges of over-extraction and incomplete extraction in complex water scenes, we propose an encoder–decoder semantic segmentation network, EDWNet, for high-precision extraction of WBs. We integrate a Cross-layer Feature Fusion (CFF) module to resolve difficulties in segmenting WB edges, a Global Attention Mechanism (GAM) module to reduce information diffusion, and a Deep Attention Module (DAM) to enhance the model’s global perception ability and refine WB features. An auxiliary head is also incorporated to optimize the model’s learning process. In addition, we analyze the feature importance of bands 2 to 7 in Landsat 8 OLI images and construct a band combination (RGB 763) well suited to the algorithm’s WB extraction. Compared with various other semantic segmentation networks, EDWNet achieves the highest accuracy on the test dataset. We apply EDWNet to accurately extract WBs in the Weihe River basin from 2013 to 2021 and quantitatively analyze the area changes of the WBs during this period and their causes. The results show that EDWNet is suitable for WB extraction in complex scenes and demonstrates great potential for long time-series, large-scale WB extraction.
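The reported RGB 763 combination maps OLI bands 7, 6, and 3 to the red, green, and blue channels of a false-color composite. A minimal sketch of building such a composite is shown below; the `band_composite` helper and the toy scene are illustrative assumptions, not code from the paper:

```python
import numpy as np

def band_composite(bands: dict, order=(7, 6, 3)) -> np.ndarray:
    """Stack the requested Landsat 8 OLI bands (here 7-6-3, the
    combination the paper reports for water extraction) into an
    H x W x 3 array and min-max stretch it to [0, 1]."""
    stacked = np.stack([bands[b].astype(np.float32) for b in order], axis=-1)
    lo, hi = stacked.min(), stacked.max()
    return (stacked - lo) / (hi - lo + 1e-8)

# toy 2x2 "scene" with one constant array per OLI band 2..7
scene = {b: np.full((2, 2), b, dtype=np.uint16) for b in range(2, 8)}
rgb763 = band_composite(scene)
```

In practice the per-band arrays would come from a reader such as `rasterio`; only the stacking and stretching logic is the point here.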


Improving the Multi-label Atomic Activity Recognition by Robust Visual Feature and Advanced Attention @ ROAD++ Atomic Activity Recognition 2024

October 2024 · 4 Reads

ROAD++ Track 3 poses a multi-label atomic activity recognition task in traffic scenarios, which can be standardized as a 64-class multi-label video action recognition task. In this task, the robustness of visual feature extraction remains a key challenge that directly affects model performance and generalization ability. To cope with these issues, our team optimized three aspects: data processing, modeling, and post-processing. First, appropriate resolution and video sampling strategies are selected, with a fixed sampling strategy on the validation and test sets. Second, for model training, the team selects a variety of visual backbone networks for feature extraction and then introduces the action-slot model, which is trained on the training and validation sets and used for inference on the test set. Finally, for post-processing, the team performs a weighted fusion that balances the strengths and weaknesses of the different models; the final mAP on the test set was 58%, 4% higher than the challenge baseline.
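The weighted-fusion post-processing step can be sketched as a normalized weighted average of per-model class-probability matrices; the function name, weights, and toy numbers below are illustrative assumptions, not the team's actual configuration:

```python
import numpy as np

def weighted_fusion(prob_maps, weights):
    """Fuse per-model probability matrices (n_clips x n_classes) by a
    weighted average; weights are normalized so they sum to 1."""
    w = np.asarray(weights, dtype=np.float64)
    w = w / w.sum()
    return sum(wi * p for wi, p in zip(w, prob_maps))

# two toy models over 2 clips x 2 classes, with the first model trusted 2:1
model_a = np.array([[0.9, 0.2], [0.1, 0.8]])
model_b = np.array([[0.7, 0.4], [0.3, 0.6]])
fused = weighted_fusion([model_a, model_b], weights=[2, 1])
```

For a 64-class task the matrices would simply be `n_clips x 64`; thresholding or top-k selection on `fused` then yields the multi-label predictions.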


Knowledge Guided Evolutionary Transformer for Remote Sensing Scene Classification

October 2024 · 4 Reads · 3 Citations

IEEE Transactions on Circuits and Systems for Video Technology

Solving the complex challenges of sophisticated terrain and multi-scale targets in remote sensing (RS) images requires a synergistic combination of the Transformer and the convolutional neural network (CNN). However, crafting effective CNN architectures remains a major challenge. To address these difficulties, this study introduces the knowledge-guided evolutionary Transformer for RS scene classification (Evo RSFormer). It amalgamates an adaptive evolutionary CNN (Evo CNN) with Transformers in a hybrid strategy, synergistically combining the fine-grained local feature extraction of CNNs with the long-range contextual dependency modeling of Transformers. Furthermore, for the development of Evo CNN blocks, this paper presents a knowledge-guided adaptive efficient multi-objective evolutionary neural architecture search (MOE²-NAS) strategy. This approach markedly diminishes the labor-intensive nature of traditional CNN design, striking a balance between accuracy and compactness. Additionally, by bringing domain knowledge from natural scene analysis into the RS field, MOE²-NAS improves the efficiency of classical NAS: it utilizes a priori knowledge to generate promising initial solutions and constructs a surrogate model for efficient search. The effectiveness of the proposed Evo RSFormer has been rigorously tested on various benchmark RS datasets, including UC Merced, NWPU45, and AID. Empirical results strongly support the superiority of Evo RSFormer over existing methods. Furthermore, experiments on MOE²-NAS confirm the important role of knowledge guidance in improving the efficiency of NAS.



MCLHN: Toward Automatic Modulation Classification via Masked Contrastive Learning With Hard Negatives

October 2024 · 12 Reads · 2 Citations

IEEE Transactions on Wireless Communications

Recently, contrastive learning (CL) has exhibited considerable advantages for automatic modulation classification (AMC) when labeled samples are scarce. Nevertheless, the majority of available CL-based AMC methods use simple signal augmentation strategies and suffer from interference from false negatives. To explore the more generalizable global temporal semantics within signals, a novel masked contrastive learning with hard negatives (MCLHN) method is proposed in this paper. MCLHN first strategically incorporates semantic-preserving data augmentation, ensuring the diversity and semantic invariance of signals. Second, MCLHN adopts an encoder with temporal masking to enable robust temporal modeling. Moreover, a debiased hardness-weighted contrastive (DHWC) loss is designed to balance the adverse impact of the debiased strategy against the advantage of hard negatives. Extensive experiments on several benchmark datasets demonstrate the superior performance and generalization capability of MCLHN over other methods. Significantly, the performance of MCLHN with only one labeled sample per modulation under each signal-to-noise ratio (SNR) rivals that of other methods trained with five to twenty times as many labeled samples.
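The hard-negative weighting idea can be sketched as a hardness-weighted InfoNCE in NumPy: each negative's contribution to the denominator is up-weighted by its similarity to the anchor, so harder negatives dominate. This is a generic illustration under assumed hyperparameters, not the paper's exact DHWC loss, whose debiasing term and precise form differ:

```python
import numpy as np

def hardness_weighted_nce(anchor, positive, negatives, tau=0.1, beta=1.0):
    """Schematic hardness-weighted InfoNCE: negatives more similar to the
    anchor receive larger weights exp(beta * sim), mimicking the emphasis
    on hard negatives that the DHWC loss is built around."""
    def sim(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    pos = np.exp(sim(anchor, positive) / tau)
    neg_sims = np.array([sim(anchor, n) for n in negatives])
    w = np.exp(beta * neg_sims)
    w = w / w.mean()                      # keep weights on a unit scale
    neg = np.sum(w * np.exp(neg_sims / tau))
    return -np.log(pos / (pos + neg))

anchor = np.array([1.0, 0.0])
negatives = [np.array([0.0, 1.0]), np.array([-1.0, 0.2])]
loss_close = hardness_weighted_nce(anchor, np.array([0.9, 0.1]), negatives)
loss_far = hardness_weighted_nce(anchor, np.array([0.0, 1.0]), negatives)
```

A well-aligned positive (`loss_close`) should give a much smaller loss than a misaligned one (`loss_far`), which is the sanity check one would run before plugging such a term into training.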


Multi-Grained Gradual Inference Model for Multimedia Event Extraction

October 2024 · 10 Reads · 3 Citations

IEEE Transactions on Circuits and Systems for Video Technology

With the development of multimedia technology, events are usually presented in multimedia forms, making multimedia event extraction (MEE) increasingly important. Existing MEE works usually use simple strategies to align the two modalities, making it difficult to precisely extract events and arguments from complex multimedia documents. To address this problem, we propose a novel Multi-grained Gradual Inference Model (MGIM) that infers and interprets events in complex multimedia structures in a coarse-to-fine manner. To efficiently integrate the textual and visual modalities, we design a Coarse-grained Alignment (CA) module, which represents the two modalities in a graph structure and performs coarse-grained alignment. Building on the CA module, we further propose a Fine-grained Inference (FI) module that aligns text and image at a fine granularity through multiple rounds of gradual inference. MGIM provides a comprehensive interpretation of multimedia events at two information granularities (coarse and fine). Extensive experiments on the M2E2 dataset demonstrate the effectiveness of MGIM.




A Novel Adaptive Fine-Tuning Algorithm for Multimodal Models: Self-Optimizing Classification and Selection of High-Quality Datasets in Remote Sensing

September 2024 · 7 Reads

We propose an adaptive fine-tuning algorithm for multimodal large models. The core of the algorithm involves two stages of truncation. First, the vast amount of data is projected into a semantic vector space, and the MiniBatchKMeans algorithm is used for automated clustering, ensuring that the data within each cluster exhibit high semantic similarity. Next, we process the data in each cluster, calculating the translational difference between the original and perturbed data in the multimodal large model's vector space. This difference serves as a generalization metric for the data, and we select the data with high generalization potential for training. We applied this algorithm to train the InternLM-XComposer2-VL-7B model on two 3090 GPUs using one-third of the GeoChat multimodal remote sensing dataset. The results demonstrate that our algorithm outperforms state-of-the-art baselines. The model trained on our optimally chosen one-third dataset exhibited only a 1% reduction in performance across various remote sensing metrics compared with the model trained on the full dataset. This approach significantly preserved general-purpose capabilities while reducing training time by 68.2%. Furthermore, the model achieved scores of 89.86 and 77.19 on the UCMerced and AID evaluation datasets, respectively, surpassing GeoChat by 5.43 and 5.16 points, and showed only a 0.91-point average decrease on the LRBEN evaluation dataset.
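The two-stage selection can be sketched as follows, with a plain NumPy k-means loop standing in for MiniBatchKMeans and random vectors standing in for the model's semantic embeddings; every array, constant, and threshold here is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical stand-ins for the per-sample semantic embeddings
emb = rng.normal(size=(120, 8))

# stage 1: cluster the semantic space (plain k-means, standing in for
# the MiniBatchKMeans step the paper describes)
k = 4
centers = emb[rng.choice(len(emb), size=k, replace=False)]
for _ in range(20):
    labels = np.argmin(((emb[:, None] - centers[None]) ** 2).sum(-1), axis=1)
    centers = np.stack([emb[labels == c].mean(0) if np.any(labels == c)
                        else centers[c] for c in range(k)])

# stage 2: the shift between original and perturbed embeddings acts as
# the generalization metric; keep the highest-shift third of each cluster
perturbed = emb + rng.normal(scale=0.05, size=emb.shape)
shift = np.linalg.norm(perturbed - emb, axis=1)
selected = []
for c in range(k):
    idx = np.where(labels == c)[0]
    selected.extend(idx[np.argsort(shift[idx])[-max(1, len(idx) // 3):]])
```

Selecting per cluster (rather than globally) preserves semantic coverage while still favoring samples whose embeddings move most under perturbation.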


Automatic Graph Topology-Aware Transformer

September 2024 · 6 Reads · 2 Citations

IEEE Transactions on Neural Networks and Learning Systems

Existing efforts are dedicated to designing many topologies and graph-aware strategies for the graph Transformer, which greatly improve the model’s representation capabilities. However, manually determining the suitable Transformer architecture for a specific graph dataset or task requires extensive expert knowledge and laborious trials. This article proposes an evolutionary graph Transformer architecture search (EGTAS) framework to automate the construction of strong graph Transformers. We build a comprehensive graph Transformer search space with the micro-level and macro-level designs. EGTAS evolves graph Transformer topologies at the macro level and graph-aware strategies at the micro level. Furthermore, a surrogate model based on generic architectural coding is proposed to directly predict the performance of graph Transformers, substantially reducing the evaluation cost of evolutionary search. We demonstrate the efficacy of EGTAS across a range of graph-level and node-level tasks, encompassing both small-scale and large-scale graph datasets. Experimental results and ablation studies show that EGTAS can construct high-performance architectures that rival state-of-the-art manual and automated baselines.
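The surrogate's predict-instead-of-train role can be sketched with a ridge regression over synthetic architecture encodings; EGTAS's generic architectural coding and surrogate model are far more elaborate, and all data and dimensions below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# hypothetical binary architecture encodings and measured accuracies
# for a handful of already-evaluated graph Transformers
codes = rng.integers(0, 2, size=(30, 10)).astype(float)
accs = codes @ rng.normal(size=10) + rng.normal(scale=0.1, size=30)

# fit a ridge-regression surrogate once on the evaluated architectures
lam = 1e-2
w = np.linalg.solve(codes.T @ codes + lam * np.eye(10), codes.T @ accs)

# rank a large pool of unseen candidates without training any of them
candidates = rng.integers(0, 2, size=(100, 10)).astype(float)
predicted = candidates @ w
best = candidates[np.argmax(predicted)]
```

The evolutionary search then only pays the full training cost for the few candidates the surrogate ranks highest, which is where the quoted evaluation-cost reduction comes from.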


Citations (45)


... Liu et al. [39] recently introduced a surrogate-assisted GNAS algorithm, known as CTFGNAS, to explore layer components, topology connections, and fusion strategies, while leveraging a surrogate model to reduce the computational cost. Wang et al. [40] presented an automated configuration method for graph Transformer topologies and graph-aware strategies using a surrogate-assisted EA. ...

Reference:

Knowledge-aware Evolutionary Graph Neural Architecture Search
Automatic Graph Topology-Aware Transformer
  • Citing Article
  • September 2024

IEEE Transactions on Neural Networks and Learning Systems

... This is evinced by several works published during this year. For example, the study in [34] examines various natural mechanisms and certain "commonalities of nature-inspired intelligent computing paradigms. These commonalities provide a solid algorithmic foundation to avoid designing unreasonable metaphors". ...

Nature-Inspired Intelligent Computing: A Comprehensive Survey

... Verma et al. conducted a time series analysis using Landsat 8 and Landsat 9 imagery to map land use and land cover for the Rampura Agucha Mines, demonstrating the value of multi-temporal data in monitoring environmental changes [4]. Geng et al.'s spatial-spectral relation-guided fusion network further exemplifies the benefits of combining different data sources for improved image classification [5]. Xia et al. explored the blending of Sentinel-1 and MODIS data to synthesize Landsat-8 images [6], providing a novel approach to enhancing image quality and availability in remote sensing [7]. Sasaki et al. enhanced the detection of coastal marine debris in very high-resolution satellite imagery through unsupervised domain adaptation, showcasing the application of advanced machine learning techniques in environmental monitoring [8]. ...

A Spatial-Spectral Relation-Guided Fusion Network for Multisource Optical RS Image Classification
  • Citing Article
  • July 2024

IEEE Transactions on Neural Networks and Learning Systems

... With the development of artificial intelligence (AI) and advanced technologies, a series of AI-based channel models for 6G V2V communications have been proposed [7]. The powerful ability of AI to process vast data and identify complex patterns can significantly enhance the trade-off between modeling accuracy and complexity. ...

Advanced Deep Learning Models for 6G: Overview, Opportunities and Challenges

IEEE Access

... HSIs have been widely used in the fields of remote sensing [1], agriculture [2], environmental monitoring [3], medical diagnosis [4] and so on [5]- [8]. Hyperspectral Image Classification (HSIC), aimed at assigning unique labels to each pixel, has emerged as a popular research topic in remote sensing as well [9]- [12]. ...

Ternary Modality Contrastive Learning for Hyperspectral and LiDAR Data Classification
  • Citing Article
  • January 2024

IEEE Transactions on Geoscience and Remote Sensing

... Due to the selective fading of the shortwave channel itself, sudden broadband interference, signal leakage and other factors, the shortwave noise base appears as a colored noise base with continuous changes. Therefore, it is necessary to adopt broadband preprocessing technology before signal detection to meet the detection needs under the shortwave non-stationary noise background [9][10][11][12]. ...

MCLHN: Toward Automatic Modulation Classification via Masked Contrastive Learning With Hard Negatives
  • Citing Article
  • October 2024

IEEE Transactions on Wireless Communications

... SEIFNet [48] leverages coordinate attention to enhance spatiotemporal differences in the differential branch. Wu et al. [49] proposed a multi-task learning framework that innovatively utilizes change detection to assist with discriminative frequency band reweighting. The reweighted bands then use temporal difference information to enhance BCD. ...

A Multitask Framework for Hyperspectral Change Detection and Band Reweighting With Unbalanced Contrastive Learning
  • Citing Article
  • January 2024

IEEE Transactions on Geoscience and Remote Sensing

... Tao et al. [33] incorporate the deformable self-attention mechanism into the Transformer to automatically adjust the receptive field and design an encoder-decoder architecture accordingly to achieve efficient context modeling. Zhang et al. [34] present the knowledge-guided evolutionary Transformer for RSSC, termed Evo RSFormer. This model innovatively merges an adaptive evolutionary CNN (Evo CNN) with Transformers, employing a hybrid strategy that synergistically leverages the fine-grained local feature extraction capabilities of CNNs with the long-range dependency modeling strengths of Transformers. ...

Knowledge Guided Evolutionary Transformer for Remote Sensing Scene Classification
  • Citing Article
  • October 2024

IEEE Transactions on Circuits and Systems for Video Technology

... The information from different modalities needs to be merged or fused for a multimodal fusion network to utilize the information from all the different modalities efficiently. Applications within aerial imagery, such as video surveillance, meteorological analysis, vehicle navigation, land segmentation, and activity detection, heavily rely on a diverse array of data sources [7][8][9][10]. These sources encompass various modalities such as electrooptical imaging, synthetic aperture radar (SAR), hyperspectral imaging, and more, each offering unique perspectives and advantages depending on environmental conditions and observational requirements. ...

Large-Scale Fine-Grained Building Classification and Height Estimation for Semantic Urban Reconstruction: Outcome of the 2023 IEEE GRSS Data Fusion Contest

IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing