Kai Ao’s research while affiliated with Aerospace Information Research Institute, Chinese Academy of Sciences and other places

Publications (9)


The first row displays the RS imagery, and the second row presents the generated cloud masks. The cloud mask is derived from an enhanced ACCA algorithm [13]. (a) is a snow-covered mountain scene, where the bright snow-covered mountain is incorrectly classified as cloud; (b) is a snow-covered scene, where the snow-covered areas are incorrectly classified as cloud; (c) is a desert saline–alkali land scene; (d) is an urban scene, where the clearly identifiable urban area in the RS imagery is still incorrectly classified as cloud.
The overall architecture of TENet. TENet first uses an encoder to extract features. The encoder incorporates the proposed TEDM to extract texture, channel, and spatial features. The decoder restores the spatial dimensions of the image through upsampling while reducing the number of feature channels. To facilitate multi-scale feature integration, skip connections are implemented to concatenate feature maps from corresponding encoder layers with their decoder counterparts at each stage of the decoding process.
The object-oriented cloud matching algorithm.
Successful cases of object-oriented cloud matching algorithm.
Features extracted by first-level wavelet transform under different cloud amounts. (a–c) represent scenarios with high, moderate, and low cloud cover, respectively. In each image, the upper-left and lower-right quadrants correspond to low-frequency information and diagonal high-frequency information, while the lower-left and upper-right quadrants represent the vertical and horizontal high-frequency components, respectively. The horizontal and vertical axes indicate pixel position in the image.
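The caption does not say which wavelet is used. As a minimal sketch, assuming the orthonormal Haar wavelet, a single-level 2D decomposition of an even-sized grayscale image and the quadrant layout described in the caption could look like this (the function names are hypothetical):

```python
def haar_dwt2(img):
    """Single-level 2D Haar DWT (orthonormal) of an image with even height and width.

    Returns (approx, horiz, vert, diag) as lists of lists, each half the input size.
    Sign and naming conventions for the detail bands differ between libraries.
    """
    rows, cols = len(img), len(img[0])
    if rows % 2 or cols % 2:
        raise ValueError("image height and width must both be even")
    approx, horiz, vert, diag = [], [], [], []
    for i in range(0, rows, 2):
        ra, rh, rv, rd = [], [], [], []
        for j in range(0, cols, 2):
            a, b = img[i][j], img[i][j + 1]          # top-left, top-right
            c, d = img[i + 1][j], img[i + 1][j + 1]  # bottom-left, bottom-right
            ra.append((a + b + c + d) / 2)  # low-frequency local average
            rh.append((a + b - c - d) / 2)  # top minus bottom: horizontal edges
            rv.append((a - b + c - d) / 2)  # left minus right: vertical edges
            rd.append((a - b - c + d) / 2)  # diagonal differences
        approx.append(ra)
        horiz.append(rh)
        vert.append(rv)
        diag.append(rd)
    return approx, horiz, vert, diag


def tile_quadrants(approx, horiz, vert, diag):
    """Arrange the bands as in the caption: low frequency upper left,
    horizontal detail upper right, vertical detail lower left, diagonal lower right."""
    top = [ra + rh for ra, rh in zip(approx, horiz)]
    bottom = [rv + rd for rv, rd in zip(vert, diag)]
    return top + bottom
```

The division by 2 keeps the transform orthonormal, so the summed squared values of the four bands equal those of the input block.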

A Texture-Enhanced Deep Learning Network for Cloud Detection of GaoFen/WFV by Integrating an Object-Oriented Dynamic Threshold Labeling Method and Texture-Feature-Enhanced Attention Module
  • Article
  • Full-text available

May 2025 · 1 Read · Xiao Tang · Xiaobo Luo · [...] · Kai Ao

Cloud detection in satellite imagery plays a pivotal role in achieving high-accuracy retrieval of biophysical parameters and in subsequent remote sensing applications. Although numerous methods have been developed and operationally deployed, their accuracy over challenging surfaces, such as snow-covered mountains, saline–alkali lands in deserts or Gobi regions, and other snow-covered surfaces, remains limited. Additionally, prevalent deep learning-based methods rely heavily on large-scale pixel-level annotations, which are time-consuming and labor-intensive to collect. To address these challenges, we propose a Texture-Enhanced Network (TENet) that integrates an object-oriented dynamic threshold pseudo-labeling method and a texture-feature-enhanced attention module, improving both the efficiency of training-sample collection and the detection accuracy over challenging surfaces. First, an object-oriented dynamic threshold pseudo-labeling approach is developed by combining object-oriented principles with adaptive thresholding techniques, enabling the efficient collection of large-scale labeled samples over challenging surfaces. Second, to exploit the spatial continuity of clouds, cross-channel correlations, and their distinctive texture features, a texture-feature-enhanced attention module is designed to improve feature discrimination between hard positive and negative samples. Extensive experiments on a Chinese GaoFen satellite imagery dataset demonstrate that the proposed method achieves state-of-the-art performance.
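The abstract does not spell out how the object-oriented dynamic threshold works. A minimal sketch of the general idea, under stated assumptions: the scene-specific threshold comes from Otsu's method (a standard adaptive threshold, not necessarily the one the authors use), bright pixels are grouped into 4-connected objects, and whole objects are accepted or rejected by a hypothetical `min_area` rule instead of pixel by pixel. `otsu_threshold` and `pseudo_label` are illustrative names, not the paper's code.

```python
from collections import deque


def otsu_threshold(values, levels=256):
    """Otsu threshold t for integer values in [0, levels); values > t are 'bright'."""
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w_bg, sum_bg = 0, 0          # count and value sum of the dark class (<= t)
    best_t, best_var = 0, -1.0
    for t in range(levels):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        w_fg = total - w_bg
        if w_bg == 0:
            continue
        if w_fg == 0:
            break
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        between_var = w_bg * w_fg * (mean_bg - mean_fg) ** 2  # between-class variance
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


def pseudo_label(img, min_area=2):
    """Hypothetical object-level pseudo-labelling of an 8-bit brightness image."""
    rows, cols = len(img), len(img[0])
    t = otsu_threshold([v for row in img for v in row])  # threshold adapts per scene
    bright = [[v > t for v in row] for row in img]
    mask = [[0] * cols for _ in range(rows)]
    seen = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if not bright[r][c] or seen[r][c]:
                continue
            # Flood-fill one 4-connected bright object.
            obj, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                obj.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < rows and 0 <= nx < cols and bright[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Object-level decision (hypothetical rule): keep sufficiently large objects.
            if len(obj) >= min_area:
                for y, x in obj:
                    mask[y][x] = 1
    return mask
```

Deciding per object rather than per pixel is what lets isolated bright pixels (for example, small bright roofs) be discarded together, which is the kind of behavior an object-oriented labeling step is meant to add.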

FERDNet: High-Resolution Remote Sensing Road Extraction Network Based on Feature Enhancement of Road Directionality

January 2025 · 19 Reads

Identifying roads in satellite imagery plays an important role in urban design, geographic referencing, vehicle navigation, geospatial data integration, and intelligent transportation systems. Deep learning methods have demonstrated significant advantages for extracting roads from remote sensing data. However, many previous deep learning-based road extraction studies overlook the connectivity and completeness of roads. To address this issue, this paper proposes FERDNet, a new high-resolution satellite road extraction network. To effectively distinguish road features from background features, we design a Multi-angle Feature Enhancement module based on the characteristics of remote sensing road data. Additionally, to improve the extraction of narrow roads, we develop a High–Low-Level Feature Enhancement module within the directional feature extraction branch. Experimental results on three public datasets validate the effectiveness of FERDNet for road extraction from satellite imagery.
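FERDNet's modules are learned, and the abstract does not describe their internals. Purely to illustrate what directional evidence can mean for roads, the hand-crafted sketch below averages pixel values along short lines at 0°, 45°, 90° and 135° and reports the angle with the strongest response. The function names, the four fixed angles, and the `half_len` window are hypothetical choices, not taken from the paper.

```python
# Hypothetical line directions as (row step, column step); rows grow downward.
DIRECTIONS = {0: (0, 1), 45: (-1, 1), 90: (1, 0), 135: (1, 1)}


def directional_means(img, y, x, half_len=2):
    """Mean value along a line of up to 2*half_len+1 pixels through (y, x), per angle."""
    rows, cols = len(img), len(img[0])
    out = {}
    for angle, (dy, dx) in DIRECTIONS.items():
        vals = []
        for s in range(-half_len, half_len + 1):
            ny, nx = y + s * dy, x + s * dx
            if 0 <= ny < rows and 0 <= nx < cols:  # clip the line at image borders
                vals.append(img[ny][nx])
        out[angle] = sum(vals) / len(vals)  # centre pixel is always included
    return out


def dominant_direction(img, y, x, half_len=2):
    """Angle (degrees) whose line average is largest at (y, x)."""
    means = directional_means(img, y, x, half_len)
    return max(means, key=means.get)
```

On an elongated bright road, the line aligned with the road gives the highest mean, while lines crossing it are diluted by background pixels.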


30 m 5-yearly land cover maps of Qilian Mountain Area (QMA_LC30) from 1990 to 2020

December 2024 · 46 Reads · 1 Citation

Scientific Data

The Qilian Mountain Area (QMA) serves as a crucial ecological barrier and strategic water conservation zone in China. In recent years, environmental issues within the QMA have attracted growing public attention, underscoring the need for accurate and continuous land cover maps to support ecological monitoring, analysis, and forecasting. This paper presents the QMA_LC30 dataset, which includes 9 land cover categories and spans 1990 to 2020 at 5-year intervals. The dataset is derived mainly from 30 m Landsat series data and offers (1) high precision, achieved through geographical division and a hierarchical classification decision tree, complemented by visual interpretation; and (2) robust consistency, ensured by a change detection method based on a benchmark map. The QMA_LC30 dataset underwent rigorous accuracy validation, achieving an overall accuracy of over 0.92 for all 7 periods of land cover maps. Compared with GlobeLand30, ESA WorldCover, ESRI 2020 Land Cover, FROM_GLC30, and GLC_FCS30, QMA_LC30 shows the highest consistency with remote sensing images.
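Two ideas in this abstract are easy to make concrete. Overall accuracy is the fraction of validation samples whose mapped class matches the reference class (equivalently, the trace of the confusion matrix divided by the number of samples). The benchmark-map update is sketched under an assumption: pixels not flagged as changed inherit the benchmark label and only flagged pixels take a newly classified label, which is one simple way such a scheme keeps maps consistent across periods. The function names and inputs are hypothetical.

```python
def update_from_benchmark(benchmark, changed, new_classification):
    """Hypothetical benchmark-based update: keep the benchmark label where no
    change was detected; take the new classification only where it was."""
    return [
        [new if flag else base for base, flag, new in zip(b_row, c_row, n_row)]
        for b_row, c_row, n_row in zip(benchmark, changed, new_classification)
    ]


def overall_accuracy(reference, predicted):
    """Share of validation samples classified correctly
    (same value as trace(confusion matrix) / number of samples)."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("need two equal-length, non-empty sample lists")
    correct = sum(r == p for r, p in zip(reference, predicted))
    return correct / len(reference)
```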


Reconstruction of 30 m Land Cover in the Qilian Mountains from 1980 to 1990 Based on Super-Resolution Generative Adversarial Networks

November 2024 · 117 Reads

Long time series of annual land cover with fine spatio-temporal resolution play a crucial role in studying environmental and climate change, biophysical modeling, carbon cycle modeling, and land management. Although several publicly available medium- to fine-resolution global land cover datasets show strong overall consistency, significant discrepancies exist at the regional scale; moreover, these products are available only at 5- or 10-year intervals. Consequently, high-quality annual land cover datasets for China before 2000 are unavailable. In this study, we propose a deep learning-based method that integrates remote sensing data from multiple platforms with historical high-spatial-resolution land cover datasets (CNLUCC) to derive 30 m annual land cover maps of the Qilian Mountains from 1980 to 1990. First, super-resolution generative adversarial network models for upscaling 5.5 km AVHRR NDVI to 250 m were established using AVHRR and MODIS NDVI data from the same years as input, and the earlier AVHRR NDVI time series were then upscaled to 250 m with these models. Second, the Breaks For Additive Season and Trend (BFAST) change detection algorithm was applied to the upscaled NDVI time series to detect when different land cover types changed. Third, the CNLUCC maps for 1980 and 1990 were updated to annual land cover maps for 1980–1990; the annual results provide insights into the dynamics of urbanization, deforestation, water bodies, and farmland over this period. Finally, a comprehensive analysis and validation were carried out, and the land cover product for 1986 achieved an overall accuracy of 77.26%.
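BFAST decomposes a time series into trend, seasonal, and remainder components and searches for structural breaks in the trend and seasonal parts. The sketch below keeps only the core least-squares idea, for a single break in the mean with no seasonality: try every split point and keep the one that minimises the total residual sum of squares of two constant-mean segments. It is a simplification, not BFAST itself; the function names, `min_seg`, and the NDVI values in the test are hypothetical.

```python
def sse(segment):
    """Residual sum of squares of a segment around its own mean."""
    m = sum(segment) / len(segment)
    return sum((v - m) ** 2 for v in segment)


def single_break(series, min_seg=2):
    """Best single split of `series` into two constant-mean segments.

    Returns (k, cost), where k is the index of the first value after the
    break and cost is the combined residual sum of squares of both segments.
    """
    n = len(series)
    if n < 2 * min_seg:
        raise ValueError("series too short for the requested min_seg")
    best = None
    for k in range(min_seg, n - min_seg + 1):
        cost = sse(series[:k]) + sse(series[k:])
        if best is None or cost < best[1]:
            best = (k, cost)
    return best
```

Applied to an NDVI series that drops sharply, for example after cropland is converted to built-up land, k marks the first time step after the change.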



Multi-Swin Mask Transformer for Instance Segmentation of Agricultural Field Extraction

January 2023 · 195 Reads · 14 Citations

With the rapid development of digital intelligent agriculture, accurately extracting field information from remote sensing imagery to guide agricultural planning has become an important issue. To better extract fields, we analyze the scale characteristics of agricultural fields and incorporate a multi-scale design into a Transformer. We then propose an improved deep learning method, the Multi-Swin Mask Transformer (MSMTransformer), based on Mask2Former, an end-to-end instance segmentation framework. To demonstrate the capability and effectiveness of our method, we use the iFLYTEK Challenge 2021 Cultivated Land Extraction competition dataset and compare the results with Mask R-CNN, HTC, Mask2Former, and other methods. The experimental results show that the network performs well, achieving a bbox_AP50 score of 0.749 and a segm_AP50 score of 0.758. Comparative experiments show that MSMTransformer achieves the best values on all COCO segmentation metrics and can effectively alleviate the overlapping problem that end-to-end instance segmentation networks exhibit in dense scenes.
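bbox_AP50 and segm_AP50 are COCO-style average precision at an IoU threshold of 0.5, computed with box IoU and mask IoU respectively. The full metric integrates precision over recall; the sketch below shows only the matching step that decides which predicted field instances count as true positives, with masks simplified to sets of pixel coordinates. The function names are hypothetical, and pycocotools remains the usual reference implementation.

```python
def mask_iou(a, b):
    """IoU of two instance masks given as sets of (row, col) pixels."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def match_at_50(predictions, ground_truths, thr=0.5):
    """Greedy COCO-style matching at IoU >= thr.

    predictions: list of (score, mask) pairs; ground_truths: list of masks.
    Returns True (TP) / False (FP) flags in descending score order.
    """
    used = set()
    flags = []
    for _, mask in sorted(predictions, key=lambda p: p[0], reverse=True):
        best_j, best_iou = None, thr
        for j, gt in enumerate(ground_truths):
            if j in used:
                continue  # each ground-truth field can be matched only once
            iou = mask_iou(mask, gt)
            if iou >= best_iou:
                best_j, best_iou = j, iou
        if best_j is None:
            flags.append(False)
        else:
            used.add(best_j)
            flags.append(True)
    return flags
```

A duplicate prediction on an already matched field becomes a false positive, which is why overlapping instances in dense scenes lower AP.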


Methods of Sandy Land Detection in a Sparse-Vegetation Scene Based on the Fusion of HJ-2A Hyperspectral and GF-3 SAR Data

March 2022 · 108 Reads · 6 Citations

Accurate identification of sandy land plays an important role in sandy land prevention and control. However, sandy land is difficult to identify because vegetation covers the soil in sandy areas. Therefore, HJ-2A hyperspectral data and GF-3 Synthetic Aperture Radar (SAR) data were used as the main data sources in this article. The spectral characteristics of the hyperspectral image and the penetration characteristics of the SAR data were combined to carry out mixed-pixel decomposition in the “horizontal” direction and polarization decomposition in the “vertical” direction. In the Otingdag Sandy Land study area in China, field data verified that sandy land detection based on feature-level fusion and on GF-3 data alone both reached an accuracy of 92%; validated against samples collected from Google high-resolution imagery, the accuracy of feature-level fusion was 88.74%, higher than that of HJ-2A alone (74.17%) and GF-3 alone (88.08%). To further verify the generality of the feature-level fusion method, the Alxa sandy land was used as an additional verification area, where the sandy land detection accuracy reached 88.74%. The proposed method makes full use of the horizontal and vertical structural information in remote sensing data and effectively addresses both the mixed-pixel problem in sparse-vegetation scenes (horizontal direction) and the problem of vegetation covering sandy soil (vertical direction). It can therefore identify sandy land accurately and provide technical support for sandy land prevention and control.
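The abstract does not state the mixing model, endmembers, or constraints used for the horizontal mixed-pixel decomposition. A minimal sketch, assuming a linear mixture of just two endmembers (for example, vegetation and bare sand) with fractions that are non-negative and sum to one: substituting the sum-to-one constraint makes the least-squares problem one-dimensional, so clipping the unconstrained solution to [0, 1] gives the constrained optimum. The function name and the spectra in the test are hypothetical.

```python
def unmix_two(pixel, em_a, em_b):
    """Fraction f of endmember em_a in `pixel` under the linear model
    pixel ~= f * em_a + (1 - f) * em_b, with 0 <= f <= 1.

    Least squares on pixel - em_b = f * (em_a - em_b); because the objective is
    a convex quadratic in f, clipping to [0, 1] yields the constrained solution.
    """
    diff = [a - b for a, b in zip(em_a, em_b)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        raise ValueError("endmember spectra must differ")
    num = sum((p - b) * d for p, b, d in zip(pixel, em_b, diff))
    return min(1.0, max(0.0, num / denom))
```

Real hyperspectral unmixing usually involves more endmembers and a constrained solver such as fully constrained least squares; the two-endmember case is shown only because it has this closed form.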


Accuracy and speed comparison on HRSC2016. FPS: frames per second.
Accuracy comparison on UCAS-AOD. "Ours*" indicates that sResNet152 is used as the backbone.
Single-Stage Rotation-Decoupled Detector for Oriented Object

October 2020 · 189 Reads · 38 Citations

Oriented object detection has received extensive attention in recent years, especially for detecting targets in aerial imagery. Traditional detectors locate objects with horizontal bounding boxes (HBBs), which can be inaccurate for objects with arbitrary orientations, dense distribution, and large aspect ratios. Oriented bounding boxes (OBBs), which add a rotation angle to the horizontal bounding box, handle these problems better. However, introducing oriented bounding boxes into rotation detectors raises new problems, such as an increase in the number of anchors and the sensitivity of the intersection over union (IoU) to changes in angle. To overcome these shortcomings while retaining the advantages of oriented bounding boxes, we propose a novel rotation detector that redesigns the matching strategy between oriented anchors and ground-truth boxes. The main idea of the new strategy is to decouple the rotated bounding box into a horizontal bounding box during matching, thereby reducing the effect of angle instability on the matching process. Extensive experiments on public remote sensing datasets, including DOTA, HRSC2016, and UCAS-AOD, demonstrate that the proposed approach achieves state-of-the-art detection accuracy with higher efficiency.
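The abstract does not give the exact box parameterization. A minimal sketch of decoupled matching, assuming boxes are (cx, cy, w, h, angle in degrees), that a 90° rotation is equivalent to swapping width and height, and that the residual angle is folded into [-45°, 45°): the angle is split off, and anchors are matched to ground truth by plain horizontal-box IoU, so small angle changes no longer move the matching IoU. The convention and the function names are assumptions for illustration.

```python
def decouple(box):
    """Split an oriented box (cx, cy, w, h, angle_deg) into a horizontal box
    (x1, y1, x2, y2) and a residual angle in [-45, 45).

    Assumed convention: rotating by 90 degrees swaps width and height.
    """
    cx, cy, w, h, angle = box
    angle = (angle + 45.0) % 180.0 - 45.0  # now in [-45, 135)
    if angle >= 45.0:                       # fold [45, 135) onto [-45, 45)
        angle -= 90.0
        w, h = h, w
    hbb = (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)
    return hbb, angle


def hbb_iou(a, b):
    """IoU of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy

    def area(r):
        return (r[2] - r[0]) * (r[3] - r[1])

    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0
```

For example, an anchor at 0° and a ground truth with the same centre and size at 30° match with IoU 1.0 after decoupling; the 30° difference would then be left to a separate angle regression target.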


An Atmospheric Correction Method over Bright and Stable Surfaces for Moderate to High Spatial-Resolution Optical Remotely Sensed Imagery

February 2020 · 274 Reads · 5 Citations

Although many attempts have been made, retrieving the aerosol optical depth (AOD) at 550 nm from moderate to high spatial-resolution (MHSR) optical remotely sensed imagery remains a challenge in arid areas with bright surfaces, such as deserts and bare ground, and atmospheric correction of remote sensing images over these areas has consequently performed poorly. In this paper, we propose a new algorithm that can effectively estimate the spatial distribution of atmospheric aerosols and retrieve surface reflectance from MHSR imagery in arid areas with bright surfaces. The land surface in arid areas is usually bright and stable, and atmospheric variation in these areas is also small. Consequently, the land-surface characteristics, specifically the bidirectional reflectance distribution function (BRDF), can be retrieved easily and accurately from time series of satellite images with lower spatial resolution, such as those from the Moderate Resolution Imaging Spectroradiometer (MODIS) at 500 m resolution, and the retrieved BRDF is then used to retrieve the AOD from MHSR images. The algorithm has three advantages: (i) it is well suited to arid areas with bright surfaces; (ii) it is very efficient because it uses the lower-resolution BRDF; and (iii) it is fully automatic. The AODs derived from the MultiSpectral Instrument (MSI) on board Sentinel-2, Landsat 5 Thematic Mapper (TM), Landsat 8 Operational Land Imager (OLI), Gao Fen 1 Wide Field Viewer (GF-1/WFV), Gao Fen 6 Wide Field Viewer (GF-6/WFV), and Huan Jing 1 CCD (HJ-1/CCD) data are validated using ground measurements from 4 stations of the AErosol RObotic NETwork (AERONET) around the world.
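The abstract does not name the BRDF model. MODIS BRDF/albedo products describe surface reflectance with the semi-empirical RossThick-LiSparse kernel-driven model; assuming a model of that form (an assumption, not stated above), the surface reflectance predicted for the MHSR acquisition geometry, and a generic AOD retrieval criterion, could be written as:

```latex
\begin{aligned}
\rho_s(\theta_s, \theta_v, \phi; \lambda) &= f_{\mathrm{iso}}(\lambda) + f_{\mathrm{vol}}(\lambda)\, K_{\mathrm{vol}}(\theta_s, \theta_v, \phi) + f_{\mathrm{geo}}(\lambda)\, K_{\mathrm{geo}}(\theta_s, \theta_v, \phi), \\
\hat{\tau}_{550} &= \arg\min_{\tau_{550}} \sum_{\lambda} \Big[ \rho_{\mathrm{TOA}}^{\mathrm{obs}}(\lambda) - \rho_{\mathrm{TOA}}^{\mathrm{sim}}\big(\rho_s(\theta_s, \theta_v, \phi; \lambda),\, \tau_{550}\big) \Big]^2 .
\end{aligned}
```

Here \theta_s and \theta_v are the solar and view zenith angles, \phi the relative azimuth, K_vol and K_geo the volumetric and geometric-optical kernels, and the weights f_iso, f_vol, f_geo would be fitted from the 500 m MODIS time series. \rho_TOA^sim stands for a hypothetical radiative-transfer lookup table that maps surface reflectance and AOD to top-of-atmosphere reflectance; the paper's actual retrieval scheme may differ.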

Citations (5)


... The climate classification dataset is obtained from a global climate zoning dataset, the Global Aridity Index and Potential Evapotranspiration Database-Version 3 (Global-AI_PET_v3) [75], which categorizes regions into different climate types on the basis of long-term meteorological variables. Landcover types are obtained from the 30 m 5-yearly landcover maps of the Qilian Mountain area, which include nine categories with an overall accuracy exceeding 90% [76]. The dataset of geomorphological types is obtained from the Resource and Environmental Science Data Platform (https://www.resdc.cn/, ...

Reference:

Exploration of Spatiotemporal Covariation in Vegetation–Groundwater Relationships: A Case Study in an Endorheic Inland River Basin
30 m 5-yearly land cover maps of Qilian Mountain Area (QMA_LC30) from 1990 to 2020

Scientific Data

... A systematic study by Xu et al. [15] demonstrated that the Swin Transformer [16] architecture enhances overall accuracy (OA) by 6.13% and 4.67% compared with U-Net and DeepLab V3, respectively, in the national-scale crop identification task, thereby substantiating the considerable potential of Transformer in the domain of agricultural remote sensing. The MSMTransformer [17] employs a parallel connection of multiple windows of varying sizes and performs multi-head self-attention operations to more effectively extract farmland information at different scales, achieving the highest performance in COCO segmentation metrics. The latest research primarily focusses on innovating the CNN-Transformer hybrid architecture to balance the advantages of local feature extraction and global context-building modelling. ...

Multi-Swin Mask Transformer for Instance Segmentation of Agricultural Field Extraction

... The HJ-2A and HJ-2B satellites are arranged in a 180° phase configuration to achieve higher temporal resolution imagery. Their MSC and IRS can cover the entire Chinese region within two days, providing efficient data support for applications such as natural disaster monitoring, land use macro-monitoring, water resource management and protection, crop monitoring and yield estimation, and earthquake emergency response and disaster relief [19,20]. ...

Methods of Sandy Land Detection in a Sparse-Vegetation Scene Based on the Fusion of HJ-2A Hyperspectral and GF-3 SAR Data

... This annotation is relatively simple and has a good effect on the object detection of photos. Because of the arbitrary orientations, large aspect ratio and dense distribution of objects, coupled with the complex background in remote sensing images, the classical object detection methods based on the HBB may cause serious overlaps between objects and noise [14,15]. ...

Single-Stage Rotation-Decoupled Detector for Oriented Object

... The purpose of this study was to fully leverage the high-frequency and wide-observational capabilities of the FY-3D satellite, applying the MESMA-AGE algorithm to estimate FSC in China. We employ the Level-2A surface reflectance data products provided by [27] as the research data. Then, the results were evaluated via a comparison with Landsat-8/OLI snow cover maps. ...

An Atmospheric Correction Method over Bright and Stable Surfaces for Moderate to High Spatial-Resolution Optical Remotely Sensed Imagery