Feixiang Chen’s research while affiliated with Beijing Forestry University and other places

What is this page?


This page lists works of an author who doesn't have a ResearchGate profile or hasn't added the works to their profile yet. It is automatically generated from public (personal) data to further our legitimate goal of comprehensive and accurate scientific recordkeeping. If you are this author and want this page removed, please let us know.

Publications (25)


Multiscale feature fusion and enhancement in a transformer for the fine-grained visual classification of tree species
  • Article

May 2025 · 6 Reads · Ecological Informatics

Yanqi Dong · Zhibin Ma · [...] · Feixiang Chen

Image Classification of Tree Species in Relatives Based on Dual-Branch Vision Transformer
  • Article
  • Full-text available

December 2024 · 14 Reads · Forests

Tree species in relatives are species belonging to the same genus that show high morphological similarity and only small botanical differences, which makes automatic classification difficult and usually requires manual identification by experts. To reduce labor costs and achieve accurate species identification, we studied the image classification of tree species in relatives based on deep learning and propose a dual-branch feature fusion Vision Transformer model. The model combines a dual-branch architecture with two effective components, a Residual Cross-Attention Transformer Block and a Multi-level Feature Fusion method, to strengthen the influence of shallow network features on the final classification and to let the model capture both overall image information and fine details. Ablation studies and comparative experiments validate the effectiveness of the model, which achieves an accuracy of 90% on the tree relatives dataset.
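
The abstract gives no implementation details, but the core mechanism it names, residual cross-attention between two branches, admits a minimal sketch. The PyTorch snippet below is a hypothetical illustration (the class name ResidualCrossAttention and all dimensions are our placeholders, not the authors' code) of one branch's tokens attending to the other's while a residual connection preserves the query branch's features:

```python
import torch
import torch.nn as nn

class ResidualCrossAttention(nn.Module):
    """Hypothetical residual cross-attention block: tokens from one branch
    (queries) attend to tokens from the other branch (keys/values), with a
    residual connection preserving the query branch's information."""
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.norm_q = nn.LayerNorm(dim)
        self.norm_kv = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, q_tokens, kv_tokens):
        q = self.norm_q(q_tokens)
        kv = self.norm_kv(kv_tokens)
        fused, _ = self.attn(q, kv, kv)
        return q_tokens + fused  # residual: keep the query branch's features

# Toy usage: fuse global-branch and detail-branch token sequences.
global_tokens = torch.randn(2, 197, 384)   # e.g. a ViT-S token sequence
detail_tokens = torch.randn(2, 197, 384)
block = ResidualCrossAttention(dim=384)
print(block(global_tokens, detail_tokens).shape)  # torch.Size([2, 197, 384])
```

In a dual-branch setup, two such blocks could be applied symmetrically so each branch is enhanced by the other before the fused features reach the classification head.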


MAFIKD: A Real-Time Pest Detection Method Based on Knowledge Distillation

October 2024 · 14 Reads · IEEE Sensors Journal

The significant damage caused by pests to crops has always been a pressing issue in agricultural production. To address the problems of low recognition accuracy, weak feature extraction capability, and poor robustness of lightweight pest detection models, this study proposes a knowledge distillation algorithm based on multi-attention feature fusion and adaptive fine-grained feature imitation (MAFIKD). MAFIKD consists of two parts: multi-attention feature fusion (MA) and fine-grained feature imitation (FI). MAFIKD combines MA and FI to enhance the attention of the student to the key features of the teacher, establishing diversified knowledge such as feature correlation and sample correlation to alleviate the difficulty of knowledge transfer in pest detection models. We used a self-made pest dataset to evaluate the proposed algorithm. Experimental results show that after applying MAFIKD, YOLOv5-CSPDarknet achieved 85.7% mAP@0.5 and 76.12% mmAP, which are 3.13% and 4.56% higher than the baseline, respectively. To verify the actual inference speed of the model, this study developed a mobile application for pest detection based on Android, using the NCNN high-performance neural network forward computing framework to deploy the pest detection model offline to mobile terminals, and deployed the model on the server using the Nginx+uWSGI+Flask architecture to provide online and offline pest detection services. Experimental results show that after applying MAFIKD, YOLOv5-CSPDarknet achieved an average detection frame rate of 10.1 FPS on the HUAWEI Enjoy 20, and the model size was only 14.5 MB, meeting the real-time detection requirements for field pests.
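
The paper's exact distillation losses are not given here, but the fine-grained feature imitation (FI) idea, regressing masked student features toward teacher features, can be sketched generically. The following hedged PyTorch illustration is an assumption about the general technique, not MAFIKD's actual formulation (the function name and mask construction are ours):

```python
import torch

def feature_imitation_loss(student_feat, teacher_feat, mask=None):
    """Generic fine-grained feature-imitation term: the student's feature
    map is regressed toward the teacher's, optionally weighted by a mask
    highlighting object (pest) regions. Shapes: (N, C, H, W)."""
    if student_feat.shape != teacher_feat.shape:
        raise ValueError("align student/teacher features (e.g. 1x1 conv adapter) first")
    diff = (student_feat - teacher_feat) ** 2
    if mask is not None:                       # (N, 1, H, W), e.g. derived from GT boxes
        diff = diff * mask
        return diff.sum() / mask.sum().clamp(min=1.0) / student_feat.shape[1]
    return diff.mean()

# Toy usage with a box-derived foreground mask.
s = torch.randn(2, 256, 40, 40)
t = torch.randn(2, 256, 40, 40)
m = torch.zeros(2, 1, 40, 40)
m[:, :, 10:30, 10:30] = 1.0                   # pretend these cells overlap GT boxes
print(feature_imitation_loss(s, t, m))
```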


Figures: overall SG-CLIP framework for few-shot species recognition (separate text, image, and geographic-information paths; learnable GFEM and IGFFM) · structure of GFEM for geographic feature extraction (dashed box: FCResLayer) · structure of IGFFM for image/geographic feature fusion (FC, ReLU, LayerNorm; N recursive dynamic fusion blocks, DFB) · heatmaps of geolocation distribution for (a) mammals, (b) reptiles, (c) amphibians · performance comparison of methods across training-sample counts on the three datasets.

CLIP-Driven Few-Shot Species-Recognition Method for Integrating Geographic Information

June 2024 · 26 Reads · 1 Citation

Automatic species recognition is important for the conservation and management of biodiversity. However, since closely related species are visually similar, it is difficult to distinguish them from images alone. In addition, traditional species-recognition models are limited by dataset size and generalize poorly. Visual-language models such as Contrastive Language-Image Pretraining (CLIP), trained on large-scale datasets, have excellent visual representation learning ability and have demonstrated promising few-shot transfer on a variety of recognition tasks. However, limited by its training data, CLIP performs poorly when used directly for few-shot species recognition. To improve this, we propose a few-shot species-recognition method that incorporates geolocation information. First, we use CLIP's powerful feature extraction capability to obtain image and text features. Second, a geographic feature extraction module converts structured geolocation information into geographic feature representations that provide additional context. Then, a multimodal feature fusion module deeply couples the geographic features with the image features, yielding enhanced image features through a residual connection. Finally, the similarity between the enhanced image features and the text features is computed to obtain the species-recognition results. Extensive experiments on the iNaturalist 2021 dataset show that our method significantly improves CLIP's few-shot species recognition. With ViT-L/14 and 16-shot training samples per species, compared to linear-probe CLIP, our method improves performance by 6.22% (mammals), 13.77% (reptiles), and 16.82% (amphibians). Our work provides strong evidence for integrating geolocation information into species-recognition models built on visual-language models.
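
A minimal sketch of the pipeline described above (encode geolocation, fuse with the image embedding through a residual connection, then score by similarity) might look as follows. All module names and sizes are placeholders standing in for the paper's GFEM/IGFFM, and random tensors stand in for real CLIP embeddings:

```python
import torch
import torch.nn as nn

class GeoFusion(nn.Module):
    """Hypothetical geographic-feature fusion: an MLP encodes (lat, lon)
    into the embedding space, interacts with the image embedding, and is
    added back through a residual connection."""
    def __init__(self, dim: int = 512):
        super().__init__()
        self.geo_encoder = nn.Sequential(
            nn.Linear(2, 256), nn.ReLU(), nn.Linear(256, dim))
        self.fuse = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(), nn.LayerNorm(dim))

    def forward(self, img_emb, latlon):
        geo = self.geo_encoder(latlon)
        fused = self.fuse(torch.cat([img_emb, geo], dim=-1))
        return img_emb + fused  # residual keeps the original CLIP semantics

def classify(img_emb, text_embs, latlon, fusion):
    img = fusion(img_emb, latlon)
    img = img / img.norm(dim=-1, keepdim=True)
    txt = text_embs / text_embs.norm(dim=-1, keepdim=True)
    return (img @ txt.T).softmax(dim=-1)  # cosine similarity -> class probabilities

# Toy usage with random stand-ins for CLIP image/text embeddings.
probs = classify(torch.randn(4, 512), torch.randn(10, 512),
                 torch.randn(4, 2), GeoFusion())
print(probs.shape)  # torch.Size([4, 10])
```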


Utilizing Geographical Distribution Statistical Data to Improve Zero-Shot Species Recognition

June 2024 · 11 Reads · 1 Citation · Animals

Simple Summary: Species recognition is a key part of understanding biodiversity and can help us better conserve and manage it. Traditional species recognition methods require large amounts of image data to train the recognition model, but obtaining image data of rare and endangered species is challenging. Contrastive Language–Image Pre-training (CLIP), a general-purpose artificial intelligence model, can instead classify by computing the similarity between images and text, without training data. Taking advantage of this, and considering the unique geographic distribution patterns of species, we propose a CLIP-based species recognition method that draws on geographic distribution knowledge. This study is the first to combine geographic distribution knowledge with species recognition, which can enable more effective recognition of rare and endangered species.

Abstract: Species recognition is a crucial part of understanding the abundance and distribution of organisms and is important for biodiversity conservation and management. Traditional vision-based, deep-learning-driven species recognition requires large amounts of well-labeled, high-quality image data, which is challenging to collect for rare and endangered species. In addition, recognition methods designed for specific species generalize poorly and are difficult to adapt to new recognition scenarios. To address these issues, zero-shot species recognition based on Contrastive Language–Image Pre-training (CLIP) has become a research hotspot. However, previous studies have primarily used visual descriptions and taxonomic information of species to improve zero-shot performance; the geographic distribution characteristics of species have not been explored for this purpose. To fill this gap, we propose a CLIP-driven zero-shot species recognition method that incorporates knowledge of species' geographic distributions. First, we designed three prompts based on species geographic distribution statistics. Then, the latitude and longitude coordinates attached to each image in the species dataset were converted into addresses and integrated to form the geographic distribution knowledge of each species. Finally, recognition results were derived by computing the similarity between features from the trained CLIP image encoder and text encoder. In extensive experiments on multiple species datasets from the iNaturalist 2021 dataset, the zero-shot recognition accuracies for mammals, mollusks, reptiles, amphibians, birds, and insects were 44.96%, 15.27%, 17.51%, 9.47%, 28.35%, and 7.03%, improvements of 2.07%, 0.48%, 0.35%, 1.12%, 1.64%, and 0.61%, respectively, over CLIP with the default prompt. The results show that fusing geographic distribution statistics can effectively improve zero-shot species recognition, providing a new way to exploit species domain knowledge.
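
As a purely illustrative example of the prompt-construction step, the snippet below builds three geography-aware prompt variants for a species. The paper's actual prompt wording is not given in this abstract, so these templates (and the example region strings) are assumptions:

```python
# Hypothetical geography-aware prompt templates in the spirit described above.
def build_prompts(species: str, regions: list[str], photo_region: str) -> list[str]:
    return [
        f"a photo of a {species}.",  # default CLIP-style prompt, for comparison
        f"a photo of a {species}, a species found in {', '.join(regions)}.",
        f"a photo of a {species}, taken in {photo_region}.",
    ]

prompts = build_prompts("Amur tiger",
                        ["Northeast China", "the Russian Far East"],
                        "Heilongjiang, China")
for p in prompts:
    print(p)
```

Each prompt would then be embedded with the frozen CLIP text encoder and scored against the image embedding, exactly as in standard zero-shot CLIP classification.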


Identification of Rare Wildlife in the Field Environment Based on the Improved YOLOv5 Model

April 2024 · 69 Reads · 7 Citations

Wildlife monitoring is a crucial tool for the conservation of rare wildlife in China. However, rare-wildlife monitoring images captured in field scenes are easily affected by complex scene information and are often poorly illuminated, obscured, or blurred, which limits their use and often results in unstable recognition and low accuracy. To address this, this paper proposes a novel identification model for rare animals in Giant Panda National Park (GPNP). We redesigned the C3 module of YOLOv5 using NAMAttention and the MemoryEfficientMish activation function to decrease the weight of field-scene features, and integrated the WIoU boundary loss function to mitigate the influence of low-quality images during training, yielding the NMW-YOLOv5 model. Our model achieved 97.3% mAP50 and 83.3% mAP50:95 on the LoTE-Animal dataset. In comparison experiments against several classical YOLO models, it surpasses the best-performing of them by 1.6% mAP50:95, showing a high level of recognition accuracy. In generalization tests, the model has a low error rate for most rare wildlife species and generally identifies wildlife in the wild environment of the GPNP with good accuracy. The results demonstrate that NMW-YOLOv5 significantly enhances wildlife recognition accuracy in field environments by suppressing irrelevant features and extracting deep, effective ones, and that it exhibits strong detection and recognition capability for rare wildlife in GPNP field environments. This could offer a new and effective tool for rare wildlife monitoring in GPNP.
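
The abstract names NAMAttention as one ingredient of the C3 redesign. As a rough, hedged sketch of the normalization-based channel-attention idea behind NAM (not the authors' exact module, and details may differ from the NAM paper), BatchNorm scale factors can be normalized into per-channel weights:

```python
import torch
import torch.nn as nn

class NAMChannelAttention(nn.Module):
    """Sketch of normalization-based channel attention in the spirit of NAM:
    BatchNorm scale factors (gamma) act as per-channel importance scores
    and are normalized into attention weights that gate the input."""
    def __init__(self, channels: int):
        super().__init__()
        self.bn = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = self.bn(x)
        gamma = self.bn.weight.abs()
        weights = gamma / gamma.sum()          # normalized channel importance
        out = out * weights.view(1, -1, 1, 1)
        return x * torch.sigmoid(out)          # gate the original features

x = torch.randn(2, 64, 32, 32)
print(NAMChannelAttention(64)(x).shape)  # torch.Size([2, 64, 32, 32])
```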


Wildlife Real-Time Detection in Complex Forest Scenes Based on YOLOv5s Deep Learning Network

April 2024 · 127 Reads · 9 Citations

With the progressive deterioration of the global ecological environment and the escalation of human activities, the survival of wildlife has been severely impacted. A rapid, precise, and reliable method for detecting wildlife is therefore of immense significance for safeguarding wildlife and monitoring its status. However, because wildlife activity is rare and concealed, existing detection methods struggle to extract features efficiently during real-time monitoring in complex forest environments and suffer from slow speed and low accuracy. We therefore propose WL-YOLO, a novel lightweight real-time model for wildlife detection in complex forest environments, built on the deep learning model YOLOv5s. In WL-YOLO, we introduce a novel lightweight feature extraction module that combines depthwise separable convolutions with squeeze-and-excitation modules in the backbone network, reducing the number of parameters and the computational cost while enhancing the network's feature representation. We additionally introduce a CBAM attention mechanism to strengthen the extraction of local key features, improving performance in natural environments where wildlife is highly concealed against complex backgrounds. The model achieved a mean average precision (mAP) of 97.25%, an F1-score of 95.65%, and an accuracy of 95.14%, outperforming current mainstream deep learning models. Moreover, compared to the YOLOv5m base model, WL-YOLO reduces the number of parameters by 44.73% and shortens detection time by 58%. This study offers technical support for detecting and protecting wildlife in intricate environments through a highly efficient and advanced detection model.
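
As an illustration of the lightweight backbone ingredients named above, the sketch below combines a depthwise separable convolution with a squeeze-and-excitation (SE) gate. It is a generic stand-in under our own naming and channel choices, not WL-YOLO's actual module:

```python
import torch
import torch.nn as nn

class DWSeparableSE(nn.Module):
    """Generic lightweight block: depthwise separable convolution (depthwise
    3x3 + pointwise 1x1) followed by a squeeze-and-excitation gate that
    reweights channels, cutting parameters versus a standard 3x3 conv."""
    def __init__(self, in_ch: int, out_ch: int, reduction: int = 16):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.SiLU()
        self.se = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(out_ch, out_ch // reduction, 1), nn.SiLU(),
            nn.Conv2d(out_ch // reduction, out_ch, 1), nn.Sigmoid())

    def forward(self, x):
        x = self.act(self.bn(self.pointwise(self.depthwise(x))))
        return x * self.se(x)  # channel-wise excitation

print(DWSeparableSE(64, 128)(torch.randn(1, 64, 40, 40)).shape)  # (1, 128, 40, 40)
```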


Figures and tables: the inference process for Amur tiger re-ID · example re-ID results with similarity rankings (green: correct match) · training/testing dataset used in the experiment · ablation experiments on the ATRW test set demonstrating the effectiveness of the IFPM and of the CBAM module in the local branch.
A Serial Multi-Scale Feature Fusion and Enhancement Network for Amur Tiger Re-Identification

April 2024 · 39 Reads · 1 Citation · Animals

The Amur tiger is an important endangered species, and its re-identification (re-ID) plays an important role in regional biodiversity assessment and wildlife resource statistics. This paper addresses Amur tiger re-ID from visible-light images taken from surveillance video screenshots or camera traps, aiming to solve the low accuracy caused by camera perspective, cluttered backgrounds, changes in motion posture, and deformation of the tigers' body patterns during re-ID. To overcome these challenges, we propose a serial multi-scale feature fusion and enhancement re-ID network for the Amur tiger, built from global and local branches. In the global branch, we design a global inverted-pyramid multi-scale feature fusion method that effectively fuses multi-scale global features while preserving high-level, fine-grained, deep semantic features. In the local branch, we design a local dual-domain attention feature enhancement method that further improves local feature extraction and fusion by dividing local feature blocks. We evaluated the effectiveness and feasibility of the model on the public Amur Tiger Re-identification in the Wild (ATRW) dataset and achieved good, competitive results on mAP, Rank-1, and Rank-5. In addition, since our model requires no additional expensive annotations and incorporates no extra pre-training modules, it offers important advantages such as strong transferability and simple training.
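
The global inverted-pyramid fusion is described only at a high level, but top-down multi-scale fusion in general is easy to sketch. The following FPN-style PyTorch snippet is a generic stand-in for the paper's method (class name and channel sizes are assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidFusion(nn.Module):
    """Generic top-down multi-scale fusion (FPN-style), sketched as a
    stand-in for the paper's inverted-pyramid fusion: deep semantic
    features are upsampled and merged into shallower, higher-resolution maps."""
    def __init__(self, in_channels: list[int], dim: int = 256):
        super().__init__()
        self.lateral = nn.ModuleList(nn.Conv2d(c, dim, 1) for c in in_channels)

    def forward(self, feats):                  # feats ordered shallow -> deep
        laterals = [l(f) for l, f in zip(self.lateral, feats)]
        out = laterals[-1]
        for lat in reversed(laterals[:-1]):    # walk back toward shallow maps
            out = lat + F.interpolate(out, size=lat.shape[-2:], mode="nearest")
        return out                             # fused high-resolution map

feats = [torch.randn(1, c, s, s) for c, s in [(256, 64), (512, 32), (1024, 16)]]
print(PyramidFusion([256, 512, 1024])(feats).shape)  # torch.Size([1, 256, 64, 64])
```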


A New Method for Reconstructing Tree-Level Aboveground Carbon Stocks of Eucalyptus Based on TLS Point Clouds

September 2023 · 108 Reads · 4 Citations

Eucalyptus plantation forests in southern China provide not only the economic value of timber production but also the ecological service value of absorbing carbon dioxide and releasing oxygen. Based on the space colonization algorithm, this paper proposes a new method for the 3D reconstruction of trees from terrestrial LiDAR point clouds to determine the aboveground carbon stock of individual eucalyptus trees; its main steps are branch/trunk separation, skeleton extraction and optimization, 3D reconstruction, and carbon stock calculation. The main trunk and branches of the tree point cloud are separated using a layer-by-layer judgment and clustering method, which avoids misjudgments caused by sagging branches. The skeleton is then optimized by removing small redundant branches and fusing near-parallel branches belonging to the same branch. Missing parts of the skeleton point cloud are completed with a cardinal-curve interpolation algorithm, and a realistic 3D structural model is finally generated by expanding the completed and smoothed tree skeleton. Using the bidirectional Hausdorff distance, average Hausdorff distance, and F distance as evaluation indexes, errors were reduced by 0.7453 m, 0.0028 m, and 0.0011 m, respectively, showing that the improved space colonization algorithm enhances the accuracy of the reconstructed 3D structural model. To verify the accuracy of our carbon stock estimates and related parameters, we felled 41 eucalyptus trees and used destructively sampled measurements as reference values. The linear fit between the reconstructed single-tree aboveground carbon stock estimates and the reference values had an R2 of 0.96 with a CV(RMSE) of 16.23%; for trunk volume, R2 was 0.94 with a CV(RMSE) of 19.00%; and for branch volume, R2 was 0.95 with a CV(RMSE) of 38.84%. The proposed TLS-based method for reconstructing eucalyptus carbon stocks can provide decision support for forest management and administration, forest carbon sink trading, and emission-reduction policy formulation.
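
The Hausdorff-based evaluation metrics named above are standard and straightforward to reproduce. The small SciPy sketch below computes the bidirectional and average Hausdorff distances between a source cloud and points sampled from a reconstruction (the F distance is omitted, since its definition is not given in this abstract; the synthetic data is purely illustrative):

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.spatial.distance import directed_hausdorff

def hausdorff_metrics(a: np.ndarray, b: np.ndarray):
    """Bidirectional Hausdorff (worst-case mismatch) and average Hausdorff
    (mean nearest-neighbor distance, both directions) between two point sets."""
    bi = max(directed_hausdorff(a, b)[0], directed_hausdorff(b, a)[0])
    d_ab = cKDTree(b).query(a)[0]   # nearest-neighbor distances a -> b
    d_ba = cKDTree(a).query(b)[0]   # and b -> a
    avg = 0.5 * (d_ab.mean() + d_ba.mean())
    return bi, avg

rng = np.random.default_rng(0)
cloud = rng.normal(size=(1000, 3))                         # stand-in TLS cloud
recon = cloud + rng.normal(scale=0.01, size=cloud.shape)   # stand-in reconstruction
print(hausdorff_metrics(cloud, recon))
```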


Forest-PointNet: A Deep Learning Model for Vertical Structure Segmentation in Complex Forest Scenes

September 2023 · 126 Reads · 8 Citations

The vertical structure of forest ecosystems influences and reflects ecosystem functioning. Terrestrial laser scanning (TLS) enables the rapid acquisition of 3D forest information and the subsequent reconstruction of the vertical structure, providing new support for acquiring vertical structure information. We focused on artificial forest sample plots in north-central Nanning, Guangxi, China as the research area and obtained forest sample point cloud data through TLS. By accurately capturing the gradient information of the forest's vertical structure, a classification boundary was delineated, and a segmentation method for complex forest vertical structure was proposed based on the Forest-PointNet model, which comprehensively uses the spatial and shape features of the point cloud. The study accurately segmented four vertical structure classes in the sample plot point clouds: ground, bushes, trunks, and leaves. With optimal training, the average classification accuracy reached 90.98%. The results indicated that segmentation errors are mainly concentrated at branch intersections in the canopy. Our model demonstrates significant advantages, including effective segmentation of vertical structures, strong generalization ability, and strong feature extraction capability.
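
Forest-PointNet builds on PointNet, whose segmentation pathway is well documented: shared per-point MLPs, a max-pooled global feature, and per-point classification over concatenated local and global features. The sketch below is a minimal generic version for the four classes mentioned above, not the authors' actual Forest-PointNet network:

```python
import torch
import torch.nn as nn

class MiniPointNetSeg(nn.Module):
    """Minimal PointNet-style segmentation sketch: shared per-point MLPs
    (as 1x1 Conv1d), a max-pooled global feature, and per-point classification
    over local + global features (4 classes: ground, bush, trunk, leaf)."""
    def __init__(self, num_classes: int = 4):
        super().__init__()
        self.local = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.ReLU(), nn.Conv1d(64, 128, 1), nn.ReLU())
        self.global_mlp = nn.Sequential(nn.Conv1d(128, 1024, 1), nn.ReLU())
        self.head = nn.Sequential(
            nn.Conv1d(128 + 1024, 256, 1), nn.ReLU(),
            nn.Conv1d(256, num_classes, 1))

    def forward(self, xyz):                        # xyz: (B, 3, N) coordinates
        local = self.local(xyz)                    # per-point features
        glob = self.global_mlp(local).max(dim=2, keepdim=True).values
        glob = glob.expand(-1, -1, xyz.shape[2])   # broadcast global context
        return self.head(torch.cat([local, glob], dim=1))  # (B, classes, N)

logits = MiniPointNetSeg()(torch.randn(2, 3, 2048))
print(logits.shape)  # torch.Size([2, 4, 2048])
```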


Citations (19)


... It also introduces a cluster-wise attribution loss to reduce semantic confusion, further improving the model's ability to handle complex RS data. Moreover, SG-CLIP [46] integrates geographic information with CLIP's vision-language capabilities to boost species recognition accuracy, especially in few-shot learning scenarios. Similarly, GeoChat [47], built upon CLIP-ViT(L-14) [12] and fine-tuned with LLaVA-1.5 [48] using the LoRA [49] technique, extends CLIP's conversational abilities while enhancing its domain-specific knowledge for RS tasks. ...

Reference:

FedRSClip: Federated Learning for Remote Sensing Scene Classification Using Vision-Language Models
CLIP-Driven Few-Shot Species-Recognition Method for Integrating Geographic Information

... In cultural heritage, YOLO is employed to detect and classify artifacts in archaeological studies, aiding in the preservation of cultural history and restoration of structural defects in heritage sites [43,44]. In environmental monitoring, YOLO assists in tracking endangered species and identifying deforestation patterns through satellite imagery [45]. In healthcare, YOLO excels in automating critical tasks, such as detecting tumors in medical imaging and monitoring surgical tools during procedures [46]. ...

Identification of Rare Wildlife in the Field Environment Based on the Improved YOLOv5 Model

... Despite advancements in YOLO's architecture, newer versions still face challenges in accurately detecting animals in complex natural environments. Issues such as weather variations, lighting changes and animal occlusions frequently result in missed detections [29]. To overcome this, the model needs to capture richer features. ...

Wildlife Real-Time Detection in Complex Forest Scenes Based on YOLOv5s Deep Learning Network

... TLS emits laser pulses to capture high-resolution, three-dimensional (3D) point cloud data of forest structures [15]. The ability to rapidly and accurately measure key forest structural parameters such as leaf area index (LAI) using TLS data has opened up new possibilities for the above-ground carbon stock [16–18]. Moreover, for assessing the carbon stock in forest plots, vegetation area index (VAI) can be another leaf and structural parameter [19]. ...

A New Method for Reconstructing Tree-Level Aboveground Carbon Stocks of Eucalyptus Based on TLS Point Clouds

... PointNet, initially introduced by the researcher Charles R. Qi in 2017, is a robust deep-learning network specializing in processing point cloud data [12]. The network distinguishes itself from other deep learning networks by efficiently mitigating the impact of the point cloud data's disorganized and inflexible rotational properties on classification accuracy [17,19–21]. Since its proposal, several different network versions have been created and utilized to classify point clouds. ...

Forest-PointNet: A Deep Learning Model for Vertical Structure Segmentation in Complex Forest Scenes

... Traditional lithological mapping relies heavily on manual field surveys, and the accuracy of lithological mapping is significantly constrained by the expertise of mappers. Typically, different individuals may produce different results of lithological mapping and this process is commonly imprecise, slow and costly (Dong et al. 2023). Therefore, the urgent challenge in the field of geology is how to realize accurate and automatic lithological mapping by using various geological survey data which can be directly acquired from the field with specific equipment or from the laboratory. ...

Combining the Back Propagation Neural Network and Particle Swarm Optimization Algorithm for Lithological Mapping in North China

... Conventional machine vision techniques rely on image processing and geometry-based algorithms to analyze and interpret data derived from the natural environment. Leveraging common geometric features (curvature, density, etc.), growing patterns, and topological information [11,12], conventional machine vision techniques facilitate both wood-leaf separation and structural continuity analysis of arboreal elements [13]. Moreover, identifying non-photosynthetic constituents based on segment linearity, along with other methods, enhances the analysis, further augmenting the overall analytical process [14]. ...

Unsupervised Semantic Segmenting TLS Data of Individual Tree Based on Smoothness Constraint Using Open-Source Datasets
  • Citing Article
  • January 2022

IEEE Transactions on Geoscience and Remote Sensing

... Terrestrial laser scanning (TLS), however, allows a non-destructive three-dimensional assessment and reconstruction of the crown components (Li et al., 2020) while avoiding labor-intensive fieldwork, so that also more and larger sample trees may be assessed (Brede et al., 2019; Calders et al., 2020; Fan et al., 2022). Several studies have found close agreement between crown biomass estimates from TLS and destructive measurements (e.g., Calders et al., 2015; de Tanago et al., 2018; Fan et al., 2020; Momo Takoudjou et al., 2018; Muumbe et al., 2021; Raumonen et al., 2013). ...

Plot-level reconstruction of 3D tree models for aboveground biomass estimation

Ecological Indicators

... Different from airborne laser scanning (ALS), the terrestrial laser scanner (TLS) provides more accurate structural information about the understory. Fu et al. (2022) [22] detected the trunks using improved DBSCAN in TLS data and used Hough circle fitting to modify the detection results. Xu et al. (2023) [23] used Topology-based Tree Segmentation (TTS) to segment individual trees. ...

Segmenting Individual Tree from TLS Point Clouds Using Improved DBSCAN

Forests

... The QSM creation relies on fitting 3D cylinders into the point cloud, trying to copy the structure of a tree as well as possible while coping with noise and gaps in the point cloud [23]. Alternatives to the TreeQSM method include the SimpleTree [24] and AdQSM [25] programs, which claim to incorporate certain improvements over TreeQSM. All these software tools employ similar Quantitative Structure Modelling algorithms; however, TreeQSM is frequently used as a benchmarking method, and its code appears to be actively maintained. ...

Low Cost Automatic Reconstruction of Tree Structure by AdQSM with Terrestrial Close-Range Photogrammetry

Forests