Enqing Chen’s research while affiliated with Zhengzhou University and other places


Publications (28)


Figure 3: STC attention model. The sub-modules SAM, TAM, and CAM are applied in sequence; element-wise multiplication is indicated by ⊗ and addition by ⊕.
Figure 4: Architecture of the TAM module.
Figure 6: Network architecture. There are nine building blocks (B1-B9); the three integers attached to each block denote its input channels, output channels, and stride. GAP stands for global average pooling layer.
Figure 9: Comparison curves with state-of-the-art methods on UCF-101.
Figure 11: Examples of learned spatial attention maps. Circle size denotes joint significance.

Extended multi-stream temporal-attention module for skeleton-based human action recognition (HAR)
  • Preprint
  • File available

November 2024 · 49 Reads

Xin Guo · Enqing Chen · [...] · Sami Ullah

Graph convolutional networks (GCNs) are an effective technique for skeleton-based human action recognition (HAR), generalizing CNNs to more flexible non-Euclidean structures. However, previous GCN-based models still have several issues: (I) the graph structure is the same for all model layers and all input data.
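As an illustration of the graph-convolution operation the abstract refers to, the sketch below applies one GCN layer with a symmetrically normalized adjacency matrix to a toy five-joint chain. The skeleton, features, and weights here are invented for the example and are not taken from the paper.

```python
import numpy as np

def normalized_adjacency(edges, num_joints):
    """Build A_hat = D^{-1/2} (A + I) D^{-1/2} for a skeleton graph."""
    A = np.eye(num_joints)                 # self-loops (the "+ I" term)
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A.sum(axis=1)))
    return d_inv_sqrt @ A @ d_inv_sqrt

def gcn_layer(X, A_hat, W):
    """One graph-convolution layer: ReLU(A_hat X W).
    X: (num_joints, in_dim) joint features; W: (in_dim, out_dim)."""
    return np.maximum(A_hat @ X @ W, 0.0)

# Toy 5-joint chain skeleton
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
A_hat = normalized_adjacency(edges, 5)
X = np.random.randn(5, 3)                  # 3-D joint coordinates
W = np.random.randn(3, 8)                  # random layer weights
H = gcn_layer(X, A_hat, W)
print(H.shape)  # → (5, 8)
```

Because the adjacency is fixed when the graph is built, every layer reuses the same `A_hat`, which is exactly the limitation (I) that adaptive variants address.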


Human Action Recognition (HAR) Using Skeleton-based Quantum Spatial Temporal Relative Transformer Network: ST-RTR

October 2024 · 47 Reads

Human action recognition (HAR) is an active research area in human-computer interaction, used, for example, to monitor the activities of elderly and disabled individuals affected by physical or mental health conditions. Skeleton-based HAR has recently received much attention because skeleton data is robust to changes in clothing, body size, camera views, and complex backgrounds. A key strength of ST-GCN is that it automatically learns spatial and temporal patterns from skeleton sequences; however, its limited receptive field restricts it to short-range correlations, whereas understanding human action also requires long-range dependencies. To address this issue, we developed a quantum spatial-temporal relative transformer (ST-RTR) model. The ST-RTR includes joint and relay nodes, which allow efficient communication and data transmission within the network. These nodes break the inherent spatial and temporal skeleton topologies, enabling the model to better capture long-range human action. Furthermore, we combine quantum ST-RTR with a fusion model for further performance improvements. To assess the performance of the quantum ST-RTR method, we conducted experiments on three skeleton-based HAR benchmarks: NTU RGB+D 60, NTU RGB+D 120, and UAV-Human. It boosted CS and CV accuracy by 2.11% and 1.45% on NTU RGB+D 60, and by 1.25% and 1.05% on NTU RGB+D 120. On the UAV-Human dataset, accuracy improved by 2.54%. The experimental outcomes show that the proposed ST-RTR model significantly improves action recognition over the standard ST-GCN method.
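The relative-transformer idea behind long-range modeling can be illustrated with a minimal single-head self-attention over frames that adds a learned relative-position bias to the attention scores. This is a generic sketch with random weights, not the authors' ST-RTR implementation; the shapes and names are assumptions made for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def relative_attention(X, Wq, Wk, Wv, rel_bias):
    """Single-head self-attention with a relative-position bias.
    X: (T, d) per-frame features; rel_bias: (2T-1,) one learnable scalar
    per frame offset in [-(T-1), T-1]."""
    T, d = X.shape
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(d)
    idx = np.arange(T)
    offsets = idx[:, None] - idx[None, :] + (T - 1)  # map offsets to 0..2T-2
    scores = scores + rel_bias[offsets]              # inject relative position
    return softmax(scores) @ V

T, d = 6, 4
rng = np.random.default_rng(0)
X = rng.standard_normal((T, d))
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
rel_bias = rng.standard_normal(2 * T - 1)
out = relative_attention(X, Wq, Wk, Wv, rel_bias)
print(out.shape)  # → (6, 4)
```

Unlike a graph convolution, every frame attends to every other frame in one step, which is what gives attention-based models their long-range receptive field.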



Advancements in Human Action Recognition Through 5G/6G Technology for Smart Cities: Fuzzy Integral-Based Fusion

August 2024 · 15 Reads · 6 Citations

IEEE Transactions on Consumer Electronics

5G/6G technology improves skeleton-based human action recognition (HAR) by delivering the ultra-low latency and high data throughput needed for real-time, accurate security analysis of human actions. Despite its growing popularity, current HAR methods frequently fail to capture the complexities of the skeleton sequence. This study proposes a novel multimodal method that combines a Spatial-Temporal Attention LSTM (STA-LSTM) network with a Convolutional Neural Network (CNN) to extract nuanced features from the skeleton sequence. The STA-LSTM network models inter- and intra-frame relations, while the CNN uncovers geometric correlations within the human skeleton. Significantly, by integrating the Choquet fuzzy integral, we achieve a harmonized fusion of the classifiers for each feature vector, and Kullback-Leibler and Jensen-Shannon divergences further ensure the complementary nature of these feature vectors. Our approach demonstrated impressive accuracy on benchmark skeleton datasets such as NTU-60, NTU-120, HDM05, and UT-DMHAD: cross-subject accuracies of 90.75% and 84.50% and cross-setting accuracies of 96.7% and 86.70% on NTU-60 and NTU-120, respectively, while the HDM05 and UT-DMHAD datasets recorded accuracies of 93.5% and 97.43%, indicating that our model outperforms current techniques and has excellent potential for sentiment analysis platforms that combine textual and visual signals.
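The Choquet fuzzy integral used for classifier fusion can be sketched as follows. The two classifier confidences and the fuzzy-measure values below are hypothetical, chosen only to show how the integral weights coalitions of classifiers rather than individual scores.

```python
import numpy as np

def choquet_integral(scores, measure):
    """Discrete Choquet integral of per-classifier confidences with respect
    to a fuzzy measure. scores: (n,) array; measure: dict mapping a
    frozenset of classifier indices to its measure, with the empty set at
    0.0 and the full set at 1.0."""
    order = np.argsort(scores)                 # ascending by confidence
    remaining = frozenset(range(len(scores)))  # classifiers still "active"
    result, prev = 0.0, 0.0
    for idx in order:
        # weight each confidence increment by the measure of the coalition
        # of classifiers whose confidence is at least this high
        result += (scores[idx] - prev) * measure[remaining]
        prev = scores[idx]
        remaining = remaining - {idx}
    return result

# Hypothetical confidences of two streams (e.g. an LSTM stream and a CNN
# stream) for one class, and a hand-picked fuzzy measure.
scores = np.array([0.7, 0.4])
measure = {
    frozenset(): 0.0,
    frozenset({0}): 0.6,
    frozenset({1}): 0.5,
    frozenset({0, 1}): 1.0,
}
fused = choquet_integral(scores, measure)
print(round(fused, 3))  # → 0.58
```

Because the measure of a pair need not equal the sum of the singletons, the Choquet integral can express synergy or redundancy between classifiers, which a plain weighted average cannot.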




Improving Feature Learning in Remote Sensing Images Using an Integrated Deep Multi-Scale 3D/2D Convolutional Network

June 2023 · 93 Reads · 1 Citation
Developing complex hyperspectral image (HSI) sensors that capture high-resolution spatial information and hundreds of spectral bands of the earth's surface has made HSI pixel-wise classification a reality. The 3D-CNN has become the preferred HSI pixel-wise classification approach because of its ability to extract discriminative spectral and spatial information while maintaining data integrity. However, HSI datasets are characterized by high nonlinearity, voluminous spectral features, and limited training sample data. Therefore, deep HSI classification methods that rely purely on 3D-CNNs often result in computationally expensive models prone to overfitting as model depth increases. This paper proposes an integrated deep multi-scale 3D/2D convolutional network block (MiCB) for simultaneous low-level spectral and high-level spatial feature extraction that can train optimally on limited sample data. The strength of the proposed MiCB model lies in the innovative arrangement of its convolution layers, which gives the network the ability (i) to simultaneously convolve low-level spectral with high-level spatial features; (ii) to use multiscale kernels to extract abundant contextual information; (iii) to apply residual connections to solve the degradation problem that arises when model depth grows beyond a threshold; and (iv) to use depthwise separable convolutions to reduce the model's computational cost. We evaluate the efficacy of the proposed MiCB model on three publicly accessible HSI benchmark datasets: Salinas Scene (SA), Indian Pines (IP), and the University of Pavia (UP). When trained on small amounts of sample data, MiCB outperforms the state-of-the-art methods used for comparison; for instance, it achieves overall classification accuracies of 97.35%, 98.29%, and 99.20% when trained on 5% of IP, 1% of UP, and 1% of SA data, respectively.
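Point (iv), the use of depthwise separable convolutions to cut computational cost, can be made concrete by comparing the parameter counts of a standard 3D convolution with its depthwise-separable factorization. The channel and kernel sizes below are generic, not the paper's exact layer sizes.

```python
def standard_conv3d_params(c_in, c_out, k):
    """Weights in a standard 3D convolution (bias ignored)."""
    return c_in * c_out * k ** 3

def separable_conv3d_params(c_in, c_out, k):
    """Depthwise: one k*k*k filter per input channel; pointwise: a 1x1x1
    convolution that mixes channels."""
    return c_in * k ** 3 + c_in * c_out

c_in, c_out, k = 32, 64, 3
std = standard_conv3d_params(c_in, c_out, k)
sep = separable_conv3d_params(c_in, c_out, k)
print(std, sep, round(std / sep, 1))  # → 55296 2912 19.0
```

For this layer the factorization uses roughly 19x fewer weights, which is why separable convolutions help keep deep 3D models trainable on limited samples.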


RETRACTED ARTICLE: Automatically human action recognition (HAR) with view variation from skeleton means of adaptive transformer network

April 2023 · 52 Reads · 7 Citations

Soft Computing

Human action recognition using skeletons has become increasingly appealing to a growing number of researchers in recent years. It is particularly challenging to recognize actions captured from different angles because their representations vary so widely. This paper proposes a learning-based, data-driven strategy for automatically determining virtual observation viewpoints, solving the problem of view variation throughout an action. Our VA-CNN and VA-RNN networks, which use convolutional and recurrent neural networks with long short-term memory, offer an alternative to the conventional method of reorienting skeletons according to a human-defined prior criterion. Using the view adaptation module, each network first identifies the best observation viewpoints and then transforms the skeletons accordingly for end-to-end recognition by the main classification network. The proposed view-adaptive models provide significantly more consistent virtual viewpoints from skeletons captured at different perspectives. By removing view variation, the models allow the networks to learn action-specific properties more efficiently. Furthermore, we developed a two-stream scheme (referred to as VA-fusion) that integrates the outputs of the two networks to obtain an improved prediction. Random rotation of skeleton sequences is used during training to avoid overfitting and improve the robustness of the view adaptation models. Extensive experiments demonstrate that our proposed view-adaptive networks outperform existing solutions on five challenging benchmarks.
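A minimal sketch of the view-adaptation step, assuming the adaptation subnetwork has already regressed per-sequence rotation angles and a translation (here fixed by hand): the skeleton sequence is re-observed from the resulting virtual viewpoint. All names and sizes below are illustrative.

```python
import numpy as np

def rotation_matrix(alpha, beta, gamma):
    """Rotation about the x, y, z axes (radians), composed as Rz @ Ry @ Rx."""
    ca, sa = np.cos(alpha), np.sin(alpha)
    cb, sb = np.cos(beta), np.sin(beta)
    cg, sg = np.cos(gamma), np.sin(gamma)
    Rx = np.array([[1, 0, 0], [0, ca, -sa], [0, sa, ca]])
    Ry = np.array([[cb, 0, sb], [0, 1, 0], [-sb, 0, cb]])
    Rz = np.array([[cg, -sg, 0], [sg, cg, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def adapt_view(skeleton, angles, translation):
    """Re-observe a skeleton sequence from a virtual viewpoint.
    skeleton: (T, J, 3); angles, translation: (3,) per-sequence parameters
    that the actual model would regress with a small subnetwork."""
    R = rotation_matrix(*angles)
    return (skeleton - translation) @ R.T

T, J = 10, 25
seq = np.random.randn(T, J, 3)
out = adapt_view(seq, angles=np.array([0.1, -0.2, 0.3]),
                 translation=seq[0].mean(axis=0))
print(out.shape)  # → (10, 25, 3)
```

Because the transform is a rigid rotation plus translation, bone lengths are preserved: only the viewpoint changes, not the action itself.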


Automatically Human Action Recognition (HAR) with View Variation from Skeleton Means of Adaptive Transformer Network

November 2022 · 154 Reads

Human Action Recognition (HAR) using skeletons has become increasingly appealing to a growing number of researchers in recent years. It is particularly challenging to recognize actions captured from different angles because their representations vary so widely. This paper proposes a learning-based, data-driven strategy for automatically determining virtual observation viewpoints, solving the problem of view variation throughout an action. Our VA-CNN and VA-RNN networks, which use convolutional and recurrent neural networks with Long Short-Term Memory (LSTM), offer an alternative to the conventional method of reorienting skeletons according to a human-defined prior criterion. Using the view adaptation module, each network first identifies the best observation viewpoints and then transforms the skeletons accordingly for end-to-end recognition by the main classification network. From skeletons captured at different perspectives, the proposed view-adaptive models provide significantly more consistent virtual viewpoints. By removing view variation, the models allow the networks to learn action-specific properties more efficiently. Furthermore, we developed a two-stream scheme (referred to as VA-fusion) that integrates the outputs of the two networks to obtain an improved prediction. Random rotation of skeleton sequences is used during training to avoid overfitting and improve the robustness of the view adaptation models. Extensive experiments demonstrate that our proposed view-adaptive networks outperform existing solutions on five challenging benchmarks.


Extended Multi-Stream Adaptive Graph Convolutional Networks (EMS-AAGCN) for Skeleton-Based Human Action Recognition

September 2022 · 162 Reads · 1 Citation
Skeleton-based action recognition using graph convolutional networks (GCNs), which generalize CNNs to a more flexible non-Euclidean frame, has shown outstanding results. However, earlier GCN-based models share several problems. (I) All model layers and all input data use the same graph structure, which may not be appropriate given the GCN model's hierarchy and the variety of action recognition inputs. (II) Bone length and orientation, although informative and discriminative for human action recognition, are rarely investigated. This paper presents an extended multi-stream adaptive graph convolutional neural network (EMS-AAGCN) for skeleton-based action recognition. The topology of the proposed network can be trained either uniformly or individually based on the input data. This data-driven approach makes the graphs more flexible and quick to adapt to various datasets. In addition, a spatial-temporal-channel attention module in the proposed adaptive graph convolutional layer allows it to focus more on important joints, frames, and features. Furthermore, an enhanced multi-stream framework models joints, bones, and their motion, improving recognition accuracy. Our method surpasses the state-of-the-art on two massive datasets, NTU-RGBD and Kinetics-Skeleton.
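The multi-stream inputs (joints, bones, and their motions) can be derived from raw joint coordinates as sketched below. The five-joint kinematic tree is a toy example, not the NTU-RGBD skeleton, and the stream names are made up for illustration.

```python
import numpy as np

def bone_stream(joints, parents):
    """Bone vectors: each joint minus its parent joint. joints: (T, J, 3);
    the root joint is its own parent, so its bone vector is zero."""
    return joints - joints[:, parents, :]

def motion_stream(x):
    """Temporal differences between consecutive frames, zero-padded so the
    output keeps the input shape."""
    m = np.zeros_like(x)
    m[1:] = x[1:] - x[:-1]
    return m

T, J = 4, 5
parents = [0, 0, 1, 2, 3]          # toy kinematic tree rooted at joint 0
joints = np.random.randn(T, J, 3)
streams = {
    "joint": joints,
    "bone": bone_stream(joints, parents),
    "joint_motion": motion_stream(joints),
    "bone_motion": motion_stream(bone_stream(joints, parents)),
}
print({k: v.shape for k, v in streams.items()})
```

Each stream is then typically fed to its own GCN and the class scores are summed, which is the usual way multi-stream skeleton models combine first- and second-order information.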


Citations (21)


... Ethereum is an open-source blockchain-based platform that establishes a distributed peer-to-peer network for the secure execution and verification of smart contract code [8,9]. ...

Reference:

The Impact of Blockchain Technology for Evaluation of Electronic Medical Certification
Extended Multi-stream Temporal-attention Module for Skeleton-based Human Action Recognition (HAR)
  • Citing Article
  • October 2024

Computers in Human Behavior

... Internet criminality takes many forms, but common ones include fraud and assaults on computer systems' security (e.g., hacks, malicious activity, and vulnerability). A scam is an illegal scheme to financially benefit or deceive another person [15,16]. As a network of miners must validate every transaction, collaborative verification and authentication are impossible for those susceptible to unwanted access [17]. ...

Advancements in Human Action Recognition Through 5G/6G Technology for Smart Cities: Fuzzy Integral-Based Fusion
  • Citing Article
  • August 2024

IEEE Transactions on Consumer Electronics

... Accuracy (5-way-5-shot): SNAIL [96] 68.9%; TPN [97] 69.4%; BaseTransformers [98] 73.4%; EGNN [99] 76.4%; Shot-Free [100] 77.6%; Meta-Transfer [101] 75.5%; Dense [102] 79.0%; MetaOptnet [103] 78.6%; Constellation [104] 80.0%; P-transfer [105] 80.1%; DeepEMD [106] 82.4%; BaseTransformers [98] 82.4%; MGGN [107] 83.3%; DMC-CNN (2-view) [51] 84.1% ...

Edge-labeling based modified gated graph network for few-shot learning
  • Citing Article
  • June 2024

Pattern Recognition

... However, the majority of studies focusing on hyperspectral image classification have predominantly employed CNN and RNN approaches [6][7][8]. CNN-based methods primarily emphasize local features within the hyperspectral images themselves, overlooking the distinctive spectral features unique to hyperspectral data [9]. On the other hand, RNN approaches solely attend to the distinct spectral sequence features, lacking the ability to effectively process global sequence information due to their unidirectional nature. ...

Improving Feature Learning in Remote Sensing Images Using an Integrated Deep Multi-Scale 3D/2D Convolutional Network

... (3) The natural topological graph structure of the human skeleton is used in the graph neural network (GNN) based approaches [5,20] that made utilization of both spatial and temporal data. Among the three techniques, the ST-GCN is the most expressive and the first to capture the balance between spatial and temporal relationships [21,22]. The ST-GCN baseline model was employed in this study, and the workings of this model are described in depth in Section 3.2. ...

RETRACTED ARTICLE: Automatically human action recognition (HAR) with view variation from skeleton means of adaptive transformer network

Soft Computing

... Previous studies [13][14][15] have confirmed that second-order information about the skeleton plays a complementary role in action recognition. The second-order information about the skeleton, which could also be called the skeletal modality, included joint coordinates, bone vector, joint coordinate motion, and bone vector motion. ...

Extended Multi-Stream Adaptive Graph Convolutional Networks (EMS-AAGCN) for Skeleton-Based Human Action Recognition

... By using voxelization, we can ensure that our models effectively learn from the geometric complexity of the industrial components, leading to more precise predictions. Various studies in computer science and point cloud processing emphasize the importance of voxelization in processing point cloud data for tasks like object detection and feature extraction [26,27]. In the field of mechanical parts manufacturing, authors of [28] innovatively applied Convolutional Neural Networks (CNN) to predict manufacturing costs. ...

TR-Net: A Transformer-Based Neural Network for Point Cloud Processing

... Further, several research studies have utilized joint spectral and spatial features to improve the classification [5]. Vision transformers (ViTs) have recently been proposed to provide long-range dependency on spatial and spectral features for the classification of land objects [6]. ...

HybridGBN-SR: A Deep 3D/2D Genome Graph-Based Network for Hyperspectral Image Classification

... Furthermore, this study employed a combination of deep learning algorithms to examine the properties of fast-moving films and enhance their ability to identify features. According to study findings, the updated algorithm can analyze high-speed films efficiently and increase motion video frame feature detection [8]. Kikuchi et al. designed a method capable of performing real-time detection and virtual removal of existing buildings from a video stream, aiming to more intuitively demonstrate a future scene without these buildings. ...

A deep learning algorithm for fast motion video sequences based on improved codebook model

Neural Computing and Applications

... Li et al. [32] leveraged RL to analyze electronic health records (EHRs) for sequential decision-making, employing a model-free Deep Q-Networks (DQN) algorithm for clinical decision support. Guo et al. [33] proposed a dynamic weight assignment network inspired by advanced RL algorithms, demonstrating its application in human activity recognition. RL's ability to integrate multi-agent frameworks further enhances its potential for mental health monitoring by enabling concurrent learning across multiple parameters. ...

A Deep Reinforcement Learning Method For Multimodal Data Fusion in Action Recognition
  • Citing Article
  • November 2021

Signal Processing Letters, IEEE