Feng Jiang’s research while affiliated with Harbin Institute of Technology and other places


Publications (264)


A Ranking Scheme for Trust Region Multi-agent Reinforcement Learning
  • Conference Paper

April 2025 · 1 Read

Ruichen Gao · Yi Hu · Deqin Zheng · [...] · Feng Jiang

Beyond Object Categories: Multi-Attribute Reference Understanding for Visual Grounding

March 2025 · 6 Reads

Referring expression comprehension (REC) aims at achieving object localization based on natural language descriptions. However, existing REC approaches are constrained by object category descriptions and single-attribute intention descriptions, hindering their application in real-world scenarios. In natural human-robot interactions, users often express their desires through individual states and intentions, accompanied by guiding gestures, rather than detailed object descriptions. To address this challenge, we propose Multi-ref EC, a novel task framework that integrates state descriptions, derived intentions, and embodied gestures to locate target objects. We introduce the State-Intention-Gesture Attributes Reference (SIGAR) dataset, which combines state and intention expressions with embodied references. Through extensive experiments with various baseline models on SIGAR, we demonstrate that properly ordered multi-attribute references contribute to improved localization performance, revealing that single-attribute reference is insufficient for natural human-robot interaction scenarios. Our findings underscore the importance of multi-attribute reference expressions in advancing visual-language understanding.


Machine-Learning-Based Automatic Metallographic Grading System for High-Gloss Anodized Aluminum Profiles

March 2025 · 8 Reads

The excellent “mirror” effect of medium and high-strength aluminum alloy profiles from the 6-series, achieved through anodizing, is highly valued by customers. Metallographic analysis is a key method for predicting the anodizing effect. However, traditional metallographic analysis methods suffer from unstable accuracy and low efficiency. To address these issues, this paper successfully develops a metallographic grading system by constructing a dataset and integrating computer vision with machine-learning techniques. Based on grain classification, the system automatically determines the metallographic grade by analyzing the proportion of good grain areas. After applying SMOTE sampling and 10-fold cross-validation to the machine-learning algorithm, we conducted a comparative analysis of the model’s performance from the perspectives of accuracy, good grain recall rate, bad grain recall rate, and AUC. The XGBoost model, selected as the final predictive model from 18 machine-learning models due to its superior performance, achieved a grain classification accuracy of 96.21% and a good grain recall rate of 98.07%. Both the accuracy and good grain recall standard deviations were less than 0.02. These results indicate that the model can effectively distinguish between good and bad grains with high robustness. Additionally, the average time for metallographic grading is less than 9 s. In comparison to the instability of traditional manual grading, this method significantly enhances both the accuracy and efficiency of metallographic analysis while also reducing grading costs.
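The grading rule described in this abstract (classify each grain, then grade by the share of good-grain area) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Grain` record, the function names, and the threshold values are all hypothetical, and the per-grain labels are assumed to come from an upstream classifier such as the XGBoost model mentioned above.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class Grain:
    area: float    # grain area, e.g. in pixels (assumed unit)
    is_good: bool  # label assumed to come from an upstream classifier


def good_area_ratio(grains: Sequence[Grain]) -> float:
    """Return the fraction of the total grain area that is labelled good."""
    total = sum(g.area for g in grains)
    if total <= 0:
        raise ValueError("total grain area must be positive")
    good = sum(g.area for g in grains if g.is_good)
    return good / total


# Hypothetical thresholds: (minimum good-area ratio, grade label), best grade first.
# The source does not state the actual grade boundaries.
DEFAULT_THRESHOLDS: Tuple[Tuple[float, str], ...] = (
    (0.90, "A"),
    (0.75, "B"),
    (0.50, "C"),
)


def grade_from_good_ratio(
    ratio: float,
    thresholds: Sequence[Tuple[float, str]] = DEFAULT_THRESHOLDS,
    fallback: str = "D",
) -> str:
    """Map a good-area ratio in [0, 1] to the first grade whose minimum it meets."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must lie in [0, 1]")
    for minimum, label in thresholds:
        if ratio >= minimum:
            return label
    return fallback
```

With these assumed thresholds, a sample whose good grains cover 80% of the grain area would fall into the second grade; real boundaries would have to come from the grading standard in use.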


EGENN: An Efficient Graph-Enhanced Neural Network for Multivariate Time Series Forecasting

March 2025 · 5 Reads

Graph neural networks (GNNs) have been widely applied in multivariate time series forecasting due to their excellent relationship modeling capabilities. However, current methods still face limitations in computational efficiency or time series expression capabilities. To address these issues, we propose an Efficient Graph-Enhanced Neural Network (EGENN), which consists of an adjacency matrix generator, a GNN, and a projection module. First, EGENN uses a spectral similarity-based graph construction method and further enhances the expressive power of temporal features. Second, we introduce an inter-layer attention graph convolutional network, which adaptively aggregates information from different network depths to better capture complex patterns. Finally, a predictive projection strategy fusing wavelet convolutions and patch-wise transformation is proposed to produce compact parameterization and extended receptive fields. Experiments on five datasets from different domains show that our model achieves state-of-the-art prediction performance while maintaining low computational resource consumption.


Sensor placement and data collection environment: (a) For the lower body, six sEMG sensors were placed on both sides of the legs, while 16 Vicon markers were used to collect ground-truth data. An Intel RealSense T265 sensor was mounted on the waist. (b) Ten Vicon cameras were positioned on the ceiling to capture reflective markers on the lower body, and an RGB camera was placed on the side wall. The subject performed walking trials on flat ground, both clockwise and counterclockwise.
Overall schematic of the proposed framework, which comprises three phases.
The pose estimation network is a pipeline of feature extraction, knowledge sharing, knowledge fusion, and pose regression.
The structure of CBAM-Resnet12 combines CBAM modules, residual blocks, convolution layers, and max pooling layers.
Results on different subjects with different scales of pre-training.

Meta-Transfer-Learning-Based Multimodal Human Pose Estimation for Lower Limbs
  • Article
  • Full-text available

March 2025 · 3 Reads

Accurate and reliable human pose estimation (HPE) is essential in interactive systems, particularly for applications requiring personalized adaptation, such as controlling cooperative robots and wearable exoskeletons, and especially for healthcare monitoring equipment. However, continuously maintaining diverse datasets and frequently updating models for individual adaptation are both resource-intensive and time-consuming. To address these challenges, we propose a meta-transfer learning framework that integrates multimodal inputs, including high-frequency surface electromyography (sEMG), visual-inertial odometry (VIO), and high-precision image data. This framework improves both accuracy and stability through a knowledge fusion strategy that resolves the data alignment issue and ensures seamless integration of the different modalities. To further enhance adaptability, we introduce a training and adaptation framework with few-shot learning, facilitating efficient updating of encoders and decoders for dynamic feature adjustment in real-time applications. Experimental results demonstrate that our framework provides accurate, high-frequency pose estimations, particularly for intra-subject adaptation. Our approach enables efficient adaptation to new individuals with only a few new samples, providing an effective solution for personalized motion analysis with minimal data.



Image Compressive Sensing With Scale-Variable Adaptive Sampling and Hybrid-Attention Transformer Reconstruction

January 2025 · 10 Reads

IEEE Transactions on Multimedia

Recently, a large number of image compressive sensing (CS) methods with deep unfolding networks (DUNs) have been proposed. However, existing methods either use fixed-scale blocks for sampling that leads to limited insights into the image content or employ a plain convolutional neural network (CNN) in each iteration that weakens the perception of broader contextual prior. In this paper, we propose a novel DUN (dubbed SVASNet) for image compressive sensing, which achieves scale-variable adaptive sampling and hybrid-attention Transformer reconstruction with a single model. Specifically, for scale-variable sampling, a sampling matrix-based calculator is first employed to evaluate the reconstruction distortion, which only requires measurements without access to the ground truth image. Then, a Block Scale Aggregation (BSA) strategy is presented to compute the reconstruction distortion under block divisions at different scales and select the optimal division scale for sampling. To realize hybrid-attention reconstruction, a dual Cross Attention (CA) submodule in the gradient descent step and a Spatial Attention (SA) submodule in the proximal mapping step are developed. The CA submodule introduces inter-phase inertial forces in the gradient descent, which improves the memory effect between adjacent iterations. The SA submodule integrates local and global prior representations of CNN and Transformer, and explores local and global affinities between dense feature representations. Extensive experimental results show that the proposed SVASNet achieves significant improvements over the state-of-the-art methods.
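For context, the fixed-scale block sampling that this abstract contrasts against can be sketched as below: the image is cut into equal blocks and each flattened block x is measured as y = Φx. This is only an illustration of that baseline under stated assumptions, not SVASNet itself, which according to the abstract selects the block scale adaptively. The function names and the random Gaussian matrix are hypothetical choices made for the example.

```python
import random
from typing import List

Matrix = List[List[float]]


def split_into_blocks(image: Matrix, block: int) -> List[List[float]]:
    """Split an H x W image into non-overlapping block x block patches (row-major, flattened)."""
    height, width = len(image), len(image[0])
    if height % block or width % block:
        raise ValueError("image dimensions must be divisible by the block size")
    patches = []
    for top in range(0, height, block):
        for left in range(0, width, block):
            patches.append(
                [image[top + r][left + c] for r in range(block) for c in range(block)]
            )
    return patches


def gaussian_sampling_matrix(measurements: int, block: int, seed: int = 0) -> Matrix:
    """Build a random Gaussian sampling matrix of shape (measurements, block * block)."""
    rng = random.Random(seed)
    scale = 1.0 / measurements ** 0.5
    return [
        [rng.gauss(0.0, 1.0) * scale for _ in range(block * block)]
        for _ in range(measurements)
    ]


def sample_blocks(image: Matrix, phi: Matrix, block: int) -> List[List[float]]:
    """Compute y = phi @ x for every flattened block x of the image."""
    if any(len(row) != block * block for row in phi):
        raise ValueError("each row of phi must have block * block entries")
    return [
        [sum(p * x for p, x in zip(row, patch)) for row in phi]
        for patch in split_into_blocks(image, block)
    ]
```

The sampling ratio in this sketch is `measurements / (block * block)`; a scale-variable scheme would, in effect, choose `block` per image or region rather than fixing it in advance.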


Investigating the Impact of Cognitive Load on Human Trust in Hybrid Human-Robot Collaboration

December 2024 · 34 Reads

Human trust plays a crucial role in the effectiveness of human-robot collaboration. Despite its significance, the development and maintenance of an optimal trust level are obstructed by the complex nature of influencing factors and their mechanisms. This study investigates the effects of cognitive load on human trust within the context of a hybrid human-robot collaboration task. An experiment is conducted where the humans and the robot, acting as team members, collaboratively construct pyramids with differentiated levels of task complexity. Our findings reveal that cognitive load exerts diverse impacts on human trust in the robot. Notably, there is an increase in human trust under conditions of high cognitive load. Furthermore, the rewards for performance are substantially higher in tasks with high cognitive load compared to those with low cognitive load, and a significant correlation exists between human trust and the failure risk of performance in tasks with low and medium cognitive load. By integrating interdependent task steps, this research emphasizes the unique dynamics of hybrid human-robot collaboration scenarios. The insights gained not only contribute to understanding how cognitive load influences trust but also assist developers in optimizing collaborative target selection and designing more effective human-robot interfaces in such environments.



Citations (55)


... 3D panoptic scene understanding refers to the ability of computer systems to recognize both categorical "stuff" regions and individual "thing" instances within 3D visual scenes. This capability supports a range of applications (Siddiqui et al. 2023; Hui et al. 2023; Xie et al. 2024; Hui et al. 2024), such as augmented reality, virtual reality, robot navigation, and self-driving. ...

Reference:

Multi-view Consistent 3D Panoptic Scene Understanding
S²-CSNet: Scale-Aware Scalable Sampling Network for Image Compressive Sensing
  • Citing Conference Paper
  • October 2024

... An analysis of the twelve papers in this issue reveals that technological advancements now focus on developing algorithms to exploit data from existing systems [1,2] and incorporating sensorized devices [3,4]. Other studies explore applications involving Virtual Reality and exoskeletons [5][6][7]. ...

Estimation of Lower Limb Joint Angles Using sEMG Signals and RGB-D Camera

... This end-to-end learning approach enables CNNs to capture intricate patterns and relationships in CT images, leading to improved accuracy and generalization performance. By leveraging large-scale annotated datasets, CNNs can be trained to recognize subtle differences in image features and make accurate predictions, aiding radiologists in diagnosing and interpreting medical images (17). CNNs have been applied to a wide range of clinical tasks, including but not limited to: ...

Enhancing Pulmonary Nodule Detection Rate Using 3D Convolutional Neural Networks With Optical Flow Frame Insertion Technique

IEEE Access

... The work in [20,25,26,27] identifies the need for summarization in wireless capsule endoscopy and uses transfer learning techniques to lighten the burden at the end of health care professionals. Likewise, [28] generates a deep learning framework that should improve low-resolution luma images summarization in medical videos concerning data transmission issues in healthcare scenarios. As summarization models are advancing, multi-modal summarization [13,32,41], where video content is enriched with some other data modalities like text, audio, or sensor inputs, captures more attention. ...

Reducing Data Transmission Efficiency in Wireless Capsule Endoscopy through DL-CEndo Framework: Reconstructing Lossy Low-Resolution Luma Images and Improving Summarization

Mobile Networks and Applications

... A database based on the design of experiment (DOE) and response surface method (RSM) was created, and the Random Forests Method identified the system parameters, yielding excellent results related to the identification of systems with a force. The response surface methodology (RSM) is an optimization method that combines the response surface of an experimental sample dataset, provides the surface equation, and then solves the surface equation to obtain a set of optimal design variables [17]. ...

Multi-Objective Optimization of a Multi-Cavity, Significant Wall Thickness Difference Extrusion Profile Mold Design for New Energy Vehicles

... Compared with other attention methods, ECA(Att GC) more accurately determines the location of the objects of interest. By retaining the favorable parameter budget, CoTNet [28] uses the static and dynamic contextual information in input keys to guide self-attention learning, thus strengthening the capacity of visual representation. In addition to these works, many approaches attempt to extend attentional mechanisms to specific tasks, such as human pose reconstruction [29], medical image segmentation [30], saliency detection [31], machine translation [7], image restoration [32,33] and visual explanation [34][35][36]. ...

Stereo Image Restoration Via Attention-Guided Correspondence Learning
  • Citing Article
  • January 2024

IEEE Transactions on Pattern Analysis and Machine Intelligence

... Pointer meter reading recognition (PMRR) involves first determining the positions of the scale lines and pointer, thereby establishing a mapping relationship between the pointer's deflection angle and the meter reading [3,19]. The pointer meter reading is then calculated using either the angle method or the distance method. ...

Read Pointer Meters Based on a Human-Like Alignment and Recognition Algorithm
  • Citing Conference Paper
  • January 2024

Communications in Computer and Information Science
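The angle method mentioned in the excerpt above can be illustrated with a short sketch. It assumes a linear scale and that all angles are measured in the same direction from the same reference axis; the function and parameter names are hypothetical and do not come from the cited paper.

```python
def reading_from_angle(
    pointer_deg: float,
    min_scale_deg: float,
    max_scale_deg: float,
    min_value: float,
    max_value: float,
) -> float:
    """Linearly map the pointer's deflection angle to a meter reading.

    min_scale_deg and max_scale_deg are the angles of the first and last
    scale lines; min_value and max_value are the readings printed there.
    """
    span = max_scale_deg - min_scale_deg
    if span == 0:
        raise ValueError("the scale must cover a non-zero angular range")
    fraction = (pointer_deg - min_scale_deg) / span
    return min_value + fraction * (max_value - min_value)
```

For example, on an assumed dial whose scale runs from 45° to 315° and reads 0 to 10, a pointer at 180° sits halfway along the scale. Nonlinear dials would need a per-segment mapping between neighboring scale lines instead.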

... Using recognition algorithms to analyze sensor data, HAR is able to detect and classify specific human actions. This technique has significant application value in many fields such as healthcare, sports training and smart home [26][27][28]. ...

ActiveSelfHAR: Incorporating Self-Training Into Active Learning to Improve Cross-Subject Human Activity Recognition
  • Citing Article
  • January 2023

IEEE Internet of Things Journal

... Hu et al. [15] found that the secondary dendrite arm spacing (SDAS) decreased with increasing Er content for die-cast ADC12 aluminum alloys. Zhao et al. [16] and Xu et al. [17] studied the effects of different Sc addition levels (0.1 wt.%-0.45 wt.%) on the microstructure and mechanical properties of thixocast samples. It was found that a higher Sc content led to finer and rounder α-Al grains due to the heterogeneous nucleation promoted by Al3Sc precipitates. ...

Morphology evolution and growth mechanism of primary Al3Sc and eutectic Al3Sc in Al-Sc alloys
  • Citing Article
  • August 2023

Journal of Materials Research and Technology