Article

Deep similarity segmentation model for sensor-based activity recognition

Springer Nature
Multimedia Tools and Applications
Authors: AbdulRahman Baraka, Mohd Halim Mohd Noor

Abstract and Figures

Signal segmentation is a critical stage in activity recognition. Most existing studies adopted the fixed-size sliding window method for this stage. However, the fixed-size sliding window may not be the most effective segmentation method, since human activities have variable-length durations, particularly transitional activities. In this paper, we propose a novel deep similarity segmentation model that overcomes not only the limitations of the fixed sliding window method but also the weaknesses of threshold-based segmentation methods. Specifically, a novel deep learning model is designed to distinguish between transitional and basic activities by treating segmentation as a binary classification task. The proposed model accepts multiple sequential windows and automatically extracts local features for each window using convolutional neural networks. The temporal features of the windows are extracted by measuring the similarity and differentiation between the local features of adjacent windows. The local features are combined with the temporal features and passed to deep fully connected layers to distinguish transitional activity windows from basic activity windows. The evaluation relies on two public datasets, SBHARPT and FORTH-TRACE. According to the experimental findings, the proposed approach can distinguish between basic and transitional activities with accuracies of 98.51% and 98.41%, respectively. Additionally, our method outperformed the fixed sliding window for activity recognition by 2.93% and 2.24% on the two datasets, achieving accuracies of 93.35% and 84.96%, respectively. These results are significant and exceed the accuracy of state-of-the-art models.
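As context for the pipeline the abstract describes, the core idea can be sketched in a few lines of numpy: split the signal into sliding windows, extract per-window features, and measure the similarity between adjacent windows' features; a sharp drop in similarity hints at a transitional window. This is a minimal illustration only: the simple statistics below stand in for the paper's CNN-extracted local features, and the learned binary classifier itself is not reproduced.

```python
import numpy as np

def sliding_windows(signal, size, step):
    """Split a 1-D signal into fixed-size windows (the baseline method)."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, step)]

def local_features(window):
    """Per-window features. Simple statistics stand in here for the
    CNN-extracted local features used in the paper."""
    return np.array([window.mean(), window.std(), np.abs(np.diff(window)).mean()])

def adjacent_similarity(features):
    """Temporal features: cosine similarity between the local features of
    adjacent windows. A sharp drop suggests a transition between activities."""
    sims = []
    for a, b in zip(features[:-1], features[1:]):
        sims.append(float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)))
    return sims
```

On a synthetic stream that switches from a constant (static) segment to an oscillating one, the similarity between the windows straddling the change point drops sharply; this is the kind of cue the paper's binary classifier learns to exploit.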
Multimedia Tools and Applications (2025) 84:8869–8892
https://doi.org/10.1007/s11042-024-18933-2
Deep similarity segmentation model forsensor-based
activity recognition
AbdulRahmanBaraka1· Mohd HalimMohdNoor1
Received: 24 January 2023 / Revised: 24 January 2024 / Accepted: 13 March 2024 /
Published online: 4 May 2024
© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024
Keywords Signal segmentation· Deep learning· Transitional activity
* Mohd Halim Mohd Noor
halimnoor@usm.my
AbdulRahman Baraka
abarakeh@qou.edu
1 School ofComputer Sciences, Universiti Sains Malaysia, Gelugor, PulauPinang, Malaysia
... In this case they evaluated their approach using leave-one-subject-out cross-validation, which is equivalent to a UIM (user-independent model). In the work of Baraka and Mohd Noor (2024), they split the training and test data into different user groups. For example, for one of the datasets, 23 subjects were used for the training set and 7 for the test set. ...
Article
Full-text available
Typically, machine learning classifiers are trained and evaluated without making any distinction between users (e.g., using traditional hold-out and cross-validation). However, this produces inaccurate estimates of performance metrics in multi-user settings, that is, situations where the data were collected by multiple users with different characteristics (e.g., age, gender, height), which is very common in user-computer interaction and medical applications. For these types of scenarios, model evaluation strategies that provide better performance estimates have been proposed, such as mixed, user-independent, user-dependent, and user-adaptive models. Although those strategies are better suited for multi-user systems, they are typically assessed with respect to performance metrics that capture the overall behavior of the models and do not provide any performance guarantees for individual predictions, nor do they provide any feedback about the predictions' uncertainty. In order to overcome those limitations, in this work we evaluated the conformal prediction framework in several multi-user settings. Conformal prediction is a model-agnostic method that provides confidence guarantees on the predictions, thus increasing the trustworthiness and robustness of the models. We propose a new type of benchmark model (user-calibrated) and conducted extensive experiments using different evaluation strategies, finding significant differences in terms of conformal performance measures. Our results show the importance of taking into account different evaluation strategies in multi-user systems. We also propose several visualizations based on matrices, graphs, and charts that capture different aspects of the prediction sets. These visualizations allow for a more fine-grained analysis compared to traditional plots such as confusion matrices.
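For readers unfamiliar with the framework, split conformal prediction can be sketched in a few lines of numpy. The nonconformity score and calibration scheme below are the standard textbook choices, not necessarily the exact setup evaluated in the article above.

```python
import numpy as np

def conformal_sets(cal_probs, cal_labels, test_probs, alpha=0.1):
    """Split conformal prediction: calibrate a score threshold on held-out
    data, then emit prediction sets with roughly (1 - alpha) coverage."""
    # Nonconformity score: 1 minus the probability assigned to the true class.
    scores = 1.0 - cal_probs[np.arange(len(cal_labels)), cal_labels]
    n = len(scores)
    # Finite-sample corrected quantile of the calibration scores.
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q = np.quantile(scores, level)
    # A class enters the prediction set if its score is below the threshold.
    return [np.where(1.0 - p <= q)[0].tolist() for p in test_probs]
```

With a confident, well-calibrated classifier the sets are small (often a single class); under uncertainty they grow, which is exactly the per-prediction feedback the article argues traditional metrics lack.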
Article
Full-text available
The fixed sliding window is the commonly used technique for signal segmentation in human activity recognition. However, the fixed sliding window may not produce optimal segmentation because human activities have varying durations, especially transitional activities. This is because a large window size may contain activity signals belonging to different activities, and a small window size may split the activity signal into multiple windows. Furthermore, the fixed sliding window does not consider the relationship between adjacent windows, which may affect the performance of the human activity recognition model. In this study, we propose a similarity segmentation approach that exploits the temporal structure of the activity signal within the window segmentation process. Specifically, the proposed approach segments each window into sub-windows and extracts the inner features by measuring the similarity between them. The inner features are used to measure the dissimilarity between adjacent windows. The proposed approach is able to distinguish between transitional and non-transitional windows, which achieves more effective segmentation and classification processes. Two public datasets are used for the evaluation. The experimental results show that the proposed approach can distinguish transitional activities from basic activities with 97.65% accuracy, improving the recognition of transitional activities over the fixed sliding window by 33.41%. Also, our approach achieved activity recognition accuracies of 92.71% and 86.65% on the two datasets, exceeding the fixed sliding window by 2.29% and 3.93%, respectively. These results are significant and exceed the accuracy of state-of-the-art models.
Article
Full-text available
Sit-to-stand transitions are an important part of activities of daily living and play a key role in functional mobility in humans. The sit-to-stand movement is often affected in older adults due to frailty and in patients with motor impairments such as Parkinson's disease, leading to falls. Studying the kinematics of sit-to-stand transitions can provide insight for assessment, monitoring and developing rehabilitation strategies for the affected populations. We propose a three-segment body model for estimating sit-to-stand kinematics using only two wearable inertial sensors, placed on the shank and back. Reducing the number of sensors to two instead of one per body segment facilitates monitoring and classifying movements over extended periods, making the setup more comfortable to wear while reducing the power requirements of the sensors. We applied this model to 10 younger healthy adults (YH), 12 older healthy adults (OH) and 12 people with Parkinson's disease (PwP). We achieved this by incorporating a unique sit-to-stand classification technique using unsupervised learning into the model-based reconstruction of angular kinematics using an extended Kalman filter. Our proposed model showed that it was possible to successfully estimate thigh kinematics despite not measuring thigh motion with an inertial sensor. We classified sit-to-stand transitions, sitting and standing states with accuracies of 98.67%, 94.20% and 91.41% for YH, OH and PwP respectively. We have proposed a novel integrated approach of modelling and classification for estimating body kinematics during sit-to-stand motion and successfully applied it to the YH, OH and PwP groups.
Article
Full-text available
In recent years, much research has been conducted on time-series-based human activity recognition (HAR) using wearable sensors. Most existing work on HAR is based on manual labeling. However, complete time-series signals not only contain different types of activities, but also include many transitional and atypical ones. Thus, effectively filtering out these activities has become a significant problem. In this paper, a novel machine learning based segmentation scheme with a multi-probability threshold is proposed for HAR. Threshold segmentation (TS) and slope-area (SA) approaches are employed according to the characteristics of the small fluctuation of static activity signals and the typical peaks and troughs of periodic-like ones. In addition, a multi-label weighted probability (MLWP) model is proposed to estimate the probability of each activity. The HAR error can be significantly decreased, as the proposed model solves the problem that a fixed window usually contains multiple kinds of activities, while unknown activities can be accurately rejected to reduce their impact. Compared with other existing schemes, computer simulation reveals that the proposed model maintains high performance on the UCI and PAMAP2 datasets. The average HAR accuracies reach 97.71% and 95.93%, respectively.
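The threshold segmentation (TS) idea described above, where small within-window fluctuation marks a static activity, can be sketched in a few lines; the window size and threshold here are illustrative assumptions, not values from the paper.

```python
import numpy as np

def threshold_segment(signal, win, thr):
    """Threshold segmentation (TS) sketch: a window with low standard
    deviation is labeled static; larger fluctuation marks a dynamic,
    periodic-like activity."""
    labels = []
    for i in range(0, len(signal) - win + 1, win):
        labels.append('static' if signal[i:i + win].std() < thr else 'dynamic')
    return labels
```

In practice the threshold must be tuned per sensor and placement, which is one motivation for the learned, threshold-free segmentation proposed in the main article.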
Article
Full-text available
Human activity recognition has gained interest from the research community due to advancements in sensor technology and improved machine learning algorithms. Wearable sensors have become more ubiquitous, and most wearable sensor data contain rich temporal structural information that describes the distinct underlying patterns and relationships of various activity types. The nature of those activities is typically sequential, with each subsequent activity window being the result of the preceding activity window. However, state-of-the-art methods usually model the temporal characteristics of the sensor data and ignore the relationship between sliding windows. This research proposes a novel deep temporal Conv-LSTM architecture to enhance activity recognition performance by utilizing both the temporal characteristics of sensor data and the relationship between sliding windows. The proposed architecture is evaluated on a dataset that includes transition activities, the Smartphone-Based Recognition of Human Activities and Postural Transitions dataset. The proposed hybrid architecture with parallel feature learning pipelines has demonstrated the ability to model the temporal relationship of the activity windows, capturing transitions between activities accurately. Besides that, the size of the sliding window is studied, and the results show that the choice of window size affects the accuracy of activity recognition. The proposed deep temporal Conv-LSTM architecture achieves an accuracy score of 0.916, which outperforms the state-of-the-art accuracy.
Article
Full-text available
Computing devices that can recognize various human activities or movements can be used to assist people in healthcare, sports, or human–robot interaction. Readily available data for this purpose can be obtained from the accelerometer and the gyroscope built into everyday smartphones. Effective classification of real-time activity data is, therefore, actively pursued using various machine learning methods. In this study, the transformer model, a deep learning neural network model developed primarily for natural language processing and vision tasks, was adapted for time-series analysis of motion signals. The self-attention mechanism inherent in the transformer, which expresses individual dependencies between signal values within a time series, can match the performance of state-of-the-art convolutional neural networks with long short-term memory. The proposed adapted transformer method was tested on the largest available public dataset of smartphone motion sensor data covering a wide range of activities and obtained an average identification accuracy of 99.2%, compared with 89.67% achieved on the same data by a conventional machine learning method. The results suggest the expected future relevance of the transformer model for human activity recognition.
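The self-attention mechanism the abstract refers to reduces, in its simplest single-head form, to scaled dot-product attention over the time steps of a signal. A minimal numpy sketch (the projection matrices would normally be learned, and real transformers add multiple heads, positional encodings, and feed-forward layers):

```python
import numpy as np

def self_attention(x, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over a time series:
    each time step attends to every other step."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    # Similarity of every query step with every key step, scaled for stability.
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # Numerically stable softmax over each row of scores.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output step is a weighted mixture of all value vectors.
    return weights @ v
```

The attention weights make the "individual dependencies between signal values" mentioned above explicit: every output time step is a convex combination of all input steps.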
Article
Full-text available
Mobile and wearable devices have enabled numerous applications, including activity tracking, wellness monitoring, and human–computer interaction, that measure and improve our daily lives. Many of these applications are made possible by leveraging the rich collection of low-power sensors found in many mobile and wearable devices to perform human activity recognition (HAR). Recently, deep learning has greatly pushed the boundaries of HAR on mobile and wearable devices. This paper systematically categorizes and summarizes existing work that introduces deep learning methods for wearables-based HAR and provides a comprehensive analysis of the current advancements, developing trends, and major challenges. We also present cutting-edge frontiers and future directions for deep learning-based HAR.
Article
Full-text available
In recent years, a plethora of algorithms have been devised for efficient human activity recognition. Most of these algorithms consider basic human activities and neglect postural transitions because of their subsidiary occurrence and short duration. However, postural transitions play a significant role in an activity recognition framework and cannot be neglected. This work proposes a hybrid multi-model activity recognition approach that handles basic and transition activities by utilizing multiple deep learning models simultaneously. For the final classification, a dynamic decision fusion module is introduced. The experiments are performed on publicly available datasets. The proposed approach achieved a classification accuracy of 96.11% and 98.38% for the transition and basic activities, respectively. The outcomes show that the proposed method is superior to the state-of-the-art methods in terms of accuracy and precision.
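The decision fusion described above can be illustrated by its simplest static special case: a weighted average of per-model class probabilities. The fixed weights here are an illustrative assumption; the paper's fusion module chooses them dynamically.

```python
import numpy as np

def fuse_predictions(model_probs, weights):
    """Late decision fusion sketch: weighted average of each model's class
    probabilities, then argmax for the final label."""
    fused = np.average(np.stack(model_probs), axis=0, weights=weights)
    return fused, int(np.argmax(fused))
```

Because each input is a probability distribution and the weights are normalized, the fused vector is again a valid distribution over classes.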
Article
Deep learning for sensor-based human activity recognition (HAR) has been a focus of research in recent years. Sensor data stream segmentation is a core element in HAR, which has usually been treated as an independent preprocessing task, typically with a fixed-size window. This has led to two critical problems, namely the multi-class window problem caused by possible multiple activities within a fixed-size window and the fluctuation of prediction results due to noisy data and over-segmentation. To address these research challenges, in this paper we conceive a novel multi-task deep learning approach to segmenting and recognizing human activity simultaneously. Specifically, we propose a multi-scale window method based on feature sequence generation to overcome the multi-class window problem. We develop a novel boundary offset prediction algorithm that adjusts a window's boundary to tackle the over-segmentation issue. In addition, we design a multi-task framework to streamline and optimize the activity recognition and segmentation tasks simultaneously. We conduct extensive experiments on eight benchmark datasets to evaluate the proposed framework and associated methods. Initial results show that our approach outperforms current state-of-the-art HAR methods.
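The multi-scale window method mentioned above can be sketched, in its simplest form, as generating candidate windows at several sizes from one stream; the sizes and step below are illustrative assumptions, and the paper's feature-sequence generation and boundary refinement are not reproduced.

```python
import numpy as np

def multi_scale_windows(signal, sizes, step):
    """Generate candidate segments at several window sizes so that no single
    fixed-size window has to cover activities of very different durations."""
    return {s: [signal[i:i + s] for i in range(0, len(signal) - s + 1, step)]
            for s in sizes}
```

A downstream model can then score candidates at every scale and keep the windows whose boundaries best match the actual activity durations.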
Article
Human Activity Recognition (HAR) plays a significant role in the everyday life of people because of its ability to learn extensive high-level information about human activity from wearable or stationary devices. A substantial amount of research has been conducted on HAR, and numerous approaches based on deep learning have been exploited by the research community to classify human activities. The main goal of this review is to summarize recent works based on a wide range of deep neural network architectures, namely convolutional neural networks (CNNs), for human activity recognition. The reviewed systems are clustered into four categories depending on the use of input devices like multimodal sensing devices, smartphones, radar, and vision devices. This review describes the performance, strengths, weaknesses, and hyperparameters of the CNN architectures for each reviewed system, with an overview of available public data sources. In addition, a discussion of the current challenges to CNN-based HAR systems is presented. Finally, this review is concluded with some potential future directions that would be of great assistance to researchers who would like to contribute to this field. We conclude that CNN-based approaches are suitable for effective and accurate human activity recognition system applications despite challenges including the availability of data regarding composite or group activities, high computational resource requirements, data privacy concerns, and edge computing limitations. For widespread adoption, future research should focus on more efficient edge computing techniques, datasets incorporating contextual information with activities, more explainable methodologies, and more robust systems.