Article

Enhanced AIoT Multi-Modal Fusion for Human Activity Recognition in Ambient Assisted Living Environment

Abstract

Methodology: Human activity recognition (HAR) has emerged as a fundamental capability in various disciplines, including ambient assisted living, healthcare, human-computer interaction, etc. This study proposes a novel approach for activity recognition by integrating IoT technologies with artificial intelligence and edge computing. This work presents a fusion HAR approach that combines data readings from wearable sensors, such as accelerometers and gyroscopes, with images captured by vision-based sensors, such as cameras, incorporating the capabilities of Long Short-Term Memory (LSTM) and Convolutional Neural Network (CNN) models. The aim of fusing these models is to capture and extract the temporal and spatial information, improving the accuracy and resilience of activity identification systems. The work uses the CNN model to extract spatial features from the images that represent the contextual information of the activities, and the LSTM model to process sequential accelerometer and gyroscope data and extract the temporal dynamics of the human activities.

Results: The performance of our fusion approach is evaluated through different experiments using varying parameters, and the best-suited parameters are applied to our model. The results demonstrate that the fusion of LSTM and CNN models outperforms standalone models and traditional fusion methods, achieving an accuracy of 98%, which is almost 9% higher than the standalone models.

Conclusion: The fusion of LSTM and CNN models enables the integration of complementary information from both data sources, leading to improved performance. The computation tasks are performed on the local edge device, resulting in enhanced privacy and reduced latency. Our approach greatly impacts real-world applications where accurate and reliable HAR systems are essential for enhancing human-machine interaction and monitoring human activities in various domains.
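As a rough illustration of the fusion idea described in the abstract, the sketch below builds a two-branch Keras network: a small CNN branch for camera frames and an LSTM branch for windowed accelerometer/gyroscope readings, with the two embeddings concatenated before classification. The layer sizes, window length, image shape, and class count are placeholder assumptions, not the authors' configuration.

```python
# Minimal two-branch fusion sketch (hypothetical sizes, not the authors' exact model).
import tensorflow as tf
from tensorflow.keras import layers, Model

NUM_CLASSES = 6            # placeholder number of activities
WINDOW, CHANNELS = 128, 6  # accelerometer (3) + gyroscope (3) samples per window
IMG_SHAPE = (64, 64, 3)    # placeholder frame size

# Spatial branch: CNN over a camera frame.
img_in = layers.Input(shape=IMG_SHAPE, name="frame")
x = layers.Conv2D(16, 3, activation="relu")(img_in)
x = layers.MaxPooling2D()(x)
x = layers.Conv2D(32, 3, activation="relu")(x)
x = layers.GlobalAveragePooling2D()(x)

# Temporal branch: LSTM over the inertial window.
imu_in = layers.Input(shape=(WINDOW, CHANNELS), name="imu")
y = layers.LSTM(64)(imu_in)

# Feature-level fusion: concatenate both embeddings, then classify.
fused = layers.concatenate([x, y])
out = layers.Dense(NUM_CLASSES, activation="softmax")(fused)

model = Model(inputs=[img_in, imu_in], outputs=out)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.summary()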

Article
Full-text available
The importance of radar-based human activity recognition has increased significantly over the last two decades in safety and smart surveillance applications due to its superiority over vision-based sensing in the presence of poor environmental conditions like low illumination, increased radiative heat, occlusion, and fog. Increased public sensitivity to privacy protection and the progress of cost-effective manufacturing have led to higher acceptance and distribution of this technology. Deep learning approaches have proven that manual feature extraction, which relies heavily on process knowledge, can be avoided due to their hierarchical, non-descriptive nature. On the other hand, ML techniques based on manual feature extraction provide a robust, yet empirically based approach, where the computational effort is comparatively low. This review outlines the basics of classical ML- and DL-based human activity recognition and its advances, taking the recent progress in both categories into account. For every category, state-of-the-art methods are introduced, briefly explained, and their related works summarized. A comparative study is performed to evaluate the performance and computational effort based on a benchmarking dataset to provide a common basis for the assessment of the techniques' degrees of suitability.
Article
Full-text available
Smart living, a concept that has gained increasing attention in recent years, revolves around integrating advanced technologies in homes and cities to enhance the quality of life for citizens. Sensing and human action recognition are crucial aspects of this concept. Smart living applications span various domains, such as energy consumption, healthcare, transportation, and education, which greatly benefit from effective human action recognition. This field, originating from computer vision, seeks to recognize human actions and activities using not only visual data but also many other sensor modalities. This paper comprehensively reviews the literature on human action recognition in smart living environments, synthesizing the main contributions, challenges, and future research directions. This review selects five key domains, i.e., Sensing Technology, Multimodality, Real-time Processing, Interoperability, and Resource-Constrained Processing, as they encompass the critical aspects required for successfully deploying human action recognition in smart living. These domains highlight the essential role that sensing and human action recognition play in successfully developing and implementing smart living solutions. This paper serves as a valuable resource for researchers and practitioners seeking to further explore and advance the field of human action recognition in smart living.
Article
Full-text available
Human activity recognition (HAR) is becoming increasingly important, especially with the growing number of elderly people living at home. However, most sensors, such as cameras, do not perform well in low-light environments. To address this issue, we designed a HAR system that combines a camera and a millimeter wave radar, taking advantage of each sensor and a fusion algorithm to distinguish between confusing human activities and to improve accuracy in low-light settings. To extract the spatial and temporal features contained in the multisensor fusion data, we designed an improved CNN-LSTM model. In addition, three data fusion algorithms were studied and investigated. Compared to camera data in low-light environments, the fusion data significantly improved the HAR accuracy by at least 26.68%, 19.87%, and 21.92% under the data level fusion algorithm, feature level fusion algorithm, and decision level fusion algorithm, respectively. Moreover, the data level fusion algorithm also resulted in a reduction of the best misclassification rate to 2%~6%. These findings suggest that the proposed system has the potential to enhance the accuracy of HAR in low-light environments and to decrease human activity misclassification rates.
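Of the three fusion levels studied above, decision-level fusion is the simplest to write down: the class-probability outputs of two already-trained modality models are combined with (possibly weighted) averaging. The snippet below is only an illustration of that mechanism; the model outputs and weights are placeholders, not the paper's algorithms.

```python
# Decision-level fusion sketch: combine per-modality class probabilities.
import numpy as np

def decision_level_fusion(p_camera, p_radar, w_camera=0.5, w_radar=0.5):
    """p_camera, p_radar: arrays of shape (n_samples, n_classes) with softmax outputs."""
    fused = w_camera * p_camera + w_radar * p_radar
    return fused.argmax(axis=1)

# Example with random probabilities standing in for real model outputs.
rng = np.random.default_rng(0)
p_cam = rng.dirichlet(np.ones(5), size=10)   # 10 samples, 5 activity classes
p_rad = rng.dirichlet(np.ones(5), size=10)
print(decision_level_fusion(p_cam, p_rad))
```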
Article
Full-text available
Convolutional neural networks (CNNs) have demonstrated exceptional results in the analysis of time-series data when used for Human Activity Recognition (HAR). The manual design of such neural architectures is an error-prone and time-consuming process. The search for optimal CNN architectures is considered a revolution in the design of neural networks. By means of Neural Architecture Search (NAS), network architectures can be designed and optimized automatically. Thus, the optimal CNN architecture representation can be found automatically because of its ability to overcome the limitations of human experience and thinking modes. Evolutionary algorithms, which are derived from evolutionary mechanisms such as natural selection and genetics, have been widely employed to develop and optimize NAS because they can handle a black-box optimization process for designing appropriate solution representations and search paradigms without explicit mathematical formulations or gradient information. The genetic algorithm (GA) is widely used to find optimal or near-optimal solutions for difficult problems. Considering these characteristics, an efficient human activity recognition architecture (AUTO-HAR) is presented in this study. Using the evolutionary GA to select the optimal CNN architecture, the current study proposes a novel encoding schema structure and a novel search space with a much broader range of operations to effectively search for the best architectures for HAR tasks. In addition, the proposed search space provides a reasonable degree of depth because it does not limit the maximum length of the devised task architecture. To test the effectiveness of the proposed framework for HAR tasks, three datasets were utilized: UCI-HAR, Opportunity, and DAPHNET. Based on the results of this study, it has been found that the proposed method can efficiently recognize human activity with an average accuracy of 98.5% (±1.1), 98.3%, and 99.14% (±0.8) for UCI-HAR, Opportunity, and DAPHNET, respectively.
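To make the GA-driven search described above more tangible, here is a toy genetic loop over a simple CNN encoding (selection, crossover, mutation). The encoding, search space, and fitness function are stand-ins; AUTO-HAR's actual encoding schema and training-based fitness are far richer.

```python
# Toy genetic search over a CNN encoding (illustrative only; fitness is a stub).
import random

SEARCH_SPACE = {"n_conv": [1, 2, 3, 4], "filters": [16, 32, 64, 128], "kernel": [3, 5, 7, 9]}

def random_genome():
    return {k: random.choice(v) for k, v in SEARCH_SPACE.items()}

def fitness(genome):
    # Stub: in a real NAS loop this would train/evaluate the decoded CNN on a HAR dataset.
    return -abs(genome["n_conv"] - 3) - abs(genome["filters"] - 64) / 64

def crossover(a, b):
    return {k: random.choice([a[k], b[k]]) for k in SEARCH_SPACE}

def mutate(genome, rate=0.2):
    return {k: (random.choice(v) if random.random() < rate else genome[k])
            for k, v in SEARCH_SPACE.items()}

population = [random_genome() for _ in range(10)]
for generation in range(20):
    population.sort(key=fitness, reverse=True)
    parents = population[:4]                          # truncation selection
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(len(population) - len(parents))]
    population = parents + children

print("best architecture encoding:", max(population, key=fitness))
```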
Article
Full-text available
The discovery of several machine learning and deep learning techniques has paved the way to extend the reach of humans in various real-world applications. Classical machine learning algorithms assume that training, validation, and testing data come from the same domain, with similar input feature spaces and data distribution characteristics. In some real-world exercises, where data collection has become difficult, the above assumption does not hold true. Even if possible, the scarcity of rightful data prevents the model from being successfully trained. Compensating for outdated data, reducing the need and hardship of recollecting the training data, avoiding many expensive data labeling efforts, and improving the foreseen accuracy of testing data are some significant contributions of transfer learning in real-world applications. The most cited transfer learning applications include classification, regression, and clustering problems in activity recognition, image and video classification, Wi-Fi localization, detection and tracking, sentiment analysis and classification, and web-document classification. Human activity recognition plays a cardinal role in human-to-human and human-to-object interaction and interpersonal relations. Paired with robust deep learning algorithms and improved hardware technologies, automatic recognition of human activity has opened the door in the direction of constructing a smart society. To the best of our knowledge, our survey is the first to link machine learning, transfer learning, and vision sensor-based activity recognition under one roof. This survey explores that connection by reviewing around 350 related research articles from 2011 to 2021. Findings indicate an approximate 15% increment in research publications connected to our topic every year. Among these reviewed articles, we have selected around 150 significant ones that give insights into various activity levels, classification techniques, performance measures, challenges, and future directions related to transfer learning enhanced vision sensor-based HAR.
Article
Full-text available
Human Activity Recognition (HAR) is an important research area in human–computer interaction and pervasive computing. In recent years, many deep learning (DL) methods have been widely used for HAR, and due to their powerful automatic feature extraction capabilities, they achieve better recognition performance than traditional methods and are applicable to more general scenarios. However, the problem is that DL methods increase the computational cost of the system and take up more system resources while achieving higher recognition accuracy, which is more challenging for its operation in small memory terminal devices such as smartphones. So, we need to reduce the model size as much as possible while taking into account the recognition accuracy. To address this problem, we propose a multi-scale feature extraction fusion model combining Convolutional Neural Network (CNN) and Gated Recurrent Unit (GRU). The model uses different convolutional kernel sizes combined with GRU to accomplish the automatic extraction of different local features and long-term dependencies of the original data to obtain a richer feature representation. In addition, the proposed model uses separable convolution instead of classical convolution to meet the requirement of reducing model parameters while improving recognition accuracy. The accuracy of the proposed model is 97.18%, 96.71%, and 96.28% on the WISDM, UCI-HAR, and PAMAP2 datasets respectively. The experimental results show that the proposed model not only obtains higher recognition accuracy but also costs lower computational resources compared with other methods.
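The parameter-saving idea above is easy to sketch: depthwise-separable 1-D convolutions with two different kernel sizes feed a GRU. The dimensions below are illustrative assumptions, not the paper's configuration.

```python
# Separable-convolution + GRU sketch (illustrative sizes, not the paper's exact model).
from tensorflow.keras import layers, Model

inp = layers.Input(shape=(128, 9))                       # window of 9 sensor channels
a = layers.SeparableConv1D(32, 3, padding="same", activation="relu")(inp)
b = layers.SeparableConv1D(32, 7, padding="same", activation="relu")(inp)
x = layers.concatenate([a, b])                           # multi-scale local features
x = layers.GRU(64)(x)                                    # long-term dependencies
out = layers.Dense(6, activation="softmax")(x)

model = Model(inp, out)
model.summary()  # separable convolutions use far fewer parameters than standard Conv1D
```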
Article
Full-text available
Time series classification is an active research topic due to its wide range of applications and the proliferation of sensory data. Convolutional neural networks (CNNs) are ubiquitous in modern machine learning (ML) models. In this work, we present a matched filter (MF) interpretation of CNN classifiers accompanied by an experimental proof of concept using a carefully developed synthetic dataset. We exploit this interpretation to develop an MF CNN model for time series classification comprising a stack of a Conv1D layer followed by a GlobalMaxPooling layer acting as a typical MF for automated feature extraction and a fully connected layer with softmax activation for computing class probabilities. The presented interpretation enables developing superlight highly accurate classifier models that meet the tight requirements of edge inference. Edge inference is emerging research that addresses the latency, availability, privacy, and connectivity concerns of the commonly deployed cloud inference. The MF-based CNN model has been applied to the sensor-based human activity recognition (HAR) problem due to its significant importance in a broad range of applications. The UCI-HAR, WISDM-AR, and MotionSense datasets are used for model training and testing. The proposed classifier is tested and benchmarked on an android smartphone with average accuracy and F1 scores of 98% and 97%, respectively, which outperforms state-of-the-art HAR methods in terms of classification accuracy and run-time performance. The proposed model size is less than 150 KB, and the average inference time is less than 1 ms. The presented interpretation helps develop a better understanding of CNN operation and decision mechanisms. The proposed model is distinguished from related work by jointly featuring interpretability, high accuracy, and low computational cost, enabling its ready deployment on a wide set of mobile devices for a broad range of applications.
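The stated layer stack is compact enough to write down directly. This Keras sketch follows the Conv1D → GlobalMaxPooling → softmax layout described above; the filter count, kernel length, window size, and class count are assumptions.

```python
# Matched-filter style CNN sketch: Conv1D bank + global max pooling + softmax.
from tensorflow.keras import layers, Sequential

model = Sequential([
    layers.Input(shape=(128, 3)),               # e.g. a 128-sample accelerometer window
    layers.Conv1D(32, 16, activation="relu"),   # each filter acts like a matched filter
    layers.GlobalMaxPooling1D(),                # peak filter response = detection statistic
    layers.Dense(6, activation="softmax"),      # class probabilities
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
```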
Article
Full-text available
The technological advancement in sensor technology and pervasive computing has brought smart devices into our daily life. Due to the continuous connectivity of the internet with our everyday devices, researchers can deploy IoT sensors to health care and other applications, such as human activity recognition. Most of the state-of-the-art sensor-based human activity recognition systems can detect basic activities (such as standing, sitting, and walking), but they cannot accurately distinguish similar activities (ascending stairs or descending stairs). Such systems are not efficient for critical healthcare applications having complex activity sets. This paper proposes two sensor fusion approaches, i.e., position-based early and late sensor fusion using convolutional neural network (CNN) and convolutional long-short-term memory (CNN-LSTM). The performance of our proposed models is evaluated on two publicly available datasets. We also evaluated the effect of different normalization techniques on recognition accuracy. Our results show that the CNN-LSTM-based late sensor fusion model also improves the recognition accuracy of similar activities.
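As a rough sketch of the two fusion points compared above, the snippet below contrasts early fusion (stacking the channels of two sensor positions before a single CNN) with late fusion (one branch per position, embeddings concatenated before the classifier). The sensor positions, window length, and layer sizes are placeholders, and only the CNN variant is shown; the paper also evaluates CNN-LSTM branches.

```python
# Early vs. late sensor fusion sketch (placeholder shapes; not the authors' exact models).
from tensorflow.keras import layers, Model

WINDOW = 128
wrist = layers.Input(shape=(WINDOW, 6), name="wrist")   # hypothetical sensor position
ankle = layers.Input(shape=(WINDOW, 6), name="ankle")   # hypothetical sensor position

# Early fusion: concatenate raw channels, one shared CNN.
early = layers.concatenate([wrist, ankle], axis=-1)
e = layers.Conv1D(32, 5, activation="relu")(early)
e = layers.GlobalMaxPooling1D()(e)
early_model = Model([wrist, ankle], layers.Dense(6, activation="softmax")(e))

# Late fusion: a CNN branch per position, embeddings merged before the classifier.
def branch(x):
    x = layers.Conv1D(32, 5, activation="relu")(x)
    return layers.GlobalMaxPooling1D()(x)

late = layers.concatenate([branch(wrist), branch(ankle)])
late_model = Model([wrist, ankle], layers.Dense(6, activation="softmax")(late))
```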
Article
Full-text available
Human Activity Recognition (HAR) systems are devised for continuously observing human behavior, primarily in the fields of environmental compatibility, sports injury detection, senior care, rehabilitation, entertainment, and surveillance in intelligent home settings. Inertial sensors, e.g., accelerometers, linear acceleration, and gyroscopes, are frequently employed for this purpose and are now compacted into smart devices, e.g., smartphones. Since the use of smartphones is so widespread nowadays, activity data acquisition for HAR systems is a pressing need. In this article, we have conducted smartphone sensor-based raw data collection, namely H-Activity, using an Android-OS-based application for accelerometer, gyroscope, and linear acceleration. Furthermore, a hybrid deep learning model is proposed, coupling a convolutional neural network and a long short-term memory network (CNN-LSTM), empowered by the self-attention algorithm to enhance the predictive capabilities of the system. In addition to our collected dataset (H-Activity), the model has been evaluated with some benchmark datasets, e.g., MHEALTH and UCI-HAR, to demonstrate the comparative performance of our model. When compared to other models, the proposed model has an accuracy of 99.93% using our collected H-Activity data, and 98.76% and 93.11% using data from the MHEALTH and UCI-HAR databases, respectively, indicating its efficacy in human activity recognition. We hope that our developed model could be applicable in clinical settings and that the collected data could be useful for further research.
Article
Full-text available
In the current era of rapid technological innovation, human activity recognition (HAR) has emerged as a principal research area in the field of multimedia information retrieval. The capacity to monitor people remotely is a main determinant of HAR’s central role. Multiple gyroscope and accelerometer sensors can be used to aggregate data that can be used to recognise human activities—one of the key research objectives of this study. Optimal results are attained through the use of deep learning models to carry out HAR in the collected data. We propose the use of a hierarchical multi-resolution convolutional neural network in combination with gated recurrent units (GRUs). We conducted an experiment on the mHealth and UCI data sets, the results of which demonstrate the efficiency of the proposed model, as it achieved acceptable accuracies: 99.35% in the mHealth data set and 94.50% in the UCI data set.
Article
Full-text available
Differentiation between the various non-small-cell lung cancer subtypes is crucial for providing an effective treatment to the patient. For this purpose, machine learning techniques have been used in recent years over the available biological data from patients. However, in most cases this problem has been treated using a single-modality approach, not exploring the potential of the multi-scale and multi-omic nature of cancer data for the classification. In this work, we study the fusion of five multi-scale and multi-omic modalities (RNA-Seq, miRNA-Seq, whole-slide imaging, copy number variation, and DNA methylation) by using a late fusion strategy and machine learning techniques. We train an independent machine learning model for each modality and we explore the interactions and gains that can be obtained by fusing their outputs in an increasing manner, by using a novel optimization approach to compute the parameters of the late fusion. The final classification model, using all modalities, obtains an F1 score of 96.81±1.07, an AUC of 0.993±0.004, and an AUPRC of 0.980±0.016, improving those results that each independent model obtains and those presented in the literature for this problem. These obtained results show that leveraging the multi-scale and multi-omic nature of cancer data can enhance the performance of single-modality clinical decision support systems in personalized medicine, consequently improving the diagnosis of the patient.
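The late fusion strategy described above combines per-modality probabilities with learned weights. The sketch below shows the general idea with a simple random search over the weight simplex on a validation set; the paper uses its own dedicated optimization approach, so this is only an illustration with placeholder inputs.

```python
# Sketch of weighted late fusion with a random search over fusion weights (illustrative only).
import numpy as np
from sklearn.metrics import f1_score

def fuse(prob_list, weights):
    """prob_list: list of (n_samples, n_classes) arrays, one per modality."""
    return sum(w * p for w, p in zip(weights, prob_list))

def search_weights(prob_list, y_true, n_trials=2000, seed=0):
    rng = np.random.default_rng(seed)
    best_w, best_f1 = None, -1.0
    for _ in range(n_trials):
        w = rng.dirichlet(np.ones(len(prob_list)))        # random point on the simplex
        f1 = f1_score(y_true, fuse(prob_list, w).argmax(axis=1), average="macro")
        if f1 > best_f1:
            best_w, best_f1 = w, f1
    return best_w, best_f1
```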
Article
Full-text available
In recent years, Human Activity Recognition (HAR) has become one of the most important research topics in the domains of health and human-machine interaction. Many artificial-intelligence-based models have been developed for activity recognition; however, these algorithms fail to extract spatial and temporal features, which leads to poor performance on real-world long-term HAR. Furthermore, in the literature, a limited number of datasets are publicly available for physical activity recognition, and they contain only a small number of activities. Considering these limitations, we develop a hybrid model by incorporating a Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) for activity recognition, where the CNN is used for spatial feature extraction and the LSTM network is utilized for learning temporal information. Additionally, a new challenging dataset is generated that is collected from 20 participants using the Kinect V2 sensor and contains 12 different classes of human physical activities. An extensive ablation study is performed over different traditional machine learning and deep learning models to obtain the optimum solution for HAR. An accuracy of 90.89% is achieved via the CNN-LSTM technique, which shows that the proposed model is suitable for HAR applications.
Article
Full-text available
This paper presents a wearable device, fitted on the waist of a participant that recognizes six activities of daily living (walking, walking upstairs, walking downstairs, sitting, standing, and laying) through a deep-learning algorithm, human activity recognition (HAR). The wearable device comprises a single-board computer (SBC) and six-axis sensors. The deep-learning algorithm employs three parallel convolutional neural networks for local feature extraction and for subsequent concatenation to establish feature fusion models of varying kernel size. By using kernels of different sizes, relevant local features of varying lengths were identified, thereby increasing the accuracy of human activity recognition. Regarding experimental data, the database of University of California, Irvine (UCI) and self-recorded data were used separately. The self-recorded data were obtained by having 21 participants wear the device on their waist and perform six common activities in the laboratory. These data were used to verify the proposed deep-learning algorithm on the performance of the wearable device. The accuracy of these six activities in the UCI dataset and in the self-recorded data were 97.49% and 96.27%, respectively. The accuracies in tenfold cross-validation were 99.56% and 97.46%, respectively. The experimental results have successfully verified the proposed convolutional neural network (CNN) architecture, which can be used in rehabilitation assessment for people unable to exercise vigorously.
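The parallel-branch design above can be sketched briefly: several Conv1D branches with different kernel sizes are run over the same window and their pooled features concatenated. The kernel sizes, filter counts, and window shape below are assumptions, not the paper's values.

```python
# Sketch of parallel CNN branches with different kernel sizes (placeholder sizes).
from tensorflow.keras import layers, Model

inp = layers.Input(shape=(128, 6))            # six-axis window from a waist-worn IMU
branches = []
for k in (3, 7, 11):                          # small, medium, large receptive fields
    b = layers.Conv1D(32, k, padding="same", activation="relu")(inp)
    b = layers.GlobalMaxPooling1D()(b)
    branches.append(b)

x = layers.concatenate(branches)              # feature fusion across kernel sizes
out = layers.Dense(6, activation="softmax")(x)
model = Model(inp, out)
```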
Article
Full-text available
Human activity recognition (HAR) and transfer learning (TL) are two broad areas widely studied in computational intelligence (CI) and artificial intelligence (AI) applications. Much effort has been put into developing suitable solutions to advance the current performance of existing systems. However, existing HAR methods still face challenges; in particular, the variations in the data required by HAR systems pose challenges to many existing solutions. The type of sensory information used could play an important role in overcoming some of these challenges. Vision-based information in 3D acquired using RGB-D cameras is one type. Furthermore, with the successes encountered in TL, HAR stands to benefit from TL to address challenges to existing methods. Therefore, it is important to review the current state of the art related to both areas. This paper presents a comprehensive survey of vision-based HAR using different methods with a focus on the incorporation of TL in HAR methods. It also discusses the limitations, challenges, and possible future directions for further research.
Article
Full-text available
Human activity recognition is key to many applications such as healthcare and smart homes. In this study, we provide a comprehensive survey on recent advances and challenges in human activity recognition (HAR) with deep learning. Although there are many surveys on HAR, they focused mainly on the taxonomy of HAR and reviewed the state-of-the-art HAR systems implemented with conventional machine learning methods. Recently, several works have also been done on reviewing studies that use deep models for HAR, but these works cover only a few deep models and their variants. There is still a need for a comprehensive and in-depth survey on HAR with recently developed deep learning methods.
Article
Full-text available
Identification of human physical activities has long been an active research area due to its application in personalized health and fitness monitoring. The performance accuracy of human activity recognition (HAR) models depends mainly on the features extracted from domain knowledge. The features are the input of the classification algorithm to efficiently identify human physical activities. Manually extracted (handcrafted) features need expert domain knowledge. Thus, these features have significant importance for identifying different human activities. Recently, deep learning methods have been utilized to extract features automatically from raw sensory data for HAR models. However, the state-of-the-art HAR literature has established that the importance of handcrafted features cannot be ignored, as they are derived from expert domain knowledge. Thus, in this paper we fuse handcrafted features with features automatically extracted using deep learning (DL) in a HAR model to enhance the performance of HAR. Extensive experimental results demonstrate that our proposed feature-fusion-based HAR model gives higher accuracy than the state-of-the-art HAR literature on both the self-collected and public datasets.
Article
Full-text available
The vast proliferation of sensor devices and Internet of Things enables the applications of sensor-based activity recognition. However, there exist substantial challenges that could influence the performance of the recognition system in practical scenarios. Recently, as deep learning has demonstrated its effectiveness in many areas, plenty of deep methods have been investigated to address the challenges in activity recognition. In this study, we present a survey of the state-of-the-art deep learning methods for sensor-based human activity recognition. We first introduce the multi-modality of the sensory data and provide information for public datasets that can be used for evaluation in different challenge tasks. We then propose a new taxonomy to structure the deep methods by challenges. Challenges and challenge-related deep methods are summarized and analyzed to form an overview of the current research progress. At the end of this work, we discuss the open issues and provide some insights for future directions.
Article
Full-text available
Over the last decade, there has been considerable and increasing interest in the development of Active and Assisted Living (AAL) systems to support independent living. The demographic change towards an aging population has introduced new challenges to today’s society from both an economic and societal standpoint. AAL can provide an array of solutions for improving the quality of life of individuals, for allowing people to live healthier and independently for longer, for helping people with disabilities, and for supporting caregivers and medical staff. A vast amount of literature exists on this topic, so this paper aims to provide a survey of the research and skills related to AAL systems. A comprehensive analysis is presented that addresses the main trends towards the development of AAL systems both from technological and methodological points of view and highlights the main issues that are worthy of further investigation.
Article
Full-text available
Human Activity Recognition (HAR) has had a diverse range of applications in various fields such as health, security, and smart homes. Among the different approaches to HAR, WiFi-based solutions are gaining popularity since they address deployment cost, privacy concerns, and restrictions on the applicable environment. In this paper, we propose a WiFi-based human activity recognition system that can identify different activities via the channel state information from WiFi devices. A special deep learning framework, Long Short-Term Memory-Convolutional Neural Network (LSTM-CNN), is designed for accurate recognition. LSTM-CNN is compared with the LSTM network, and the experimental results demonstrate that LSTM-CNN outperforms existing models and has an average accuracy of 94.14% in multi-activity classification.
Article
Full-text available
Human Activity Recognition (HAR) has attracted much attention from researchers in the recent past. The intensification of research into HAR lies in the motive to understand human behaviour and inherently anticipate human intentions. Human activity data obtained via wearable sensors like gyroscope and accelerometer is in the form of time series data, as each reading has a timestamp associated with it. For HAR, it is important to extract the relevant temporal features from raw sensor data. Most of the approaches for HAR involves a good amount of feature engineering and data pre-processing, which in turn requires domain expertise. Such approaches are time-consuming and are application-specific. In this work, a Deep Neural Network based model, which uses Convolutional Neural Network, and Gated Recurrent Unit is proposed as an end-to-end model performing automatic feature extraction and classification of the activities as well. The experiments in this work were carried out using the raw data obtained from wearable sensors with nominal pre-processing and don’t involve any handcrafted feature extraction techniques. The accuracies obtained on UCI-HAR, WISDM, and PAMAP2 datasets are 96.20%, 97.21%, and 95.27% respectively. The results of the experiments establish that the proposed model achieved superior classification performance than other similar architectures.
Article
Full-text available
Human Activity Recognition (HAR) employing inertial motion data has gained considerable momentum in recent years, both in research and industrial applications. From the abstract perspective, this has been driven by an acceleration in the building of intelligent and smart environments and systems that cover all aspects of human life including healthcare, sports, manufacturing, commerce, etc. Such environments and systems necessitate and subsume activity recognition, aimed at recognizing the actions, characteristics, and goals of one or more individuals from a temporal series of observations streamed from one or more sensors. Due to the reliance of conventional Machine Learning (ML) techniques on handcrafted features in the extraction process, current research suggests that deep-learning approaches are more applicable to automated feature extraction from raw sensor data. In this work, the generic HAR framework for smartphone sensor data is proposed, based on Long Short-Term Memory (LSTM) networks for time-series domains. Four baseline LSTM networks are comparatively studied to analyze the impact of using different kinds of smartphone sensor data. In addition, a hybrid LSTM network called 4-layer CNN-LSTM is proposed to improve recognition performance. The HAR method is evaluated on a public smartphone-based dataset of UCI-HAR through various combinations of sample generation processes (OW and NOW) and validation protocols (10-fold and LOSO cross validation). Moreover, Bayesian optimization techniques are used in this study since they are advantageous for tuning the hyperparameters of each LSTM network. The experimental results indicate that the proposed 4-layer CNN-LSTM network performs well in activity recognition, enhancing the average accuracy by up to 2.24% compared to prior state-of-the-art approaches.
Article
Full-text available
Human activity recognition is concerned with detecting different types of human movements and actions using data gathered from various types of sensors. Deep learning approaches, when applied on time series data, offer promising results over intensive handcrafted feature extraction techniques that are highly reliant on the quality of defined domain parameters. In this paper, we investigate the benefits of time series data augmentation in improving the accuracy of several deep learning models on human activity data gathered from mobile phone accelerometers. More specifically, we compare the performance of the Vanilla, Long-Short Term Memory, and Gated Recurrent Units neural network models on three open-source datasets. We use two time series data augmentation techniques and study their impact on the accuracy of the target models. The experiments show that using gated recurrent units achieves the best results in terms of accuracy and training time followed by the long-short term memory technique. Furthermore, the results show that using data augmentation significantly enhances recognition quality.
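The abstract above does not name the two augmentation techniques used, so the sketch below shows two common time-series augmentations (jittering and scaling) as plain NumPy functions; the parameter values are illustrative, not those used in the paper.

```python
# Common time-series augmentations for accelerometer windows (illustrative parameters).
import numpy as np

def jitter(window, sigma=0.05, rng=None):
    """Add Gaussian noise to a (timesteps, channels) accelerometer window."""
    rng = rng or np.random.default_rng()
    return window + rng.normal(0.0, sigma, size=window.shape)

def scale(window, sigma=0.1, rng=None):
    """Multiply each channel by a random factor close to 1."""
    rng = rng or np.random.default_rng()
    factors = rng.normal(1.0, sigma, size=(1, window.shape[1]))
    return window * factors

w = np.random.randn(128, 3)          # dummy window standing in for real data
augmented = [jitter(w), scale(w)]    # extra training samples derived from the original
```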
Article
Full-text available
The scarcity of labelled time-series data can hinder proper training of deep learning models. This is especially relevant for the growing field of ubiquitous computing, where data coming from wearable devices have to be analysed using pattern recognition techniques to provide meaningful applications. To address this problem, we propose a transfer learning method based on attributing sensor modality labels to a large amount of time-series data collected from various application fields. Using these data, our method firstly trains a Deep Neural Network (DNN) that can learn general characteristics of time-series data, then transfers it to another DNN designed to solve a specific target problem. In addition, we propose a general architecture that can adapt the transferred DNN regardless of the sensors used in the target field, making our approach particularly suitable for multichannel data. We test our method for two ubiquitous computing problems—Human Activity Recognition (HAR) and Emotion Recognition (ER)—and compare it with a baseline that trains the DNN without using transfer learning. For HAR, we also introduce a new dataset, Cognitive Village-MSBand (CogAge), which contains data for 61 atomic activities acquired from three wearable devices (smartphone, smartwatch, and smartglasses). Our results show that our transfer learning approach outperforms the baseline for both HAR and ER.
Article
Full-text available
The softmax function has been widely used in deep neural networks (DNNs), and studies on efficient hardware accelerators for DNN have also attracted tremendous attention. However, it is very challenging to design efficient hardware architectures for softmax because of the expensive exponentiation and division calculations in it. In this paper, the softmax function is firstly simplified by exploring algorithmic strength reductions. Afterwards, a hardware-friendly and precision-adjustable calculation method for softmax is proposed, which can meet different precision requirements in various deep learning (DL) tasks. Based on the above innovations, an efficient architecture for softmax is presented. By tuning the parameter P, the system accuracy and complexity can be adjusted dynamically to achieve a good tradeoff between them. The proposed design is coded using hardware description language (HDL) and evaluated on two platforms, Xilinx Zynq-7000 ZC706 development board and TSMC 28-nm CMOS technology, respectively. Hardware implementation results show that our architecture significantly outperforms other works in speed and area, and that by adjusting P, the accuracy can be further increased with little hardware overhead.
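The paper's hardware architecture is not reproduced here, but the sketch below illustrates, in plain NumPy, two ingredients related to what it describes: the max-subtraction strength reduction that keeps exponentials bounded, and a precision parameter P that rounds intermediate values to P fractional bits. How P maps onto the paper's design is an assumption made only for illustration.

```python
# Software sketch of a precision-adjustable softmax (illustrative; not the paper's hardware design).
import numpy as np

def softmax_fixed_point(x, P=8):
    """Stable softmax with intermediate values rounded to P fractional bits."""
    x = x - x.max()                            # strength reduction: keeps exponents <= 0
    scale = 2 ** P
    e = np.round(np.exp(x) * scale) / scale    # quantize exponentials to P fractional bits
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])
for P in (2, 4, 8):
    print(P, softmax_fixed_point(logits, P))   # accuracy improves as P grows
```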
Article
Full-text available
The synchronization between wireless sensor nodes is a big issue in all those low-cost systems in which each node performs measurements using its own sensors and stores the measured values locally. In the absence of synchronization, it is impossible to relate all the measurements coming from the different sub-systems on a single time scale for the extraction of complex features. The importance of this issue is even bigger when the sensors’ sampling rate is high as for the Inertial Measurement Unit platforms. The paper addresses the issue of managing a large multi-sensor platform for off-line activities with a reduced sensor node complexity and guaranteeing the perfect synchronization for each acquired sample. The new simple and light algorithm proposed overcomes the limits of the commonly used protocols and it is suitable for this kind of applications. The entire system consists of an Android smartphone, a master unit and several slave units. Each board has a 9 Degrees of Freedom (DoF) Inertial Measurement Unit, a micro-SD for data storage and an ARM®Cortex™-M4 CPU SoC supporting the protocols for both Bluetooth Low Energy (BLE) 5 and 2.4 GHz custom communications.
Article
Full-text available
For the last several decades, Human Activity Recognition (HAR) has been an intriguing topic in the domain of artificial intelligence research, since it has applications in many areas, such as image and signal processing. Generally, every recognition system can be either an end-to-end system or including two phases: feature extraction and classification. In order to create an optimal HAR system that offers a better quality of classification prediction, in this paper we propose a new approach within two-phase recognition system paradigm. Probabilistic generative models, known as Deep Belief Networks (DBNs), are introduced. These DBNs comprise a series of Restricted Boltzmann Machines (RBMs) and are responsible for data reconstruction, feature construction and classification. We tested our approach on the KTH and UIUC human action datasets. The results obtained are very promising, with the recognition accuracy outperforming the recent state-of-the-art.
Article
Full-text available
In the past years, traditional pattern recognition methods have made great progress. However, these methods rely heavily on manual feature extraction, which may hinder the generalization model performance. With the increasing popularity and success of deep learning methods, using these techniques to recognize human actions in mobile and wearable computing scenarios has attracted widespread attention. In this paper, a deep neural network that combines convolutional layers with long short-term memory (LSTM) was proposed. This model could extract activity features automatically and classify them with a few model parameters. LSTM is a variant of the recurrent neural network (RNN), which is more suitable for processing temporal sequences. In the proposed architecture, the raw data collected by mobile sensors was fed into a two-layer LSTM followed by convolutional layers. In addition, a global average pooling layer (GAP) was applied to replace the fully connected layer after convolution for reducing model parameters. Moreover, a batch normalization layer (BN) was added after the GAP layer to speed up the convergence, and obvious results were achieved. The model performance was evaluated on three public datasets (UCI, WISDM, and OPPORTUNITY). Finally, the overall accuracy of the model in the UCI-HAR dataset is 95.78%, in the WISDM dataset is 95.85%, and in the OPPORTUNITY dataset is 92.63%. The results show that the proposed model has higher robustness and better activity detection capability than some of the reported results. It can not only adaptively extract activity features, but also has fewer parameters and higher accuracy.
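Following the stated layer order (raw window → two-layer LSTM → convolutions → GAP → BN), a minimal Keras sketch looks like the one below. The layer widths, window shape, and final softmax classifier are placeholder assumptions rather than the paper's exact settings.

```python
# Sketch of the stated layer order: two LSTM layers -> convolutions -> GAP -> BN -> softmax.
from tensorflow.keras import layers, Sequential

model = Sequential([
    layers.Input(shape=(128, 9)),                    # placeholder window of 9 sensor channels
    layers.LSTM(32, return_sequences=True),
    layers.LSTM(32, return_sequences=True),          # two-layer LSTM over the raw window
    layers.Conv1D(64, 5, activation="relu"),
    layers.Conv1D(64, 5, activation="relu"),
    layers.GlobalAveragePooling1D(),                 # GAP replaces a large fully connected layer
    layers.BatchNormalization(),                     # BN after GAP to speed up convergence
    layers.Dense(6, activation="softmax"),
])
```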
Article
Full-text available
Human Action Recognition (HAR) has become one of the most active research areas in the domain of artificial intelligence, due to various applications such as video surveillance. The wide range of variations among human actions in daily life makes the recognition process more difficult. In this article, a new fully automated scheme is proposed for human action recognition by fusion of deep neural network (DNN) and multiview features. The DNN features are initially extracted by employing a pre-trained CNN model named VGG19. Subsequently, multiview features are computed from horizontal and vertical gradients, along with vertical directional features. Afterwards, all features are combined in order to select the best features. The best features are selected by employing three parameters, i.e., relative entropy, mutual information, and strong correlation coefficient (SCC). Furthermore, these parameters are used to select the best subset of features through a higher-probability-based threshold function. The final selected features are provided to a Naive Bayes classifier for final recognition. The proposed scheme is tested on five datasets, namely HMDB51, UCF Sports, YouTube, IXMAS, and KTH, and the achieved accuracies were 93.7%, 98%, 99.4%, 95.2%, and 97%, respectively. Lastly, the proposed method in this article is compared with existing techniques. The results show that the proposed scheme outperforms state-of-the-art methods.
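As a simplified illustration of the select-then-classify pipeline above, the sketch below keeps only one of the three selection criteria (mutual information) and feeds the selected features to a Naive Bayes classifier; the feature matrix, label vector, and k value are placeholders.

```python
# Sketch: mutual-information feature selection + Naive Bayes (simplified, placeholder data).
import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline

X = np.random.randn(200, 512)            # placeholder fused feature vectors
y = np.random.randint(0, 5, size=200)    # placeholder action labels

clf = make_pipeline(SelectKBest(mutual_info_classif, k=128), GaussianNB())
clf.fit(X, y)
print("training accuracy:", clf.score(X, y))
```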
Article
Full-text available
Many deep learning (DL) models have shown exceptional promise in radar-based human activity recognition (HAR) area. For radar-based HAR, the raw data is generally converted into a 2-D spectrogram by using short-time Fourier transform (STFT). All the existing DL methods treat the spectrogram as an optical image, and thus the corresponding architectures such as 2-D convolutional neural networks (2D-CNNs) are adopted in those methods. These 2-D methods that ignore temporal characteristics ordinarily lead to a complex network with a huge amount of parameters but limited recognition accuracy. In this paper, for the first time, the radar spectrogram is treated as a time sequence with multiple channels. Hence, we propose a DL model composed of 1-D convolutional neural networks (1D-CNNs) and long short-term memory (LSTM). The experiments results show that the proposed model can extract spatio-temporal characteristics of the radar data and thus achieves the best recognition accuracy and relatively low complexity compared to the existing 2D-CNN methods.
Article
Full-text available
Cervical cancer is one of the fastest growing global health problems and leading cause of mortality among women of developing countries. Automated Pap smear cell recognition and classification in early stage of cell development is crucial for effective disease diagnosis and immediate treatment. Thus, in this article, we proposed a novel internet of health things (IoHT)-driven deep learning framework for detection and classification of cervical cancer in Pap smear images using concept of transfer learning. Following transfer learning, convolutional neural network (CNN) was combined with different conventional machine learning techniques like K nearest neighbor, naïve Bayes, logistic regression, random forest and support vector machines. In the proposed framework, feature extraction from cervical images is performed using pre-trained CNN models like InceptionV3, VGG19, SqueezeNet and ResNet50, which are fed into dense and flattened layer for normal and abnormal cervical cells classification. The performance of the proposed IoHT frameworks is evaluated using standard Pap smear Herlev dataset. The proposed approach was validated by analyzing precision, recall, F1-score, training–testing time and support parameters. The obtained results concluded that CNN pre-trained model ResNet50 achieved the higher classification rate of 97.89% with the involvement of random forest classifier for effective and reliable disease detection and classification. The minimum training time and testing time required to train model were 0.032 s and 0.006 s, respectively.
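The transfer-learning recipe above (frozen pre-trained CNN as feature extractor, classical classifier on top) can be sketched as follows for the best-reported combination, ResNet50 plus a random forest. The image tensors and labels are stand-ins; the paper uses the Herlev Pap smear dataset.

```python
# Transfer-learning sketch: frozen ResNet50 features + a random forest classifier (placeholder data).
import numpy as np
import tensorflow as tf
from sklearn.ensemble import RandomForestClassifier

backbone = tf.keras.applications.ResNet50(weights="imagenet", include_top=False, pooling="avg")

images = np.random.rand(32, 224, 224, 3).astype("float32")   # stand-in for cell images
labels = np.random.randint(0, 2, size=32)                     # normal vs abnormal

features = backbone.predict(tf.keras.applications.resnet50.preprocess_input(images * 255.0))
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(features, labels)
print("training accuracy:", clf.score(features, labels))
```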
Article
The escalating threat of easily transmitted diseases poses a huge challenge to government institutions and health systems worldwide. Advancements in information and communication technology offer a promising approach to effectively controlling infectious diseases. This article introduces a comprehensive framework for predicting and preventing zoonotic virus infections by leveraging the capabilities of artificial intelligence and the Internet of Things. The proposed framework employs IoT‐enabled smart devices for data acquisition and applies a fog‐enabled model for user authentication at the fog layer. Further, the user classification is performed using the proposed ensemble model, with cloud computing enabling efficient information analysis and sharing. The novel aspect of the proposed system involves utilizing the temporal graph matrix method to illustrate dependencies among users infected with the zoonotic flu and provide a nuanced understanding of user interactions. The implemented system demonstrates a classification accuracy of around 91% for around 5000 instances and reliability of around 93%. The presented framework not only aids uninfected citizens in avoiding regional exposure but also empowers government agencies to address the problem more effectively. Moreover, temporal mining results also reveal the efficacy of the proposed system in dealing with zoonotic cases.
Article
We introduce the AI-Generated Optimal Decision (AIGOD) algorithm and the Deep Diffusion Soft Actor-Critic (DDSAC) framework, marking a significant advancement in integrating Human Digital Twins (HDTs) with AI-Generated Content (AIGC) within IoMT-based smart homes. Our innovative AI-Generated Content-as-a-Service (AIGCaaS) architecture, optimized for IoMT environments, leverages network edge servers to enhance the selection of AI-Generated Content Service Providers (AISPs) tailored to the unique characteristics of individual HDTs. Extensive experiments demonstrate DDSAC’s HDT-centric approach outperforms traditional Deep Reinforcement Learning algorithms, offering optimal AIGC services for diverse healthcare needs. Specifically, DDSAC achieved a 20% improvement in task completion rates and a 15% increase in overall utility compared to existing methods. These findings highlight the potential of HDTs in personalized healthcare by simulating and predicting patient-specific medical outcomes, leading to proactive and timely interventions. This integration facilitates personalized healthcare, establishing a new standard for patient-centric care in smart home environments. By leveraging cutting-edge AI techniques, our research significantly contributes to the fields of IoMT and AIGC, paving the way for smarter and more responsive healthcare services.
Article
Data heterogeneity, insufficient scalability, and data privacy protection are the technological challenges of personalized recommendations. This study proposes a new federated learning algorithm (FedSarah) to address low scalability caused by data heterogeneity and uneven computing power in consumer-centric personalized recommendation systems while protecting data privacy of consumers. The algorithm updates the stochastic gradient estimates using a recursive framework on consumer clients. The outer loop calculates the entire gradient for updating global model, and the inner loop calculates the stochastic gradient based on the accumulated stochastic information for updating local models. To increase the stability of convergence, the inner loop modifies intrinsic parameters to change the number of training rounds and the direction of model update on consumer clients. The detailed mathematical analysis and experiments demonstrate that FedSarah has good convergence. In addition, it’s shown that the algorithm can achieve a performance improvement of nearly 5% in terms of accuracy compared to the traditional FedAvg and FedProx algorithms under the condition of heterogeneous data. Furthermore, under the condition of effective privacy protection on consumers’ data, the new algorithm can significantly lessen the impact of data heterogeneity on the real-time service of consumer-centric personalized recommendation systems with low communication latency. The code is available at https://github.com/DashingJ-82/FedSarah.git .
Chapter
With the advancement in wireless sensing technology, the notion of the Internet of Things (IoT) has become ubiquitous and widely adopted due to its extensive applications in smart living. In that regard, Human Activity Recognition (HAR) is an indispensable part of intelligent systems for continuous supervision of human behavior. The design of effective applications requires accurate and relevant information on people’s activities and behaviors. This chapter presents a comprehensive review of the existing IoT-based HAR systems in the literature that promote smart living. The contribution of this work is to drive interested researchers toward the concept of HAR through existing works. In addition, a hands-on experimental case study is also provided to demonstrate the practical application of the concept, for recognizing and predicting several activities, in real life. Accordingly, here we first study the general architecture of the HAR system along with its principal components in detail. Next, the conglomeration of HAR with IoT and its impact on each other has been intensely explored to provide extensive insight for further research. We discuss the research challenges associated with developing IoT-based HAR systems. Next, the state-of-the-art works in HAR based on wearable sensors are surveyed thoroughly. At the end, we present a case study on HAR, using UCIHAR, a standard benchmark dataset that exhibits the practical implementation of the framework on different Machine Learning as well as Deep Learning models. Keywords: Wireless sensor network, IoT, HAR, Wearable sensor, Machine learning, Deep learning
Article
Human Activity Recognition (HAR) has become a crucial element for smart healthcare applications due to the fast adoption of wearable sensors and mobile technologies. Most of the existing human activity recognition frameworks deal with a single modality of data that degrades the reliability and recognition accuracy of the system for heterogeneous data sources. In this article, we propose a multi-level feature fusion technique for multimodal human activity recognition using multi-head Convolutional Neural Network (CNN) with Convolution Block Attention Module (CBAM) to process the visual data and Convolutional Long Short Term Memory (ConvLSTM) for dealing with the time-sensitive multi-source sensor information. The architecture is developed to be able to analyze and retrieve channel and spatial dimension features through the use of three branches of CNN along with CBAM for visual information. The ConvLSTM network is designed to capture temporal features from the multiple sensors’ time-series data for efficient activity recognition. An open-access multimodal HAR dataset named UP-Fall detection dataset is utilized in experiments and evaluations to measure the performance of the developed fusion architecture. Finally, we deployed an Internet of Things (IoT) system to test the proposed fusion network in real-world smart healthcare application scenarios. The findings from the experimental results reveal that the developed multimodal HAR framework surpasses the existing state-of-the-art methods in terms of multiple performance metrics.
Article
Human activity recognition (HAR) is one of the key applications of health monitoring that requires continuous use of wearable devices to track daily activities. This article proposes an adaptive convolutional neural network for energy-efficient HAR (AHAR) suitable for low-power edge devices. Unlike traditional adaptive (early-exit) architectures that make the early-exit decision based on classification confidence, AHAR proposes a novel adaptive architecture that uses an output block predictor to select a portion of the baseline architecture to use during the inference phase. The experimental results show that traditional adaptive architectures suffer from performance loss, whereas our adaptive architecture provides similar or better performance than the baseline one while being energy efficient. We validate our methodology in classifying locomotion activities from two data sets—1) Opportunity and 2) w-HAR. Compared to the fog/cloud computing approaches for the Opportunity data set, our baseline and adaptive architectures show comparable weighted F1 scores of 91.79% and 91.57%, respectively. For the w-HAR data set, our baseline and adaptive architectures outperform the state-of-the-art works with weighted F1 scores of 97.55% and 97.64%, respectively. Evaluation on real hardware shows that our baseline architecture is significantly more energy-efficient (422.38× less) and memory-efficient (14.29× less) compared to the works on the Opportunity data set. For the w-HAR data set, our baseline architecture requires 2.04× less energy and 2.18× less memory compared to the state-of-the-art work. Moreover, experimental results show that our adaptive architecture is 12.32% (Opportunity) and 11.14% (w-HAR) more energy efficient than our baseline while providing similar (Opportunity) or better (w-HAR) performance with no significant memory overhead.
Article
This paper firstly introduces common wearable sensors, smart wearable devices, and the key application areas. Since multi-sensor systems are defined by the presence of more than one modality or channel, e.g., visual, audio, environmental, and physiological signals, fusion methods for multi-modality and multi-location sensors are proposed. Although several works have reviewed the state-of-the-art on information fusion or deep learning, all of them tackled only one aspect of sensor fusion applications, which leads to a lack of comprehensive understanding about it. Therefore, we propose using a more holistic approach in order to provide a more suitable starting point from which to develop a full understanding of the fusion methods of wearable sensors. Specifically, this review attempts to provide a more comprehensive survey of the most important aspects of multi-sensor applications for human activity recognition, including those recently added to the field for unsupervised learning and transfer learning. Finally, the open research issues that need further research and improvement are identified and discussed.
Article
The problem of human action recognition has attracted the interest of several researchers due to its significant use in many applications. With the great success of deep learning methods in most areas, researchers decided to switch from traditional methods based on hand-crafted feature extractors to recent deep-learning-based techniques to recognise actions. In the present research work, we propose a learning approach for human activity recognition in the elderly based on a convolutional neural network (LAHAR-CNN). The CNN model is used to extract features from the dataset; then, a multilayer perceptron (MLP) classifier is used for action classification. It is widely accepted that features learned using a CNN model on a large dataset can be successfully transferred to an action recognition task with a small training dataset. The proposed method is evaluated on the well-known MSRDailyActivity 3D dataset. It has shown impressive results that exceed the performance obtained by the state of the art on the same dataset, reaching 99.4%. Furthermore, our proposed approach predicts human activity (HA) from a single frame sample, which demonstrates its robustness. Hence, the proposed model is ranked at the top of the list of space-time techniques.
Article
Deep neural networks, including convolutional neural networks (CNNs), have been widely adopted for human activity recognition in recent years. They have attained significant performance improvement over traditional techniques due to their strong feature representation capabilities. Among the challenges faced by the HAR community are the non-availability of a substantial amount of labeled training samples and the higher computational cost and system resource requirements of deep learning architectures as opposed to shallow learning algorithms. To address these challenges, we propose an attention-based multi-head model for human activity recognition (HAR). This framework contains three lightweight convolutional heads, with each head designed using a one-dimensional CNN to extract features from sensory data. The lightweight multi-head model is augmented with attention to strengthen the representation ability of the CNN, allowing it to automatically select salient features and suppress unimportant ones. We conducted ablation studies and experiments on two publicly available benchmark datasets, WISDM and UCI HAR, to evaluate our model. The experimental outcome demonstrates that the proposed framework is effective in activity recognition and achieves better accuracy while ensuring computational efficiency.
Article
Human activity recognition (HAR) has been regarded as an indispensable part of many smart home systems and smart healthcare applications. Specifically, HAR is of great importance in the Internet of Healthcare Things (IoHT), owing to the rapid proliferation of Internet of Things (IoT) technologies embedded in various smart appliances and wearable devices (such as smartphones and smartwatches) that have a pervasive impact on an individual’s life. The inertial sensors of smartphones generate massive amounts of multidimensional time-series data, which can be exploited effectively for HAR purposes. Unlike traditional approaches, deep learning techniques are the most suitable choice for such multivariate streams. In this study, we introduce a supervised dual-channel model that comprises long short-term memory (LSTM), followed by an attention mechanism, for the temporal fusion of inertial sensor data, concurrent with a convolutional residual network for the spatial fusion of sensor data. We also introduce an adaptive channel-squeezing operation to fine-tune the convolutional neural network's feature extraction capability by exploiting multichannel dependency. Finally, two widely available and public HAR datasets are used in experiments to evaluate the performance of our model. The results demonstrate that our proposed approach outperforms state-of-the-art methods.
Article
Graph convolutional networks (GCNs), which generalize CNNs to more generic non-Euclidean structures, have achieved remarkable performance for skeleton-based action recognition. However, there still exist several issues in the previous GCN-based models. First, the topology of the graph is set heuristically and fixed over all the model layers and input data. This may not be suitable for the hierarchy of the GCN model and the diversity of the data in action recognition tasks. Second, the second-order information of the skeleton data, i.e., the length and orientation of the bones, is rarely investigated, which is naturally more informative and discriminative for the human action recognition. In this work, we propose a novel multi-stream attention-enhanced adaptive graph convolutional neural network (MS-AAGCN) for skeleton-based action recognition. The graph topology in our model can be either uniformly or individually learned based on the input data in an end-to-end manner. This data-driven approach increases the flexibility of the model for graph construction and brings more generality to adapt to various data samples. Besides, the proposed adaptive graph convolutional layer is further enhanced by a spatial-temporal-channel attention module, which helps the model pay more attention to important joints, frames and features. Moreover, the information of both the joints and bones, together with their motion information, are simultaneously modeled in a multi-stream framework, which shows notable improvement for the recognition accuracy. Extensive experiments on the two large-scale datasets, NTU-RGBD and Kinetics-Skeleton, demonstrate that the performance of our model exceeds the state-of-the-art with a significant margin.
Conference Paper
Ambient Assisted Living (AAL) facilities offer personalized care to inhabitants using their profiles and surrounding environments. The services provided by AAL include health, indoor activities, daily routines, and many others. The current focus of the research is to analyze the user’s behaviour to offer different services to them. The AAL system mostly uses various sensors to capture the user’s information. The smart environment uses environmental sensors, object sensors, body sensors, and visual sensors as the primary devices. The collected information is generally used to detect activities of daily living (ADL). The research community needs to address additional parameters to extend the services offered. The work presented here explores some new strategies to use the firings of environmental sensors. The environmental readings are useful for finding the time of detection of any disease using the particular season, indoor air quality, and the intensity of sun rays, and, finally, for calculating the probability of various illnesses during the year. This paper describes the potential role of AAL systems in elderly care by focusing on specific issues using a scenario-based approach.
Article
Human activity recognition (HAR) technology that analyzes data acquired from various types of sensing devices, including vision sensors and embedded sensors, has motivated the development of various context-aware applications in emerging domains, e.g., the Internet of Things (IoT) and healthcare. Even though a considerable number of HAR surveys and review articles have been conducted previously, the major/overall HAR subject has been ignored, and these studies only focus on particular HAR topics. Therefore, a comprehensive review paper that covers major subjects in HAR is imperative. This survey analyzes the latest state-of-the-art research in HAR in recent years, introduces a classification of HAR methodologies, and shows advantages and weaknesses for methods in each category. Specifically, HAR methods are classified into two main groups, which are sensor-based HAR and vision-based HAR, based on the generated datatype. After that, each group is divided into subgroups that perform different procedures, including the data collection, pre-processing methods, feature engineering, and the training process. Moreover, an extensive review regarding the utilization of deep learning in HAR is also conducted. Finally, this paper discusses various challenges in the current HAR topic and offers suggestions for future research.
Article
Human action recognition plays a fundamental role in the design of smart solution for home environments, particularly in relation to ambient assisted living applications, where the support of an automated system could improve the quality of life of humans trying to interpret and anticipate user needs, recognizing unusual behaviors or preventing dangerous situations (e.g. falls). In this work the potentialities of the Kinect sensor are fully exploited to design a robust approach for activity recognition combining the analysis of skeleton and RGB data streams. The skeleton representation is designed to capture the most representative body postures, while the temporal evolution of actions is better highlighted by the representation obtained from RGB images. The experimental results confirm that the combination of these two data sources allow to capture highly discriminative features resulting in an approach able to achieve state-of-the-art performance on public benchmarks.