Similarity Segmentation Approach for
Sensor-Based Activity Recognition
AbdulRahman M. A. Baraka and Mohd Halim Mohd Noor
Abstract—The fixed sliding window is the most commonly used technique for signal segmentation in human activity recognition (HAR). However, the fixed sliding window may not produce optimal segmentation because human activities have varying durations, especially transitional activities (TAs). A large window size may contain activity signals belonging to different activities, and a small window size may split the activity signal into multiple windows. Furthermore, the fixed sliding window does not consider the relationship between adjacent windows, which may affect the performance of the HAR model. In this study, we propose a similarity segmentation approach (SSA) that exploits the temporal structure of the activity signal within the window segmentation process. Specifically, the proposed approach segments each window into subwindows and extracts inner features by measuring the similarity between them. The inner features are then used to measure the dissimilarity between adjacent windows. The proposed approach is able to distinguish between transitional and nontransitional windows, which enables more effective segmentation and classification. Two public datasets are used for the evaluation. The experimental results show that the proposed approach can distinguish TAs from basic activities (BAs) with 97.65% accuracy, improving the accuracy of TA recognition over the fixed sliding window by 33.41%. Our approach also achieved activity recognition accuracies of 92.71% and 86.65% on the two datasets, exceeding the fixed sliding window by 2.29% and 3.93%, respectively. These results are significant and exceed the accuracy of state-of-the-art models.
Index Terms—Human activity recognition (HAR), signal segmentation, transitional activity (TA).
I. INTRODUCTION

Recently, human activity recognition (HAR) has become one of the most important areas of research due to its involvement in many fields, such as Internet of Things (IoT) applications [1], healthcare [2], security, sport [3], and smart homes [4]. HAR is concerned with recognizing humans' daily activities in order to understand users' behaviors. Human activities can be divided into basic activities (BAs) and transitional activities (TAs) (or postural transitions). The BAs are those performed by a human in daily life and can be classified into two categories: static and dynamic activities.
Manuscript received 30 April 2023; revised 8 July 2023; accepted 12 July 2023. The associate editor coordinating the review of this article and approving it for publication was Dr. Varun Bajaj. (Corresponding author: Mohd Halim Mohd Noor.)

AbdulRahman M. A. Baraka was with Al-Quds Open University, Palestine. He is now with the School of Computer Sciences, Universiti Sains Malaysia, Malaysia (e-mail: abarakeh@qou.edu).

Mohd Halim Mohd Noor is with the School of Computer Sciences, Universiti Sains Malaysia, Malaysia (e-mail: halimnoor@usm.my).

Digital Object Identifier 10.1109/JSEN.2023.3295778
Static activities do not involve motion during their performance, such as standing, sitting, and lying down. Dynamic activities involve motion during their performance, such as walking, jumping, and running. The TAs occur between two BAs, such as stand-to-sit and sit-to-stand. Typically, BAs are performed for a long duration, while TAs have a low incidence and are carried out in a short period of time, averaging around 3.7 s [5]. This short duration makes TAs difficult to detect and recognize [6], [7]. Thus, few researchers have considered TAs in their HAR models. However, TAs are essential in various applications, such as pervasive healthcare and activity monitoring. The importance of detecting and recognizing TAs manifests in applications such as identifying falls in the elderly [8], separating successive activities [9], and segmenting the continuous data signal more accurately, which improves the accuracy of HAR. Furthermore, if TAs are not considered in the HAR model, they can introduce fluctuations in the signals, consequently affecting the model's performance.
Recently, inertial sensors, such as accelerometers and gyroscopes, have become popular embedded sensors in wearable devices of various types [10]. These sensors capture body motions while the person performs an activity. The accelerometer measures the acceleration along the three axes, while the gyroscope measures the rotation around the three axes. These continuous signals must be segmented into a sequence of windows to perform the feature extraction task, either manually or automatically, followed by classifying the windows into activities. The fixed-size sliding window is the most commonly used technique for signal segmentation in HAR, which segments the signals into equal-size segments or windows. However, using a fixed window is neither effective nor optimal due to the varying lengths of human activity signals. A small window size cannot capture the general characteristics of the activity signal, and a large window size causes the window to contain data belonging to different activities. Thus, using a fixed window size for signal segmentation is not effective for activity recognition [6], [11].
In this article, we propose a similarity segmentation approach (SSA) for the signal segmentation process. The proposed approach analyzes two adjacent windows to exploit their temporal relationship. To this end, the proposed approach further splits the windows into subwindows and extracts features by calculating the inner similarity between the subwindows. Then, the dissimilarity between the features of the adjacent windows is measured to detect the boundaries of the window segmentation. Furthermore, we propose two convolutional neural network (CNN) models: one for BAs and the other for TAs. Dividing the recognition task into two models allows each model to focus on learning its own set of activity classes.
The rest of this article is organized as follows. Related work is reviewed in Section II. Section III explains our proposed methodology. The experiments and results are discussed in Section IV. Section V concludes the article.
II. RELATED WORKS

Human activity is person-dependent [12], which means it varies among humans in terms of time and actions. In general, human activities can be categorized into BAs and TAs. A BA is performed daily by humans, whether long (e.g., walking) or short (e.g., jumping). A TA occurs between two BAs. Compared with BAs, TAs have two main characteristics: short duration and low incidence (e.g., sit-to-stand). These characteristics have led researchers to neglect TAs in HAR.

As aforementioned, the fixed sliding window is the most commonly used technique for segmenting the activity signal in activity recognition [13]. Despite its limitations, several studies adopted the fixed-size sliding window to segment their sensor data [14], [15], [16]. Also, some studies [3], [17] investigated the effect of different window sizes on the signal segmentation task. Banos et al. [18] analyzed and evaluated the impact of fixed window size on HAR model performance and showed that a small window size (0.25-0.5 s) with no overlap yields better recognition performance than a large window size.
Numerous studies have proposed adaptive and dynamic window sizes to overcome the limitations of the fixed-size sliding window method. In the dynamic sliding window, the sensor data are segmented into windows of different sizes based on specific features. Alhammad and Al-Dossari [2] proposed a segmentation method based on the assumption that the Y-axis of the acceleration signal can indicate the boundaries of all activity types. The method depends on choosing the peaks with the minimum value, selecting the highest value of the valley, and finding the minimum distance between peaks using a threshold value. Then, the signal characteristics are analyzed using several features to determine the activity boundaries. Akbari et al. [19] proposed a hierarchical signal segmentation method, which segments the data using a large fixed window size and extracts features to predict the activity. If a window is suspected to contain multiple activity signals, the method divides the window into smaller subwindows to fine-tune the label assignment for each subwindow. Xiao et al. [20] proposed a deep learning CNN-based algorithm for activity segmentation. The sensor data are segmented into sets of bins using a sliding window method, and then each bin is assigned a label. The network identifies the start and end of an activity by detecting changes in the set of bin labels. He et al. [21] adopted a weakly supervised learning setting to avoid expensive annotation effort. The framework uses support measure machines (SMMs) and feature extraction based on kernel embedding of distributions. Signal segmentation, feature extraction, and activity classification are performed jointly.
However, none of the aforementioned studies consider TAs in their experiments, which could lead to low accuracy in recognizing TAs. Therefore, some studies presented segmentation methods that consider postural TAs. Noor et al. [11] proposed an approach that adjusts the window size adaptively by modeling the windows using a multivariate Gaussian distribution. Initially, the segmentation is performed using a fixed window size; then, the window is classified as TA or non-TA. If the window is a TA, the adaptive sliding window is activated, and the window size is expanded by calculating the probability that the window belongs to a specific activity. Lone et al. [22] used different feature extraction and selection methods and five different machine learning classification algorithms to classify the BAs and postural transitions. Li et al. [6] presented a signal segmentation method that segments the data into multiple fragments. After extracting features from the fragments, K-means clustering is used to classify them. Adjacent fragments of the same category are combined, and the TA fragments are separated from the BA fragments based on fragment duration. Shi et al. [23] distinguished TA windows from BA windows through the standard deviation trend analysis (STD-TA) method. The sliding window method segments the signals, the statistical characteristics of the segments are extracted, and an SVM classifier is then used for activity classification.
The previous studies utilized feature engineering to extract and select the features. For improvement, some studies have adopted deep learning models to extract and select the features automatically.
Fig. 1. Overview of our proposed SSA.
Irfan et al. [24] integrated three deep learning models to classify the activity windows. The feature data are passed into the models simultaneously, and the results are fused using the class probabilities of each model. Noor [25] proposed an unsupervised feature learning method based on a denoising autoencoder, which aims to extract and select discriminative features for the activity recognition task. The data are segmented using the adaptive sliding window method presented in [11]. Then, the features are learned by the denoising autoencoder using 1-D convolutional and 1-D max-pooling layers.
Furthermore, some studies adopted long short-term memory (LSTM) to model the temporal sequences between windows. Xia et al. [26] built a classification model consisting of two LSTM layers followed by CNN layers. A fixed sliding window is used for the segmentation task. The model achieved significant performance; however, TAs were not considered in the study. Noor et al. [27] proposed a hybrid deep learning model, which uses a deep temporal Conv-LSTM architecture to exploit both temporal features and the relationship between windows.
However, most studies treated each window independently, disregarding any temporal relationship between windows. This can lead to nonoptimal segmentation. To overcome these issues, we propose an SSA based on similarity features that considers the temporal relationship between adjacent windows. This allows the proposed approach to distinguish TA windows from BA windows, thus enhancing activity recognition performance.
III. PROPOSED METHODOLOGY

The signals from the wearable devices are segmented into a sequence of windows using the fixed sliding window $X = \{x_1, x_2, \ldots, x_N\}$, where $N$ is the total number of windows, $x_i \in \mathbb{R}^{w \times C}$, $w$ is the window size, and $C$ is the number of sensor channels. In this study, the activity classification approach consists of two stages: the SSA and activity recognition. The block diagram of the proposed approach is given in Fig. 1.
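For concreteness, the fixed sliding-window step can be sketched in a few lines of NumPy. This is a minimal illustration, assuming nonoverlapping windows as used in the experiments; the function name is ours.

```python
import numpy as np

def sliding_windows(signal: np.ndarray, w: int) -> np.ndarray:
    """Segment a continuous (T, C) signal into nonoverlapping
    (N, w, C) windows; trailing samples that do not fill a
    complete window are dropped."""
    n = signal.shape[0] // w
    return signal[: n * w].reshape(n, w, signal.shape[1])

# Example: tri-axial accelerometer + gyroscope (C = 6) sampled at
# 50 Hz, segmented into 90-sample (1.8 s) windows.
x = np.random.randn(10_000, 6)
windows = sliding_windows(x, 90)  # shape (111, 90, 6)
```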
First, the signals are split into multiple successive windows, and then the windows are passed into our proposed SSA. The SSA distinguishes between non-TA (basic) and TA windows by extracting two feature types: inner similarity features and adjacent window dissimilarity features. Then, the non-TA windows are fed to the BA classifier, and the TA windows are fed to the TA classifier after applying the interpolation method. The BA classifier classifies the nontransitional windows into BAs, while the TA classifier classifies the TA windows into TAs.
Specifically, the SSA aims to determine whether a window belongs to a TA or a non-TA. To this end, two operations are performed. The first operation, called inner similarity, extracts the similarity features of the window by measuring the data homogeneity within each window. This is done by dividing the window into three equal-sized subwindows and then measuring the similarity of the signals between these subwindows. The second operation, called adjacent window dissimilarity, measures the dissimilarity between the features of the current window and those of the previous window. The current window is the last window read from the sensors and segmented by the fixed sliding window method.
We refer to the current window as $x_i$ and to the window that precedes it as $x_{i-1}$. If the degree of dissimilarity between the two adjacent windows (i.e., the current and previous windows) is small, this indicates that there is no TA in either window; if the degree of dissimilarity is high, it indicates the presence of a TA in one of the two windows. Based on the degree of dissimilarity, we can distinguish a TA window from a BA window. We can also detect the boundaries of TAs that span multiple adjacent windows. In order for these adjacent transitional windows to match the fixed window size, we combine them into one window while maintaining their pattern using linear interpolation.
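The aggregation-and-resize step can be sketched as follows; this is an illustrative implementation assuming per-channel linear interpolation with numpy.interp, and the helper name is ours.

```python
import numpy as np

def resize_window(window: np.ndarray, w: int) -> np.ndarray:
    """Linearly interpolate an (L, C) window to (w, C) so that an
    aggregated transitional window matches the classifier input size."""
    L, C = window.shape
    src = np.linspace(0.0, 1.0, L)
    dst = np.linspace(0.0, 1.0, w)
    return np.stack([np.interp(dst, src, window[:, c]) for c in range(C)], axis=1)

# Three adjacent 90-sample transitional windows -> one 90-sample window.
ta = np.concatenate([np.random.randn(90, 6) for _ in range(3)], axis=0)
assert resize_window(ta, 90).shape == (90, 6)
```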
The second stage recognizes the activity type in the window through two classifiers: the first classifies the TAs, and the other classifies the BAs. Each classifier is implemented as a CNN.
A. Similarity Segmentation Approach

Our proposed SSA segments the activity signals by distinguishing between TA and non-TA (basic) windows using inner similarity features and the dissimilarity between two adjacent windows. The procedure is shown in Algorithm 1.
Initially, the signal is segmented using the sliding window method with a fixed window size $w$, which is the input size of the two classifiers (the BA classifier and the TA classifier). The algorithm accepts two adjacent windows as inputs, i.e., the previous window ($x_{i-1}$) and the current window ($x_i$). Next, a flag ($ista$) is defined to represent the detection of a TA window. By default, the flag is set to false, as shown in line 1. Then, normalization is performed on each window (line 2). Each window is split into three equal-sized subwindows; thus, it is recommended that the window size be divisible by 3. The inner similarity features are extracted by calculating the distances between the subwindows of each window. Then, the dissimilarity between the features of the two windows is measured, which results in two values, $b$ and $e$, as shown in line 4. These values are derived from the dissimilarity measure between the two adjacent windows, as explained in the following subsections.

The $b$ value is used to detect the beginning of a TA by comparing it with the beginning threshold ($bts$). In the case where $ista$ is false, if $b$ is greater than the threshold ($bts$), the previous window is considered a BA window and the current window is considered the beginning of a TA, whereby the current window is added to a buffer $x_{ta}$ (lines 7 and 8). Otherwise, the previous window is simply considered a BA window (line 10).

In the case where $ista$ is true (a transitional window has been detected), the algorithm continues aggregating the transitional windows by adding the current window to the buffer $x_{ta}$ (line 12). Whether the aggregation stops or continues depends on $e$. Specifically, the algorithm compares $e$ with the ending threshold ($ets$). As shown in line 14, if $e$ is higher than the threshold ($ets$), the end of the TA has been reached, and the previous window is aggregated as the ending of the TA. All the detected transitional windows aggregated in $x_{ta}$ represent a single transitional window. The algorithm then performs interpolation on the aggregated window to resize it to the fixed window size, matching the input size of the classifier (line 15). Otherwise, the window aggregation continues in the next round.

In some cases, the aggregation process exceeds the maximum length of a TA. This occurs when the algorithm fails to detect the end of the TA window. In this case, the buffer $x_{ta}$ is assumed to contain both TA and BA signals. To resolve this, the algorithm takes the first window in the buffer to be the TA window, and the remaining windows are considered BA windows (lines 22-26).
Algorithm 1 Similarity Segmentation Approach
Input: previous window x_{i-1}, current window x_i.
Output: distinguish transitional activity windows from basic activity windows.
  i: the index of the window.
  c: number of adjacent transitional windows.
  bts: threshold for the beginning of a transitional activity window (b_i).
  ets: threshold for the ending of a transitional activity window (e_i).
  ista: flag indicating whether a transitional activity has been detected.
  maxTAlen: maximum number of transitional windows.
  x_ta: the detected transitional activity windows.
  norm(x): normalize the data of x.
  interpolate(x): perform the interpolation method on x.
1: ista = False
2: norm(x_i), norm(x_{i-1})
3: If window x_i is available Then
4:   b_{i-1}, e_i = compute_dissimilarity(x_{i-1}, x_i)
5:   If ista = False Then
6:     If b_{i-1} > bts Then
7:       ista = True
8:       add x_i to x_ta
9:     End If
10:    send x_{i-1} to the basic activity classifier
11:  Else
12:    add x_i to x_ta
13:    c = c + 1
14:    If e_i > ets Then
15:      interpolate(x_ta)
16:      send x_ta to the transitional activity classifier
17:      clear x_ta
18:      ista = False
19:      reset counter c = 0
20:    End If
21:    If c > maxTAlen Then
22:      send x_ta^0 to the transitional activity classifier
23:      send x_ta^1 to x_ta^c to the basic activity classifier
24:      clear x_ta
25:      ista = False
26:      reset counter c = 0
27:    End If
28:  End If
29: End If
In some cases, we found that two adjacent windows both have $b$ values higher than the threshold ($bts$). This occurs when both windows contain mixed samples of TAs and non-TAs. In this case, we consider the later window to be the beginning of the TA.
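A minimal Python sketch of Algorithm 1 is given below. It assumes a compute_dissimilarity helper implementing Algorithm 2 (sketched in Section III-C) and the resize_window interpolation helper from above; the callbacks send_ba and send_ta stand in for the two CNN classifiers, and per-window normalization is omitted for brevity.

```python
import numpy as np

def ssa_stream(windows, compute_dissimilarity, resize, send_ba, send_ta,
               bts, ets, max_ta_len):
    """Route each window to the BA or TA classifier (Algorithm 1 sketch)."""
    ista, c, x_ta = False, 0, []
    for prev, curr in zip(windows, windows[1:]):
        b, e = compute_dissimilarity(prev, curr)
        if not ista:
            if b > bts:              # beginning of a TA detected (lines 6-9)
                ista = True
                x_ta.append(curr)
            send_ba(prev)            # previous window is a BA (line 10)
        else:
            x_ta.append(curr)        # keep aggregating the TA (line 12)
            c += 1
            if e > ets:              # end of the TA detected (lines 14-20)
                send_ta(resize(np.concatenate(x_ta, axis=0), prev.shape[0]))
                x_ta, ista, c = [], False, 0
            elif c > max_ta_len:     # failed to find the TA's end (lines 21-27)
                send_ta(x_ta[0])
                for wnd in x_ta[1:]:
                    send_ba(wnd)
                x_ta, ista, c = [], False, 0
```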
B. Inner Similarity (Intrawindow Similarity)

Although TAs are short and infrequent, they have distinct patterns that differ from BA patterns, as shown in Fig. 2.
Fig. 2. Differing patterns of a transitional activity signal and a BA signal, where window $x_i$ is a transitional activity window and $x_{i-1}$ is a BA window, shown with their subwindows.
The TA pattern differs from the beginning to the end of the window. Therefore, we propose a set of features to represent the inner characteristics of the window based on similarity measurements. The window features are extracted by measuring the similarity within the window, called the inner similarity. To this end, each window is further split into three subwindows ($s_1$, $s_2$, and $s_3$), as shown in Fig. 2. The inner similarity is calculated between the subwindows using a distance function. Given a window $x_i$, the inner similarity measurements of the window $f^i = (f_1, f_2, f_3)$ are given as follows:

$$f_1 = d(s_1, s_2) \tag{1}$$

$$f_2 = d(s_2, s_3) \tag{2}$$

$$f_3 = d(s_1, s_3) \tag{3}$$

where $d(s_k, s_l)$ is the distance function used to compute the inner similarity. In principle, any common distance function can be used, such as Euclidean distance, Manhattan (city block) distance, or cosine similarity. In this study, the Manhattan distance is used to measure the inner similarity. The Manhattan distance calculates the differences between the coordinates of paired objects [28] and has been shown to outperform the Euclidean distance in terms of performance index and classification [28]. The Manhattan distance is the sum of the absolute differences between two vectors, defined as follows:

$$d(s_k, s_l) = \sum |s_k - s_l| \tag{4}$$

where $d$ refers to the distance between two subwindows ($s_k$, $s_l$). Since the Manhattan distance operates on vectors, each subwindow is reshaped from size $(w/3) \times C$ into a vector of size $((w/3) \times C) \times 1$ before calculating the distance, where $C$ is the number of sensor channels.
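A direct translation of (1)-(4) into NumPy, reusing the $(w, C)$ window shape from Section III; the function name is ours.

```python
import numpy as np

def inner_similarity(window: np.ndarray) -> np.ndarray:
    """Inner similarity features f = (f1, f2, f3) of a (w, C) window:
    Manhattan distances between its three equal subwindows, each
    flattened into a vector (Eqs. (1)-(4))."""
    s1, s2, s3 = np.split(window, 3, axis=0)  # requires w divisible by 3
    s1, s2, s3 = s1.ravel(), s2.ravel(), s3.ravel()
    return np.array([np.abs(s1 - s2).sum(),   # f1 = d(s1, s2)
                     np.abs(s2 - s3).sum(),   # f2 = d(s2, s3)
                     np.abs(s1 - s3).sum()])  # f3 = d(s1, s3)
```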
The value of an inner similarity feature is affected by the amount of motion within the window. The more motion within the segment, the larger the difference between the subwindows, which increases the feature value. Conversely, the less motion within the window, the smaller the difference between the subwindows and the lower the feature value. Since dynamic activities involve more motion than static activities, the feature values of dynamic activities are higher than those of static activities.
C. Adjacent Windows Dissimilarity

The signal characteristics of TAs differ from those of BAs (whether static or dynamic). While the signal of static activities usually has low frequency and low amplitude, the signal of dynamic activities has high frequency and high amplitude. The signal of a TA also has high frequency, but it is much shorter than that of a BA. Therefore, TA features differ from BA features. A large difference between the features of two successive windows indicates the presence of a TA window, either at the beginning or at the end of the window. Thus, the proposed segmentation approach exploits these temporal characteristics by deriving two measures from the inner similarity features. This step calculates the degree of dissimilarity between the inner similarity features of two consecutive windows.
To this end, two dissimilarity values are calculated between two adjacent windows. The first value, $b$, determines whether the current window is the beginning of a TA. It is measured by calculating the absolute difference between the minimum feature value of the previous window and the maximum feature value of the current window. This value is high if the current window is the beginning of a TA and the previous window is the ending of a BA. The second value, $e$, determines whether the current window is the end of a TA. It is measured by calculating the absolute difference between the minimum feature value of the current window and the maximum feature value of the previous window. The ratio of this difference is taken with respect to the lowest feature value of the current window. Given the inner similarity feature vectors $f^i$ and $f^{i-1}$ of the current and previous windows, respectively, the adjacent windows dissimilarity is calculated as follows:
$$b_{i-1} = \frac{\left|\min\left(f_1^{i-1}, f_2^{i-1}, f_3^{i-1}\right) - \max\left(f_1^{i}, f_2^{i}, f_3^{i}\right)\right|}{\min\left(f_1^{i-1}, f_2^{i-1}, f_3^{i-1}\right)} \times 100 \tag{5}$$

$$e_{i} = \frac{\left|\min\left(f_1^{i}, f_2^{i}, f_3^{i}\right) - \max\left(f_1^{i-1}, f_2^{i-1}, f_3^{i-1}\right)\right|}{\min\left(f_1^{i}, f_2^{i}, f_3^{i}\right)} \times 100 \tag{6}$$
where $b_{i-1}$ measures the difference between the maximum feature value of the current window and the minimum feature value of the previous window, relative to the minimum feature value of the previous window. Note that the feature values $f^i$ are based on the similarity between the subwindows (inner similarity). Thus, $b_{i-1}$ will be large if the denominator is low and the numerator is high.

$e_i$ measures the difference between the maximum feature value of the previous window and the minimum feature value of the current window, relative to the minimum feature value of the current window.
Fig. 3. Differences in the maximum and minimum feature values between two cases of two successive windows.
Thus, $e_i$ will be large if the denominator is low and the numerator is high.

Therefore, the difference between $b$ and $e$ is that the denominator of $b$ is based on the features of the previous window, while the denominator of $e$ is based on the features of the current window. Also, the maximum and minimum feature values in the numerators of $b$ and $e$ are swapped between the current and previous windows.
For clarification, Fig. 3 illustrates two cases that show the differences in the calculation of the $b_{i-1}$ and $e_i$ values. In the first case, shown in Fig. 3(a), the current window ($x_i$) is a BA, while the previous window ($x_{i-1}$) is a TA. Thus, the maximum and minimum feature values of the current window, denoted $f^i_{\max}$ and $f^i_{\min}$, are low because the distances between subwindows are small, since the signal patterns are similar. For the previous window, the maximum feature value $f^{i-1}_{\max}$ is high and the minimum feature value $f^{i-1}_{\min}$ is low or medium because the distances between subwindows are large, since the signal patterns differ. Based on (5) and (6), the minimum and maximum feature values determine $b_{i-1}$ and $e_i$. Therefore, in this case, $b_{i-1}$ will be small and $e_i$ will be large.

In the second case, shown in Fig. 3(b), the current window ($x_i$) is a TA and the previous window ($x_{i-1}$) is a BA. Thus, $f^i_{\max}$ is high and $f^i_{\min}$ is low, while both $f^{i-1}_{\max}$ and $f^{i-1}_{\min}$ are low. Thus, $b_{i-1}$ will be large and $e_i$ will be small.

In all cases, a large value of $b$ indicates the beginning of a TA, while a large value of $e$ indicates the ending of a TA.

Note that the qualitative labels "high, mid, and low" used here only describe the behavior of the $b$ and $e$ values. The exact calculations will behave slightly differently, especially when computing the ratio.

All previous steps are listed in Algorithm 2.
Algorithm 2 Compute the Inner Similarity Features and Window Dissimilarity for Two Adjacent Windows
Input: previous window x_{i-1}, current window x_i
Output: b_{i-1}, e_i
  b_{i-1}: value indicating the end of a basic activity and the start of a transitional activity.
  e_i: value indicating the end of a transitional activity and the start of a basic activity.
  round(): round to the nearest integer.
  split(x, c): split window x into c equal-sized subwindows.
  s_1^i, s_2^i, s_3^i: subwindows of the ith window.
1: s_1^{i-1}, s_2^{i-1}, s_3^{i-1} = split(x_{i-1}, 3, dim = 0)
2: s_1^i, s_2^i, s_3^i = split(x_i, 3, dim = 0)
3: f_1^{i-1}, f_2^{i-1}, f_3^{i-1} = distances between the subwindows of x_{i-1} (s_1^{i-1}, s_2^{i-1}, s_3^{i-1})
4: f_1^i, f_2^i, f_3^i = distances between the subwindows of x_i (s_1^i, s_2^i, s_3^i)
5: round all features: round(f_1^{i-1}, f_2^{i-1}, f_3^{i-1}, f_1^i, f_2^i, f_3^i)
6: b_{i-1} = |min(f_1^{i-1}, f_2^{i-1}, f_3^{i-1}) - max(f_1^i, f_2^i, f_3^i)| / min(f_1^{i-1}, f_2^{i-1}, f_3^{i-1}) × 100
7: e_i = |min(f_1^i, f_2^i, f_3^i) - max(f_1^{i-1}, f_2^{i-1}, f_3^{i-1})| / min(f_1^i, f_2^i, f_3^i) × 100
8: return b_{i-1}, e_i
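Algorithm 2 maps directly onto the inner_similarity helper sketched in Section III-B; a minimal version follows, with the caveat that the rounded minimum in the denominator is assumed to be nonzero, as in (5) and (6).

```python
import numpy as np

def compute_dissimilarity(prev: np.ndarray, curr: np.ndarray):
    """Return (b, e) for two adjacent (w, C) windows (Algorithm 2 sketch)."""
    f_prev = np.round(inner_similarity(prev))  # features of x_{i-1}
    f_curr = np.round(inner_similarity(curr))  # features of x_i
    b = abs(f_prev.min() - f_curr.max()) / f_prev.min() * 100  # Eq. (5)
    e = abs(f_curr.min() - f_prev.max()) / f_curr.min() * 100  # Eq. (6)
    return b, e
```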
D. Activity Classifiers

CNNs have been shown to be effective in many fields and have been widely used in HAR studies [29]. The convolutional layers allow the network to automatically extract features from each window for the recognition task. Therefore, two CNNs with the same architecture are built to classify the windows. The architecture of the networks is illustrated in Table I; it is similar to the model architectures reported in [13] and [30]. The first network recognizes the BA windows, and the second recognizes the TAs. After the SSA labels a window as a BA or TA window, the BA window is passed to the BA classifier, and the TA window is passed to the TA classifier for activity recognition.
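Since Table I is not reproduced here, the following PyTorch sketch uses layer sizes of our own choosing to illustrate the two-classifier setup; only the overall pattern (1-D convolutions over the window, one network per activity group) follows the text.

```python
import torch
import torch.nn as nn

class ActivityCNN(nn.Module):
    """Illustrative 1-D CNN window classifier (not the exact Table I model)."""
    def __init__(self, channels: int = 6, w: int = 90, n_classes: int = 6):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(channels, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(64, 128, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool1d(2),
        )
        self.classifier = nn.Linear(128 * (w // 4), n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, w, channels); Conv1d expects (batch, channels, w).
        z = self.features(x.transpose(1, 2))
        return self.classifier(z.flatten(1))

ba_model = ActivityCNN(n_classes=6)  # six basic activities (SBHARPT)
ta_model = ActivityCNN(n_classes=6)  # six postural transitions (SBHARPT)
```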
IV. EXPERIMENTS AND RESULTS

In this section, we evaluate the performance of the SSA. The impact of similarity segmentation on recognition performance is also evaluated for both TAs and human activities. All experiments are performed using two public datasets.

A. Datasets Description

In this study, we used two public benchmark datasets, SBHARPT [7] and FORTH-TRACE [31], to perform our experiments and evaluate the proposed approach.
TABLE I
Structure of the BA and TA CNN Classifiers

The SBHARPT dataset is a collection of continuous triaxial signals from the accelerometer and gyroscope sensors, and it is an updated version of the UCI HAR dataset [32]. The dataset contains activity data from 30 subjects who performed 12 activities: six BAs, comprising three static activities (standing, sitting, and lying down) and three dynamic activities (walking, walking downstairs, and walking upstairs), and six postural transitions that occur between the static postures (stand-to-sit, sit-to-stand, sit-to-lie, lie-to-sit, stand-to-lie, and lie-to-stand). The signals were captured at a constant sampling rate of 50 Hz by the embedded accelerometer and gyroscope of a smartphone attached to the waist during the execution of the activities. The dataset is split by subject into 23 subjects for training and seven subjects for testing. Fig. 4 shows the distribution of the number of samples for each activity type in the dataset and demonstrates that the number of samples for TAs is very low compared to that of BAs.
The FORTH-TRACE dataset was collected from 15 subjects wearing five Shimmer sensor nodes at five different body locations. Each sensor node integrates a triaxial inertial sensor (accelerometer, gyroscope, and magnetometer) and collected activity data at a constant sampling rate of 51.2 Hz. The subjects performed 16 activities: seven BAs and nine postural TAs. The raw data are split by subject into 11 subjects for training and four subjects for testing.
In addition, the FORTH-TRACE dataset includes activities with talking, such as sitting-while-talking. The annotation of the dataset differentiates between sitting and sitting-while-talking. However, in this study, we consider both to be the same type of activity because talking is not captured by the accelerometer and gyroscope, which mainly capture human motion [33]. Furthermore, some of the postural TAs in FORTH-TRACE occur between the dynamic postures, which may lead to the misclassification of TAs. To avoid this, we considered only the first activity, regardless of the second (e.g., a climb-stairs-to-walk activity becomes a climb-stairs activity).
Fig. 4. Number of samples for activity types in the SBHARPT dataset.
Fig. 5. Running time of our proposed segmentation approach based on window size.
B. Experimental Setup

First, the sensor data are preprocessed to remove missing values. The main variables used from the datasets are the three accelerometer axes and the three gyroscope axes. The subject ID is used for splitting the dataset; the other variables are discarded. All dataset files are combined into one file to form the continuous data signal. Then, normalization is performed to standardize the data signals. Next, the data are segmented using the fixed-size sliding window without overlap because our proposed approach treats the data window-wise. Since a TA's duration is shorter than a BA's duration, we consider window sizes that
are less than the minimum TA duration, which is approximately 2.6 s. In this study, three window sizes below 2.6 s are used to evaluate the impact of window size on the performance of the proposed approach: 60 samples (1.2 s), 90 samples (1.8 s), and 120 samples (2.4 s). A window's size refers to the number of samples in a single window.
The dataset is divided into training and testing sets. The training set is further divided into BA and TA training sets. The BA training set contains only BA windows, while the TA training set contains only TA windows. The TA windows are aggregated and resized to the window size using linear interpolation.
The BA classifier is trained using the BA windows, and the TA classifier is trained using the TA windows. The Adam optimizer with the default learning rate (0.001) is used, and the hyperparameters are set empirically. The batch size is 64 windows, and the maximum number of epochs is 180 for the TA classifier and 200 for the BA classifier. The models are trained using the cross-entropy loss.
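A training-loop sketch matching this setup (Adam at the default learning rate, batch size 64, cross-entropy loss), reusing the illustrative ActivityCNN from Section III-D; the data tensors are placeholders.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

x_train = torch.randn(1024, 90, 6)      # placeholder BA windows (N, w, C)
y_train = torch.randint(0, 6, (1024,))  # placeholder activity labels

loader = DataLoader(TensorDataset(x_train, y_train), batch_size=64, shuffle=True)
model = ActivityCNN(n_classes=6)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = torch.nn.CrossEntropyLoss()

for epoch in range(200):                # 200 epochs for the BA classifier
    for xb, yb in loader:
        optimizer.zero_grad()
        loss = criterion(model(xb), yb)
        loss.backward()
        optimizer.step()
```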
To perform the experiments optimally, we select the optimal values of the beginning-of-TA ($bts$) and ending-of-TA ($ets$) thresholds. The values are selected through experiments discussed in the results section. However, the optimal threshold value changes depending on factors such as the dataset and the window size. Since the maximum average duration of TAs is 4.95 s [7], we set the maximum length of a TA to 10.2 s. We implemented the proposed method in Python and the models in PyTorch on a workstation with an Intel Core i5 CPU, 16-GB RAM, and a GTX1660 GPU.
C. Results and Discussion

Here, we explain and discuss the results of the evaluation experiments. Two types of experiments are performed: the first evaluates the performance of the proposed segmentation approach, and the second evaluates the performance of the HAR model and how our proposed SSA improves it compared to the fixed sliding window segmentation method.

The performance of the HAR model is evaluated through the overall accuracy (ac) metric, which is measured as follows:

$$ac = \frac{TP + TN}{TP + FP + TN + FN} \tag{7}$$

where TP is the true positives, TN is the true negatives, FP is the false positives, and FN is the false negatives.

The per-activity performance is evaluated through the precision ($P_c$), recall ($R_c$), and F-score ($F_c$) metrics, which are calculated as follows:

$$P_c = \frac{TP}{TP + FP} \tag{8}$$

$$R_c = \frac{TP}{TP + FN} \tag{9}$$

$$F_c = \frac{2 \times P_c \times R_c}{P_c + R_c} \tag{10}$$

where $c$ is the activity class.
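In practice, (7)-(10) correspond to standard library calls; a small check with placeholder labels:

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

y_true = [0, 0, 1, 1, 2, 2]  # placeholder ground-truth labels
y_pred = [0, 1, 1, 1, 2, 0]  # placeholder predictions

acc = accuracy_score(y_true, y_pred)           # Eq. (7)
p, r, f, _ = precision_recall_fscore_support(  # Eqs. (8)-(10)
    y_true, y_pred, average=None, zero_division=0)
print(acc, p, r, f)
```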
TABLE II
Accuracy Results for the Detection Experiments of the Best Value of the Begin Threshold bts for Each Window Size. Best Result Is in Bold
1) Performance of Non-TA and TA Window Detection: Three groups of experiments are performed in this section. The first group aims to find the optimal values of the thresholds. The second group aims to find the best similarity distance method for the proposed approach. The third group evaluates the performance of the SSA.

Initially, we performed the first group of experiments to find the optimal threshold values of the beginning ($bts$) and ending ($ets$) of TA windows. The experiments are performed using three different window sizes: 60, 90, and 120 samples. Also, the experiments are performed using different threshold values ($bts$): the threshold starts at 600 and increases by 100 for the SBHARPT dataset, and starts at 800 and increases by 200 for the FORTH-TRACE dataset. Table II shows the recognition accuracies for the different window sizes and threshold values. As shown in the table, the best beginning-of-TA ($bts$) threshold is 780 for the SBHARPT dataset and 1400 for the FORTH-TRACE dataset.
The second group of experiments is performed to determine the similarity distance method that obtains the best results with our proposed approach. Let $s_1$ and $s_2$ be two subwindows of window $i$. The most common similarity distance metrics considered in other studies [34], [35] are the city block (Manhattan) distance, the Euclidean distance

$$\sqrt{\sum (s_1 - s_2)^2} \tag{11}$$

and the Chebyshev distance

$$\max |s_1 - s_2| \tag{12}$$

which are used here to analyze the similarity between subwindow features. It is worth noting that distance functions such as the Hamming [36] or Jaccard [37] functions can be used for categorical or discrete data. The experiments are performed with a 90-sample window size since it achieved the best accuracy on the SBHARPT dataset. The results show that the Manhattan (city block) distance achieved the best accuracy compared with the other similarity distance methods, as shown in Table III.
TABLE III
Accuracy Results of Distinguishing Between TAs and Non-TAs Using Various Similarity Methods for the SBHARPT Dataset With a 90-Sample Window Size. Best Result Is in Bold

TABLE IV
Accuracy Results of the Proposed SSA in Distinguishing Between TAs and Non-TAs for Both Datasets With Three Different Window Sizes. Best Result Is in Bold

TABLE V
Precision, Recall, and F-Score Results of the Proposed SSA in Distinguishing Between TAs and Non-TAs for Both Datasets With Three Different Window Sizes
The third group of experiments is performed to evaluate the SSA's performance in distinguishing TA from non-TA windows (binary classification). To achieve this, the dataset is annotated as follows: BA windows are annotated with 0, and TA windows are annotated with 1. We performed three experiments using three window sizes (60, 90, and 120 samples) to evaluate the impact of window size on the performance of the proposed segmentation approach.
The performance of the segmentation approach is evaluated in terms of accuracy, precision, recall, and F-score. Based on the experiments, the best window size is 90 samples, with an accuracy of 97.65% and 98.23% for the SBHARPT and FORTH-TRACE datasets, respectively. The recognition accuracy for each window size is shown in Table IV.

Table V lists the recall, precision, and F-score for detecting the non-TA and TA classes on the SBHARPT and FORTH-TRACE datasets, respectively. From the results, the best window size is 90 samples, which achieved the best results across the evaluation metrics for both datasets. Also, the metric values for BAs are higher than those for TAs. This is because the number of TA instances is lower than the number of BA instances, and TAs appear less frequently than BAs.
TABLE VI
Comparison of the Activity Recognition Accuracy (%) of the Fixed Sliding Window and SSA Methods With Different Window Sizes for Both Datasets. Best Result Is in Bold
In addition, the complexity of our proposed segmentation approach is $O(n^2)$, and the input space is two windows of size $w \times C$, where $w$ is the window size and $C$ is the number of channels. Fig. 5 shows the running time of our proposed segmentation approach for each window size. The total numbers of windows processed are 13593, 9062, and 6796 for window sizes of 60, 90, and 120 samples, respectively.

In summary, the results show that the proposed SSA is able to effectively distinguish TA windows from BA windows on both datasets.
2) Activity Recognition: In the third experiment, we evaluate and compare the proposed SSA and the fixed sliding window method in the context of activity recognition. The experiments are performed using three window sizes (60, 90, and 120 samples) to evaluate the impact of window size on the performance of activity recognition. The activity recognition accuracies of the SSA and the fixed sliding window on both datasets are shown in Table VI.

For the SBHARPT dataset, the SSA achieved better recognition accuracies than the fixed sliding window for all window sizes, improving the accuracy by 2.16%, 2.29%, and 0.50% for 60, 90, and 120 samples, respectively. The fixed sliding window achieved accuracies of 89.75%, 90.42%, and 91.20% for window sizes of 60, 90, and 120 samples, respectively, while the proposed SSA achieved 91.91%, 92.71%, and 91.70%. The results also show that a window size of 90 samples achieved the best recognition accuracy, which is comparable to an existing study [12] that found window sizes ranging from 2 to 2.5 s to give the most accurate detection performance for common sports activities. Therefore, the following comparative analysis and discussion are based on the 90-sample window size.
Table VII shows the precision, recall, and F-score of each activity for both the SSA and the fixed sliding window on the SBHARPT dataset. As shown in Table VII, both the proposed SSA and the fixed sliding window achieved good performance in classifying the BAs, with F-scores above 95% for all activity classes except sitting and standing. This is because the sitting and standing activities are both static and have similar features. Overall, the average F-scores of the BAs for the fixed sliding window and the SSA are 92.83% and 93.67%, respectively.
TABLE VII
Activity Recognition Performance for the SBHARPT Dataset With a 90-Sample Window Size Using the SSA and the Fixed Sliding Window
For TAs, the fixed sliding window performed poorly, with F-scores in the range of 61%-74%. This shows that the fixed window size cannot handle the TAs due to their varying lengths. Conversely, the proposed SSA achieved significantly better performance in classifying the TAs, with F-scores in the range of 75%-91%. The proposed SSA successfully detects the TA windows and aggregates them to produce more effective segmentation. Overall, the average F-scores of the TAs for the fixed sliding window and the SSA are 64.83% and 82.17%, respectively. Fig. 6 presents the confusion matrix of the activity recognition results for a window size of 90 samples using the SSA.
For the FORTH-TRACE dataset, the proposed SSA achieved better recognition accuracies than the fixed sliding window for all window sizes, improving the accuracy by 2.02%, 2.69%, and 3.93% for 60, 90, and 120 samples, respectively. The fixed sliding window achieved accuracies of 80.83%, 82.06%, and 82.72% for window sizes of 60, 90, and 120 samples, respectively, while the SSA achieved 82.85%, 84.75%, and 86.65%. The results also show that a window size of 120 samples achieved the best recognition accuracy. Therefore, the following comparative analysis and discussion are based on the 120-sample window size.
Table VIII shows the precision, recall, and F-score of each activity for both the SSA and the fixed sliding window on the FORTH-TRACE dataset. As shown in Table VIII, the SSA and the fixed sliding window achieved moderate, comparable performance in classifying the BAs, except for the sitting and standing activities.
Fig. 6. Confusion matrix of activity recognition results using SSA with a window size of 90 samples for the SBHARPT dataset.
Fig. 7. Confusion matrix of activity recognition results using SSA with a window size of 120 samples for the FORTH-TRACE dataset.
As with the SBHARPT dataset, the sitting and standing activities have lower F-scores than the other activity classes. Nevertheless, the proposed SSA achieved better F-scores for all BA classes compared to the fixed sliding window. Overall, the average F-scores of the BAs for the fixed sliding window and the SSA are 83.25% and 86.75%, respectively.
For the TAs, the fixed sliding window performed poorly, with F-scores of 33% and 24% for the two TAs, respectively. This shows that the fixed window size cannot handle the TAs due to their varying lengths. The proposed SSA achieved better performance, with F-scores of 61% and 65% for the two TAs, respectively. This relatively low performance is due to combining the talking activity with its corresponding transitional activity. Overall, the average F-scores of the TAs for the fixed sliding window and the SSA are 28.5% and 63%, respectively. Fig. 7 presents the confusion matrix of activity recognition using the SSA for the FORTH-TRACE dataset.
TABLE VIII
Precision, Recall, and F-Score Results of Activity Recognition for the FORTH-TRACE Dataset With a 120-Sample Window Size Using the SSA and the Fixed Sliding Window
3) Comparison With the State-of-the-Art Approaches: One of the main problems facing HAR research is the lack of a benchmark dataset and the lack of standardized evaluation of HAR model performance, which leads to unfair comparison between state-of-the-art techniques [21]. This is because building a benchmark dataset requires a large number of attributes that differ from one database to another and cannot be standardized, such as sensor types, sensor positions, device types, and activity types. The performance of our model is therefore compared with the performance of state-of-the-art models that use the same dataset.
The SBHARPT dataset is the one most used in studies that consider TAs in activity recognition. Table IX shows the accuracy of the state-of-the-art HAR models on the SBHARPT dataset and the segmentation method used in each study. Although some of the existing works report higher recognition accuracy, the comparison is not completely fair because each study has its own settings or limitations.
In [22], the BA data are separated from the TA data, and then each set is classified separately. The fixed sliding window is used for the segmentation process. Furthermore, different machine learning methods are built to obtain the best accuracy for each set. In [5], the TAs are treated as a single class. Thus, the reported recognition accuracies do not represent the model's ability to classify the individual human activity classes.

Irfan et al. [24] and Yulita and Saori [38] used the presegmented data of the dataset, which consist of a set of statistical features for each activity instead of the raw sensor data used in our study. Furthermore, the presegmented data have a much smaller number of windows. Thus, it is not appropriate to compare their experimental results with ours. Noor et al. [27] segmented the raw data using the fixed sliding window method, and their classification model consists of concurrent feature learning pipelines to exploit the temporal relationships between adjacent windows. However, the recognition accuracy is lower than that of our proposed approach due to the limitations of the fixed sliding window. In summary, all the aforementioned works used the fixed sliding window for the segmentation process, which results in suboptimal segmentation and model performance.
TABLE IX
Comparison of the State-of-the-Art HAR Models for the SBHARPT Dataset Considering the TAs
Improved segmentation methods have been proposed to overcome the limitations of the fixed sliding window. In [6], the raw data are segmented using the fixed sliding window method, and then a cluster analysis method with K-means is used to classify the windows. The method is not applicable in real-time systems because it requires longitudinal sensor data to be analyzed and clustered to segment the signal, whereas incoming sensor data need to be segmented immediately in real-time systems. In addition, data repetition causes the model to learn a signal pattern different from the actual pattern of the activity signal. Noor [25] proposed an adaptive sliding window method, whereby the window can be expanded based on the distribution of the data within the window. That method assumes the distribution of the data to be Gaussian, which may not hold. Our proposed method does not make any assumptions about the data; the segmentation relies entirely on the pattern of the signals. In addition, the adaptive method requires computational resources to analyze the data and expand the window, which makes it unsuitable for real-time systems.
In summary, we find that the results of some studies were obtained under completely different settings from ours. These limitations make a precise comparison of the results unfair. Some studies have proposed improved segmentation methods; however, those methods are resource intensive and not applicable to real-time systems. The results obtained by our proposed approach are significant and valuable for advancing the field of HAR.
V. CONCLUSION

This article presented an SSA for the signal segmentation task. The proposed approach segments the sensor data and distinguishes between BA windows and TA windows using the inner similarity and the temporal dissimilarity between adjacent windows. The proposed segmentation method feeds two CNN classifiers that perform the classification task: the first classifies the BAs, and the other classifies the TAs.

The differentiation of temporal window features proved to be effective for signal segmentation as well as for the detection of TAs. For future work, end-to-end models could therefore be developed for both the segmentation and recognition tasks. The proposed approach could also be improved to enhance the detection of TAs that appear between two dynamic BAs. It is worth noting that, in principle, the proposed approach can be applied to dense sensing-based HAR, where the sensor data can be binary and/or multimodal. This is because such data are time series, for which segmentation is needed before classification. A suitable distance function can be specified based on the data type of the dataset.
REFERENCES
[1] Z. Benhaili, Y. Balouki, and L. Moumoun, "A hybrid deep neural network for human activity recognition based on IoT sensors," Int. J. Adv. Comput. Sci. Appl., vol. 12, no. 11, pp. 250–257, 2021, doi: 10.14569/IJACSA.2021.0121129.
[2] N. Alhammad and H. Al-Dossari, "Dynamic segmentation for physical activity recognition using a single wearable sensor," Appl. Sci., vol. 11, no. 6, p. 2633, Mar. 2021, doi: 10.3390/app11062633.
[3] N. F. Ghazali, M. A. As'ari, N. Shahar, and H. F. M. Latip, "Investigation on the effect of different window size in segmentation for common sport activity," in Proc. Int. Conf. Smart Comput. Electron. Enterprise (ICSCEE), Jul. 2018, pp. 1–7, doi: 10.1109/ICSCEE.2018.8538429.
[4] U. Bermejo, A. Almeida, A. Bilbao-Jayo, and G. Azkune, "Embedding-based real-time change point detection with application to activity segmentation in smart home time series data," Expert Syst. Appl., vol. 185, Dec. 2021, Art. no. 115641, doi: 10.1016/j.eswa.2021.115641.
[5] L. Chen, S. Fan, V. Kumar, and Y. Jia, "A method of human activity recognition in transitional period," Information, vol. 11, no. 9, pp. 1–17, 2020, doi: 10.3390/INFO11090416.
[6] J.-H. Li, L. Tian, H. Wang, Y. An, K. Wang, and L. Yu, "Segmentation and recognition of basic and transitional activities for continuous physical human activity," IEEE Access, vol. 7, pp. 42565–42576, 2019, doi: 10.1109/ACCESS.2019.2905575.
[7] J.-L. Reyes-Ortiz, L. Oneto, A. Samà, X. Parra, and D. Anguita, "Transition-aware human activity recognition using smartphones," Neurocomputing, vol. 171, pp. 754–767, Jan. 2016, doi: 10.1016/j.neucom.2015.07.085.
[8] M. Wairagkar et al., "A novel approach for modelling and classifying sit-to-stand kinematics using inertial sensors," 2021, arXiv:2107.06859.
[9] K. E. Pilario et al., "Wearables-based multi-task gait and activity segmentation using recurrent neural networks," Sensors, vol. 8, no. 1, pp. 1–11, 2020, doi: 10.3390/s20113117.
[10] P. Kumar and N. I. T. Hamirpur, "Human activity recognition with deep learning? Methods, progress & possibilities," Tech. Rep., 2021, doi: 10.20944/preprints202102.0349.v4.
[11] M. H. M. Noor, Z. Salcic, and K. I.-K. Wang, "Adaptive sliding window segmentation for physical activity recognition using a single tri-axial accelerometer," Pervas. Mobile Comput., vol. 38, pp. 41–59, Jul. 2017, doi: 10.1016/j.pmcj.2016.09.009.
[12] A. O. Jimale and M. H. M. Noor, "Subject variability in sensor-based activity recognition," J. Ambient Intell. Humanized Comput., vol. 14, no. 4, pp. 3261–3274, 2021, doi: 10.1007/s12652-021-03465-6.
[13] S. Zhang et al., "Deep learning in human activity recognition with wearable sensors: A review on advances," Sensors, vol. 22, no. 4, p. 1476, Feb. 2022, doi: 10.3390/s22041476.
[14] B. A. Atalaa, A. Alenany, A. Helmi, and I. Ziedan, "Effect of data segmentation on the quality of human activity recognition," vol. 4480, no. 7, pp. 133–145, 2020.
[15] J. Whitlock, O. Krand, and S. Jain, "Understanding activity segmentation for multi-sport competitions," in Proc. 4th ACM Workshop Wearable Syst. Appl., Jun. 2018, pp. 16–20, doi: 10.1145/3211960.3211972.
[16] C. F. Martindale, V. Christlein, P. Klumpp, and B. M. Eskofier, "Wearables-based multi-task gait and activity segmentation using recurrent neural networks," Neurocomputing, vol. 432, pp. 250–261, Apr. 2021, doi: 10.1016/j.neucom.2020.08.079.
[17] B. Fida, I. Bernabucci, D. Bibbo, S. Conforto, and M. Schmid, "Varying behavior of different window sizes on the classification of static and dynamic physical activities from a single accelerometer," Med. Eng. Phys., vol. 37, no. 7, pp. 705–711, Jul. 2015, doi: 10.1016/j.medengphy.2015.04.005.
[18] O. Banos, J.-M. Galvez, M. Damas, H. Pomares, and I. Rojas, "Window size impact in human activity recognition," Sensors, vol. 14, no. 4, pp. 6474–6499, Apr. 2014, doi: 10.3390/s140406474.
[19] A. Akbari, J. Wu, R. Grimsley, and R. Jafari, "Hierarchical signal segmentation and classification for accurate activity recognition," in Proc. ACM Int. Joint Conf. Int. Symp. Pervasive Ubiquitous Comput. Wearable Comput., Oct. 2018, pp. 1596–1605, doi: 10.1145/3267305.3267528.
[20] C. Xiao, Y. Lei, Y. Ma, F. Zhou, and Z. Qin, "DeepSeg: Deep-learning-based activity segmentation framework for activity recognition using WiFi," IEEE Internet Things J., vol. 8, no. 7, pp. 5669–5681, Apr. 2021, doi: 10.1109/JIOT.2020.3033173.
[21] J. He, Q. Zhang, L. Wang, and L. Pei, "Weakly supervised human activity recognition from wearable sensors by recurrent attention learning," IEEE Sensors J., vol. 19, no. 6, pp. 2287–2297, Mar. 2019, doi: 10.1109/JSEN.2018.2885796.
[22] K. J. Lone, L. Hussain, S. Saeed, A. Aslam, A. Maqbool, and F. M. Butt, "Detecting basic human activities and postural transition using robust machine learning techniques by applying dimensionality reduction methods," Waves Random Complex Media, vol. 2021, pp. 1–26, Sep. 2021, doi: 10.1080/17455030.2021.1971325.
[23] J. Shi, D. Zuo, and Z. Zhang, "Transition activity recognition system based on standard deviation trend analysis," Sensors, vol. 20, no. 11, pp. 1–11, 2020, doi: 10.3390/s20113117.
[24] S. Irfan, N. Anjum, N. Masood, A. S. Khattak, and N. Ramzan, "A novel hybrid deep learning model for human activity recognition based on transitional activities," Sensors, vol. 21, no. 24, pp. 1–20, 2021, doi: 10.3390/s21248227.
[25] M. H. M. Noor, "Feature learning using convolutional denoising autoencoder for activity recognition," Neural Comput. Appl., vol. 2021, pp. 1–12, Jan. 2021, doi: 10.1007/s00521-020-05638-4.
[26] K. Xia, J. Huang, and H. Wang, "LSTM-CNN architecture for human activity recognition," IEEE Access, vol. 8, pp. 56855–56866, 2020, doi: 10.1109/ACCESS.2020.2982225.
[27] M. H. Mohd Noor, S. Y. Tan, and M. N. Ab Wahab, "Deep temporal Conv-LSTM for activity recognition," Neural Process. Lett., vol. 54, no. 5, pp. 4027–4049, Oct. 2022, doi: 10.1007/s11063-022-10799-5.
[28] M. Rizwan and D. V. Anderson, "Comparison of distance metrics for phoneme classification based on deep neural network features and weighted k-NN classifier," 2016. [Online]. Available: http://ttic.uchicago.edu/~klivescu/MLSLP2016/rizwan.pdf
[29] A. Ferrari, D. Micucci, M. Mobilio, and P. Napoletano, "Trends in human activity recognition using smartphones," J. Reliable Intell. Environments, vol. 7, no. 3, pp. 189–213, Sep. 2021, doi: 10.1007/s40860-021-00147-0.
[30] Y. Xu and T. T. Qiu, "Human activity recognition and embedded application based on convolutional neural network," J. Artif. Intell. Technol., vol. 1, no. 1, pp. 51–60, Dec. 2020, doi: 10.37965/jait.2020.0051.
[31] K. Karagiannaki, A. Panousopoulou, and P. Tsakalides, "A benchmark study on feature selection for human activity recognition," in Proc. ACM Int. Joint Conf. Pervasive Ubiquitous Comput.: Adjunct, Sep. 2016, pp. 105–108, doi: 10.1145/2968219.2971421.
[32] D. Anguita, A. Ghio, L. Oneto, X. Parra, and J. L. Reyes-Ortiz, "A public domain dataset for human activity recognition using smartphones," in Proc. ESANN, 2013.
[33] I. Dirgová Luptáková, M. Kubovčík, and J. Pospíchal, "Wearable sensor-based human activity recognition with transformer model," Sensors, vol. 22, no. 5, p. 1911, Mar. 2022, doi: 10.3390/s22051911.
[34] M. Kljun, M. Teršek, and E. Štrumbelj, "A review and comparison of time series similarity measures," in Proc. 29th Int. Electrotech. Comput. Sci. Conf., 2020, pp. 21–22.
[35] K. Kavitha, B. Sandhya, and B. Thirumala, "Evaluation of distance measures for feature based image registration using AlexNet," Int. J. Adv. Comput. Sci. Appl., vol. 9, no. 10, pp. 284–290, 2018, doi: 10.14569/IJACSA.2018.091034.
[36] K. Labib, P. Uznanski, and D. Wolleb-Graf, "Hamming distance completeness," in Proc. Annu. Symp. Combinat. Pattern Matching, 2019, vol. 128, no. 14, pp. 1–14, doi: 10.4230/LIPIcs.CPM.2019.14.
[37] S. S. Choi, S. H. Cha, and C. C. Tappert, "A survey of binary similarity and distance measures," in Proc. 13th World Multi-Conf. Syst. Cybern. Inform., 2009, vol. 3, no. 1, pp. 80–85.
[38] I. N. Yulita and S. Saori, "Human activities and postural transitions classification using support vector machine and k-nearest neighbor methods," IOP Conf. Ser.: Earth Environ. Sci., vol. 248, no. 1, 2019, Art. no. 012025, doi: 10.1088/1755-1315/248/1/012025.
AbdulRahman M. A. Baraka received the B.S. degree in computer information system from Al-Quds Open University, Ramallah, Palestine, in 1999, and the M.S. degree in information technology from the Islamic University in Gaza, Gaza, Palestine, in 2011. He is currently pursuing the Ph.D. degree in computer science at Universiti Sains Malaysia, Minden, Malaysia.
He is currently a Lecturer with the Faculty of Technology and Applied Sciences, Al-Quds Open University. His research interests include deep learning and computer vision.

Mohd Halim Mohd Noor received the B.Eng. (Hons.) and M.Sc. degrees in 2004 and 2009, respectively. He received the Ph.D. degree in computer systems engineering from the University of Auckland, Auckland, New Zealand, in 2017.
He is currently a Senior Lecturer with the School of Computer Sciences, Universiti Sains Malaysia, Minden, Malaysia. His main research interests include machine learning, deep learning, computer vision, and pervasive computing.