EEG Wavelet Classification for Fall Detection with Genetic Programming



The ability to autonomously detect a physical fall is one of the many enabling technologies towards better independent living. This work explores how genetic programming can be leveraged to develop machine learning pipelines for the classification of falls via EEG brainwave activity. Eleven physical activities (5 types of falls and 6 non-fall activities) are grouped into a binary classification problem of whether a fall has occurred or not. Wavelet features are extracted from the brainwaves before machine learning models are explored and tuned for better k-fold classification accuracy, precision, recall, and F1 score. Results show that solutions discovered through genetic programming can detect falls with a mean accuracy of 89.34%, precision of 0.883, recall of 0.908, and an F1-Score of 0.895 from EEG brainwave data alone. All three genetic programming solutions chose a further step of Principal Component Analysis for additional feature extraction from the computed wavelet features, with iterated powers of 6, 3, and 7, respectively, and all with a randomised Singular Value Decomposition approach. The best model is finally analysed via the Receiver Operating Characteristic and Precision-Recall curves. Python code for each of the genetic programming pipelines is provided.
The ability to autonomously detect a physical fall is one of many enabling technologies towards better independent living. Many state-of-the-art fall detection techniques are based on the detection of physical movements, such as through accelerometers and gyroscopes, whereas many others consider traits such as bioelectrical activity from the muscles and brain. Applied machine learning is never perfect, and thus the provision of multiple methods of fall detection reduces the potential error in the real world, since there are several observational models to consider rather than reliance on just one or a few. In the United Kingdom during 2021, there were more deaths registered than births [ ], in part due to the world facing an ever-ageing population. The demographics of those who provide care and those who are service users are changing in size and pace at considerable rates throughout the world [ ], and thus changes are required for healthcare systems throughout the world to continue to operate effectively and provide a suitable level of care to those who require it. A number of state-of-the-art solutions to these issues are presented in the form of applied artificial intelligence for independent assisted living [ ]. This work proposes the utilisation of a single electroencephalography electrode to detect the event of a fall autonomously through a process of data collection, feature extraction, processing, and machine learning. To detect a fall by this method would provide a further facet to independent assisted living and allow for further independence
within the home. The main scientific contributions of this work are as follows:
• Exploration of brainwave features via Kullback-Leibler divergence shows that the absolute mean of the 8th and the variance of the 3rd wavelet hold the most information for fall classification.
• Balancing and normalisation alleviate the scarcity of brainwave activity data recorded during a fall.
• Manual tuning of machine learning models presents a Gaussian Process as a candidate for fall detection.
• Genetic programming successfully develops pipelines for better classification, and the three solutions found outperform all other approaches explored within this work.
The remainder of this article is organised as follows. Section 2 explores the background and state of the art within the fields of study related to this work. Section 3 then describes the method of the experiments, prior to the results being presented in Section 4. Finally, Section 5 concludes this study and suggests future work based on the findings.
Falls in older adults are caused in part by loss of balance due to ageing [ ]. The risk of preventable injury by a fall grows with age, with around 33% of older adults experiencing a fall once or more per year, and around half of people over the age of 80 experiencing falls annually [ ]. According to the NHS, falls do not often result in serious physical injuries in older adults, but can cause the person to lose confidence, withdraw socially, and feel like they have lost their independence [ ]. It was noted in [ ] that 0.1% of all healthcare expenditures in the United States and 1.5% in Europe are directly related to fall-related injuries. The review notes risk factors including impaired balance and gait, polypharmacy, history of previous falls, advancing age, sex, visual impairments, cognitive decline, and environmental factors. In the United States, there were an estimated 10,300 fatal and 2.6 million non-fatal fall-related injuries in the year 2000 alone [ ]. The main goal of fall detection is the employment of technology to detect a fall event (abnormal behaviour recognition), leading to a quicker response from carers and alleviating issues in situations where the sufferer of the fall cannot locate or reach an emergency call button or cord [ ].

Table 1: Class labels applied to group the 11 individual activities found within the dataset [14].
Activity                         Duration (s)   Class Label
Falling forward using hands      10             Falling
Falling forward using knees      10             Falling
Falling backwards                10             Falling
Falling sideward                 10             Falling
Falling sitting in empty chair   10             Falling
Walking                          60             Not Falling
Standing                         60             Not Falling
Sitting                          60             Not Falling
Picking up an object             10             Not Falling
Jumping                          30             Not Falling
Laying                           60             Not Falling

Falls can be detected through a number of proposed methods including the analysis of wireless networks [ ], computer vision [ ], thermal image processing [ ], acoustic classification [ ], and activities recorded via wearable sensors [ ]. Adkin et al. [ ] note that compensatory balance reactions are recognisable within recorded EEG data. In [ ], the authors proposed a random forest ensemble for the classification of fall events and drowsiness with electrodes embedded within a helmet. The model achieved around 98% accuracy, but the authors note the exhaustiveness of the electrode array approach in terms of its computational complexity, and thus the inference time of the model, and propose that future work may find more efficiency in an array of fewer electrodes. Annese et al. [ ] proposed a multimodal approach to learning from EEG and EMG signals. In particular, seven electrodes are placed around the motor cortex and the occipital lobe. The results on the dataset were almost perfect but, similarly to the previous work, the authors note the computational expense of the approach. Given the level of consumer hardware available which could be provided by healthcare systems, a slow classification of an event would not meet the goal of quick response times that fall detection requires during real-world use. The NeuroSky EEG headset has a single electrode placed on the Fp1 position within the 10-20 EEG electrode placement system. Although many of the commercial applications of the device are based on the classification of concentration [ ], the NeuroSky has proposed applications in fatigue detection [ ], blink detection [ ], and fall detection [14].
The initial raw signals are collected from the Preliminar Fall-UP Dataset presented in [ ]. The dataset comprises 11 activities performed by 4 subjects (three trials each). This work focuses only on the data recorded by the NeuroSky MindWave EEG device, and all other features are disregarded. Table 1 details the binary classification problem that is formed from the dataset by considering whether a fall is occurring during the recording.
Feature extraction in EEG is the process of deriving mathematical descriptions of sections of the wave for classification [ ], and wavelet characteristics have been noted as informative descriptors of brainwave activity [ ]. EEG signals are divided into half-second windows, and seven sets of features are extracted, which leads to a dataset of 39 numerical features. The spectral entropies of the signals are computed via Fourier transform. The spectral entropy is given as $H = -\sum_{m} P(m) \log P(m)$, where $P(m)$ is the power spectrum of the input signal normalised to a probability distribution. Shannon entropy, $-\sum_{i} P(x_i) \log P(x_i)$, is also calculated. In terms of each wavelet scale, the following features are extracted via the continuous wavelet transform: absolute mean value, energy, entropy, standard deviation, and variance. After extraction, all numerical features are normalised to the range 0-1.
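The windowing, entropy, and normalisation steps described above can be sketched without any third-party dependency. This is a minimal illustration rather than the pipeline used in this work: the function names, the base-2 logarithm, and the naive discrete Fourier transform are assumptions, the sampling rate passed to `windows` is a placeholder, and the continuous wavelet transform itself is omitted (the summary statistics would be applied to the coefficients of each wavelet scale).

```python
import cmath
import math
import statistics


def windows(signal, rate_hz, seconds=0.5):
    """Split a signal into consecutive, non-overlapping windows."""
    size = int(rate_hz * seconds)
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, size)]


def shannon_entropy(probabilities):
    """H = -sum(p * log2(p)), skipping zero-probability terms."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def spectral_entropy(window):
    """Shannon entropy of the normalised power spectrum (naive DFT)."""
    n = len(window)
    power = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(window))
        power.append(abs(coeff) ** 2)
    total = sum(power)
    if total == 0:
        return 0.0
    return shannon_entropy([p / total for p in power])


def summary_features(values):
    """Absolute mean, energy, standard deviation, and variance."""
    return {
        "abs_mean": statistics.fmean([abs(v) for v in values]),
        "energy": sum(v * v for v in values),
        "std": statistics.pstdev(values),
        "variance": statistics.pvariance(values),
    }


def min_max_normalise(column):
    """Rescale one feature column to the range 0-1."""
    low, high = min(column), max(column)
    if high == low:
        return [0.0 for _ in column]
    return [(v - low) / (high - low) for v in column]
```

An impulse has a flat power spectrum and therefore the highest spectral entropy for its length, whereas a constant signal concentrates all power in one bin and scores close to zero.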
Prior to machine learning, the dataset is explored to discern how effective each attribute is for classification prediction. The information gain $IG(T, a) = E(T) - E(T|a)$ of each attribute is considered via observed changes in entropy $E(s) = -\sum_{j} p_j \log(p_j)$.
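As a worked illustration of the two formulas, the following sketch computes the information gain of a single numeric attribute. Two choices here are assumptions made only for the example: the logarithm is taken to base 2, and the continuous attribute is discretised with a single hypothetical threshold, which the text does not specify.

```python
import math
from collections import Counter


def entropy(labels):
    """E(s) = -sum_j p_j * log2(p_j) over the class proportions."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(values, labels, threshold):
    """IG(T, a) = E(T) - E(T|a) for a binary split of attribute a."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    conditional = sum(len(part) / len(labels) * entropy(part)
                      for part in (left, right) if part)
    return entropy(labels) - conditional
```

An attribute that separates the two classes perfectly recovers the full entropy of the balanced labels (1 bit), while an attribute unrelated to the class yields a gain of zero.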
Hyperparameters for the KNN and Random Forest models are explored through a linear search over a range of candidate values to discern whether hyperparameter tuning has a noticeable effect on mean classification metrics. Various machine learning algorithms are selected with a range of different statistical methods to provide a general overview of the classification ability using multiple methods (see Section 4.5 for more details). Following this, further tuning is performed via Adaptive Boosting [ ] on all the selected models that are compatible with the algorithm (Random Forest, Logistic Regression, Naive Bayes, Stochastic Gradient Descent). Finally, a Genetic Programming approach is explored through a tree-based algorithm detailed in [ ]; the algorithm is executed three times with random seeds equal to the iteration (1, 2, 3), and the source code is provided. All models are trained by 10-fold cross-validation with a seed set to 1 for randomisation and are therefore directly comparable. The algorithms were trained on an overclocked Intel Core i7-8700K CPU (4.3GHz) with scikit-learn [ ] and TPOT [ ].
In this section, the results of all planned experiments are presented. First, the information gain of the best features is noted, prior to an argument for class balancing and numerical normalisation being presented. Hyperparameter optimisation of select models is explored, and boosting is performed where possible. This section also details the results of genetic programming before giving a final comparison of all experiments performed in this work.
4.1 Data Preprocessing
The information gain (Kullback-Leibler divergence) of the top 5 features within the dataset by 10-fold cross-validation can be observed in Table 2. Prior to performing the experiments, Table 3 shows further details on the reasoning behind class balancing. When the dataset is unbalanced, there is a much higher frequency of EEG signals linked to activities under the category of not falling. Due to this, misleading results can be achieved; for example, even though the class-balanced approach seemingly has a lower classification accuracy (83.3% vs. 92.21%), the ability to recognise the falling behaviour is improved from 885 correctly classified instances to 980. The baseline (prediction based on the most common label) for the balanced dataset is 50%, whereas the baseline for the unbalanced dataset is 82.03%. Thus, balancing in this preliminary example provides a 33.3 percentage point advantage over the baseline, whereas leaving the dataset unbalanced provides only a 10.18 percentage point advantage over the baseline. In Table 4, the classification metrics are compared when normalising the data within the range of 0-1. As can be observed for this preliminary decision tree classifier, the metrics increase slightly when normalisation is performed.

Table 2: Top 5 features in the dataset by their Kullback-Leibler divergence after feature extraction and normalisation.
Attribute                      KLD             Rank
Wavelet absolute mean_8        0.288 ± 0.004   2 ± 1.18
Wavelet variance_3             0.288 ± 0.006   2.4 ± 1.28
Wavelet variance_4             0.286 ± 0.005   2.5 ± 0.92
Wavelet standard deviation_5   0.284 ± 0.005   5.8 ± 4.66
Wavelet absolute mean_2        0.28 ± 0.006    12.1 ± 5.84

Table 3: Confusion matrices of balanced and unbalanced datasets following the training of a simple random decision tree. Due to the higher frequency of "Not Falling", classification without class balancing produces misleadingly high accuracy.
Balanced (Acc 83.3%)
                  Predicted No Fall   Predicted Fall
Actual No Fall    856                 246
Actual Fall       122                 980
Unbalanced (Acc 92.21%)
                  Predicted No Fall   Predicted Fall
Actual No Fall    4771                261
Actual Fall       217                 885

Table 4: Classification metrics on normalised and non-normalised numeric attributes with a simple random decision tree.
                 Precision   Recall   F-Score
Normalised       0.839       0.834    0.833
Non-Normalised   0.837       0.833    0.833

It is due to these examples and discussion that the normalised and equally balanced dataset is chosen for the remainder of the experiments presented in this work.
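The argument for class balancing can be checked directly from the confusion matrices in Table 3. The sketch below recomputes the accuracy, the majority-class baseline, and the recall of the fall class; it assumes that the rows of each matrix are the actual labels and the columns the predicted labels, and the function name is illustrative.

```python
def summarise(confusion):
    """Accuracy, majority-class baseline, and fall recall (all in %).

    confusion[actual][predicted] holds instance counts.
    """
    total = sum(sum(row.values()) for row in confusion.values())
    correct = sum(confusion[label][label] for label in confusion)
    largest_class = max(sum(row.values()) for row in confusion.values())
    fall_row = confusion["Fall"]
    return {
        "accuracy": round(100 * correct / total, 2),
        "baseline": round(100 * largest_class / total, 2),
        "fall_recall": round(100 * fall_row["Fall"] / sum(fall_row.values()), 2),
    }


# Counts copied from Table 3.
balanced = {"No Fall": {"No Fall": 856, "Fall": 246},
            "Fall": {"No Fall": 122, "Fall": 980}}
unbalanced = {"No Fall": {"No Fall": 4771, "Fall": 261},
              "Fall": {"No Fall": 217, "Fall": 885}}

print(summarise(balanced))    # accuracy 83.3, baseline 50.0, fall_recall 88.93
print(summarise(unbalanced))  # accuracy 92.21, baseline 82.03, fall_recall 80.31
```

Although the unbalanced model has the higher accuracy, it recovers about 80% of the fall windows compared with about 89% for the balanced model, and it sits far closer to its own baseline.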
4.2 Hyperparameter Tuning
Figures 1 and 2 show the effects of the number of estimators in the Random Forest model. The best overall model was a Random Forest containing 80 decision trees, which had a mean accuracy of 84.94%, a precision of 0.81, a recall of 0.915, and an F-Score of 0.856. These were the highest observed metrics within the linear search except for mean precision, where a Random Forest of 50 trees scored 0.81. A similar linear search for the value of $K$ within the K-Nearest Neighbours model can be observed in Figures 3 and 4. The most effective model was $K = 40$, which had a mean accuracy of 73.37%, a precision of 0.793, a recall of 0.634, and an F-Score of 0.704.
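A linear search of this kind simply scores every candidate value in order and keeps the best. The sketch below illustrates it for the number of neighbours in a small, dependency-free K-Nearest Neighbours classifier. The toy data, the candidate values, and the single train/test split are assumptions made for brevity; the experiments above score each candidate by its mean 10-fold metrics instead.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k):
    """Majority vote among the k nearest training points (Euclidean)."""
    nearest = sorted(zip(train_X, train_y),
                     key=lambda pair: math.dist(pair[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(train_X, train_y, test_X, test_y, k):
    """Percentage of test points that receive the correct label."""
    hits = sum(knn_predict(train_X, train_y, x, k) == y
               for x, y in zip(test_X, test_y))
    return 100 * hits / len(test_y)


def linear_search(train_X, train_y, test_X, test_y, candidates):
    """Score every candidate K in order; return (best K, all scores)."""
    scores = {k: accuracy(train_X, train_y, test_X, test_y, k)
              for k in candidates}
    return max(scores, key=scores.get), scores


# Toy data: 20 "Not Falling" points near 0 and 5 "Falling" points near 10.
train_X = ([(i * 0.1, 0.0) for i in range(20)]
           + [(10 + i * 0.1, 0.0) for i in range(5)])
train_y = ["Not Falling"] * 20 + ["Falling"] * 5
test_X = [(0.55, 0.0), (1.25, 0.0), (10.15, 0.0), (10.25, 0.0)]
test_y = ["Not Falling", "Not Falling", "Falling", "Falling"]

best_k, scores = linear_search(train_X, train_y, test_X, test_y, [1, 5, 15])
print(best_k, scores)  # 1 {1: 100.0, 5: 100.0, 15: 50.0}
```

With K = 15 the five "Falling" training points are outvoted by the majority class, which shows how an overly large K can degrade recall on the rarer class.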
Figure 1: Effect of the number of estimators on the mean k-fold accuracy of the Random Forest model. (Plot: mean accuracy against trees in forest.)
Figure 2: Effect of the number of estimators on the mean k-fold accuracy, precision, recall, and F-Score of the Random Forest model. (Plot: mean accuracy, precision, recall, and F-Score against trees in forest.)
Figure 3: Effect of the number of K-Nearest Neighbours on the mean 10-fold accuracy of the K-Nearest Neighbours model. (Plot: mean accuracy against K-Nearest Neighbours.)
Figure 4: Effect of the number of K-Nearest Neighbours on the mean 10-fold accuracy, precision, recall, and F-Score of the K-Nearest Neighbours model. (Plot: mean accuracy, precision, recall, and F-Score against K-Nearest Neighbours.)
Table 5: Results for the Adaptive Boosted models (Log. - Logistic Regression).
Model   Acc.           Prec.           Recall          F1
RF      84.71 (2.51)   0.81 (0.03)     0.908 (0.027)   0.856 (0.021)
Log.    61.57 (2.28)   0.575 (0.024)   0.881 (0.022)   0.696 (0.021)
SGD     59.84 (2.48)   0.564 (0.023)   0.846 (0.132)   0.672 (0.063)
NB      48.69 (5.49)   0.536 (0.257)   0.459 (0.425)   0.359 (0.283)

Figure 5: Comparison of models before and after being Adaptive Boosted. (Bar chart: mean accuracy of the vanilla and boosted models.)
4.3 Adaptive Boosting
The models which supported adaptive boosting due to their ability to predict probabilities are presented in Table 5. Figure 5 shows a comparison between the original model and the effect of adaptive boosting. It can be observed that adaptive boosting of the Random Forest and Naive Bayes models for this problem leads to a lower mean classification accuracy, whereas Logistic Regression and Stochastic Gradient Descent classification is improved. It must be noted here that although improvements were made in some cases, these were not competitive with the other results explored. Additionally, Adaptive Boosting is computationally expensive compared to many of the approaches explored in this work.
4.4 Genetic Programming
As previously described, the genetic programming approach explored 30 generations with 20 solutions as a population size. The learning process for three iterations of the GP algorithm can be observed in Figure 6, and the best final solutions are detailed further in Table 6. Although starting at the highest fitness, iteration 1 had the lowest final score of 88.79%, with iteration 2 (which started at the lowest fitness) scoring slightly more by the end of the simulation at 88.97%. The best solution found was that by iteration 3, which scored 89.34%. Due to their complexity, the solutions are presented by their iteration ID in this work - the source code for all three machine learning pipelines can be found in Appendix A. Although features are extracted manually, it is interesting to note that all simulations decided upon further engineering through Principal Component Analysis (PCA); a number of related works have also proposed PCA as a dimensionality reduction technique to improve EEG classification [13, 28, 32].

Figure 6: Best fitness (mean accuracy) observed during each generation for three genetic programming experiments. (Plot: mean accuracy against generation for iterations 1, 2, and 3.)

Table 6: Classification metrics of the best solutions discovered after three individual runs of genetic programming.
GP   Accuracy       Precision       Recall          F1
1    88.79 (1.88)   0.892 (0.024)   0.889 (0.042)   0.888 (0.016)
2    88.97 (2.14)   0.882 (0.013)   0.901 (0.04)    0.891 (0.021)
3    89.34 (2.19)   0.883 (0.021)   0.908 (0.037)   0.895 (0.02)
4.5 Comparison of all Models
A nal comparison of all models is provided in Table 7. As can
be observed, the best models were all those that were explored
through genetic programming. Though, it is worth noting that
these models are relatively complex, whereas the Gaussian Process
and Random Forest models are less computationally expensive but
compete at -2.86% -4.4%, respectively. Interestingly, the adaptive
Table 7: Overall comparison of all fall detection models explored within this work.
Model Accuracy Precision Recall F1
Genetic Programming (3) 89.34 (2.19) 0.883 (0.021) 0.908 (0.037) 0.895 (0.02)
Genetic Programming (2) 88.97 (2.14) 0.882 (0.013) 0.901 (0.04) 0.891 (0.021)
Genetic Programming (1) 88.79 (1.88) 0.892 (0.024) 0.889 (0.042) 0.888 (0.016)
Gaussian Process 86.48 (2.65) 0.842 (0.044) 0.902 (0.033) 0.87 (0.024)
Random Forest 84.94 (2.39) 0.81 (0.03) 0.915 (0.027) 0.856 (0.021)
AB(Random Forest) 84.71 (2.51) 0.81 (0.03) 0.908 (0.027) 0.856 (0.021)
Extreme Gradient Boost 76.95 (3.16) 0.791 (0.038) 0.733 (0.036) 0.761 (0.033)
Adaptive Boosting 73.59 (4.12) 0.777 (0.058) 0.665 (0.051) 0.715 (0.045)
K-Nearest Neighbours 73.37 (3.29) 0.793 (0.049) 0.634 (0.046) 0.704 (0.038)
Linear Discriminant Analysis 64.93 (3.29) 0.611 (0.037) 0.824 (0.056) 0.7 (0.034)
AB(Logistic Regression) 61.57 (2.28) 0.575 (0.024) 0.881 (0.022) 0.696 (0.021)
Linear SVM 61.16 (1.92) 0.572 (0.022) 0.88 (0.018) 0.694 (0.019)
Logistic Regression 60.84 (1.92) 0.57 (0.022) 0.882 (0.02) 0.692 (0.019)
Radial Basis SVM 59.98 (2.15) 0.562 (0.024) 0.898 (0.021) 0.691 (0.021)
AB(Stochastic Gradient Descent) 59.84 (2.48) 0.564 (0.023) 0.846 (0.132) 0.672 (0.063)
Quadratic Discriminant Analysis 59.39 (2.08) 0.557 (0.025) 0.914 (0.015) 0.692 (0.021)
Naive Bayes 58.44 (1.78) 0.549 (0.021) 0.939 (0.016) 0.693 (0.019)
Stochastic Gradient Descent 55.17 (3.68) 0.381 (0.251) 0.625 (0.433) 0.468 (0.312)
AB(Naive Bayes) 48.69 (5.49) 0.536 (0.257) 0.459 (0.425) 0.359 (0.283)
0.0 0.2 0.4 0.6 0.8 1.0
False Positive Rate
True Positive Rate
ROC fold 0 (AUC = 0.951)
ROC fold 1 (AUC = 0.927)
ROC fold 2 (AUC = 0.934)
ROC fold 3 (AUC = 0.940)
ROC fold 4 (AUC = 0.954)
ROC fold 5 (AUC = 0.892)
ROC fold 6 (AUC = 0.969)
ROC fold 7 (AUC = 0.949)
ROC fold 8 (AUC = 0.920)
ROC fold 9 (AUC = 0.916)
Mean ROC (AUC = 0.935
1 std. dev.
Figure 7: Receiver Operating Characteristic (ROC) curve for
the best genetic programming-based solution.
boost of the Naive Bayes model was worse than random guessing,
and this was the only instance of such an occurrence. The Receiver
Operating Characteristic (ROC) and Precision-Recall curves for the
best solution can be observed within Figures 7 and 8, respectively.
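For readers unfamiliar with the area under the ROC curve, the sketch below computes it through its ranking interpretation: the probability that a randomly chosen positive instance is scored above a randomly chosen negative one. This is an illustrative, dependency-free formulation and not necessarily the routine used to draw Figure 7. The labels and scores in the usage line are made-up values, whereas the per-fold AUCs are those reported in Figure 7.

```python
import statistics


def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise ranking (ties count 0.5)."""
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Made-up example: three of the four positive/negative pairs are ordered correctly.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75

# Per-fold AUCs as reported in Figure 7.
fold_aucs = [0.951, 0.927, 0.934, 0.940, 0.954,
             0.892, 0.969, 0.949, 0.920, 0.916]
mean_auc = round(statistics.fmean(fold_aucs), 3)
print(mean_auc)  # 0.935
```

Averaging the ten per-fold values reproduces the mean AUC of 0.935 quoted for the ROC curve.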
To nally conclude, this work has explored how machine learning
and genetic programming can be leveraged to autonomously detect
physical falls via a single electrode reading brain activity. Although
the problem was dicult, due in part to activities such as laying
down being present in the category of not falling, genetic program-
ming developed a machine learning pipeline that could detect falls
0.0 0.2 0.4 0.6 0.8 1.0
PR fold 0 (AUC = 0.929)
PR fold 1 (AUC = 0.906)
PR fold 2 (AUC = 0.898)
PR fold 3 (AUC = 0.920)
PR fold 4 (AUC = 0.937)
PR fold 5 (AUC = 0.876)
PR fold 6 (AUC = 0.942)
PR fold 7 (AUC = 0.917)
PR fold 8 (AUC = 0.905)
PR fold 9 (AUC = 0.898)
Precision-Recall (AUC = 0.912)
Figure 8: Precision-Recall curve for the best genetic
programming-based solution.
with an average accuracy of 89.34%.
The results presented in this work provide a good basis for future
experiments, given that some approaches were particularly worse
than the more impressive set of results. In the future, larger datasets
could be leveraged to attempt a generalisation of the population. In
particular, a larger dataset collected from a larger number of subjects
would also enable leave-one-subject-out cross validation to test this.
Additional ensemble methods could also be explored, as the genetic
programming results seem to point towards a statistical ensemble
being a particularly powerful method for EEG-based fall detection.
In addition to the models explored, future work could involve the
multimodal classication of falls by including information collected
by other sensors e.g. those which are wearable and ambient sensors
around the home environment. Finally, deep learning and data
augmentation could be explored towards methods that can be tuned
in the future as more data becomes available.
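This appendix provides the source code for the final solutions found by the three iterations of genetic programming. Python 3.x code is presented and is compatible with the scikit-learn library.
# Imports shared by all three pipelines.
# StackingEstimator is provided by TPOT rather than scikit-learn.
from sklearn.decomposition import PCA
from sklearn.ensemble import ExtraTreesClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from tpot.builtins import StackingEstimator
A.1 Iteration 1
# PCA and extra trees are stacked, then K-Nearest Neighbours classifies.
iter1 = make_pipeline(
    StackingEstimator(estimator=make_pipeline(
        PCA(iterated_power=6, svd_solver="randomized"),
        ExtraTreesClassifier(bootstrap=False, criterion="entropy",
                             max_features=0.6500000000000001,
                             min_samples_leaf=5, min_samples_split=11,
                             n_estimators=100)
    )),
    KNeighborsClassifier(n_neighbors=64, p=2, weights="distance")
)
A.2 Iteration 2
# PCA, a stacked logistic regression, then gradient boosting.
iter2 = make_pipeline(
    PCA(iterated_power=3, svd_solver="randomized"),
    StackingEstimator(estimator=LogisticRegression(C=10.0, dual=False, penalty="l2")),
    GradientBoostingClassifier(learning_rate=0.5, max_depth=7, max_features=0.5,
                               min_samples_leaf=2, min_samples_split=8,
                               n_estimators=100, subsample=1.0)
)
A.3 Iteration 3
# PCA, a stacked gradient boosting model, then K-Nearest Neighbours.
iter3 = make_pipeline(
    PCA(iterated_power=7, svd_solver="randomized"),
    StackingEstimator(estimator=GradientBoostingClassifier(
        learning_rate=0.5, max_depth=8, max_features=0.5,
        min_samples_leaf=12, min_samples_split=7,
        n_estimators=100, subsample=1.0)),
    KNeighborsClassifier(n_neighbors=63, p=1, weights="distance")
)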
