sensors
Article
Interpretability of Input Representations for Gait
Classification in Patients after Total Hip Arthroplasty
Carlo Dindorf 1,*, Wolfgang Teufl 2, Bertram Taetz 2, Gabriele Bleser 2
and Michael Fröhlich 1
1 Department of Sports Science, Technische Universität Kaiserslautern, Erwin-Schrödinger-Str. 57,
67663 Kaiserslautern, Germany; michael.froehlich@sowi.uni-kl.de
2 Junior Research Group wearHEALTH, Technische Universität Kaiserslautern, Gottlieb-Daimler-Str. 48,
67663 Kaiserslautern, Germany; teufl@cs.uni-kl.de (W.T.); taetz@cs.uni-kl.de (B.T.); bleser@cs.uni-kl.de (G.B.)
* Correspondence: carlo.dindorf@sowi.uni-kl.de; Tel.: +49-631-205-5172
Received: 1 July 2020; Accepted: 4 August 2020; Published: 6 August 2020


Abstract:
Many machine learning models show black box characteristics and, therefore, a lack of transparency, interpretability, and trustworthiness. This strongly limits their practical application in clinical contexts. Explainable Artificial Intelligence (XAI) has shown promising results in overcoming these limitations. The current study used XAI methods to examine the influence of different input representations on a trained model's accuracy, interpretability, and clinical relevance. The gait of 27 healthy subjects and 20 subjects after total hip arthroplasty (THA) was recorded with an inertial measurement unit (IMU)-based system. Three different input representations were used for classification, and Local Interpretable Model-Agnostic Explanations (LIME) was used for model interpretation. The best accuracy was achieved with automatically extracted features (mean accuracy M_acc = 100%), followed by features based on simple descriptive statistics (M_acc = 97.38%) and waveform data (M_acc = 95.88%). Globally, sagittal movement of the hip, knee, and pelvis as well as transversal movement of the ankle were especially important for this specific classification task. The current work shows that the type of input representation crucially determines interpretability as well as clinical relevance. A combined approach using different forms of representation seems advantageous. The results might assist physicians and therapists in finding and addressing individual pathologic gait patterns.
Keywords: explainable artificial intelligence; inertial measurement unit; machine learning; biomechanics; gait; total hip replacement
1. Introduction
Identification and discrimination of group differences are important aspects of biomechanical research [1,2]. Modern movement tracking systems make a huge amount of data available (big data) [3,4]. The progressive development of motion analysis systems based on inertial measurement units (IMUs) contributes in particular to the generation of large amounts of data, because such systems make valid and reliable biomechanical data easily accessible [5]. This offers the potential to generate new knowledge and a better understanding of human biomechanics. However, classical inference-based statistical methods have limited capabilities for analyzing the emerging, often complex and multivariate, amounts of data; thus, machine learning models have gained importance [3,4,6].
Many studies have obtained promising results for the classification of pathological movements (e.g., hip osteoarthritis [7], after stroke [8], and Parkinson's Disease [9]). However, many machine learning models show black box characteristics and a lack of transparency [10]. For example, this lack of transparency does not comply with the requirements of the European General Data Protection Regulation (GDPR, EU 2016/679) [11] and strongly limits practical application in clinical contexts. In order to improve interpretability, transparency, and clinical relevance, both the models themselves (model interpretability) and the practical usefulness and interpretability of the input variables used for modeling (feature interpretability) must be taken into account.
Sensors 2020, 20, 4385; doi:10.3390/s20164385 www.mdpi.com/journal/sensors
The input representation plays an important role in classification accuracy [12], as well as in interpretability, clinical relevance, and comparability with previous research. Different feature extraction approaches can be found in the literature:
(i) Many studies have used simple descriptive statistics of the gait waveforms, such as peak values, range of motion, or respective side differences [13,14]. They are straightforward to interpret and are often mentioned in the literature for describing gait characteristics. However, it is unclear whether important information is discarded a priori, which would negatively affect model performance. A further limitation is the dependence on expert or prior knowledge.
(ii) An alternative approach, which is independent of prior knowledge, is the use of entire concatenated waveforms as input features [7,15,16]. This allows group differences to be interpreted by determining important areas of the waveforms. However, it is unclear whether this offers better discriminative power than the above-mentioned statistical features and, therefore, enhances classification performance. Correlations and redundancy among the inputs may further be problematic.
(iii) Lastly, automated feature extraction using a vast number of possibly meaningful statistics can be applied [17]. Feature extraction algorithms such as tsfresh [18] or featuretools [19] can be used for this. However, the extracted features are often nested, complex, and hard to interpret. They therefore show limited comparability with the literature, which makes their clinical relevance questionable.
Dimensionality reduction methods can be applied before classification to compress the data onto a new feature space, with the aim of capturing as much of the variance of the original data as possible [15,20]. Nevertheless, this makes interpretation harder because the components of the new feature space must be interpreted first. For all of the approaches mentioned, feature selection may further improve a model's accuracy, reduce computing power, prevent overfitting, and improve interpretability [21].
Regarding model interpretability, on the one hand, complex machine learning models (e.g., deep neural networks) often achieve more accurate results than simpler models (e.g., decision trees). On the other hand, complexity reduces transparency and interpretability (a trade-off between model accuracy and interpretability) [22,23], which results in the black box characteristic of many models [10]. It is therefore hard for the user to trust the model and its decisions, because it remains opaque what the model has really learned and why it makes certain decisions [24]. These factors currently limit practical application in clinical contexts [25].
Explainable Artificial Intelligence (XAI) has gained great interest in recent years and offers methods for increasing the transparency and trustworthiness of black box models [10]. Local Interpretable Model-Agnostic Explanations (LIME) [26], SHapley Additive exPlanations (SHAP) [27], and Deep Learning Important FeaTures (DeepLIFT) [28] are prominent interpretation tools. For example, LIME explains how a black box model makes a single prediction by approximating that prediction with a simpler, interpretable model [26]. Initial studies applying XAI methods to clinical data have produced promising results [16]. Therefore, XAI methods seem promising for making machine learning models more usable in practical clinical applications.
The application of XAI methods in the context of biomechanical data analysis is a young field of research, and practical applications in clinical contexts are still very rare. Research is needed to assess the potential, show limitations, and identify further research directions in the biomechanical and clinical domain. As a step towards practical clinical application, the present work focuses on the input representation. Different kinds of input representations take different perspectives on the data and provide different insights, but each has benefits and limitations regarding model accuracy, interpretability, and clinical relevance. Applying XAI methods to models based on different input representations could, therefore, lead to new insights and an even better understanding of the data. To the best of our knowledge, no similar comparison has been performed so far. Therefore, we wanted to check whether applying XAI methods to models trained on different input representations leads to congruent results and, taken together, provides better interpretability. For this purpose, we used a highly relevant use case and compared the gait kinematics, measured by means of an IMU system, of patients after total hip arthroplasty (THA; the most important surgery for the treatment of degenerative hip osteoarthritis [29]) with those of a group of healthy subjects.
2. Materials and Methods
2.1. Subjects, Data Acquisition, and Data Preprocessing
For the present study, the IMU-based gait data of a healthy sample from [30] and a sample of patients after THA from [13] were used. The studies were approved by the ethical committees of the Technische Universität Kaiserslautern and the Universität Paderborn and met the criteria of the Declaration of Helsinki. After receiving all relevant study information, the participants signed an informed consent form, including permission to publish the data.
In the mentioned studies, 27 healthy subjects (14 females, 13 males; age: 24.63 ± 2.80 years; weight: 70.44 ± 12.56 kg; height: 1.76 ± 0.09 m) and 20 subjects approximately 2 weeks after THA (13 females, 7 males; age: 57.79 ± 7.41 years; weight: 83.89 ± 17.22 kg; height: 1.73 ± 0.08 m) performed a 6 min walking test. The accelerometer and gyroscope raw data during gait were recorded by means of seven MTw Awinda IMUs (Xsens Technologies BV, Enschede, The Netherlands) attached to the segments of the lower extremities according to [13].
The IMU raw data were then processed using a recently developed sensor fusion algorithm based on an extended Kalman filter approach [31,32]. Using this algorithm, relative segment orientations were estimated by exploiting knowledge of a biomechanical model (i.e., segment lengths, joint centers, virtual anatomical landmarks, and an IMU-to-segment calibration). This made it possible to interpret these angles as anatomically meaningful joint angles, to estimate gait-specific events (i.e., initial contact and terminal contact), and to calculate spatiotemporal parameters based on those events. The event detection and the estimation of the lower-body joint angles using this algorithm were validated in recent publications [30,33,34].
Consequently, the following parameters were calculated based on the IMU data: the hip, knee, and ankle joint angle waveforms as well as the global pelvic motion in the sagittal, frontal, and transversal planes.
The joint angle waveforms of all subjects were divided into gait cycles (GCs) using the initial contact information. Initial contact was detected using a kinematics-based approach according to [34]. The gait cycles were then checked for outliers, using a subject's mean gait cycle duration (i.e., the stride time) ± 2 standard deviations as thresholds. Gait cycles with a duration above or below these thresholds were excluded from further evaluation. The remaining gait cycles of all subjects were normalized to 100 time steps using cubic spline interpolation. Twenty gait cycles were extracted for every subject. The original side-wise (left, right) distinction was transformed into a distinction between affected and unaffected sides.
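The segmentation, outlier removal, and time normalization steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' MATLAB code: the function name `segment_and_normalize` is hypothetical, the sample standard deviation (ddof=1) is an assumption, and SciPy's `CubicSpline` (default not-a-knot end condition) stands in for whatever spline routine was actually used.

```python
import numpy as np
from scipy.interpolate import CubicSpline


def segment_and_normalize(angle, contacts, n_points=100):
    """Split one joint-angle signal into gait cycles and time-normalize them.

    angle:    1-D array, one joint angle sampled at a constant rate
    contacts: sample indices of successive initial contacts of the same foot
    Returns an array of shape (n_kept_cycles, n_points).
    """
    contacts = np.asarray(contacts)
    durations = np.diff(contacts)  # stride time of each cycle, in samples
    mean, sd = durations.mean(), durations.std(ddof=1)
    # Keep only cycles whose duration lies within mean +/- 2 SD
    keep = np.abs(durations - mean) <= 2 * sd

    cycles = []
    for start, stop, ok in zip(contacts[:-1], contacts[1:], keep):
        if not ok:
            continue
        segment = angle[start:stop + 1]  # include the closing initial contact
        t = np.linspace(0.0, 1.0, len(segment))
        spline = CubicSpline(t, segment)  # cubic spline through every sample
        cycles.append(spline(np.linspace(0.0, 1.0, n_points)))
    return np.array(cycles)
```

Which twenty of the remaining cycles were kept per subject is not stated, so a selection such as `cycles[:20]` would be a further assumption.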
Three different forms of input representation were calculated and used as input vectors: entire waveforms (V_waves), discrete features based on simple descriptive statistics (V_simple), and automatically extracted features (V_tsfresh) (see Table 1).
To analyze the influence of different data preprocessing, training and test data were scaled based on the statistics of the respective training set, using three different approaches:
without data scaling,
removal of the mean and scaling to unit variance (StandardScaler),
scaling to a feature range between 0 and 1 (MinMaxScaler).
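A minimal NumPy sketch of the three scaling variants, assuming they behave like scikit-learn's `StandardScaler` and `MinMaxScaler` with default settings. The statistics come from the training fold only, so no information from the test subjects leaks into preprocessing. The function name `scale_train_test` is hypothetical.

```python
import numpy as np


def scale_train_test(X_train, X_test, method="standard"):
    """Scale features using statistics of the training set only.

    method: "none", "standard" (zero mean, unit variance) or "minmax" ([0, 1]).
    """
    X_train = np.asarray(X_train, dtype=float)
    X_test = np.asarray(X_test, dtype=float)
    if method == "none":
        return X_train, X_test
    if method == "standard":
        center = X_train.mean(axis=0)
        scale = X_train.std(axis=0)  # population SD, as in StandardScaler
    elif method == "minmax":
        center = X_train.min(axis=0)
        scale = X_train.max(axis=0) - center
    else:
        raise ValueError(f"unknown method: {method}")
    # Constant features (zero spread) are left unscaled instead of dividing by 0
    scale = np.where(scale == 0, 1.0, scale)
    return (X_train - center) / scale, (X_test - center) / scale
```

Note that test values can fall outside [0, 1] under min-max scaling, because the bounds come from the training subjects.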
The processing of the IMU raw data and the calculation of the joint angle waveforms were
conducted in C++. The segmentation of the joint angle waveforms as well as the outlier detection were
conducted in Matlab 2019b (Mathworks, Inc., Natick, MA, USA). Further calculations were performed
in Python (Python Software Foundation, Wilmington, DE, USA).
Table 1. Input vectors used for modeling. Twenty gait cycles (GCs) per subject were extracted. To reduce the number of irrelevant features of V_simple and V_tsfresh, the FRESH algorithm (FeatuRe Extraction based on Scalable Hypothesis tests) [18] was applied as a filter. ROM = range of motion.

Abbreviation | Description | Size (GC × Feature)
V_waves | Concatenated time-normalized GCs of the measured variables. | 940 × 2100
V_simple | Features based on simple descriptive statistics commonly mentioned in the literature [35,36]: maxima, minima, and ROM for every variable, as well as the difference between affected and unaffected sides for the respective variables. | 940 × 74
V_tsfresh | Automated feature extraction with the tsfresh algorithm [18]. | 940 × 8349
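The construction of V_waves and an unfiltered V_simple can be sketched as below, assuming the time-normalized cycles are stored in an array of shape (GC, variable, 100). The 2100 columns of V_waves are consistent with 21 variables of 100 samples each (our reading: hip, knee, and ankle in three planes on both sides, plus the pelvis in three planes). The variable ordering, the hypothetical `side_pairs` indexing, and taking side differences of each statistic are assumptions; the FRESH filtering that reduces V_simple to 74 features is not shown.

```python
import numpy as np


def build_inputs(waves, side_pairs):
    """Build V_waves and an unfiltered V_simple from time-normalized GCs.

    waves:      array (n_gc, n_vars, n_points) of time-normalized gait cycles
    side_pairs: list of (affected_idx, unaffected_idx) variable pairs
                (hypothetical indexing; depends on how variables are ordered)
    """
    n_gc = waves.shape[0]
    # V_waves: concatenate all waveforms of one GC into a single row
    v_waves = waves.reshape(n_gc, -1)

    # V_simple: maximum, minimum and range of motion (ROM) per variable
    maxima = waves.max(axis=2)
    minima = waves.min(axis=2)
    rom = maxima - minima
    stats = np.stack([maxima, minima, rom], axis=2)  # (n_gc, n_vars, 3)

    # Side differences (affected minus unaffected) of each statistic
    aff = [a for a, _ in side_pairs]
    unaff = [u for _, u in side_pairs]
    diffs = stats[:, aff, :] - stats[:, unaff, :]

    v_simple = np.concatenate(
        [stats.reshape(n_gc, -1), diffs.reshape(n_gc, -1)], axis=1
    )
    return v_waves, v_simple
```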
2.2. Model Training and Classification
The following procedure was applied to all input vectors. Fivefold cross-validation was performed, with each test set consisting of the gait cycles (or the features extracted from them) of four subjects per class (8 subjects in total, with 8 × 20 GCs). Every subject of the THA group was used once for testing. Due to the imbalanced class distribution in the training set, synthetic minority oversampling [37] was performed on the respective training set of each fold. Random Forest (RF), linear Support Vector Machine (SVM), SVM with radial basis function (rbf) kernel, and a neural network (multilayer perceptron, MLP) were applied for classification with the default parameters of Scikit-learn [38].
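The subject-wise fold construction and the oversampling step can be sketched as follows. The helper names are hypothetical, healthy test subjects are assumed to be drawn at random without repetition (the text only fixes four per class and fold), and `smote` is a simplified from-scratch version of the SMOTE idea (interpolating between a minority sample and one of its k nearest minority neighbours), not the implementation used in the study. Splitting by subject rather than by gait cycle keeps all 20 GCs of a person on the same side of the split.

```python
import numpy as np


def subject_folds(patients, healthy, n_folds=5, per_class=4, seed=0):
    """Yield (train_subjects, test_subjects) for subject-wise cross-validation.

    Each test set holds `per_class` patients and `per_class` healthy subjects;
    with 20 patients and 5 folds, every patient is tested exactly once.
    """
    rng = np.random.default_rng(seed)
    patients = rng.permutation(patients)
    healthy = rng.permutation(healthy)  # assumption: random healthy draw
    everyone = set(patients) | set(healthy)
    for k in range(n_folds):
        sl = slice(k * per_class, (k + 1) * per_class)
        test = set(patients[sl]) | set(healthy[sl])
        yield sorted(everyone - test), sorted(test)


def smote(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic minority rows by SMOTE-style interpolation."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    # Squared Euclidean distances without a (n, n, d) intermediate array
    sq = np.einsum("ij,ij->i", X_min, X_min)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X_min @ X_min.T
    np.fill_diagonal(d2, np.inf)  # a sample is not its own neighbour
    k = min(k, len(X_min) - 1)
    neighbours = np.argsort(d2, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)
    nb = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # random position on the connecting segment
    return X_min[base] + gap * (X_min[nb] - X_min[base])
```

Within each fold, one would generate `n_new = n_healthy_rows - n_patient_rows` synthetic patient rows from the training patients only, then fit the classifiers on the balanced training set.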
2.3. Model Interpretation
In most cases, the best performing classification algorithm for a given task cannot be specified a priori. For practical application, interpretation tools should therefore be generally applicable (model agnostic) rather than dependent on a specific algorithm (model specific). Furthermore, subjects show individual gait characteristics [39,40]. In the context of personalized medical treatment, it is therefore especially important to understand why a model made a specific decision for a single instance (GC) or a single subject. Hence, local interpretability gains importance. For this reason, the model-agnostic interpretation tool LIME [26] was used for model interpretation.
To explain a black box model, LIME approximates a single prediction of the black box model with a simpler, interpretable model (e.g., a decision tree or linear model). The simpler model will probably not be a globally faithful approximation of the complex model, but it will perform well locally. LIME is therefore based on local interpretations of single instances of interest. To explain a single prediction, the instance of interest is chosen and data points around it are generated through perturbation. These data points are predicted with the black box model and weighted by their proximity to the selected instance. Finally, a simpler model is learned on the weighted data points and used to explain the prediction [26].
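The four steps above (perturb, predict, weight by proximity, fit a simple model) can be sketched from scratch in NumPy. This is a simplified illustration of the idea, not the `lime` package's implementation, which samples and discretizes tabular features differently. The Gaussian perturbation scale, the exponential kernel, and the kernel width of 0.75 * sqrt(d) are assumptions.

```python
import numpy as np


def lime_explain(predict, x, n_samples=1000, scale=0.5, kernel_width=None, seed=0):
    """Local weighted linear surrogate in the spirit of LIME (simplified sketch).

    predict: black-box function mapping an (n, d) array to (n,) scores,
             e.g. the predicted probability of the THA class
    x:       the instance of interest, shape (d,)
    Returns the surrogate's per-feature weights, shape (d,).
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    d = x.size
    if kernel_width is None:
        kernel_width = 0.75 * np.sqrt(d)  # assumed default width
    # 1) Perturb: sample points around the instance of interest
    Z = x + scale * rng.standard_normal((n_samples, d))
    # 2) Query the black box on the perturbed points
    y = predict(Z)
    # 3) Weight each point by its proximity to x (exponential kernel)
    dist = np.linalg.norm((Z - x) / scale, axis=1)
    w = np.exp(-dist**2 / kernel_width**2)
    # 4) Fit a weighted linear model (with intercept) by least squares
    A = np.hstack([np.ones((n_samples, 1)), Z - x])
    sw = np.sqrt(w)[:, None]
    coef, *_ = np.linalg.lstsq(A * sw, y * sw.ravel(), rcond=None)
    return coef[1:]  # drop the intercept, keep one weight per feature
```

For a black box that is itself linear, the surrogate recovers its coefficients exactly; for a nonlinear classifier, the weights describe only the neighbourhood of `x`.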
For the special case of serial data (gait waveforms), LIME for Time was developed [41]. It is based on the idea of applying LIME to image data. For image data, superpixels (connected pixels of similar color) are used for data variation because variations of individual pixels would hardly change a model's prediction. As an equivalent to superpixels, LIME for Time varies parts (slices) of a series by replacing them, for example, with noise or the overall mean of the series. This approach makes it possible to identify areas of serial data that are important for the classifier's prediction. The algorithm was applied to interpret the concatenated waveforms (V_waves), using the total mean for perturbation.
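A minimal NumPy sketch of the LIME for Time idea as described above: the series is cut into equal slices, random subsets of slices are replaced by the total mean of the series, and a weighted linear model on the binary on/off masks yields one weight per slice. The number of slices, the cosine-distance kernel, and its width are assumptions, and this is not the reference implementation of [41].

```python
import numpy as np


def lime_for_time(predict, series, n_slices=10, n_samples=500, kernel_width=0.25, seed=0):
    """Slice-wise LIME explanation of one (concatenated) waveform.

    predict: black box mapping an (n, length) array to (n,) scores
    series:  1-D array, the instance of interest
    Switched-off slices are replaced by the total mean of the series.
    Returns one weight per slice.
    """
    rng = np.random.default_rng(seed)
    series = np.asarray(series, dtype=float)
    slices = np.array_split(np.arange(series.size), n_slices)
    # Binary masks: 1 = keep slice, 0 = replace it; row 0 is the original
    masks = rng.integers(0, 2, size=(n_samples, n_slices))
    masks[0] = 1
    perturbed = np.tile(series, (n_samples, 1))
    fill = series.mean()
    for j, idx in enumerate(slices):
        perturbed[np.ix_(masks[:, j] == 0, idx)] = fill
    y = predict(perturbed)
    # Cosine distance between each mask and the all-ones mask of the original
    dist = 1.0 - np.sqrt(masks.sum(axis=1) / n_slices)
    w = np.exp(-dist**2 / kernel_width**2)
    # Weighted least squares on the masks (with intercept)
    A = np.hstack([np.ones((n_samples, 1)), masks])
    sw = np.sqrt(w)[:, None]
    coef, *_ = np.linalg.lstsq(A * sw, y * sw.ravel(), rcond=None)
    return coef[1:]
```

Plotting the slice weights over the concatenated waveform gives the kind of shaded "important slices" view described for V_waves.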
For each input vector, the best performing model was used for interpretation with LIME. Local interpretations are presented as examples. To determine whether effects due to group differences were important for class decisions across multiple instances, global interpretations are presented (as an indication of generality). They were calculated by mean aggregation of the absolute weight values of the local LIME results (5 GCs of each subject, to reduce computing power) (see, e.g., [42]).
Statistical Parametric Mapping (SPM) [43] was used to verify the results from a statistical perspective.
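The global aggregation described above reduces to a mean of absolute local weights. The sketch below assumes the local LIME weights are already collected in one array; the function name is hypothetical, and taking the first five GCs of each subject is an assumption, as the text does not state how the five GCs were chosen.

```python
import numpy as np


def global_importance(local_weights, subject_ids, per_subject=5):
    """Aggregate local LIME weights into a global mean absolute effect.

    local_weights: array (n_instances, n_features) of local explanation weights
    subject_ids:   array (n_instances,) giving the subject of each instance
    """
    local_weights = np.asarray(local_weights, dtype=float)
    subject_ids = np.asarray(subject_ids)
    # Assumption: use the first `per_subject` instances of every subject
    chosen = np.concatenate([
        np.flatnonzero(subject_ids == s)[:per_subject]
        for s in np.unique(subject_ids)
    ])
    # Absolute values, so effects towards either class add up instead of cancelling
    return np.abs(local_weights[chosen]).mean(axis=0)
```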
3. Results
3.1. Classification Results
Classification results are presented in Figure 1. The overall best accuracy was obtained with the V_tsfresh input vector using linear SVM (MinMaxScaler, StandardScaler) and MLP (MinMaxScaler) (mean accuracy M_acc = 100%). The best result for the waveform data V_waves was obtained with linear SVM without scaling (M_acc = 95.88%). Linear SVM with MinMaxScaler performed best for the descriptive statistics V_simple (M_acc = 97.38%). The best performing models are used in the following subsections for interpretation.
Figure 1. Classification accuracy (mean, min, max value) for the different input vectors, classification algorithms, and normalization approaches over the 5-fold cross-validation. RF = Random Forest; MLP = multilayer perceptron; SVM lin = Support Vector Machine with linear kernel; SVM rbf = Support Vector Machine with radial basis function kernel.
3.2. Model Interpretation Based on Waveforms
The global results for the comparison of patients and healthy subjects based on the concatenated waveforms (V_waves) are presented in Figure 2. SPM indicates statistical group differences for most of the movements (p < 0.05). LIME indicates a high global effect for slices representing ankle rotation as well as knee and hip movement in the sagittal plane. Ankle rotation of the affected side, especially during initial contact and midstance, plays an important role. For the knee joint of both the affected and the unaffected side, the important slices represent maximal flexion and maximal extension. Maximal flexion and maximal extension are also relevant for hip movement in the sagittal plane on the affected side. The group differences for the respective slices are all significant.
Model explanations for single instances (gait cycles) based on the waveform data are presented in Figure 3. Notably, for both correctly classified GCs, few slices indicate an effect towards the other class. Additionally, one misclassified GC of a patient is shown. Most of the slices with the highest effect indicate an effect towards the class of healthy subjects (e.g., ankle rotation, knee flexion), and the respective slices show movement patterns similar to those of healthy subjects.
Figure 2. Global results for the use of the waveform data (V_waves). For visualization reasons, data are separately scaled to a range from 0 to 1 for each variable. (a) Mean movements for healthy subjects and patients after total hip arthroplasty (THA). The grey areas indicate statistical differences (alpha = 0.05) according to Statistical Parametric Mapping. (b) Aggregated Local Interpretable Model-Agnostic Explanations (LIME) results as mean absolute effect. Abbreviations: a. = affected side; u. = unaffected side.
Figure 3. Exemplary local model interpretation for single gait cycle (GC) (instance) of a patient with THA, a healthy subject, and a GC of a patient classified as
healthy (blue lines). The instances are plotted against the mean value and standard deviation of the other class (black = healthy, red = patient). For better visualization,
the top 20 slices with the highest absolute effect are displayed for each instance (grey vertical span = effect towards class of healthy subjects, red vertical span = effect
towards class of patients with THA). The color saturation indicates the effect size. For visualization reasons, data are separately scaled to a range from 0 to 1 for
each variable. Abbreviations: a. = affected side; u. = unaffected side.
3.3. Model Interpretation: Discrete Features
LIME results for the V_simple input vector are presented in Figure 4. The analyzed instances correspond to the interpreted waveform instances displayed in Figure 3. Even though different input features were used, the model made the same misclassification. Globally, the features with the highest effect are based on hip, knee, and pelvic sagittal motion and ankle rotation in the transversal plane.
Figure 4. (a) Global LIME results as mean absolute effects. The analyzed instances (b)-(d) correspond to the instances displayed in Figure 3. Negative values indicate an effect towards the class of healthy subjects (black), positive values an effect towards the class of patients after THA (red). Feature abbreviations: a. = affected side; u. = unaffected side; d. = difference between affected and unaffected side.
The mean absolute effects for the V_tsfresh input vector are shown in Figure 5. Features based on knee and ankle sagittal movement as well as hip, knee, and pelvic transversal movement show the highest effect. The features with the highest effect were calculated with the operation "large_standard_deviation". The result of this operation is a Boolean variable denoting whether the standard deviation of the series (in this case, knee flexion) is higher than "r" times the range of motion (ROM).
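The `large_standard_deviation` feature described above can be re-implemented in a few lines of NumPy to show how little of the original waveform it retains. This sketch mirrors the stated definition (standard deviation versus r times max minus min); tsfresh ships its own implementation and evaluates several values of r, so the r values in the usage below are only illustrative.

```python
import numpy as np


def large_standard_deviation(x, r):
    """Boolean feature: is the standard deviation of x larger than r times its range?

    For a joint-angle waveform, max(x) - min(x) is its range of motion (ROM).
    """
    x = np.asarray(x, dtype=float)
    return bool(np.std(x) > r * (np.max(x) - np.min(x)))


# A square-like series has std 0.5 and range 1.0
series = np.tile([0.0, 1.0], 50)
print(large_standard_deviation(series, 0.25), large_standard_deviation(series, 0.5))
```

The output collapses a whole knee flexion cycle into a single yes/no answer about the ratio of spread to ROM, which illustrates why such features are harder to relate to clinical gait descriptions.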
Figure 5. Mean absolute effects for the V_tsfresh input vector determined using LIME. Results are based on linear SVM with MinMaxScaler. Feature labels follow the naming of the automated feature extraction algorithm tsfresh [18].
4. Discussion
Very good accuracy was obtained for the classification of patients with THA and healthy subjects. Three different input vectors were used. The best classification results were obtained using automatically extracted features (V_tsfresh). However, simple descriptive statistics (V_simple) and the waveform data (V_waves) also yielded very good classification performance, with only a slight reduction in accuracy. In line with the previous study [13], it can be assumed that even simple descriptive statistics have high discriminative power and appropriately map the classes, and therefore slightly outperform the pure use of the waveforms.
In line with other works [8], we demonstrate the superior performance of linear SVM in the context of gait classification. In most cases, linear SVM showed the best results and outperformed the more complex and computationally expensive models (e.g., MLP). The main focus of the current work was on interpretability; therefore, no extensive parameter tuning of the models was performed. It might be possible to obtain the same or even better results with the more complex models, but at the cost of extensive and time-consuming parameter tuning.
With the model-agnostic method LIME, it was possible to gain insight into how the models made their decisions and to increase interpretability. By selecting the maximum number of features displayed by LIME, different levels of interpretation can be provided for different contexts. Using complex features (V_tsfresh) leads to slightly higher performance at the cost of interpretability and comparability with previous research. The reason is that the operations used for feature calculation often make it difficult to attribute class differences to the underlying movements, because they describe the original movements on a more abstract level than the simpler descriptive statistics commonly used in biomechanics. Their usage in clinical contexts is therefore questionable. Thus, the following interpretation mainly focuses on the input representations V_waves and V_simple.
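Choosing how many features LIME displays can be illustrated with a small helper that ranks explanation weights by magnitude and keeps the top k. The helper name is hypothetical; in the `lime` package, the corresponding control is the `num_features` argument of `explain_instance`.

```python
import numpy as np


def top_k_features(weights, names, k=10):
    """Return the k (name, weight) pairs with the largest absolute LIME weight.

    A small k gives a compact explanation for a quick clinical overview;
    a larger k exposes more detail for expert analysis.
    """
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(-np.abs(weights), kind="stable")[:k]
    return [(names[i], float(weights[i])) for i in order]
```

The sign of each returned weight is kept, so the direction of the effect (towards healthy subjects or towards patients) stays visible after truncation.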
To evaluate the validity and plausibility of the effects and the identified group differences, three aspects are addressed as follows: (i) Asymmetric gait patterns are often prevalent after THA [15,44]. Not only are the operated joint and surrounding structures affected, but also the contralateral side [36]. However, the main difference in gait characteristics between healthy subjects and patients after THA should be seen on the affected side. LIME results should therefore generally emphasize an effect for features regarding the affected side. For both input vectors, this holds true most of the time. Yet, some features with the highest effect map gait differences for the unaffected side (e.g., V_simple: unaffected hip abduction) or for both the affected and unaffected sides (e.g., waveform data: unaffected/affected knee flexion/extension). A possible reason for this might be that the trained model compares the affected and unaffected sides. Regarding the sample of subjects, age-related differences in gait, which possibly led to a group separation with regard to the unaffected side (see, e.g., the classification of young and old subjects [45]), should not go unnoticed and may have had further influence.
(ii) Another aspect supporting the validity of the results is that most results are congruent between V_simple and V_wave. For both input vectors, ankle rotation as well as sagittal knee and hip movement play an important role. Nevertheless, pelvic flexion (ROM) shows a high effect only for V_simple. Therefore, applying different input representations can provide new insights and reveal effects that might not be detected with a single input vector.
Using only the waveform data, interactions between different slices are opaque. For example, maximal flexion and maximal extension of the hip in the sagittal plane are highly relevant in the waveform data. Looking at V_simple, the ROM of the hip movement in the sagittal plane shows a high effect. This might indicate that the slices mapping maximal flexion and maximal extension interact. Comparing the results for different input vectors might, therefore, be useful for a better understanding of possible interactions.
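The link between the two representations becomes clearer when the ROM is written as a function of the waveform. ROM is the difference between the maximal and minimal samples, so a single ROM feature depends jointly on the two slices that a waveform explanation highlights separately. The sketch below computes a few simple descriptive statistics for one time-normalized joint-angle curve. The function name and keys are hypothetical. The sketch assumes flexion is positive, and it is not the study's exact V_simple definition.

```python
def simple_features(waveform):
    """Simple descriptive statistics of one time-normalized joint-angle curve.

    Assumes flexion is positive, so max = peak flexion and min = peak extension.
    Timing is expressed in percent of the gait cycle.
    """
    n = len(waveform)
    peak, trough = max(waveform), min(waveform)
    return {
        "mean": sum(waveform) / n,
        "max": peak,
        "min": trough,
        "rom": peak - trough,  # depends on BOTH extreme samples
        "t_max_pct": 100.0 * waveform.index(peak) / (n - 1),
        "t_min_pct": 100.0 * waveform.index(trough) / (n - 1),
    }
```

Shifting the whole curve leaves `rom` unchanged, whereas changing only the peak or only the trough changes it. This is the kind of interaction between slices that a purely waveform-based explanation cannot express on its own.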
(iii) Finally, LIME results are discussed in light of previous research. Overall, a ground truth for the automatically determined explanations is missing, which makes it more difficult to evaluate whether the results are meaningful and appropriately map gait characteristics. In the literature, gait patterns of patients after THA are mostly described using simple descriptive statistics comparable with the V_simple input vector [36,46,47]. Consequently, it is harder to directly compare the effects of the relevant regions determined for the waveform data (V_wave) with previous works. Further, not all of our findings can be evaluated against previous research, because the literature often focuses on a few gait characteristics to describe group differences. In agreement with our findings, previous research reports a reduced ROM for knee and hip movement of the operated side in the sagittal plane compared with healthy subjects [36,44,46–48]. Further, altered postoperative ankle rotation [46] and increased sagittal pelvic movement compared with healthy subjects [46,47] were found.
At the current state of research, there are no objective criteria to evaluate interpretability [42]. Subjective ratings by end-users or task performance might be possible means of evaluation [22].
With regard to interpretability and the goal of making machine learning applicable in practical clinical contexts, it is important to distinguish between the interpretability of the model itself and clinical interpretability. The interpretability of a model describes the degree to which a human can understand the cause of a model's decision and, hence, why the model made a certain decision [49]. The XAI tools used in this study made it possible to understand why a model made a certain decision by revealing the effects of the features or waveform regions that were important for a given prediction. However, this alone does not establish whether the identified effects are interpretable, relevant, and usable in clinical contexts. In this respect, the current work emphasizes the importance of the input representation, because it strongly influences interpretability and usability in clinical contexts. In the current case, automatically extracted features did not yield a significantly better classification accuracy that could justify the loss of interpretability. Consequently, waveform data and simple descriptive statistics should be preferred for practical applications, because their effects can be traced back to variables that are reported in the literature and already used in clinical contexts to describe biomechanical differences.
In cases where classification is harder, combining expert-knowledge-based features with the best-performing automatically extracted features might be an appropriate compromise that increases classification performance while preserving the best possible clinical interpretability. In settings where only classification performance matters and biomechanical interpretability is not required, automated feature extraction approaches can be suggested.
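One way to build such a combined input vector is to keep every expert-defined feature and append only the k automatically extracted features that rank highest on some class-separation criterion. The sketch below uses a two-class Fisher score for the ranking. That criterion is our assumption, since the text does not prescribe one. The feature names and the 1 = patient / 0 = healthy coding are hypothetical.

```python
from statistics import mean, pvariance


def fisher_score(values, labels):
    """Two-class Fisher score: (mean difference)^2 / (sum of class variances)."""
    pos = [v for v, label in zip(values, labels) if label == 1]
    neg = [v for v, label in zip(values, labels) if label == 0]
    diff2 = (mean(pos) - mean(neg)) ** 2
    denom = pvariance(pos) + pvariance(neg)
    if denom == 0:
        return float("inf") if diff2 > 0 else 0.0
    return diff2 / denom


def combine_features(expert, automatic, labels, k):
    """Keep all expert features and add the k best-ranked automatic ones.

    expert, automatic: dicts mapping feature name -> values (one per sample).
    labels: 1 = patient, 0 = healthy (hypothetical coding).
    """
    ranked = sorted(automatic,
                    key=lambda name: fisher_score(automatic[name], labels),
                    reverse=True)
    combined = dict(expert)
    for name in ranked[:k]:
        combined[name] = automatic[name]
    return combined
```

To avoid optimistic accuracy estimates, the ranking should be computed on the training folds only and then applied unchanged to the test fold.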
Simple metrics (e.g., the symmetry index [48]) are often used in clinical contexts to describe pathologic movements, because considering all movements in full is too complex for human beings. Further, many clinical decisions are influenced by the expert knowledge and experience of physicians and therapists. As the current study demonstrates, the benefit of data-driven approaches based on machine learning and XAI is their ability to take into account the full complexity of the motion data; they may therefore provide objective orientation and assistance for physicians in their decisions. Further, by focusing on local model interpretability, they can take into account the individual gait characteristics that were important for classification and class discrimination and, therefore, can play an important role in the context of personalized medicine.
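For contrast with the multivariate models, the symmetry index mentioned above reduces one gait parameter to a single between-side number. The sketch below uses one widely used formulation, which expresses the difference between sides relative to their mean. Whether [48] uses exactly this variant is not stated here, and the affected/unaffected naming is our adaptation.

```python
def symmetry_index(affected, unaffected):
    """Symmetry index in percent (0 = perfectly symmetric).

    SI = (X_affected - X_unaffected) / (0.5 * (X_affected + X_unaffected)) * 100
    """
    mean_side = 0.5 * (affected + unaffected)
    if mean_side == 0:
        raise ValueError("SI is undefined when the two sides sum to zero")
    return (affected - unaffected) / mean_side * 100.0
```

For example, a hypothetical sagittal hip ROM of 36° on the affected side and 44° on the unaffected side gives an SI of about -20%. The index becomes unstable when the two values are close to zero with opposite signs, so it is better suited to non-negative quantities such as ROM than to raw joint angles.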
Local interpretability additionally helps to understand why a model misclassified single instances. In the example presented previously, a GC of a patient was wrongly classified as a GC of a healthy person. A possible explanation is that this subject showed fewer pathologic gait patterns than the remaining patients and was therefore classified as healthy. In this regard, the algorithm could provide information about the rehabilitation status of a patient and an objective orientation for physicians and therapists. However, it cannot be excluded that the instance was an outlier that the model was unable to handle and classify correctly. In such cases, automatic systems remain dependent on experts, because expert knowledge and experience are crucial for making the right decision.
The current work emphasizes the importance of data preprocessing for a model's accuracy and interpretability, and it shows that general recommendations cannot be given. In the present case, the best results for the gait waveform data were obtained without scaling. This might also be advantageous for interpretation, because features can be interpreted directly without the need to transform the data back to their original representation.
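To see what scaling costs in interpretability, consider z-score standardization. Every value an explanation reports, such as a LIME threshold, is then expressed in standard deviations and must be mapped back to degrees before a clinician can read it. The sketch below is a minimal stdlib stand-in for a fitted scaler with an inverse transform. The class name is hypothetical, and the sketch is not the preprocessing pipeline of the study.

```python
from statistics import mean, pstdev


class Standardizer:
    """z-score scaling fitted on training data, with an inverse transform."""

    def fit(self, values):
        self.mu = mean(values)
        self.sd = pstdev(values) or 1.0  # constant feature: avoid division by zero
        return self

    def transform(self, values):
        return [(v - self.mu) / self.sd for v in values]

    def inverse_transform(self, values):
        return [z * self.sd + self.mu for z in values]
```

With unscaled waveform data, the `inverse_transform` step disappears. A condition on a hip angle can then be read in degrees directly, which is the interpretive advantage described above.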
Another important aspect is data aggregation. In the present case, in line with various other studies [50,51], single gait cycles were used for input feature calculation. However, averaged waveforms have also been reported for gait classification in the literature [20,52], which might be useful for eliminating the intraindividual variance between different gait cycles. Depending on the requirements of possible fields of application, this might be an alternative approach. Furthermore, various other methods for dealing with imbalanced data (see, e.g., [53]) should not go unnoticed and should be considered and compared in future works.
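Averaging waveforms requires that cycles of different durations are first brought to a common length. The sketch below linearly resamples each cycle to a fixed number of points and then averages point by point. It assumes the cycles have already been segmented, for example between consecutive initial contacts of the same foot. The 101-point default (0–100% of the gait cycle) is a common convention that we assume here, and the function names are hypothetical.

```python
def time_normalize(cycle, n_points=101):
    """Linearly resample one gait cycle to n_points samples (0-100 % of the cycle)."""
    if len(cycle) < 2:
        raise ValueError("need at least two samples per cycle")
    last = len(cycle) - 1
    out = []
    for k in range(n_points):
        pos = k * last / (n_points - 1)  # fractional index into the raw cycle
        i = min(int(pos), last - 1)
        frac = pos - i
        out.append(cycle[i] * (1.0 - frac) + cycle[i + 1] * frac)
    return out


def average_waveform(cycles, n_points=101):
    """Point-wise mean of several time-normalized cycles of one subject."""
    normalized = [time_normalize(c, n_points) for c in cycles]
    return [sum(vals) / len(vals) for vals in zip(*normalized)]
```

The trade-off is that averaging yields one instance per subject instead of one per gait cycle. This removes intraindividual variance but also shrinks the training set.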
Previous research has shown that IMU-based systems exhibit higher errors for movements in the transversal plane [54]. Therefore, it is uncertain whether the findings for transversal movements are meaningful and interpretable. Excluding transversal movements because of the larger inaccuracies of IMU-based systems might, therefore, be considered in future works.
As noted previously, there are limitations regarding the sample of subjects. In particular, the large age and weight differences between the group of healthy subjects and the patients after THA are striking. At this point, it cannot be excluded that these differences influenced the classification. In subsequent studies, the analysis should be repeated with a matched group of healthy subjects. Further, it is worth mentioning that possibly strong correlations between the features are present a priori and lead to redundancies in the different input vectors. To reduce the impact of redundant features, applying the minimum redundancy maximum relevance (mRMR) filter [15,55] might be a promising approach.
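The mRMR idea [55] can be sketched as a greedy loop. At each step it selects the feature with the highest mutual information with the class, minus its mean mutual information with the features already selected. This corresponds to the difference criterion. The stdlib sketch below assumes the features have already been discretized (e.g., binned). It does not reproduce the mRMR configuration used in [15], and all feature names are hypothetical.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Mutual information (nats) between two equally long discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum((c / n) * math.log(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())


def mrmr(features, labels, k):
    """Greedy mRMR selection with the difference (relevance - redundancy) criterion.

    features: dict mapping feature name -> discrete (e.g. binned) values.
    Returns the names of the selected features in selection order.
    """
    relevance = {name: mutual_information(vals, labels)
                 for name, vals in features.items()}
    selected = []
    while len(selected) < min(k, len(features)):
        scores = {}
        for name in sorted(set(features) - set(selected)):
            redundancy = (
                sum(mutual_information(features[name], features[s])
                    for s in selected) / len(selected)
                if selected else 0.0
            )
            scores[name] = relevance[name] - redundancy
        selected.append(max(scores, key=scores.get))  # ties: alphabetical first
    return selected
```

A duplicated feature is exactly as relevant as its original, but it is maximally redundant with it. After the original has been chosen, mRMR therefore prefers a weaker but complementary feature over the copy.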
The present work used the LIME algorithm and checked whether its application leads to coherent results across different input representations. The utility and generalizability of this methodology should be evaluated by applying it to data of different subjects and patient groups. Moreover, future works should compare different explanation methods (e.g., LIME, SHAP, DeepLIFT) and check whether their results are consistent.
5. Conclusions
XAI holds potential for making the decisions of machine learning models more transparent, interpretable, and, therefore, more trustworthy. It is thus a promising step towards the practical applicability of machine learning in clinical contexts. However, research in this domain is still at an early stage, and further research is necessary before practical clinical application. The current study shows that the type of input representation crucially determines interpretability as well as clinical relevance. A combined approach using different forms of representations seems advantageous, because it can provide a better understanding of the underlying group differences and of the discriminative effects that were important for classification. Based on the current findings, waveform data as well as features based on simple descriptive statistics can be suggested. In the context of personalized medicine, XAI approaches focusing on local interpretability enable the identification of individual gait characteristics that are important for classification and class discrimination. Thus, the results might assist physicians and therapists in finding and addressing individual pathologic gait patterns by offering an objective orientation.
Author Contributions:
Conceptualization, C.D., W.T., B.T., G.B., and M.F.; methodology, C.D. and B.T.; software,
C.D. and W.T.; validation, C.D.; formal analysis, C.D.; investigation, C.D. and W.T.; resources, M.F.; data curation,
W.T.; writing—original draft preparation, C.D. and W.T.; writing—review and editing, C.D., W.T., B.T., G.B.,
and M.F.; visualization, C.D.; supervision, M.F.; project administration, M.F.; funding acquisition, M.F. and G.B.
All authors have read and agreed to the published version of the manuscript.
Funding: This research was funded by Offene Digitalisierungsallianz Pfalz, BMBF, grant number 03IHS075B.
Acknowledgments: We would like to thank the staff of the Klinik Lindenplatz, Bad Sassendorf, Germany, for their support with subject acquisition and data acquisition.
Conflicts of Interest: The authors declare no conflict of interest. The authors alone are responsible for the content and writing of this paper. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.
References
1. Eskofier, B.M.; Kraus, M.; Worobets, J.T.; Stefanyshyn, D.J.; Nigg, B.M. Pattern classification of kinematic and kinetic running data to distinguish gender, shod/barefoot and injury groups with feature ranking. Comput. Methods Biomech. Biomed. Eng. 2012, 15, 467–474.
2. Ferber, R.; McClay Davis, I.; Williams, D.S., III. Gender differences in lower extremity mechanics during running. Clin. Biomech. 2003, 18, 350–357.
3. Phinyomark, A.; Petri, G.; Ibáñez-Marcelo, E.; Osis, S.T.; Ferber, R. Analysis of Big Data in Gait Biomechanics: Current Trends and Future Directions. J. Med. Biol. Eng. 2018, 38, 244–260.
4. Halilaj, E.; Rajagopal, A.; Fiterau, M.; Hicks, J.L.; Hastie, T.J.; Delp, S.L. Machine learning in human movement biomechanics: Best practices, common pitfalls, and new opportunities. J. Biomech. 2018, 81, 1–11.
5. Kobsar, D.; Charlton, J.M.; Tse, C.T.F.; Esculier, J.-F.; Graffos, A.; Krowchuk, N.M.; Thatcher, D.; Hunt, M.A. Validity and reliability of wearable inertial sensors in healthy adult walking: A systematic review and meta-analysis. J. Neuroeng. Rehabil. 2020, 17, 62.
6. Bzdok, D.; Altman, N.; Krzywinski, M. Statistics versus machine learning. Nat. Methods 2018, 15, 233–234.
7. Laroche, D.; Tolambiya, A.; Morisset, C.; Maillefert, J.F.; French, R.M.; Ornetti, P.; Thomas, E. A classification study of kinematic gait trajectories in hip osteoarthritis. Comput. Biol. Med. 2014, 55, 42–48.
8. Lau, H.-Y.; Tong, K.-Y.; Zhu, H. Support vector machine for classification of walking conditions of persons after stroke with dropped foot. Hum. Mov. Sci. 2009, 28, 504–514.
9. Wahid, F.; Begg, R.K.; Hass, C.J.; Halgamuge, S.; Ackland, D.C. Classification of Parkinson’s Disease Gait Using Spatial-Temporal Gait Features. IEEE J. Biomed. Health Inform. 2015, 19, 1794–1802.
10. Adadi, A.; Berrada, M. Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI). IEEE Access 2018, 6, 52138–52160.
11. European Union. Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation). O. J. Eur. Union 2016, L119, 1–88.
12. Slijepcevic, D.; Zeppelzauer, M.; Schwab, C.; Raberger, A.-M.; Breiteneder, C.; Horsak, B. Input representations and classification strategies for automated human gait analysis. Gait Posture 2020, 76, 198–203.
13. Teufl, W.; Taetz, B.; Miezal, M.; Lorenz, M.; Pietschmann, J.; Jöllenbeck, T.; Fröhlich, M.; Bleser, G. Towards an Inertial Sensor-Based Wearable Feedback System for Patients after Total Hip Arthroplasty: Validity and Applicability for Gait Classification with Gait Kinematics-Based Features. Sensors 2019, 19, 5006.
14. Begg, R.; Kamruzzaman, J. A machine learning approach for automated recognition of movement patterns using basic, kinetic and kinematic gait data. J. Biomech. 2005, 38, 401–408.
15. Dindorf, C.; Teufl, W.; Taetz, B.; Becker, S.; Bleser, G.; Fröhlich, M. Feature extraction and gait classification in hip replacement patients on basis of kinematic waveform data. (under review).
16. Horst, F.; Slijepcevic, D.; Lapuschkin, S.; Raberger, A.-M.; Zeppelzauer, M.; Samek, W.; Breiteneder, C.; Schöllhorn, W.I.; Horsak, B. On the Understanding and Interpretation of Machine Learning Predictions in Clinical Gait Analysis Using Explainable Artificial Intelligence. Available online: http://arxiv.org/pdf/1912.07737v1 (accessed on 10 March 2020).
17. Dindorf, C.; Konradi, J.; Wolf, C.; Taetz, B.; Bleser, G.; Huthwelker, J.; Drees, P.; Fröhlich, M.; Betz, U. General method for automated feature extraction and selection and its application for gender classification and biomechanical knowledge discovery of sex differences in spinal posture during stance and gait. (under review).
18. Christ, M.; Kempa-Liehr, A.W.; Feindt, M. Distributed and Parallel Time Series Feature Extraction for Industrial Big Data Applications. Available online: http://arxiv.org/pdf/1610.07717v3 (accessed on 10 January 2020).
19. Feature Labs, I. Featuretools: Automated Feature Engineering. Available online: https://www.featuretools.com/ (accessed on 30 May 2020).
20. Eskofier, B.M.; Federolf, P.; Kugler, P.F.; Nigg, B.M. Marker-based classification of young-elderly gait pattern differences via direct PCA feature extraction and SVMs. Comput. Methods Biomech. Biomed. Eng. 2013, 16, 435–442.
21. Liu, H.; Yu, L. Toward Integrating Feature Selection Algorithms for Classification and Clustering. IEEE Trans. Knowl. Data Eng. 2005, 17, 491–502.
22. Gunning, D.; Stefik, M.; Choi, J.; Miller, T.; Stumpf, S.; Yang, G.-Z. XAI—Explainable artificial intelligence. Sci. Robot. 2019, 4, eaay7120.
23. Bologna, G.; Hayashi, Y. Characterization of Symbolic Rules Embedded in Deep DIMLP Networks: A Challenge to Transparency of Deep Learning. J. Artif. Intell. Soft Comput. Res. 2017, 7, 265–286.
24. Samek, W.; Müller, K.-R. Towards explainable artificial intelligence. In Explainable AI: Interpreting, Explaining and Visualizing Deep Learning, 1st ed.; Samek, W., Montavon, G., Vedaldi, A., Hansen, L.K., Muller, K.-R., Eds.; Springer International Publishing: Berlin/Heidelberg, Germany, 2019; pp. 5–22.
25. Holzinger, A.; Biemann, C.; Pattichis, C.S.; Kell, D.B. What do We Need to Build Explainable AI Systems for the Medical Domain? Available online: http://arxiv.org/pdf/1712.09923v1 (accessed on 20 February 2020).
26. Ribeiro, M.T.; Singh, S.; Guestrin, C. "Why Should I Trust You?": Explaining the Predictions of Any Classifier. In Proceedings of the KDD '16: The 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 1135–1144.
27. Lundberg, S.M.; Lee, S.-I. A Unified Approach to Interpreting Model Predictions. In Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017.
28. Shrikumar, A.; Greenside, P.; Kundaje, A. Learning important features through propagating activation differences. In Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017; pp. 3145–3153.
29. OECD. Health at a Glance: Europe 2016. State of Health in the EU Cycle; OECD Publishing: Paris, France, 2016; ISBN 978-92-64-26559-2.
30. Teufl, W.; Miezal, M.; Taetz, B.; Fröhlich, M.; Bleser, G. Validity, Test-Retest Reliability and Long-Term Stability of Magnetometer Free Inertial Sensor Based 3D Joint Kinematics. Sensors 2018, 18, 1980.
31. Miezal, M.; Taetz, B.; Bleser, G. On Inertial Body Tracking in the Presence of Model Calibration Errors. Sensors 2016, 16, 1132.
32. Miezal, M.; Taetz, B.; Bleser, G. Real-time inertial lower body kinematics and ground contact estimation at anatomical foot points for agile human locomotion. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Singapore, 29 May–3 June 2017; pp. 3256–3263.
33. Teufl, W.; Miezal, M.; Taetz, B.; Fröhlich, M.; Bleser, G. Validity of inertial sensor based 3D joint kinematics of static and dynamic sport and physiotherapy specific movements. PLoS ONE 2019, 14, e0213064.
34. Teufl, W.; Lorenz, M.; Miezal, M.; Taetz, B.; Fröhlich, M.; Bleser, G. Towards Inertial Sensor Based Mobile Gait Analysis: Event-Detection and Spatio-Temporal Parameters. Sensors 2018, 19, 38.
35. Ewen, A.M.; Stewart, S.; St Clair Gibson, A.; Kashyap, S.N.; Caplan, N. Post-operative gait analysis in total hip replacement patients-a review of current literature and meta-analysis. Gait Posture 2012, 36, 1–6.
36. Beaulieu, M.L.; Lamontagne, M.; Beaulé, P.E. Lower limb biomechanics during gait do not return to normal following total hip arthroplasty. Gait Posture 2010, 32, 269–273.
37. Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic Minority Over-sampling Technique. JAIR 2002, 16, 321–357.
38. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
39. Schöllhorn, W.I.; Nigg, B.M.; Stefanyshyn, D.J.; Liu, W. Identification of individual walking patterns using time discrete and time continuous data sets. Gait Posture 2002, 15, 180–186.
40. Horst, F.; Kramer, F.; Schäfer, B.; Eekhoff, A.; Hegen, P.; Nigg, B.M.; Schöllhorn, W.I. Daily changes of individual gait patterns identified by means of support vector machines. Gait Posture 2016, 49, 309–314.
41. Hering, J.; Metzenthin, E.; Zenner, A. LIME For Time. Available online: https://github.com/emanuel-metzenthin/Lime-For-Time (accessed on 20 February 2020).
42. Molnar, C. Interpretable Machine Learning. A Guide for Making Black Box Models Explainable. 2018. Available online: https://christophm.github.io/interpretable-ml-book/index.html (accessed on 20 February 2020).
43. Pataky, T.C. One-dimensional statistical parametric mapping in Python. Comput. Methods Biomech. Biomed. Eng. 2012, 15, 295–301.
44. Horstmann, T.; Listringhaus, R.; Haase, G.-B.; Grau, S.; Mündermann, A. Changes in gait patterns and muscle activity following total hip arthroplasty: A six-month follow-up. Clin. Biomech. 2013, 28, 762–769.
45. Begg, R.K.; Palaniswami, M.; Owen, B. Support vector machines for automated gait classification. IEEE Trans. Biomed. Eng. 2005, 52, 828–838.
46. Chopra, S.; Kaufman, K.R. Effects of total hip arthroplasty on gait. In Handbook of Human Motion; Müller, B., Wolf, S., Eds.; Springer: Cham, Germany, 2018; pp. 1–15, ISBN 978-3-319-30808-1.
47. Perron, M.; Malouin, F.; Moffet, H.; McFadyen, B.J. Three-dimensional gait analysis in women with a total hip arthroplasty. Clin. Biomech. 2000, 15, 504–515.
48. Madsen, M.S.; Ritter, M.A.; Morris, H.H.; Meding, J.B.; Berend, M.E.; Faris, P.M.; Vardaxis, V.G. The effect of total hip arthroplasty surgical approach on gait. J. Orthop. Res. 2004, 22, 44–50.
49. Miller, T. Explanation in artificial intelligence: Insights from the social sciences. Artif. Intell. 2019, 267, 1–38.
50. Levinger, P.; Lai, D.T.H.; Begg, R.K.; Webster, K.E.; Feller, J.A. The application of support vector machines for detecting recovery from knee replacement surgery using spatio-temporal gait parameters. Gait Posture 2009, 29, 91–96.
51. Nüesch, C.; Valderrabano, V.; Huber, C.; von Tscharner, V.; Pagenstert, G. Gait patterns of asymmetric ankle osteoarthritis patients. Clin. Biomech. 2012, 27, 613–618.
52. Soares, D.P.; de Castro, M.P.; Mendes, E.A.; Machado, L. Principal component analysis in ground reaction forces and center of pressure gait waveforms of people with transfemoral amputation. Prosthet. Orthot. Int. 2016, 40, 729–738.
53. Barlow, H.; Mao, S.; Khushi, M. Predicting High-Risk Prostate Cancer Using Machine Learning Methods. Data 2019, 4, 129.
54. Poitras, I.; Dupuis, F.; Bielmann, M.; Campeau-Lecours, A.; Mercier, C.; Bouyer, L.J.; Roy, J.-S. Validity and Reliability of Wearable Sensors for Joint Angle Estimation: A Systematic Review. Sensors 2019, 19, 1555.
55. Peng, H.; Long, F.; Ding, C. Feature Selection Based on Mutual Information: Criteria of Max-Dependency, Max-Relevance, and Min-Redundancy. IEEE Trans. Pattern Anal. Mach. Intell. 2005, 27, 1226–1238.
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
... Artificial intelligence 1 (AI) stands at the forefront of a healthcare revolution, essentially transforming the way medical diseases are detected and diagnosed in a timely manner due to its transparency and explainability. Using the computational expertise of AI formulas, medical care experts examine substantial datasets with extraordinary rate and precision (Dindorf et al., 2020). The innovation's capacity to look at large databases of clinical info, consisting of individual documents, imaging scans, and hereditary information, allows very early and accurate recognition of illness. ...
... While AI's effect on diagnostics and therapy is extensive, its impact prolongs much past these worlds, penetrating numerous aspects of medical care systems. From enhancing functional jobs and enhancing source monitoring to boosting personal involvement via individualised health and wellness details, AI is improving the whole healthcare environment (Dindorf et al., 2020). Natural Language Processing 2 (NLP) modern technologies, as an example, help with the removal of useful understandings from disorganised healthcare information, using a much deeper understanding of personal problems (Taylor & Taylor, 2021). ...
... Moreover, ethical considerations, such as the need to avoid bias and ensure fair access to AI-driven healthcare, are closely linked to the transparency and explainability of these systems. In this context, future research must focus on developing AI models and frameworks that prioritise transparency and explainability, thereby ensuring that AI technologies in healthcare are not only innovative but also ethical, trustworthy, and aligned with the goals of sustainable healthcare (Dindorf et al., 2020). ...
Chapter
In the period of fast technical innovation, artificial intelligence (AI) has become a transformative force within the medical care market. This study explores the crucial measurements of AI application in healthcare, with a primary concentrate on cultivating openness, interpretability, and cooperation to make specific lasting methods. It establishes the stage by highlighting the crucial function of AI in health-care. It highlights the necessity of incorporating concepts of openness and inter-pretability right into AI systems' materials. This structure is essential for developing trust funds amongst stakeholders and advertising liable AI implementation within the healthcare environment. Furthermore, it illuminates the nuanced meanings of openness within the healthcare context, browsing regulative factors to consider and providing studies that brighten effective executions of clear AI in healthcare decision-making procedures. It better explores the details of interpretability and explainability, highlighting their value in boosting the human understanding of AI-driven health-care choices. Methods and approaches for providing AI choices that are understandable to medical care experts are talked about thoroughly. Human-AI partnership becomes a critical motif in the story, diving right into the collaborating connection between healthcare specialists and AI systems. Techniques for reliable cooperation exist, showcasing exactly how human-in-the-loop techniques boost the total performance and dependability of AI applications in medical care. It checks out the intricacies related to releasing transparent and interpretable AI, describes the instructional
... The result from the datasets utilized in training the ML/DL models used in explainable AI in the healthcare sector in relation to research question three is also provided and the Fig. 9 The computational process of Grad-GAM [99] Neural [106,112,115,118,128,130,132,133,142,143,[156][157][158][159] 27 LR [112,115,117,130,132,142,143,151,153,155,156] 28 RF [112,117,118,130,133,142,148,153] 29 DT [82,102,112,122,132,153,156] Not specify NIL [107,128,131,138,139,147,156] of images, the image caption, the caption number and the caption size. However, datasets are meaningless except for certain features or patterns extracted from them. ...
... The result from the datasets utilized in training the ML/DL models used in explainable AI in the healthcare sector in relation to research question three is also provided and the Fig. 9 The computational process of Grad-GAM [99] Neural [106,112,115,118,128,130,132,133,142,143,[156][157][158][159] 27 LR [112,115,117,130,132,142,143,151,153,155,156] 28 RF [112,117,118,130,133,142,148,153] 29 DT [82,102,112,122,132,153,156] Not specify NIL [107,128,131,138,139,147,156] of images, the image caption, the caption number and the caption size. However, datasets are meaningless except for certain features or patterns extracted from them. ...
Article
Full-text available
The healthcare sector has advanced significantly as a result of the ability of artificial intelligence (AI) to solve cognitive problems that once required human intelligence. As artificial intelligence finds more applications in healthcare, trustworthiness must be guaranteed. Even while AI has the potential to improve healthcare, there are still challenging issues because it is yet to be widely adopted, especially when it comes to transparency. Concerns about comprehending the internal workings of AI models, possible biases, model robustness, and generalizability are raised by their opacity which makes them function like black boxes. A solution for worries over the transparency of AI algorithms is explainable AI. Explainable AI seeks to enhance AI explainability and analytical capabilities, particularly in vital industries like healthcare. Even though earlier research has examined several explainable AI-related topics, such as a lexicon, industry-specific overviews, and applications in the healthcare industry, a thorough analysis concentrating on the function of explainable AI in building trust in AI healthcare systems is required. In an effort to close this gap, a systematic literature review that adheres to PRISMA principles that analyze relevant papers that were published between 2015 and 2023 was done in this paper. To determine the critical role that explainable AI plays in fostering trust, this study examines widely utilized methodologies, machine learning and deep learning techniques, datasets, performance measures and validation procedures used in AI healthcare research. In addition, research issues and potential research directions are also discussed in this research. Thus, this systematic review provides a thorough summary of the present status of research on explainability and transparency in AI healthcare systems, thus illuminating crucial factors that affect user trust. 
The results are intended to assist researchers, policymakers and healthcare professionals in developing a more transparent, responsible and reliable AI system in the healthcare sector.
... Dindorf et al. [241] utilised the perturbation-based explainability method Local Interpretable Model-Agnostic Explanation [242] to explain linear Support Vector Machines trained on kinematic and kinetic waveforms as well as discrete features to differentiate between healthy individuals and patients who underwent total hip arthroplasty. To derive global model explanations, the authors averaged the decision explanations per class. ...
Preprint
Full-text available
This chapter provides an overview of recent and promising Machine Learning applications, i.e. pose estimation, feature estimation, event detection, data exploration & clustering, and automated classification, in gait (walking and running) and sports biomechanics. It explores the potential of Machine Learning methods to address challenges in biomechanical workflows, highlights central limitations, i.e. data and annotation availability and explainability, that need to be addressed, and emphasises the importance of interdisciplinary approaches for fully harnessing the potential of Machine Learning in gait and sports biomechanics.
... However, the simpler model will likely function well locally despite not performing a globally accurate approximation of the complicated model. The prediction is then explained using a simpler model that was learned using the weighted data points (Dindorf et al., 2020). ...
Article
Objectives Lumbar spinal stenosis (LSS) is an increasingly important issue related to back pain in elderly patients, resulting in significant socioeconomic burdens. Postoperative complications and socioeconomic effects are evaluated using the clinical parameter of hospital length of stay (LOS). This study aimed to develop a machine learning-based tool that can calculate the risk of prolonged length of stay (PLOS) after surgery and interpret the results. Methods Patients were registered from the spine surgery department in our hospital. Hospital stays greater than or equal to the 75th percentile of LOS were considered PLOS after spine surgery. We screened the variables using the least absolute shrinkage and selection operator (LASSO) and permutation importance values and selected nine features. We then performed hyperparameter selection via grid search with nested cross-validation. Receiver operating characteristic curves, calibration curves, and decision curve analysis were used to assess model performance. The final model was interpreted using SHapley Additive exPlanations (SHAP) and Local Interpretable Model-agnostic Explanations (LIME). To facilitate model utilization, a web application was deployed. Results A total of 540 patients were included, and nine features were finally selected. The final optimal random forest (RF) model achieved an area under the receiver operating characteristic curve (AUC) of 0.93 on the training set and 0.83 on the test set. Both the SHAP and LIME analyses identified intraoperative blood loss as the most significant contributor to the outcome. Conclusion Machine learning combined with SHAP and LIME can provide a clear explanation of personalized risk predictions, and spine surgeons can gain an intuitive grasp of the impact of important model components. The web application makes our RF model simple and accessible for clinical use and future research.
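The outcome definition above (a stay at or above the 75th percentile of LOS counts as prolonged) can be expressed directly. The sketch assumes the `"inclusive"` interpolation method of Python's `statistics.quantiles`; the abstract does not state which percentile definition was used, and other definitions can shift the threshold slightly.

```python
import statistics


def label_prolonged_stay(los_days):
    """Return (threshold, labels); label 1 marks a prolonged length of stay.

    A stay counts as prolonged when it is >= the 75th percentile of all stays.
    The "inclusive" percentile method is an assumption of this sketch.
    """
    *_, q3 = statistics.quantiles(los_days, n=4, method="inclusive")
    labels = [1 if days >= q3 else 0 for days in los_days]
    return q3, labels


threshold, labels = label_prolonged_stay([5, 2, 9, 3, 8, 4, 7, 6])
print(threshold, labels)
```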
... By definition, the foot progression angle (FPA) is the angle of the foot with respect to the walking direction. Following results from gait analysis in HOA [23][24][25] and after THA surgery [13,15,31], the following kinematic angles are used in this study to provide sufficient discrimination of gait dysfunctions: ...
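The FPA definition in this excerpt lends itself to a short computation. The sketch assumes heel and toe positions that are already projected onto the horizontal plane and a known walking-direction vector; the sign convention (counter-clockwise positive, viewed from above) is an assumption, and whether a positive value means toe-in or toe-out depends on which foot is measured.

```python
import math


def foot_progression_angle(heel, toe, walking_dir):
    """Signed angle in degrees from the walking direction to the heel-to-toe axis.

    heel, toe: (x, y) positions projected onto the horizontal plane
    walking_dir: (x, y) vector of the direction of progression
    Positive = counter-clockwise rotation seen from above (assumed convention).
    """
    fx, fy = toe[0] - heel[0], toe[1] - heel[1]
    wx, wy = walking_dir
    cross = wx * fy - wy * fx  # sine component (signed)
    dot = wx * fx + wy * fy    # cosine component
    return math.degrees(math.atan2(cross, dot))


print(foot_progression_angle((0.0, 0.0), (0.25, 0.05), (1.0, 0.0)))
```

Using `atan2` with the cross and dot products yields a signed angle in (-180°, 180°] and avoids the loss of sign that an `acos`-based formula would have.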
Article
The application of gait analysis to patients with Hip Osteoarthritis (HOA) before and after Total Hip Arthroplasty (THA) surgery can provide accurate diagnostics, reliable treatment decision-making, and proper rehabilitation efforts. Acquired kinematic trajectories provide discriminating features that can be used to determine the gait patterns of healthy subjects and the effects of surgery. However, there is still no consensus on which kinematics discriminate best. Our investigation aims to use Deep Learning (DL) methodologies to improve classification results for the kinematic parameters of healthy, HOA, and 6-month post-THA gait cycles. Kinematic angles of the lower limb are used directly as one-dimensional inputs into a DL model. Based on the features of the human gait cycle, a hybrid Long Short-Term Memory–Convolutional Neural Network (HLSTM-CNN) is designed to classify healthy/HOA/THA gaits. The results show that the sagittal angles of the hip and knee, and the front angles of the FPA and knee, provide the most discriminating results, with accuracy above 94% between healthy and HOA gaits. Interestingly, when the sagittal angles of the hip and knee were used to analyze THA gaits, the same subjects tended to be misclassified. This information may offer insight into determining the success or failure of THA.
Article
Background Aging of societies in recent and upcoming years has made musculoskeletal disorders a significant challenge for healthcare systems. Knee osteoarthritis (KOA) is a progressive musculoskeletal disorder that is typically diagnosed using radiographs. Given the drawbacks of X-ray imaging, such as exposure to ionizing radiation, a noninvasive, low-cost alternative method for diagnosing KOA is needed. The purpose of this study was to evaluate the ability of a wearable device to differentiate between healthy individuals and those with severe osteoarthritis (grade 4). Methods The wearable device consisted of two inertial measurement unit (IMU) sensors, one on the lower leg and one on the thigh. One of the sensors serves as a dynamic coordinate system to improve measurement accuracy. To discriminate between 1433 labeled IMU signals collected from 15 healthy individuals and 15 people with severe KOA aged over 45, new features were extracted and defined in dynamic coordinates. These features were used with four different classifiers: (1) naive Bayes, (2) K-nearest neighbors (KNN), (3) support vector machine, and (4) random forest. Each classifier was evaluated using 10-fold cross-validation. Based on the model outputs, four performance metrics (accuracy, precision, sensitivity, and specificity) were calculated to assess the classification of the two groups. Results The selected classifiers were evaluated by calculating the four metrics and their mean and variance. KNN achieved the highest accuracy, 93.71 ± 1.1%, with a precision of 93 ± 1.31%.
Conclusion The novel features based on the dynamic coordinate system, together with the success of the proposed KNN model, demonstrate the effectiveness of the proposed algorithm in distinguishing between signals from healthy individuals and from patients. The proposed algorithm outperforms existing methods from similar articles in sensitivity, with an improvement of at least 4%. The main objective of this study was to investigate the feasibility of using a wearable device as an auxiliary tool in the diagnosis of arthritis. The reported results relate only to healthy individuals and individuals with severe arthritis (grade 4), so the current method may yield weaker results in less severe cases.
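The evaluation scheme described above (a KNN classifier, 10-fold cross-validation, and four metrics) can be sketched in plain Python. This is an illustrative sketch, not the study's pipeline: it assumes Euclidean distance, `n_neighbors=3`, shuffled folds, and pools the out-of-fold predictions before computing the metrics, whereas the abstract reports the mean and variance across folds. It also splits folds by sample; when several signals come from the same person, assigning whole subjects to folds is a common way to avoid optimistic estimates.

```python
import math
import random
from collections import Counter


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, sensitivity, and specificity from paired labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }


def knn_predict(train_x, train_y, x, n_neighbors=3):
    """Majority label among the n_neighbors closest training points."""
    nearest = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))
    votes = Counter(train_y[i] for i in nearest[:n_neighbors])
    return votes.most_common(1)[0][0]


def cross_validate_knn(xs, ys, folds=10, n_neighbors=3, seed=0):
    """Pooled out-of-fold metrics of a KNN classifier under k-fold CV."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    y_true, y_pred = [], []
    for f in range(folds):
        test = idx[f::folds]  # every folds-th shuffled index forms one fold
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        train_x = [xs[i] for i in train]
        train_y = [ys[i] for i in train]
        for j in test:
            y_true.append(ys[j])
            y_pred.append(knn_predict(train_x, train_y, xs[j], n_neighbors))
    return binary_metrics(y_true, y_pred)
```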
Chapter
This chapter provides an overview of recent and promising Machine Learning applications, i.e. pose estimation, feature estimation, event detection, data exploration and clustering and automated classification, in gait (walking and running) and sports biomechanics. It explores the potential of Machine Learning methods to address challenges in biomechanical workflows; highlights central limitations, i.e. data and annotation availability and explainability, that need to be addressed; and emphasises the importance of interdisciplinary approaches for fully harnessing the potential of Machine Learning in gait and sports biomechanics.
Article
This study explores the application of machine learning (ML) in deriving and analyzing individual gait patterns (i.e., gait signatures) from ground reaction force data. The study leverages three datasets containing 2,092 individuals, including 1,283 cases with pathological gait, and addresses three key objectives: (1) Demonstrating the uniqueness of gait signatures in a large-scale dataset with heterogeneity introduced by patient data and various conditions. (2) Characterizing gait signatures using explainable artificial intelligence (XAI) to highlight specific features contributing to their uniqueness. (3) Evaluating the reliability of gait signatures and their characterizations across different numbers of individuals and training samples per individual. The results show that ML can accurately differentiate unique gait patterns across healthy individuals and patients with pathological gait patterns, highlighting the importance of considering individual gait signatures in clinical gait analysis. The high reliability of a person’s unique gait signature may bear potential for more personalized treatment decisions and rehabilitation programs, with XAI methods providing valuable insights into the key features characterizing individual gait. The results indicate that even more refined and personalized approaches are possible, extending beyond the conventional categories of pathology, age, and sex. This study provides a foundation for exploring the practical impact of gait signatures on rehabilitation, clinical diagnosis, and personalized treatment strategies.
Article
Study aim: To find out, without relying on gait-specific assumptions or prior knowledge, which parameters are most important for describing asymmetrical gait in patients after total hip arthroplasty (THA). Material and methods: The gait of 22 patients after THA was recorded using an optical motion capture system. The waveform data of the marker positions, velocities, and accelerations, as well as joint and segment angles, were used as initial features. The random forest (RF) and minimum-redundancy maximum-relevance (mRMR) algorithms were chosen for feature selection. The results were compared with those obtained using different dimensionality reduction methods. Results: Hip movement in the sagittal plane, knee kinematics in the frontal and sagittal planes, marker position data of the anterior and posterior superior iliac spine, and acceleration data for markers placed at the proximal end of the fibula are highly important for classification (accuracy: 91.09%). Feature selection yielded better results than dimensionality reduction. Conclusion: The proposed approaches can use waveform data to identify and individually address abnormal gait patterns during rehabilitation. The results indicate that position and acceleration data also provide significant information for this task.
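The mRMR criterion named above can be sketched as a greedy loop: at each step, pick the feature whose relevance to the class (mutual information) minus its mean redundancy with the already selected features is largest. This is a minimal sketch of the common difference variant, not the study's implementation; it assumes the features have already been discretised (the study worked with continuous waveform data), and ties are broken alphabetically.

```python
import math
from collections import Counter


def mutual_information(a, b):
    """Mutual information (in nats) between two equally long discrete sequences."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(
        (c / n) * math.log((c / n) / ((pa[x] / n) * (pb[y] / n)))
        for (x, y), c in pab.items()
    )


def mrmr(features, target, n_select):
    """Greedy mRMR (relevance minus mean redundancy) on discretised features.

    features: dict mapping feature name -> list of discrete values
    target: list of class labels
    """
    relevance = {name: mutual_information(col, target) for name, col in features.items()}
    selected = []
    remaining = set(features)

    def score(name):
        if not selected:
            return relevance[name]
        redundancy = sum(mutual_information(features[name], features[s])
                         for s in selected) / len(selected)
        return relevance[name] - redundancy

    while remaining and len(selected) < n_select:
        best = max(sorted(remaining), key=score)  # sorted() makes ties deterministic
        selected.append(best)
        remaining.remove(best)
    return selected
```

A feature that merely duplicates an already selected one has high relevance but equally high redundancy, so mRMR tends to skip it in favour of a complementary feature.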
Article
Modern technologies make it possible to capture multiple biomechanical parameters, often resulting in relational data. The current work proposes a generally applicable method comprising automated feature extraction, ensemble feature selection, and classification to best exploit the potential of the data, including for generating new biomechanical knowledge. Its benefits are demonstrated in a concrete, biomechanically and medically relevant use case: gender classification based on spinal data for stance and gait. Classification accuracy was very good when gait data were used. Dynamic movements of the lumbar spine in the sagittal and frontal planes and of the pelvis in the frontal plane best capture gender differences.
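Ensemble feature selection combines the rankings produced by several selectors into one. The abstract does not specify how the ensemble aggregates them, so the sketch below assumes mean-rank aggregation, one simple and common option; features missing from a ranking are assigned the rank just after that ranking's last position.

```python
from collections import defaultdict


def aggregate_rankings(rankings):
    """Combine several feature rankings (best first) by mean rank.

    Features missing from a ranking receive rank len(ranking) + 1 there
    (an assumption). Returns feature names sorted by mean rank, ties
    broken alphabetically.
    """
    features = set().union(*rankings)
    total = defaultdict(float)
    for ranking in rankings:
        position = {f: i + 1 for i, f in enumerate(ranking)}
        for f in features:
            total[f] += position.get(f, len(ranking) + 1)
    return sorted(features, key=lambda f: (total[f] / len(rankings), f))


# Hypothetical rankings from three selectors:
print(aggregate_rankings([
    ["lumbar_sagittal", "pelvis_frontal", "lumbar_frontal"],
    ["pelvis_frontal", "lumbar_sagittal", "lumbar_frontal"],
    ["lumbar_sagittal", "lumbar_frontal", "pelvis_frontal"],
]))
```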
Article
Background: Inertial measurement units (IMUs) offer the ability to measure walking gait through a variety of biomechanical outcomes (e.g., spatiotemporal, kinematics, other). Although many studies have assessed their validity and reliability, there is still no quantitative summary of this vast body of literature. Therefore, we aimed to conduct a systematic review and meta-analysis to determine the i) concurrent validity and ii) test-retest reliability of IMUs for measuring biomechanical gait outcomes during level walking in healthy adults. Methods: Five electronic databases were searched for journal articles assessing the validity or reliability of IMUs during healthy adult walking. Two reviewers screened titles, abstracts, and full texts for studies to be included, and two reviewers then examined the methodological quality of all included studies. When sufficient data were available for a given biomechanical outcome, data were meta-analyzed on Pearson correlation coefficients (r) for validity and on intraclass correlation coefficients (ICC) for reliability. Outcomes that could not be meta-analyzed were summarized qualitatively. Results: A total of 82 articles, assessing the validity or reliability of over 100 outcomes, were included in this review. Seventeen biomechanical outcomes, primarily spatiotemporal parameters, were meta-analyzed. The validity and reliability of step and stride times were found to be excellent. Similarly, the validity and reliability of step and stride length, as well as swing and stance time, were found to be good to excellent. In contrast, spatiotemporal parameter variability and symmetry displayed poor to moderate validity and reliability. IMUs were also found to display moderate reliability for the assessment of local dynamic stability during walking. The remaining biomechanical outcomes were qualitatively summarized to provide a variety of recommendations for future IMU research.
Conclusions: The findings of this review demonstrate the excellent validity and reliability of IMUs for mean spatiotemporal parameters during walking, but caution against the use of spatiotemporal variability and symmetry metrics without a strict protocol. Further, this work tentatively supports the use of IMUs for joint angle measurement and other biomechanical outcomes such as stability, regularity, and segmental accelerations. Unfortunately, the strength of these recommendations is limited by the lack of high-quality studies for each outcome, with underpowered and/or unjustified sample sizes (median sample size 12; range 2-95) being the primary limitation.
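Meta-analysing Pearson correlation coefficients, as described above, is commonly done on Fisher's z scale, where each study's variance is approximately 1/(n - 3). The sketch below shows a fixed-effect, inverse-variance pooled estimate with a 95% confidence interval; it is an assumption for illustration, since the abstract does not say whether a fixed- or random-effects model was used, and pooling ICCs would need a different variance formula.

```python
import math


def pooled_correlation(studies):
    """Fixed-effect pooled Pearson r via Fisher's z transformation.

    studies: list of (r, n) pairs with |r| < 1 and n > 3.
    Returns (pooled_r, (lower, upper)) with an approximate 95% confidence interval.
    """
    num = den = 0.0
    for r, n in studies:
        z = math.atanh(r)  # Fisher z transformation
        w = n - 3          # inverse of var(z), which is about 1 / (n - 3)
        num += w * z
        den += w
    z_bar = num / den
    se = 1.0 / math.sqrt(den)
    lower, upper = z_bar - 1.96 * se, z_bar + 1.96 * se
    return math.tanh(z_bar), (math.tanh(lower), math.tanh(upper))


# Hypothetical validity studies given as (r, sample size):
print(pooled_correlation([(0.92, 12), (0.88, 20), (0.95, 10)]))
```

Back-transforming with `tanh` keeps the pooled estimate and its interval inside (-1, 1), which averaging raw r values would not guarantee for the interval.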
Article
Explainability is essential for users to effectively understand, trust, and manage powerful artificial intelligence applications.
Article
Patients after total hip arthroplasty (THA) suffer from lingering musculoskeletal restrictions. Three-dimensional (3D) gait analysis combined with machine-learning approaches is used to detect these impairments. In this work, features from the 3D gait kinematics of an inertial sensor (IMU) system, divided into two subsets, spatiotemporal parameters (Set 1) and joint angles (Set 2), are proposed as input for a support vector machine (SVM) model to differentiate impaired from non-impaired gait. The IMU-based features were validated against an optical motion capture (OMC) system using 20 patients after THA and a healthy control group of 24 subjects. The SVM model was then trained on both subsets. The validation of the IMU-based kinematic features revealed root mean squared errors in the joint kinematics from 0.24° to 1.25°. The spatiotemporal gait parameters (STP) showed similarly high accuracy. The SVM models based on IMU data achieved accuracies of 87.2% (Set 1) and 97.0% (Set 2). The current work presents valid IMU-based features, employed in an SVM model, for classifying the gait of patients after THA and of healthy controls. The study reveals that the features of Set 2 are more relevant to the classification problem. The present IMU system proves its potential to provide accurate features for incorporation into a mobile gait-feedback system for patients after THA.
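The root mean squared error used above to validate IMU joint kinematics against OMC follows from its definition. The sketch assumes both signals are already time-synchronised and sampled at the same instants (for example, time-normalised gait cycles); the angle values in the usage line are hypothetical.

```python
import math


def rmse(reference, estimate):
    """Root mean squared error between two equally long, time-aligned signals."""
    if len(reference) != len(estimate) or not reference:
        raise ValueError("signals must be non-empty and of equal length")
    squared = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return math.sqrt(squared / len(reference))


# Hypothetical knee flexion angles (degrees): OMC reference vs. IMU estimate
print(rmse([10.0, 25.0, 40.0, 20.0], [10.5, 24.0, 41.0, 19.5]))
```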
Article
Background: Quantitative gait analysis produces a vast amount of data, which can be difficult to analyze. Automated gait classification based on machine learning techniques bears the potential to help clinicians comprehend these complex data. Even though these techniques are already frequently used in the scientific community, there is no clear consensus on how the data need to be preprocessed and arranged to ensure optimal classification accuracy. Research question: Is there an optimal data aggregation and preprocessing workflow for maximizing classification accuracy? Methods: Based on our previous work on automated classification of ground reaction force (GRF) data, a sequential setup was followed: first, several aggregation methods (early fusion and late fusion) were compared; second, based on the best aggregation method identified, the expressiveness of different combinations of signal representations was investigated. The employed dataset included data from 910 subjects, with four gait disorder classes and one healthy control group. The machine learning pipeline comprised principal component analysis (PCA), z-standardization, and a support vector machine (SVM). Results: Late fusion aggregation, i.e., majority voting on the classifiers' predictions, performed best. In addition, the use of derived signal representations (relative changes and signal differences) seems to be advantageous as well. Significance: Our results indicate that great caution is needed when selecting data preprocessing and aggregation methods, as these can affect classification accuracy. Our results can serve future studies as a guideline for choosing data aggregation and preprocessing techniques.
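Late fusion by majority voting, which performed best above, combines the label predicted by each classifier for the same sample. The sketch assumes one classifier per input signal (for example, one per GRF component) whose predictions are already available; the tie-breaking rule (keep the tied label predicted by the earliest classifier) is an assumption.

```python
from collections import Counter


def majority_vote(predictions_per_classifier):
    """Late fusion: combine per-classifier predictions by majority vote.

    predictions_per_classifier: list of equally long prediction lists,
    one per classifier. Ties go to the tied label that the earliest
    classifier predicted (an assumption of this sketch).
    """
    fused = []
    for votes in zip(*predictions_per_classifier):
        counts = Counter(votes)
        best = max(counts.values())
        fused.append(next(v for v in votes if counts[v] == best))
    return fused


# Hypothetical predictions of three component-wise classifiers for three trials:
print(majority_vote([["hip", "knee", "ankle"],
                     ["hip", "ankle", "ankle"],
                     ["knee", "ankle", "healthy"]]))
```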
Article
Prostate cancer can pose a low or high risk to the patient's health. Current screening based on prostate-specific antigen (PSA) levels tends to produce both false positives and false negatives, both of which have negative consequences. We obtained a dataset of 35,875 patients from the screening arm of the National Cancer Institute's Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial. We segmented the data into instances without prostate cancer, instances with low-risk prostate cancer, and instances with high-risk prostate cancer. We developed a pipeline to deal with imbalanced data and proposed algorithms to preprocess such datasets. We evaluated the accuracy of various machine learning algorithms in predicting high-risk prostate cancer. The proposed pipeline achieved an accuracy of 91.5% using standard scaling, the SVMSMOTE sampling method, and AdaBoost. We then evaluated the contribution of the rate of change of PSA, age, BMI, and filtration by race to the model's accuracy. Including the rate of change of PSA and age increased the area under the curve (AUC) of the model by 6.8%, whereas BMI and race had a minimal effect.
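The oversampling step above can be illustrated with plain SMOTE, which creates synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbours. Note that this is a swapped-in, simpler technique: SVMSMOTE, the variant named in the abstract, focuses the generation on minority samples near an SVM decision boundary, and is not reproduced here. The sketch assumes the features are already scaled.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Plain SMOTE: synthesise n_new minority samples by interpolation.

    minority: list of feature tuples from the minority class (at least 2 samples).
    Each synthetic point lies on the segment between a randomly chosen minority
    sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        neighbours = sorted(
            (minority[j] for j in range(len(minority)) if j != i),
            key=lambda m: math.dist(base, m),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


print(smote([(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)], n_new=3, k=2))
```

Because synthetic points stay on segments between real minority samples, oversampling should be applied only to the training folds, so that no synthetic point derived from a test sample leaks into training.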
Article
Motion capture systems are recognized as the gold standard for joint angle calculation. However, studies using these systems are restricted to laboratory settings for technical reasons, which may lead to findings that are not representative of real-life contexts. Recently developed commercial and home-made inertial measurement sensors (M/IMU) are potentially good alternatives to laboratory-based systems, and recent technological improvements call for a synthesis of the current evidence. The aim of this systematic review was to determine the criterion validity and reliability of M/IMU for each body joint and for tasks of different levels of complexity. Five databases were screened (PubMed, CINAHL, Embase, Ergonomics Abstracts, and Compendex). Two evaluators independently performed study selection, quality assessment (consensus-based standards for the selection of health measurement instruments [COSMIN] and quality appraisal tools), and data extraction. Forty-two studies were included. Reported validity varied with task complexity (higher validity for simple tasks) and with the joint evaluated (better validity for lower limb joints). More studies on reliability are needed to draw stronger conclusions, as the number of studies addressing this psychometric property was limited. M/IMU should be considered a valid tool for assessing whole-body range of motion, but further studies are needed to standardize technical procedures and obtain more accurate data.
Book
The development of “intelligent” systems that can make decisions and act autonomously might lead to faster and more consistent decisions. A limiting factor for broader adoption of AI technology is the inherent risk that comes with giving up human control and oversight to “intelligent” machines. For sensitive tasks involving critical infrastructures and affecting human well-being or health, it is crucial to limit the possibility of improper, non-robust, and unsafe decisions and actions. Before deploying an AI system, we see a strong need to validate its behavior and thus establish guarantees that it will continue to perform as expected when deployed in a real-world environment. In pursuit of that objective, ways for humans to verify the agreement between the AI decision structure and their own ground-truth knowledge have been explored. Explainable AI (XAI) has developed as a subfield of AI focused on exposing complex AI models to humans in a systematic and interpretable manner. The 22 chapters included in this book provide a timely snapshot of recently proposed algorithms, theory, and applications of interpretable and explainable AI, reflecting the current discourse in this field and providing directions for future development. The book is organized in six parts: towards AI transparency; methods for interpreting AI systems; explaining the decisions of AI systems; evaluating interpretability and explanations; applications of explainable AI; and software for explainable AI.
Chapter
In recent years, machine learning (ML) has become a key enabling technology for the sciences and industry. Especially through improvements in methodology, the availability of large databases, and increased computational power, today's ML algorithms are able to achieve excellent performance (at times even exceeding the human level) on an increasing number of complex tasks. Deep learning models are at the forefront of this development. However, due to their nested non-linear structure, these powerful models have generally been considered “black boxes” that provide no information about what exactly makes them arrive at their predictions. Since such a lack of transparency may not be acceptable in many applications, e.g., in the medical domain, the development of methods for visualizing, explaining, and interpreting deep learning models has recently attracted increasing attention. This introductory paper presents recent developments and applications in this field and makes a plea for wider use of explainable learning algorithms in practice.