Metrics for Performance Evaluation of Patient Exercises during Physical Therapy

Abstract

Objective: The article proposes a set of metrics for evaluation of patient performance in physical therapy exercises.
Methods: A taxonomy is employed that classifies the metrics into quantitative and qualitative categories, based on the level of abstraction of the captured motion sequences. Further, the quantitative metrics are classified into model-less and model-based metrics, depending on whether the evaluation employs the raw measurements of patient-performed motions or a mathematical model of the motions. The reviewed metrics include root-mean-square distance, Kullback-Leibler divergence, log-likelihood, heuristic consistency, Fugl-Meyer Assessment, and similar.
Results: The metrics are evaluated for a set of five human motions captured with a Kinect sensor.
Conclusion: The metrics can potentially be integrated into a system that employs machine learning for modelling and assessment of the consistency of patient performance in a home-based therapy setting. Automated performance evaluation can overcome the inherent subjectivity in human-performed therapy assessment, and it can increase the adherence to prescribed therapy plans and reduce healthcare costs.
ISSN: 2329-9096
International Journal of
Physical Medicine & Rehabilitation
Vakanski et al., Int J Phys Med Rehabil 2017, 5:3
DOI: 10.4172/2329-9096.1000403
Research Article OMICS International
Volume 5 • Issue 3 • 1000403
Int J Phys Med Rehabil, an open access journal
ISSN: 2329-9096
Aleksandar Vakanski1*, Jake M. Ferguson2 and Stephen Lee3
1Industrial Technology, University of Idaho, Idaho Falls, ID, USA
2Center for Modeling Complex Interactions, University of Idaho, Moscow, ID, USA
3Department of Statistical Science, University of Idaho, Moscow, ID, USA
*Corresponding author: Aleksandar Vakanski, Industrial Technology, University
of Idaho, 1776 Science Center Drive, TAB 309, Idaho Falls, ID, 83402, USA, Tel:
208-757-5422; E-mail: vakanski@uidaho.edu
Received April 01, 2017; Accepted April 17, 2017; Published April 20, 2017
Citation: Vakanski A, Ferguson JM, Lee S (2017) Metrics for Performance
Evaluation of Patient Exercises during Physical Therapy. Int J Phys Med Rehabil 5:
403. doi: 10.4172/2329-9096.1000403
Copyright: © 2017 Vakanski A, et al. This is an open-access article distributed
under the terms of the Creative Commons Attribution License, which permits
unrestricted use, distribution, and reproduction in any medium, provided the
original author and source are credited.
Keywords: Physical therapy; Performance metric; Log-likelihood;
KL divergence; RMS distance
Introduction
Functional recovery from neuromotor disabilities, various surgical procedures, or musculoskeletal trauma is strongly dependent on patient participation in a physical therapy program. While a large portion of all therapy exercises is performed by patients in a home-based setting, the lack of supervision and motivation for continued involvement in the therapy program in an outpatient environment leads to low adherence to prescribed treatment regimens [1]. The work presented in this article was motivated by our belief that the latest progress in machine learning can be harnessed for analysis and monitoring of patient progress toward recovery during in-home physical rehabilitation, and accordingly, can greatly benefit both patients and healthcare providers.
The recent rapid advancements in artificial intelligence (AI), driven predominantly by its subfield machine learning, have been reflected by ubiquitous deployment across a wide spectrum of application domains, ranging from miscellaneous image-, text-, and voice-processing apps in smart phones and computers to autonomous cars and personalized recommender systems.
It is expected that as the field further evolves in the years to come, AI-enabled systems will have even more pronounced and transformative impact on society as a whole and on all aspects of our lives as individuals.
In the medical field, the number of machine learning applications has proliferated recently due to the demonstrated capacity for discovering complex patterns by analysing large numbers of electronic medical records. Not surprisingly, the most notable medical AI success has been in the domain of medical image processing. For example, the medical team at DeepMind has applied deep artificial neural networks (ANNs) for analysis of digital scans of the eye in diagnosis of age-related macular degeneration and diabetic retinopathy [2], and for analysis of radiotherapy scans for detection of oral and neck cancer [3]. Other exemplary AI applications include image processing of skin lesions in screening and detection of melanoma cancer [4], and image processing of scans for detection of invasive brain cancer cells [5]. Machine learning approaches have also been implemented in a variety of other biomedical research problems [6], such as analysis of genomics sequences [7], drug discovery and repurposing [8], and robotic healthcare assistants [9].
The benefits of applying machine learning algorithms to medical data analytics are numerous, and encompass customized and personalized diagnosis and treatment, faster screening and early detection of conditions, which can potentially lead to improved healthcare quality and patient satisfaction, reduced healthcare costs, reduced need for hospital stay, and similar.
As more archived traditional medical records are transferred to
digital form, and as the personal wearable devices and mobile apps
unobtrusively collect massive amounts of information about our bodily
functions and activities, more training data will become available,
which will improve the outcomes of the machine learning algorithms
and leverage the extraction of subtle health related and behavioural
patterns. For instance, one creative solution employing images taken
from a regular cell phone camera is the mobile app AiCure [10], which
uses AI-supported image processing for monitoring users’ habits
in taking prescription medications, with an objective to increase the
adherence rates, as well as to update the respective physician on patient
habits related to taking the prescribed medications.
Likewise, application of machine learning algorithms for monitoring and evaluation of patient compliance with a prescribed physical therapy program can improve the adherence rates, reduce the required time for functional recovery, and consequently, reduce treatment cost. The development of such systems requires hardware components, i.e., a dedicated computer for data processing, and a sensory system for capturing patient exercises during rehabilitation sessions. Among the different sensory systems for motion capturing, vision-range sensors of the Microsoft Kinect type are currently an excellent option for the task at hand, considering their affordability (price in the range of 150 USD), reliability for different research and industrial applications, and availability of open source libraries for program development with a broad range of capabilities. Two such existing systems, KiRes (Kinect Rehabilitation System) [11] and VERA (Virtual Exercise Rehabilitation Assistant) [12], utilize the motion capturing feature of Kinect to present an avatar on a computer display that reproduces patient motions in real time, and simultaneously displays the desired motions. The visualization of the performance provides instantaneous feedback to the patient, helps in recognizing any needs for correcting the exercises, and motivates the patient to comply with the prescribed treatment. A comprehensive review of the technical and clinical merits of the application of Microsoft Kinect for motion capturing of patient exercises in physical rehabilitation is presented by Hondori and Khademi [13].
Equally important to the requirement for adequate hardware components is the development of a methodology for computer-driven analysis of patient therapy efforts, related to evaluating the consistency of the performance with the exercises prescribed by the physical therapist (PT), the day-to-day patient progress, and the level of compliance with the prescribed treatment plan. Such methodology is predicated upon the provision of: (i) efficient mathematical models for representation of bodily movements undertaken during physical therapy exercises [14], and (ii) efficient metrics for quantifying the patient-executed motions and comparing the performance to the motions prescribed by the PT.
The objective of this article is to present a survey of the current literature in reference to the metrics for evaluation of patient performance in physical therapy. The existing practice for evaluation of physical rehabilitation has exclusively relied on assessment by a PT. For instance, a common test for evaluation of motor recovery after stroke is the Fugl-Meyer Assessment [15], where a PT evaluates a patient’s performance on a set of pre-defined movements and assigns a numerical score on a scale of 0 to 2 for each of the movements. Related tests for evaluation of the level of recovery after stroke include the Motor Assessment Scale [16] and the motricity index [17]. Another test for assessing the ability of upper motor movements is the Wolf Motor Function Test [18], which is a timed test consisting of several functional tasks, scored on a scale of 0 to 5. These and several other tests for assessment of patient performance and the corresponding level of functional recovery that are currently performed by a trained PT are suitable candidates for automation, since they rely on a set of standard pre-defined movements. Drawbacks of PT-performed assessment include that it is time consuming, and that it produces subjective scores, where different PTs can provide different assessment scores due to human inability to accurately measure and quantify body trajectories. Automated performance evaluation can overcome these limitations by providing more accurate and quantified assessment; it can also be involved in daily monitoring of the therapy sessions, provide instantaneous corrective feedback, and send the performance data to the respective PT on a daily basis.
With regards to the proposed metrics for automated performance evaluation in the published literature, to the best of our knowledge only the work by Komatireddy et al. [12] has partially addressed this topic. The authors proposed a quantitative metric, related to the number of correctly performed repetitions of an exercise, and a qualitative metric, related to the ratio of optimal vs. sub-optimal repetitions of the exercise. The study does not provide a clear explanation of which discriminative approach was applied for distinguishing between optimal and suboptimal repetitions.
This article reviews metrics that have been used, or that can potentially be used, for evaluation of patient therapy motions. Motivated by the work in Komatireddy et al. [12], we employ a taxonomy that classifies the metrics as quantitative and qualitative. Further, quantitative metrics are categorized into model-less and model-based metrics. Model-less metrics perform the assessment based on the raw time series of the motions as acquired by a sensory system. Metrics in this category are root-mean-square distance and norm of jerk. Model-based metrics calculate the consistency of patient exercises in comparison to a mathematical model of the motion as prescribed by a PT. Metrics in this category include log-likelihood, Kullback-Leibler divergence, heuristic consistency, and prediction intervals. Other related metrics not explored in this work are the Hellinger distance and the Bhattacharyya distance. While the quantitative metrics evaluate the motions at a low level of abstraction, i.e., at the level of individual measurement points in a sequence, the qualitative metrics evaluate the motions at a high level of abstraction, i.e., at the motion sequence level. Metrics in this category include number of optimal attempts, Fugl-Meyer Assessment, and Wolf Motor Function Test.
The article is organized as follows. The next section introduces the mathematical notation used for the human motions. Afterwards, the metrics for patient performance evaluation are described. The reviewed metrics are next compared for evaluation of five human motions. The last section summarizes the presented study.
Notation
In a physical rehabilitation setting, a PT will prescribe a collection of desired therapy motions to a patient, by either performing the motions in front of the patient, or by physically moving the body parts of the patient along the required paths. It is assumed here that the PT will provide several demonstrations of each motion in order to reinforce the perception of the motion by the patient, which may be related to required range, speed of movement, and other respective constraints in the execution of the motion. The set of reference examples of a motion prescribed by the PT is denoted $\mathbf{O} = \{\mathbf{O}_m\}_{m=1}^{M}$, where m is used for indexing the individual examples of the motion, and M is the total number of examples of the motion $\mathbf{O}$ demonstrated by the PT. It is also assumed that a sensory system is used for capturing the prescribed therapy exercises, where each motion $\mathbf{O}_m$ is acquired by the sensor as a temporal sequence of measurements $\mathbf{O}_m = (\mathbf{o}_m^{(1)}, \mathbf{o}_m^{(2)}, \ldots, \mathbf{o}_m^{(T_m)})$. Each measurement $\mathbf{o}_m^{(k)}$ in the motion sequence represents a D-dimensional vector, where the subscript m denotes again the example index, and the superscript k denotes the temporal position of the measurement in the motion sequence $\mathbf{O}_m$. The total number of temporal measurements in the demonstration $\mathbf{O}_m$ is denoted $T_m$. In general, the lengths $T_m$ of the motion examples in the set $\{\mathbf{O}_1, \mathbf{O}_2, \ldots, \mathbf{O}_M\}$ will be similar but not identical, due to the inherent variability of human movements.
Analogously, let’s assume that the patient is attempting to perform the prescribed motion $\mathbf{O}$ in a home-based rehabilitation program in front of a sensory system for motion capturing. The patient is presumably asked to repeat the motion a predefined number of times in a predefined time period (e.g., 10 times daily). The measured motion examples performed
by the patient are denoted $\mathbf{R} = \{\mathbf{R}_n\}_{n=1}^{N}$, where by analogy n represents a motion index, and N is the number of performed motion examples. Each motion example is a temporal sequence $\mathbf{R}_n = (\mathbf{r}_n^{(1)}, \mathbf{r}_n^{(2)}, \ldots, \mathbf{r}_n^{(T_n)})$, consisting of $T_n$ D-dimensional vectors, denoted in this work $\mathbf{r}_n^{(k)} = (r_n^{(k,1)}, \ldots, r_n^{(k,D)})^{T}$.
The metrics for performance evaluation are to describe in a quantitative or a qualitative manner, or both, the consistency of the patient-performed examples of the motion $\mathbf{R}$ with the PT-prescribed examples of the motion $\mathbf{O}$. Due to musculoskeletal constraints, pain, or other conditions, the patient may not be able to correctly perform the motion at the beginning of the therapy program, which may, or may not, improve as the therapy program progresses.
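The notation above can be mirrored directly in code. The following is a minimal sketch, assuming each motion example is stored as a NumPy array of shape (T, D); the sequence lengths, the value D = 3, and the random values are placeholders chosen for illustration and are not data from the study.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 3  # dimensionality of one measurement (placeholder value)

# Prescribed set O = {O_m}, m = 1..M; each O_m has its own length T_m.
O = [rng.normal(size=(T_m, D)) for T_m in (98, 102, 100)]
# Patient set R = {R_n}, n = 1..N; each R_n has its own length T_n.
R = [rng.normal(size=(T_n, D)) for T_n in (95, 110)]

M, N = len(O), len(R)
# Measurement o_1^(5): example m = 1, time step k = 5 (0-based indices 0 and 4).
o_1_5 = O[0][4]
# The sequence lengths T_m, which are similar but not identical.
lengths_O = [O_m.shape[0] for O_m in O]
```

Storing the examples in a list rather than one stacked array keeps the differing lengths T_m and T_n explicit, which matters for the metrics that require equal-length sequences.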
Metrics
The reviewed metrics for performance evaluation are classified in this work into two main categories: quantitative and qualitative metrics. Accordingly, quantitative metrics assign a numerical score for the consistency of the patient performance, whereas qualitative metrics assign either a non-numerical evaluation (e.g., correct versus incorrect performance) or a discrete numerical score from a finite and limited range of values or states.
Quantitative metrics
Quantitative metrics can also be referred to as low-level metrics, since they evaluate the consistency of each measurement with regards to the prescribed sequence of measurements, or with respect to a model of the motion in the form of a probability distribution. The quantitative metrics are further classified into model-less and model-based metrics.
Model-less metrics
The model-less metrics compare the motions captured during a physical therapy exercise by a patient with the motions captured when the PT prescribes the therapy exercise. These metrics compare the measured raw trajectories of the body parts as acquired by the sensory system.
The following metrics are classified in this group:
a) Root-mean-square (RMS) distance: obtained as a sum of differences between the points of a captured trajectory $\mathbf{R}_n$ and a set of prescribed trajectories $\mathbf{O} = \{\mathbf{O}_m\}_{m=1}^{M}$:
$$L_1(\mathbf{R}_n, \mathbf{O}) = \frac{1}{M} \sum_{m=1}^{M} \sum_{k=1}^{T_m} \left\| \mathbf{r}_n^{(k)} - \mathbf{o}_m^{(k)} \right\| = \frac{1}{M} \sum_{m=1}^{M} \sum_{k=1}^{T_m} \sqrt{\left( r_n^{(k,1)} - o_m^{(k,1)} \right)^2 + \cdots + \left( r_n^{(k,D)} - o_m^{(k,D)} \right)^2} \tag{1}$$
One constraint of the RMS distance is the requirement that the trajectories have the same length, i.e., the same number of observations $T_m$. Therefore, the observed trajectories need to be scaled to the same length before the RMS distance is calculated. When the trajectories are linearly scaled to the same length, any large spatial differences at corresponding time steps, for instance because the same movement is executed at a different pace, will result in a large RMS distance between the trajectories. This limitation is typically mitigated by employing approaches for temporal alignment of the trajectories, such as dynamic time warping (DTW) [19].
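To make the temporal alignment step concrete, the sketch below implements the textbook dynamic-programming form of DTW with a Euclidean local cost. It is an illustration only, not the implementation used in the article; the function names `dtw_path` and `warp_to_reference`, and the choice to average all samples matched to the same reference step, are assumptions made here.

```python
import numpy as np

def dtw_path(a, b):
    """DTW between sequences a (Ta, D) and b (Tb, D).

    Returns the accumulated cost and the optimal warping path as a list of
    (i, j) index pairs, using the Euclidean distance as local cost.
    """
    Ta, Tb = len(a), len(b)
    cost = np.full((Ta + 1, Tb + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, Ta + 1):
        for j in range(1, Tb + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    # Backtrack from the last cell to recover the warping path.
    i, j, path = Ta, Tb, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return cost[Ta, Tb], path[::-1]

def warp_to_reference(a, ref):
    """Resample a onto the time axis of ref by averaging, for each reference
    step j, the samples of a that the DTW path matches to j."""
    _, path = dtw_path(a, ref)
    out = np.zeros_like(ref, dtype=float)
    counts = np.zeros(len(ref))
    for i, j in path:
        out[j] += a[i]
        counts[j] += 1
    return out / counts[:, None]
```

After warping, the patient trajectory has the same number of samples as the reference, so a point-wise distance can be computed without the pace of execution dominating the result. The quadratic cost in sequence length is acceptable for short exercise recordings but may need a band constraint for long ones.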
Another metric, which can be derived from the RMS distance for a single motion example $\mathbf{R}_n$, is the mean of the RMS distances for all motion sequences in the set $\mathbf{R} = \{\mathbf{R}_n\}_{n=1}^{N}$, i.e.,
$$L_{1,\mathrm{mean}}(\mathbf{R}, \mathbf{O}) = \frac{1}{N} \sum_{n=1}^{N} L_1(\mathbf{R}_n, \mathbf{O}) \tag{2}$$
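Equations (1) and (2) can be sketched in a few lines of NumPy. The example below assumes linear rescaling to a common length (a DTW-based alignment could be substituted for the `resample` step); the function names are hypothetical and chosen for this illustration.

```python
import numpy as np

def resample(traj, length):
    """Linearly rescale a (T, D) trajectory to `length` samples."""
    t_old = np.linspace(0.0, 1.0, len(traj))
    t_new = np.linspace(0.0, 1.0, length)
    return np.column_stack(
        [np.interp(t_new, t_old, traj[:, d]) for d in range(traj.shape[1])]
    )

def l1_distance(R_n, O):
    """Eq. (1): sum over time of point-wise Euclidean distances between R_n
    and each prescribed example O_m, averaged over the M examples."""
    total = 0.0
    for O_m in O:
        R_scaled = resample(R_n, len(O_m))
        total += np.linalg.norm(R_scaled - O_m, axis=1).sum()
    return total / len(O)

def l1_mean(R, O):
    """Eq. (2): mean of Eq. (1) over the N patient examples."""
    return float(np.mean([l1_distance(R_n, O) for R_n in R]))
```

For instance, a patient trajectory held constant at (3, 4) compared with a single five-sample reference at the origin gives a point-wise distance of 5 at every step and therefore a value of 25 for Eq. (1).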
b) Norm of jerk: the term jerk refers to the time derivative of the acceleration, i.e., the third derivative of the position. The metric calculates the norm of the jerk for each trajectory point as:
$$L_2(\mathbf{R}_n) = \frac{1}{T_n} \sum_{k=1}^{T_n} \left\| \frac{d^3 \mathbf{r}_n^{(k)}}{dt^3} \right\| = \frac{1}{T_n} \sum_{k=1}^{T_n} \sqrt{\left( \frac{d^3 r_n^{(k,1)}}{dt^3} \right)^2 + \cdots + \left( \frac{d^3 r_n^{(k,D)}}{dt^3} \right)^2} \tag{3}$$
This metric quantifies the level of smoothness of the movement [20], and a high value of jerk can be indicative of shaky patient movements during the physical exercises. In certain rehabilitation exercises and conditions, it is expected that the patients will produce a high level of jerk at the beginning of the treatment, which will gradually reduce as the recovery improves. Although this metric evaluates only one aspect of the movements, when combined with other metrics it can provide valuable information regarding the level of progress toward functional recovery.
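For sampled trajectories the third derivative has to be approximated numerically. The sketch below uses a third-order finite difference, which is one possible choice and an assumption of this example; the sampling interval `dt` is a parameter that depends on the sensor. Because differencing shortens the sequence by three samples, the average is taken over T - 3 points rather than T.

```python
import numpy as np

def norm_of_jerk(R_n, dt):
    """Mean Euclidean norm of the third time derivative of a (T, D)
    trajectory sampled every dt seconds (requires T >= 4)."""
    jerk = np.diff(R_n, n=3, axis=0) / dt**3  # third-order finite difference
    return float(np.linalg.norm(jerk, axis=1).mean())
```

As a sanity check, a trajectory following x(t) = t^3 has a constant third derivative of 6, while any quadratic trajectory (constant acceleration) has zero jerk. On real sensor data the third difference strongly amplifies measurement noise, so some smoothing before differentiation is usually advisable.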
Model-based metrics
These metrics rely on a model of the prescribed motions and/or a model of the patient motions. Common methods used for modeling human motions include probabilistic approaches, such as Gaussian mixture models [21] and hidden Markov models [22]. These approaches model the sequences through a set of latent states that describe a statistical distribution of the motion dynamics. Another common approach for modeling human movements is to employ a set of deterministic latent states connected by weights, such as artificial neural networks [21].
The metrics in this category include:
a) Log-likelihood: expresses the probability P that a motion example performed by the patient is drawn from a model of the motions as prescribed by the PT. For a model described with a set of parameters λ, the log-likelihood of a motion example $\mathbf{R}_n$ is calculated as the natural logarithm of the likelihood of all data points given the model parameters λ [21], that is,
$$L_3(\mathbf{R}_n) = \frac{1}{T_n} \log P(\mathbf{R}_n \mid \lambda) = \frac{1}{T_n} \log \prod_{k=1}^{T_n} P\left(\mathbf{r}_n^{(k)} \mid \lambda\right) = \frac{1}{T_n} \sum_{k=1}^{T_n} \log P\left(\mathbf{r}_n^{(k)} \mid \lambda\right) \tag{4}$$
Similar to (2), the mean of the log-likelihood for all sequences in the set $\mathbf{R} = \{\mathbf{R}_n\}_{n=1}^{N}$ can be employed as a measure of consistency of the repetitions of a single motion in reference to a model λ of the prescribed set $\mathbf{O}$:
$$L_{3,\mathrm{mean}}(\mathbf{R}, \mathbf{O}) = \frac{1}{N} \sum_{n=1}^{N} L_3(\mathbf{R}_n) \tag{5}$$
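As an illustration of Eqs. (4) and (5), the sketch below evaluates the per-point log-likelihood under a Gaussian mixture with diagonal covariances. The diagonal-covariance restriction, the parameter layout `(weights, means, variances)`, and the function names are assumptions made for this example; the model parameters λ would come from whatever fitting procedure is applied to the prescribed motions.

```python
import numpy as np

def gmm_log_prob(x, weights, means, variances):
    """log P(x | lambda) for each row of x under a diagonal-covariance
    Gaussian mixture. x: (T, D); weights: (K,); means, variances: (K, D)."""
    diff = x[:, None, :] - means[None, :, :]                    # (T, K, D)
    log_comp = -0.5 * (
        np.log(2 * np.pi * variances)[None] + diff**2 / variances[None]
    ).sum(axis=2)                                               # (T, K)
    log_weighted = np.log(weights)[None, :] + log_comp
    top = log_weighted.max(axis=1, keepdims=True)               # log-sum-exp
    return (top + np.log(np.exp(log_weighted - top).sum(axis=1, keepdims=True))).ravel()

def l3(R_n, params):
    """Eq. (4): mean per-point log-likelihood of one motion example."""
    return float(gmm_log_prob(R_n, *params).mean())

def l3_mean(R, params):
    """Eq. (5): average of Eq. (4) over the N patient examples."""
    return float(np.mean([l3(R_n, params) for R_n in R]))
```

Dividing by the sequence length, as in Eq. (4), makes scores comparable between repetitions of different duration. Note that this per-point factorization treats the measurements as independent given the model, which holds for a mixture model but not for a hidden Markov model, where the sequence likelihood is computed with the forward algorithm instead.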
b) Kullback-Leibler (KL) divergence: a measure of the difference between two probability distributions [23]. One of the distributions is considered to represent the true theoretical distribution of the data, which in this case is the empirical distribution of the movements prescribed by the PT, i.e., P(O). The other distribution represents an approximation of the true distribution, which in this case is the distribution of the movements executed by the patient, i.e., P(R). The KL divergence between P(O) and P(R) is defined as:
$$L_4(\mathbf{O}, \mathbf{R}) = \sum P(\mathbf{O}) \log \frac{P(\mathbf{O})}{P(\mathbf{R})} \tag{6}$$
If the probability distributions of the motions are modelled with a
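parameter set λ, the KL divergence can be found by calculating the mean probability of the data points in the motion sequences as
$$L_{4,\mathrm{mean}}(\mathbf{O}, \mathbf{R}) = \frac{1}{M T_m} \sum_{m=1}^{M} \sum_{k=1}^{T_m} P\left(\mathbf{o}_m^{(k)} \mid \lambda\right) \left[ \frac{1}{M T_m} \sum_{m=1}^{M} \sum_{k=1}^{T_m} \log P\left(\mathbf{o}_m^{(k)} \mid \lambda\right) - \frac{1}{N T_n} \sum_{n=1}^{N} \sum_{k=1}^{T_n} \log P\left(\mathbf{r}_n^{(k)} \mid \lambda\right) \right] \tag{7}$$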
This metric is also known as relative entropy, and is a measure of the information lost when the probability distribution P(R) is used to approximate the probability distribution P(O).
Alternative metrics to the KL divergence that have been used to quantify the difference between two probability distributions, and that can also be considered for evaluation of human motion consistency, are the Hellinger distance and the Bhattacharyya distance.
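A simple way to see the behaviour of the KL divergence is the closed-form expression for two Gaussians. The sketch below fits a single diagonal Gaussian to the pooled measurements of each set and evaluates KL(P || Q) analytically. Using one Gaussian per set is a deliberate simplification assumed for this example; it stands in for the richer mixture models discussed in the text, and the function names are hypothetical.

```python
import numpy as np

def fit_diag_gaussian(seqs):
    """Pool all measurements from a list of (T, D) arrays and return the
    per-dimension mean and variance."""
    pts = np.vstack(seqs)
    return pts.mean(axis=0), pts.var(axis=0)

def kl_diag_gaussians(mu_p, var_p, mu_q, var_q):
    """KL(P || Q) in nats for diagonal Gaussians P and Q."""
    return float(
        0.5 * np.sum(np.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0)
    )
```

With P fitted to the prescribed motions and Q to the patient motions, a value of zero means the two fitted distributions coincide, and larger values mean a greater mismatch. The divergence is not symmetric: swapping the roles of P and Q generally changes the value, which is why the text fixes the prescribed motions as the reference distribution.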
c) Heuristic consistency: a simple qualitative measure based on the proportion of patient movements that are contained within the extremums of the demonstrated movements $\mathbf{O}$. The measure is defined as:
$$L_5(\mathbf{R}, \mathbf{O}) = 1 - \frac{1}{N T_n} \sum_{n=1}^{N} \sum_{k=1}^{T_n} \mathbb{1}_{\left\{ \min\left(\mathbf{o}^{(k)}\right),\, \max\left(\mathbf{o}^{(k)}\right) \right\}} \left( \mathbf{r}_n^{(k)} \right) \tag{8}$$
The indicator function $\mathbb{1}_{\{\min(\mathbf{o}^{(k)}),\, \max(\mathbf{o}^{(k)})\}}(\mathbf{r}_n^{(k)})$ evaluates to 1 if the captured trajectory data at time step k, $\mathbf{r}_n^{(k)}$, lies between $\min(\mathbf{o}^{(k)})$ and $\max(\mathbf{o}^{(k)})$, and otherwise the indicator function evaluates to 0. Higher values of the measure indicate increased consistency between the patient-performed and the prescribed movement examples. This metric may require a larger number of movement examples.
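The envelope test behind this metric can be sketched as follows. The example computes the fraction of patient samples that fall inside the per-time-step minimum/maximum envelope of the demonstrations; one minus this fraction gives the fraction falling outside. It assumes all sequences have already been aligned to a common shape (T, D), and it counts a sample as inside only when every coordinate is within the envelope, which is an interpretation adopted for this illustration.

```python
import numpy as np

def fraction_inside_envelope(R, O):
    """Fraction of patient samples inside the per-time-step [min, max]
    envelope of the demonstrations.

    R and O are lists of arrays that all share the same shape (T, D).
    """
    O_stack = np.stack(O)                                        # (M, T, D)
    lo, hi = O_stack.min(axis=0), O_stack.max(axis=0)            # (T, D) each
    R_stack = np.stack(R)                                        # (N, T, D)
    inside = np.all((R_stack >= lo) & (R_stack <= hi), axis=2)   # (N, T)
    return float(inside.mean())
```

With only a few demonstrations the envelope is narrow and even good repetitions will often fall outside it, which is one reason the metric may need a larger number of movement examples.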
d) Prediction intervals: can be used to determine if the estimated means of the patient movements are consistent with the fitted model of the PT’s demonstrated movements. For this purpose, 95% confidence intervals from the relative likelihood are constructed, determined by the bounds
$$\ln P\left(\hat{\mathbf{o}}^{(k)}\right) - \ln P\left(\mathbf{o}^{(k)}\right) \le 1.92 \tag{9}$$
Next, the proportion of estimated means from the captured patient trajectory that falls outside the confidence interval is calculated, and averaged over all captured trajectories to obtain the metric $L_6(\mathbf{R}, \mathbf{O})$. If the captured trajectories are consistent with the demonstrated movements, then $L_6(\mathbf{R}, \mathbf{O})$ should have a value of approximately 5%.
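The constant 1.92 is consistent with the standard likelihood-ratio construction of a 95% interval for a single parameter. The short derivation below is offered as background under that assumption; the symbols $\theta$, $\hat{\theta}$ and $L(\cdot)$ are generic and are not taken from the article.

```latex
% Likelihood-ratio statistic and its asymptotic reference distribution (Wilks' theorem)
2\left[\ln L(\hat{\theta}) - \ln L(\theta)\right] \;\sim\; \chi^2_1
% 95% quantile of the chi-square distribution with 1 degree of freedom
\chi^2_{1,\,0.95} \approx 3.84
% Halving gives the largest log-likelihood drop still inside the 95% interval
\ln L(\hat{\theta}) - \ln L(\theta) \;\le\; \tfrac{1}{2}\,\chi^2_{1,\,0.95} \approx 1.92
```

Under this reading, a parameter value belongs to the 95% interval when its log-likelihood is within 1.92 of the maximum, so for consistent data roughly 5% of the estimated means would be expected to fall outside.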
Qualitative metrics
Qualitative metrics can be referred to as high level metrics because
they evaluate each patient’s performed motion example as an individual
repetition with respect to the prescribed motion examples, as opposed
to evaluating the individual sequential measurements at the trajectory
level.
The following metrics have been used for qualitative assessment of therapy exercises in previous works in the literature:
a) Number of optimal attempts: used in the work of Komatireddy et al. [12] to assess patient performance. As stated before, it is not clear what type of approach the authors applied in labeling the motions as either optimal or suboptimal.
On the other hand, it is possible to use any of the quantitative approaches listed above to calculate a numerical score for each repetition of a motion, and then to label it as optimal if the score is greater than a predefined threshold value.
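The thresholding idea described above amounts to a one-line rule. The sketch below is hypothetical: the function names, the label strings, and the strict "greater than" comparison follow the wording of the paragraph and are not taken from Komatireddy et al.

```python
def label_repetitions(scores, threshold):
    """Label each repetition 'optimal' if its score exceeds the threshold,
    and 'suboptimal' otherwise."""
    return ["optimal" if s > threshold else "suboptimal" for s in scores]

def count_optimal(scores, threshold):
    """Number of repetitions whose score exceeds the threshold."""
    return sum(s > threshold for s in scores)
```

The direction of the comparison depends on the chosen score. It suits similarity-type scores such as the log-likelihood, where larger is better; for distance-type scores such as the RMS distance or the KL divergence, the comparison would be reversed so that small values are labeled optimal.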
b) Fugl-Meyer Assessment (FMA): introduces a series of standardized exercises intended to evaluate the development of motor functions and balance in patients recovering from stroke [15]. The FMA test encompasses five principal domains for assessment: motor function, sensory function, balance, joint range of motion, and joint pain. Each domain involves several assessment steps related to the performance of respective movements. The movements are evaluated by a PT on a scale with 3 grades, with 0 as the minimum and 2 as the maximum grade. The assessment produces a cumulative numerical score representing the progression toward functional recovery of the stroke patient.
This assessment method can be employed in the development of metrics for automated performance evaluation, by either drawing insights from the PT evaluator’s way of scoring the movements, or by training a machine learning algorithm to score in a similar manner by using the PT’s scores as inputs.
In addition, the FMA test has been reported to be complex and time
consuming [16]. Consequently, an automated version of the test based
on machine learning methodology could be a valuable contribution to
the domain of physical rehabilitation. Another potential advantage of
automated assessment is the provision of more precise evaluation than
the three grades scale.
Several faster alternative tests to the FMA have been introduced, including the Motor Assessment Scale [16] and the motricity index [17]. These tests have been frequently used in practice, and can also be exploited in the development of an automated performance metric.
c) Wolf Motor Function Test (WMFT): a timed test of functional tasks used to assess the ability of upper motor movements [18]. The test relies on using a number of objects as props, such as a chair, a table, and weights. The required motions are performed by using the props. The tasks are timed, with each motion given a maximum time of 2 minutes. The performance of each task is scored on a scale from 0 to 5. Summary scores are calculated based on the medians of the timings of the motions, and on the means of the ratings for the functional abilities.
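The two WMFT summary scores described above can be sketched with the standard library. The function name is hypothetical, and capping each timing at the 2-minute limit before taking the median is an assumption made for this example; the official scoring manual should be consulted for the exact procedure.

```python
from statistics import mean, median

def wmft_summary(times_s, ratings, max_time_s=120.0):
    """Return (median task time in seconds, mean functional-ability rating).

    times_s: per-task completion times in seconds.
    ratings: per-task functional-ability ratings on the 0 to 5 scale.
    Times longer than max_time_s are capped at the limit (assumed here).
    """
    capped = [min(t, max_time_s) for t in times_s]
    return median(capped), mean(ratings)
```

Using the median for the timings keeps a single very slow or uncompleted task from dominating the summary, which is one plausible reason for preferring it over the mean.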
Similar to the observation regarding the FMA test, WMFT is
also suitable for automation and can provide understanding into the
development of automated performance metrics.
Evaluation
Dataset
The proposed metrics were evaluated on the publicly available human motion dataset UTD-MHAD (University of Texas at Dallas – Multimodal Human Action Dataset) [24]. The dataset includes 27 actions, each performed by 8 subjects 4 times. A Kinect sensor and a wearable inertial sensor were used for collecting the data.
The following 5 actions were employed here for evaluation purposes: two hands front clap, right arm throw, draw circle clockwise, draw triangle, and tennis serve. Sample images for the actions are presented in Figure 1.
Evaluation Results
The following metrics were evaluated for the five actions: root-mean-square distance, log-likelihood, KL divergence, heuristic consistency, and prediction intervals. The results are presented in Table 1.
The data for the five actions was divided into 2 sets: a training
set consisting of 21 sequences for each action, and a testing set consisting of 7 sequences of each action. Both the training and the testing set correspond to actions performed by the same group of subjects. Ideally, the motions would correspond to therapy exercises, and the testing set would include suboptimal examples of the motions. As part of future work, we plan to create a dedicated dataset of motions performed in physical therapy.
The root-mean-square distance was calculated for the recorded trajectories. The motion capture feature of Kinect provides skeletal data, where the human skeleton (shown in Figure 1) consists of 20 joints. The temporal measurements for each joint are spatial 3-dimensional coordinates. Hence, the data comprises 60-dimensional sequences. The recorded motion sequences were scaled to the same number of measurements by using the DTW algorithm. The results in Table 1 present the mean values of the root-mean-square distance for the 7 motion sequences in the testing dataset.
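The step from 20 joints with 3 coordinates each to 60-dimensional measurement vectors is a simple reshape. The sketch below assumes the skeletal data is held as an array of shape (T, 20, 3), with time first; this layout and the function name are assumptions for illustration and may differ from how the dataset files are actually organized.

```python
import numpy as np

def flatten_skeleton(frames):
    """Reshape (T, joints, coords) joint coordinates into (T, joints * coords)
    measurement vectors, e.g. (T, 20, 3) into (T, 60)."""
    T, joints, coords = frames.shape
    return frames.reshape(T, joints * coords)
```

Each row of the result is one measurement in the sense of the Notation section, with the x, y, z coordinates of joint 1 followed by those of joint 2, and so on.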
The log-likelihood of the testing data was calculated for several different mathematical models of the training data. The dimensionality of the raw observation data was first reduced from 60 to 3 dimensions, by employing an autoencoder neural network [25]. Afterwards, the 3-dimensional sequences were modeled using a mixture density network [14], a Gaussian mixture model fitted by expectation maximization, and a hidden Markov model [26]. The mean log-likelihood of the testing dataset is shown in Table 1.
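To illustrate the dimensionality reduction step without the machinery of neural network training, the sketch below uses principal component analysis (PCA) computed with a singular value decomposition. This is explicitly a stand-in: PCA is a linear projection, whereas the article employs an autoencoder neural network, so the sketch shows only the shape of the pipeline (60-dimensional measurements in, 3-dimensional codes out). The function names are hypothetical.

```python
import numpy as np

def fit_linear_encoder(X, n_components=3):
    """PCA via SVD on (n_samples, n_features) data.

    Returns the feature mean and an (n_features, n_components) projection
    matrix whose columns are the leading principal directions.
    """
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components].T

def encode(X, mean, W):
    """Project measurements onto the low-dimensional code."""
    return (X - mean) @ W

def decode(Z, mean, W):
    """Map codes back to the original measurement space."""
    return Z @ W.T + mean
```

In a pipeline of this kind, the encoder would be fitted on the pooled training measurements, and the resulting 3-dimensional sequences would then be passed to the probabilistic model used for the log-likelihood or KL divergence.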
The mean KL divergence of the testing data is also presented in Table 1. Similar to the log-likelihood metric, an autoencoder is employed to reduce the dimensionality of the observed data, and a mixture density network is afterwards used to model the data.
The last two rows in the table present the heuristic consistency and prediction intervals metrics.
Conclusion
e article presents a survey on the current literature on the metrics
for evaluation of patient performance in physical therapy. e metrics
Figure 1: Sample images and skeletal representations for the selected actions in the UTD-MHAD dataset: (a) Two hands front clap; (b) Right arm throw; (c) Draw circle
clockwise; (d) Draw triangle; and (e) Tennis serve.
| Metric | Action 4 – Two Hand Front Clap | Action 5 – Right Arm Throw | Action 9 – Draw Circle Clockwise | Action 11 – Draw Triangle | Action 17 – Tennis Serve |
|---|---|---|---|---|---|
| RMS | 5.616 (0.139) | 6.508 (0.172) | 6.580 (0.144) | 6.098 (0.230) | 8.195 (0.208) |
| Log-likelihood: Autoencoder + Mixture Density Network (GMM) | 1.707 (0.569) | 0.968 (0.834) | 0.488 (0.897) | 0.788 (0.467) | 0.823 (0.626) |
| Log-likelihood: Autoencoder + Expectation Maximization (GMM) | 1.808 (3.526), 8 states | 0.557 (3.481), 8 states | 0.429 (5.189), 10 states | 0.199 (10.567), 7 states | 0.575 (2.999), 7 states |
| Log-likelihood: Autoencoder + Hidden Markov Model | -0.954 (0.066), 23 states | -0.731 (0.041), 28 states | -0.905 (0.017), 26 states | -1.459 (3.025), 30 states | -0.730 (0.015), 25 states |
| KL Divergence: Autoencoder + Mixture Density Network (GMM), train vs. test data | 1.232 | 0.685 | 2.617 | 1.626 | 0.709 |
| Heuristic Consistency | 0.1002 | 0.0972 | 0.0728 | 0.1053 | 0.0724 |
| Prediction Intervals | 0.0759 | 0.0966 | 0.0788 | 0.0837 | 0.0559 |

Table 1: Performance metrics for 5 action movements from the UTD-MHAD Dataset.
Citation: Vakanski A, Ferguson JM, Lee S (2017) Metrics for Performance Evaluation of Patient Exercises during Physical Therapy. Int J Phys Med
Rehabil 5: 403. doi: 10.4172/2329-9096.1000403
are classied into quantitative and qualitative metrics. e quantitative
metrics assign a numerical score for the patient performance, and are
categorized into model-less and model-based metrics, based on whether
a mathematical model of the motions is employed for performance
evaluation.
The existing practice in physical therapy predominantly relies on
assessment by a physical therapist. Studies on automated
assessment of therapy motions are scarce in the published literature,
and consequently little attention has been paid to the development and
definition of metrics for performance evaluation. This article reviews
some of the metrics reported in the literature. In addition, the article
reviews metrics that have been used for evaluation of human motions in
other fields. Examples are root-mean-square distance and norm of jerk,
which have been used in the domain of robotic learning from human
demonstrations. Other metrics, such as Kullback-Leibler divergence
and heuristic consistency, have been used more generally for comparison
of probability distributions.
The metrics presented in this article can also be used for evaluation of
human motions in other application domains, or for assessment of
sequential data in other fields, if applicable.
Acknowledgement
This work was supported by the Center for Modeling Complex Interactions
through NIH Award #P20GM104420 with additional support from the University
of Idaho.
References
1. Jack K, McLean SM, Moffett JK, Gardiner E (2010) Barriers to treatment
adherence in physiotherapy outpatient clinics: a systematic review. Manual
Therapy 15: 220–228.
2. De Fauw J, Keane P, Tomasev N, Visentin D, van der Driessche G, et al. (2016)
Automated analysis of retinal imaging using machine learning techniques for
computer vision. F1000 Res 5: 1573.
3. Chu C, De Fauw J, Tomasev N, Paredes BR, Hughes C, et al. (2016) Applying
machine learning to automated segmentation of head and neck tumour
volumes and organs at risk on radiotherapy planning CT and MRI scans. F1000
Res 5: 2104.
4. Jafari MH, Nasr-Esfahani E, Karimi N, Reza Soroushmehr SM, Samavi S, et
al. (2016) Extraction of skin lesions from non-dermoscopic images using deep
learning. arXiv: 1609.
5. Jermyn M, Desroches J, Mercier J, Tremblay MA, St-Arnaud K, et al. (2016)
Neural networks improve brain cancer detection with Raman spectroscopy in
the presence of operating room light artifacts. J Biomed Opt 21: 094002.
6. Mamoshina P, Vieira A, Putin E, Zhavoronkov A (2016) Applications of deep
learning in biomedicine. Mol Pharmac 13: 1445-1454.
7. Ditzler G, Polikar R, Rosen G (2015) Multi Layer and recursive neural networks
for metagenomic classification. IEEE Transactions on NanoBioscience 14: 608-616.
8. Hughes TB, Miller GP, Swamidass SJ (2015) Modeling epoxidation of drug-
like molecules with a deep machine learning network. ACS Central Science
1: 168-180.
9. Shademan A, Decker RS, Opfermann JD, Leonard S, Krieger A, et al. (2016)
Supervised autonomous robotic soft tissue surgery. Science Translational
Medicine 8: 337-364.
10. AiCure – Advanced Medication Adherence.
11. Anton D, Goni A, Illarramendi A, Torres-Unda JJ, Seco J (2013) KiReS: A Kinect
based telerehabilitation system. Int. Conf. on e-Health Networking, Applications
and Services: 456-460.
12. Komatireddy R, Chokshi A, Basnett J, Casale M, Goble D, et al. (2016) Quality
and quantity of rehabilitation exercises delivered by a 3-D motion controlled
camera: a pilot study. Int J Phys Med Rehab 2: 1–14.
13. Hondori HM, Khademi M (2014) A review on technical and clinical impact of
Microsoft Kinect on physical therapy and rehabilitation. J Med Eng: 1–16.
14. Vakanski A, Ferguson JM, Lee S (2017) Mathematical modeling and evaluation
of human motions in physical therapy using mixture density neural networks. J
Physiother Phys Rehabil 1: 1-10.
15. Fugl-Meyer AR, Jääskö L, Leyman I, Olsson S, Steglind S (1975) The post-
stroke hemiplegic patient. 1. a method for evaluation of physical performance.
Scandinavian J Rehab Med 7: 13–31.
16. Carr JH, Shepherd RB, Nordholm L, Lynne D (1985) Investigation of a new
motor assessment scale for stroke patients. Phys Ther 65: 175-180.
17. Poole JL, Whitney SL (1988) Motor assessment scale for stroke patients:
concurrent validity and interrater reliability. Arch Phys Med Rehabil 69: 195–197.
18. Wolf SL, Lecraw DE, Barton LA, Jann BB (1989) Forced use of hemiplegic
upper extremities to reverse the effect of learned nonuse among chronic stroke
and head-injured patients. Exp Neurol 104: 125-132.
19. Sakoe H, Chiba S (1978) Dynamic programming algorithm optimization for
spoken word recognition. IEEE Transactions on Acoustics, Speech, and Signal
Processing ASSP-26: 43-49.
20. Calinon S, D’halluin F, Sauser EL, Caldwell DG, Billard AG (2010) Learning and
reproduction of gestures by imitation: an approach based on hidden Markov
model and Gaussian mixture regression. IEEE Robotics and Automation
Magazine 17: 44-54.
21. Bishop CM (2006) Pattern Recognition and Machine Learning. New York, USA:
Springer.
22. Rabiner L (1989) A tutorial on hidden Markov models and selected applications
in speech recognition. Proc of the IEEE 77: 257-286.
23. Kullback S, Leibler RA (1951) On information and sufficiency. Annals Math Stat
22: 79-86.
24. University of Texas at Dallas – Multimodal Human Action Dataset.
25. Bourlard H, Kamp Y (1988) Auto-association by multilayer perceptrons and
singular value decomposition. Biological Cybernetics 59: 291-294.
26. Vakanski A, Mantegh I, Irish A, Janabi-Sharifi F (2012) Trajectory learning
for robot programming by demonstration using Hidden Markov Model and
Dynamic Time Warping. IEEE Transactions on Systems, Man, and Cybernetics
44: 1039-1052.
... While in the initial research emphasis was mainly on aerobic exercises [20,21], more recently attention has also begun to focus on strength exercises. To independently monitor and evaluate repetitive human movements through this type of exercise, a proper form is to classify them into two main categories: quantitative and qualitative [22]. Quantitative evaluation will provide an overview of how many repetitions are done, and qualitative evaluation will show whether repetition is being performed correctly. ...
... In the development of system feedback, where a workout consists primarily of strength exercises, it is possible to take advantage of the fact that human movements are repetitive. The form of feedback in this case should contain two important parameters, quantitative and qualitative, i.e. the number of performed repetitions of a particular exercise and the quality of the performed repetitions [22,31]. For successful counting and assessment of repetition quality, repetition must first be detected and isolated (segmented) and then identified (classified) to which exercise it belongs [30]. ...
Article
Full-text available
Monitoring a person’s physical activity has a wide range of applications in both sports and medicine. With the advancement of technology for measuring human movement, it is possible to monitor the performed activity without a need for an expert to directly overlook the trainee. While the initial interest focused mainly on aerobic exercises, research has recently begun to focus on strength exercises. The goal is to achieve the highest possible accuracy in tracking movement while maintaining the low cost and energy autonomy of the monitoring device. In this paper, an algorithm for the segmentation and classification of repetitive movements during workouts based on 3-axis accelerometer data from a wearable device is presented. The accelerometer signals were recorded continuously during the workout session which consisted typically of 9 strength exercises, where 8 default movements were repeated in three sets. Segmentation of the acceleration signals recorded during the workout was done using the frequency spectrum of the acceleration magnitude with an accuracy of 99.4%, while the classification of the segmented movements was done using the Dynamic Time Warping (DTW) algorithm with an accuracy of 85.7%.
... Studies in the literature concerned with automated evaluation of therapy motions are scarce [23][24][25][26], and not much attention has been paid to the development of metrics for performance evaluation [27]. As a common scheme, a reference model is first captured as the ground truth. ...
... Then, a user's performance can be compared with the reference using machine learning approaches. A comprehensive taxonomy of the metrics for evaluation of patient performance in physical therapy was proposed by Vakanski et al [27]. The metrics are classified into quantitative and qualitative categories. ...
Article
Full-text available
Background Performing physiotherapy exercises in front of a physiotherapist yields qualitative assessment notes and immediate feedback. However, practicing the exercises at home lacks feedback on how well patients are performing the prescribed tasks. The absence of proper feedback might result in patients performing the exercises incorrectly, which could worsen their condition. We present an approach to generate performance scores to enable tracking the progress by both the patient at home and the physiotherapist in the clinic. Objective This study aims to propose the use of 2 machine learning algorithms, dynamic time warping (DTW) and hidden Markov model (HMM), to quantitatively assess the patient’s performance with respect to a reference. Methods Movement data were recorded using a motion sensor (Kinect V2), capable of detecting 25 joints in the human skeleton model, and were compared with those of a reference. A total of 16 participants were recruited to perform 4 different exercises: shoulder abduction, hip abduction, lunge, and sit-to-stand exercises. Their performance was compared with that of a physiotherapist as a reference. Results Both algorithms showed a similar trend in assessing participant performance. However, their sensitivity levels were different. Although DTW was more sensitive to small changes, HMM captured a general view of the performance, being less sensitive to the details. Conclusions The chosen algorithms demonstrated their capacity to objectively assess the performance of physical therapy. HMM may be more suitable in the early stages of a physiotherapy program to capture and report general performance, whereas DTW could be used later to focus on the details. The scores enable the patient to monitor their daily performance. They can also be reported back to the physiotherapist to track and assess patient progress, provide feedback, and adjust the exercise program if needed.
... Motion tracking systems included in videogame consoles and virtual or mixed reality devices have been used to record these data. The Kinect employs an optical system [14]- [16], while the Wii uses inertial sensors [16], [17]. Both devices are examples of console technologies. ...
Conference Paper
Full-text available
Shoulder injuries and conditions are common musculoskeletal complaints that can limit a patient's range of motion and daily activities. Recently, serious games and mixed reality technologies, such as the HoloLens, have been proposed for shoulder rehabilitation. However, it is unclear if this technology accurately tracks 3D hand movements for reporting therapy-related kinematic metrics. This paper presents accuracy and repeatability tests of the HoloLens 2 in tracking hand movements, and its potential for shoulder rehabilitation assessment. Comparisons were made between index fingertip, palm, and wrist movements captured by the HoloLens 2 and an Aurora electromagnetic system, which was used as the ground truth. A mixed-reality environment was developed to capture static hand positions, as well as dynamic hand movements performed during a shoulder physiotherapy-based exercise. The tracking data were used to calculate several kinematic metrics. The results show that the HoloLens 2 hand-tracking system is accurate to within a median of 10.2 mm and has repeatability comparable to the Aurora system, with the palm exhibiting the best results. The HoloLens 2 data are suitable for computing kinematic metrics for shoulder rehabilitation assessment, achieving accuracies above 86.9% for all of the tested metrics. Metrics such as time-to-speed peak and the log dimensionless jerk were found to have significant differences between dynamic hand movements. These findings support the mixed reality technology potential to assist shoulder rehabilitation through immersive and interactive environments.
... This article reviews the systems of measurements that have been used to evaluate rehabilitation exercises depending on the data captured from motion sensors. The computational methods for evaluating physical therapy exercises are classified into discrete movement score, rulebased and template-based methods [38], [39]. Using discrete movement score techniques, each repetition of rehabilitation activities is classified into specific categories, such as correct or incorrect. ...
... While many visual techniques have been used in recent decades, large differences in anatomy, human occlusion, and changes in perspectives often limit the capacity of the proposed models to correctly assess the performance of an exercise. Sensing technology (apart from video) has made significant progress during the last decade, especially with low-power devices, wireless communication, high computational capacity, and data processing [11]. Wearable sensors can be integrated in clothes, strips, mobile devices, and smartwatches [12]. ...
Article
jats:sec> Background Balance rehabilitation programs represent the most common treatments for balance disorders. Nonetheless, lack of resources and lack of highly expert physiotherapists are barriers for patients to undergo individualized rehabilitation sessions. Therefore, balance rehabilitation programs are often transferred to the home environment, with a considerable risk of the patient misperforming the exercises or failing to follow the program at all. Holobalance is a persuasive coaching system with the capacity to offer full-scale rehabilitation services at home. Holobalance involves several modules, from rehabilitation program management to augmented reality coach presentation. Objective The aim of this study was to design, implement, test, and evaluate a scoring model for the accurate assessment of balance rehabilitation exercises, based on data-driven techniques. Methods The data-driven scoring module is based on an extensive data set (approximately 1300 rehabilitation exercise sessions) collected during the Holobalance pilot study. It can be used as a training and testing data set for training machine learning (ML) models, which can infer the scoring components of all physical rehabilitation exercises. In that direction, for creating the data set, 2 independent experts monitored (in the clinic) 19 patients performing 1313 balance rehabilitation exercises and scored their performance based on a predefined scoring rubric. On the collected data, preprocessing, data cleansing, and normalization techniques were applied before deploying feature selection techniques. Finally, a wide set of ML algorithms, like random forests and neural networks, were used to identify the most suitable model for each scoring component. 
Results The results of the trained model improved the performance of the scoring module in terms of more accurate assessment of a performed exercise, when compared with a rule-based scoring model deployed at an early phase of the system (k-statistic value of 15.9% for sitting exercises, 20.8% for standing exercises, and 26.8% for walking exercises). Finally, the resulting performance of the model resembled the threshold of the interobserver variability, enabling trustworthy usage of the scoring module in the closed-loop chain of the Holobalance coaching system. Conclusions The proposed set of ML models can effectively score the balance rehabilitation exercises of the Holobalance system. The models had similar accuracy in terms of Cohen kappa analysis, with interobserver variability, enabling the scoring module to infer the score of an exercise based on the collected signals from sensing devices. More specifically, for sitting exercises, the scoring model had high classification accuracy, ranging from 0.86 to 0.90. Similarly, for standing exercises, the classification accuracy ranged from 0.85 to 0.92, while for walking exercises, it ranged from 0.81 to 0.90. Trial Registration ClinicalTrials.gov NCT04053829; https://clinicaltrials.gov/ct2/show/NCT04053829 </jats:sec
... While many visual techniques have been used in recent decades, large differences in anatomy, human occlusion, and changes in perspectives often limit the capacity of the proposed models to correctly assess the performance of an exercise. Sensing technology (apart from video) has made significant progress during the last decade, especially with low-power devices, wireless communication, high computational capacity, and data processing [11]. Wearable sensors can be integrated in clothes, strips, mobile devices, and smartwatches [12]. ...
Article
Full-text available
Background Balance rehabilitation programs represent the most common treatments for balance disorders. Nonetheless, lack of resources and lack of highly expert physiotherapists are barriers for patients to undergo individualized rehabilitation sessions. Therefore, balance rehabilitation programs are often transferred to the home environment, with a considerable risk of the patient misperforming the exercises or failing to follow the program at all. Holobalance is a persuasive coaching system with the capacity to offer full-scale rehabilitation services at home. Holobalance involves several modules, from rehabilitation program management to augmented reality coach presentation. Objective The aim of this study was to design, implement, test, and evaluate a scoring model for the accurate assessment of balance rehabilitation exercises, based on data-driven techniques. Methods The data-driven scoring module is based on an extensive data set (approximately 1300 rehabilitation exercise sessions) collected during the Holobalance pilot study. It can be used as a training and testing data set for training machine learning (ML) models, which can infer the scoring components of all physical rehabilitation exercises. In that direction, for creating the data set, 2 independent experts monitored (in the clinic) 19 patients performing 1313 balance rehabilitation exercises and scored their performance based on a predefined scoring rubric. On the collected data, preprocessing, data cleansing, and normalization techniques were applied before deploying feature selection techniques. Finally, a wide set of ML algorithms, like random forests and neural networks, were used to identify the most suitable model for each scoring component. 
Results The results of the trained model improved the performance of the scoring module in terms of more accurate assessment of a performed exercise, when compared with a rule-based scoring model deployed at an early phase of the system (k-statistic value of 15.9% for sitting exercises, 20.8% for standing exercises, and 26.8% for walking exercises). Finally, the resulting performance of the model resembled the threshold of the interobserver variability, enabling trustworthy usage of the scoring module in the closed-loop chain of the Holobalance coaching system. Conclusions The proposed set of ML models can effectively score the balance rehabilitation exercises of the Holobalance system. The models had similar accuracy in terms of Cohen kappa analysis, with interobserver variability, enabling the scoring module to infer the score of an exercise based on the collected signals from sensing devices. More specifically, for sitting exercises, the scoring model had high classification accuracy, ranging from 0.86 to 0.90. Similarly, for standing exercises, the classification accuracy ranged from 0.85 to 0.92, while for walking exercises, it ranged from 0.81 to 0.90. Trial Registration ClinicalTrials.gov NCT04053829; https://clinicaltrials.gov/ct2/show/NCT04053829
... Emerging wearable technologies designed for movement monitoring and tele-rehabilitation typically consist of inertial measurement units (IMUs), step activity monitors [14], electromyography (EMG) and electrical muscle stimulation (EMS) sensors [15], and can operate in conjunction with virtual reality systems [16] and mobile phone applications [15]. With regards to the features and algorithms that accompany such systems, authors have previously relied on metrics measuring the duration of each exercise session, the number of the correctly performed repetitions of an exercise [14,17], and the exercise performance quality (e.g., a measure of "distance" of the data recorded by the patient from a specific baseline expressed in terms of root-mean square distance, norm of jerk or log-likelihood [18]). ...
Article
Full-text available
Background The benefits to be obtained from home-based physical therapy programmes are dependent on the proper execution of physiotherapy exercises during unsupervised treatment. Wearable sensors and appropriate movement-related metrics may be used to determine at-home exercise performance and compliance to a physical therapy program. Methods A total of thirty healthy volunteers (mean age of 31 years) had their movements captured using wearable inertial measurement units (IMUs), after video recordings of five different exercises with varying levels of complexity were demonstrated to them. Participants were then given wearable sensors to enable a second unsupervised data capture at home. Movement performance between the participants’ recordings was assessed with metrics of movement smoothness, intensity, consistency and control. Results In general, subjects executed all exercises similarly when recording at home and as compared with their performance in the lab. However, participants executed all movements faster compared to the physiotherapist’s demonstrations, indicating the need of a wearable system with user feedback that will set the pace of movement. Conclusion In light of the Covid-19 pandemic and the imperative transition towards remote consultation and tele-rehabilitation, this work aims to promote new tools and methods for the assessment of adherence to home-based physical therapy programmes. The studied IMU-derived features have shown adequate sensitivity to evaluate home-based programmes in an unsupervised manner. Cost-effective wearables, such as the one presented in this study, can support therapeutic exercises that ought to be performed with appropriate speed, intensity, smoothness and range of motion.
Article
Sports physiotherapists and coaches are tasked with evaluating the movement quality of athletes across the spectrum of ability and experience. However, the accuracy of visual observation is low and existing technology outside of expensive lab-based solutions has limited adoption, leading to an unmet need for an efficient and accurate means to measure static and dynamic joint angles during movement, converted to movement metrics useable by practitioners. This paper proposes a set of pose landmarks for computing frequently used joint angles as metrics of interest to sports physiotherapists and coaches in assessing common strength-building human exercise movements. It then proposes a set of rules for computing these metrics for a range of common exercises (single and double drop jumps and counter-movement jumps, deadlifts and various squats) from anatomical key-points detected using video, and evaluates the accuracy of these using a published 3D human pose model trained with ground truth data derived from VICON motion capture of common rehabilitation exercises. Results show a set of mathematically defined metrics which are derived from the chosen pose landmarks, and which are sufficient to compute the metrics for each of the exercises under consideration. Comparison to ground truth data showed that root mean square angle errors were within 10° for all exercises for the following metrics: shin angle, knee varus/valgus and left/right flexion, hip flexion and pelvic tilt, trunk angle, spinal flexion lower/upper/mid and rib flare. Larger errors (though still all within 15°) were observed for shoulder flexion and ASIS asymmetry in some exercises, notably front squats and drop-jumps. In conclusion, the contribution of this paper is that a set of sufficient key-points and associated metrics for exercise assessment from 3D human pose have been uniquely defined. 
Further, we found generally very good accuracy of the Strided Transformer 3D pose model in predicting these metrics for the chosen set of exercises from a single mobile device camera, when trained on a suitable set of functional exercises recorded using a VICON motion capture system. Future assessment of generalization is needed.
Preprint
Full-text available
Background: The benefits to be obtained from home-based physical therapy programmes are dependent on the proper execution of physiotherapy exercises during unsupervised treatment. Wearable sensors and appropriate movement-related metrics may be used to determine at-home exercise performance and compliance to a physical therapy program. Methods: A total of thirty healthy volunteers (mean age of 31 years) had their movements captured using wearable inertial measurement units (IMUs), after video recordings of five different exercises with varying levels of complexity were demonstrated to them. Participants were then given wearable sensors to enable a second unsupervised data capture at home. Movement performance between the participants’ recordings was assessed with metrics of movement smoothness, intensity, consistency and control. Results: In general, subjects executed all exercises similarly when recording at home and as compared with their performance in the lab. However, participants executed all movements faster compared to the physiotherapist’s demonstrations, indicating the need of a wearable system with user feedback that will set the pace of movement. Conclusion: In light of the Covid-19 pandemic and the imperative transition towards remote consultation and tele-rehabilitation, this work aims to promote new tools and methods for the assessment of adherence to home-based physical therapy programmes. The studied IMU-derived features have shown adequate sensitivity to evaluate home-based programmes in an unsupervised manner. Cost-effective wearables, such as the one presented in this study, can support therapeutic exercises that ought to be performed with appropriate speed, intensity, smoothness and range of motion.
Article
Full-text available
Objective: The objective of the proposed research is to develop a methodology for modeling and evaluation of human motions, which will potentially benefit patients undertaking a physical rehabilitation therapy (e.g., following a stroke or due to other medical conditions). The ultimate aim is to allow patients to perform home-based rehabilitation exercises using a sensory system for capturing the motions, where an algorithm will retrieve the trajectories of a patient's exercises, will perform data analysis by comparing the performed motions to a reference model of prescribed motions, and will send the analysis results to the patient's physician with recommendations for improvement. Methods: The modeling approach employs an artificial neural network, consisting of layers of recurrent neuron units and layers of neuron units for estimating a mixture density function over the spatio-temporal dependencies within the human motion sequences. Input data are sequences of motions related to a prescribed exercise by a physiotherapist to a patient, and recorded with a motion capture system. An autoencoder subnet is employed for reducing the dimensionality of captured sequences of human motions, complemented with a mixture density subnet for probabilistic modeling of the motion data using a mixture of Gaussian distributions. Results: The proposed neural network architecture produced a model for sets of human motions represented with a mixture of Gaussian density functions. The mean log-likelihood of observed sequences was employed as a performance metric in evaluating the consistency of a subject's performance relative to the reference dataset of motions. A publically available dataset of human motions captured with Microsoft Kinect was used for validation of the proposed method. Conclusion: The article presents a novel approach for modeling and evaluation of human motions with a potential application in home-based physical therapy and rehabilitation. 
The described approach employs the recent progress in the field of machine learning and neural networks in developing a parametric model of human motions, by exploiting the representational power of these algorithms to encode nonlinear input-output dependencies over long temporal horizons.
Article
Full-text available
Melanoma is amongst most aggressive types of cancer. However, it is highly curable if detected in its early stages. Prescreening of suspicious moles and lesions for malignancy is of great importance. Detection can be done by images captured by standard cameras, which are more preferable due to low cost and availability. One important step in computerized evaluation of skin lesions is accurate detection of lesion region, i.e. segmentation of an image into two regions as lesion and normal skin. Accurate segmentation can be challenging due to burdens such as illumination variation and low contrast between lesion and healthy skin. In this paper, a method based on deep neural networks is proposed for accurate extraction of a lesion region. The input image is preprocessed and then its patches are fed to a convolutional neural network (CNN). Local texture and global structure of the patches are processed in order to assign pixels to lesion or normal classes. A method for effective selection of training patches is used for more accurate detection of a lesion border. The output segmentation mask is refined by some post processing operations. The experimental results of qualitative and quantitative evaluations demonstrate that our method can outperform other state-of-the-art algorithms exist in the literature.
Article
Radiotherapy is one of the main ways head and neck cancers are treated; radiation is used to kill cancerous cells and prevent their recurrence. Complex treatment planning is required to ensure that enough radiation is given to the tumour, and little to other sensitive structures (known as organs at risk) such as the eyes and nerves which might otherwise be damaged. This is especially difficult in the head and neck, where multiple at-risk structures often lie in extremely close proximity to the tumour. It can take radiotherapy experts four hours or more to pick out the important areas on planning scans (known as segmentation). This research will focus on applying machine learning algorithms to automatic segmentation of head and neck planning computed tomography (CT) and magnetic resonance imaging (MRI) scans in University College London Hospital NHS Foundation Trust patients. Through analysis of the images used in radiotherapy, DeepMind Health will investigate improvements in efficiency of cancer treatment pathways.
Article
There are almost two million people in the United Kingdom living with sight loss, including around 360,000 people who are registered as blind or partially sighted. Sight-threatening diseases, such as diabetic retinopathy and age-related macular degeneration, have contributed to the 40% increase in outpatient attendances in the last decade but are amenable to early detection and monitoring. With early and appropriate intervention, blindness may be prevented in many cases. Ophthalmic imaging provides a way to diagnose and objectively assess the progression of a number of pathologies including neovascular (“wet”) age-related macular degeneration (wet AMD) and diabetic retinopathy. Two methods of imaging are commonly used: digital photographs of the fundus (the ‘back’ of the eye) and Optical Coherence Tomography (OCT, a modality that uses light waves in a similar way to how ultrasound uses sound waves). Changes in population demographics and expectations and the changing pattern of chronic diseases create a rising demand for such imaging. Meanwhile, interrogation of such images is time consuming, costly, and prone to human error. The application of novel analysis methods may provide a solution to these challenges. This research will focus on applying novel machine learning algorithms to automatic analysis of both digital fundus photographs and OCT in Moorfields Eye Hospital NHS Foundation Trust patients. Through analysis of the images used in ophthalmology, along with relevant clinical and demographic information, Google DeepMind Health will investigate the feasibility of automated grading of digital fundus photographs and OCT and provide novel quantitative measures for specific disease features and for monitoring the therapeutic success.
Article
Drug toxicity is frequently caused by electrophilic reactive metabolites that covalently bind to proteins. Epoxides comprise a large class of three-membered cyclic ethers. These molecules are electrophilic and typically highly reactive due to ring tension and polarized carbon-oxygen bonds. Epoxides are metabolites often formed by cytochromes P450 acting on aromatic or double bonds. The specific location on a molecule that undergoes epoxidation is its site of epoxidation (SOE). Identifying a molecule’s SOE can aid in interpreting adverse events related to reactive metabolites and direct modification to prevent epoxidation for safer drugs. This study utilized a database of 702 epoxidation reactions to build a model that accurately predicted sites of epoxidation. The foundation for this model was an algorithm originally designed to model sites of cytochromes P450 metabolism (called XenoSite) that was recently applied to model the intrinsic reactivity of diverse molecules with glutathione. This modeling algorithm systematically and quantitatively summarizes the knowledge from hundreds of epoxidation reactions with a deep convolution network. This network makes predictions at both an atom and molecule level. The final epoxidation model constructed with this approach identified SOEs with 94.9% area under the curve (AUC) performance and separated epoxidized and non-epoxidized molecules with 79.3% AUC. Moreover, within epoxidized molecules, the model separated aromatic or double bond SOEs from all other aromatic or double bonds with AUCs of 92.5% and 95.1%, respectively. Finally, the model separated SOEs from sites of sp2 hydroxylation with 83.2% AUC. Our model is the first of its kind and may be useful for the development of safer drugs. The epoxidation model is available at http://swami.wustl.edu/xenosite.
Article
Introduction: Tele-rehabilitation technologies that track human motion could enable physical therapy in the home. To be effective, these systems need to collect critical metrics without PT supervision both in real time and in a store and forward capacity. The first step of this process is to determine if physical therapists (PTs) are able to accurately assess the quality and quantity of an exercise repetition captured by a tele-rehabilitation platform. The purpose of this pilot project was to determine the level of agreement of quality and quantity of an exercise delivered and assessed by the Virtual Exercise Rehabilitation Assistant (VERA), and seven PTs. Methods: Ten healthy subjects were instructed by a PT in how to perform four lower extremity exercises. Subjects then performed each exercise delivered by VERA, which counted repetitions and quality. Seven PTs independently reviewed video of each subject’s session and assessed repetition quality. The percent difference in total repetitions and analysis of the distribution of rating repetition quality were assessed between the VERA and PTs. Results: The VERA counted 426 repetitions across 10 subjects performing the four different exercises while the mean repetition count from the PT panel was 426.7 (SD = 0.8). The VERA underestimated the total repetitions performed by 0.16% (SD = 0.03%, 95% CI 0.12–0.22). Chi-square analysis across raters was χ² = 63.17 (df = 6, p<.001), suggesting significant variance in at least one rater. Conclusion: The VERA count of repetitions was accurate in comparison to a seven-member panel of PTs. For exercise quality the VERA was able to rate 426 exercise repetitions across 10 patients and four different exercises in a manner consistent with five out of seven experienced PTs.
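The repetition-count figures in this abstract can be checked with a short calculation. This is a minimal sketch that assumes the percent difference is taken relative to the PT panel mean; the abstract does not state the denominator, and the function name is illustrative only.

```python
def percent_underestimate(system_count, reference_count):
    """Percent by which system_count falls below reference_count."""
    return (reference_count - system_count) / reference_count * 100.0


vera_total = 426        # repetitions counted by the VERA (from the abstract)
pt_panel_mean = 426.7   # mean count from the seven-PT panel (from the abstract)

pct = percent_underestimate(vera_total, pt_panel_mean)
print(f"{pct:.2f}%")  # prints 0.16%
```

Under this assumption the result agrees with the 0.16% underestimate reported in the abstract; using the VERA count as the denominator instead gives the same value to two decimal places.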
Article
Invasive brain cancer cells cannot be visualized during surgery and so they are often not removed. These residual cancer cells give rise to recurrences. In vivo Raman spectroscopy can detect these invasive cancer cells in patients with grade 2 to 4 gliomas. The robustness of this Raman signal can be dampened by spectral artifacts generated by lights in the operating room. We found that artificial neural networks (ANNs) can overcome these spectral artifacts using nonparametric and adaptive models to detect complex nonlinear spectral characteristics. Coupling ANN with Raman spectroscopy simplifies the intraoperative use of Raman spectroscopy by limiting changes required to the standard neurosurgical workflow. The ability to detect invasive brain cancer under these conditions may reduce residual cancer remaining after surgery and improve patient survival. © 2016 Society of Photo-Optical Instrumentation Engineers (SPIE).
Article
The current paradigm of robot-assisted surgeries (RASs) depends entirely on an individual surgeon's manual capability. Autonomous robotic surgery - removing the surgeon's hands - promises enhanced efficacy, safety, and improved access to optimized surgical techniques. Surgeries involving soft tissue have not been performed autonomously because of technological limitations, including lack of vision systems that can distinguish and track the target tissues in dynamic surgical environments and lack of intelligent algorithms that can execute complex surgical tasks. We demonstrate in vivo supervised autonomous soft tissue surgery in an open surgical setting, enabled by a plenoptic three-dimensional and near-infrared fluorescent (NIRF) imaging system and an autonomous suturing algorithm. Inspired by the best human surgical practices, a computer program generates a plan to complete complex surgical tasks on deformable soft tissue, such as suturing and intestinal anastomosis. We compared metrics of anastomosis - including the consistency of suturing informed by the average suture spacing, the pressure at which the anastomosis leaked, the number of mistakes that required removing the needle from the tissue, completion time, and lumen reduction in intestinal anastomoses - between our supervised autonomous system, manual laparoscopic surgery, and clinically used RAS approaches. Despite dynamic scene changes and tissue movement during surgery, we demonstrate that the outcome of supervised autonomous procedures is superior to surgery performed by expert surgeons and RAS techniques in ex vivo porcine tissues and in living pigs. These results demonstrate the potential for autonomous robots to improve the efficacy, consistency, functional outcome, and accessibility of surgical techniques.
Article
Increases in throughput and installed base of biomedical research equipment led to a massive accumulation of -omics data known to be highly variable, high-dimensional, and sourced from multiple often incompatible data platforms. While this data may be useful for biomarker identification and drug discovery, the bulk of it remains underutilized. Deep neural networks (DNNs) are efficient algorithms based on the use of compositional layers of neurons, with advantages well matched to the challenges -omics data presents. While achieving state-of-the-art results and even surpassing human accuracy in many challenging tasks, the adoption of deep learning in biomedicine has been comparatively slow. Here, we discuss key features of deep learning that may give this approach an edge over other machine learning methods. We then consider limitations and review a number of applications of deep learning in biomedical studies demonstrating proof of concept and practical utility.
Article
Recent advances in machine learning, specifically in deep learning with neural networks, have made a profound impact on fields such as natural language processing, image classification, and language modeling; however, the feasibility and potential benefits of these approaches for metagenomic data analysis have been largely under-explored. Deep learning exploits many layers of learning nonlinear feature representations, typically in an unsupervised fashion, and recent results have shown outstanding generalization performance on previously unseen data. Furthermore, some deep learning methods can also represent the structure in a data set. Consequently, deep learning and neural networks may prove to be an appropriate approach for metagenomic data. To determine whether such approaches are indeed appropriate for metagenomics, we experiment with two deep learning methods: (i) a deep belief network, and (ii) a recursive neural network, the latter of which provides a tree representing the structure of the data. We compare these approaches to the standard multilayer perceptron, which has been well-established in the machine learning community as a powerful prediction algorithm, though its presence is largely missing in metagenomics literature. We find that traditional neural networks can be quite powerful classifiers on metagenomic data compared to baseline methods, such as random forests. On the other hand, while the deep learning approaches did not result in improvements to the classification accuracy, they do provide the ability to learn hierarchical representations of a data set that standard classification methods do not allow. Our goal in this effort is not to determine the best algorithm in terms of accuracy, as that depends on the specific application, but rather to highlight the benefits and drawbacks of each of the approaches we discuss and provide insight into how they can be improved for predictive metagenomic analysis.