Neural networks to quantify and improve the
progress of body movement in rhythmic exercises
Abstract
In this project a framework will be designed that is capable of interpreting
and visualising rhythmic movement. The visualisation is informed by the
performer and acts as biofeedback to improve movement. The two questions
are (1) whether a computational model can extract rhythm from people with
a variety of movement types and skill levels, and (2) whether these features
can be translated into interpretable visualisations. Neural nets will be used
to extract these high-level features from movement data and visualise them
through intuitive interaction. This integrates with RITMO's research on
rhythm in human motion and on visualisation through informatics and human
cognition.
1 Background and theoretical framework
When a trainer teaches rhythmic movements to music, students learn to
execute them over time. In this process the trainer identifies difficulties and
gives feedback such as "more fluidity in the chest" or "hold the rhythm". Such
feedback is vague and requires a high level of body awareness to act on.
Automated, continuous and more personalised feedback would therefore be
helpful.
A recent survey of metrics for evaluating patient performance in physical
therapy compares a predefined set of movements with the patient's execution
of them [8]. Measuring rhythm in human motion is not as extensively studied
as in music. However, there are methods to formalise human movement, such
as Laban Movement Analysis (LMA). Burton et al. used LMA as a
low-dimensional representation to derive a model for affective motion [2].
Human motion can be expressed in a limited number of dimensions, but there
is no silver bullet. Additionally, human motion evolves, and all its varieties
cannot be perfectly captured by a predefined framework. Therefore, instead of
using a static representation, motion should be represented by an adaptive
model.
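Since rhythm extraction is one of the core questions, a minimal illustration may help: the dominant period of a single joint trajectory can be estimated with an autocorrelation analysis. This is a hypothetical simplification, not the model that will be developed, and the signal below is synthetic:

```python
import math

def estimate_period(signal):
    """Estimate the dominant period (in samples) of a motion signal
    via its autocorrelation function."""
    n = len(signal)
    mean = sum(signal) / n
    c = [x - mean for x in signal]
    var = sum(x * x for x in c)

    def acorr(lag):
        return sum(c[i] * c[i + lag] for i in range(n - lag)) / var

    # Skip past the first zero crossing so the trivial peak near lag 0
    # does not dominate the search.
    lag = 1
    while lag < n // 2 and acorr(lag) > 0:
        lag += 1
    best_lag, best = lag, acorr(lag)
    for l in range(lag + 1, n // 2):
        a = acorr(l)
        if a > best:
            best_lag, best = l, a
    return best_lag

# Synthetic "joint height" oscillating every 25 frames, e.g. a bouncing
# movement captured at a fixed frame rate.
trajectory = [math.sin(2 * math.pi * t / 25) for t in range(200)]
period = estimate_period(trajectory)  # 25 for this synthetic signal
```

Given the capture system's frame rate, such a lag could be converted into a tempo and compared against the beat of the accompanying music.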
Adaptive models such as artificial neural networks (ANNs) have been used to
measure improvement after exercises [7]. ANNs are used for motion
dimensionality reduction, and methods built on this achieve state-of-the-art
performance in motion prediction, classification and synthesis [3]. The human
body is a hierarchical structure in which the movements of joints within a
limb are more strongly correlated than those of joints in different limbs.
Motion can be periodic, as in walking, but many movements are more complex
and highly aperiodic. The periodicity of motion can occur on different time
scales and between different limbs. Architectures particularly suited to
modelling motion data therefore leverage both its hierarchical and periodic
characteristics in space and time, for example hierarchical recurrent neural
networks (HRNNs) [4].
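The hierarchical assumption can be sketched in a simplified, non-neural form: compute a feature per limb first, then combine the limbs into a whole-body descriptor. The joint names, limb groupings and the energy feature below are illustrative placeholders, not a fixed skeleton specification:

```python
# Simplified sketch of the hierarchy exploited by HRNN-style models [4]:
# per-limb statistics are computed first and only then combined into a
# whole-body descriptor. Joint names are hypothetical.
SKELETON = {
    "left_arm":  ["l_shoulder", "l_elbow", "l_wrist"],
    "right_arm": ["r_shoulder", "r_elbow", "r_wrist"],
    "left_leg":  ["l_hip", "l_knee", "l_ankle"],
    "right_leg": ["r_hip", "r_knee", "r_ankle"],
}

def limb_energy(frames, joints):
    """Mean squared frame-to-frame displacement over a limb's joints."""
    total, count = 0.0, 0
    for prev, cur in zip(frames, frames[1:]):
        for j in joints:
            dx = cur[j] - prev[j]
            total += dx * dx
            count += 1
    return total / count

def body_descriptor(frames):
    """First level: one feature per limb. Second level: whole body."""
    per_limb = {limb: limb_energy(frames, joints)
                for limb, joints in SKELETON.items()}
    per_limb["whole_body"] = sum(per_limb.values()) / len(SKELETON)
    return per_limb

# Synthetic sequence: the arms drift steadily while the legs stay still.
frames = []
for t in range(5):
    frame = {}
    for limb, joints in SKELETON.items():
        for j in joints:
            frame[j] = 0.1 * t if "arm" in limb else 0.0
    frames.append(frame)

desc = body_descriptor(frames)
```

An HRNN replaces these hand-written statistics with learned recurrent units per limb, merged at higher layers, so the representation adapts to the data instead of being fixed in advance.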
Every person has a unique motion pattern, and direct metrics cannot offer the
kind of interpretable feedback a trainer gives. A person is aware of their
progress to a varying degree, but an observer giving targeted feedback can
accelerate improvement. The visualisation of low-dimensional features of
movement can act as an indirect metric. These features would be interpreted
by a human, leveraging their body awareness, and operate as biofeedback
during exercises.
The design of a visualisation for motion is partly subjective and not easily
formalised. In the domain of procedural content generation (PCG), interactive
algorithms have been proposed as a method to automate only part of the design
process and put the subjective decisions in the hands of the user [1]. Thus a
person's body awareness will additionally be leveraged by allowing them to
influence the design of the adaptive visualisation during the exercises. Such
algorithms have proven exceptionally effective when multiple users can share
their own results and build upon those of others in the process.
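The interactive loop behind such algorithms can be sketched as follows. The "genome" of visualisation parameters, the mutation operator and the population size are hypothetical simplifications of systems like Picbreeder [1]; the essential point is that the system only varies candidates, while the subjective fitness decision stays with the user:

```python
import random

def mutate(genome, rng, sigma=0.1):
    """Perturb every visualisation parameter with Gaussian noise."""
    return [g + rng.gauss(0.0, sigma) for g in genome]

def next_generation(population, chosen_index, rng):
    """Breed a new population entirely from the user's chosen genome,
    keeping the chosen parent itself unchanged (elitism)."""
    parent = population[chosen_index]
    return [parent] + [mutate(parent, rng) for _ in range(len(population) - 1)]

rng = random.Random(42)
# Hypothetical 4-parameter visualisation genomes (e.g. colour, speed, ...).
population = [[rng.random() for _ in range(4)] for _ in range(6)]
seed_genome = population[0][:]

for step in range(3):
    # In the real system the performer would pick interactively;
    # here index 0 stands in for the user's choice.
    population = next_generation(population, chosen_index=0, rng=rng)
```

Because selection is the only place a fitness judgement is made, the loop needs no formal objective function, which is exactly what makes it suitable for a partly subjective design task.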
2 Main objective, research questions and hypothesis
The goal of the project is to quantify the progress of body movement due to
rhythmic exercises executed to music. The quantification is done through an
indirect and interactively designed visualisation of a person's movement,
which acts as biofeedback to increase body awareness and progress in the
exercises. The research questions are: (1) Can a neural net robustly extract
features relating to aspects of motion such as rhythm, fluidity and intensity
that generalise over multiple people and types of motion? (2) Can the
visualisation be interpreted by a human observer, and can relevant feedback be
deduced from it to improve their movement?
I expect that the visualisation will create a feedback loop with the observed
user. This loop will greatly accelerate the learning of rhythmic movements
without the need for an observing trainer. By using different visualisations
it will be possible to evaluate the impact of exercises in a more concise and
formal way than manual assessment by human experts.
The models interpreting different characteristics of motion, and the
visualisation designed through user interaction, will help advance research
towards a generalised framework for representing and understanding human
motion. This is aligned with RITMO's interest in rhythm in human motion and
in visualisations through informatics and human cognition.
3 Method
The neural network that extracts features from motion capture data, called
the motion interpreter (MI), will be trained on existing datasets. I will
gather additional motion data from amateur and professional performers such
as dancers and martial artists. This variety of data will allow the model to
generalise over different types of movements and performers. Using this data,
several neural network architectures that make different assumptions about
time and space will be evaluated. The models will be evaluated on different
machine learning problems such as motion prediction and classification. I
have experience with applying deep learning to motion and sensor data from my
research position at Ghent University.
Skeleton data will be captured from the user in real time with wearable
motion capture sensors such as Xsens. Motion features will be extracted with
the aforementioned MI and used as parameters for the visual generation (VG).
For the design of the VG, interactive machine learning algorithms will be
leveraged. The interaction required from the performer will be made
intuitive, to interfere as little as possible with the training. My master's
thesis on PCG included research on the interactive generation of images and
animations. Additionally, in personal projects I have experimented with
different visualisation techniques using deep learning.
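The coupling between MI and VG can be illustrated with a hypothetical per-frame mapping from extracted features to visual parameters. Both the feature names and the visual parameters below are placeholders for what will actually be learned and interactively designed:

```python
def clamp(x, lo=0.0, hi=1.0):
    """Keep a parameter inside the range the renderer expects."""
    return max(lo, min(hi, x))

def features_to_visuals(features):
    """Map MI output features to VG input parameters, once per frame.
    Names and mappings are illustrative placeholders."""
    return {
        # Stronger rhythm -> faster visual pulse, synchronised to movement.
        "pulse_rate": clamp(features.get("rhythm", 0.0)),
        # Fluid movement -> longer, smoother motion trails.
        "trail_length": clamp(features.get("fluidity", 0.0)),
        # Intensity controls the overall brightness of the rendering.
        "brightness": clamp(features.get("intensity", 0.0)),
    }

# One frame's worth of (hypothetical) MI output; note the out-of-range
# values that the mapping must tolerate in a live setting.
frame_features = {"rhythm": 0.8, "fluidity": 1.4, "intensity": -0.2}
visuals = features_to_visuals(frame_features)
```

Keeping this mapping as a thin, explicit layer means the interactive design loop can rewire feature-to-visual assignments without retraining the MI.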
Part of the research is practice-based. To improve, test and evaluate the
system I will design a training scheme containing several exercises with
accompanying music. The focus will be on training rhythm, but, for
comparison, exercises on alignment, fluency and energy will be included.
Since my youth I have practised many different sports. From the age of 14 to
17 I attended a sport school, training for handball more than 20 hours a
week. There I got a first taste of how focus, boredom and fun relate to the
training experience and to bodily improvement. In recent years I have studied
the body/mind connection further, through practices such as yoga, aikido,
contemporary dance and, recently, rhythm/vocal training for theatre based on
Eugenio Barba's theory. Currently I am involved as content creator and
instructor in an artistic project tentatively called Smart Gym, which has the
political and social motivation to redefine how learning and training can
function both mentally and physically. I purposefully seek this variety and
document it, because I want to understand the subtle overlaps between these
practices. To deepen my embodied understanding I research how to integrate
their core values into my daily practice.

Figure 1: Overview of the data-flow between components in the methodology
4 Progress plan
Semester 1:
- Literature study on existing movement analysis frameworks such as LMA.
- First iteration of the training scheme.
- Build a prototype by training the MI on existing motion databases such as
  Human3.6M [5] and KIT [6], and start to implement the interactive
  visualisation.
Semester 2:
- Participate in professional training to get feedback on my training scheme
  and accompanying music.
- Test the prototype and training scheme on amateur performers, accompanied
  by an expert, such as a certified movement analyst, interpreting the
  visualisation to compare with a non-computational movement analysis such
  as LMA.
Semester 3:
- Use the data from the previous tests to improve the MI.
- Evaluate and improve the whole system based on expert feedback.
- Experiment with types of interaction in the VG that won't interfere with
  the exercises, such that the assistance of an external observer is no
  longer required.
Semester 4:
- Test the current iteration of the system on professional performers.
- Evaluate whether the system integrates into performers' practice.
- Evaluate whether the visualisation can help differentiate between types of
  performers and exercises.
Semester 5:
- Implement an online platform to gather data in a distributed way.
- Distribute the system for longer periods of use to evaluate its performance
  over time.
Semester 6:
- Write the thesis on the gathered results.
- Find channels to communicate the concluded project, e.g. open-source
  releases or workshops.
References
[1] Picbreeder: a case study in collaborative evolutionary exploration of
design space. Evol. Comput., 19(3):373–403, 2011.
[2] Sarah Jane Burton, Ali-akbar Samadani, Rob Gorbet, and Dana Kulić. Dance
Notations and Robot Motion. Springer Tracts in Advanced Robotics, 111, 2016.
[3] Judith Bütepage, Michael Black, Danica Kragic, and Hedvig Kjellström.
Deep representation learning for human motion prediction and classification.
arXiv, 2017.
[4] Yong Du, Wei Wang, and Liang Wang. Hierarchical Recurrent Neural Network
for Skeleton Based Action Recognition. In IEEE Conference on Computer Vision
and Pattern Recognition (CVPR), 2015.
[5] Catalin Ionescu, Dragos Papava, Vlad Olaru, and Cristian Sminchisescu.
Human3.6M: Large scale datasets and predictive methods for 3D human sensing
in natural environments. IEEE Transactions on Pattern Analysis and Machine
Intelligence, 36(7):1325–1339, July 2014.
[6] Matthias Plappert, Christian Mandery, and Tamim Asfour. The KIT
motion-language dataset. Big Data, 4(4):236–252, December 2016.
[7] A. Vakanski, J. M. Ferguson, and S. Lee. Mathematical Modeling and
Evaluation of Human Motions in Physical Therapy using Mixture Density Neural
Networks. J. Physiother. Phys. Rehabil., 1(4):1–10, 2016.
[8] Aleksandar Vakanski, Jake M. Ferguson, and Stephen Lee. Metrics for
Performance Evaluation of Patient Exercises during Physical Therapy. Int. J.
Phys. Med. Rehabil., 05(03), 2017.