
SkillVis: A Visualization Tool for Boxing Skill Assessment


Abstract

Motion analysis and visualization are crucial in sports science for sports training and performance evaluation. While primitive computational methods have been proposed for simple analysis such as postures and movements, few can evaluate the high-level quality of sports players such as their skill levels and strategies. We propose a visualization tool to help visualize boxers' motions and assess their skill levels. Our system automatically builds a graph-based representation from motion capture data and reduces the dimension of the graph onto a 3D space so that it can be easily visualized and understood. In particular, our system allows easy understanding of the boxer's boxing behaviours, preferred actions, and potential strengths and weaknesses. We demonstrate the effectiveness of our system on different boxers' motions. Our system not only serves as a tool for visualization, but also provides intuitive motion analysis that can be further used beyond sports science.
Hubert P. H. Shum
Northumbria University
He Wang
University of Leeds
Edmond S. L. Ho
Hong Kong Baptist University
Taku Komura
Edinburgh University
Figure 1: Our system visualizes high-level boxing skills such as the richness of actions and the transition of them with a graph structure. A
graph representation is employed with the characters of different sizes representing nodes with different sizes and edges representing groups
of actions.
Keywords: Motion Graph, Information Visualization, Dimension-
ality Reduction
Concepts: Computing methodologies → Animation;
1 Introduction
Computers have been used in the fields of sports and health science
to record and improve the performance of both amateur and pro-
fessional athletes. There are computer-managed weight lifting ma-
chines and treadmills recording energy consumption or repetition
achieved in every sports club. In the attempt to assist more profes-
sional sports activities, some researchers have used the virtual real-
ity technology to create training systems in baseball [Komura et al.
2002], handball [Bideau et al. 2003] and tennis [Molet et al. 1999].
Nevertheless, the analysis of motions done with these technologies
is usually at a low level: recording the timing of basic motions
or comparing the trajectories with those of better players. On the
other hand, human experts are good at evaluating performances by
comprehensively capturing different features in the motions. For in-
stance, they consider not only the instantaneous movements of the
athlete but also the variety of motions and the ability of transition
from one motion to another.
Consider boxing for example. Professional boxers are trained first
on some basic postures such as defense, stepping and attack, thread-
ing through which are the transitions carried out by the boxer based
on the strategy and the opponent in a match. A good boxer can
carry out a variety of transitions at will to achieve the best outcome.
Such information serves as indicators for assessing the skill level of
a player and the same principle applies to many other sports such
as tennis, fencing, etc. Unfortunately, there is a research gap in
evaluating the motions of the players from a higher-level point of
view.
In this paper, we propose a robust method to visualize the skill
level of a boxer’s performance in terms of flexibility and richness
of his/her motions. To begin with, we capture the training of a
player, in which he/she moves alone in an open space and imag-
ines to interact with an opponent. In boxing, this kind of training is
known as shadow boxing. An experienced coach can easily assess
the player's skills by watching the shadow boxing motions. Shadow
boxing is considered very important not only for training skills, but
also for warming up the body and getting ready for further training
with other players. Capturing shadow boxing greatly reduces the
motion capture complexity caused by occlusion and collision, and has
been shown to be effective in our system for detailed evaluation of a
player's skills.
We use the techniques and data structures in computer animation
to process the motion data. The captured motion data are first au-
tomatically segmented into shorter clips with meaningful contexts
and categorized into groups. Next, our system automatically gen-
erates a hierarchical motion graph structure known as Fat Graph,
which uses nodes representing the postures of the body and edges
representing the motion groups. With dimensionality reduction
techniques, this Fat Graph can be visualized on a 3D space to
evaluate the performance of the player. The transition capability of
the player is visualized by the connectivity of the nodes, while the
richness and preference of the motions are visualized by the edges
in the graph. With the proposed algorithm, it is easy to identify the
performance quality and potential problems of a player.
We demonstrate how we can easily evaluate the skills of different
boxers with our visualization system, as shown in Figure 1. While
we use boxing as our target sport in this paper due to its complex
moves and strategic nature, our system can be applied to most activ-
ities that require swiftness, flexibility and creativity, such as tennis,
fencing and basketball.
The rest of this paper is organized as follows. Section 2 discusses
other related work on motion analysis and dimensionality
reduction. Section 3 provides an overview of our system. In Sections 4
and 5, we explain the algorithms that organize captured motion with
a graph structure and visualize the generated graph. Related
experiments can be found in Section 6. We conclude and provide some
discussion for our proposed method in Section 7.
2 Related Work
Visualization of the skills of athletes, and hence helping them to
improve, is a field that has not been fully explored in the field of
sports science. Existing research [Yeadon 1990; Yeadon ] mainly
focuses on how a motion will appear when parameters of the mo-
tion or the body are changed. For example, Yeadon [Yeadon 1990;
Yeadon ] has done research on how the diving and somersaults mo-
tions change when the motions are launched at different timings by
using physical simulation. Although such tools are useful for the
players to interactively visualize possible results under different pa-
rameters, they can only evaluate the performance of the sports that
do not require complex maneuvers and strategies, such as jumping,
high jumping, ski jumping, or somersaults.
In many sports games, the performance depends not only on phys-
ical factors such as velocity, power and strength, but also on flex-
ibility to switch from one motion to another and richness of the
player’s motions. This high-level information has not been used to
visualize the skills of the player in previous research. In this re-
search, we combine the approaches of motion graph [Arikan and
Forsyth 2002; Lee et al. 2002; Kovar et al. 2002; Lau and Kuffner
2005; Kwon and Shin 2005] and dimensionality reduction [Gro-
chow et al. 2004; Shin and Lee 2006] to visualize high-level skills
information of the athletes for the skill assessments.
Statistical approaches for analyzing the connectivity of different
movements have been developed in the area of computer anima-
tion and pattern recognition. The Motion Graph approach [Arikan
and Forsyth 2002; Lee et al. 2002; Kovar et al. 2002; Min and Chai
2012; Beaudoin et al. 2008; Arikan et al. 2003; Li et al. 2002] is
a method to interactively reproduce continuous motions based on
a graph generated from captured motion data. Reitsma and Pol-
lard [2007] compared different motion graph techniques compre-
hensively. Heck et al. [2007] further parametrized the motion space
to control how the motions are being generated by blending samples
in the motion graph. Such an approach can be used for interactive
character control such as those in computer games. When it comes
to graph construction, [Min and Chai 2012; Beaudoin et al. 2008]
are the most similar ones to our method. Min et al. [2012] grouped
similar postures and transitions into nodes and edges. Their focus
is the motion variety of synthesized motions so they used genera-
tive models to fit the posture and motion data. Our focus is about
skill visualization through the analysis of postures and motions so
we can afford simpler and faster methods for analysis. Beaudoin
et al. [2008] cluster postures first then find motion motifs by con-
verting the motion matching task into a string matching problem.
Their priority is to find motifs that are representative while our fo-
cus is to visualize motion details and statistics to help people to
assess the skills. Xia et al. [2015] constructed a series of local
mixtures of autoregressive models (MAR) to model the style
variations among different motions for real-time style transfer. They
demonstrated style-rich motions can be generated by combining
their method and motion graph.
Since the Motion Graph produces a lot of edges and nodes with-
out any context, it becomes difficult to control generated motion as
the user wishes. Safonova and Hodgins [2007] optimized the graph
structure by combining motion graph and interpolation techniques
to improve the performance. On the other hand, works to resolve
this problem by introducing a hierarchical structure have been pro-
posed in [Lau and Kuffner 2005; Kwon and Shin 2005; Shin and Oh
2006]. These approaches add topological structures into the contin-
uous unstructured data so that the motion synthesis can be done at
a higher level. In a sport like boxing, it is possible to create a mo-
tion graph of semantic actions such as attack and defence, which is
known as the action-level motion graph [Shum et al. 2008; Shum
et al. 2012]. A recent work by Hyun et al. [2016] proposed Mo-
tion Grammars to specify how character animations are generated
by high-level symbolic description. Such an approach can be used
with existing animation systems which are built based on motion
graphs. Ho and Komura [2011] built a finite state machine (FSM)
based on Topology Coordinates [Ho and Komura 2009] for synthe-
sizing two-character close interactions. The sparse graph structure
can be used for controlling the movement of virtual wrestlers in
computer games. The purpose of these approaches, however, is
motion generation rather than the visualization of the player’s skill.
In our research, we adapted a hierarchical motion graph structure
called the Fat Graph [Shin and Oh 2006] on the action level to an-
alyze the connectivity and the variety of a captured motion set. In
a fat graph, similar nodes are grouped together as fat nodes, and
similar edges are grouped as fat edges, allowing better organiza-
tion of motion data. The filtered motion graph is a variation of
the Fat Graph, in which temporal relationships between poses are
considered [Plantard et al. 2016b]. Such a structure, however, is
targeted for motion reconstruction and analysis instead of visual-
ization [Plantard et al. 2016a].
Dimensionality reduction methods have been proposed to visualize
the overall structure of captured motions. For example, Grochow et
al. [2004] proposed a method to project the 3D motions of the human
onto a 2D plane, and further reconstruct 3D motions by mapping
arbitrary points from the 2D plane back onto the 3D
joint space. PCA [Shin and Lee 2006] and ISOMAP [Tenenbaum
et al. ; Shum et al. 2010] are proposed to map the motions onto
2D planes. Due to the high variation of human motion, local PCA
that considers only a relevant subset of the whole motion database
in order to generate a locally linear space is proposed [Shum et al.
2013; Ho et al. 2013]. One can generate motions from arbitrary
points on the plane by interpolating the postures of the original mo-
tion. Meanwhile, non-linear methods [Lawrence 2004; Wang et al.
2006] and Deep Learning [Holden et al. 2016] have also been used
to reduce the dimensionality of motions. The Gaussian Process [Liu
et al. 2016] and the mixture of Gaussian Processes [Liu et al. 2016]
can be used to represent a set of human postures with a small
number of Gaussian parameters. However, such methodologies do not
take into account the connectivity structure of the motions. We
apply dimensionality reduction to our graph structure to visualize
the connectivity structure of captured motions on a 2D plane.
3 System Overview
The system can be divided into two parts as shown in Figure 2:
the motion organization system and the visualization system. The
motion organization system captures, analyzes and organizes the
motion of a player using motion segmentation and motion graph
techniques. The visualization system prepares the graph layout by
projecting entities to appropriate 2D positions using dimensionality
reduction techniques, and renders the resultant graph with interactive
features.
Figure 2: The system is divided into the motion organization sys-
tem and the visualization system.
4 Motion Organization System
The motion organization system first captures the motion required
for analysis using motion capture systems. Then, it segments the
long sequence of motion into meaningful parts, which are used as
building blocks of a motion graph. The system analyzes the simi-
larity of these motion segments and constructs a Fat Graph structure
that can be used to evaluate the skill level of the subject.
4.1 Motion Capture
The proposed algorithm can be applied to most activities that re-
quire swiftness, flexibility and creativity. For interactive games
with multiple players, it would be best to capture the motion
of all players and evaluate them individually. However, even with
state-of-the-art technology, capturing the motions of multiple
players remains difficult due to occlusion and collision among players.
Therefore, we propose to capture the training motion of a player, in
which the player moves alone in an open area, imagining an
interaction with an opponent.
In boxing or any other martial arts, there is a practice called
“shadow boxing”. The boxer imagines another boxer standing in
front of him/her and repeats the techniques that he/she has been
practicing. The boxer launches not only motions such as punch-
ing, but also defence, stepping, and the consecutive combination of
all such motions. There are similar practice methods in basketball
and soccer as well. The players can use the ball to conduct various
techniques in the court imagining that their opponents are trying to
take away the ball from them. The players thus perform various
motions to keep the ball and trick the imaginary opponent.
During experiments, we used an optical motion capture system to
acquire the performed motion (Figure 3). This is because, compared
with magnetic and mechanical motion capture systems, optical
systems cause the least disturbance to the player. Also, we
preferred to capture long and continuous clips for the same reason.
4.2 Motion Analysis
We have developed an automatic motion analyzer to segment and
classify raw motions into shorter, meaningful motion clips. This is
done by analyzing the supporting foot condition, the acceleration
profile of joints, and the trajectories of the effective joints.
Figure 3: The shadow boxing motions of several boxers were cap-
tured using an optical motion capture system.
Here we define the term "motion" as the raw captured data, and
the term "action" as a semantic segment of the motion we captured.
In the field of boxing, an action can be an attack (such as a
"left straight", "jab" or a "right kick"), a defense (such as
"parries", "blocking" or "ducking"), a transition (such as "stepping
to the left", "stepping forward" or "back step"), or any combination of
them.
We observed that most actions start and end at a double supporting
state (i.e. both feet touching the floor), which can be detected by
monitoring the feet height and velocity. The raw captured motion is
segmented into a set of movement segments, which are the periods
between every two successive double supporting states (Figure 4 Upper).
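
The segmentation described above can be sketched as follows. This is a minimal illustration, assuming per-frame foot heights and speeds have already been extracted from the capture; the function names and the thresholds `max_height` and `max_speed` are hypothetical and are not values reported in this paper.

```python
def double_support_flags(left, right, max_height=0.05, max_speed=0.1):
    """left/right: per-frame (height, speed) tuples for each foot.

    A frame counts as double support when both feet are low and nearly
    still. The thresholds are illustrative assumptions.
    """
    def planted(foot):
        height, speed = foot
        return height <= max_height and speed <= max_speed

    return [planted(l) and planted(r) for l, r in zip(left, right)]


def movement_segments(flags):
    """Return inclusive (start, end) frame ranges lying between two
    successive double-support states.

    Runs that are not bounded by double support on both sides (at the
    very start or end of the recording) are ignored.
    """
    segments = []
    start = None
    seen_support = False
    for i, in_support in enumerate(flags):
        if in_support:
            if start is not None:
                segments.append((start, i - 1))
                start = None
            seen_support = True
        elif seen_support and start is None:
            start = i
    return segments
```

Treating the double-support frames themselves as separators, rather than as part of a segment, is one possible reading of the definition.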
We also observed that a relatively large amount of force is exerted
during any action such as a punch or a step. Periods with a high level
of force exertion are called activity segments. Since the
force is proportional to acceleration, these segments can be found
when the sum of squares of acceleration of all joints is above a
threshold. The threshold is statistically obtained from the accelera-
tion plot of the body (Figure 4 Middle).
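
A minimal sketch of the activity segment detection, assuming joint positions are given per frame and accelerations are estimated by central finite differences. The paper only states that the threshold is obtained statistically; the default of mean plus `k` standard deviations used here is an assumption for illustration, as are the function names.

```python
from statistics import mean, pstdev


def acceleration_energy(frames, dt=1.0):
    """frames[t][j] is the (x, y, z) position of joint j at frame t.

    Returns, for each interior frame t = 1 .. len(frames) - 2, the sum
    over all joints of the squared acceleration, so index i of the
    result corresponds to frame i + 1.
    """
    energy = []
    for t in range(1, len(frames) - 1):
        total = 0.0
        for prev, cur, nxt in zip(frames[t - 1], frames[t], frames[t + 1]):
            for a, b, c in zip(prev, cur, nxt):
                acc = (c - 2.0 * b + a) / (dt * dt)
                total += acc * acc
        energy.append(total)
    return energy


def activity_segments(energy, threshold=None, k=1.0):
    """Return inclusive (start, end) index ranges where energy exceeds
    the threshold. If no threshold is given, mean + k * std is used
    (an assumed statistical rule, not the paper's)."""
    if threshold is None:
        threshold = mean(energy) + k * pstdev(energy)
    segments = []
    start = None
    for i, value in enumerate(energy):
        if value > threshold and start is None:
            start = i
        elif value <= threshold and start is not None:
            segments.append((start, i - 1))
            start = None
    if start is not None:
        segments.append((start, len(energy) - 1))
    return segments
```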
The actions are composed by using the movement segments as the
building blocks. The timing and the duration of the activity seg-
ments are used to determine if the movement segments should be
merged together to form longer segments. Regarding the relation-
ship of the movement segments and the activity segments, there
could be three possible cases: (1) There is no activity segment in-
side a movement segment. In this case, the movement segment be-
comes a single transition action. (2) There is one activity segment
inside a movement segment. In this case, this movement segment
becomes an action with a special activity. (3) There is more than
one activity segment lying across successive movement segments.
In this case, the movement segments containing activity segments
at the border are merged to form an action (Figure 4 Lower). Note
that due to this merging process, the resulting action could contain
multiple activity segments. We also filter very short actions that are
likely to be generated due to the noise of the supporting feet.
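
The three cases can be sketched as a single pass over the movement segments. This is an illustrative reading, assuming both kinds of segments are sorted, inclusive frame ranges on the same frame index; `build_actions` and `min_length` (the cut-off for very short actions) are hypothetical names and values.

```python
def build_actions(movements, activities, min_length=3):
    """Compose actions from movement segments and activity segments.

    movements, activities: sorted lists of inclusive (start, end) ranges.
    """
    def overlaps(a, b):
        return a[0] <= b[1] and b[0] <= a[1]

    merged = []
    for seg in movements:
        if merged:
            prev = merged[-1]
            # Case (3): an activity segment spans the border between the
            # previous (possibly already merged) segment and this one.
            if any(overlaps(act, prev) and overlaps(act, seg) for act in activities):
                merged[-1] = (prev[0], seg[1])
                continue
        merged.append(seg)

    actions = []
    for start, end in merged:
        if end - start + 1 < min_length:
            continue  # very short actions are treated as foot-contact noise
        inside = [act for act in activities if overlaps(act, (start, end))]
        # Case (1): no activity -> transition; cases (2)/(3): activity action.
        kind = "activity" if inside else "transition"
        actions.append({"range": (start, end), "kind": kind, "activities": inside})
    return actions
```

As noted in the text, a merged action may contain more than one activity segment; the `activities` list of each result keeps them all.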
Here, we define the effective joints to be the set of joints to repre-
sent an activity segment. In case (1), since the actions contain no
special activities, the pelvis is considered to be the effective joint.
However, in cases (2) and (3), the effective joint is the joint that
contributes the most to the sum of squares of the acceleration in
the activity segment. In more complicated actions such as left-right
combo punches, there may be multiple effective joints. Such joints
are used in the later process to evaluate the similarity of actions.
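
Selecting the effective joint of one activity segment can be sketched as below. The per-joint energy table, the joint names and the `"pelvis"` label are assumptions for illustration.

```python
def effective_joint(joint_energy, segment, joint_names):
    """joint_energy[t][j]: squared acceleration of joint j at frame t.
    segment: inclusive (start, end) activity segment, or None for an
    action without any special activity (case (1)).
    """
    if segment is None:
        return "pelvis"  # case (1): the pelvis is the effective joint
    start, end = segment
    totals = [0.0] * len(joint_names)
    for t in range(start, end + 1):
        for j, value in enumerate(joint_energy[t]):
            totals[j] += value
    # The joint contributing most to the summed squared acceleration.
    best = max(range(len(totals)), key=totals.__getitem__)
    return joint_names[best]
```

For an action with several activity segments, calling this once per segment yields the sequence of effective joints.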
Figure 4: Upper: The movement segment is defined as the period
between two double supporting phases. Middle: The activity
segment is defined as the period with high acceleration. Lower:
The action is the combination of movement segments and activity
segments.
4.3 Graph Construction
We apply the Fat Graph structure to organize the captured motion.
Since the Fat Graph was originally proposed for motion synthesis, it is
not optimized for skill visualization. We redesign the algorithms to
generate nodes and edges in the Fat Graph for our purpose.
The nodes of Fat Graph, known as Fat Nodes, are common starting
or ending poses of actions. An unsupervised clustering scheme is
used for grouping them into a finite set of pose groups, each rep-
resented by the mean pose of the group. In this way, we do not
need additional labor such as labelling. Specifically, we used
k-means to cluster postures. The distance between two postures is the
Euclidean distance between their respective joint angles. Regarding
the cluster number k, a large k would result in many clusters
(Fat Nodes), which unnecessarily increases the complexity of the
graph. A small k would cluster very different postures into the same
node, defeating the purpose of the graph construction. Therefore,
we set up a posture difference threshold empirically based on
experts' suggestions. Then, we iteratively search for a proper k by
initially setting k=1 and incrementing k by 1 until we find the first
k that does not violate the distance threshold.
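
The incremental search for k can be sketched in plain Python as follows. Two points are assumptions rather than details from the paper: "violating the threshold" is read as some posture lying farther than the threshold from its cluster mean, and the k-means initialisation is a deterministic farthest-point rule chosen only to keep the example reproducible.

```python
import math


def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_vector(points):
    return [sum(column) / len(points) for column in zip(*points)]


def assign(points, centers):
    return [min(range(len(centers)), key=lambda c: distance(p, centers[c]))
            for p in points]


def kmeans(points, k, iterations=50):
    # Farthest-point initialisation (illustrative, deterministic).
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(distance(p, c) for c in centers))
        centers.append(list(far))
    for _ in range(iterations):
        labels = assign(points, centers)
        updated = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            updated.append(mean_vector(members) if members else centers[c])
        if updated == centers:
            break
        centers = updated
    return centers, assign(points, centers)


def cluster_with_threshold(points, threshold):
    """Start at k=1 and increase k until every posture lies within
    `threshold` of the mean pose of its cluster."""
    for k in range(1, len(points) + 1):
        centers, labels = kmeans(points, k)
        worst = max(distance(p, centers[l]) for p, l in zip(points, labels))
        if worst <= threshold:
            return k, centers, labels
```

The same search applies to Fat Edges once the posture distance is replaced by an action distance.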
After clustering, the poses representing the Fat Nodes are the
standard poses that the player can start various motions from. In the
case of boxing, they are usually the fighting poses in which the boxer
guards his/her face against the opponent, with both feet on the
ground and kept shoulder-width apart. By evaluating
the Fat Nodes, it is possible to tell if a boxer has multiple
unnecessary standard poses, or if any standard poses contain potential
problems.
The edges of a Fat Graph, known as Fat Edges, are directional edges
that represent groups of similar actions. Each edge points from the
Fat Node representing the starting pose to that representing the end-
ing pose. In our implementation, the Fat Edges are represented by
the action group classified by a similar scheme introduced for Fat
Nodes construction. We apply the same algorithm as for the Fat
Nodes, in which we use k-means to cluster the actions and search for
the smallest acceptable k for a given distance threshold. The only
difference is that instead of using posture distance, the action
distance is defined according to the trajectory of the effective joints as
explained in Section 4.2. This allows accurate clustering of actions
and ensures that the effects of the effective joints are not smoothed
out by other joints.
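Formally, the distance between two actions $A_0$ and $A_1$ is defined as:
\[
D(A_0, A_1) =
\begin{cases}
\infty & \text{if sequences of effective joints are different} \\
\sum_{j=0}^{j_{total}} \sum_{f=f_{start}}^{f_{end}} \left[ A_0(j)(f) - A_1(j)(f) \right]^2 & \text{otherwise}
\end{cases}
\tag{1}
\]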
where $A_0$ and $A_1$ are actions defined by 3D positions in terms of
joints and frames, $j$ and $j_{total}$ are the joint index and the total
number of effective joints in the sequence of effective joints inside an
action, and $f$, $f_{start}$ and $f_{end}$ are the frame index, starting
frame and ending frame of the effective joint under consideration. The
terms $A_0(j)(f)$ and $A_1(j)(f)$ represent the 3D positions of joint $j$
in frame $f$ for the corresponding action. In case two effective joints
with different durations are to be compared, the shorter one is linearly
scaled to the duration of the longer one.
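
A minimal sketch of this action distance. It reads the per-frame term of Equation 1 as the squared Euclidean distance between joint positions and returns infinity when the sequences of effective joints differ; both readings are assumptions, as is the representation of an action as a list of `(joint_name, trajectory)` pairs.

```python
import math


def resample(trajectory, length):
    """Linearly rescale a trajectory (list of (x, y, z)) to `length` frames."""
    if len(trajectory) == length:
        return [tuple(p) for p in trajectory]
    if len(trajectory) == 1:
        return [tuple(trajectory[0])] * length
    out = []
    for i in range(length):
        pos = i * (len(trajectory) - 1) / (length - 1)
        lo = int(math.floor(pos))
        hi = min(lo + 1, len(trajectory) - 1)
        w = pos - lo
        out.append(tuple((1 - w) * a + w * b
                         for a, b in zip(trajectory[lo], trajectory[hi])))
    return out


def action_distance(a0, a1):
    """a0, a1: lists of (joint_name, trajectory), one entry per effective
    joint in the order in which the joints occur inside the action."""
    if [name for name, _ in a0] != [name for name, _ in a1]:
        return math.inf  # different sequences of effective joints
    total = 0.0
    for (_, t0), (_, t1) in zip(a0, a1):
        n = max(len(t0), len(t1))  # the shorter one is scaled to the longer
        for p, q in zip(resample(t0, n), resample(t1, n)):
            total += sum((x - y) ** 2 for x, y in zip(p, q))
    return total
```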
Actions with small distances calculated from Equation 1 are
grouped together to form an action group. In the field of boxing,
an action group could contain a set of actions with basic attacks or
defences such as "straight punch", "hook punch", "parry", or a set
of complex actions combining several attacks and defences. Since
member actions in a Fat Edge share the same starting and ending
Fat Nodes, if an action group contains multiple starting or ending
poses, it is sub-divided. In general, novice players normally have
fewer Fat Edges since they have less experience and hence have not
acquired enough techniques to perform in a match. The relationship
between Fat Nodes and Fat Edges is demonstrated in Figure 5.
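
The sub-division of action groups can be sketched as a grouping by (starting node, ending node). The dictionary layout and the key names `start_node` and `end_node` are hypothetical.

```python
from collections import defaultdict


def split_into_fat_edges(action_groups):
    """action_groups: list of groups, each a list of actions; every action
    is a dict carrying the ids of its starting and ending Fat Nodes.

    Returns one Fat Edge per (group, start node, end node) combination.
    """
    fat_edges = []
    for group_id, group in enumerate(action_groups):
        buckets = defaultdict(list)
        for action in group:
            buckets[(action["start_node"], action["end_node"])].append(action)
        for (src, dst), members in buckets.items():
            fat_edges.append(
                {"group": group_id, "src": src, "dst": dst, "actions": members}
            )
    return fat_edges
```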
Figure 5: The Fat Node represents the standard fighting pose. The
three outgoing edges represent different action groups. (Labels
recovered from the figure: "Fat Node (Standard Fighting Pose)",
"upper cut".)
4.4 The Skill Index
It requires deep knowledge and years of experience to assess one's
skills in sports. For the sports of interest, there are two important
indicators. The first one is the richness of the actions that indicates
the resourcefulness of a player. A top player has more than one way
to achieve the same goal where the choice depends on the situation.
The other is the flexibility of transitions between states so that the
player can switch between different states at will. Our graph repre-
sentation captures both of the indicators. The richness can be rep-
resented by the number of Fat Edges between any two Fat Nodes
indicating how many kinds of maneuvers, each with variations, the
player has for transitions from one state to another. The flexibil-
ity is indicated by the connectivity of the graph. A fully connected
graph shows great flexibility because there are transitions between
any two nodes.
However, these two factors are somewhat contradictory. In general,
the richer the actions are, the greater the number of different starting
and ending poses is, and hence the poorer the connectivity of actions is.
Considering either of them independently would not suffice. Therefore,
we define a Skill Index, S, that evaluates the skill level of the
player.
The richness of action is represented by the number of Fat Edges
in the Fat Graph, while the connectivity is inversely proportional to
the number of Fat Nodes. The Skill Index is defined by:
\[
S = \frac{\text{Number of Fat Edges}}{\text{Number of Fat Nodes}} \tag{2}
\]
In our implementation, we do not consider the Fat Nodes that are
not intentionally created. For example, one of our boxers tripped
over during a session. While it is good that our system can ob-
jectively pick up the posture generated by the accident, we do not
include the corresponding Fat Nodes when calculating the Skill In-
dex. Also, we do not consider Fat Edges that contain only one ac-
tion to make sure this edge is not some randomly performed action.
As an example, experienced boxers could perform a large variety
of actions while maintaining the connectivity of actions by limiting
the number of starting and ending poses of actions. In this case, the
Skill Index of the player will be very large.
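
The Skill Index with the two exclusions described above can be sketched as follows. How unintentional Fat Nodes are flagged is not specified in the paper; the boolean flag per node is an assumption, and edges attached to excluded nodes are not treated specially here.

```python
def skill_index(node_is_intentional, edge_action_counts):
    """node_is_intentional: dict mapping Fat Node id -> bool (False for
    nodes created by accidents such as tripping).
    edge_action_counts: number of member actions in each Fat Edge.
    """
    nodes = sum(1 for intentional in node_is_intentional.values() if intentional)
    # Fat Edges containing a single action are ignored.
    edges = sum(1 for count in edge_action_counts if count >= 2)
    if nodes == 0:
        raise ValueError("no intentional Fat Nodes to evaluate")
    return edges / nodes
```

For example, a boxer with two intentional standard poses, one accidental pose and four Fat Edges that contain at least two actions each would receive a Skill Index of 2.0.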
5 Visualization System
The graph representation explained above consists of high dimen-
sional Fat Nodes (groups of similar postures of many degrees of
freedom) and Fat Edges (groups of similar actions), which presents
a challenge for visualization. To decrease the dimensionality for
better visualization, we propose two different algorithms for nodes
and edges because of their different nature in this graph. Specifi-
cally, we project the nodes on a 2D space and represent the edges
with curves. For Fat Nodes, we apply Principal Component Anal-
ysis (PCA) as it creates a more consistent low dimensional space
compared with other methods. For Fat Edges, we apply PCA on
high energy postures of the actions, and use a combination of geo-
metric primitives to visualize the action features.
5.1 Visualizing Fat Nodes
Although human poses normally have many degrees of freedom (DOF)
(45 DOF in our system), these DOFs are intrinsically dependent on
each other. In fact, the Fat Nodes can be
represented effectively in a 2D space where nodes of similar poses
are located together while those of different poses are located far
apart. With this representation, viewers can easily understand the
relationship among postures. In this section, we briefly describe the
process of projecting the high-dimensional poses in Fat Nodes onto a
low-dimensional viewing plane.
We define a posture as a vector of 3D positions of joints, each of
which is computed as its 3D position with respect to the position of
the pelvis. Suppose $i_{Total}$ is the total number of joints; a pose space
$P$ can be defined as $P = \{J_i\}$ where $i \in [0, i_{Total}]$. Each dimension
in the pose space $P$ represents a DOF, and the $j$th dimension in the
pose space is denoted as $P_j$, where $j \in [0, 3i_{Total}]$ and $3i_{Total}$ is the
total number of DOF.
Given a set of zero-meaned poses $p \in P$, it is possible to calculate
the covariance matrix $C$ of size $3i_{Total} \times 3i_{Total}$ to evaluate the
intrinsic dependency of the dimensions in the pose space. The element
at the $m$th column and $n$th row of matrix $C$ is defined as:
\[
C_{m,n} = \mathrm{cov}(P_m, P_n) \tag{3}
\]
where $\mathrm{cov}(P_m, P_n)$ denotes the covariance between the $m$th and $n$th
dimensions in the pose space.
We calculate the eigenvectors from C. The eigenvectors represent
orthogonal dimensions that form a projected space, and each of
them comes with an eigenvalue. Since we wish to project the human
pose onto a 2D space, we select the two eigenvectors with largest
eigenvalues, which indicate the variance of data in the correspond-
ing eigenvector dimension, to form a feature vector.
\[
F = [eig_1, eig_2] \tag{4}
\]
where $eig_1$ and $eig_2$ are the two eigenvectors with the largest eigenvalues.
Once the feature vector is calculated, an arbitrary pose $p$ can
be projected onto the 2D plane by:
\[
p' = F^{\top} p \tag{5}
\]
where $p'$ is the projected 2D coordinate.
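
The projection can be sketched with the standard library only. Instead of a full eigendecomposition, this sketch finds the two leading eigenvectors of the covariance matrix by power iteration with deflation, which is a substitute technique chosen for brevity and not necessarily what the authors used. The function names are hypothetical, and poses are centred inside the function rather than assumed zero-meaned.

```python
import math


def covariance_matrix(poses):
    n, d = len(poses), len(poses[0])
    means = [sum(p[i] for p in poses) / n for i in range(d)]
    centered = [[p[i] - means[i] for i in range(d)] for p in poses]
    cov = [[sum(row[a] * row[b] for row in centered) / (n - 1)
            for b in range(d)] for a in range(d)]
    return means, cov


def top_eigenvector(matrix, iterations=500):
    """Leading eigenpair of a symmetric matrix by power iteration."""
    d = len(matrix)
    v = [1.0 / (i + 1) for i in range(d)]  # arbitrary non-uniform start
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(matrix[r][c] * v[c] for c in range(d)) for r in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    value = sum(v[r] * sum(matrix[r][c] * v[c] for c in range(d))
                for r in range(d))
    return value, v


def pca_2d(poses):
    """Return a function projecting a pose onto the two leading
    principal directions of `poses`."""
    means, cov = covariance_matrix(poses)
    d = len(cov)
    val1, eig1 = top_eigenvector(cov)
    # Deflation: remove the first component, then repeat.
    deflated = [[cov[r][c] - val1 * eig1[r] * eig1[c] for c in range(d)]
                for r in range(d)]
    _, eig2 = top_eigenvector(deflated)

    def project(pose):
        centered = [x - m for x, m in zip(pose, means)]
        return (sum(a * b for a, b in zip(eig1, centered)),
                sum(a * b for a, b in zip(eig2, centered)))

    return project
```

The sign of each projected axis is arbitrary, so two layouts computed from different data sets may be mirrored relative to each other.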
We obtain the mean posture of each Fat Node, and render it with a
humanoid character at the corresponding 2D position using Equa-
tion 5. We use the size of the character to represent the number of
poses that are classified into the node. The more muscular/bigger
the character is, the bigger the node is (Figure 6). In this way, one
can easily observe the poses that the player normally uses to start
actions, and hence identify the incorrectly performed poses. For
example, in boxing, novice boxers sometimes lose track of their
boxing rhythm, and hence start or end a punch with an inappropri-
ate posture.
Figure 6: From left to right, the character becomes bigger and
bigger as the size of the nodes goes up.
5.2 Visualizing Fat Edges
Fat Edges contain information about groups of similar actions; they can
represent motion variation, player resourcefulness, etc. We cannot
apply dimensionality reduction purely based on the action data it-
self because the low dimensional projection would be very com-
plex. We propose to visualize each Fat Edge by a 2D curve that
connects the starting and the ending Fat Nodes.
We represent the number of actions in the edge by the thickness
of the curve. The thickness shows the frequencies of different actions,
which indicate the player's preference for specific actions. For instance, if
a boxer relies heavily on the single straight punch, the Fat Edge for that
action will be unreasonably thick, while edges for other attacks such
as hook punches will be relatively thin, which shows the potential
problem of the lack of diversity of attacking strategies.
Finally, to visually distinguish between different actions, we add
some geometric patterns on the 3D curve. We collect the high-
energy frames of all actions and use the PCA method explained in
the last section to project them onto a 1D space. Since the
high-energy frames of different actions are typically distinguishing
postures, the projection essentially maps all action features onto a
normalized 1D space, denoted by $I \in [-1.0, 1.0]$. We specify some
geometric patterns to represent values in this 1D space. In particu-
lar, we design some patterns for landmark values -1.0, -0.5, 0, 0.5
and 1.0. The patterns to represent values between two landmarks
are obtained by linear interpolation. The patterns for landmark val-
ues in our system are shown in Figure 7 Upper. Given a Fat Edge,
we first obtain a mean action and its corresponding high energy pos-
tures. We then obtain the 1D representation of those postures and
place a corresponding pattern on the edge. The comparison
between Figure 7 Lower Left and Lower Right shows that adding
the geometric patterns gives a better visualization of actions in the
edges. This strategy presents an intuitive way to show the player's
preferences over actions of different complexity.
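
The mapping from a 1D value to a pattern can be sketched as below. Each landmark pattern is represented here by a hypothetical vector of shape parameters, and the normalization uses min-max scaling to [-1, 1]; the paper does not specify either choice.

```python
LANDMARKS = [-1.0, -0.5, 0.0, 0.5, 1.0]


def normalise(values):
    """Min-max scale 1D projections to [-1, 1] (an assumed scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]


def interpolate_pattern(value, patterns):
    """patterns: five parameter vectors, one per landmark value.
    Values between two landmarks are linearly interpolated."""
    value = max(-1.0, min(1.0, value))
    for i in range(len(LANDMARKS) - 1):
        left, right = LANDMARKS[i], LANDMARKS[i + 1]
        if value <= right:
            w = (value - left) / (right - left)
            return [(1 - w) * a + w * b
                    for a, b in zip(patterns[i], patterns[i + 1])]
    return list(patterns[-1])
```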
Figure 7: (Upper) The geometric patterns for landmark values be-
tween -1 and 1. (Lower) Comparison of visualization without/with
the patterns.
Although we use Fat Edges to represent actions in groups, there
may still be many edges in a graph, so we arrange them to avoid
overlap and occlusion. For edges whose starting node differs from
the ending node, the edge direction is fixed. The only adjustable
variable is the side to which the curve bends, which is essentially
the sign of the curve. For edges whose starting node is the same as
the ending node, on the other hand, the edge direction is undefined;
in other words, these edges can point at any angle in the X-Z plane.
In both cases, we select signs and angles such that the edges bend
towards the less dense region of the graph.
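A minimal sketch of this selection is given below. The density measure (a count of already placed points within a radius), the `offset` parameter, and the discrete set of candidate loop angles are all assumptions made for illustration; the text does not say how density is measured or how angles are searched.

```python
import math


def density(point, occupied, radius=1.0):
    """Hypothetical density measure: number of occupied points within
    `radius` of `point` in the X-Z plane."""
    px, pz = point
    return sum(1 for (x, z) in occupied if math.hypot(x - px, z - pz) <= radius)


def choose_bend_sign(start, end, occupied, offset=1.0, radius=1.0):
    """Return +1 or -1: the side of the segment start->end on which the
    curve's apex falls in the less crowded region (ties go to +1)."""
    (x0, z0), (x1, z1) = start, end
    dx, dz = x1 - x0, z1 - z0
    length = math.hypot(dx, dz)
    if length == 0.0:
        raise ValueError("start and end coincide; use choose_loop_angle instead")
    mx, mz = (x0 + x1) / 2.0, (z0 + z1) / 2.0
    # Unit normal to the segment within the X-Z plane.
    nx, nz = -dz / length, dx / length
    best_sign, best_density = 1, None
    for sign in (1, -1):
        apex = (mx + sign * offset * nx, mz + sign * offset * nz)
        d = density(apex, occupied, radius)
        if best_density is None or d < best_density:
            best_sign, best_density = sign, d
    return best_sign


def choose_loop_angle(node, occupied, offset=1.0, radius=1.0, steps=8):
    """For an edge that starts and ends at the same node, pick the angle
    (radians, X-Z plane) whose loop centre is least crowded among
    `steps` evenly spaced candidates."""
    x, z = node
    angles = [2.0 * math.pi * k / steps for k in range(steps)]
    return min(
        angles,
        key=lambda a: density(
            (x + offset * math.cos(a), z + offset * math.sin(a)), occupied, radius
        ),
    )
```

In such a scheme, `occupied` would hold the positions of nodes and of edges that have already been placed, so that each new edge is pushed away from them.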
5.3 Interactive Features
We integrate some interactive features in our system to incrementally
display relevant information based on the user input. The user can
interact with our system using standard input devices such as a
mouse and keyboard. When the user selects an entity in the graph,
related information is shown.
For example, when a Fat Node is selected, its corresponding Fat
Edges are highlighted. The number of members in that node, the number
of outgoing edges, and the number of incoming edges are displayed in
a sub-window. On the other hand, when a Fat Edge is selected, we
render the actions included in the edge one by one, such that the user
can inspect the content of the edge.
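The information shown for a selected Fat Node could be gathered as in the sketch below. The `FatEdge` and `FatGraph` containers and the `node_summary` helper are hypothetical names introduced only for illustration and are not taken from the system.

```python
from dataclasses import dataclass, field


@dataclass
class FatEdge:
    source: str                                   # id of the starting Fat Node
    target: str                                   # id of the ending Fat Node
    actions: list = field(default_factory=list)   # member actions in this edge


@dataclass
class FatGraph:
    node_members: dict   # node id -> list of member postures
    edges: list          # list of FatEdge


def node_summary(graph, node_id):
    """Collect the figures displayed when a Fat Node is selected.
    An edge that starts and ends at the node counts as both outgoing
    and incoming."""
    outgoing = [e for e in graph.edges if e.source == node_id]
    incoming = [e for e in graph.edges if e.target == node_id]
    return {
        "members": len(graph.node_members[node_id]),
        "outgoing_edges": len(outgoing),
        "incoming_edges": len(incoming),
    }
```

The `outgoing` list is also what a renderer would highlight on selection, and iterating over `edge.actions` corresponds to playing back the content of a selected Fat Edge one action at a time.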
6 Experiments
In this section, we present experimental results. We captured motions
of four boxers with different skill levels. We first give detailed
motion analysis and visualization of individual motions, then compare
them side by side. The results demonstrate that our system is an
effective tool for motion analysis, skill assessment and comparison.
As it is difficult to show the motions in pictures, we strongly
suggest that the readers watch the supplementary video for more
details.
The four boxers chosen have different skill levels. As a ground
truth, they were evaluated and labelled as Skillful, Medium,
Medium and Novice, denoted by S, M2, M1 and N. We show the
subjects and their Fat Graphs respectively. To fully explain the features
of the visualization system, we first show the Fat Graph of
Boxer S in Figure 8.
6.1 The Visualization System
In Figure 8, there are three Fat Nodes indicated by red arrows and
numbered 1, 2 and 3, each visualized as a character in the mean
posture of the node. The sizes of the nodes are indicated by the
body shapes. Node 1 has the most muscular character, which means it
is the biggest node; Nodes 2 and 3 are far smaller. Fat Edges are
rendered as curves between nodes, such as the ones labelled 4 and 5.
As we explained, the thickness of an edge indicates the frequency of
the actions taken. Edge 5 is much thicker than edge 4, suggesting
this boxer takes the action in edge 5 more often. In addition, an
edge can be smooth like a circle, or bumpy with the geometric patterns
shown in Figure 7 and augmented into 3D. A single pattern means one
activity segment, e.g. a left punch, while multiple patterns mean a
series of activities such as a combo attack.
In addition to the basic features, our system supports interactive
demonstrations. Figure 8 is the result when the mouse hovers over
Node 1. All the edges starting from this node are highlighted,
each with a small character performing the action on it. This gives
the user the flexibility to look at the actions from that node. Also,
if the screen looks too crowded with all actions performed at the same
time, the user can move the mouse onto a specific edge, in which case
our system renders only one character performing that action on the
edge.
6.2 Boxer Evaluation
Next, we show what assessments we can make by looking at the Fat
Graph of Boxer S. First of all, a good boxer always tries to stay in a
defensive mode whenever he/she is not attacking or dodging. Even
after making an attack or dodge, he/she is supposed to resume the
defense mode. From this point of view, Boxer S has a good grasp
of the principle. This can be seen in that Node 1 is the standard
defense posture and also the biggest node, which means Boxer S
stays in the defense mode most of the time. Also, most of the edges
leave and go back to it, which means that no matter what actions Boxer
S takes, he returns immediately to defense mode. Second, there are
a good number of actions starting from and going back to Node 1, which
indicates Boxer S is resourceful and has mastered a variety of actions
for different purposes. To look more closely at the edges,
we number the innermost edge as 1 and increment the index outwards.
It can be easily seen that edges 1-3 are smooth circles without
any geometric patterns, showing they are simple stepping strategies,
which means Boxer S moves a lot while waiting for the right timing
to attack. Starting from the 4th edge (the edge labelled 5 in Figure 8
is the 5th edge), there are many attacking actions shown. Interestingly,
the 4th and 5th edges have only one geometric pattern while
the 6th and onward edges have more. Because edges 4 and 5 are thicker,
this shows that Boxer S prefers simple attacks such as one punch and
uses fewer combo attacks. For a skillful boxer, this makes sense:
complex attacks also expose the attacker himself to attacks
(because it takes a longer time to resume the defense posture),
so they are usually used less.
Besides, we can also spot skill flaws in the Fat Graph. Node 2
is a deformed defense posture because the two hands are lower than
they are supposed to be. This node is much smaller and there are
only a few edges between Node 1 and Node 2. It means Boxer S
sometimes stays in Node 2, which exposes his head to attacks. This
is a typical mistake made by even professional boxers, especially
when they are tired or focusing on looking for opportunities to
attack. In fact, raising the hands for defense is common advice
from boxing coaches. Node 3 is a special case. There is just
one thin edge connecting Node 2 to Node 3 because it was captured
when the boxer accidentally tripped during the motion capture. We
show it here for completeness.
Figure 8: The Fat Graph of Boxer S. 1, 2 and 3 are Fat Nodes. 4 and
5 are two Fat Edges. 4 connects Node 2 and Node 3. 5 connects Node 1
to itself.
Through Figure 8, one can see that our system provides intuitive vi-
sualization of boxing motions including posture preference, action
variety, action preference, skill flaws, etc. Next, we also show the
Fat Graphs of the other three boxers in Figure 9.
From Figure 9, one can intuitively understand why the boxers have their
respective skill levels. The top one is a novice boxer. The graph has
five nodes with four of them being almost equally muscular (the
four near the center). It means the boxer transits among them most
of the time, and all of them except for the green node in the center
have ineffective defense postures, e.g. the hands are too low or too
far apart, making him very vulnerable in boxing. The green one
is the comparatively dominant posture, in which the legs are wider
apart. The blue one is a secondary posture with a narrower leg
distance. The red and the orange are less used postures. Leg movement
is important in boxing to efficiently launch actions. The postures with
wider leg distance are considered inferior, as they limit the ability
to make a swift evasive move.
The middle is Boxer M1, who has medium skills but is slightly worse
than Boxer M2. He apparently knows the principle of holding
a defense posture all the time, as shown by the purple character
in the middle, which is the biggest node. Also, most of the actions
leave and go back to this node, indicating he is aware that after an
action, he is supposed to resume the defense posture. However,
overall there are many more nodes. Although they are relatively
small, they mean there are times when Boxer M1 forgets to hold the
defense posture. In addition, the purple main posture has a large
variety of actions. However, it is also obvious that some actions are
too complex and are only conducted once or twice. This means that
the boxer lacks consistency in boxing techniques. The blue posture
is a subtle preparation movement for a right punch. This should
be avoided, as the opponent can tell his move when seeing such a
posture.
       Boxer N   Boxer M1   Boxer M2   Boxer S
PN     176       160        112        138
FNN    6         6          3          3
AN     88        80         56         69
FEN    57        36         16         20
SI     1.3       3.0        2.3        5.0
SL     Novice    Medium     Medium     Skillful
Table 1: Statistics of the boxing motions. PN: Posture Number.
FNN: Fat Node Number. AN: Action Number. FEN: Fat Edge Number.
SI: Skill Index. SL: Skill Level.
The bottom is Boxer M2, whose skill level is the closest to Boxer
S. Their Fat Graphs also look very similar, which shows the consistency
of our system. Boxer M2 rarely leaves the defense posture
shown by the purple character in the middle of Figure 9 Bottom. However,
the biggest difference between him and Boxer S is the actions. The
boxer has a large number of locomotion actions. The first edge (the
innermost one) is a circle, which indicates stepping. Note that he relies
too heavily on the right-left combo (the second edge), which should be
avoided as the opponent could take advantage of such a frequently
launched combo. There is stepping in different directions, shown by
the third and fourth edges. Then, there are some left punches (the
fifth edge). The boxer also has a number of other boxing combos
to enrich the variety of actions. The green and the blue postures
should be avoided, as they deviate from the core purple
posture. Only limited actions can be started from them.
6.3 Graph Statistics
Finally, we give some statistics about the four Fat Graphs in Table
1. We calculate the Skill Index using Equation 2, and it aligns with
the boxers' skill levels. Notice that when calculating the Skill Index,
we do not consider Fat Nodes that are unintentionally generated,
such as by tripping accidentally during the capture session. We
also do not consider Fat Edges that have only one member action
to ensure that the evaluated action can be conducted by the boxer
Figure 9: Top: Boxer N, the novice boxer. Middle: Boxer M1, the
boxer with basic skills. Bottom: Boxer M2, the boxer with medium
skills.
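The filtering applied before computing the Skill Index in Section 6.3 can be sketched as below. The function name, the tuple layout, and the manual flagging of accidental nodes are assumptions for illustration. Dropping every edge that touches an excluded node is likewise an assumption that follows from removing the node. The Skill Index itself (Equation 2) is not reproduced here.

```python
def prune_for_skill_index(node_ids, edges, accidental_nodes):
    """Return the (nodes, edges) kept for Skill Index evaluation.

    node_ids: iterable of Fat Node identifiers.
    edges: list of (source, target, actions) tuples, where `actions` is the
        list of member actions grouped into that Fat Edge.
    accidental_nodes: identifiers flagged by hand (in this sketch) as
        unintentional, e.g. a trip during capture.
    """
    skip = set(accidental_nodes)
    kept_nodes = [n for n in node_ids if n not in skip]
    kept_edges = [
        (s, t, a)
        for (s, t, a) in edges
        # Drop edges touching an excluded node (assumption) and edges
        # that contain only a single member action.
        if s not in skip and t not in skip and len(a) >= 2
    ]
    return kept_nodes, kept_edges
```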
7 Conclusions
In this paper, we proposed a method to visualize the skills of ath-
letes using Fat Graphs from a higher level point of view. With our
algorithm, the flexibility and the richness of the motions can be clar-
ified. In our motion organization system, we introduced a generic
motion segmentation and classification method to analyze raw cap-
tured sports motions. We also applied a hierarchical motion graph
called Fat Graph to arrange segmented motion in a meaningful way.
Then, in our visualization system, we utilized dimensionality re-
duction techniques to project the Fat Graph onto a 2D plane, and
rendered it in a 3D space with special 3D features.
In our experiments, we showed the differences between the Fat
Graph of a novice player and that of an experienced player: the
former is much sparser and less connected, while the latter is dense
and well-connected. We made use of the proposed skill index to
objectively evaluate the skill level of players in terms of flexibility
and connectivity of motions. Moreover, we discussed some of the
potential problems of the sports players by analyzing their corre-
sponding Fat Graphs. This information is useful for the players to
check their performances and plan their future training.
In this research, we focus on analyzing the skill level of the boxers
in terms of high-level motion behaviour such as the richness of the
action and the transition of action. We do not evaluate the lower-
level parameters such as the speed of the punches, which has been
explored in previous works. It is an interesting future direction to
combine both high-level and low-level evaluation in order to have a
full assessment of the boxers.
There are limitations to our method. First, our method is based
on the assumption that the sports skills mainly consist of a finite
number of key postures and transitions in between. Admittedly, not
all sports follow this pattern. Second, the visualization and skill
assessment are based on an individual athlete and do not consider skills
related to collaboration, such as those in group sports, for which the
assessment might need to employ different criteria.
In the future, we wish to extend the proposed algorithm to the field
of computer animation. Currently, when synthesizing animations
by motion graphs, experienced animators are required to tell what
motions are missed or badly captured. With our system, it is pos-
sible to analyze the connectivity and variety of a motion set, which
are two critical factors in motion synthesis. However, how to gener-
alize these findings to give high-level suggestion, such as proposing
the motions to capture, remains an open problem. In addition, we
would like to develop a visualization system that takes into account the
adversarial nature of sports. For instance, although two boxers might have
roughly the same skill level, in a match, one’s skill composition
might give him/her advantages over the other. This kind of analysis
would be very useful in preparation for a game or predicting the
outcome.
Acknowledgements
This work was supported in part by the Engineering and Physical
Sciences Research Council (EPSRC) (Ref: EP/M002632/1),
Hong Kong Baptist University Science Faculty Research Grants
(Ref: FRG2/14-15/105), NSFC Young Scientist Research Grant
(Project no.: 61302176) and the Hong Kong Research Grant Coun-
cil (Project no.: GRF210813).
... Based on a similar motivation, work in other education domains, such as sports training [22,33,37], has started to increasingly leverage data-driven approaches to expand this feedback loop with an additional data angle. Visual analysis can then be used to guide decisions and increase motivation [15]. ...
Full-text available
We propose a data-driven approach to music instrument practice that allows studying patterns and long-term trends through visualization. Inspired by life logging and fitness tracking, we imagine musicians to record their practice sessions over the span of months or years. The resulting data in the form of MIDI or audio recordings can then be analyzed sporadically to track progress and guide decisions. Toward this vision, we started exploring various visualization designs together with a group of nine guitarists, who provided us with data and feedback over the course of three months.
... In this section, we will first review existing examples of AR in various industries. While Virtual Reality (VR) and interactive computer graphics have been used for teaching and learning, such as partner dancing [3], visualizing wrestling [4], [5] and boxing [6], [7] skills, in the last two decades, more attention has been paid on vision-based frameworks which make use of cameras and sensors. By capturing the information from the surrounding using cameras and sensors, useful feedback can be provided to the user, such as posture monitoring [8] and interacting with virtual objects using body movement [9], [10]. ...
Conference Paper
Full-text available
Connecting network cables to network switches is a time-consuming and inefficient task, and requires extensive documentation and preparation beforehand to ensure no service faults are encountered by the users. In this paper, a new AR smartphone application that overlays network switch information over the user’s vision is designed and developed for real working environment to increase user’s efficiency in working with a network switch. Specifically, the prototype of the AR App is developed on the Android platform using both the Unity game engine and Vuforia AR library and connecting to the network switch to retrieve network information through telnet. By using the camera on the smartphone for capturing the visual information from the working environment, i.e. the network switch in this App, the network switch information such as speed, types, etc. will be overlaid on each port on the smartphone screen. A user study was conducted to evaluate the effectiveness of the AR App to assist users in performing network tasks. In particular, participants were tasked with connecting switchports to a patch panel to match up corresponding configurations. After three tests, it was found that the times for completion and mistakes made were reduced in the final test when compared to the first. This highlights the positive effects of the application in improving the user’s efficiency.
... When a human player encounters such a character in a virtual environment, a close interaction with a virtual character is usually desired. When the human player interacts with the character through body movements, one would expect real-time feedback from the virtual character, such as wrestling [15], hand shaking [33], sword fighting [43], or boxing [34]. A human-like reaction by the virtual character would greatly enhance the immersion of the gaming experience. ...
Full-text available
In this paper, we propose a generative recurrent model for human-character interaction. Our model is an encoder-recurrent-decoder network. The recurrent network is composed by multiple layers of long short-term memory (LSTM) and is incorporated with an encoder network and a decoder network before and after the recurrent network. With the proposed model, the virtual character’s animation is generated on the fly while it interacts with the human player. The coming animation of the character is automatically generated based on the history motion data of both itself and its opponent. We evaluated our model based on both public motion capture databases and our own recorded motion data. Experimental results demonstrate that the LSTM layers can help the character learn a long history of human dynamics to animate itself. In addition, the encoder–decoder networks can significantly improve the stability of the generated animation. This method can automatically animate a virtual character responding to a human player.
Automatic evaluation of sports skills has been an active research area. However, most of the existing research focuses on low-level features such as movement speed and strength. In this work, we propose a framework for automatic motion analysis and visualization, which allows us to evaluate high-level skills such as the richness of actions, the flexibility of transitions and the unpredictability of action patterns. The core of our framework is the construction and visualization of the posture-based graph that focuses on the standard postures for launching and ending actions, as well as the action-based graph that focuses on the preference of actions and their transition probability. We further propose two numerical indices, the Connectivity Index and the Action Strategy Index, to assess skill level according to the graph. We demonstrate our framework with motions captured from different boxers. Experimental results demonstrate that our system can effectively visualize the strengths and weaknesses of the boxers.
Full-text available
Being marker-free and calibration free, Microsoft Kinect is nowadays widely used in many motion-based applications, such as user training for complex industrial tasks and ergonomics pose evaluation. The major problem of Kinect is the placement requirement to obtain accurate poses, as well as its weakness against occlusions. To improve the robustness of Kinect in interactive motion-based applications, real-time data-driven pose reconstruction has been proposed. The idea is to utilize a database of accurately captured human poses as a prior to optimize the Kinect recognized ones, in order to estimate the true poses performed by the user. The key research problem is to identify the most relevant poses in the database for accurate and efficient reconstruction. In this paper, we propose a new pose reconstruction method based on modeling the pose database with a structure called Filtered Pose Graph, which indicates the intrinsic correspondence between poses. Such a graph not only speeds up the database poses selection process, but also improves the relevance of the selected poses for higher quality reconstruction.We apply the proposed method in a challenging environment of industrial context that involves sub-optimal Kinect placement and a large amount of occlusion. Experimental results show that our real-time system reconstructs Kinect poses more accurately than existing methods.
Full-text available
Depth sensor based 3D human motion estimation hardware such as Kinect has made interactive applications more popular recently. However, it is still challenging to accurately recognize postures from a single depth camera due to the inherently noisy data derived from depth images and self-occluding action performed by the user. In this paper, we propose a new real-time probabilistic framework to enhance the accuracy of live captured postures that belong to one of the action classes in the database. We adopt the Gaussian Process model as a prior to leverage the position data obtained from Kinect and marker-based motion capture system. We also incorporate a temporal consistency term into the optimization framework to constrain the velocity variations between successive frames. To ensure that the reconstructed posture resembles the accurate parts of the observed posture, we embed a set of joint reliability measurements into the optimization framework. A major drawback of Gaussian Process is its cubic learning complexity when dealing with a large database due to the inverse of a covariance matrix. To solve the problem, we propose a new method based on a local mixture of Gaussian Processes, in which Gaussian Processes are defined in local regions of the state space. Due to the significantly decreased sample size in each local Gaussian Process, the learning time is greatly reduced. At the same time, the prediction speed is enhanced as the weighted mean prediction for a given sample is determined by the nearby local models only. Our system also allows incrementally updating a specific local Gaussian Process in real time, which enhances the likelihood of adapting to run-time postures that are different from those in the database. Experimental results demonstrate that our system can generate high quality postures even under severe self-occlusion situations, which is beneficial for real-time applications such as motion-based gaming and sport training.
Full-text available
In this paper we present a virtual tennis game. We describe the creation and modeling of the virtual humans and body deformations, also showing the real-time animation and rendering aspects of the avatars. We focus on the animation of the virtual tennis ball and the behavior of a synthetic, autonomous referee who judges the tennis games. The networked, collaborative, virtual environment system is described with special reference to its interfaces to driver programs. We also mention the virtual reality (VR) devices that are used to merge the interactive players into the virtual tennis environment, together with the equipment and technologies employed for this exciting experience. We conclude with remarks on personal experiences during the game and on future research topics to improve parts of the presented system.
We present a framework to synthesize character movements based on high level parameters, such that the produced movements respect the manifold of human motion, trained on a large motion capture dataset. The learned motion manifold, which is represented by the hidden units of a convolutional autoencoder, represents motion data in sparse components which can be combined to produce a wide range of complex movements. To map from high level parameters to the motion manifold, we stack a deep feedforward neural network on top of the trained autoencoder. This network is trained to produce realistic motion sequences from parameters such as a curve over the terrain that the character should follow, or a target location for punching and kicking. The feedforward control network and the motion manifold are trained independently, allowing the user to easily switch between feedforward networks according to the desired interface, without re-training the motion manifold. Once motion is generated it can be edited by performing optimization in the space of the motion manifold. This allows for imposing kinematic constraints, or transforming the style of the motion, while ensuring the edited motion remains natural. As a result, the system can produce smooth, high quality motion sequences without any manual pre-processing of the training data.
The behavioral structure of human movements is imposed by multiple sources, such as rules, regulations, choreography, habits, and emotion. Our goal is to identify the behavioral structure in a specific application domain and create a novel sequence of movements that abide by structure-building rules. To do so, we exploit the ideas from formal language, such as rewriting rules and grammar parsing, and adapted those ideas to synthesize the three-dimensional animation of multiple characters. The structured motion synthesis using motion grammars is formulated in two layers. The upper layer is a symbolic description that relates the semantics of each individual's movements and the interaction among them. The lower layer provides spatial and temporal contexts to the animation. Our multi-level MCMC (Markov Chain Monte Carlo) algorithm deals with the syntax, semantics, and spatiotemporal context of human motion to produce highly-structured, animated scenes. The power and effectiveness of motion grammars are demonstrated in animating basketball games from drawings on a tactic board. Our system allows the user to position players and draw out tactical plans, which are animated automatically in virtual environments with three-dimensional, full-body characters.
Conference Paper
Real-time control of three-dimensional avatars is an important problem in the context of computer games and virtual environments. Avatar animation and control is difficult, however, because a large repertoire of avatar behaviors must be made available, and the user must be able to select from this set of behaviors, possibly with a low-dimensional input device. One appealing approach to obtaining a rich set of avatar behaviors is to collect an extended, unlabeled sequence of motion data appropriate to the application. In this paper, we show that such a motion database can be preprocessed for fle xibility in behavior and efficient search and exploited for real-time avatar control. Flexibility is created by identifying plausible transitions between motion segments, and efficient search through the resulting graph structure is obtained through clustering. Three interface techniques are demonstrated for controlling avatar motion using this data structure: the user selects from a set of available choices, sketches a path through an environment, or acts out a desired motion in front of a video camera. We demonstrate the flexibility of the approach through four different applications and compare the avatar motion to directly recorded human motion.
Scientists working with large volumes of high-dimensional data, such as global climate patterns, stellar spectra, or human gene distributions, regularly confront the problem of dimensionality reduction: finding meaningful low-dimensional structures hidden in their high-dimensional observations. The human brain confronts the same problem in everyday perception, extracting from its high-dimensional sensory inputs (30,000 auditory nerve fibers or 10^6 optic nerve fibers) a manageably small number of perceptually relevant features. Here we describe an approach to solving dimensionality reduction problems that uses easily measured local metric information to learn the underlying global geometry of a data set. Unlike classical techniques such as principal component analysis (PCA) and multidimensional scaling (MDS), our approach is capable of discovering the nonlinear degrees of freedom that underlie complex natural observations, such as human handwriting or images of a face under different viewing conditions. In contrast to previous algorithms for nonlinear dimensionality reduction, ours efficiently computes a globally optimal solution, and, for an important class of data manifolds, is guaranteed to converge asymptotically to the true structure.
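The approach in this abstract is commonly known as Isomap: local neighbour distances are chained into shortest-path (geodesic) estimates, which are then embedded with classical MDS. A minimal NumPy sketch of that pipeline follows. The function name `isomap_sketch`, its parameters, and the Floyd-Warshall shortest-path step are illustrative choices for small inputs, not the authors' implementation.

```python
import numpy as np

def isomap_sketch(X, n_neighbors=5, n_components=2):
    """Embed the rows of X (shape (n, d)) into n_components dimensions."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # 1. Local metric information: pairwise Euclidean distances.
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    # 2. k-nearest-neighbour graph; non-neighbours start at infinity.
    G = np.full((n, n), np.inf)
    np.fill_diagonal(G, 0.0)
    idx = np.argsort(D, axis=1)[:, 1:n_neighbors + 1]  # column 0 is the point itself
    rows = np.repeat(np.arange(n), n_neighbors)
    cols = idx.ravel()
    G[rows, cols] = D[rows, cols]
    G[cols, rows] = D[rows, cols]  # make the graph undirected
    # 3. Geodesic estimates via Floyd-Warshall (O(n^3), fine for small n).
    for k in range(n):
        G = np.minimum(G, G[:, k:k + 1] + G[k:k + 1, :])
    if np.isinf(G).any():
        raise ValueError("neighbourhood graph is disconnected; increase n_neighbors")
    # 4. Classical MDS on the geodesic distance matrix.
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (G ** 2) @ J                  # double-centred Gram matrix
    w, V = np.linalg.eigh(B)                     # eigenvalues in ascending order
    order = np.argsort(w)[::-1][:n_components]   # largest eigenvalues first
    return V[:, order] * np.sqrt(np.clip(w[order], 0.0, None))
```

For points sampled along a curved arc, a one-dimensional embedding from this sketch should track the position along the arc, which a linear projection such as PCA cannot do in general.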
This paper presents a novel solution for realtime generation of stylistic human motion that automatically transforms unlabeled, heterogeneous motion data into new styles. The key idea of our approach is an online learning algorithm that automatically constructs a series of local mixtures of autoregressive models (MAR) to capture the complex relationships between styles of motion. We construct local MAR models on the fly by searching for the closest examples of each input pose in the database. Once the model parameters are estimated from the training data, the model adapts the current pose with simple linear transformations. In addition, we introduce an efficient local regression model to predict the timings of synthesized poses in the output style. We demonstrate the power of our approach by transferring stylistic human motion for a wide variety of actions, including walking, running, punching, kicking, jumping and transitions between those behaviors. Our method achieves superior performance in a comparison against alternative methods. We have also performed experiments to evaluate the generalization ability of our data-driven model as well as the key components of our system.
This paper introduces a new generative statistical model that allows for human motion analysis and synthesis at both semantic and kinematic levels. Our key idea is to decouple complex variations of human movements into finite structural variations and continuous style variations and encode them with a concatenation of morphable functional models. This allows us to model not only a rich repertoire of behaviors but also an infinite number of style variations within the same action. Our models are appealing for motion analysis and synthesis because they are highly structured, contact aware, and semantic embedding. We have constructed a compact generative motion model from a huge and heterogeneous motion database (about two hours of mocap data and more than 15 different actions). We have demonstrated the power and effectiveness of our models by exploring a wide variety of applications, ranging from automatic motion segmentation, recognition, and annotation, to online/offline motion synthesis at both kinematics and behavior levels, to semantic motion editing. We show the superiority of our model by comparing it with alternative methods.