KeyFrame Extraction for Human Motion Capture
Data via Multiple Binomial Fitting
Chenxu Xu1, Wenjie Yu1, Yanran Li4, Xuequan Lu5, Meili Wang1,2,3 and Xiaosong Yang4
1College of Information Engineering, Northwest A&F University, Yangling 712100, China
2Key Laboratory of Agricultural Internet of Things, Ministry of Agriculture, Yangling 712100, China
3Shaanxi Key Laboratory of Agricultural Information Perception and Intelligent Service, Yangling 712100, China
4NCCA, Bournemouth University
5School of Information Technology, Deakin University, Waurn Ponds Campus
Corresponding author: wml@nwsuaf.edu.cn
Abstract
In this paper, we make two contributions. First, we propose a new keyframe extraction algorithm that reduces keyframe redundancy and lowers the motion sequence reconstruction error. Second, we propose a new motion sequence reconstruction method that further reduces the reconstruction error. Specifically, we treat the input motion sequence as a set of curves and apply piecewise binomial fitting to locate the points where the slope changes dramatically. We then take these points as input to density clustering to obtain the keyframes. Finally, the motion curves are segmented at the keyframes, and each segment is fitted with a binomial again to obtain the binomial parameters for motion reconstruction. Experiments show that our method outperforms existing techniques in terms of reconstruction error.
Keywords:
keyframe extraction, motion capture,
curve simplification, computer animation
1 Introduction
Motion capture plays an important role in computer games, virtual environments, movie effects, robotics [1, 2, 3], motion prediction [4], and other fields. Motion capture originates from authentic
motion, so even complex motion can produce
real effects in real time, which is very attrac-
tive for related fields. However, there still exist
significant problems while using motion capture
data. Human motion data is usually captured
at high frequencies, which leads to a very large
amount of data. As a result, it is difficult to store, retrieve, browse, and reuse the captured data. Keyframe extraction offers a way to address these problems. As its name sug-
gests, the technique of keyframe extraction is
to select the most representative frames from
the motion sequence to represent the whole mo-
tion sequence. It eliminates redundant frames
in motion sequences, saves the space needed for
storage, facilitates the browsing and editing of
motion sequences, and improves the reuse effi-
ciency of motion capture sequences. From what
has been mentioned above, keyframes should
have characteristics as follows: (1) Keyframes
should be as few as possible while reflecting the
overall movement trend of motion sequence. (2)
Keyframes can reconstruct the motion sequence
with minimum error by interpolation.
In this paper, a curve fitting method is pro-
posed to extract keyframes from motion capture
sequences. First, the input motion data is re-
garded as a set of rotation information curves,
because, except for the three records that store the location of the root joint, all other records store joint rotation information. We then identify the areas of the curves where the slope changes dramatically; the points in these areas best represent the overall trend of the joint motion. We employ binomial functions to piecewise-fit the rotation information curves so that each segment meets a specified goodness-of-fit R², and obtain the segment points. These segment points are then clustered by density, and the resulting cluster centers are optimized to obtain the keyframes. At the same time, we also propose a motion data reconstruction method based on piecewise binomial fitting, whose reconstruction error is lower than that of currently popular methods.
2 Related Work
2.1 KeyFrame Extraction
The methods of keyframe extraction mainly include curve simplification-based, matrix decomposition-based, cluster-based, and optimization-based approaches. Curve simplification takes
a frame as a point and a motion sequence as a
curve in a high-dimensional space, and realizes
keyframe extraction by finding a series of points
that can depict the whole curve well [5, 6, 7].
However, the keyframe is extracted according to
the local extremum of the motion curve, and oth-
er parts of the curve with obvious slope change
are ignored. Matrix decomposition-based methods represent the motion sequence as a motion matrix and approximately decompose it into a weight matrix and a keyframe matrix [8, 9]. However, these methods are time-consuming, tend to ignore temporal information, and the extracted keyframes may not cover the whole sequence.
Algorithms concerning clustering-based and
optimization-based keyframe extraction have become a hot topic in recent years [10, 11]. B. Sun et al. [12] proposed a keyframe extraction method based on affinity propagation (AP) clustering to adaptively search for the best keyframes of a video. The algorithm is fast, easy to implement, and can extract high-quality keyframes. Q. Zhang et al. [13] used the ISODATA dy-
namic clustering algorithm to cluster all frames
and extracted the frames close to the clustering
center as keyframes. This method is suitable for
a variety of motion types without user-specified
parameters. However, the keyframes extracted by clustering methods are not in chronological order, which makes it difficult to represent the original motion sequence.
Y. Zhang et al. [14] proposed a grey wolf optimization algorithm in which the extracted keyframes summarize the original motion sequence well and maintain the consistency of keyframes between similar motion sequences. A multiple population genetic algorithm proposed by Q. Zhang et al. [15] mimics the mechanisms of natural selection and evolution; the evolution process needs no manually specified threshold parameters and converges quickly, but it still cannot specify the number of keyframes. G. Y. Xia et al. [16] proposed a novel model called "joint kernel sparse representation", which can model the sparseness of human motion and its Riemannian manifold structure and extract keyframes from it. However, it does not effectively solve the problem of keyframe redundancy, and its time complexity is O(n²).
2.2 Motion data reconstruction
Linear interpolation and quaternion interpola-
tion are commonly used in motion reconstruction
[5, 6, 12, 14]. In animation, linear interpolation and quaternion interpolation work well for intermediate frame compensation. Data fitting can also correct motion capture errors caused by the absence of force feedback, effectively reducing the motion error, and has good application value [17]. However, if the
original motion sequence is known, using these
two methods to reconstruct the motion sequence
will cause a large reconstruction error, because
they completely discard the constraints of the
non-keyframe parts on the rotation information
curves.
3 The proposed Methodology
We solved two problems, keyframe extraction
and motion data reconstruction, which will be
discussed separately in the following sections.
3.1 KeyFrame Extraction Method
The keyframe extraction method consists of three steps: curve simplification, key point and keyframe acquisition, and binomial fitting parameter acquisition. We treat the rotation information
in each direction of the joints and the movement
information in each direction of the root joint as a
single curve. Then the binomial fitting is used to
fit the curve in segments, and the points of each
segment of the curve are obtained. All of these
points are used as a set for density clustering, and
the density centers are extracted as the keyframes.
Finally, the rotation information curves are seg-
mented by the keyframes, and binomial fitting is
performed by the least square method to obtain
the fitting parameters.
3.1.1 Simplified Curve Model
When the human body moves, many joints change at the same time; the information is therefore rich and the computation is complicated. Human motion capture records the rotation information of each joint and the translation information of the root joint. This information is three-dimensional, and each dimension of each joint records one value per frame. Some joints have only one or two degrees of freedom, so their rotation information has only one or two dimensions. According to
the characteristics of human motion capture data,
we take time as the independent variable and the
motion information of each dimension of human
joints as dependent variables, as a result of which,
the integrated motion is reduced to simple curves.
We use the curve L_i to represent the change of rotation of a joint in one direction or the trajectory of the root joint in one direction. For each motion sequence, we obtain 55 rotation information curves and 3 motion trajectory curves. The motion sequence is then expressed as a set of curves M = {L_1, ..., L_i, ..., L_m}, where i indexes the ith curve and m is 58.
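As a concrete illustration (not part of the original paper), the curve set M could be assembled as follows once the per-frame channels have been parsed from the capture file; the array shapes and the function name are our own assumptions.

```python
import numpy as np

def build_curve_set(rotations, root_positions):
    """Arrange a motion sequence as the curve set M of Section 3.1.1.
    rotations:      array of shape (n_frames, 55) - one rotation channel per column
    root_positions: array of shape (n_frames, 3)  - root translation in x, y, z
    Returns a list of m = 58 one-dimensional curves L_i, each of length n_frames."""
    channels = np.hstack([rotations, root_positions])   # (n_frames, 58)
    return [channels[:, i] for i in range(channels.shape[1])]
```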
3.1.2 Key Points Extraction
It is difficult to describe a continuous rotation information curve with a single function L = f(x), but it is feasible to divide the curve into small segments and fit each segment with a binomial. So L_i can be expressed as:

\[
L_i =
\begin{cases}
l_{i,1} = f_{i,1}(x_j), & j \in [1, n_{i1}) \\
l_{i,2} = f_{i,2}(x_j), & j \in [n_{i1}, n_{i2}) \\
\quad \vdots \\
l_{i,k} = f_{i,k}(x_j), & j \in [n_{i,k-1}, n_{ik})
\end{cases}
\tag{1}
\]
where n_{i1}, n_{i2}, ..., n_{i,k-1} are segment points, there are n frames in the motion sequence, and j denotes the jth frame. The rotation information curve is thus divided into k segments to fit, and the value of k differs for different L_i. We want to preserve the fitting quality while using as few segments as possible. To achieve this goal, the goodness-of-fit of the regression equation, R², is introduced to constrain the piecewise fitting and find the longest possible segments.
R² is calculated as follows:

\[
R^2 = \frac{SSR}{SST}
    = \frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
    = 1 - \frac{SSE}{SST}
    = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
\tag{2}
\]

where SST is the total sum of squares, SSR is the regression sum of squares, SSE is the residual sum of squares, and SST = SSR + SSE, with SST = \sum_{i=1}^{n}(y_i - \bar{y})^2, SSR = \sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2, and SSE = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2.
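For concreteness, Eq. (2) combined with the first-order least-squares fit used throughout this paper can be sketched in Python as follows; the helper names are illustrative rather than the authors' implementation.

```python
import numpy as np

def goodness_of_fit(y, y_hat):
    """Coefficient of determination R^2 = 1 - SSE/SST (Eq. 2)."""
    sse = np.sum((y - y_hat) ** 2)          # residual sum of squares
    sst = np.sum((y - np.mean(y)) ** 2)     # total sum of squares
    return 1.0 - sse / sst if sst > 0 else 1.0

def fit_segment(frames, values):
    """Least-squares fit of f(x) = a0 + a1*x to one curve segment,
    returning the coefficients and the resulting R^2."""
    frames = np.asarray(frames, dtype=float)
    a1, a0 = np.polyfit(frames, values, deg=1)   # slope, intercept
    y_hat = a0 + a1 * frames
    return (a0, a1), goodness_of_fit(np.asarray(values, dtype=float), y_hat)
```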
We designed a self-reducing step length based on the idea of bisection to obtain the segment points of the rotation information curves, as summarized in Algorithm 1, where R²(start, old_end) denotes the goodness-of-fit of the segment between start and old_end, R² without arguments denotes the threshold specified by the user, and Keylist stores the obtained segment points. Using Algorithm 1, we minimize the number of segments of the rotation information curves, and each segment is the longest possible under the current R².
We used the binomial f(x) = a₀ + a₁x to fit the rotation information curves because it is sensitive to changes in the slope of a curve. Points taken from the parts of the curves where the slope changes dramatically better indicate the
Algorithm 1 Get the segment points
1: Keylist(1) = 1, i = 1
2: Input the goodness-of-fit threshold R²
3: Set the step size step, starting point start = 1, temporary ending point old_end = start + step, final ending point end = start
4: while start ≠ n do
5:   while old_end ≠ end do
6:     if R²(start, old_end) ≥ R² then
7:       end = old_end
8:       old_end = old_end + step
9:     else
10:      step = |old_end − end| / 2
11:      end = old_end
12:      old_end = old_end − step
13:    end if
14:  end while
15:  Keylist(i++) = end
16:  start = end, old_end = start + step, restore step
17: end while
trend of the curves. To ensure that the fitting result meets the given goodness-of-fit R², the curve is segmented promptly whenever a region with a large change in slope is encountered. Therefore, binomial fitting can be used to obtain the key points that represent the overall change of the curves without computing their curvature.
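A minimal Python sketch of the self-reducing step search of Algorithm 1 on a single curve is given below, reusing the hypothetical `fit_segment` helper from the sketch above; details such as clamping old_end to the last frame and the initial step value are our own assumptions.

```python
import numpy as np

def segment_curve(values, r2_threshold, init_step=32):
    """Split one rotation curve into the longest segments whose
    first-order fit reaches r2_threshold (sketch of Algorithm 1)."""
    n = len(values)
    frames = np.arange(n)
    keylist = [0]                  # 0-based frame indices in this sketch
    start = 0
    while start < n - 1:
        step = init_step
        end = start
        old_end = min(start + step, n - 1)
        while old_end != end and step > 0:
            _, r2 = fit_segment(frames[start:old_end + 1],
                                values[start:old_end + 1])
            if r2 >= r2_threshold:
                end = old_end
                old_end = min(old_end + step, n - 1)   # try a longer segment
            else:
                step = abs(old_end - end) // 2         # halve the step
                end = old_end
                old_end = old_end - step
        end = max(end, start + 1)   # guard: always advance at least one frame
        keylist.append(end)
        start = end
    return keylist
```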
3.1.3 Determining KeyFrame and Obtaining
Binomial Parameter
Although the segment points of each curve differ, they are all drawn from the frame numbers between 1 and n (supposing the motion sequence has n frames), so some frames are selected multiple times as segment points. For example, if m curves are piecewise fitted, then according to Algorithm 1 the first and last frames are selected as segment points m times. Likewise, other frames may be selected as segment points multiple times. As shown in Figure 1, some frames are selected as segment points many times, while nearby frames are selected fewer times. This indicates that the segment points cluster, so we can select keyframes from these clusters. Obviously, the density centers are the best candidate keyframes if density clustering is carried out on the set of segment points.
Figure 1: Number distribution of key points
In Figure 1, the horizontal axis represents the frame number, and the vertical axis represents the number of times that frame was recorded as a segment point. According to the characteristics of the segment point set, we chose DBSCAN (Density-Based Spatial Clustering of Applications with Noise) for clustering.
The ε neighborhood and MinPts are the most important parameters of DBSCAN clustering, which we discuss in Section 4.
It is worth noting that the samples we feed to the clustering algorithm are not coordinates in a multidimensional space but one-dimensional values: the frame numbers of the motion sequence.
Using DBSCAN to cluster the segment point set, the clusters Set_1, ..., Set_k, ..., Set_p are obtained. The keyframe sequence keyframelist is obtained by taking each cluster center center_k as a keyframe. The cluster center is the point with the smallest sum of distances to all other points in its cluster. For the i_k-th point in cluster Set_k, the sum of the distances of the other points from it is:

\[
Dist(i_k) = \sum_{j_k = FF_k}^{LF_k} |j_k - i_k| \, N_{j_k}
\tag{3}
\]

where i_k, j_k ∈ Set_k are frame numbers, FF_k is the first frame in Set_k, LF_k is the last frame in Set_k, and N_{j_k} means the j_k-th frame was selected as a segment point N_{j_k} times. The cluster center center_k then satisfies:

\[
Dist(center_k) = \min_{i_k \in Set_k} Dist(i_k)
\tag{4}
\]
Finally, we add the first frame 1 and the last frame n of the motion sequence to keyframelist if they are not already present. At the same time, if keyframelist contains consecutive frames, we keep only one of them.
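The clustering step could be sketched as follows using scikit-learn's DBSCAN on the pooled one-dimensional segment points, with the cluster center chosen by Eqs. (3)-(4); the function name is illustrative, and the final removal of consecutive keyframes is omitted here for brevity.

```python
from collections import Counter
import numpy as np
from sklearn.cluster import DBSCAN

def extract_keyframes(all_segment_points, n_frames, eps=1, min_pts=1):
    """Cluster the pooled segment points (frame indices, with repetitions
    from all curves) and pick one keyframe per cluster (Eqs. 3-4)."""
    pts = np.asarray(all_segment_points).reshape(-1, 1)
    labels = DBSCAN(eps=eps, min_samples=min_pts).fit_predict(pts)

    keyframes = set()
    for lab in set(labels):
        if lab == -1:                           # skip DBSCAN outliers
            continue
        members = pts[labels == lab].ravel()
        counts = Counter(members)               # N_j: times frame j was chosen
        # Eq. (3): weighted sum of distances to the other member frames
        def dist(i):
            return sum(abs(j - i) * n for j, n in counts.items())
        keyframes.add(min(counts, key=dist))    # Eq. (4): cluster center

    keyframes.update([0, n_frames - 1])         # always keep first/last frame
    return sorted(keyframes)
```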
3.2 Motion data Reconstruction Method
We propose a novel method for motion data recovery based on binomial coefficients. Unlike the quaternion spherical interpolation and linear interpolation algorithms, it requires the binomial coefficients, in addition to the keyframe information, to be obtained in advance. After obtaining the keyframe sequence, we therefore still need to compute the binomial coefficients; Figure 2 shows the process of obtaining these parameters.
Figure 2: The process of obtaining the binomial fitting parameters
So the motion curve set M can be reconstructed from the keyframes and the binomial coefficient sets A and B. The rotation information curve L_i is reconstructed as:

\[
L^{*}_i =
\begin{cases}
f^{*}_{i,1}(x) = a_{i,1}x + b_{i,1}, & x \in [kL(1), kL(2)) \\
\quad \vdots \\
f^{*}_{i,j}(x) = a_{i,j}x + b_{i,j}, & x \in [kL(j), kL(j+1)) \\
\quad \vdots \\
f^{*}_{i,k}(x) = a_{i,k}x + b_{i,k}, & x \in [kL(k), kL(end)]
\end{cases}
\tag{5}
\]

where j denotes the jth segment of curve L_i, which is divided into k segments by the keyframes, and kL denotes keyframelist.
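A possible sketch of the reconstruction described by Eq. (5): each inter-keyframe segment of a curve is fitted with a first-order binomial to obtain the coefficients, which are later evaluated to rebuild the curve. The helper names and the assumption that the keyframe list is sorted and spans the whole sequence are ours, not the authors' code.

```python
import numpy as np

def fit_reconstruction_params(values, keyframes):
    """Fit f*(x) = a*x + b on each inter-keyframe segment of one curve
    and return the coefficient lists (A_i, B_i) of Eq. 5."""
    a_list, b_list = [], []
    for s, e in zip(keyframes[:-1], keyframes[1:]):
        x = np.arange(s, e + 1)
        a, b = np.polyfit(x, values[s:e + 1], deg=1)   # slope, intercept
        a_list.append(a)
        b_list.append(b)
    return a_list, b_list

def reconstruct_curve(n_frames, keyframes, a_list, b_list):
    """Evaluate the piecewise binomials to rebuild the full curve."""
    out = np.empty(n_frames)
    for (s, e), a, b in zip(zip(keyframes[:-1], keyframes[1:]), a_list, b_list):
        x = np.arange(s, e + 1)
        out[s:e + 1] = a * x + b
    return out
```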
4 Discussion of parameters
Three parameters need to be determined in the proposed method: the distance threshold ε, the neighborhood sample threshold MinPts, and the goodness-of-fit R². We tested 200 motion sequences randomly selected from the Carnegie Mellon University (CMU) Motion Capture Database. We use the mean error of the joint space coordinates as the evaluation metric to illustrate the quality of the keyframe extraction results. The error E is calculated as follows:
\[
E = \frac{\sum_{i=1}^{n}\sum_{j=1}^{s} \left\| F^{o}_{i,j} - F^{r}_{i,j} \right\|_2}{s \times n}
\tag{6}
\]

To allow comparison with similar papers, the reconstruction error is calculated on the trajectory curves of the joints: we first use the joint rotation information to compute the trajectory curves and then compute the reconstruction error. In this formula, F^o_{i,j} is the jth joint space coordinate in the ith frame of the original motion sequence, F^r_{i,j} is the jth joint space coordinate in the ith frame of the reconstructed motion sequence, s is the number of joints, and n is the number of frames.
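Assuming the joint positions of the original and reconstructed sequences have already been computed from the rotation information (e.g., by forward kinematics), Eq. (6) amounts to the following sketch; the array layout is an assumption of ours.

```python
import numpy as np

def reconstruction_error(orig_pos, recon_pos):
    """Mean joint-position error E (Eq. 6).
    orig_pos, recon_pos: arrays of shape (n_frames, n_joints, 3)."""
    diff = np.linalg.norm(orig_pos - recon_pos, axis=-1)  # per-frame, per-joint distance
    return diff.mean()
```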
4.1 ε and MinPts
ε represents the distance between samples. Because the samples themselves are integers, the distances between samples are also integers, so the best value of ε is an integer.
In our experiments, when ε is larger than 1, outliers increase and the reconstruction error of the extracted keyframes is much larger than when ε equals 1, so the clustering effect is poor, as shown in Figure 3(b). If ε is 0, every frame that appears is clustered individually, which leads to serious keyframe redundancy, so it is not suitable either. But as Figure 3(a) shows, when ε = 1 and MinPts takes an appropriate value, the clustering effect is decent.

Figure 3: Clustering effects. (a) ε = 1, MinPts = 1; (b) ε = 2, MinPts = 12
Figure 3(a) shows the best clustering result of a motion sequence when ε equals 1, and Figure 3(b) shows the best clustering result when ε equals 2. The x-coordinate represents the number of elements in the same cluster, and the y-coordinate represents the frame number. 'o' marks outliers, and 'x' markers of the same color belong to one cluster. There are a large number of outliers in Figure 3(b), which indicates that the clustering effect is best when ε equals 1.
MinPts is the minimum number of neighborhood samples. In DBSCAN it is difficult to select MinPts. To find the optimal value range of MinPts, we conducted the following experiment: ε is 1, and R² is set at 0.05 intervals between 0.05 and 0.90. When R² is less than 0.90, the reconstruction error of all tested motion sequences is less than 0.20, and the average reconstruction error is less than 0.03. When R² is greater than 0.90, the reconstruction error of some motion sequences is above 3.0, some are even as high as 63.45, and the average reconstruction error is also above 1.00. Therefore, when R² is greater than 0.90, the reconstruction is not only poor but also unstable, so in this article we only discuss the range where R² is smaller than 0.90. For each R² value, MinPts starts at 1 and is incremented by 1 in each cycle. In each loop, we obtained the keyframes of the input motion sequence and calculated the RICE (the reconstruction error of the rotation information curves). By comparing the RICE corresponding to each MinPts, we select the MinPts with the minimum RICE as the best MinPts for the current R² value. With ε fixed at 1, we conducted experiments on 200 motion sequences and obtained the optimal MinPts range for each R² value, as shown in Table 1.
4.2 Coefficient of Determination R²
R² has a significant effect on keyframe extraction. The larger R² is, the more segments each rotation information curve is divided into. However, the more segment points there are, the denser the density connections between samples become, and some important frames are merged into a single cluster, so that only one of several potential keyframes is ultimately extracted. This leads to insufficient keyframe extraction and an increase in the motion data reconstruction error. We tested 200 motion sequences to explore the effect of R² on the compression ratio (the ratio of the number of keyframes to the total number of frames) and on the reconstruction error. Figure 4 and Figure 5 show the influence of R² on the compression ratio and of the compression ratio on the reconstruction error, respectively.
Figure 4 shows the relationship between R² and the compression ratio. As can be seen from Figure 4, the compression ratio increases while R² is between 0 and 0.3 and decreases after that. Figure 5 reflects the relationship between the compression ratio and the reconstruction error. In general, the reconstruction error is smaller than 0.16. As the compression ratio increases, the reconstruction error gradually decreases: when the compression ratio is smaller than 0.1537, the error decreases rapidly; when it is larger than 0.1537, the error decreases slowly. Since keyframe extraction attempts to keep both the compression ratio and the reconstruction error small, the R² corresponding to a compression ratio of 0.1537 is the most appropriate, with a value of 0.65, as shown in Figure 4.

Figure 4: Compression Ratio and R²
Figure 5: Reconstruction Error and Compression Ratio

Table 1: Different MinPts ranges with different values of R²
R²:     0.05 0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.45 0.50 0.55 0.60 0.65 0.70 0.75 0.80 0.85 0.90
MinPts: 1    1    1    1    1    1    1    1    1-5  1    1-5  1-8  1-7  1-10 1-13 1-15 6-18 7-23

Table 2: Results of Keyframe Extraction
Classes               jump     cartwheel  dance
Total Frames          295      575        1131
Number of Keyframes   37       69         134
Compression ratio     12.50%   12.00%     11.80%
Computation time (s)  9.42     15.46      41.95
5 Results
In Section 4, we determined that ε is 1, R² is 0.65, and this R² corresponds to MinPts ranging from 1 to 7. In this section, we use these parameters to perform keyframe extraction and to calculate the reconstruction error, and we compare our motion data reconstruction algorithm against traditional reconstruction algorithms, as well as our keyframe extraction algorithm against existing keyframe extraction algorithms.
Figure 6 shows the keyframes extracted from different motions. As can be seen from the figure, the keyframes extracted by our method represent the changing trend of the motions well.
Table 2 lists the number of keyframes, com-
pression ratios, and calculation time for different
action classes. As can be seen from the results, both the compression ratio and the computation time are satisfactory.
The keyframes extracted by our method are
reconstructed using three different motion data
reconstruction methods. Table 3 lists the reconstruction errors of the three reconstruction methods for the three motion classes. BFE is the error of the motion data reconstruction algorithm we proposed, LIE is the reconstruction error of the linear interpolation algorithm, and QSIE is the reconstruction error of the quaternion spherical interpolation algorithm. The results show that linear interpolation gives the worst reconstruction, followed by QSIE, while the motion data reconstruction algorithm we proposed performs best overall.
It is easy to understand that our method is
better than linear interpolation. The linear inter-
polation formula is shown below:

\[
f_{lerp}(p, q, t) = t(q - p) + p
\tag{7}
\]
Table 3: Results of different reconstruction methods
Classes  jump    cartwheel  dance
BFE      0.0095  0.0793     0.0095
LIE      0.0599  0.2094     0.0176
QSIE     0.0218  0.0176     0.0172
Here p and q are the values at two adjacent keyframes in the keyframe sequence (i.e., the ordinates of the original curve when reconstructing a rotation information curve), and t is the interpolation parameter. In effect, two adjacent keyframes are connected by a line segment, and the intermediate frames are then placed on that segment according to t. Our method instead fits, in advance, a curve with the smallest regression error to the original rotation information curve, and reconstructing the data simply restores that fitted curve.
Quaternions can reconstruct smooth motion trajectories, but they also have limitations. The condition for applying quaternions is to convert the joint rotation information into a quaternion representation. The algorithm is as follows:

\[
f_{slerp}(P, Q, t) = \frac{\sin[(1-t)\theta]\,P + \sin(t\theta)\,Q}{\sin\theta}
\tag{8}
\]

P and Q are vectors containing the three-dimensional rotation information of the joint, whereas each rotation information curve represents one-dimensional information, the three-dimensional information being represented by three curves. θ is the angle between the vectors P and Q. The points obtained by quaternion spherical interpolation therefore lie on the rotation trajectory of vector P, which stays in the plane S determined by P and Q. However, the vector P may rotate toward Q through space outside S. For example, when we straighten our arms, the fist has more than one possible path from point A to point B in space, but the quaternion spherical interpolation algorithm can only compute the trace on the plane determined by points A, B, and the shoulder joint C. Our method reconstructs the rotation information of each dimension of the joint, so the trajectory it produces is closer to the real motion path than the trajectory computed by quaternion spherical interpolation, and the error is smaller.
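For reference, the two baseline schemes of Eqs. (7) and (8) can be sketched as follows; the fall-back to linear interpolation for nearly parallel vectors is a defensive addition of ours, not part of the original formulas.

```python
import numpy as np

def lerp(p, q, t):
    """Linear interpolation between two keyframe values (Eq. 7)."""
    return p + t * (q - p)

def slerp(P, Q, t):
    """Spherical linear interpolation between two rotation vectors/quaternions (Eq. 8)."""
    P, Q = np.asarray(P, dtype=float), np.asarray(Q, dtype=float)
    cos_theta = np.clip(np.dot(P, Q) / (np.linalg.norm(P) * np.linalg.norm(Q)), -1.0, 1.0)
    theta = np.arccos(cos_theta)
    if np.isclose(theta, 0.0):          # nearly parallel: fall back to lerp
        return lerp(P, Q, t)
    return (np.sin((1 - t) * theta) * P + np.sin(t * theta) * Q) / np.sin(theta)
```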
Compared with the quaternion spherical interpolation and linear interpolation algorithms, the binomial-fitting-based reconstruction algorithm has a smaller reconstruction error, but it needs extra storage space for the parameters. When quaternion spherical interpolation or linear interpolation is used to reconstruct the motion sequence, not only the keyframe sequence but also the motion information corresponding to the keyframes is needed. Assuming a motion sequence with n frames decomposes into m motion curves, the total motion information of the sequence is m·n. If k keyframes are extracted, the total amount of information to record for motion reconstruction is k + k·m. Therefore, after keyframe extraction, the ratio of stored information to the original motion information is (k + k·m)/(n·m) ≈ k/n, namely the compression ratio of the keyframes. If binomial fitting reconstruction is used, then according to y = ax + b, the motion information corresponding to the keyframes need not be recorded, but the fitting parameters must be. According to Figure 2, each curve segment needs two parameters, so the total motion information to record is k + 2·k·m, and the ratio of stored information to the original motion information is (k + 2·k·m)/(n·m) ≈ 2k/n. Therefore, reconstructing the motion sequence with binomial fitting requires roughly twice as much stored information at keyframe extraction time as the other two methods.
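As an illustrative calculation of ours, using the dance sequence from Table 2 with n = 1131 frames, m = 58 curves, and k = 134 keyframes: interpolation-based reconstruction stores about 134 + 134 × 58 ≈ 7,906 values, roughly 12% of the 65,598 original values, whereas binomial-fitting reconstruction stores about 134 + 2 × 134 × 58 ≈ 15,678 values, roughly 24%.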
Figures 7, 8, and 9 compare the original sequences with the sequences reconstructed from the keyframes extracted by our method, the joint kernel sparse representation algorithm, and the affinity propagation algorithm, where green is the original data and red is the reconstructed result. In Figure 7, the dance motion sequence has 1131 frames, and the number of keyframes is 134, giving a compression ratio of 11.80%. As can be seen from Figure 7(a), there is little difference between the reconstructed frames and the original data. Figure 7(b) shows a significant difference between the reconstructed results and the original data at the feet shortly after the dance motion begins. In Figure 7(c), the errors at the feet and head are obvious at the beginning, and later the upper body also shows a slight error.
(a) Keyframe sequences extracted from dance motion
(b) Keyframe sequences extracted from cartwheel motion
(c) Keyframe sequences extracted from jump motion
Figure 6: Results of keyframe extraction

In Figure 8, the cartwheel motion sequence has 575 frames, and the number of keyframes is 69, giving a compression ratio of 12.0%. It can be seen from Figure 8(a) that there is no obvious error in the reconstruction results except for a small error at the ball of the foot at the beginning of the motion. However, Figure 8(b) and Figure 8(c) show a large error between the reconstruction results and the original data. In Figure 8(b), the error is significant in the beginning phase of the motion, and in the middle phase the reconstructed results are markedly different from the original data. In Figure 8(c), there is a noticeable error at the beginning of the motion, and before the end of the motion the difference between the reconstructed results and the original data is obvious.
In Figure 9, the jump motion sequence has 295 frames, and the number of keyframes is 37, giving a compression ratio of 12.5%. Figure 9(a) shows a good reconstruction, with only slight errors at the feet. In Figure 9(b), there is a significant error in the arm when the person jumps. In Figure 9(c), there are errors in both the arms and feet during the squat and the jump.
Table 4 compares the reconstruction errors of the three keyframe extraction algorithms. AE represents the affinity propagation algorithm [12] and JE represents the joint kernel sparse representation algorithm [16]. Since both algorithms use the quaternion method for reconstruction, we compare our reconstruction error using the quaternion method with theirs. As can be seen from the experimental results, the reconstruction error of the proposed method is significantly better than that of the affinity propagation algorithm and the joint kernel sparse representation algorithm.

Table 4: Reconstruction error of three keyframe extraction algorithms
Classes     jump    cartwheel  dance
Our method  0.0218  0.0176     0.0172
AE          0.4518  0.2566     0.248
JE          0.1483  0.47       0.0324
6 Conclusion
We proposed a new keyframe extraction algorithm and a motion data reconstruction method based on multiple binomial fitting. Both are simple and effective: they comprehensively consider the motion changes of the various joints and obtain impressive results. Experiments show that the proposed reconstruction method is superior to the linear interpolation and quaternion spherical interpolation algorithms. Although the keyframe compression ratio of our algorithm is relatively large, averaging over 10%, and needs to be improved in future work, the reconstruction error is much smaller than that of algorithms based on clustering and optimization.
(a) The proposed method
(b) Joint kernel sparse representation algorithm
(c) Affinity propagation algorithm
Figure 7: Results of original dance motion and reconstructed motion
(a) The proposed method
(b) Joint kernel sparse representation algorithm
(c) Affinity propagation algorithm
Figure 8: Results of original cartwheel motion and reconstructed motion
(a) The proposed method
(b) Joint kernel sparse representation algorithm
(c) Affinity propagation algorithm
Figure 9: Results of original jump motion and reconstructed motion
Acknowledgement
This work is partially funded by the Key Laboratory of Agricultural Internet of Things, Ministry of Agriculture and Rural Affairs, China (2018AIOT-09), the Key Research and Development Program of Shaanxi Province (2018NY-127), and the Shaanxi Key Industrial Innovation Chain Project in Agricultural Domain (Grant No. 2019ZDLNY02-05). The authors acknowledge Carnegie Mellon University for the motion capture data resources.
References
[1] Z. J. Li, C. Y. Su, L. Y. Wang, et al. Nonlinear disturbance observer-based control design for a robotic exoskeleton incorporating fuzzy approximation. IEEE Transactions on Industrial Electronics, 62(9):5763-5775, 2015.
[2] S. Y. Shin, C. Kim. Human-like motion generation and control for humanoid's dual arm object manipulation. IEEE Transactions on Industrial Electronics, 62(4):2265-2276, 2015.
[3] T. Sasaki, D. Brscic, H. Hashimoto. Human-observation-based extraction of path patterns for mobile robot navigation. IEEE Transactions on Industrial Electronics, 57(4):1401-1410, 2010.
[4] Y. R. Li, Z. Wang, X. S. Yang, et al. Efficient convolutional hierarchical autoencoder for human motion prediction. The Visual Computer, 35(6-8):1143-1156, 2019.
[5] C. Halit, T. Capin. Multiscale motion saliency for keyframe extraction from motion capture sequences. Computer Animation and Virtual Worlds, 22(1):3-14, 2011.
[6] T. Miura, T. Kaiga, T. Shibata, et al. A hybrid approach to keyframe extraction from motion capture data using curve simplification and principal component analysis. IEEJ Transactions on Electrical and Electronic Engineering, 9(6):697-699, 2015.
[7] J. Xiao, Y. Zhuang, T. Yang, et al. An efficient keyframe extraction from motion capture data. In International Conference on Advances in Computer Graphics, Springer-Verlag, 2006.
[8] K. S. Huang, C. F. Chang, Y. Y. Hsu, et al. Key Probe: a technique for animation keyframe extraction. The Visual Computer, 21(8-10):532-541, 2005.
[9] Y. H. Gong, X. Liu. Video summarization and retrieval using singular value decomposition. Multimedia Systems, 9(2):157-168, 2003.
[10] X. M. Liu, A. M. Hao, D. Zhao. Optimization-based keyframe extraction for motion capture animation. The Visual Computer, 29(1):85-95, 2013.
[11] X. J. Chang, P. F. Yi, et al. Key frames extraction from human motion capture data based on hybrid particle swarm optimization algorithm. In Recent Developments in Intelligent Information and Database Systems, pp. 335-342.
[12] B. Sun, D. Kong, S. Wang, et al. Keyframe extraction for human motion capture data based on affinity propagation. In 2018 IEEE 9th Annual Information Technology, Electronics and Mobile Communication Conference (IEMCON), IEEE, 2018.
[13] Q. Zhang, S. P. Yu, D. Zhou, et al. An efficient method of key-frame extraction based on a cluster algorithm. Journal of Human Kinetics, 39(1):5-13, 2013.
[14] Y. Zhang, J. Li, M. Zhang, et al. Motion key frame extraction based on grey wolf optimization algorithm. MATEC Web of Conferences, 232, 2018.
[15] Q. Zhang, S. Zhang, D. Zhou, et al. Keyframe extraction from human motion capture data based on a multiple population genetic algorithm. Symmetry, 6(4):926-937, 2014.
[16] G. Y. Xia, H. J. Sun, X. Q. Niu, et al. Keyframe extraction for human motion capture data based on joint kernel sparse representation. IEEE Transactions on Industrial Electronics, 64(2):1589-1599, 2017.
[17] Q. Z., G. Z., L. X., et al. Human motion capture error correction method without force-feedback. In International Conference on Artificial Intelligence, 2019.