All content in this area was uploaded by Xuequan Lu on Dec 28, 2020

KeyFrame Extraction for Human Motion Capture Data via Multiple Binomial Fitting

Chenxu Xu¹, Wenjie Yu¹, Yanran Li⁴, Xuequan Lu⁵, Meili Wang∗¹,²,³ and Xiaosong Yang⁴

¹College of Information Engineering, Northwest A&F University, Yangling 712100, China

²Key Laboratory of Agricultural Internet of Things, Ministry of Agriculture, Yangling 712100, China

³Shaanxi Key Laboratory of Agricultural Information Perception and Intelligent Service, Yangling 712100, China

⁴Bournemouth University, NCCA

⁵Deakin University - Waurn Ponds Campus, School of Information Technology

Abstract

In this paper, we make two contributions. First, we propose a new keyframe extraction algorithm that reduces both keyframe redundancy and motion sequence reconstruction error. Second, we propose a new motion sequence reconstruction method that further reduces the reconstruction error. Specifically, we treat the input motion sequence as a set of curves and extend binomial fitting to obtain the points in whose vicinity the slope changes dramatically. We then take these points as input and obtain keyframes by density clustering. Finally, the motion curves are segmented by the keyframes, and each segment is fitted by the binomial formula again to obtain the binomial parameters for motion reconstruction. Experiments show that our methods outperform existing techniques in terms of reconstruction error.

Keywords: keyframe extraction, motion capture, curve simplification, computer animation

1 Introduction

Motion capture plays an important role in computer games, virtual environments, movie effects, robotics [1, 2, 3], motion prediction [4], and other fields. Motion capture originates from authentic motion, so even complex motion can produce realistic effects in real time, which is very attractive for these fields. However, significant problems remain when using motion capture data. Human motion data is usually captured at high frequencies, which leads to a very large amount of data. As a result, the captured data is difficult to store, retrieve, browse, and reuse. Keyframe extraction is a promising way to solve these problems. As its name suggests, keyframe extraction selects the most representative frames from a motion sequence to represent the whole sequence. It eliminates redundant frames, saves storage space, facilitates browsing and editing, and improves the reuse efficiency of motion capture sequences. Accordingly, keyframes should have the following characteristics: (1) keyframes should be as few as possible while reflecting the overall movement trend of the motion sequence; (2) the motion sequence can be reconstructed from the keyframes by interpolation with minimum error.

∗Corresponding author: wml@nwsuaf.edu.cn

In this paper, a curve fitting method is proposed to extract keyframes from motion capture sequences. First, the input motion data is regarded as a set of rotation information curves, because, except for three records that hold the locus information of the root joint, all other records hold the rotation information of the joints. We then identify the areas of each curve where the slope changes dramatically; the points in these areas best represent the overall trend of the joint motion. We employ binomial fitting to fit the rotation information curves piecewise so that each piece reaches the specified goodness-of-fit R², and thereby obtain the segment points. We then cluster the segment points and optimize the resulting cluster centers to obtain the keyframes. We also propose a motion data reconstruction method based on binomial fitting over several areas, whose reconstruction error is lower than that of currently popular methods.

This is a preprint

2 Related Work

2.1 KeyFrame Extraction

Keyframe extraction methods mainly fall into four categories: curve simplification-based, matrix decomposition-based, clustering-based, and optimization-based. Curve simplification treats a frame as a point and a motion sequence as a curve in a high-dimensional space, and extracts keyframes by finding a series of points that depict the whole curve well [5, 6, 7]. However, the keyframes are extracted according to the local extrema of the motion curve, and other parts of the curve with obvious slope changes are ignored. Matrix decomposition-based methods represent the motion sequence as a motion matrix and approximately decompose it into a weight matrix and a keyframe matrix [8, 9]. These methods are time-consuming and tend to ignore temporal information, and the extracted keyframes may not cover the whole sequence.

Clustering-based and optimization-based keyframe extraction algorithms have become a hot topic in recent years [10, 11]. B. Sun et al. [12] proposed a keyframe extraction method based on the affinity propagation (AP) clustering algorithm to adaptively search for the best keyframes of a video. The algorithm is fast, easy to implement, and extracts high-quality keyframes. Q. Zhang et al. [13] used the ISODATA dynamic clustering algorithm to cluster all frames and extracted the frames closest to the cluster centers as keyframes. This method suits a variety of motion types and needs no user-specified parameters. However, keyframes extracted by clustering methods are not in chronological order, which makes it difficult for them to represent the information of the original motion sequence.

Y. Zhang et al. [14] proposed a method based on the grey wolf optimization algorithm, in which the extracted keyframes summarize the original motion sequence well and remain consistent across similar motion sequences. Q. Zhang et al. [15] proposed a multiple-population genetic algorithm that mimics natural selection and evolution. Its evolution process needs no manually specified threshold parameters and converges quickly, but it still cannot specify the number of keyframes. G. Y. Xia et al. [16] proposed a novel model called "joint kernel sparse representation", which models the sparseness and the Riemannian manifold structure of human motion and extracts keyframes from it. However, it does not effectively solve the problem of keyframe redundancy, and its time complexity is O(n²).

2.2 Motion data reconstruction

Linear interpolation and quaternion interpolation are commonly used in motion reconstruction [5, 6, 12, 14]. In animation, both work very well for generating intermediate frames. Using a data fitting method to correct the motion capture errors caused by removing force feedback can effectively reduce the motion error and has good application value [17]. However, if the original motion sequence is known, reconstructing it with these two interpolation methods causes a large reconstruction error, because they completely discard the constraints that the non-keyframe parts of the rotation information curves impose.

3 The proposed Methodology

We solved two problems, keyframe extraction

and motion data reconstruction, which will be

discussed separately in the following sections.

3.1 KeyFrame Extraction Method

The keyframe extraction method consists of three steps: curve simplification, key point and keyframe acquisition, and binomial fitting parameter acquisition. We treat the rotation information of each joint in each direction, and the movement information of the root joint in each direction, as a single curve. Binomial fitting is then used to fit each curve in segments, which yields the segment points of each curve. All of these points together form the input set for density clustering, and the density centers are extracted as the keyframes. Finally, the rotation information curves are segmented by the keyframes, and binomial fitting is performed by the least squares method to obtain the fitting parameters.

3.1.1 Simpliﬁed Curve Model

When the human body moves, many joints change at the same time, so the information is rich and the calculation is complicated. Human motion capture records the rotation information of each joint and the translation information of the root joint. This information is three-dimensional, and each dimension of each joint is recorded as one piece of information. Some joints have only one or two degrees of freedom, so their rotation information has only one or two dimensions. According to these characteristics of human motion capture data, we take time as the independent variable and the motion information of each dimension of each joint as the dependent variable, so that the integrated motion is reduced to simple curves. We use the curve L_i to represent the change of rotation of a joint in one direction or the trajectory of the root joint in one direction. For each motion sequence, we obtain 55 rotation information curves and 3 motion trajectory curves. The motion sequence is then expressed as a set of curves M = {L_1, ..., L_i, ..., L_m}, where L_i is the i-th curve and m is 58.

3.1.2 Key Points Extraction

It is difficult to describe a continuous rotation information curve with a single function L = f(x), but it is feasible to divide the curve into small segments and fit each segment with a binomial. So L_i can be expressed as:

$$
L_i =
\begin{cases}
l_{i,1} = f_{i,1}(x_j), & j \in [1, n_{i1}) \\
l_{i,2} = f_{i,2}(x_j), & j \in [n_{i1}, n_{i2}) \\
\quad \vdots \\
l_{i,k} = f_{i,k}(x_j), & j \in [n_{i,k-1}, n_{ik})
\end{cases}
\tag{1}
$$

where n_1, n_2, ..., n_{k−1} are the segment points, n is the number of frames in the motion sequence, and j denotes the j-th frame. The rotation information curve is thus divided into k segments for fitting, and the value of k differs between curves L_i. We want to preserve the fitting quality with as few segments as possible. To achieve this, the coefficient of determination R², a goodness-of-fit index of the regression equation, is introduced to constrain the piecewise fitting and find the longest admissible pieces. R² is calculated as follows:

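$$
R^2 = \frac{SSR}{SST}
= \frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
= 1 - \frac{SSE}{SST}
= 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
\tag{2}
$$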

where SST is the total sum of squares, SSR is the regression sum of squares, and SSE is the residual sum of squares, with SST = SSR + SSE and

$$
SST = \sum_{i=1}^{n} (y_i - \bar{y})^2, \qquad
SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2, \qquad
SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 .
$$
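As a minimal sketch of Eq. (2), the following computes R² for a straight line fitted by least squares to one curve segment. The function name `r_squared_linear` and the convention of returning 1.0 for a perfectly flat segment (where SST is zero and R² is undefined) are assumptions made for illustration, not part of the paper.

```python
import numpy as np

def r_squared_linear(y):
    """R^2 of the least-squares line a0 + a1*x fitted to y at x = 0, 1, ..., len(y)-1.

    Expects at least two samples.
    """
    y = np.asarray(y, dtype=float)
    x = np.arange(len(y), dtype=float)
    a1, a0 = np.polyfit(x, y, 1)           # polyfit returns [slope, intercept]
    y_hat = a0 + a1 * x
    sse = np.sum((y - y_hat) ** 2)         # residual sum of squares (SSE)
    sst = np.sum((y - y.mean()) ** 2)      # total sum of squares (SST)
    if sst == 0.0:
        return 1.0                         # assumption: a flat segment counts as a perfect fit
    return 1.0 - sse / sst
```

A segment that is exactly linear gives a value of 1 up to rounding, while a symmetric "V" shape gives a value near 0 because the best line is horizontal.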

We designed a self-reducing step length, based on the idea of bisection, to obtain the segment points of the rotation information curves. The specific procedure is summarized in Algorithm 1, where R²(start, old_end) is the goodness-of-fit of the section between start and old_end, R² is the goodness-of-fit specified by the user, and Keylist stores the segment points obtained. Using Algorithm 1, we minimize the number of segments of each rotation information curve and make each segment as long as possible under the given R².

We use the binomial f(x) = a_0 + a_1 x to fit the rotation information curves because it is sensitive to changes in the slope of a curve. Points taken from the parts of a curve where the slope changes dramatically better indicate the trend of the curve. To ensure that the fitting result meets the given goodness-of-fit R², the curve is segmented promptly whenever a region with a large change in slope is encountered. Therefore, binomial fitting yields key points that represent the overall change of the curves without computing the curvature change of the curves.

Algorithm 1 Get the segment points
Input: R², step
Keylist(1) = 1, i = 1, start = 1
while start != n do
    old_end = min(start + step, n), end = start
    while old_end != end do
        if R²(start, old_end) >= R² then
            end = old_end
            old_end = min(old_end + step, n)
        else
            step = |old_end − end| / 2
            end = old_end
            old_end = old_end − step
        end if
    end while
    i = i + 1, Keylist(i) = end
    start = end, reset step to its initial value
end while
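The segmentation step can be sketched in Python as follows. This is one possible reading of Algorithm 1 rather than the authors' exact procedure: it grows each segment by `step` frames while the line fit still reaches the threshold, halves the step on failure, and always keeps the last end point that passed. The names `fit_r2`, `segment_points`, and `step0`, the default step of 8, and the 0-based frame indices are assumptions for illustration.

```python
import numpy as np

def fit_r2(y, start, end):
    """R^2 of a least-squares line fitted to y[start..end] (inclusive, 0-based)."""
    seg = np.asarray(y[start:end + 1], dtype=float)
    x = np.arange(len(seg), dtype=float)
    a1, a0 = np.polyfit(x, seg, 1)
    sse = np.sum((seg - (a0 + a1 * x)) ** 2)
    sst = np.sum((seg - seg.mean()) ** 2)
    return 1.0 if sst == 0.0 else 1.0 - sse / sst

def segment_points(y, r2_min, step0=8):
    """Return segment end points as 0-based indices, including the first and last frame."""
    n = len(y)
    keylist = [0]
    start = 0
    while start < n - 1:
        step = step0
        end = start + 1                    # two samples are always fitted exactly by a line
        while step >= 1:
            cand = min(end + step, n - 1)  # candidate end point, clamped to the last frame
            if cand > end and fit_r2(y, start, cand) >= r2_min:
                end = cand                 # fit still acceptable: extend the segment
            else:
                step //= 2                 # overshoot or no room left: halve the step
        keylist.append(end)
        start = end                        # the next segment starts at this segment point
    return keylist
```

For a curve that is a single straight line, the sketch returns only the first and last frame; for a curve with a sharp change in slope, it places a segment point near the change.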

3.1.3 Determining KeyFrame and Obtaining Binomial Parameter

Although the segment points of each curve are different, they are all taken from the consecutive frame numbers between 1 and n (suppose the motion sequence has n frames), so some frames must be selected as segment points multiple times. For example, if m curves are piecewise fitted according to Algorithm 1, the first frame and the last frame are each selected as segment points m times. For the same reason, other frames may also be selected multiple times. As shown in Figure 1, some frames are selected as segment points many times, while nearby frames are selected fewer times. This indicates that the segment points are clustered, so we can select keyframes from the clusters. If density clustering is carried out on the set of segment points, the density centers are the natural candidate keyframes. In Figure 1, the horizontal axis represents the frame number, and the vertical axis represents the number of times the frame has been recorded as a segment point.

Figure 1: Number distribution of key points

According to the characteristics of the segment point set, we chose DBSCAN (Density-Based Spatial Clustering of Applications with Noise) for clustering. The ε-neighborhood and MinPts are the most important factors of DBSCAN clustering; we discuss them in Section 4.

It is worth noting that the samples in the segment point set used as input to the clustering algorithm are not coordinates of points in a multidimensional space but coordinates in a one-dimensional space: they are the frame numbers of the motion sequence.
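Because the samples are integer frame numbers on a line, density clustering can be illustrated without a general-purpose library. The sketch below is a simplified one-dimensional variant of DBSCAN written for illustration only: a frame is a core point if at least `min_pts` segment points (counting repeats and itself) lie within `eps` frames, neighbouring core frames are chained into clusters, and remaining frames are attached to the nearest core within `eps` or reported as noise. The function name `dbscan_1d` and the tie-breaking for border frames are assumptions.

```python
from collections import Counter

def dbscan_1d(frames, eps=1, min_pts=1):
    """Cluster integer frame numbers (repeats allowed) by density.

    Returns (clusters, noise): each cluster is a Counter mapping frame -> multiplicity,
    and noise is a sorted list of frames that belong to no cluster.
    """
    counts = Counter(frames)
    uniq = sorted(counts)

    def neighbours(f):
        # number of segment points (with repeats) within eps frames of f
        return sum(c for g, c in counts.items() if abs(g - f) <= eps)

    cores = [f for f in uniq if neighbours(f) >= min_pts]

    # chain core frames whose gap is at most eps into the same cluster
    groups = []
    for f in cores:
        if groups and f - groups[-1][-1] <= eps:
            groups[-1].append(f)
        else:
            groups.append([f])
    clusters = [Counter({f: counts[f] for f in g}) for g in groups]

    core_set = set(cores)
    noise = []
    for f in uniq:
        if f in core_set:
            continue
        best = None                        # (distance, cluster index) of the nearest core
        for idx, g in enumerate(groups):
            d = min(abs(f - c) for c in g)
            if d <= eps and (best is None or d < best[0]):
                best = (d, idx)
        if best is None:
            noise.append(f)                # no core within eps: outlier
        else:
            clusters[best[1]][f] = counts[f]   # border frame joins the nearest cluster
    return clusters, noise
```

With `eps=0` every distinct frame forms its own cluster, which matches the redundancy problem described for ε = 0 in Section 4.1.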

Clustering the segment point set with DBSCAN yields the clusters Set_1, ..., Set_k, ..., Set_p. The keyframe sequence keyframelist is obtained by taking the cluster center center_k of each cluster as a keyframe. The cluster center is the point with the smallest sum of distances to all other points. For the i-th point in the class Set_k, the sum of the distances of the other points from it can be expressed as:

$$
Dist(i_k) = \sum_{j_k = FF_k}^{LF_k} |j_k - i_k| \, N_j
\tag{3}
$$

where i_k, j_k ∈ Set_k are frame numbers, FF_k is the first frame in Set_k, LF_k is the last frame in Set_k, and N_j is the number of times the j_k-th frame was selected as a segment point. The cluster center center_k therefore satisfies:

$$
Dist(center_k) = \min_{i_k \in Set_k} Dist(i_k)
\tag{4}
$$
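The cluster center of Eqs. (3) and (4) can be sketched as a weighted search over the frames of one cluster. The sketch assumes that the distance between two frames is the absolute difference of their frame numbers, and that ties are resolved in favour of the earliest frame; the function name `cluster_center` and the dictionary input format are illustrative choices.

```python
def cluster_center(cluster):
    """Return the frame of one cluster that minimises the weighted distance sum.

    cluster maps a frame number j to N_j, the number of times that frame
    was selected as a segment point.
    """
    def dist(i):
        # Eq. (3): distances to all frames of the cluster, weighted by N_j
        return sum(abs(j - i) * n_j for j, n_j in cluster.items())

    # Eq. (4): the frame with the smallest sum; sorted() makes ties pick the earliest frame
    return min(sorted(cluster), key=dist)
```

For the cluster `{10: 2, 11: 3, 12: 1}`, the sums are 5, 3, and 7 for frames 10, 11, and 12, so frame 11 is chosen.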

Finally, we add the first frame 1 and the last frame n of the motion sequence to keyframelist if they are not already present. In addition, if keyframelist contains consecutive frames, we keep only one of them and delete the rest.
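This post-processing step can be sketched as follows. The text does not say which frame of a consecutive run is kept, so the sketch assumes the first frame of each run is kept, except that the last frame n always survives. The function name `finalize_keyframes` is hypothetical.

```python
def finalize_keyframes(centers, n):
    """Add frames 1 and n to the cluster centers and thin runs of consecutive frames."""
    frames = sorted(set(centers) | {1, n})

    # split the sorted frames into runs of consecutive frame numbers
    runs = [[frames[0]]]
    for f in frames[1:]:
        if f - runs[-1][-1] == 1:
            runs[-1].append(f)
        else:
            runs.append([f])

    keyframes = []
    for run in runs:
        keep = {run[0]}                    # assumption: the first frame represents the run
        if run[-1] == n:
            # the last frame must stay; keep frame 1 too if the run also starts there
            keep = {1, n} if run[0] == 1 else {n}
        keyframes.extend(sorted(keep))
    return keyframes
```

For example, centers 5, 6, 7, 20, and 49 in a 50-frame sequence reduce to the keyframes 1, 5, 20, and 50.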

3.2 Motion data Reconstruction Method

We propose a novel method for motion data recovery based on binomial coefficients. Unlike the quaternion spherical interpolation and linear interpolation algorithms, it requires the binomial coefficients to be obtained in advance, in addition to the keyframe information. Therefore, after obtaining the keyframe sequence, we still need to compute the binomial coefficients. Figure 2 shows the process of obtaining these parameters.

Figure 2: Clustering effects

The motion curve set M can then be reconstructed from the keyframes and the binomial coefficient sets A and B. The rotation information curve L_i is reconstructed as:

$$
L_i =
\begin{cases}
f_{i,1}(x) = a_{i,1} x + b_{i,1}, & x \in [kL(1), kL(2)) \\
\quad \vdots \\
f_{i,j}(x) = a_{i,j} x + b_{i,j}, & x \in [kL(j), kL(j+1)) \\
\quad \vdots \\
f_{i,k}(x) = a_{i,k} x + b_{i,k}, & x \in [kL(k), kL(end)]
\end{cases}
\tag{5}
$$

where j denotes the j-th segment of curve L_i, which is divided into k segments by the keyframes, and kL denotes keyframelist.
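A minimal sketch of the fitting and reconstruction in Eq. (5) for a single curve is given below. It assumes 1-based keyframe numbers that start at frame 1 and end at the last frame, and it fits each line to the frames of its segment including both bounding keyframes, which is a choice made here so that every fit has at least two samples. The names `fit_segments` and `reconstruct` are illustrative.

```python
import numpy as np

def fit_segments(curve, keyframes):
    """Fit one least-squares line per segment; curve[0] holds the value at frame 1."""
    y = np.asarray(curve, dtype=float)
    params = []
    for lo, hi in zip(keyframes[:-1], keyframes[1:]):
        x = np.arange(lo, hi + 1)                  # frames lo..hi, both keyframes included
        a, b = np.polyfit(x, y[lo - 1:hi], 1)      # slope a and intercept b of the segment
        params.append((a, b))
    return params

def reconstruct(keyframes, params):
    """Evaluate Eq. (5): segment j covers [kL(j), kL(j+1)); the last one is closed."""
    n = keyframes[-1]
    out = np.empty(n)
    bounds = list(zip(keyframes[:-1], keyframes[1:]))
    for idx, ((lo, hi), (a, b)) in enumerate(zip(bounds, params)):
        is_last = idx == len(params) - 1
        x = np.arange(lo, hi + 1 if is_last else hi)
        out[lo - 1:lo - 1 + len(x)] = a * x + b
    return out
```

Only the keyframe numbers and the (a, b) pairs are needed for `reconstruct`; the original values at the keyframes are not used, in contrast to interpolation-based reconstruction.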

4 Discussion of parameters

Three parameters need to be determined in the proposed method: the distance threshold ε, the neighborhood sample threshold MinPts, and the goodness-of-fit R². We tested 200 motion sequences randomly selected from the Carnegie Mellon University (CMU) Motion Capture Database. We use the mean error of the joint spatial coordinates as the evaluation metric for the quality of the keyframe extraction results. The error E is calculated as follows:

$$
E = \frac{\sum_{i=1}^{s} \sum_{j=1}^{n} \left\| F^{o}_{i,j} - F^{r}_{i,j} \right\|_2}{s \times n}
\tag{6}
$$

To allow comparison with similar papers, the reconstruction error is calculated on the trajectory curves of the joints: we first use the rotation information of the joints to compute their trajectories and then compute the reconstruction error. In Eq. (6), F^o_{i,j} is the spatial coordinate of the i-th joint in the j-th frame of the original motion sequence, F^r_{i,j} is the spatial coordinate of the i-th joint in the j-th frame of the reconstructed motion sequence, s is the number of joints, and n is the number of frames.
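The error metric can be sketched in a few lines, assuming that the joint positions have already been computed from the rotation data and are stored as arrays of shape (frames, joints, 3), and that the norm in Eq. (6) is the Euclidean norm. The function name `mean_joint_error` and the array layout are assumptions for illustration.

```python
import numpy as np

def mean_joint_error(original, reconstructed):
    """Mean Euclidean distance between original and reconstructed joint positions.

    Both inputs have shape (n_frames, s_joints, 3).
    """
    o = np.asarray(original, dtype=float)
    r = np.asarray(reconstructed, dtype=float)
    dist = np.linalg.norm(o - r, axis=-1)  # one distance per frame and joint
    return float(dist.mean())              # divide the sum by s * n
```

If a single joint in a single frame of a 4-frame, 2-joint sequence is displaced by a distance of 5, the metric is 5 / 8.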

4.1 ε and MinPts

ε is the distance threshold between samples. Because the samples themselves are integers, the distance between samples is also an integer, so the best value of ε is an integer.

In our experiments, when ε is larger than 1, outliers increase and the reconstruction error of the extracted keyframes is much larger than when ε equals 1, so the clustering effect is poor, as shown in Figure 3(b). If ε is 0, every frame that appears forms a cluster of its own, which leads to serious keyframe redundancy, so this value is not suitable either. As Figure 3(a) shows, when ε = 1 and MinPts takes an appropriate value, the clustering effect is decent.

(a) ε = 1, MinPts = 1

(b) ε = 2, MinPts = 12

Figure 3: Clustering effects

Figure 3(a) shows the best clustering effect for a motion sequence when ε equals 1, and Figure 3(b) shows the best clustering effect for a motion sequence when ε equals 2. The x-coordinate represents the number of elements in the same class, and the y-coordinate represents the frame number. Outliers are marked with o, and x marks of the same color belong to one cluster. Figure 3(b) contains a large number of outliers, which indicates that the clustering effect is best when ε equals 1.

MinPts is the minimum number of neighborhood samples. In DBSCAN, it is difficult to select MinPts. To find the optimal value range of MinPts, we conducted the following experiment with ε fixed at 1 and R² varied in steps of 0.05 between 0.05 and 0.90. When R² is less than 0.90, the reconstruction error of every tested motion sequence is less than 0.20, and the average reconstruction error is less than 0.03. When R² is greater than 0.90, the reconstruction error of some motion sequences is above 3.0, reaching as high as 63.45, and the average reconstruction error is above 1.00. Thus, when R² is greater than 0.90, the reconstruction is both poor and unstable, so in this paper we only discuss the range where R² is smaller than 0.90. For each R² value, MinPts starts at 1 and increments by 1 in each cycle. In each cycle, we obtain the keyframes of the input motion sequence and calculate the RICE (the reconstruction error of the rotation information curves). By comparing the RICE values of all MinPts values, we select the MinPts with the minimum RICE as the best MinPts for the current R² value. With ε fixed at 1, we ran this experiment on 200 motion sequences and obtained the optimal MinPts range for each R² value, as shown in Table 1.

4.2 Coefficient of Determination R²

R² has a significant effect on keyframe extraction. The larger R² is, the more segments the rotation information curve is divided into. However, the more segment points there are, the closer the density connection between samples becomes, and several important frames are merged into one cluster because of the increased density. As a result, only one of several potential keyframes is ultimately extracted, which leads to insufficient keyframe extraction and a larger motion data reconstruction error. We tested 200 motion sequences to explore the effect of R² on the compression ratio (the ratio of the number of keyframes to the total number of frames) and on the reconstruction error. Figure 4 and Figure 5 show the influence of R² on the compression ratio and of the compression ratio on the reconstruction error, respectively.

Figure 4 shows the relationship between R² and the compression ratio. As can be seen from Figure 4, the compression ratio increases when R² is between 0 and 0.3 and decreases after that. Figure 5 shows the relationship between the compression ratio and the reconstruction error. In general, the reconstruction error is smaller than 0.16. As the compression ratio increases, the reconstruction error gradually decreases: when the compression ratio is smaller than 0.1537, the error decreases rapidly; when it is larger than 0.1537, the error decreases slowly. Because keyframe extraction aims for a small compression ratio and a small reconstruction error at the same time, the R² corresponding to a compression ratio of 0.1537 is the most appropriate; its value is 0.65, as shown in Figure 4.

Table 1: Different MinPts ranges with different values of R²

R²      MinPts
0.05    1
0.10    1
0.15    1
0.20    1
0.25    1
0.30    1
0.35    1
0.40    1
0.45    1-5
0.50    1
0.55    1-5
0.60    1-8
0.65    1-7
0.70    1-10
0.75    1-13
0.80    1-15
0.85    6-18
0.90    7-23

Figure 4: Compression Ratio and R²

Figure 5: Reconstruction Error and Compression Ratio

Table 2: Results of Keyframe Extraction

classes                 jump      cartwheel   dance
Total Frames            295       575         1131
Number of Keyframes     37        69          134
Compression ratio       12.50%    12.00%      11.80%
Computation time (s)    9.42      15.46       41.95

5 Result

In Section 4, we determined that ε is 1 and R² is 0.65, for which MinPts ranges from 1 to 7. In this section, we use these parameters to extract keyframes and calculate the reconstruction error. We compare our motion data reconstruction algorithm with traditional motion data reconstruction algorithms, and our keyframe extraction algorithm with traditional keyframe extraction algorithms.

Figure 6 shows the keyframes extracted from different motions. As the figure shows, the keyframes extracted by our method represent the changing trend of the motions well.

Table 2 lists the number of keyframes, the compression ratio, and the computation time for different action classes. The compression ratio is about 12% for all three classes, and the computation time ranges from 9.42 s to 41.95 s.

The sequences are reconstructed from the keyframes extracted by our method using three different motion data reconstruction methods. Table 3 lists the reconstruction errors of the three methods for the three motion classes. BFE is the error of the proposed motion data reconstruction algorithm, LIE is the reconstruction error of the linear interpolation algorithm, and QSIE is the reconstruction error of the quaternion spherical interpolation algorithm. Overall, the results show that linear interpolation gives the worst reconstruction, followed by quaternion spherical interpolation, and that the proposed motion data reconstruction algorithm gives the best.

It is easy to understand why our method is better than linear interpolation. The linear interpolation formula is:

$$
f_{lerp}(p, q, t) = t(q - p) + p
\tag{7}
$$

Table 3: Results of different reconstruction methods

classes   jump      cartwheel   dance
BFE       0.0095    0.0793      0.0095
LIE       0.0599    0.2094      0.0176
QSIE      0.0218    0.0176      0.0172

Here p and q are the values of two adjacent keyframes in the keyframe sequence, that is, the ordinates of the original curve when reconstructing a rotation information curve, and t is the interval. In effect, two adjacent keyframes are connected by a line segment, and the intermediate frames are then found on that segment according to the interval. Our method instead fits, in advance, the curve with the smallest regression error to the original rotation information curve, and reconstructing the data amounts to restoring the fitted curve.
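For comparison, linear interpolation between two keyframes can be sketched as below, using the standard form p + t(q − p), which returns p at t = 0 and q at t = 1. The helper `lerp_frames` and its argument names are hypothetical; it treats t as the fraction of the way from one keyframe to the next.

```python
def lerp(p, q, t):
    """Linear interpolation between the values p and q for t in [0, 1]."""
    return p + t * (q - p)

def lerp_frames(f0, v0, f1, v1):
    """Values at every frame f0..f1, given the keyframe values v0 at f0 and v1 at f1."""
    return [lerp(v0, v1, (f - f0) / (f1 - f0)) for f in range(f0, f1 + 1)]
```

The intermediate values depend only on the two keyframe values, so any curvature of the original curve between the keyframes is lost.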

Quaternions can reconstruct smooth motion trajectories, but they also have limitations. To apply quaternion interpolation, the rotation information of the joint must first be converted into a quaternion expression. The interpolation is computed as follows:

$$
f_{slerp}(P, Q, t) = \frac{\sin[(1 - t)\theta] \, P + \sin(t\theta) \, Q}{\sin\theta}
\tag{8}
$$

P and Q are vectors containing the three-dimensional rotation information of the joint, whereas a rotation information curve carries one-dimensional information, so the three-dimensional information is represented by three curves. θ is the angle between the vectors P and Q. The points obtained by quaternion spherical interpolation therefore lie on the rotation trajectory of vector P within the plane S determined by P and Q. However, the vector P may actually rotate to Q through the space outside S. For example, when we straighten an arm, the fist can follow more than one path from point A to point B in space, but the quaternion spherical interpolation algorithm can only compute the trace on the plane determined by points A, B, and the shoulder joint C. Our method reconstructs the rotation information of each dimension of the joint, so the trajectory it finds is closer to the real motion path than the trajectory computed by quaternion spherical interpolation, and the error is smaller.
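The planar behaviour of spherical interpolation can be seen in a short sketch of Eq. (8). The sketch normalises its inputs and falls back to a plain linear blend when sin θ is close to zero, where Eq. (8) is ill-conditioned; both choices, and the function name `slerp`, are assumptions for illustration.

```python
import numpy as np

def slerp(P, Q, t):
    """Spherical linear interpolation between unit vectors along P and Q."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    P = P / np.linalg.norm(P)
    Q = Q / np.linalg.norm(Q)
    theta = np.arccos(np.clip(np.dot(P, Q), -1.0, 1.0))   # angle between P and Q
    if np.sin(theta) < 1e-8:
        # nearly parallel or opposite inputs: Eq. (8) would divide by about zero
        return (1.0 - t) * P + t * Q
    return (np.sin((1.0 - t) * theta) * P + np.sin(t * theta) * Q) / np.sin(theta)
```

Every result is a weighted sum of P and Q, so it cannot leave the plane spanned by the two vectors, which is the limitation discussed above.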

Compared with the quaternion spherical interpolation and linear interpolation algorithms, the binomial fitting based reconstruction algorithm has a smaller reconstruction error, but it needs extra storage space for the parameters. When the quaternion spherical interpolation or linear interpolation algorithm is used to reconstruct the motion sequence, both the keyframe sequence and the motion information corresponding to the keyframes are needed. Assume that a motion sequence with n frames is decomposed into m motion curves, so that the total motion information of the sequence is m∗n. If k keyframes are extracted, the total amount of information to be recorded for motion reconstruction is k + k∗m. Therefore, after extracting the keyframes, the ratio of the stored information to the total amount of the original motion information is (k + k∗m)/(n∗m) ≈ k/n, namely the compression ratio of the keyframes. If binomial fitting reconstruction is used, according to y = ax + b, the motion information corresponding to the keyframes need not be recorded, but the fitting parameters must be. According to Figure 2, each curve segment needs two parameters, so the total motion information to be recorded is k + 2∗k∗m. The ratio of stored information to the total amount of original motion information is (k + 2∗k∗m)/(n∗m) ≈ 2∗k/n. Therefore, reconstructing the motion sequence by binomial fitting requires saving about twice as much information at keyframe extraction time as the other two methods.
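The two storage ratios can be checked numerically with the expressions from the text. The function name `storage_ratios` is illustrative, and the example values (n = 1131 frames, m = 58 curves, k = 134 keyframes) are taken from the dance sequence in Table 2 and the curve count in Section 3.1.1.

```python
def storage_ratios(n, m, k):
    """Return (interpolation ratio, binomial-fitting ratio) of stored to original data."""
    total = n * m                          # original amount of motion information
    interp = (k + k * m) / total           # keyframe numbers plus keyframe values
    binom = (k + 2 * k * m) / total        # keyframe numbers plus two parameters per segment
    return interp, binom

interp, binom = storage_ratios(n=1131, m=58, k=134)
print(f"interpolation: {interp:.4f}, binomial fitting: {binom:.4f}, factor: {binom / interp:.3f}")
```

For these values the factor between the two ratios is 117/59, slightly below 2, which is consistent with the "about twice" estimate.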

Figures 7, 8, and 9 compare the original sequence with the sequences reconstructed from the keyframes extracted by our method, the joint kernel sparse representation algorithm, and the affinity propagation algorithm, where green is the original data and red is the reconstructed result.

In Figure 7, the dance motion sequence has 1131 frames, and the number of keyframes is 134, a compression ratio of 11.80%. As can be seen from Figure 7(a), there is little difference between the reconstructed frames and the original data. Figure 7(b) shows a significant difference between the reconstructed results and the original data at the feet after the dance motion begins. In Figure 7(c), the errors at the feet and head are obvious at the beginning, and a slight error later appears in the upper body as well.

In Figure 8, the cartwheel motion sequence has 575 frames, and the number of keyframes is 69, a compression ratio of 12.0%. Figure 8(a) shows no obvious error in the reconstruction results except a small error at the ball of the foot at the beginning of the motion. However, Figure 8(b) and Figure 8(c) show a large error between the reconstruction results and the original data. In Figure 8(b), the error is significant in the beginning phase of the motion, and in the middle phase the reconstructed results are markedly different from the original data. In Figure 8(c), there is a noticeable error at the beginning of the motion, and before the end of the motion the difference between the reconstructed results and the original data is obvious.

(a) Keyframe sequences extracted from dance motion

(b) Keyframe sequences extracted from cartwheel motion

(c) Keyframe sequences extracted from jump motion

Figure 6: Results of keyframe extraction

In Figure 9, the jump motion sequence has 295 frames, and the number of keyframes is 37, a compression ratio of 12.5%. Figure 9(a) shows a good reconstruction effect, with only slight errors in the feet. In Figure 9(b), there is a significant error in the arm when the person jumps. In Figure 9(c), there are errors in both the arms and feet during the squat and the jump.

Table 4 compares the reconstruction errors of the three keyframe extraction algorithms. AE denotes the affinity propagation algorithm [12] and JE denotes the joint kernel sparse representation algorithm [16]. Since both algorithms use the quaternion method for reconstruction, we compare them against the reconstruction error of our keyframes under the quaternion method. As can be seen from the experimental results, the reconstruction error of the proposed method is significantly lower than that of the affinity propagation algorithm and the joint kernel sparse representation algorithm.

Table 4: Reconstruction error of three keyframe extraction algorithms

classes      jump      cartwheel   dance
Our method   0.0218    0.0176      0.0172
AE           0.4518    0.2566      0.248
JE           0.1483    0.47        0.0324

6 Conclusion

We proposed a new keyframe extraction algorithm and a motion data reconstruction method based on multiple binomial fitting. They are simple and effective. Both algorithms comprehensively consider the motion changes of the various joints and obtain good results. Experiments show that the proposed reconstruction method is superior to the linear interpolation algorithm and the quaternion spherical interpolation algorithm. The compression ratio of the keyframes extracted by our algorithm is relatively large, averaging over 10%, and needs to be improved in future work; however, the reconstruction error is much smaller than that of the algorithms based on clustering and optimization.

(a) The proposed method

(b) Joint kernel sparse representation algorithm

(c) Afﬁnity propagation algorithm

Figure 7: Results of original dance motion and reconstructed motion

(a) The proposed method

(b) Joint kernel sparse representation algorithm

(c) Afﬁnity propagation algorithm

Figure 8: Results of original cartwheel motion and reconstructed motion

(a) The proposed method

(b) Joint kernel sparse representation algorithm

(c) Afﬁnity propagation algorithm

Figure 9: Results of original jump motion and reconstructed motion

Acknowledgement

This work is partially funded by the Key Laboratory of Agricultural Internet of Things, Ministry of Agriculture and Rural Affairs, China (2018AIOT-09), the Key Research and Development Program of Shaanxi Province (2018NY-127), and the Shaanxi Key Industrial Innovation Chain Project in Agricultural Domain (Grant No. 2019ZDLNY02-05). The authors acknowledge Carnegie Mellon University for the motion capture data resources.

References

[1] Z. J. Li, C. Y. Su, L. Y. Wang, et al. Nonlinear Disturbance Observer-Based Control Design for a Robotic Exoskeleton Incorporating Fuzzy Approximation. IEEE Transactions on Industrial Electronics, 62(9):5763-5775, 2015.

[2] S. Y. Shin, C. Kim. Human-Like Motion Generation and Control for Humanoid's Dual Arm Object Manipulation. IEEE Transactions on Industrial Electronics, 62(4):2265-2276, 2015.

[3] T. Sasaki, D. Brscic, H. Hashimoto. Human-Observation-Based Extraction of Path Patterns for Mobile Robot Navigation. IEEE Transactions on Industrial Electronics, 57(4):1401-1410, 2010.

[4] Y. R. Li, Z. Wang, X. S. Yang, et al. Efficient convolutional hierarchical autoencoder for human motion prediction. Visual Comput, 35(6-8):1143-1156, 2019.

[5] C. Halit, T. Capin. Multiscale motion saliency for keyframe extraction from motion capture sequences. Comput Animat Virt W, 22(1):3-14, 2011.

[6] T. Miura, T. Kaiga, T. Shibata, et al. A hybrid approach to keyframe extraction from motion capture data using curve simplification and principal component analysis. IEEJ Transactions on Electrical and Electronic Engineering, 9(6):697-699, 2015.

[7] J. Xiao, Y. Zhuang, T. Yang, et al. An Efficient Keyframe Extraction from Motion Capture Data. International Conference on Advances in Computer Graphics. Springer-Verlag, 2006.

[8] K. S. Huang, C. F. Chang, Y. Y. Hsu, et al. Key Probe: a technique for animation keyframe extraction. Visual Comput, 21(8-10):532-541, 2005.

[9] Y. H. Gong, X. Liu. Video summarization and retrieval using singular value decomposition. Multimedia Syst, 9(2):157-168, 2003.

[10] X. M. Liu, A. M. Hao, D. Zhao. Optimization-based keyframe extraction for motion capture animation. Visual Comput, 29(1):85-95, 2013.

[11] X. J. Chang, P. F. Yi, et al. Key Frames Extraction from Human Motion Capture Data Based on Hybrid Particle Swarm Optimization Algorithm. Recent Developments in Intelligent Information and Database Systems, pp. 335-342.

[12] B. Sun, D. Kong, S. Wang, et al. Keyframe extraction for human motion capture data based on affinity propagation. 2018 IEEE 9th Annual Information Technology, Electronics and Mobile Communication Conference (IEMCON). IEEE, 2018.

[13] Q. Zhang, S. P. Yu, D. Zhou, et al. An Efficient Method of Key-Frame Extraction Based on a Cluster Algorithm. J Hum Kinet, 39(1):5-13, 2013.

[14] Y. Zhang, J. Li, M. Zhang, et al. Motion Key frame Extraction Based on Grey Wolf Optimization Algorithm. MATEC Web of Conferences, 232, 2018.

[15] Q. Zhang, S. Zhang, D. Zhou, et al. Keyframe Extraction from Human Motion Capture Data Based on a Multiple Population Genetic Algorithm. Symmetry, 6(4):926-937, 2014.

[16] G. Y. Xia, H. J. Sun, X. Q. Niu, et al. Keyframe Extraction for Human Motion Capture Data Based on Joint Kernel Sparse Representation. IEEE Transactions on Industrial Electronics, 64(2):1589-1599, 2017.

[17] Q. Z, G. Z, L. X, et al. Human Motion Capture Error Correction Method without Force-Feedback. International Conference on Artificial Intelligence, 2019.