Data-Driven Crowd Motion Control with Multi-touch
Gestures
Yijun Shen1, Joseph Henry2, He Wang3, Edmond S. L. Ho1, Taku Komura2, Hubert P. H. Shum1
1Northumbria University, United Kingdom
2University of Edinburgh, United Kingdom
3University of Leeds, United Kingdom
Abstract
Controlling a crowd using multi-touch devices appeals to the computer games and animation industries, as such
devices provide a high dimensional control signal that can effectively define the crowd formation and movement.
However, existing works relying on pre-defined control schemes require the users to learn a scheme that may not
be intuitive. We propose a data-driven gesture-based crowd control system, in which the control scheme is learned
from example gestures provided by different users. In particular, we build a database with pairwise samples of
gestures and crowd motions. To effectively generalize the gesture style of different users, such as the use of dif-
ferent numbers of fingers, we propose a set of gesture features for representing a set of hand gesture trajectories.
Similarly, to represent crowd motion trajectories of different numbers of characters over time, we propose a set
of crowd motion features that are extracted from a Gaussian mixture model. Given a run-time gesture, our system
extracts the K nearest gestures from the database and interpolates the corresponding crowd motions in order to
generate the run-time control. Our system is accurate and efficient, making it suitable for real-time applications
such as real-time strategy games and interactive animation controls.
Categories and Subject Descriptors (according to ACM CCS): I.3.7 [Computer Graphics]: Three-Dimensional
Graphics and Realism—Animation;
1. Introduction
Controlling a crowd using hand gestures captured by multi-
touch devices appeals to the computer games and animation
industries. First, multi-touch systems are becoming increasingly popular due to the advancement of hardware. Second, a crowd has a large number of degrees of freedom, which
are difficult to control using traditional controllers with
lower dimensional control signals, such as mice and key-
boards. Multi-touch devices register several simultaneous
control inputs, such that the user can control the complex
formation of a crowd intuitively.
yi.shen@northumbria.ac.uk (co-first author)
j.henry@ed.ac.uk (co-first author)
h.e.wang@leeds.ac.uk
e.ho@northumbria.ac.uk
tkomura@ed.ac.uk
hubert.shum@northumbria.ac.uk (corresponding author)
The hand gestures captured by multi-touch devices are
typically sets of time series of finger positions. Many ex-
isting works show that it is possible to map such control
signals to a crowd motion using predefined control schemes
[HSK12,HSK14]. This allows the user to control the forma-
tion and movement of the crowd by performing specific ges-
tures. While these manually designed control schemes are
efficient in crowd control, different systems usually employ
different control schemes. This is because there are an infi-
nite number of possible mappings between the gesture and
the crowd space. Rules need to be explicitly defined to ful-
fill the control needs optimally. As a result, the users have
to learn the schemes in advance before using the systems.
Unlike previous work, we learn a mapping that focuses on
both user friendliness and control expressibility in this work,
shortening the learning curve.
To this end, we present different crowd motions to a group
of users and ask them to give their desirable control gestures,
which allows us to generalize the preferred gestures and im-
plement an intuitive control scheme. For every crowd mo-
tion in our training dataset, we ask the users to perform a
control gesture that they think to be the best to create such
a motion. It results in a database with pairwise samples of
gestures and crowd motions. During run-time, we obtain a
gesture from the user, and find the K nearest gestures from
the database. We then interpolate the corresponding K crowd
motions in order to generate the run-time control. Since the
control scheme is learned from different users without prior
constraints, our system is intuitive to use.
One important component of our research is the gesture
space representation. As we do not impose any constraints
when collecting the control gestures, a representation invariant to individual gesture variations is needed, such as
the number of fingers used, different speeds, etc. Users of-
ten articulate gestures with one or both hands, using multi-
ple fingers when performing similar tasks [RGR13]. At the
same time, they show similar variations in their gestures
when asked to provide control for the movement of robot
groups [MDC09]. We propose a set of gesture features that
effectively represents a wide variety of gestures while being
independent of inter-user style differences. This includes the
centroid feature, the distance to centroid feature, the rotation
feature and the minimum oriented bounding box features.
We further propose a distance function to evaluate the dis-
tance between two gestures in such a space in order to obtain
the K nearest neighbours of a run-time gesture.
Similarly, crowds under different scenarios contain variations such as the number of characters within the crowd,
and therefore crowd motions also require a general represen-
tation. Such a representation should ideally parameterize the
whole crowd motion space based on the crowd data. We pro-
pose a crowd motion feature space that models a crowd mo-
tion with a Gaussian mixture model (GMM), in which the
trajectory of each character is modelled by the distribution
of the Gaussian component. The major advantage of GMM
is that we can set up multiple Gaussian components to ac-
curately model the movement of small groups of characters
within the crowd. We further propose a scheme to interpolate multiple crowd motions in the feature space in order to
generate the run-time control signal.
We demonstrate that our system can accurately infer the
crowd motion based on a given gesture. Users can effectively
control a crowd of arbitrary size with intuitive gestures and
guide the crowd to navigate through a given virtual environ-
ment. Our system is best applied in computer games, such as
the crowd control systems in real-time strategy games,
and in interactive animation controls.
This paper presents the following contributions:
• We propose a data-driven method for inferring an appropriate crowd motion based on the gesture input obtained from a touch device. Our approach is not restricted by a pre-specified control scheme. Instead, the control scheme is learned as a mapping between user-preferred gestures and corresponding crowd motions, which encodes both user-friendliness and control expressibility.
• We propose a set of gesture features that are invariant to variations in the user's preferred touch input style, such as the number of fingers used. These features are used for recognising different properties of a user's multi-touch input, allowing the system to distinguish between a variety of control signals.
• We propose to represent crowd movement with a set of crowd motion features that are obtained from a GMM. This representation allows modelling different sub-groups of the crowd and is independent of the number of characters. We further propose a method to interpolate crowd motion features in order to generate a new crowd motion that best matches the user input.
The rest of the paper is organized as follows. Sec. 2 discusses the related works and identifies the research gap in
gesture-based crowd control. Sec. 3 provides an overview
of our proposed system. Sec. 4 details the data collection
process used to create the gesture and crowd motion
database. Sec. 5 and Sec. 6 explain our proposed gesture
space and crowd motion space respectively. Sec. 7 explains
the method to synthesize run-time crowd motions. Sec. 8
gives detailed evaluations of the system. Finally, Sec. 9 and
Sec. 10 discuss the limitations and conclude the work.
2. Related work
Crowd simulation has been widely used in many areas such
as entertainment production and urban planning, where two
central issues are control and simulation. Related to our research, there are mainly three sub-fields from which we draw
our inspiration: gesturing on multi-touch devices, crowd
motion control and formation control.
2.1. Gesture Recognition on Multi-touch Device
Since the invention of multi-touch devices, gestures have
provided a rich capacity of control input design. Gestures
can be sequenced to express complex control purposes and
are typically represented by time series of positions and
velocities. For any pre-designed stroke pattern, there are
some user input variations. Spatially-based control designs
[JTZ12, VAW12, RVG14] mainly target recognizing
stroke patterns despite such variations to improve expressibility, but with limited understanding of the time dependencies between strokes. To model temporal or semantic
dependencies, rule-based systems such as gesture formalisation [GCG10], grammars [GGH03, KWK10], state machines [LL12] or syntax [KHDA12] have been proposed. However,
they either lack the accommodation of user input variations
or do not generalise well.
Among the previous research works, Lü & Li’s work
[LL13] is most related to ours. They present a set of features
based on translation, rotation, and scaling of a user’s finger
configurations to encode strokes and recognise gestures. Our
proposed gesture representation has similar concepts but dif-
ferent designs for better representativeness. In particular, we
utilize both the average distance to centroid and the mini-
mum oriented bounding box (MOBB), instead of a single
scaling parameter, to help identify gestures such as expand vs. split. Furthermore, instead of simply recognising
the gestures, our system focuses on finding a good mapping
between gestures and crowd motions, which involves com-
paring gestures using the proposed representations. This has
not been explored in previous work such as [LL13].
2.2. Crowd Motion Control
Crowd motion control has been studied extensively, includ-
ing controlling the whole crowd [Par10,GD11], subgroups
[OO09,KHKL09], sets of control points [KLLT08], and the
style [LCHL07,JCP10].
Field-based control focuses on the design of guidance
fields in the environment. Dynamic potential fields can be
used to represent the flow of the crowd with respect to other
moving characters and the environment [TCP06]. Vector
fields are typically used to guide each subgroup of the crowd
[KSII09]. In [JXW08], the user can control crowd motion
by adding anchor points to indicate their moving directions,
with which a vector field is generated. In [PVDBC11],
the movement of the agents is generated by the guidance
field that is sketched by the user or extracted from the
video. Such a field is used to construct a navigation field
that refines the flow of the crowd by avoiding collisions in
the environment. While these methods enable the user to
easily author the movement of a crowd, they typically re-
quire a high-dimensional representation of the crowd mo-
tion based on the 2D terrain. Field-based methods are, there-
fore, ineffective for data-driven crowd control, as the sys-
tem needs to learn from/interpolate high-dimensional feature
vectors. Motivated by the strength of data-driven systems
that they can model complex relationships between gestures
and crowd motions, we decide to represent the crowd us-
ing GMM, in which crowd motions are modelled by low-
dimensional feature vectors.
Mesh-based control is another control scheme that evalu-
ates the crowd movement and formation using mesh defor-
mation. Utilizing a single-pass algorithm, crowd movements
can be evaluated based on a deformable mesh [HSK12,
HSK14]. It is also possible to interactively edit large-scale
crowd while maintaining the spatial relationship between in-
dividuals [KSKL14]. Voronoi diagram can be used to repre-
sent the spatial relationship between different agents and or-
ganize the crowd movement in constraint space using Torso
Crowd Model [SMTT17].
One particular problem of existing methods is the lack of
high dimensional control signals that can be used to define
the movement details of a crowd that consists of multiple
sub-groups. Existing methods typically employ multiple lev-
els of control rules, such that the user can define the overall
crowd movement first, and define the details of sub-groups
later. Instead, we decide to embed the control mechanism
into our learned mapping between the control signal and the
corresponding crowd motions. It solves the problem of po-
tentially contradictory control objectives on different levels,
such as different overall crowd and sub-group targets.
2.3. Formation Control
Formation control is a technique to control the movement
of crowds while maintaining formations. A significant num-
ber of papers propose to represent the shape of the crowd
by modelling the geometric relations between individual
agents. Mesh-based methods are very popular because they
can easily represent the formation and accommodate some
randomness due to individual motions by controlled mesh
deformation. Laplacian mesh editing [SCOL04] controls
and combines existing crowd formations into larger scale
crowd animation [KLLT08]. An intermediate 2D mesh be-
tween user input and crowd motion can be defined so that
crowd formations are controlled by simple user gestures
[HSK12,HSK14]. Spectral analysis smoothly transforms
the crowd from one formation to another which is repre-
sented by Delaunay tetrahedral meshes [TYK09]. A lo-
cal coordinate system called formation coordinates main-
tains the adjacent relationship between individuals in the
crowd [GD11]. More variants of these methods can be found
in [KO10,ZZC14,GD13,XWY15].
The Morphable Crowds [JCP10], which is based on data
examples of different styles of crowd motion, is conceptually
similar to our work. While their method is based on mod-
elling the positions of characters surrounding an individual
in a crowd motion, our method models the full trajectories of
characters in the crowd. Such a full modelling enables us to
build up a precise control mapping from the input to crowd
motions, which enhances the quality of controlling and syn-
thesizing new crowd motions.
Path patterns that consist of flows of location-orientation
pairs are also a good representation of crowd motions, and
can be extracted from crowd videos [WOO16]. However, the
representation is complicated and too computationally expensive to be used for interactive control purposes.
3. Method Overview
The overview of our system is shown in Fig. 1. In the of-
fline stage, we collect user data that describe the mappings
between gesture inputs and given crowd motions and cre-
ate a database. We prepare a number of precomputed crowd
motion trajectories (Fig. 1a) and obtain the corresponding
gesture trajectories (Fig. 1b) from the users.
Figure 1: The overview of our crowd control system.
As trajectory information has inconsistent dimensions and an
inefficient representation, we propose to map gesture and motion trajectories into their respective high-level feature spaces (Fig. 1c and
d). The correlated gesture and crowd motion feature spaces
generalize and unify the representations of the gesture and
crowd data respectively. In the online stage, our system receives run-time user gestures and evaluates the corresponding
crowd motion. Given the run-time gesture trajectories (Fig.
1e), we calculate its gesture features (Fig. 1f) and conduct
a K nearest neighbours (KNN) search in the database. This
allows us to obtain K similar gestures and their corresponding K crowd motions. We interpolate the K crowd motions and
generate the resultant run-time crowd motion features (Fig.
1g), which is finally converted into crowd motion trajectories
(Fig. 1h) that can control the run-time crowd. Since the ges-
ture data in the database are obtained from real user inputs
with different variations, our system allows intuitive control
of the crowd in real-time.
4. Data Collection
In this section, we explain how we collect user gesture data
based on a set of pre-generated crowd motion data.
We first generate a set of crowd motions with the crowd
simulation system presented in [HSK12,HSK14]. We cre-
ated 150 motions under 10 different motion classes, which
are shown in Fig. 2. Such a set of crowd motions consists
of 6 classes of typical crowd motions, including translate
(i.e. characters all moving in a direction), twist (i.e. char-
acters moving in a circular direction around the centre of
the scene), contract (i.e. characters moving towards the cen-
tre of the scene), expand (i.e. characters moving away from
the centre of the scene), join (i.e. two groups of characters
moving towards one another), and split (i.e. two groups of
characters moving away from one another). It also consists
of 4 classes of more complicated crowd motions, including
split then translate, translate then join, twist while expanding, and twist while contracting. The motion set is designed to
demonstrate that our system can handle typical crowd mo-
tions seen in computer games and movies, as well as com-
Figure 2: Examples of crowd motion shown to users to col-
lect their control gestures, with the light blue colour indi-
cating the start of the motion and the dark blue indicating
the end: (a) translate, (b) twist, (c) contract, (d) expand, (e)
join, (f) split, (g) split then translate, (h) translate then join,
(i) twist while expanding, and (j) twist while contracting.
plicated motions that consist of combinations of typical mo-
tions. Our proposed framework is easily extensible. Devel-
opers can add or remove classes of crowd motions based on
the requirement of the target application.
Ten volunteer participants, aged between 20 and 50, were
asked to provide gestures for the crowd motions shown on
a touchscreen (the Wacom Cintiq 27QHD sized 27 inches
diagonally). Each participant was allocated 15 crowd
motions, which were randomly picked from the 150 motions
required to train the system. The participants were asked to
provide a corresponding hand gesture on the screen as if they
were controlling each of these motions with two or more fingers. They were not given any instruction about what the
gestures should be, the number of fingers to be used, or
the duration of the gestures. The orientation of the crowd
motion on the screen was varied to prevent any bias in terms
of the positioning of the hands when recording the gesture.
On average, it took the participants 7 seconds to observe a
crowd motion and provide a gesture. Fig. 3 shows some example inputs for typical crowd motions.
Figure 3: Examples of collected user gestures, with the light
colours indicating the starts of the trajectories, the dark ones
indicating the ends, and different colours representing different sets of gestures: (a) translate, (b) twist, (c) contract, (d)
expand, (e) join, (f) split.
5. Gesture Space
The success of finding a good mapping between the ges-
ture and crowd motion space lies in their corresponding
parametrization, such that the variation of the data can be
captured. For gestures, we find that the combination of multiple features provides a powerful representation. In this section, we propose a set of gesture features that can be extracted from raw gesture trajectories. Such features form the
gesture space, which is a low dimensional, continuous space
in which each point represents one gesture. It allows us to
compare and distinguish gestures effectively.
Our concept of a gesture space is similar to the idea of
Motion Fields [LWB14], in which the authors propose a
high-dimensional continuous space that incorporates the set
of all possible motion states in character motion. However,
unlike character motion, the way a user performs a particular
gesture can vary significantly from person to person. For ex-
ample, users can use a different number of fingers to perform
the same intended gesture. Our proposed gesture space con-
sists of a set of features that are independent of such inter-
user variations, while capable of capturing the intent of the
input. This allows us to robustly distinguish between differ-
ent types of user gesture.
5.1. Gesture Trajectories
Here, we define the representation of raw gesture trajectories
and explain the process to resample the gesture with a spline
function.
A raw gesture is described by the set of trajectories corre-
sponding to the finger inputs provided by a user. The touch-
screen records the position of each touch point in discrete
time intervals. As a result, a gesture $G_{raw}$ is defined as a set
of trajectories:
$$G_{raw} = \{ g_n(t) \mid n \in [1, N],\ t \in [1, T_n] \}, \quad (1)$$
where N is the total number of trajectories, $T_n$ is the total
number of time intervals (i.e. points) in trajectory n, and
$g_{n'}(t')$ indicates the 2D location of a specific trajectory $n'$
at a specific time $t'$. Similar to existing
research [WWL07,VAW12,RVG14], we normalize the ges-
ture by translating and uniformly scaling the whole gesture.
In particular, the whole gesture is translated such that the
minimum x and y position in the gesture is at the origin. We
calculate a scaling factor $\lambda$ to scale the gesture uniformly:
$$\lambda = 1 / \max(d_v, d_h), \quad (2)$$
where $d_v$ and $d_h$ are the maximum vertical and horizontal
distances among all points respectively. After normalization,
all trajectory points are within the range $[0,1] \times [0,1]$.
We assume all touch trajectories have a similar number of
time intervals, as a gesture usually starts and ends with all
fingers touching and leaving the screen at similar times.
This allows us to utilize spline functions for approximating
and resampling touch trajectories to the same length. In our
implementation, we apply the Hermite spline [Lal09] to approximate each of the N touch trajectories. Then, we uniformly resample each trajectory from $T_n$ points to $T_H$ points.
The choice of value for $T_H$ is important, since undersampling
would remove too much information from the original gesture, but oversampling would add unnecessary details and
increase computational overhead in later stages. We follow
the suggestion in [WWL07] and set $T_H = 64$, which works
effectively in all of our experiments. As a result, we define a
gesture G as:
$$G = \{ g_n(t) \mid n \in [1, N],\ t \in [1, T_H] \}, \quad (3)$$
where $T_H$ is the pre-defined sample number.
There are multiple advantages to approximating and resampling the gesture using Hermite splines. First, different touchscreens have slightly different sample rates. Resampling the trajectories allows the system to work robustly with
different hardware. Second, it unifies the density of points
in a trajectory, which helps us to more accurately identify a
gesture using a gesture database. Third, from our discussion
with practitioners, crowd control is usually based on the geometry of the trajectories instead of the speed of performing
them, as the movement speeds of a crowd are usually constrained in graphics systems. Representing the geometry of
the overall trajectories with splines and then uniformly resampling them allows us to model the trajectories with fixed
lengths, which removes the speed factor from the trajectories. If the gesture speed is needed, it can be calculated before the resampling stage and stored as an extra feature.
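To make the normalization and resampling steps concrete, here is a minimal Python sketch. It uses SciPy's PCHIP interpolator as a stand-in for the Hermite spline of [Lal09]; the function names and the random example data are our own illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

T_H = 64  # target sample count per trajectory, following [WWL07]

def normalize_gesture(trajs):
    """Translate the gesture so its minimum x/y is at the origin,
    then scale uniformly by lambda = 1/max(d_v, d_h) (Eq. 2)."""
    pts = np.vstack(trajs)
    origin = pts.min(axis=0)
    d_h, d_v = pts.max(axis=0) - origin   # horizontal / vertical extents
    scale = 1.0 / max(d_v, d_h)
    return [(t - origin) * scale for t in trajs]

def resample_trajectory(traj, t_h=T_H):
    """Fit a Hermite-type spline per coordinate and resample the
    trajectory uniformly to t_h points."""
    traj = np.asarray(traj, dtype=float)
    t = np.linspace(0.0, 1.0, len(traj))
    spline = PchipInterpolator(t, traj, axis=0)
    return spline(np.linspace(0.0, 1.0, t_h))

# Example: two finger trajectories of different lengths become a
# (2, 64, 2) array after normalization and resampling.
g_raw = [np.cumsum(np.random.rand(t, 2), axis=0) for t in (37, 41)]
g = np.stack([resample_trajectory(t) for t in normalize_gesture(g_raw)])
```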
5.2. Gesture Features
Here, we define a set of high-level gesture features extracted
from the gesture trajectories. Such features are designed
to represent the essential components of a gesture in low
dimension, making them effective in identifying gestures.
Also, they are independent of the number of touch points.
As a result, with gesture features, gestures with different
numbers of touch points are directly comparable.
The centroid feature represents the average position of
the user’s touch inputs over time. It captures the general
shape of the gesture and is independent of the number of
touch points. It is defined as a column vector:
$$C(G) = [c_G(1), c_G(2), \ldots, c_G(T_H)]^T, \quad (4)$$
where $c_G(t)$ is the centroid at time $t$:
$$c_G(t) = \frac{1}{N} \sum_{n=1}^{N} g_n(t). \quad (5)$$
The distance to centroid feature represents the distance
of each touch point relative to the centroid over time. It al-
lows us to capture the spread of the gesture. It is defined as:
$$S(G) = [s_G(1), s_G(2), \ldots, s_G(T_H)]^T, \quad (6)$$
where $s_G(t)$ is the spread at time $t$:
$$s_G(t) = \frac{1}{N} \sum_{n=1}^{N} | g_n(t) - c_G(t) |, \quad (7)$$
where $|\cdot|$ represents the Euclidean norm and $c_G(t)$ is calculated
in Eq. 5.
The rotation feature represents the average cumulative
change in rotation over time of the touch inputs around the
centroid. Such a feature allows us to capture the overall ro-
tation in the gesture. It is defined as:
$$R(G) = \left[ \sum_{t=0}^{0} r_G(t),\ \sum_{t=0}^{1} r_G(t),\ \ldots,\ \sum_{t=0}^{T_H} r_G(t) \right]^T, \quad (8)$$
where $r_G(t)$ is the average change in rotation at time $t$:
$$r_G(t) = \begin{cases} 0, & \text{if } t = 0 \\ \dfrac{1}{N} \displaystyle\sum_{n=1}^{N} \arctan \dfrac{(g_n(t) - c_G(t)) \times (g_n(t-1) - c_G(t-1))}{(g_n(t) - c_G(t)) \cdot (g_n(t-1) - c_G(t-1))}, & \text{otherwise} \end{cases} \quad (9)$$
The minimum oriented bounding box feature repre-
sents the minimum and maximum dimension of the mini-
mum oriented bounding box (MOBB) of the touch inputs at
each time step. It allows us to represent the movement varia-
tion of the gesture over time, which can approximate the area
of the gesture. Given a set of touch points at time $t$, $g_n(t)$,
we apply the rotating calipers method [Tou83] to calculate
the minimum rectangle bounding the points. We extract the
width, $b_w(t)$, and the height, $b_h(t)$, of the rectangle, and define the feature as:
$$B(G) = [(b_w(0), b_h(0)), (b_w(1), b_h(1)), \ldots, (b_w(T_H), b_h(T_H))]^T. \quad (10)$$
Finally, the gesture space is formed by the concatenation of
the four gesture features. As a result, a gesture G can be
represented by a point in the space with the feature vector:
$$\mathbf{G} = [C(G), S(G), R(G), B(G)]^T. \quad (11)$$
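The four features above can be prototyped compactly. The sketch below is our own reading of Eqs. 4-11 rather than the authors' code: the minimum-area rectangle is found by brute force over convex-hull edge directions, which yields the same box as the rotating calipers method [Tou83] but less efficiently, and inputs with fewer than three non-collinear touch points fall back to an axis-aligned box.

```python
import numpy as np
from scipy.spatial import ConvexHull, QhullError

def mobb_extents(points):
    """Width/height of the minimum-area oriented bounding box."""
    try:
        hull = points[ConvexHull(points).vertices]
    except QhullError:  # < 3 non-collinear points: axis-aligned fallback
        return np.sort(points.max(axis=0) - points.min(axis=0))
    best_area, best_ext = np.inf, None
    for i in range(len(hull)):
        e = hull[(i + 1) % len(hull)] - hull[i]
        e /= np.linalg.norm(e)
        rot = np.array([[e[0], e[1]], [-e[1], e[0]]])  # align edge with x-axis
        proj = hull @ rot.T
        ext = proj.max(axis=0) - proj.min(axis=0)
        if ext[0] * ext[1] < best_area:
            best_area, best_ext = ext[0] * ext[1], ext
    return best_ext

def gesture_features(G):
    """G: (N, T_H, 2) array of resampled finger trajectories.
    Returns the four features of Eqs. 4-10 keyed by name; their
    concatenation is the feature vector of Eq. 11."""
    c = G.mean(axis=0)                               # centroid, Eqs. 4-5
    s = np.linalg.norm(G - c, axis=2).mean(axis=0)   # spread, Eqs. 6-7
    prev, cur = G[:, :-1] - c[:-1], G[:, 1:] - c[1:] # vectors to centroid
    cross = prev[..., 0] * cur[..., 1] - prev[..., 1] * cur[..., 0]
    dot = (prev * cur).sum(axis=2)
    ang = np.arctan2(cross, dot).mean(axis=0)        # signed rotation, Eq. 9
    r = np.concatenate([[0.0], np.cumsum(ang)])      # cumulative, Eq. 8
    b = np.stack([mobb_extents(G[:, t]) for t in range(G.shape[1])])
    return {"C": c, "S": s, "R": r, "B": b}
```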
Figure 4: Example gestures, in which (a) & (b) are more
different according to Eq. 12, (c) & (d) are more similar.
5.3. Distance between Two Gestures
Here, we explain how we compare gestures using gesture
features in the gesture space.
Given two gestures $G_0$ and $G_1$, we define the distance as
$$D(G_0, G_1) = \alpha\, \mathrm{DTW}(C(G_0), C(G_1)) + \beta\, \mathrm{DTW}(S(G_0), S(G_1)) + \gamma\, \mathrm{DTW}(R(G_0), R(G_1)) + \delta\, \mathrm{DTW}(B(G_0), B(G_1)), \quad (12)$$
where DTW provides a distance between two vectors using dynamic time warping [BC94], and $\alpha$, $\beta$, $\gamma$, and $\delta$
are weights for each feature. We empirically found that
$\alpha = 0.04$, $\beta = 0.36$, $\gamma = 0.36$, and $\delta = 0.24$ work well in
our dataset. Fig. 4 shows two pairs of example gestures, in
which (a) and (b) are more different according to Eq. 12
($D = 6.1980$), while (c) and (d) are more similar ($D = 0.7064$).
This shows that our distance function is less affected by the
number of fingers used and is effective in identifying the
context of the gestures.
The feature set and distance metrics together determine
the well-represented gesture space where algebraic opera-
tions can be sensibly defined. It forms the basis of the control
scheme learning in later sections.
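Eq. 12 can be prototyped with a textbook O(T^2) dynamic time warping routine [BC94] and the weights above. The sketch assumes each gesture is the dictionary of four feature sequences produced by a gesture_features-style routine, as in the earlier sketch; none of the names come from the paper.

```python
import numpy as np

def dtw(a, b):
    """Classic O(len(a)*len(b)) dynamic time warping distance [BC94]."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(np.atleast_1d(a[i - 1] - b[j - 1]))
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# Empirically chosen weights from the paper (alpha, beta, gamma, delta).
WEIGHTS = {"C": 0.04, "S": 0.36, "R": 0.36, "B": 0.24}

def gesture_distance(f0, f1, weights=WEIGHTS):
    """Eq. 12: weighted sum of per-feature DTW distances between two
    gestures, each given as a dict of feature sequences."""
    return sum(w * dtw(f0[k], f1[k]) for k, w in weights.items())
```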
6. Crowd Motion Space
In this section, we present our formulation of a crowd motion
space, which is conceptually similar to a gesture space. We
consider the set of movement trajectories from the charac-
ters of the crowd, and represent the overall crowd movement
with a set of features modelled by a mixture of Gaussian
processes.
6.1. Crowd Motion Trajectories
Here, we represent the motion of a crowd using the trajecto-
ries of the characters in the crowd.
A crowd motion C is defined as a set of trajectories:
$$C = \{ c_m(s) \mid m \in [1, M],\ s \in [1, S] \}, \quad (13)$$
where M is the total number of characters in the crowd, S is
the duration of the crowd motion, and $c_{m'}(s')$ indicates the
2D location $(c_{m'}(s').x,\ c_{m'}(s').y)$ of a character
$m'$ at time $s'$. Similar to the gesture trajectories, we normalize the crowd motion trajectories by translating the whole
motion such that the starting point is at the origin.
We also resample the crowd motion trajectories from S
points to $S_H$ points using the Hermite spline [Lal09] and set
$S_H = 64$ [WWL07], as we do for the gesture trajectories in
Sec. 5.1. As a result, a crowd motion C is defined as:
$$C = \{ c_m(s) \mid m \in [1, M],\ s \in [1, S_H] \}. \quad (14)$$
For the sake of calculation simplicity, we express the trajectory of the $m'$th character, $c_{m'}(s)$, as a vector of serialized
X and Y positions:
$$c_{m'}(s) = [c_{m'}(1).x,\ c_{m'}(1).y,\ c_{m'}(2).x,\ c_{m'}(2).y,\ \ldots,\ c_{m'}(S_H).x,\ c_{m'}(S_H).y]^T. \quad (15)$$
6.2. Crowd Motion Features
Next, we present our crowd motion features that describe
the high-level features of a moving crowd. Such features are
independent of the number of characters in the crowd. They
can also be used to interpolate two crowd motions in order
to generate new ones.
Since the character trajectories in a crowd are controlled by
one input gesture, we assume that there exists a linear low
dimensional space that can represent the trajectories of all
characters. Trajectories can be treated as functions. Essentially, each crowd motion is a series of 2D functions that defines the trajectories of all characters. This allows us to construct a low-dimensional space and represent the motion trajectories of all characters using Functional Principal Component Analysis (FPCA). FPCA projects a group of functions
linearly into a space where a mean function and functional
variations serve as the basis of the function representation, similar to PCA but on a function level. The mean function, $\bar{c}_s$,
where $s \in [1, S_H]$, can be computed by averaging the motion trajectories of all characters. Then, a set of eigenfunctions, $E_V^C$, and a set of eigenscores, $E_S^C$, can be computed. The
eigenfunctions describe the principal movement over time of
all characters in the crowd, and the eigenscores represent
how the movement of a character can be projected into the
low dimensional space. The trajectory of the $m'$th character
can be recovered as:
$$c_{m'}(s) = \bar{c}_s + E_V^C E_{S,m'}^C, \quad (16)$$
where $E_{S,m'}^C$ is the eigenscore of the $m'$th character.
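Because every trajectory is resampled to a common uniform grid (Sec. 6.1), discretized FPCA coincides with ordinary PCA on the serialized vectors of Eq. 15. The following sketch reflects that reading; the 15-component truncation anticipates the optional step discussed later, and the function names are our own.

```python
import numpy as np

def fit_fpca(X, n_components=15):
    """X: (M, 2*S_H) serialized trajectories (Eq. 15), one per character.
    Returns the mean function, eigenfunctions E_V and eigenscores E_S."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    # SVD of the centred data yields the principal variations
    # (eigenfunctions); projecting onto them gives the eigenscores.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    E_V = Vt[:n_components].T      # (2*S_H, n_components)
    E_S = (X - mean) @ E_V         # (M, n_components)
    return mean, E_V, E_S

def reconstruct(mean, E_V, score):
    """Eq. 16: recover one character's trajectory from its eigenscore."""
    return mean + E_V @ score
```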
Although FPCA gives a compact representation, it does
not generalize enough to take all the input variations into ac-
count such as different numbers of trajectories or style vari-
ations of the same motion. This motivates us to further gen-
eralize the representation. We discover that the high-level
visual observation of the general motions can be described
by the eigenscore distributions. As a result, modelling the
crowd motion trajectories of the whole crowd can be con-
sidered as modelling the distribution of the eigenscores of
the characters. This high-level model allows us to interpolate
the distribution of eigenscores, instead of the actual trajec-
tories, between two crowd motions effectively. In addition,
such a distribution-based representation does not depend on
the number of characters, and does not explicitly map the
trajectories of the characters from one crowd to another.
Since the eigenscores of a group of similar motions usu-
ally exhibit multi-modality, we propose to use GMM to
model the distribution of the eigenscores. There are three
main advantages. First, the non-linearity of GMM fits the
trajectory data well. Second, the multi-modality nature of
GMM captures semantic-level meanings such as the crowd
being split into multiple groups, which cannot be modelled
easily with a single model. This is particularly relevant to
crowd motion such as splitting and joining. Finally, multiple
GMMs can be easily interpolated and the interpolation has
visual as well as semantic meanings, which is important to
generate new crowd motions.
There are two important issues in applying GMMs to model
the data: the optimal parameters and the number of components of the model. We apply the Expectation-Maximization algorithm [Bis96] to optimize the parameters
for the distribution of eigenscores, $\phi(E_S^C)$. The component
number essentially allows the system to accurately model
multiple sub-groups in the crowd motions. In theory, it is
possible to determine it automatically by iterating from
one and choosing the smallest value that reaches the required
data likelihood. In practice, we found that users rarely split
a crowd into more than two subgroups, even with two hands
controlling the crowd. As a result, two Gaussian components
are enough to model our database. For simpler motions with
only one sub-group, the two components in the GMM blend
together to represent the distribution of character trajectories. If more complicated crowd motions with multiple sub-groups of characters are involved, an analysis of the data likelihood should be performed and more components can be
used.
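As a concrete stand-in for this step, scikit-learn's GaussianMixture (which runs EM internally) can fit $\phi(E_S^C)$ with two components; treating the library as interchangeable with the authors' EM implementation is our assumption.

```python
from sklearn.mixture import GaussianMixture

def fit_eigenscore_gmm(E_S, n_components=2, seed=0):
    """Fit the eigenscore distribution phi(E_S) with a GMM whose
    parameters are optimized by Expectation-Maximization."""
    return GaussianMixture(n_components=n_components,
                           covariance_type="full",
                           random_state=seed).fit(E_S)

# Usage: E_S comes from the FPCA sketch above.
# gmm = fit_eigenscore_gmm(E_S)
```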
Therefore, the crowd motion features of a crowd C are defined by a vector:
$$\mathbf{C} = [\bar{c}_s,\ E_V^C,\ \phi(E_S^C)]^T, \quad (17)$$
where $\bar{c}_s$ is the mean trajectory, $E_V^C$ is the set of eigenvectors,
and $\phi(E_S^C)$ is the distribution of the eigenscores modelled
by the GMM. Conceptually, our crowd motion feature is similar
to the morphable motion primitives [MCC09, MC12]. The
difference is that it is applied on a crowd instead of an individual motion.
Here, we include an optional step to improve the perfor-
mance of our system. We observe that there is an intrinsic re-
dundancy in the crowd motion trajectories as the characters’
trajectories are not arbitrary. Therefore, it is not necessary to
use all the eigenvectors $E_V^C$ as the features. In fact, we only
use the first 15 principal components returned by FPCA, and
the recovered trajectories from Eq. 16 achieve <1% error
for all the crowd motions in our database. This not only reduces the computational cost, but also removes noise that may
exist in the motion data.
7. Crowd Motion Control
In this section, we explain how a run-time gesture can be
identified based on the set of gestures in the database. Then,
we explain how such a gesture generates the corresponding
crowd motion.
7.1. Run-time Gesture Representation
Here, we explain how we represent a gesture using neigh-
bour ones in the database, which allows us to understand the
crowd motion the user intended to perform.
We have collected a set of gestures with the corresponding
crowd motions as explained in Sec. 4. The gesture space is
non-linear due to the complex nature of hand gesture. Mod-
elling the whole space with high degree non-linear functions
would require a large amount of gesture data, which is labour
intensive to obtain and would limit the feasibility to increase
the gesture types. Instead, we propose to model a local area
of the gesture space that is relevant to the run-time gesture
using a linear function. Such a method works robustly even
with smaller database and generates reliable results.
In particular, given a run-time gesture, $G_r$, we utilize Eq.
12 to evaluate its distance to the stored gestures in the
database. We represent $G_r$ using a set of K nearest neighbours, $\{ G_k \mid k \in [1, K] \}$. The neighbours are associated with
Figure 5: Generating crowd motion with run-time gesture.
The circles represent gestures in the gesture space, with the
hollow one representing the gesture obtained in run-time.
Based on the run-time input, we obtained the K nearest ges-
tures, visualized by the double lines. The triangles represent
crowd motion in the crowd motion space. We find the crowd
motions corresponding to the K nearest gestures, pointed by
the black arrows. We finally interpolate these crowd motions
to create the run-time crowd motion represented by the hol-
low triangle.
the corresponding weights, $\{ w_k \mid k \in [1, K] \}$, which are inversely proportional to the distance with respect to the run-time gesture. The particular weight $w_{k'}$ corresponding
to the gesture $G_{k'}$ is defined as:
$$w_{k'} = \frac{1}{D(G_r, G_{k'})} \Big/ \sum_{k=1}^{K} \frac{1}{D(G_r, G_k)}, \quad (18)$$
where the summation term acts as a normalization factor to
ensure that all the weights sum to 1.0. In our experiments,
we found that K = 10 produces good results. This process is
visualized in the left part of Fig. 5.
Since our gesture database is relatively compact, a brute
force search is quick enough to find the K nearest neighbours in real-time. For a larger database, we may organize
the database with data structures such as a k-d tree to speed
up searching.
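The brute-force search with the inverse-distance weights of Eq. 18 is only a few lines. This sketch reuses the hypothetical gesture_distance from the earlier sketch and adds a small epsilon to guard against division by zero on exact matches:

```python
import numpy as np

def knn_weights(g_r, database, k=10, eps=1e-9):
    """Find the K nearest gestures to g_r and their Eq. 18 weights.
    database: list of per-gesture feature dicts."""
    d = np.array([gesture_distance(g_r, g) for g in database])
    idx = np.argsort(d)[:k]
    inv = 1.0 / np.maximum(d[idx], eps)  # inverse-distance weighting
    return idx, inv / inv.sum()          # weights sum to 1.0
```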
The mapping between gestures and motions is necessary
to capture the variations of control styles, even for the same
motion. A user study could be a good way to establish a mapping, but only when there is a consensus on the best gesture
for a specific motion. The fact that different users used different gestures even for simple motions suggests that this
might not be the case. As an example, for the twist motion,
some users prefer to use an outwards spiral gesture while some
prefer an inwards one. In addition, when performing control on
the fly, input variations are also better handled by the
mapping, because the input gesture would not be exactly the
same as the optimal one, if there is one at all.
While it is possible to apply methods such as regression to
evaluate the run-time gesture, we find that KNN is the most
reliable way, mainly because our gesture database contains
a variety of gestures, where the sample size is big enough to
locally approximate the gesture manifold as hyper-planes. In
theory, if the database is dense enough, it could be possible
to use the most similar gesture only. However, KNN is more
robust against outliers, and constructing a dense database is
labour intensive.
7.2. Run-time Crowd Motion Creation
Here, given the Knearest neighbours of the run-time gesture,
we interpolate the corresponding Kcrowd motions in the
database in order to generate the run-time crowd motion.
Given a run-time gesture, the obtained K nearest gestures,
$\{ G_k \mid k \in [1, K] \}$, correspond to K crowd motions,
$\{ C_k \mid k \in [1, K] \}$, according to the database. The run-time
crowd motion, $C_r = [\bar{c}_s^r,\ E_V^{C_r},\ \phi(E_S^{C_r})]^T$, is evaluated as the
weighted sum of the K crowd motions. This process is visualized in the right part of Fig. 5. Such an interpolation involves interpolating the crowd motion features individually
as follows.
The run-time mean crowd trajectory can be obtained by
a vector sum, as all mean trajectories in the database have the
same size $S_H$:
$$\bar{c}_s^r = \sum_{k=1}^{K} w_k\, \bar{c}_s^k. \quad (19)$$
Similarly, we interpolate and create a new set of eigenvectors:
$$E_V^{C_r} = \sum_{k=1}^{K} w_k\, E_V^{C_k}. \quad (20)$$
To ensure the orthonormality of the new eigenvectors, we
apply the modified Gram-Schmidt method presented in
[CK08].
Figure 6: (Left) Mixing two GMMs (each with two components) can generate different results depending on how the
Gaussian components are matched. (Middle) The desired result that retains the features of the source GMMs. (Right)
The undesired result of cross fading.
The blend weights $w_k$ are important to ensure the quality
of the resultant GMM, which accounts for the naturalness of
the generated crowd motion. Considering that our gesture-motion pairs in the database are very distinctive and that both
the gesture space and the crowd motion space can be modelled
by local hyperplanes, we use $w_{k'}$ from Eq. 18 as the blend
weights for the crowd motions. The underlying assumption
here is that similar gestures in the gesture space indicate
similar crowd motions in the crowd motion space.
Finally, we propose a mass transport solver based method
to combine multiple distributions of eigenscores and generate $\phi(E_S^{C_r})$. A naive one-to-one combination of the Gaussian
components of two GMMs does not work well. As shown in
Fig. 6, assuming each GMM has two components, depending on how we match the components, blending two GMMs
has two possible outputs. One of them retains the features
of the source GMMs, while the other does not, as Gaussian
components with very different parameters are blended, resulting in a scenario known as cross fading. We follow the displacement interpolation method presented by [BvdPPH11].
First, given two GMMs, we establish the correspondence of their Gaussian components. Each Gaussian component is defined by a mean value and a covariance value.
We evaluate the correspondence using the mass transport
solver [HSK12], in which the source and target are set as
the Gaussian means of the Gaussian components. Second,
we produce a weighted sum of the Gaussian mean and covariance of each matching Gaussian component, in which
the weights are obtained by Eq. 18, in order to generate a
combined GMM. We iteratively combine all the GMMs of
the K nearest crowd motions, and obtain $\phi(E_S^{C_r})$.
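For two fitted GMMs, the component correspondence can be prototyped as a linear assignment over distances between Gaussian means, a simplified stand-in for the mass transport solver of [HSK12]; matched means and covariances are then mixed linearly, as described above. This is our own minimal reading of the blending step:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def blend_two_gmms(gmm_a, gmm_b, w_a, w_b):
    """Blend two scikit-learn GaussianMixture models component-wise.
    Components are paired by minimising the total distance between
    their means, so matched features are preserved (no cross fading)."""
    cost = np.linalg.norm(gmm_a.means_[:, None] - gmm_b.means_[None],
                          axis=2)
    rows, cols = linear_sum_assignment(cost)
    means = w_a * gmm_a.means_[rows] + w_b * gmm_b.means_[cols]
    covs = w_a * gmm_a.covariances_[rows] + w_b * gmm_b.covariances_[cols]
    mix = w_a * gmm_a.weights_[rows] + w_b * gmm_b.weights_[cols]
    return means, covs, mix / mix.sum()
```

Iterating this pairwise blend over the K retrieved motions, renormalizing the weights at each step, yields $\phi(E_S^{C_r})$.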
7.3. Crowd Motion Synthesis
Here, we explain how we apply the crowd motion created in
the last section to control a run-time crowd.
Assume that the user is controlling a group of M characters. Given a user gesture, we obtain the corresponding crowd motion $C_r = [\bar{c}_s^r,\ E_V^{C_r},\ \phi(E_S^{C_r})]^T$ as explained in
Sec. 7.2. We first utilize the distribution of the eigenscores,
$\phi(E_S^{C_r})$, to sample M eigenscores. Second, we apply the
eigenscores with the mean trajectory $\bar{c}_s^r$ and eigenvectors $E_V^{C_r}$
to generate M crowd motion trajectories using Eq. 16. Third,
we apply a mass transport solver [HSK12] to find the
optimal matching between the controlled characters and the
calculated crowd motion trajectories. This is done by setting
the positions of the characters as the source and the starting points of the trajectories as the target. By using the mass
transport solver to evaluate the matching, we avoid the visual
artifact in which characters have to move a long distance
before reaching the starting points of their corresponding trajectories. Finally, the characters move to the starting points of
their respective trajectories, and then follow the trajectories,
in order to produce the overall motion.
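The synthesis loop can be sketched as follows, with GaussianMixture.sample drawing the M eigenscores and a linear-assignment matching of characters to trajectory start points standing in for the paper's mass transport solver [HSK12]; all names are illustrative.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def synthesize_crowd(mean, E_V, gmm, char_positions):
    """Sample one trajectory per character and assign each character
    to a nearby trajectory start to avoid long travel distances."""
    M = len(char_positions)
    scores, _ = gmm.sample(M)        # M eigenscores drawn from phi(E_S)
    trajs = mean + scores @ E_V.T    # Eq. 16 for all characters at once
    starts = trajs[:, :2]            # first (x, y) of each trajectory
    cost = np.linalg.norm(np.asarray(char_positions)[:, None]
                          - starts[None], axis=2)
    _, traj_idx = linear_sum_assignment(cost)
    return trajs[traj_idx]           # row i is the path for character i
```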
For handling collision detection and avoidance, we im-
plemented the high-level crowd motion synthesis and the
low-level character collision avoidance as two separate lev-
els. The high-level system provides the desired position of
all characters in the crowd, while the low-level system re-
solves their positions locally. In our experiments, the low-
level system models each character as a cylinder, and uti-
lizes a spring-mass model to calculate the forces required to
move the characters into their respective target locations. It
resolves the penetration among characters by calculating the
push-back forces based on the penetration depth and direction. Time-integration is applied in each time step to calculate the positions of the characters after all forces are applied.
Other, more advanced collision avoidance systems can be directly employed in our framework.
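A minimal version of this low-level layer might look like the following: characters are discs, overlapping pairs receive push-back forces proportional to the penetration depth, and an explicit step integrates the result. The constants and the velocity-free integration are our simplifications of the spring-mass model described above.

```python
import numpy as np

def resolve_penetrations(pos, radius=0.5, k=50.0, dt=1.0 / 30.0):
    """One explicit integration step of pairwise push-back forces.
    pos: (M, 2) character positions; returns corrected positions."""
    forces = np.zeros_like(pos)
    for i in range(len(pos)):
        for j in range(i + 1, len(pos)):
            d = pos[i] - pos[j]
            dist = np.linalg.norm(d)
            depth = 2.0 * radius - dist        # penetration depth
            if depth > 0.0 and dist > 1e-9:
                f = k * depth * (d / dist)     # spring push-back force
                forces[i] += f
                forces[j] -= f
    return pos + forces * dt * dt              # unit mass, explicit step
```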
We apply the full body motion synthesis method in
[SKY12] as an offline process to generate full body run-
ning motion based on the point-based movement trajecto-
ries. This involves creating a motion graph that consists of
different running actions, and evaluating the optimal action
to perform in order to follow the trajectories. We also apply
the physical modelling method in [SH12] to create physi-
cally plausible movements. This allows us to resolve body
part level collisions and penetrations effectively.
8. Experimental Results
In this section, we provide both qualitative and quantitative
evaluations of our proposed system.
All the experiments are run with one core of a Core i7
2.67GHz CPU with 1GB of memory. For the multi-touch in-
put, we used a G4 multitouch overlay from PQ labs, attached
to a 24" Acer S240HL LCD monitor. In general, the system
runs at 40 frames per second, which is higher than the real-time requirement of 30 frames per second. However, there is
a slow-down when computing a new crowd motion from a
hand gesture, which takes 330 ms, including 300 ms for the
KNN search, 12 ms for generating the crowd motion features, and 4 ms for generating and assigning trajectories to characters. We believe that adopting a multi-threaded implementation framework can produce a more consistent frame rate. Also,
more efficient search algorithms such as k-d tree search can
further reduce the computational time.
8.1. Qualitative Evaluations
Here, we evaluate our system qualitatively with different ex-
periments. The readers are referred to our supplementary
video for more results.
First, we evaluate the effectiveness of our method by pro-
ducing a set of crowd motions from a number of user inputs.
Fig. 7 shows some user gestures and their corresponding
simulated crowd motions. Our system generates crowd motions that accurately reflect the different user gesture types. It
also works well under different initial positions of the char-
acters. The number of touch points provided for the gestures
does not affect our system’s ability to produce the appropri-
ate crowd motion.
We set up some virtual environments and ask a user to use
our system to control the navigation of the crowd. Fig. 8 (upper) shows a corridor environment. The initial crowd cannot
fit through the narrow corridor. The user therefore applies
a contract gesture to reduce the size of the crowd, and two
translate gestures to move the crowd through the corridor.
Finally, the user applies an expand gesture to expand the crowd to
its original size. Fig. 8 (lower) shows a more complicated
environment, in which there is an obstacle in the middle of
a corridor. The user successively applies the gestures translate, split, translate, join and translate such that the crowd
can avoid the obstacle and reach the other side of the environment. The user finally applies a twist gesture such that
the crowd can rotate inside the circular environment. These
experiments show that our system can potentially be applied
to console games that require crowd control, such as real-time strategy games.
We generate a high-quality, complicated scenario in
which 100 characters avoid a number of dynamic moving
cars, as shown in Fig. 9. The user controls the crowd move-
ment with our touch-based system that offers intuitive con-
trol on the timing for the change of formation. Multiple ges-
tures are required to steer the crowd. This kind of real-time,
precise, interactive control is difficult to achieve with
existing systems. As this demo focuses on demonstrating
the animating power of the system for generating realistic
scenes, we implement a Gaussian filter to smooth the crowd
motion transitions.
While we propose to utilize [HSK12,HSK14] to gen-
erate examples for constructing the crowd motions in the
database, the overall framework is independent of the under-
lying method to generate the crowd motions. Basic crowd
simulation systems that control characters by setting the
starting and goal positions can effectively generate the
database and produce comparable results. To demonstrate
this, we perform an experiment utilizing the Reciprocal Velocity Obstacle (RVO) 2 system [vdBLM08] to generate the
crowd motion database and synthesize new crowd motions
based on the user inputs. We compare the newly created re-
sults with those generated by our existing database, as shown
in Fig. 10. We find that while the two databases result in
crowds of different behaviour due to the different training
data, the resultant crowd motion quality is comparable. This
demonstrates the generalizability of our control system.
8.2. Quantitative Evaluations
In order to test if our proposed gesture features are discrim-
inative, we conduct leave-one-out cross validation using the
gestures for the 6 types of typical crowd motions (i.e. trans-
late, twist, contract, expand, join, split). We first use the ges-
ture features of one gesture as testing data for classification,
and those of all other gestures as training data. We then obtain
the K nearest gestures. Within them, we conduct a weighted
nearest neighbour voting with the weights obtained from Eq.
18, where the gesture type with the highest total weight is
considered to be the recognized type. We finally check if
such a type is the same as the real gesture type of the testing
data. We iteratively evaluate all gestures and calculate the
average accuracy. Fig. 11 shows the confusion matrix of this
analysis. It shows that the proposed gesture features are dis-
Figure 7: Examples of user input (orange lines) and the corresponding crowd simulation (blue lines) for (a) translate, (b) twist,
(c) contract, (d) expand, (e) join, (f) split, (g) split then translate, (h) translate then join, (i) twist while expanding, and (j) twist
while contracting. The light colours indicate the starts of the trajectories the dark colours indicate the ends.
Figure 8: Screenshots of a user controlling a crowd to nav-
igate through (upper) a corridor environment, and (lower) a
more complicated environment with an obstacle.
Figure 9: Screenshots of a user controlling a crowd in a
complicated scenario with dynamic obstacles.
criminative in order to accurately identify an unknown ges-
ture. The average classification accuracy is 89.6%. For all
gesture types except contract, the accuracy is over 87.5%.
The contract type has a lower accuracy of 62.5%, as some of
the gesture samples are very similar to those in the join type.
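The protocol amounts to a leave-one-out loop around a weighted K-NN vote; a sketch, reusing the hypothetical knn_weights from the Sec. 7.1 sketch:

```python
from collections import defaultdict

def classify_gesture(g, db_feats, db_labels, k=10):
    """Weighted nearest-neighbour vote: the label whose neighbours
    carry the highest total Eq. 18 weight wins."""
    idx, w = knn_weights(g, db_feats, k=k)
    totals = defaultdict(float)
    for i, wi in zip(idx, w):
        totals[db_labels[i]] += wi
    return max(totals, key=totals.get)

def leave_one_out_accuracy(feats, labels):
    """Hold each gesture out in turn and classify it with the rest."""
    hits = 0
    for i, g in enumerate(feats):
        rest_f = feats[:i] + feats[i + 1:]
        rest_l = labels[:i] + labels[i + 1:]
        hits += classify_gesture(g, rest_f, rest_l) == labels[i]
    return hits / len(feats)
```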
We visualize the four gesture features in Fig. 12 to show
that they are effective in representing different gesture types.
Here, we group the collected gestures based on the 6 types
of typical crowd motions they represent and calculate the
average gesture per group. We then plot the features to see
how distinctive different gesture groups can be. While indi-
Figure 10: The synthesized expand crowd motion using the
database built with (left) Henry et al. and (right) RVO2.
Figure 11: Confusion matrix of 6 typical motion types. The
cell in column i, row j indicates the proportion of all ith test
gestures recognized as the jth output gesture.
vidual features may not be able to clearly represent all the
types (e.g. $S(G)$ cannot distinguish expand and split easily),
the features are complementary to each other (e.g. $B(G)$ can
distinguish expand and split well).
9. Limitations
Our system does not consider the mapping between the
gestures and the full-body motions of the characters. Although this is an interesting idea, such a mapping would suffer from ambiguities such as the walking phase, as presented in [HKS17], and extra parameter inputs would be required. Instead, the detailed movements (e.g. walking or
jumping motions) are modeled by another sub-system given
Figure 12: The visualization of gesture features by types: (a) avg. y-position vs. avg. x-position for C(G), (b) avg. distance
from centroid vs. time for S(G), (c) avg. total rotation vs. time for R(G), (d) avg. bounding box height vs. width for B(G).
the computed trajectories. Splitting the mapping into two
sub-systems leaves the degrees of freedom to the animators
for designing their preferred movements. This idea is similar
to the framework proposed by [HSK14].
We only consider zero-order information (i.e. positions),
based on the advice from practitioners that crowd control is
typically geometry-based. Depending on the application, if
higher-order information such as velocity and acceleration
is needed, one may extract and integrate the information into
the feature vector. It will require some extra designs to map
multi-order information between gestures and motions in an
effective manner.
10. Conclusion and Discussions
In this paper, we propose a data-driven approach for crowd
control using a multi-touch device. Our method learns from
a set of user-performed gestures and allows a user to intu-
itively control a crowd. To achieve this, we propose a set of
gesture features that represent high-level information of the
user-performed gestures. We also propose a method to ex-
tract crowd motion features using a mixture of Gaussian pro-
cesses. Given a run-time gesture, we perform a KNN search
in our gesture database, and find the K corresponding crowd
motions. We then combine the K crowd motions to control
the run-time crowd. Our system runs in real-time and has
high control accuracy.
Like many existing systems, the simulation time increases
with the number of characters. However, our system is relatively computationally efficient even with a large number of characters. This is because the majority of the system is based
on the extracted motion features and gesture features, which
are independent of the number of characters. The only step
whose computation is proportional to the character number
is the synthesis of the final crowd motions.
Theoretically speaking, given the right gesture, it is possi-
ble to interpolate two classes of crowd motion (e.g. translate
and join) to generate a new run-time motion. However, it
rarely happens in practice due to the relatively wide range of
gestures we collected to cover the possible variation within
the same class. As a result, the interpolation performed is
mostly intra-class.
The mapping could, in theory, be contaminated if the gesture-motion pairs are not generated well, for example when two similar gestures correspond to very different motions, or vice versa. In practice, we find that KNN helps to reduce the effect of such outlier mappings: multiple motions/gestures are combined, and less similar ones receive smaller weights. Moreover, the mapping in our database is highly distinctive thanks to the distinctiveness among the types of basic motions, which results in a set of distinctive corresponding gestures. More complex motions can be decomposed into combinations of basic ones to avoid over-complicated motion-gesture mappings.
Our system analyzes gestures in a discrete manner: each gesture controls the crowd over a short time interval, so continuously evolving input is not handled directly. One possible solution is to apply the continuous recognition algorithm proposed in [SKT11], in which the input gesture is continuously recognized using a variable-size sliding window; a minimal sketch of this idea follows.
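The sketch below is our illustration of the sliding-window idea, not the recognizer of [SKT11]. Here classify is a hypothetical stand-in for any gesture recognizer that returns a (label, confidence) pair for a window of touch samples.

def continuous_recognition(stream, classify, window_sizes=(30, 60, 120)):
    # stream: list of per-frame touch samples, growing as input arrives.
    best = None
    for size in window_sizes:                # try several window lengths
        if len(stream) < size:
            continue
        label, confidence = classify(stream[-size:])  # most recent frames
        if best is None or confidence > best[1]:
            best = (label, confidence)
    return best                              # None until enough frames arrive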
An interesting research direction is to introduce more intra-class differences in the crowd motion. For example, we could include both a spread-out translate crowd motion and a condensed one, and collect the corresponding gesture inputs from users into the database. A small gesture difference would then generate a small variation in the crowd motion, allowing fine control of the crowd.
Acknowledgement
This project was supported in part by the Engineering and Physical Sciences Research Council (EPSRC) (Ref: EP/M002632/1) and the Royal Society (Ref: IE160609).
References
[BC94] Berndt D. J., Clifford J.: Using dynamic time warping to find patterns in time series. In KDD Workshop (1994), Fayyad U. M., Uthurusamy R., (Eds.), AAAI Press, pp. 359–370.
[Bis96] Bishop C. M.: Neural Networks for Pattern Recognition. Oxford University Press, Oxford, UK, 1996.
[BvdPPH11] Bonneel N., van de Panne M., Paris S., Heidrich W.: Displacement interpolation using Lagrangian mass transport. In Proceedings of the 2011 SIGGRAPH Asia Conference (SA '11) (2011).
[CK08] Cheney W., Kincaid D. R.: Linear Algebra: Theory and Applications, 1st ed. Jones and Bartlett Publishers, Inc., USA, 2008.
[GCG10] Görg M. T., Cebulla M., Garzon S. R.: A framework for abstract representation and recognition of gestures in multi-touch applications. In Advances in Computer-Human Interactions (ACHI '10) (2010), IEEE, pp. 143–147.
[GD11] Gu Q., Deng Z.: Formation sketching: An approach to stylize groups in crowd simulation. In Proceedings of Graphics Interface 2011 (2011), Canadian Human-Computer Communications Society, pp. 1–8.
[GD13] Gu Q., Deng Z.: Generating freestyle group formations in agent-based crowd simulations. IEEE Computer Graphics and Applications 33, 1 (2013), 20–31.
[GGH03] Gibbon D., Gut U., Hell B., Looks K., Thies A., Trippel T.: A computational model of arm gestures in conversation. In INTERSPEECH (2003).
[HKS17] Holden D., Komura T., Saito J.: Phase-functioned neural networks for character control. ACM Transactions on Graphics 36, 4 (July 2017), 42:1–42:13.
[HSK12] Henry J., Shum H. P. H., Komura T.: Environment-aware real-time crowd control. In Proceedings of the 11th ACM SIGGRAPH / Eurographics Conference on Computer Animation (EUROSCA '12) (Aire-la-Ville, Switzerland, 2012), Eurographics Association, pp. 193–200.
[HSK14] Henry J., Shum H. P. H., Komura T.: Interactive formation control in complex environments. IEEE Transactions on Visualization and Computer Graphics 20, 2 (2014), 211–222.
[JCP10] Ju E., Choi M. G., Park M., Lee J., Lee K. H., Takahashi S.: Morphable crowds. ACM Transactions on Graphics 29, 6 (2010), 140.
[JTZ12] Jiang Y., Tian F., Zhang X., Liu W., Dai G., Wang H.: Unistroke gestures on multi-touch interaction: Supporting flexible touches with key stroke extraction. In Proceedings of the 2012 ACM International Conference on Intelligent User Interfaces (IUI '12) (New York, NY, USA, 2012), ACM, pp. 85–88.
[JXW08] Jin X., Xu J., Wang C. C., Huang S., Zhang J.: Interactive control of large-crowd navigation in virtual environments using vector fields. IEEE Computer Graphics and Applications 28, 6 (2008), 37–46.
[KHDA12] Kin K., Hartmann B., DeRose T., Agrawala M.: Proton++: A customizable declarative multitouch framework. In Proceedings of the 25th Annual ACM Symposium on User Interface Software and Technology (UIST '12) (New York, NY, USA, 2012), ACM, pp. 477–486.
[KHKL09] Kim M., Hyun K., Kim J., Lee J.: Synchronized multi-character motion editing. In ACM Transactions on Graphics (TOG) (2009), vol. 28, ACM, p. 79.
[KLLT08] Kwon T., Lee K. H., Lee J., Takahashi S.: Group motion editing. In ACM Transactions on Graphics (TOG) (2008), vol. 27, ACM, p. 80.
[KO10] Karamouzas I., Overmars M.: Simulating the local behaviour of small pedestrian groups. In Proceedings of the 17th ACM Symposium on Virtual Reality Software and Technology (2010), ACM, pp. 183–190.
[KSII09] Kato J., Sakamoto D., Inami M., Igarashi T.: Multi-touch interface for controlling multiple mobile robots. In CHI '09 Extended Abstracts on Human Factors in Computing Systems (2009), ACM, pp. 3443–3448.
[KSKL14] Kim J., Seol Y., Kwon T., Lee J.: Interactive manipulation of large-scale crowd animation. ACM Transactions on Graphics 33, 4 (July 2014), 1–10.
[KWK10] Kammer D., Wojdziak J., Keck M., Groh R., Taranko S.: Towards a formalization of multi-touch gestures. In ACM International Conference on Interactive Tabletops and Surfaces (ITS '10) (New York, NY, USA, 2010), ACM, pp. 49–58.
[Lal09] Lalescu C. C.: Two hierarchies of spline interpolations: Practical algorithms for multivariate higher order splines. arXiv (2009).
[LCHL07] Lee K. H., Choi M. G., Hong Q., Lee J.: Group behavior from video: A data-driven approach to crowd simulation. In Proceedings of the 2007 ACM SIGGRAPH/Eurographics Symposium on Computer Animation (2007), Eurographics Association, pp. 109–118.
[LL12] Lü H., Li Y.: Gesture Coder: A tool for programming multi-touch gestures by demonstration. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (2012), ACM, pp. 2875–2884.
[LL13] Lü H., Li Y.: Gesture Studio: Authoring multi-touch interactions through demonstration and declaration. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (2013), ACM, pp. 257–266.
[LWB14] Lee Y., Wampler K., Bernstein G., Popović J., Popović Z.: Motion fields for interactive character locomotion. Communications of the ACM 57, 6 (June 2014), 101–108.
[MC12] Min J., Chai J.: Motion Graphs++. ACM Transactions on Graphics 31, 6 (Nov. 2012).
[MCC09] Min J., Chen Y.-L., Chai J.: Interactive generation of human animation with deformable motion models. ACM Transactions on Graphics 29, 1 (Dec. 2009), 1–12.
[MDC09] Micire M., Desai M., Courtemanche A., Tsui K. M., Yanco H. A.: Analysis of natural gestures for controlling robot teams on multi-touch tabletop surfaces. In Proceedings of the ACM International Conference on Interactive Tabletops and Surfaces (2009), ACM, pp. 41–48.
[OO09] Oshita M., Ogiwara Y.: Sketch-based interface for crowd animation. In Smart Graphics (2009), Springer, pp. 253–262.
[Par10] Park M. J.: Guiding flows for controlling crowds. The Visual Computer 26, 11 (2010), 1383–1391.
[PVDBC11] Patil S., van den Berg J., Curtis S., Lin M. C., Manocha D.: Directing crowd simulations using navigation fields. IEEE Transactions on Visualization and Computer Graphics 17, 2 (2011), 244–254.
[RGR13] Rekik Y., Grisoni L., Roussel N.: Towards many gestures to one command: A user study for tabletops. In INTERACT - 14th IFIP TC13 Conference on Human-Computer Interaction (Cape Town, South Africa, Sept. 2013), Springer.
[RVG14] Rekik Y., Vatavu R.-D., Grisoni L.: Match-up & conquer. In Proceedings of the 2014 International Working Conference on Advanced Visual Interfaces (AVI '14) (2014), pp. 201–208.
[SCOL04] Sorkine O., Cohen-Or D., Lipman Y., Alexa M., Rössl C., Seidel H.-P.: Laplacian surface editing. In Proceedings of the 2004 Eurographics/ACM SIGGRAPH Symposium on Geometry Processing (2004), ACM, pp. 175–184.
[SH12] Shum H. P. H., Ho E. S. L.: Real-time physical modelling of character movements with Microsoft Kinect. In Proceedings of the 18th ACM Symposium on Virtual Reality Software and Technology (VRST '12) (New York, NY, USA, Dec. 2012), ACM, pp. 17–24.
[SKT11] Shum H. P. H., Komura T., Takagi S.: Fast accelerometer-based motion recognition with a dual buffer framework. The International Journal of Virtual Reality 10, 3 (Sept. 2011), 17–24.
[SKY12] Shum H. P. H., Komura T., Yamazaki S.: Simulating multiple character interactions with collaborative and adversarial goals. IEEE Transactions on Visualization and Computer Graphics 18, 5 (May 2012), 741–752.
[SMTT17] Stüvel S. A., Magnenat-Thalmann N., Thalmann D., van der Stappen A. F., Egges A.: Torso crowds. IEEE Transactions on Visualization and Computer Graphics 23, 7 (2017), 1823–1837.
[TCP06] Treuille A., Cooper S., Popović Z.: Continuum crowds. In ACM Transactions on Graphics (TOG) (2006), vol. 25, ACM, pp. 1160–1168.
[Tou83] Toussaint G.: Solving geometric problems with the rotating calipers, 1983.
[TYK09] Takahashi S., Yoshida K., Kwon T., Lee K. H., Lee J., Shin S. Y.: Spectral-based group formation control. In Computer Graphics Forum (2009), vol. 28, Wiley Online Library, pp. 639–648.
[VAW12] Vatavu R.-D., Anthony L., Wobbrock J. O.: Gestures as point clouds: A $P recognizer for user interface prototypes. In Proceedings of the 14th ACM International Conference on Multimodal Interaction (ICMI '12) (New York, NY, USA, 2012), ACM, pp. 273–280.
[vdBLM08] van den Berg J., Lin M., Manocha D.: Reciprocal velocity obstacles for real-time multi-agent navigation. In 2008 IEEE International Conference on Robotics and Automation (May 2008), pp. 1928–1935.
[WOO16] Wang H., Ondřej J., O'Sullivan C.: Path patterns: Analyzing and comparing real and simulated crowds. In ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games 2016 (2016), ACM, pp. 49–57.
[WWL07] Wobbrock J. O., Wilson A. D., Li Y.: Gestures without libraries, toolkits or training: A $1 recognizer for user interface prototypes. In Proceedings of the 20th Annual ACM Symposium on User Interface Software and Technology (UIST '07) (New York, NY, USA, 2007), ACM, pp. 159–168.
[XWY15] Xu M., Wu Y., Ye Y., Farkas I., Jiang H., Deng Z.: Collective crowd formation transform with mutual information-based runtime feedback. In Computer Graphics Forum (2015), vol. 34, Wiley Online Library, pp. 60–73.
[ZZC14] Zheng L., Zhao J., Cheng Y., Chen H., Liu X., Wang W.: Geometry-constrained crowd formation animation. Computers & Graphics 38 (2014), 268–276.
This paper introduces a new crowd formation transform approach to achieve visually pleasing group formation transition and control. Its core idea is to transform crowd formation shapes with a least effort pair assignment using the Kuhn–Munkres algorithm, discover clusters of agent subgroups using affinity propagation and Delaunay triangulation algorithms and apply subgroup-based social force model (SFM) to the agent subgroups to achieve alignment, cohesion and collision avoidance. Meanwhile, mutual information of the dynamic crowd is used to guide agents' movement at runtime. This approach combines both macroscopic (involving least effort position assignment and clustering) and microscopic (involving SFM) controls of the crowd transformation to maximally maintain subgroups' local stability and dynamic collective behaviour, while minimizing the overall effort (i.e. travelling distance) of the agents during the transformation. Through simulation experiments and comparisons, we demonstrate that this approach is efficient and effective to generate visually pleasing and smooth transformations and outperform several existing crowd simulation approaches including reciprocal velocity avoidances, optimal reciprocal collision avoidance and OpenSteer.