
# Path Patterns: Analyzing and Comparing Real and Simulated Crowds

## Abstract and Figures

Crowd simulation has been an active and important area of research in the field of interactive 3D graphics for several decades. However, only recently has there been an increased focus on evaluating the fidelity of the results with respect to real-world situations. The focus to date has been on analyzing the properties of low-level features such as pedestrian trajectories, or global features such as crowd densities. We propose a new approach based on finding latent Path Patterns in both real and simulated data in order to analyze and compare them. Unsupervised clustering by non-parametric Bayesian inference is used to learn the patterns, which themselves provide a rich visualization of the crowd's behaviour. To this end, we present a new Stochastic Variational Dual Hierarchical Dirichlet Process (SV-DHDP) model. The fidelity of the patterns is then computed with respect to a reference, thus allowing the outputs of different algorithms to be compared with each other and/or with real data accordingly.
He Wang, Jan Ondřej, Carol O'Sullivan
Disney Research Los Angeles
Figure 1: (a) A video screenshot from a train station; (b) 1000 tracklets (randomly selected from 19999); (c-g) The five orientation sub-domains of the top pattern as location-orientation distributions. Inset shows discretization of the orientation, with black representing zero velocity.
Keywords: Crowd Simulation, Crowd Comparison, Data-Driven, Clustering, Hierarchical Dirichlet Process, Stochastic Optimization
Concepts: Computing methodologies → Motion processing; Topic modeling; Bayesian network models
1 Introduction
Although a large variety of crowd simulation methods exist, choosing the best algorithm for specific scenarios or applications remains a challenge. Human behavior is very complex and no one algorithm can be a magic bullet for every situation. Furthermore, different parameter settings for any given method can give widely varying results. Subjective user studies can be useful to determine perceived realism or aesthetic qualities, but more objective methods are often needed to determine the fidelity and/or predictive power of a given simulation method with respect to real human behaviors. The hierarchical and heterogeneous nature of human crowd behaviors makes it very difficult to find a definitive set of evaluation rules or empirical metrics. Therefore, data-driven evaluation methods are particularly useful for this purpose.
Author contact: realcrane@gmail.com (corresponding author), jan.ondrej@disneyresearch.com, carol.osullivan@tcd.ie
© 2016 Copyright held by the owner/author(s). Publication rights licensed to ACM.
I3D '16, February 27-28, 2016, Redmond, WA, USA
ISBN: 978-1-4503-4043-4/16/03
DOI: http://dx.doi.org/10.1145/2856400.2856410
In this paper, we propose a data-driven approach to crowd evaluation based on exposing the latent patterns of behavior that exist in both real and simulated data. Previous data-driven methods tend to focus on comparisons between high-level global features such as densities and exit rates, or low-level data such as individual trajectories. In the former case, the results are often too general and do not reflect the heterogeneity of human behaviors, and in the latter case, the results are too specific to the exact scenario recorded. Our approach offers a compromise between these two extremes that takes both the global and local properties of crowd motion into account in order to facilitate a comprehensive qualitative and quantitative analysis of the data.
For a high-level explanation of our approach, we consider the example of a large public square with many entrances and exits, such as the train station shown in Figure 1(a). Pedestrians typically do not wander randomly, nor do they walk in straight lines; rather, they self-organize into flows or standing clusters, with each trajectory consisting of a series of one person's steps as he moves through the square (Figure 1(b)). A group of similar trajectories can be thought of as a trending path that represents the aggregation of multiple pedestrians' positions and orientations. Combining all such trending paths together will generate an overall path pattern that consists of flows of location-orientation pairs (Figure 1(c-g)). In scenarios where global path planning does not significantly affect behavior, e.g., walking through a corridor, local inter-personal dynamics can also lead to different path patterns. The path patterns created are therefore the result of local/internal dynamics and global/external factors.
The main contribution of our paper is a new approach to analyzing and comparing crowd data based on discovering latent path patterns. To automatically extract these patterns from both real and simulated data, we present an SV-DHDP model that is the first to combine a Dual Hierarchical Dirichlet Process with Stochastic Variational Inference. The patterns themselves provide a rich visualization of the crowd's behaviors and can reveal qualitative properties that would be difficult or impossible to see by simply viewing the original data. Furthermore, we propose a quantitative metric that computes the similarity between real and simulated datasets. This allows us to analyze the predictive quality of various simulation algorithms with respect to real data. We demonstrate the qualitative and quantitative capabilities of our approach on several real and simulated crowd datasets.
2 Related Work
Crowd motion properties are affected by a hierarchy of factors from geometric to cognitive [Funge et al. 1999]. To model these myriad behavioral aspects, methods such as field and flow based [Narain et al. 2009; Treuille et al. 2006], force-based [Helbing and Molnár 1995; Karamouzas et al. 2009], velocity and geometric optimization [van den Berg et al. 2008; Pettré et al. 2009; Ondřej et al. 2010] and data-driven [Lee et al. 2007; Lerner et al. 2009b] have been proposed. Our aim is to provide an evaluation framework that imposes no assumptions on the underlying simulation mechanism and can therefore work on the output data of all such methods.
Qualitative methods for crowd evaluation have been proposed and include visual comparison [Kim et al. 2013; Lemercier et al. 2012] and perceptual experiments [McDonnell et al. 2008; Guy et al. 2011; Ennis et al. 2011]. Quantitative methods fall into two main categories: model-based [Kim et al. 2012; Golas et al. 2013] and data-driven [Singh et al. 2009; Ju et al. 2010; Kapadia et al. 2011; Musse et al. 2012]. Data-driven metrics have been proposed that use the statistics of geometric and dynamic feature analysis [Wolinski et al. 2014], model-based comparisons of motion randomness [Guy et al. 2012] and decision making processes [Lerner et al. 2009a].
Our data-driven evaluation method is partly inspired by two previous approaches. Guy et al. [2012] use a dynamic system to model crowd dynamics and compute an entropy metric based on individual motion randomness distributions learned from the data. Our method differs in that we learn global path patterns from groups of trajectories, rather than individual ones. Charalambous et al. [2014] apply a number of different criteria to detect anomalies in the data, whereas we focus on discovering mainstream latent patterns.
We also draw inspiration from the field of Computer Vision, where hierarchical Bayesian models [Blei et al. 2003; Teh et al. 2006] have been successfully employed for scene classification [Fei-Fei and Perona 2005; Sudderth et al. 2007], object recognition [Sivic et al. 2005], human action detection [Niebles et al. 2008] and video analysis [Kaufman and Rousseeuw 2005; Zhou et al. 2011; Wang et al. 2009]. The Hierarchical Dirichlet Process (HDP) has been successfully used in Natural Language Processing to discover candidate topics within corpora. By observing that crowd data can also be decomposed into a bag of words, Wang et al. [2009] used a Dual HDP (DHDP) to analyze paths in video data.
There has been extensive research in computer vision and robotics on crowd analysis and we discuss some representative approaches here. Zhou et al. [2012] model trajectories as linear dynamic systems and model starting positions and destinations as beliefs. The key information, belief, is manually labelled. Although the user can roughly label these areas, we suspect that a finer classification will require more extensive labelling. Furthermore, it is unclear how such areas could be labelled in a highly unstructured space where every position on the boundary could be both a starting and an ending area. In our approach, we do not require manual labels for such beliefs. Ikeda et al. [2013] model paths by first determining sub-goals and then learning transitions between sub-goals. However, their model of the crowd is solely based on the social-force model, and sub-goals are defined as points towards which many velocities converge. There may not be any such sub-goals (consider flows with no intersections), or there could be too many. Our method does not make any assumptions about the underlying behavior model or the existence of sub-goals. Other methods based on density or mean-flows [Ali and Shah 2007; Zhong et al. 2015] interpret the whole field as one density map or one flow field, whereas our method gives a series of weighted patterns.
3 Methodology
3.1 Model Choice
The first step towards exposing the latent path patterns in a crowd data set is to find a set of trending paths. Here, a trending path can be seen as a collection of similar trajectories. However, manually labelling clusters of trajectories would be difficult and time-consuming as we lack a good distance metric and prior knowledge of the number of patterns present. Popular unsupervised clustering algorithms, such as K-means [MacQueen 1967] and Gaussian Mixture Models (GMMs) [Bishop 2007], require a pre-defined cluster number. Hierarchical Agglomerative Clustering [Kaufman and Rousseeuw 2005] does not require a pre-defined cluster number, but the user must decide when to stop merging, which is similarly problematic. Spectral-based clustering methods [Shi and Malik 2000] solve this problem, but require the computation of a similarity matrix whose space complexity is $O(n^2)$ in the number of trajectories. Too much memory is needed for large datasets and performance degrades quickly with increasing matrix size.
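To give a feel for the quadratic memory cost mentioned above, the following back-of-the-envelope sketch computes the size of a dense pairwise similarity matrix. The 8-byte entry size and dense storage are assumptions made for illustration; the tracklet count is the one quoted for the train station data.

```python
def dense_matrix_bytes(n, bytes_per_entry=8):
    """Bytes needed for a dense n x n matrix of fixed-width entries."""
    return n * n * bytes_per_entry


# Tracklet count quoted for the train station dataset.
n_tracklets = 19999
size = dense_matrix_bytes(n_tracklets)
print(f"{size / 1e9:.2f} GB for {n_tracklets} trajectories")
```

Doubling the number of trajectories quadruples this figure, which is why a representation that avoids pairwise comparisons is attractive.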
An alternative perspective is to treat a trending path as a distribution over location-orientation pairs (Figure 2). A group of trajectories connecting points A and B can be represented by a trending path modeled by Multinomial distributions over location-orientation pairs. Note that in this representation, a trending path is a flow sub-field rather than a group of 2D curves. Although the trajectories are broken into individual location-orientation observations in this way, we overcome the randomness of a particular trajectory and represent such a trajectory group as one trending path. Next, we find all trending paths under the assumption that if a trending path exists, there should be repeated location-orientation occurrences on this path. The problem is then transformed into computing a (potentially infinite) number of Multinomial distributions. We present a non-parametric hierarchical Bayesian model that can automatically compute a desirable number of such Multinomial distributions from the data. Thus, it does not require a pre-defined cluster number and its space complexity is smaller than $O(n^2)$.
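The location-orientation representation can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the 50×50 grid of unit cells, the four direction bins plus one zero-velocity bin (matching the five orientation sub-domains of Figure 1), and the speed threshold are all assumptions.

```python
import numpy as np


def trajectory_to_states(points, cell=1.0, grid_w=50, n_dirs=4, min_speed=1e-3):
    """Map a trajectory (T x 2 positions) to discrete state ids.

    A state is a (grid cell, orientation bin) pair; bin n_dirs is
    reserved for (near) zero velocity. All defaults are hypothetical.
    """
    points = np.asarray(points, dtype=float)
    vel = np.diff(points, axis=0)          # displacement between samples
    pos = points[:-1]
    col = np.clip((pos[:, 0] // cell).astype(int), 0, grid_w - 1)
    row = np.clip((pos[:, 1] // cell).astype(int), 0, grid_w - 1)
    speed = np.linalg.norm(vel, axis=1)
    angle = np.arctan2(vel[:, 1], vel[:, 0])
    # Shift by half a bin so that bins are centred on the axis directions.
    width = 2 * np.pi / n_dirs
    direction = np.floor(((angle + width / 2) % (2 * np.pi)) / width).astype(int)
    direction = direction % n_dirs
    direction = np.where(speed < min_speed, n_dirs, direction)
    return (row * grid_w + col) * (n_dirs + 1) + direction


def trending_path(trajectories, n_states, **kwargs):
    """Empirical Multinomial over states for a group of similar trajectories."""
    counts = np.zeros(n_states)
    for traj in trajectories:
        np.add.at(counts, trajectory_to_states(traj, **kwargs), 1)
    return counts / counts.sum()
```

Here the grouping of trajectories is given; in the model described in this paper the grouping is latent and is inferred jointly with the distributions.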
Figure 2: Two sets of trajectories (a, c) and their corresponding trending paths modeled by Multinomials (b, d). Color coding represents different orientation sub-domains (cf. Figure 1).
We first define the terminology in Table 1. Our SV-DHDP model employs a Dual Hierarchical Dirichlet Process, similar to that presented in [Wang et al. 2009], for pattern analysis, but we combine it with Stochastic Variational Inference (SVI) for posterior estimation, which results in better performance on large datasets. In a standard hierarchical Bayesian setting, a tree is constructed in an attempt to explain a set of observations through a hierarchy of factors. In our problem, the observations are agent states, which we divide into equal-length data segments along the time domain. Our goal is to find a set of path patterns $\{\beta_k\}$ that, when combined with their respective weights, best describe all the segments in terms of their likelihoods. Such a tree structure is shown in Figure 3. This is a simplified figure of a three-layer Bayesian hierarchy explaining how the observations $w_{dn}$ can be explained by all possible patterns $\beta_k$ with weights $v_k$. For the sake of conciseness, the full detailed version of the model is provided in the supplementary material. The overall goal here is to estimate the $\beta_k$ and $v_k$ given $w_{dn}$, i.e., the posterior distribution of this model, $p(\{\beta_k\}, \{v_k\} \mid w_{dn})$. We explain the posterior estimation in Section 4.
Figure 3: SV-DHDP model. DP is Dirichlet Process. $w_{dn}$ is the $n$th agent state in data segment $d$. $K$ is the total number of patterns. $v_k$ is the weight of the $k$th pattern. $\beta_k$ is the $k$th pattern. Arrows indicate dependencies.

| Term | Notation | Meaning |
|---|---|---|
| Agent State | $w$ | $w = \{p, v\}$, where $p$ and $v$ are the position and orientation of an agent |
| State Space | $S$ | The set of all possible states, $S = \{w_i\}$ |
| Path | $P$ | A probability distribution over $S$, $P(s)$ |
| Path Pattern | $\beta$ | A mixture of paths |

Table 1: Terminology and Parameters
Figure 4: Illustrative example with 100 data segments, each with 50 accumulated positions: (top left) 10 ground truth path patterns; (right) example data; (bottom left) the top 16 path patterns learned.
3.2 An Illustrative Example
After initial experiments using our model, we find that although many trending paths can be found in a dataset, only a subset of them are needed to describe a data segment (i.e., a time slice of the dataset). Furthermore, different subsets of the path patterns exist in different data segments. We use a simple example to illustrate this principle.
Consider again the case of a public square, simplified as a 5×5 grid environment. Imagine that there are only 10 possible paths that people will take, illustrated as horizontal and vertical bars (Figure 4 top left). Note that in this simple case each path represents a single ground truth path pattern, whereas in more complex scenes such as those presented later in this paper, a particular path usually co-occurs with different ones. For the sake of clarity, we also only cluster positions. We synthesize a dataset representing the activity in the square by randomly combining several ground truth path patterns and performing random sampling to generate 100 data segments, each consisting of 50 accumulated positions (e.g., Figure 4 right). Each data segment is a density map of positions (the darker the cell, the higher the density) and mimics an observation of the square over some time interval. We can observe the phenomenon that each segment can be described by a subset of path patterns. Applying our model, we learn all the latent path patterns from our synthetic dataset and Figure 4 (bottom left) shows the top 16 found. As we can see, the top 10 match our ground truth patterns. Although additional patterns are learned, they are less prominent (smaller intensities) and have much smaller weights, so they are ranked lower.
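A dataset of this kind can be synthesized along the following lines. This is a sketch under stated assumptions rather than the exact procedure used for Figure 4: the uniform bars, the cap of three active patterns per segment, the Dirichlet-distributed mixing weights and the function names are all illustrative choices.

```python
import numpy as np


def bar_patterns(size=5):
    """2*size ground-truth patterns: one uniform bar per row and per column."""
    patterns = []
    for i in range(size):
        row = np.zeros((size, size))
        row[i, :] = 1.0 / size
        col = np.zeros((size, size))
        col[:, i] = 1.0 / size
        patterns += [row.ravel(), col.ravel()]
    return np.array(patterns)  # shape (2*size, size*size)


def synthesize_segments(patterns, n_segments=100, n_positions=50,
                        max_active=3, seed=0):
    """Each segment mixes a few random patterns, then samples positions."""
    rng = np.random.default_rng(seed)
    n_patterns, n_cells = patterns.shape
    segments = np.zeros((n_segments, n_cells), dtype=int)
    for d in range(n_segments):
        k = rng.integers(1, max_active + 1)       # how many patterns are active
        active = rng.choice(n_patterns, size=k, replace=False)
        weights = rng.dirichlet(np.ones(k))       # random mixing proportions
        mixture = weights @ patterns[active]      # distribution over cells
        segments[d] = rng.multinomial(n_positions, mixture)
    return segments


patterns = bar_patterns()
segments = synthesize_segments(patterns)
print(segments[0].reshape(5, 5))  # density map of one data segment
```

Each row of `segments` plays the role of one density map in Figure 4 (right); a clustering model is then expected to recover the ten bars from these counts alone.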
4 Posterior Estimation and Similarity Metric
As discussed in Section 3.1, the novelty of our SV-DHDP model is the way we compute the posterior. There are two approaches commonly used for this purpose: sampling and variational inference. Sampling methods provide good model fitness on relatively small datasets, but the proof of convergence is still open and they have other shortcomings [Teh et al. 2008]. We therefore use a Stochastic Variational Inference (SVI) method, which is much faster on large datasets (such as crowd behaviors observed over time).
For a standard two-layer HDP model, many methods have been developed [Teh et al. 2008; Hoffman et al. 2013; Wang et al. 2011]. Our SVI technique is similar to that recently proposed in [Hoffman et al. 2013], except that their model is a simple two-layer HDP model whereas ours has an additional DDP layer. This extension is non-trivial and involves much more than merely adding one more DP layer to a two-layer HDP model. To our knowledge, this is the first attempt to apply variational inference to this type of model. Please refer to the supplementary materials for detailed mathematical derivations and algorithms.
4.1 Model Fitness
By dividing a dataset into training data $C_{train}$ and a test data segment $C_{test}$, we can evaluate the model fitness by the predictive likelihood of $C_{test}$. We further divide $C_{test}$ into two sets of samples: observed, $w^{obs}_i$, and held-out, $w^{ho}_i$. We also keep the unique state sets of the two sets disjoint. We first use $C_{train}$ to train our model to compute the approximate posterior, and then use $w^{obs}_i$ and the approximate posterior to fine-tune the top-level path pattern weights. Finally, we compute the log likelihood of $w^{ho}_i$. This metric gives a good predictive distribution and avoids comparing parameter bounds. Similar metrics are used in [Hoffman et al. 2013; Teh et al. 2008; Wang et al. 2011] for evaluating model fitness. It is computed by:
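
$$
\begin{aligned}
p(w^{ho}_j \mid C_{train}, w^{obs}_j)
&= \int\!\!\int \Big( \sum_{k=1}^{K} v_k \, \beta_{k, w^{ho}_j} \Big) \, p(v \mid C_{train}, \beta) \, p(\beta \mid C_{train}) \, dv \, d\beta \\
&\approx \int\!\!\int \Big( \sum_{k=1}^{K} v_k \, \beta_{k, w^{ho}_j} \Big) \, q(v) \, q(\beta) \, dv \, d\beta \\
&= \sum_{k=1}^{K} E_q[v_k] \, E_q[\beta_{k, w^{ho}_j}]
\end{aligned}
\tag{1}
$$
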
where $K$ is the truncation number at the top level and $\beta_{k, w^{ho}_j}$ is the probability of the state $w^{ho}_j$ in path pattern $\beta_k$. $q$ is the variational distribution. For a testing data segment, the per-state log likelihood is computed from the product $\prod_j p(w^{ho}_j \mid C_{train}, w^{obs}_j)$. When training the model, we plot the per-state log likelihood and stop the optimization when it becomes stable.
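The last line of Equation 1 reduces to a weighted sum over patterns, which can be sketched as below. The array layout (a length-K vector of expected weights and a K×V matrix of expected pattern probabilities) and the function names are assumptions, and averaging the log probabilities over held-out states is one plausible reading of "per-state".

```python
import numpy as np


def predictive_prob(expected_weights, expected_patterns, state):
    """Last line of Equation 1: sum_k E_q[v_k] * E_q[beta_{k, state}]."""
    return float(expected_weights @ expected_patterns[:, state])


def per_state_log_likelihood(expected_weights, expected_patterns, held_out_states):
    """Average log predictive probability over the held-out states (assumed form)."""
    probs = expected_weights @ expected_patterns[:, held_out_states]
    return float(np.mean(np.log(probs)))


# Toy example: K = 2 patterns over V = 3 states.
E_v = np.array([0.75, 0.25])
E_beta = np.array([[0.5, 0.5, 0.0],
                   [0.0, 0.5, 0.5]])
print(predictive_prob(E_v, E_beta, 0))
print(per_state_log_likelihood(E_v, E_beta, [0, 1]))
```

Tracking the second quantity across training iterations gives the kind of curve that can be used as the stopping signal described above.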
4.2 Inference Based Similarity Metric
In addition to extracting path patterns, we would also like to propose a metric for measuring similarities between datasets, so that a quantitative similarity can be computed for simulation vs. simulation, real data vs. simulation or even real data vs. real data. Since our model can compute path patterns for two datasets, a naive approach is to use some commonly used metric such as KL-divergence or even plain Euclidean distance between pairs of patterns. However, we can easily end up with two sets of different sizes, and comparing two sets of probability distributions is not a well-defined problem. Another seemingly good idea would be to only compare the top n patterns from both pattern sets. However, this is unfair because the patterns are weighted differently within their sets, and the choice of n is unclear. A more elegant metric is needed to compare two datasets.
We know that to evaluate model fitness on dataset A, we would use a test data segment from A. This model fitness also implies that if dataset B has similar path patterns to A, then the data from B should also give a good likelihood in Equation 1. In this way, we can compute the per-state predictive likelihood of B given A:

$$
\mathrm{lik}(B \mid A) = p(w^{ho} \mid A, w^{obs}). \tag{2}
$$

Here we replace $C_{train}$ in Equation 1 with A. Both the observed data $w^{obs}$ and the held-out data $w^{ho}$ are from B instead of A. This metric resolves the two concerns mentioned above.
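The protocol behind Equation 2 can be illustrated with a simplified stand-in. In the sketch below, the patterns learned from A are held fixed, the pattern weights are re-estimated on B's observed states with plain EM (an assumption that replaces the variational fine-tuning described in this paper), and B's held-out states are then scored. The function names and the smoothing constant are hypothetical.

```python
import numpy as np


def fit_weights(patterns, obs_states, n_iter=50, eps=1e-12):
    """EM for mixture weights with patterns fixed (stand-in for fine-tuning)."""
    n_patterns = patterns.shape[0]
    v = np.full(n_patterns, 1.0 / n_patterns)
    lik = patterns[:, obs_states] + eps       # K x N state probabilities
    for _ in range(n_iter):
        resp = v[:, None] * lik               # unnormalized responsibilities
        resp /= resp.sum(axis=0, keepdims=True)
        v = resp.mean(axis=1)                 # updated weights, sum to 1
    return v


def similarity(patterns_A, obs_states_B, held_out_states_B, eps=1e-12):
    """Per-state log likelihood of B's held-out states under A's patterns."""
    v = fit_weights(patterns_A, obs_states_B)
    probs = v @ patterns_A[:, held_out_states_B]
    return float(np.mean(np.log(probs + eps)))


# Toy patterns "learned from A": two patterns over four states.
patterns_A = np.array([[0.60, 0.30, 0.05, 0.05],
                       [0.05, 0.05, 0.30, 0.60]])
print(similarity(patterns_A, [0, 0, 0, 0], [1, 1]))  # B agrees with pattern 0
print(similarity(patterns_A, [0, 0, 0, 0], [2, 2]))  # held-out states disagree
```

Because the score depends only on A's patterns and B's raw states, the two datasets never need to yield the same number of patterns, which is the property the text relies on.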
In addition, since our patterns are Multinomials, it is always possible to do pairwise comparisons such as KL-divergence and Root Mean Squared Error if needed.
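For reference, the two pairwise measures mentioned above can be written as follows for a pair of discrete distributions over the same state space. The additive smoothing used to avoid division by zero is an assumption, not something specified in this paper.

```python
import numpy as np


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions, with additive smoothing."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))


def rmse(p, q):
    """Root mean squared error between two probability vectors."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sqrt(np.mean((p - q) ** 2)))


print(kl_divergence([0.5, 0.5], [0.9, 0.1]))
print(rmse([0.5, 0.5], [0.9, 0.1]))
```

Note that KL-divergence is asymmetric, so the choice of which pattern acts as the reference matters.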
5 Path Pattern Abstraction
To show the generality and robustness of our method, we apply it to both simulated and real datasets from various scenarios with different features and noise levels. We also compare our method with existing approaches and discuss performance. All our patterns are color-coded in the various figures, where different colors represent orientations as in Figure 1 and color intensities show probabilities.
5.1 Simulation Datasets
Real data exhibits both global and local features, caused by the fact that pedestrians tend to plan their paths through an environment based on external factors such as entrances, exits and personal goals, but they are often deflected from their paths due to the necessity to avoid members of a crowd. In simulations, different types of simulation algorithms are used to model local steering behaviors and global path planning strategies. We explore the effects of these algorithms separately by first varying local steering methods while minimizing the impact of the global path planning. Then we fix the steering behavior and vary the global path planning strategies.
5.1.1 Local Steering
We choose four steering algorithms that are representative of commonly used methods: MOU09 [Moussaïd et al. 2009] is a recent version of Helbing's social force model; PETT09 [Pettré et al. 2009] is a velocity obstacle based approach similar to RVO; ONDREJ10 [Ondřej et al. 2010] uses bearing angle to avoid collisions; and PARIS07 [Paris et al. 2007] solves steering in velocity space. Many other methods exist, such as potential fields, fluid based, hybrids and foot-step planning, but our goal is not to analyze every possible approach, but to demonstrate how our method can capture the differences produced by different reactive steering methods.
We set up a bi-directional flow experiment to show our analysis for local steering behaviors. Two rectangular areas are placed at the top and bottom of a scene (Figure 5) and two groups of agents are created. For one group, agents are randomly generated in one area with randomly selected destinations in the other area, thus avoiding any complex global path planning. For the other agent group, we switch the generation and destination areas. This forces both agent groups to use steering behaviors in order to avoid the others. Each simulation lasts for around 25 minutes and involves 20000 agents.
Figure 5: Top path patterns from the data created by four representative local steering algorithms.
Snapshots of the simulation data can be found in the supplemental material. Figure 5 shows the top path patterns computed. Intuitively, we can see that PARIS07 does not give prominent patterns, meaning the crowd is spreading out all the time. ONDREJ10 tends to give stable flows compared to other methods. PETT09 and MOU09 are in the middle because their patterns are slightly more concentrated than PARIS07, but less so than ONDREJ10. PARIS07 looks more similar to PETT09 and MOU09 than to ONDREJ10. This visualization thus facilitates a qualitative understanding of the behaviours generated using different local steering mechanisms. Later we will see how we can also quantitatively compare them with each other.
Figure 6: Trajectories created by three global path planning algorithms: Navmesh, Roadmap and PoField.
Figure 7: Top path patterns from the data created by three global path planning algorithms: Navmesh, Roadmap and PoField.
5.1.2 Global Path Planning
In this experiment, we fix our local steering model [van den Berg et al. 2011] and vary the global path planning methods to test our analysis model. The environment is a square with several obstacles in the middle. We set up a generation area at the top and a destination area at the bottom. We recycle 64 agents repeatedly to generate 400 seconds of data. Three global path planning methods, Navigation mesh [Snook 2000], Roadmap [Latombe 1991] and Potential Field [Khatib 1985], are used here, referred to as Navmesh, Roadmap and PoField. Trajectories are generated using Menge [Curtis et al. 2014] (Figure 6) and the top patterns found are shown in Figure 7.
The top patterns of all three methods are down-going flows as expected, but they spread out within the environment in slightly different ways. In addition to these high probability patterns, other patterns are also learned and we do find other colors, albeit with much smaller weights. These patterns occur when agents get completely blocked so they start to walk in other directions to find their way out.
5.2 Real Datasets
In addition to simulation data, we also show experimental results computed on two real datasets. The first dataset is a 6 minute video clip of 967 pedestrians in a park, recorded by a mid-distance camera. We manually annotate the trajectories so that we have relatively complete trajectories with very little noise.
The second dataset consists of 19999 tracklets recorded in New York Grand Central Terminal by a far-distance camera [Zhong et al. 2015] (downloaded from http://www.ee.cuhk.edu.hk/~xgwang/grandcentral.html). The trajectories are computed based on moving pixels and contain only partial and noisy tracklets, thus demonstrating the robustness of our method.
Figure 8: Park dataset: (a) Projected trajectories, (b) Annotated trajectories overlaid on a frame from the video. Red dots are cameras.
5.2.1 Park
All trajectories and some data segments are shown in Figure 8. To train the model, the truncation numbers for J, I, L and K (parameters explained in the supplementary material) are set to 10, 15, 4 and 20 respectively. The training took 0.58 hours, and Figure 9 (Reference) shows some high-weight patterns. There are several major flows learned from the data. One is the flow going from 3 to 2 (References b and j); these two patterns mainly differ in whether they go through the narrow corridor along the bottom or not. Another up-going flow is Reference f, from 3 to 1. The major down-going flows are from 2 to 3 (magenta and yellow). These paths are also observed in the data.
5.2.2 Train Station
The whole area is a square with each dimension approximately 50m long. We discretize the domain into 1×1 meter grids and set J, I, L and K (parameters explained in the supplementary material) to 10, 15, 3 and 20. The training took 1.83 hours. Some patterns are shown in Figure 1.
In Figure 1, (e) is the major up-going flow and (d) is the major down-going flow. Both are observed in the data. The right-going flow shown in Figure 1(c) is another major flow observed in the data. Interestingly, the left-going flow pattern (yellow) is not very prominent. After looking at the data, we found that since it shares boundaries with green and magenta, some of the left-going flows are captured in Figure 1(d) and (e) instead.
5.3 Comparison with previous approaches
We empirically compare our SVI method with the Gibbs sampling used in [Wang et al. 2009] on the train station dataset. Due to the nature and stochasticity of these two methods, it is hard to compare them in one standard setting. So we run our method until it converges, then run Gibbs sampling for various durations to compare the results. We first run the sampling for 1.95 hours and show the results in Figure 10. We made our best effort to find informative and similar patterns in the results. Compared to the patterns in Figure 1, we cannot find any patterns that are as informative. Another interesting difference is that the patterns shown in Figure 10 are in general more concentrated into individual grid cells (reflected by their intensities compared to the ones in Figure 1), and do not fully cover the areas of the paths. We believe this is due to every state sample being given only one pattern label in sampling, while in SVI each state sample has a distribution over all patterns. Also, after 1.95 hours, a total of 198 patterns are learned and the number continues to go up to 735 after running for 40 hours, clearly showing that the sampler has not yet converged. More patterns are available in the supplementary material.
Figure 9: Top 3 patterns for the Park dataset that cover more than 90% of weights. Each pattern is shown for 4 directions in a group (4 rows; blue is omitted because no significant pattern was found for that direction). Column 1: Top patterns from the real dataset. Columns 2-5: Top patterns from four simulated datasets. Similarity scores using the real data as reference are shown in the brackets next to the name of the method. They are log likelihoods: the larger (closer to 0), the better. At the bottom of each group, weights for corresponding patterns are given. The percentages are computed by KL-divergence between a reference pattern and a simulation pattern, then normalized to 0-100. Intensity represents probability: the higher the intensity, the higher the probability.
Figure 10: The top pattern in different velocity domains learned by sampling. More patterns are available in the supplementary material.
Figure 11: Model fitness plot with iterations. a: Train Station. b: ONDREJ10 in Section 5.1.1.
We also compared the performance of SVI with sampling. SVI is faster mainly because in every iteration, it uses a batch that is usually much smaller than the number of data segments. In contrast, sampling uses all of them. Figure 11 shows how quickly our SV-DHDP converges. We show plots for two examples. For both synthetic and real data, our model converges within 20 to 60 iterations.
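The mini-batch argument above can be made concrete with the generic stochastic variational update of Hoffman et al. [2013] for Dirichlet parameters of Multinomial patterns. This is a sketch of that generic update only, not the SV-DHDP update derived in the supplementary material; the function names, the prior `eta` and the step-size constants `tau` and `kappa` are illustrative assumptions.

```python
import numpy as np


def step_size(t, tau=1.0, kappa=0.7):
    """Decreasing step size rho_t = (t + tau)^(-kappa)."""
    return (t + tau) ** (-kappa)


def svi_update(lam, batch_counts, n_segments, batch_size, t,
               eta=0.01, tau=1.0, kappa=0.7):
    """One stochastic update of the K x V Dirichlet parameters lam.

    batch_counts holds the expected state counts per pattern, summed over
    the mini-batch only; they are rescaled as if the whole dataset of
    n_segments segments looked like this batch.
    """
    rho = step_size(t, tau, kappa)
    lam_hat = eta + (n_segments / batch_size) * batch_counts
    return (1.0 - rho) * lam + rho * lam_hat


# Toy example: K = 2 patterns, V = 3 states, batch of 10 out of 100 segments.
lam = np.ones((2, 3))
batch_counts = np.array([[1.0, 0.0, 0.0],
                         [0.0, 0.0, 2.0]])
lam = svi_update(lam, batch_counts, n_segments=100, batch_size=10, t=1)
print(lam)
```

Each iteration touches only `batch_size` segments, so its cost is independent of the total number of segments, whereas a Gibbs sweep visits every state sample in the dataset.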
6 Similarity Analysis
In this section, we show how our similarity metric can be used to provide meaningful comparisons between real reference data and simulation data.
We used the four models in Section 5.1.1 in combination with one global path planning method [2004] to simulate the crowds in the park and the train station. We modelled the environments by observing the videos carefully, then randomly generating agents within the entrance areas and randomly selecting destinations within the exit areas. All similarities are computed using the real dataset as the reference. Snapshots of data segments for both experiments are shown in the supplementary material. The similarity scores for the park and train station simulations are shown in Figure 9 and Figure 12. Some top patterns are shown in columns 2-5 of Figure 9 and columns 2-5 of Figure 12.
First we emphasize that the similarities presented here are not designed to provide any kind of conclusive statement of which simulator is the best. Path patterns are affected by many factors and we did not exhaustively try all combinations of all parameter settings. For instance, it is difficult to accurately calibrate parameters including accurate entrances and exits, timing of arrival, the proportions of population in different flows and so on. After first looking at the computed patterns and scores, we adjusted the entrances and exits more carefully to ensure the best performance possible for all algorithms, and we speculate that the simulations could be improved even further by adjusting timing and population density. This also demonstrates how our metric can help to design simulations, because we can identify the key elements to adjust by looking at the visual patterns.
To make good use of our metric for simulation, we suggest two
ways to interpret the patterns: using Equation 2, and computing
the KL-divergence between pairs of patterns to help interpret
the visual data.
Figure 12: Train Station patterns for real and simulated datasets.
The layout and scores are computed in the same way as in Figure 9.
Equation 2 gives the average likelihood of the testing data. Several
major factors affect the score. Firstly, the global path planning has
a great influence. One example is Figure 12 (a):Reference, where a
wide flow can be seen going from the bottom to the right. Among the
simulations, only PETT09 roughly captures it, which contributes to its
score. In addition, the relative numbers of agents on each path
pattern also influence the similarity. Figure 12 (a):PARIS07 has
several flows that are not seen in the real data pattern. After
watching the video, we found that only a few people walk on these
paths, but the simulation assigned a large number of agents to them,
thus contributing to the low score of PARIS07. Next, in Figure 12, all
simulations other than PARIS07 tend to form narrower paths than the
real data, whereas in the park simulation some paths are wider than
the real data, such as Figure 9 (b):ONDREJ10, some are about the same
width, such as Figure 9 (f):MOU09, and some are too narrow, such as
Figure 9 (b):MOU09. The path width is affected by the simulation
method itself as well as by the number of agents on that path.
Finally, when it comes down to a single path, some models tend to form
prominent patterns more than others, as seen in the bi-directional
flow example. This also contributes to the scores.
In addition, the weights are used for two purposes: analysis and
comparison. Within a single dataset, the weights reflect the relative
likelihoods of each path pattern. For instance, the likelihood of
observing an agent on Figure 9(a):Reference is more than twice that
on Figure 9(e):Reference, as indicated by their respective weights
v0 = 0.57 and v1 = 0.21. For comparison, the weights are also taken
into account by Equation 2.
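As a rough illustration of how pattern weights can enter a likelihood-based score, the following Python sketch computes an average per-observation log likelihood under a weighted mixture of discrete patterns. It is not a transcription of Equation 2: the function name average_log_likelihood, the mixture form, the cell discretization, and the toy patterns are all assumptions made for illustration. Only the weights 0.57 and 0.21 come from the text; the third weight simply absorbs the remaining mass.

```python
import numpy as np

def average_log_likelihood(observations, weights, patterns, eps=1e-12):
    """Mean log-probability of observed cells under a weighted mixture of patterns.

    observations: 1D sequence of cell indices (hypothetical discretization).
    weights:      length-K pattern weights; renormalized here to sum to 1.
    patterns:     (K, V) array, each row a distribution over V cells.
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    beta = np.asarray(patterns, dtype=float)
    obs = np.asarray(observations, dtype=int)
    per_obs = w @ beta[:, obs]  # mixture probability of each observation
    return float(np.mean(np.log(per_obs + eps)))

# Toy setup (assumed): three patterns over four cells.
weights = [0.57, 0.21, 0.22]
patterns = [
    [0.7, 0.1, 0.1, 0.1],  # dominant pattern, concentrated on cell 0
    [0.1, 0.7, 0.1, 0.1],  # second pattern, concentrated on cell 1
    [0.1, 0.1, 0.4, 0.4],  # remaining mass, spread over cells 2 and 3
]

on_major = [0, 0, 0, 1]   # observations that mostly follow the dominant pattern
off_major = [2, 3, 3, 1]  # observations that mostly avoid it

score_major = average_log_likelihood(on_major, weights, patterns)
score_off = average_log_likelihood(off_major, weights, patterns)
print(score_major, score_off)     # both are <= 0; closer to 0 is better
print(weights[0] / weights[1])    # relative likelihood of the two top patterns
```

In this toy setting, data that follows the heavily weighted pattern receives a log likelihood closer to 0, and the ratio of the two weights from the text is about 2.7, which matches the "more than twice" statement.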
Aside from Equation 2, the user might simply want to focus on the
similarity of particular patterns. This can be computed by the
KL-divergence between pairs of patterns. In both Figure 9 and
Figure 12, each pattern is given a score comparing it with the
corresponding pattern in the reference. We normalize the values to
0-100, where bigger is better. The results of this metric may
sometimes seem to contradict the previous one because the focus is
different. For instance, in Figure 9, ONDREJ10 has the lowest
similarity score, but under the KL-divergence similarity its first
three patterns outperform those of the other datasets. This means that
ONDREJ10 reproduces some major flows in the reference data faithfully,
but does not capture the other sub-dominant flows well. If the user
just wants to reproduce the major flows, then ONDREJ10 is a good
option in this case. PETT09 and MOU09 also capture the flows in the
second group well, so they might be the choice if those flows are to
be reproduced. For the KL-divergence metric, the weights are less
meaningful because it can be applied to any pair of patterns from
different datasets, depending on the application.
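The pairwise comparison described above can be sketched in Python as follows. The KL-divergence itself is standard, but the text does not state how values are normalized to 0-100, so the mapping 100 * exp(-KL) used here is an assumed stand-in. The function names and the 2x2 toy patterns are likewise hypothetical.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions stored as arrays of equal shape."""
    p = np.asarray(p, dtype=float).ravel() + eps  # smoothing avoids log(0)
    q = np.asarray(q, dtype=float).ravel() + eps
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def similarity_score(reference, simulated):
    """ASSUMED normalization: maps KL = 0 to 100 and large KL towards 0."""
    return 100.0 * float(np.exp(-kl_divergence(reference, simulated)))

# Toy 2x2 location distributions (hypothetical data).
reference = np.array([[0.70, 0.10], [0.10, 0.10]])
close = np.array([[0.60, 0.20], [0.10, 0.10]])
far = np.array([[0.10, 0.10], [0.10, 0.70]])

print(similarity_score(reference, close))
print(similarity_score(reference, far))
```

Because the score depends only on the two patterns being compared, it can be applied to any pair of patterns regardless of their weights, which is consistent with the remark that weights are less meaningful for this metric.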
Overall, the two metrics focus on different aspects of the data.
The similarity score gives overall performance, which is the
per-state likelihood. The KL-divergence similarity places more
emphasis on visual similarity. Together, they provide richer
information for different use cases.
7 Conclusions and Future Work
We propose a new perspective for comparing crowd data. We
present a non-parametric hierarchical Bayesian model to automati-
cally extract a desirable set of patterns. Also, we propose a similar-
ity metric for comparison.
Our metric is environment-based. The main reason is that the reference
data is almost always affected by the environment in real-world
applications. When only local flow patterns are needed, our metric
still works well, as shown in the bidirectional flow example. A
global shift of flows will give low scores, but we argue that the
flows can always be aligned if a rotation/translation invariant
comparison is needed.
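One simple way to realize such an alignment, for translations only, is a brute-force search over grid shifts that minimizes the KL-divergence to the reference. The Python sketch below is an assumption-laden illustration rather than a procedure from this paper: the helper names, the zero-fill shifting, the smoothing constant, and the search window are all hypothetical, and rotation is not handled.

```python
import numpy as np

def shift_zero_fill(grid, dy, dx):
    """Translate a 2D grid by (dy, dx) cells, filling vacated cells with zeros."""
    out = np.zeros_like(grid)
    h, w = grid.shape
    src = grid[max(0, -dy):min(h, h - dy), max(0, -dx):min(w, w - dx)]
    y0, x0 = max(0, dy), max(0, dx)
    out[y0:y0 + src.shape[0], x0:x0 + src.shape[1]] = src
    return out

def kl(p, q, eps=1e-9):
    """Smoothed KL(p || q) between two grids treated as discrete distributions."""
    p = p.ravel() + eps
    q = q.ravel() + eps
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def best_translation(reference, simulated, max_shift=3):
    """Search integer shifts of `simulated` and return the one closest to `reference`."""
    best, best_kl = (0, 0), float("inf")
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            d = kl(reference, shift_zero_fill(simulated, dy, dx))
            if d < best_kl:
                best, best_kl = (dy, dx), d
    return best, best_kl

# Toy example: a rectangular flow region, and a copy displaced by (2, -1) cells.
reference = np.zeros((8, 8))
reference[2:4, 3:6] = 1.0
simulated = shift_zero_fill(reference, 2, -1)

shift, divergence = best_translation(reference, simulated)
print(shift, divergence)
```

The search recovers the inverse of the displacement applied to the toy pattern. A practical version would also need to handle rotations and sub-cell shifts.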
Our method has some inherent limitations. Firstly, it does not
directly measure individual trajectories and thus does not reflect
individual visual similarities. Our patterns reflect information on a
higher level than individual trajectories. Secondly, it does not
capture temporal information such as changes of patterns over time.
Lastly, our truncation-based stochastic variational inference is
sensitive to the initialization, even if the stochasticity in the
gradient helps to some extent. In our experiments, we used grid search
to find good initializations.
One future direction is to extend the current model into a dynamic
model. Currently, all data are considered at once, but in real
situations the path patterns and their respective weights can change
over time. To capture this effect, a dynamic model is needed. Another
direction is to introduce pattern merging and deletion during
optimization to find better solutions. To use our metric to guide
simulation more automatically, we could use patterns as guiding flows
for crowd simulation to improve the scores by methods such as
[Berseth et al. 2014]. Currently, we only capture flows, although
individual trajectories may also influence perceptual realism. A good
direction is to try to capture information on both levels. Finally,
we would like to add social activity and environmental information,
such as talking and pouring a cup of coffee, so that it becomes a
behavior pattern model. We believe this will further help to simulate
realistic crowds with diverse behaviors.
References
ALI, S., AND SHAH, M. 2007. A Lagrangian Particle Dynamics
Approach for Crowd Flow Segmentation and Stability Analysis.
In IEEE Conference on Computer Vision and Pattern Recognition,
2007. CVPR ’07, 1–6.
BERSETH, G., KAPADIA, M., HAWORTH, B., AND FALOUTSOS,
P. 2014. SteerFit: Automated Parameter Fitting for Steering
Algorithms. In Proceedings of the ACM SIGGRAPH/Eurographics
Symposium on Computer Animation, Eurographics Association,
Aire-la-Ville, Switzerland, SCA ’14, 113–122.
BISHOP, C. 2007. Pattern Recognition and Machine Learning.
Springer, New York.
BLEI, D. M., NG, A. Y., AND JORDAN, M. I. 2003. Latent
Dirichlet Allocation. J. Mach. Learn. Res. 3, 993–1022.
CHARALAMBOUS, P., KARAMOUZAS, I., GUY, S. J., AND
CHRYSANTHOU, Y. 2014. A Data-Driven Framework for Visual
Crowd Analysis. Comp. Graph. Forum 33, 7, 41–50.
CURTIS, S., BEST, A., AND MANOCHA, D. 2014. Menge: A
modular framework for simulating crowd movement. University
of North Carolina at Chapel Hill, Tech. Rep.
ENNIS, C., PETERS, C., AND O’SULLIVAN, C. 2011. Perceptual
Effects of Scene Context and Viewpoint for Virtual Pedestrian
Crowds. ACM Trans. Appl. Percept. 8, 2, 10:1–10:22.
FEI-FEI, L., AND PERONA, P. 2005. A Bayesian hierarchical
model for learning natural scene categories. In IEEE CVPR
2005, 524–531.
FUNGE, J., TU, X., AND TERZOPOULOS, D. 1999. Cognitive
Modeling: Knowledge, Reasoning and Planning for Intelligent
Characters. In SIGGRAPH’99, ACM Press/Addison-Wesley
Publishing Co., 29–38.
GOLAS, A., NARAIN, R., AND LIN, M. 2013. Hybrid Long-range
Collision Avoidance for Crowd Simulation. In I3D 2013, 29–36.
GUY, S. J., KIM, S., LIN, M. C., AND MANOCHA, D. 2011.
Simulating Heterogeneous Crowd Behaviors Using Personality
Trait Theory. In SCA 2011, 43–52.
GUY, S. J., VAN DEN BERG, J., LIU, W., LAU, R., LIN, M. C.,
AND MANOCHA, D. 2012. A Statistical Similarity Measure for
Aggregate Crowd Dynamics. ACM Trans. Graph. 31, 6,
190:1–190:11.
HELBING, D., AND MOLNÁR, P. 1995. Social force model for
pedestrian dynamics. Phys. Rev. E 51, 5, 4282–4286.
HOFFMAN, M. D., BLEI, D. M., WANG, C., AND PAISLEY, J.
2013. Stochastic Variational Inference. J. Mach. Learn. Res. 14,
1, 1303–1347.
IKEDA, T., CHIGODO, Y., REA, D., ZANLUNGO, F., SHIOMI,
M., AND KANDA, T. 2013. Modeling and prediction of pedestrian
behavior based on the sub-goal concept. Robotics, 137.
JU, E., CHOI, M. G., PARK, M., LEE, J., LEE, K. H., AND
TAKAHASHI, S. 2010. Morphable Crowds. ACM Trans. Graph.
29, 6, 140:1–140:10.
KAPADIA, M., WANG, M., SINGH, S., REINMAN, G., AND
FALOUTSOS, P. 2011. Scenario Space: Characterizing Coverage,
Quality, and Failure of Steering Algorithms. In Proceedings
of the 2011 ACM SIGGRAPH/Eurographics Symposium on
Computer Animation, ACM, New York, NY, USA, SCA ’11, 53–62.
KARAMOUZAS, I., HEIL, P., BEEK, P. V., AND OVERMARS,
M. H. 2009. A Predictive Collision Avoidance Model for Pedestrian
Simulation. In Motion in Games, 41–52.
KAUFMAN, L., AND ROUSSEEUW, P. J. 2005. Finding Groups in
Data: An Introduction to Cluster Analysis. Wiley-Interscience.
KHATIB, O. 1985. Real-time obstacle avoidance for manipulators
and mobile robots. In 1985 IEEE International Conference on
Robotics and Automation. Proceedings, vol. 2, 500–505.
KIM, S., GUY, S. J., MANOCHA, D., AND LIN, M. C. 2012.
Interactive Simulation of Dynamic Crowd Behaviors Using General
Adaptation Syndrome Theory. In I3D 2012, 55–62.
KIM, S., GUY, S. J., AND MANOCHA, D. 2013. Velocity-based
Modeling of Physical Interactions in Multi-agent Simulations. In
SCA 2013, 125–133.
LAMARCHE, F., AND DONIKIAN, S. 2004. Crowd of virtual humans:
a new approach for real time navigation in complex and
structured environments. Computer Graph. Forum 23, 3, 509–518.
LATOMBE, J.-C. 1991. Robot Motion Planning. Kluwer Academic
Publishers, Norwell, MA, USA.
LEE, K. H., CHOI, M. G., HONG, Q., AND LEE, J. 2007. Group
Behavior from Video: A Data-driven Approach to Crowd Simulation.
In SCA 2007, Eurographics Association, 109–118.
LEMERCIER, S., JELIC, A., KULPA, R., HUA, J., FEHRENBACH,
J., DEGOND, P., APPERT-ROLLAND, C., DONIKIAN, S., AND
PETTRÉ, J. 2012. Realistic Following Behaviors for Crowd
Simulation. Comp. Graph. Forum 31, 2, 489–498.
LERNER, A., CHRYSANTHOU, Y., SHAMIR, A., AND COHEN-OR,
D. 2009. Data Driven Evaluation of Crowds. In Motion in
Games, 75–83.
LERNER, A., FITUSI, E., CHRYSANTHOU, Y., AND COHEN-OR,
D. 2009. Fitting Behaviors to Pedestrian Simulations. In SCA
2009, 199–208.
MACQUEEN, J. 1967. Some methods for classification and analysis
of multivariate observations. In Berkeley Symp. on Math.
Statist. and Prob., 281–297.
MCDONNELL, R., LARKIN, M., DOBBYN, S., COLLINS, S.,
AND O’SULLIVAN, C. 2008. Clone Attack! Perception of
Crowd Variety. ACM Trans. Graph. 27, 3, 26:1–26:8.
MOUSSAÏD, M., HELBING, D., GARNIER, S., JOHANSSON, A.,
COMBE, M., AND THERAULAZ, G. 2009. Experimental study
of the behavioural mechanisms underlying self-organization in
human crowds. Proc. Biol. Sci. 276, 1668, 2755–2762.
MUSSE, S. R., CASSOL, V. J., AND JUNG, C. R. 2012. Towards
a Quantitative Approach for Comparing Crowds. Comp. Anim.
Virt. Worlds 23, 1, 49–57.
NARAIN, R., GOLAS, A., CURTIS, S., AND LIN, M. C. 2009.
Aggregate Dynamics for Dense Crowd Simulation. ACM Trans.
Graph. 28, 5, 122:1–122:8.
NIEBLES, J. C., WANG, H., AND FEI-FEI, L. 2008. Unsupervised
Learning of Human Action Categories Using Spatial-Temporal
Words. Int. J. Comp. Vision 79, 3, 299–318.
ONDŘEJ, J., PETTRÉ, J., OLIVIER, A.-H., AND DONIKIAN, S.
2010. A Synthetic-vision Based Steering Approach for Crowd
Simulation. ACM Trans. Graph. 29, 4, 123:1–123:9.
PARIS, S., PETTRÉ, J., AND DONIKIAN, S. 2007. Pedestrian
Reactive Navigation for Crowd Simulation: a Predictive Approach.
Comp. Graph. Forum 26, 3, 665–674.
PETTRÉ, J., ONDŘEJ, J., OLIVIER, A.-H., CRETUAL, A., AND
DONIKIAN, S. 2009. Experiment-based Modeling, Simulation
and Validation of Interactions Between Virtual Walkers. In SCA
2009, ACM, 189–198.
SHI, J., AND MALIK, J. 2000. Normalized Cuts and Image
Segmentation. IEEE Trans. Patt. Anal. Mach. Intell. 22, 8, 888–905.
SINGH, S., KAPADIA, M., FALOUTSOS, P., AND REINMAN, G.
2009. SteerBench: a benchmark suite for evaluating steering
behaviors. Comp. Anim. Virtual Worlds 20, 5-6, 533–548.
SIVIC, J., RUSSELL, B. C., EFROS, A. A., ZISSERMAN, A., AND
FREEMAN, W. T. 2005. Discovering object categories in image
collections. ICCV 2005.
SNOOK, G. 2000. Simplified 3d Movement and Pathfinding Using
Navigation Meshes. In Game Programming Gems, M. DeLoura,
Ed. Charles River Media, 288–304.
SUDDERTH, E. B., TORRALBA, A., FREEMAN, W. T., AND
WILLSKY, A. S. 2007. Describing Visual Scenes Using Transformed
Objects and Parts. Int J Comput Vis 77, 1-3, 291–330.
TEH, Y. W., JORDAN, M. I., BEAL, M. J., AND BLEI, D. M.
2006. Hierarchical Dirichlet Processes. J. Am. Stat. Assoc. 101,
476, 1566–1581.
TEH, Y. W., KURIHARA, K., AND WELLING, M. 2008. Collapsed
Variational Inference for HDP. In NIPS 2008.
TREUILLE, A., COOPER, S., AND POPOVIĆ, Z. 2006. Continuum
Crowds. ACM Trans. Graph. 25, 3, 1160–1168.
VAN DEN BERG, J., LIN, M., AND MANOCHA, D. 2008.
Reciprocal Velocity Obstacles for real-time multi-agent navigation.
In IEEE International Conference on Robotics and Automation,
2008. ICRA 2008, 1928–1935.
VAN DEN BERG, J., GUY, S. J., LIN, M., AND MANOCHA, D.
2011. Reciprocal n-Body Collision Avoidance. In Robotics
Research, C. Pradalier, R. Siegwart, and G. Hirzinger, Eds., no. 70
in Springer Tracts in Advanced Robotics. Springer Berlin
Heidelberg, 3–19. DOI: 10.1007/978-3-642-19457-3_1.
WANG, X., MA, X., AND GRIMSON, W. 2009. Unsupervised
Activity Perception in Crowded and Complicated Scenes Using
Hierarchical Bayesian Models. IEEE Trans. Patt. Anal. Machine
Intel. 31, 3, 539–555.
WANG, C., PAISLEY, J., AND BLEI, D. M. 2011. Online
variational inference for the hierarchical Dirichlet process. In
AISTATS.
WOLINSKI, D., GUY, S. J., OLIVIER, A.-H., LIN, M. C.,
MANOCHA, D., AND PETTRÉ, J. 2014. Parameter estimation
and comparative evaluation of crowd simulations. Comp. Graph.
Forum 33, 2, 303–312.
ZHONG, J., CAI, W., LUO, L., AND YIN, H. 2015. Learning
Behavior Patterns from Video: A Data-driven Framework for
Agent-based Crowd Modeling. In Autonomous Agents and
Multiagent Systems, 801–809.
ZHOU, B., WANG, X., AND TANG, X. 2011. Random field
topic model for semantic region analysis in crowded scenes from
tracklets. In IEEE CVPR 2011, 3441–3448.
ZHOU, B., WANG, X., AND TANG, X. 2012. Understanding
collective crowd behaviors: Learning a mixture model of dynamic
pedestrian-agents. In Computer Vision and Pattern Recognition
(CVPR), 2012 IEEE Conference on, IEEE, 2871–2878.
