Path Patterns: Analyzing and Comparing Real and Simulated Crowds

He Wang∗ Jan Ondřej† Carol O’Sullivan‡

Disney Research Los Angeles

Figure 1: (a) A video screenshot from a train station; (b) 1000 tracklets (randomly selected from 19999); (c-g) The ﬁve orientation sub-

domains of the top pattern as location-orientation distributions. Inset shows discretization of the orientation, with black representing zero

velocity.

Abstract

Crowd simulation has been an active and important area of research

in the ﬁeld of interactive 3D graphics for several decades. How-

ever, only recently has there been an increased focus on evaluat-

ing the ﬁdelity of the results with respect to real-world situations.

The focus to date has been on analyzing the properties of low-level

features such as pedestrian trajectories, or global features such as

crowd densities. We propose a new approach based on ﬁnding la-

tent Path Patterns in both real and simulated data in order to analyze

and compare them. Unsupervised clustering by non-parametric

Bayesian inference is used to learn the patterns, which themselves

provide a rich visualization of the crowd’s behaviour. To this end,

we present a new Stochastic Variational Dual Hierarchical Dirich-

let Process (SV-DHDP) model. The ﬁdelity of the patterns is then

computed with respect to a reference, thus allowing the outputs of

different algorithms to be compared with each other and/or with

real data accordingly.

Keywords: Crowd Simulation, Crowd Comparison, Data-Driven,

Clustering, Hierarchical Dirichlet Process, Stochastic Optimization

Concepts: •Computing methodologies →Motion processing;

Topic modeling; Bayesian network models;

1 Introduction

Although a large variety of crowd simulation methods exist, choos-

ing the best algorithm for speciﬁc scenarios or applications remains

a challenge. Human behavior is very complex and no one algorithm

can be a magic bullet for every situation. Furthermore, different parameter settings for any given method can give widely varying results. Subjective user studies can be useful to determine perceived realism or aesthetic qualities, but more objective methods are often needed to determine the fidelity and/or predictive power of a given simulation method with respect to real human behaviors. The hierarchical and heterogeneous nature of human crowd behaviors makes it very difficult to find a definitive set of evaluation rules or empirical metrics. Therefore, data-driven evaluation methods are particularly useful for this purpose.

∗realcrane@gmail.com (corresponding author)

†jan.ondrej@disneyresearch.com

‡carol.osullivan@tcd.ie

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org. © 2016 Copyright held by the owner/author(s). Publication rights licensed to ACM.

I3D ’16, February 27-28, 2016, Redmond, WA, USA

ISBN: 978-1-4503-4043-4/16/03

DOI: http://dx.doi.org/10.1145/2856400.2856410...$15.00

In this paper, we propose a data-driven approach to crowd evalua-

tion based on exposing the latent patterns of behavior that exist in

both real and simulated data. Previous data-driven methods tend

to focus on comparisons between high-level global features such as

densities and exit rates, or low-level data such as individual trajecto-

ries. In the former case, the results are often too general and do not

reﬂect the heterogeneity of human behaviors, and in the latter case,

the results are too speciﬁc to the exact scenario recorded. Our ap-

proach offers a compromise between these two extremes that takes

both the global and local properties of crowd motion into account

in order to facilitate a comprehensive qualitative and quantitative

analysis of the data.

For a high-level explanation of our approach, we consider the exam-

ple of a large public square with many entrances and exits, such as

the train station shown in Figure 1(a). Pedestrians typically do not

wander randomly, nor do they walk in straight lines; rather, they

self-organize into ﬂows or standing clusters, with each trajectory

consisting of a series of one person’s steps as he moves through the

square (Figure 1(b)). A group of similar trajectories can be thought

of as a trending path that represents the aggregation of multiple

pedestrians’ positions and orientations. Combining all such trend-

ing paths together will generate an overall path pattern that consists

of ﬂows of location-orientation pairs (Figure 1(c-g)). In scenarios

where global path planning does not signiﬁcantly affect behavior,

e.g., walking through a corridor, local inter-personal dynamics can

also lead to different path patterns. The path patterns created are

therefore the result of local/internal dynamics and global/external

factors.

The main contribution of our paper is a new approach to analyz-

ing and comparing crowd data based on discovering latent path

patterns. To automatically extract these patterns from both real

and simulated data, we present a SV-DHDP model that is the ﬁrst

to combine a Dual Hierarchical Dirichlet Process with Stochastic

Variational Inference. The patterns themselves provide a rich visu-

alization of the crowd’s behaviors and can reveal qualitative prop-

erties that would be difﬁcult or impossible to see by simply view-

ing the original data. Furthermore, we propose a quantitative met-

ric that computes the similarity between both real and simulated

datasets. This allows us to analyze the predictive quality of various

simulation algorithms with respect to real data. We demonstrate the

qualitative and quantitative capabilities of our approach on several

real and simulated crowd datasets.

2 Related Work

Crowd motion properties are affected by a hierarchy of factors from

geometric to cognitive [Funge et al. 1999]. To model these myriad

behavioral aspects, methods such as ﬁeld and ﬂow based [Narain

et al. 2009; Treuille et al. 2006], force-based [Helbing and Molnár 1995; Karamouzas et al. 2009], velocity and geometric optimization [van den Berg et al. 2008; Pettré et al. 2009; Ondřej et al. 2010] and data-driven [Lee et al. 2007; Lerner et al. 2009b] have

been proposed. Our aim is to provide an evaluation framework that

imposes no assumptions on the underlying simulation mechanism

and can therefore work on the output data of all such methods.

Qualitative methods for crowd evaluation have been proposed and

include visual comparison [Kim et al. 2013; Lemercier et al. 2012]

and perceptual experiments [McDonnell et al. 2008; Guy et al.

2011; Ennis et al. 2011]. Quantitative methods fall into two main

categories: model-based [Kim et al. 2012; Golas et al. 2013] and

data-driven [Singh et al. 2009; Ju et al. 2010; Kapadia et al. 2011;

Musse et al. 2012]. Data-driven metrics have been proposed that

use the statistics of geometric and dynamic feature analysis [Wolin-

ski et al. 2014], model-based comparisons of motion randomness

[Guy et al. 2012] and decision making processes [Lerner et al.

2009a].

Our data-driven evaluation method is partly inspired by two previ-

ous approaches. Guy et al. [2012] use a dynamic system to model

crowd dynamics and compute an entropy metric based on individ-

ual motion randomness distributions learned from the data. Our

method differs in that we learn global path patterns from groups

of trajectories, rather than individual ones. Charalambous et al.

[2014] apply a number of different criteria to detect anomalies in

the data, whereas we focus on discovering mainstream latent pat-

terns.

We also draw inspiration from the ﬁeld of Computer Vision, where

hierarchical Bayesian models [Blei et al. 2003; Teh et al. 2006]

have been successfully employed for scene classiﬁcation [Fei-Fei

and Perona 2005; Sudderth et al. 2007], object recognition [Sivic

et al. 2005], human action detection [Niebles et al. 2008] and video

analysis [Kaufman and Rousseeuw 2005; Zhou et al. 2011; Wang

et al. 2009]. The Hierarchical Dirichlet Process (HDP) has been

successfully used in Natural Language Processing to discover can-

didate topics within corpora. By observing that crowd data can also

be decomposed into a bag of words, Wang et al. [2009] used a Dual

HDP (DHDP) to analyze paths in video data.

There has been extensive research in computer vision and robotics

on crowd analysis and we discuss some representative approaches

here. Zhou et al. [2012] model trajectories as linear dynamic systems

and model starting positions and destinations as beliefs. The

key information, belief, is manually labelled. Although the user

can roughly label these areas, we suspect that a ﬁner classiﬁca-

tion will require more extensive labelling. Furthermore, it is unclear how such areas could be labelled in a highly unstructured space where every position on the boundary could be both a

starting and an ending area. In our approach, we do not require

manual labels for such beliefs. Ikeda et al. [2013] model paths by

ﬁrst determining sub-goals and then learning transitions between

sub-goals. However, their model of the crowd is solely based on

the social-force model, and sub-goals are deﬁned as points towards

which many velocities converge. There may not be any such sub-

goals (consider ﬂows with no intersections), or there could be too

many. Our method does not make any assumptions about the under-

lying behavior model or the existence of sub-goals. Other methods

based on density or mean-ﬂows [Ali and Shah 2007; Zhong et al.

2015] interpret the whole ﬁeld as one density map or one ﬂow ﬁeld

whereas our method gives a series of weighted patterns.

3 Methodology

3.1 Model Choice

The ﬁrst step towards exposing the latent path patterns in a crowd

data set is to ﬁnd a set of trending paths. Here, a trending path

can be seen as a collection of similar trajectories. However, man-

ually labelling clusters of trajectories would be difﬁcult and time-

consuming as we lack a good distance metric and prior knowledge

of the number of patterns present. Popular unsupervised clustering

algorithms, such as K-means [MacQueen 1967] and Gaussian Mix-

ture Models (GMMs) [Bishop 2007], require a pre-deﬁned clus-

ter number. Hierarchical Agglomerative Clustering [Kaufman and

Rousseeuw 2005] does not require a predeﬁned cluster number, but

the user must decide when to stop merging, which is similarly prob-

lematic. Spectral-based clustering methods [Shi and Malik 2000]

solve this problem, but require the computation of a similarity matrix whose space complexity is $O(n^2)$ in the number of trajectories.

Too much memory is needed for large datasets and performance de-

grades quickly with increasing matrix size.

An alternative perspective is to treat a trending path as a dis-

tribution over location-orientation pairs (Figure 2). A group of

trajectories connecting points A and B can be represented by a

trending path modeled by Multinomial distributions over location-

orientation pairs. Note in this representation, a trending path is a

ﬂow sub-ﬁeld rather than a group of 2D curves. Although the trajec-

tories are broken into individual location-orientation observations

in this way, we overcome the randomness of a particular trajectory

and represent such a trajectory group as one trending path. Next, we

ﬁnd all trending paths under the assumption that: if a trending path

exists, there should be repeated location-orientation occurrences on

this path. Then the problem is transformed to computing a (poten-

tially inﬁnite) number of Multinomial distributions. We present a

non-parametric hierarchical Bayesian model that can automatically

compute a desirable number of such Multinomial distributions from

the data. Thus, it does not require a pre-deﬁned cluster number and

its space complexity is smaller than $O(n^2)$.
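The location-orientation representation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the cell size, the zero-velocity threshold, the segment duration, the split into four direction bins plus one zero-velocity bin, and all function names are hypothetical choices, not values taken from the paper.

```python
import math
from collections import Counter

# Hypothetical discretization parameters (assumptions, not from the paper).
CELL_SIZE = 1.0        # grid cell edge length
SPEED_EPS = 0.05       # speeds below this count as "zero velocity"
SEGMENT_LENGTH = 10.0  # duration of one data segment


def orientation_bin(vx, vy):
    """Map a velocity to a sub-domain: 0 = zero velocity, 1-4 = right, up, left, down."""
    if math.hypot(vx, vy) < SPEED_EPS:
        return 0
    angle = math.atan2(vy, vx) % (2 * math.pi)
    # Shift by 45 degrees so each bin is centred on an axis direction.
    return 1 + int(((angle + math.pi / 4) % (2 * math.pi)) // (math.pi / 2))


def to_state(x, y, vx, vy):
    """Discretize one observation into a (cell_x, cell_y, orientation) state."""
    return (int(x // CELL_SIZE), int(y // CELL_SIZE), orientation_bin(vx, vy))


def segment_counts(observations):
    """Group (t, x, y, vx, vy) observations into per-segment state counts."""
    segments = {}
    for t, x, y, vx, vy in observations:
        d = int(t // SEGMENT_LENGTH)
        segments.setdefault(d, Counter())[to_state(x, y, vx, vy)] += 1
    return segments
```

Each `Counter` plays the role of a bag of location-orientation occurrences for one data segment, so trajectories are never compared pairwise and no similarity matrix is built.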

Figure 2: Two sets of trajectories (a, c) and their corresponding

trending paths modeled by Multinomials (b, d). Color coding rep-

resents different orientation sub-domains (cf. Figure 1)

We ﬁrst deﬁne the terminologies in Table 1. Our SV-DHDP model

employs a Dual Hierarchical Dirichlet Process, similar to that pre-

sented in [Wang et al. 2009], for pattern analysis, but we combine it

with Stochastic Variational Inference (SVI) for posterior estimation

that results in better performance on large datasets. In a standard

hierarchical Bayesian setting, a tree is constructed in an attempt to

explain a set of observations through a hierarchy of factors. In our

problem, the observations are agent states, which we divide into equal-length data segments along the time domain. Our goal is to find a set of path patterns $\{\beta_k\}$ that, when combined with their respective weights, best describe all the segments in terms of their likelihoods. Such a tree structure is shown in Figure 3. This is a simplified figure of a three-layer Bayesian hierarchy explaining how the observations $w_{dn}$ can be explained by all possible patterns $\beta_k$ with weights $v_k$. For the sake of conciseness, the full detailed version of the model is provided in the supplementary material. The overall goal here is to estimate the $\beta_k$ and $v_k$ given $w_{dn}$, that is, the posterior distribution of this model, $p(\{\beta_k\}, \{v_k\} \mid w_{dn})$. We explain the posterior estimation in Section 4.

Figure 3: SV-DHDP model. DP is Dirichlet Process. $w_{dn}$ is the $n$th agent state in data segment $d$. $K$ is the total number of patterns. $v_k$ is the weight of the $k$th pattern. $\beta_k$ is the $k$th pattern. Arrows indicate dependencies.

Table 1: Terminology and Parameters

Term          Notation   Meaning
Agent State   w          w = {p, v}, where p and v are the position and orientation of an agent
State Space   S          The set of all possible states, S = {w_i}
Path          P          A probability distribution over S, P(s)
Path Pattern  β          A mixture of paths

Figure 4: Illustrative example with 100 data segments, each with 50 accumulated positions: (top left) 10 ground truth path patterns; (right) example data; (bottom left) the top 16 path patterns learned.

3.2 An Illustrative Example

After initial experiments using our model, we ﬁnd that although

many trending paths can be found in a dataset, only a subset of

them are needed to describe a data segment (i.e., a time slice of the

dataset). Furthermore, different subsets of the path patterns exist in

different data segments. We use a simple example to illustrate this

principle.

Consider again the case of a public square, simpliﬁed as a 5×5

grid environment. Imagine that there are only 10 possible paths

that people will take, illustrated as horizontal and vertical bars (Fig-

ure 4 top left). Note that in this simple case each path represents a

single ground truth path pattern, whereas in more complex scenes

such as those presented later in this paper, a particular path usually

co-occurs with different ones. For the sake of clarity, we also only

cluster positions. We synthesize a dataset representing the activ-

ity in the square by randomly combining several ground truth path

patterns and performing random sampling to generate 100 data seg-

ments, each consisting of 50 accumulated positions (e.g., Figure 4

right). Each data segment is a density map of positions (the darker

the cell, the higher the density) and mimics an observation of the

square over some time interval. We can observe the phenomenon

that each segment can be described by a subset of path patterns. Ap-

plying our model, we learn all the latent path patterns from our syn-

thetic dataset and Figure 4 (bottom left) shows the top 16 found. As

we can see, the top 10 match our ground truth patterns. Although

additional patterns are learned, they are less prominent (smaller in-

tensities) and have much smaller weights, so they are ranked lower.
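The synthetic dataset of this example can be approximated with a short Python sketch. The grid size, number of segments and positions per segment follow the text; the number of patterns mixed per segment (`max_active`) and the uniform sampling are assumptions made here, since the text only says that several ground truth patterns are randomly combined.

```python
import random

GRID = 5            # 5x5 grid, as in the example
NUM_SEGMENTS = 100  # number of data segments
NUM_POSITIONS = 50  # accumulated positions per segment


def bar_patterns():
    """Return the 10 ground truth patterns: 5 horizontal and 5 vertical bars."""
    rows = [[(r, c) for c in range(GRID)] for r in range(GRID)]
    cols = [[(r, c) for r in range(GRID)] for c in range(GRID)]
    return rows + cols


def sample_segment(patterns, rng, max_active=3):
    """Sample one density map from a random subset of patterns.

    max_active and the uniform choices are assumptions of this sketch.
    """
    active = rng.sample(patterns, rng.randint(1, max_active))
    density = [[0] * GRID for _ in range(GRID)]
    for _ in range(NUM_POSITIONS):
        r, c = rng.choice(rng.choice(active))  # pick a pattern, then a cell on it
        density[r][c] += 1
    return density


rng = random.Random(0)
patterns = bar_patterns()
segments = [sample_segment(patterns, rng) for _ in range(NUM_SEGMENTS)]
```

Every segment is a 5×5 density map whose mass lies on a few bars only, which is the property the model is expected to recover.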

4 Posterior Estimation and Similarity Metric

As discussed in Section 3.1, the novelty of our SV-DHDP model is

the way we compute the posterior. There are two approaches com-

monly used for this purpose: sampling and variational inference.

Sampling methods provide good model ﬁtness on relatively small

datasets. But the proof of convergence is still open and they have

other shortcomings [Teh et al. 2008]. We therefore use a Stochastic

Variational Inference (SVI) method, which is much faster on large

datasets (such as crowd behaviors observed over time).

For a standard two-layer HDP model, many methods have been de-

veloped [Teh et al. 2008; Hoffman et al. 2013; Wang et al. 2011].

Our SVI technique is similar to that recently proposed in [Hoff-

man et al. 2013], except that their model is a simple two-layer

HDP model whereas ours has an additional DDP layer. This ex-

tension is non-trivial and involves much more than merely adding

one more DP layer to a two-layer HDP model. To our knowledge,

this is the ﬁrst attempt to apply variational inference on this type

of model. Please refer to the supplementary materials for detailed

math derivations and algorithms.

4.1 Model Fitness

By dividing a dataset into training data $C_{\mathrm{train}}$ and a test data segment $C_{\mathrm{test}}$, we can evaluate the model fitness by the predictive likelihood of $C_{\mathrm{test}}$. We further divide $C_{\mathrm{test}}$ into two sets of samples: observed $w^{\mathrm{obs}}_i$, and held-out $w^{\mathrm{ho}}_i$. We also keep the unique state sets of the two sets disjoint. We first use $C_{\mathrm{train}}$ to train our model to compute the approximate posterior, and then use $w^{\mathrm{obs}}_i$ and the approximate posterior to fine tune the top-level path pattern weights. Finally, we compute the log likelihood of $w^{\mathrm{ho}}_i$. This metric gives a good predictive distribution and avoids comparing parameter bounds. Similar metrics are used in [Hoffman et al. 2013; Teh et al. 2008; Wang et al. 2011] for evaluating model fitness. It is computed by:

\begin{align}
p(w^{\mathrm{ho}}_j \mid C_{\mathrm{train}}, w^{\mathrm{obs}}_j)
&= \int\!\!\int \Big( \sum_{k=1}^{K} v_k \, \beta_{k, w^{\mathrm{ho}}_j} \Big) \, p(v \mid C_{\mathrm{train}}, \beta) \, p(\beta \mid C_{\mathrm{train}}) \, dv \, d\beta \nonumber \\
&\approx \int\!\!\int \Big( \sum_{k=1}^{K} v_k \, \beta_{k, w^{\mathrm{ho}}_j} \Big) \, q(v) \, q(\beta) \, dv \, d\beta \nonumber \\
&= \sum_{k=1}^{K} E_q[v_k] \, E_q[\beta_{k, w^{\mathrm{ho}}_j}] \tag{1}
\end{align}

where $K$ is the truncation number at the top level and $\beta_{k, w^{\mathrm{ho}}_j}$ is the probability of the state $w^{\mathrm{ho}}_j$ in path pattern $\beta_k$. $q$ is the variational distribution. For a testing data segment, the per-state log likelihood $\prod_j p(w^{\mathrm{ho}}_j \mid C_{\mathrm{train}}, w^{\mathrm{obs}}_j)$ is computed. When training the model, we plot the per-state log likelihood and stop the optimization when it becomes stable.
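As a rough illustration of the last line of Equation 1, the sketch below evaluates the predictive likelihood from the variational expectations. It assumes the expectations are already available as plain Python values (a list of expected weights and a list of dicts mapping states to expected probabilities); the function names, the probability floor, and the reading of "per-state" as an average of log values are assumptions of this sketch, not details from the paper.

```python
import math


def predictive_likelihood(expected_weights, expected_patterns, state):
    """Last line of Equation 1: sum over k of E_q[v_k] * E_q[beta_{k,state}]."""
    return sum(v * beta.get(state, 0.0)
               for v, beta in zip(expected_weights, expected_patterns))


def per_state_log_likelihood(expected_weights, expected_patterns, held_out,
                             floor=1e-12):
    """Average log predictive likelihood over held-out states.

    The floor guards against log(0) for states no pattern explains
    (an assumption of this sketch).
    """
    logs = [math.log(max(predictive_likelihood(expected_weights,
                                               expected_patterns, s), floor))
            for s in held_out]
    return sum(logs) / len(logs)
```

For example, with weights `[0.75, 0.25]` and patterns `[{"a": 0.5, "b": 0.5}, {"a": 1.0}]`, state `"a"` receives probability 0.75 × 0.5 + 0.25 × 1.0 = 0.625.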

4.2 Inference Based Similarity Metric

In addition to extracting path patterns, we would also like to pro-

pose a metric for measuring similarities between datasets, so that

a quantitative similarity can be computed for simulation vs. simulation,

real data vs. simulation or even real data vs. real data.

Since our model can compute path patterns for two datasets, a

naive approach is to use some commonly used metric such as KL-

divergence or even plain Euclidean distance between pairs of pat-

terns. However, we can easily end up with two sets of different

sizes. And comparing two sets of probabilistic distributions is not

a well-deﬁned problem. Another seemingly good idea would be to

only compare the top n patterns from both pattern sets. However, it

is unfair because the patterns are weighted differently within their

sets. And the choice of n is unclear. A more elegant metric is

needed to compare two datasets.

We know that to evaluate model ﬁtness on dataset A, we would use

a test data segment from A. This model ﬁtness also implies that

if dataset B has similar path patterns to A, then the data from B

should also give a good likelihood in Equation 1. In this way, we

can compute per-state predictive likelihood of B given A:

$$\mathrm{lik}(B \mid A) = p(w^{\mathrm{ho}} \mid A, w^{\mathrm{obs}}). \tag{2}$$

Here we replace $C_{\mathrm{train}}$ in Equation 1 with $A$. Both the observed

data $w^{\mathrm{obs}}$ and the held-out data $w^{\mathrm{ho}}$ are from $B$ instead of $A$. This

metric resolves the two concerns mentioned above.
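A minimal sketch of how Equation 2 could be evaluated is given below. It is not the authors' procedure: the variational fine-tuning of the top-level weights is replaced here by a plain EM re-estimation of mixture weights over fixed patterns, and the patterns learned from A are assumed to be available as dicts mapping states to probabilities. All names and the probability floor are hypothetical.

```python
import math


def fit_weights(patterns, observed, iterations=50):
    """Re-estimate mixture weights over fixed patterns with plain EM.

    This stands in for the variational fine-tuning step (an assumption).
    """
    k = len(patterns)
    weights = [1.0 / k] * k
    for _ in range(iterations):
        totals = [0.0] * k
        for state in observed:
            joint = [w * p.get(state, 0.0) for w, p in zip(weights, patterns)]
            norm = sum(joint)
            if norm == 0.0:
                continue  # state explained by no pattern; skip it
            for i, j in enumerate(joint):
                totals[i] += j / norm  # responsibility of pattern i
        used = sum(totals)
        if used == 0.0:
            break
        weights = [t / used for t in totals]
    return weights


def lik(patterns_a, observed_b, held_out_b, floor=1e-12):
    """Per-state log likelihood of held-out data from B under patterns from A."""
    weights = fit_weights(patterns_a, observed_b)
    logs = []
    for state in held_out_b:
        p = sum(w * beta.get(state, 0.0) for w, beta in zip(weights, patterns_a))
        logs.append(math.log(max(p, floor)))
    return sum(logs) / len(logs)
```

Because the score is a per-state average, it does not depend on how many patterns either dataset produces, and every pattern of A contributes according to its re-estimated weight rather than through an arbitrary top-n cut.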

In addition, since our patterns are Multinomials, it is always possible to perform pairwise comparisons such as KL-divergence and Root Mean Squared Error if needed.
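For reference, both pairwise measures can be written in a few lines of Python. The sketch assumes each pattern is a dict mapping states to probabilities; the `eps` guard for states missing from the second pattern is an assumption added here to keep the divergence finite.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two Multinomials given as state -> probability dicts."""
    total = 0.0
    for state, ps in p.items():
        if ps > 0.0:
            total += ps * math.log(ps / max(q.get(state, 0.0), eps))
    return total


def rmse(p, q):
    """Root Mean Squared Error over the union of states of both patterns."""
    states = set(p) | set(q)
    squared = sum((p.get(s, 0.0) - q.get(s, 0.0)) ** 2 for s in states)
    return math.sqrt(squared / len(states))
```

Note that KL-divergence is asymmetric, so the order of the reference pattern and the compared pattern matters.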

5 Path Pattern Abstraction

To show the generality and robustness of our method, we apply it to both simulated datasets and real data from various scenarios with different features and different noise levels. We also compare our method with existing approaches and discuss performance. All our patterns are color-coded in the various figures, with different colors representing orientations as in Figure 1 and color intensities showing probabilities.

5.1 Simulation Datasets

Real data exhibits both global and local features, caused by the

fact that pedestrians tend to plan their paths through an environ-

ment based on external factors such as entrances, exits and personal

goals, but they are often deﬂected from their paths due to the neces-

sity to avoid members of a crowd. In simulations, different types

of simulation algorithms are used to model local steering behaviors

and global path planning strategies. We explore the effects of these

algorithms separately by ﬁrst varying local steering methods while

minimizing the impact of the global path planning. Then we ﬁx the

steering behavior and vary the global path planning strategies.

5.1.1 Local Steering

We choose four steering algorithms that are representative of commonly used methods: MOU09 [Moussaïd et al. 2009] is a recent version of Helbing’s social force model; PETT09 [Pettré et al. 2009] is a velocity-obstacle-based approach similar to RVO; ONDREJ10 [Ondřej et al. 2010] uses bearing angle to avoid collisions; and PARIS07 [Paris et al. 2007] solves steering in velocity

space. Many other methods exist, such as potential ﬁelds, ﬂuid

based, hybrids, foot-step planning, but our goal is not to analyze ev-

ery possible approach, but to demonstrate how our method can cap-

ture the differences produced by different reactive steering meth-

ods.

We set up a bi-directional ﬂow experiment to show our analysis for

local steering behaviors. Two rectangular areas are placed at the

top and bottom of a scene (Figure 5) and two groups of agents are

created. For one group, agents are randomly generated in one area

with randomly selected destinations in the other area, thus avoiding

any complex global path planning. For the other agent group, we

switch the generation and destination areas. This forces both agent

groups to use steering behaviors in order to avoid the others. Each

simulation lasts for around 25 minutes and involves 20000 agents.

Figure 5: Top path patterns from the data created by four repre-

sentative local steering algorithms

Snapshots of the simulation data can be found in the supplemental

material. Figure 5 shows the top path patterns computed. Intu-

itively, we can see that PARIS07 does not give prominent patterns

meaning the crowd is spreading out all the time. ONDREJ10 tends

to give stable ﬂows compared to other methods. And PETT09 and

MOU09 are in the middle because their patterns are slightly more

concentrated than PARIS07, but less so than ONDREJ10. PARIS07

looks more similar to PETT09 and MOU09 than ONDREJ10. This

visualization thus facilitates a qualitative understanding of the behaviours generated using different local steering mechanisms. Later we will see how we can also quantitatively compare them with each other.

Figure 6: Trajectories created by three global path planning algorithms: Navmesh, Roadmap and PoField

Figure 7: Top path patterns from the data created by three global path planning algorithms: Navmesh, Roadmap and PoField

5.1.2 Global Path Planning

In this experiment, we ﬁx our local steering model [van den Berg

et al. 2011] and vary the global path planning methods to test our

analysis model. The environment is a square with several obstacles

in the middle. We set up a generation area at the top and desti-

nation area at the bottom. Also, we recycle 64 agents repeatedly to generate 400 seconds of data. Three global path planning methods, Navigation mesh [Snook 2000], Roadmap [Latombe 1991] and Potential Field [Khatib 1985], are used here, referred to as Navmesh, Roadmap and PoField. Trajectories are generated using Menge [Curtis et al. 2014] (Figure 6) and the top patterns found are shown in Figure 7.

The top patterns of all three methods are down-going ﬂows as ex-

pected but they spread out within the environment in slightly dif-

ferent ways. In addition to these high probability patterns, other

patterns are also learned and we do ﬁnd other colors, albeit with

much smaller weights. These patterns occur when agents get com-

pletely blocked so they start to walk in other directions to ﬁnd their

way out.

5.2 Real Datasets

In addition to simulation data, we also show experimental results

computed on two real datasets. The first dataset is a 6-minute video clip of 967 pedestrians in a park, recorded by a mid-distance camera. We manually annotate the trajectories so that we have

relatively complete trajectories with very little noise.

The second dataset consists of 19999 tracklets recorded in New

York Grand Central Terminal by a far distance camera [Zhong

et al. 2015] (downloaded from http://www.ee.cuhk.edu.hk/ xg-

wang/grandcentral.html). The trajectories are computed based on

moving pixels and contain only partial and noisy tracklets, thus

demonstrating the robustness of our method.

Figure 8: Park dataset: (a) Projected trajectories, (b) Annotated

trajectories overlaid on a frame from the video. Red dots are cam-

eras.

5.2.1 Park

All trajectories and some data segments are shown in Figure 8. To

train the model, the truncation numbers for J, I, L and K (param-

eters explained in the supplementary material) are set to 10, 15, 4

and 20 respectively. The training took 0.58 hours and Figure 9 Ref-

erence shows some high-weight patterns. There are several major

ﬂows learned from the data. One is the ﬂow going from 3 to 2 (Ref-

erences b and j). They mainly differ in whether they go through the

narrow corridor along the bottom or not. Another up-going ﬂow is

Reference f from 3 to 1. The major down-going ﬂows are from 2 to

3 (magenta and yellow). These paths are also observed in the data.

5.2.2 Train Station

The whole area is a square with each dimension approximately 50m

long. We discretize the domain into 1×1 meter grids and set J, I,

L and K (parameters explained in the supplementary material) to

10, 15, 3 and 20. The training took 1.83 hours. Some patterns are

shown in Figure 1.

In Figure 1, (e) is the major up-going flow and (d) is the major down-going flow. Both are observed in the data. The right-going flow shown in Figure 1(c) is another major flow observed in the data. Interestingly, the left-going flow pattern (yellow) is not very prominent. After looking at the data, we found that since it shares boundaries with green and magenta, some of the left-going flows are captured in Figure 1(d) and (e) instead.

5.3 Comparison with previous approaches

We empirically compare our SVI method with Gibbs sampling in

[Wang et al. 2009] on the train station dataset. Due to the nature

and stochasticity of these two methods, it is hard to compare them

in one standard setting. So we run our method until it converges,

then run Gibbs sampling for various times to compare the results.

We ﬁrst run the sampling for 1.95 hours and show the results in

Figure 10. We made our best effort to ﬁnd informative and sim-

ilar patterns in the results. Compared to the patterns in Figure 1,

we cannot find any patterns that are as informative. Another interesting difference is that patterns shown in Figure 10 are in general

more concentrated into individual grids (reﬂected by their intensi-

ties compared to the ones in Figure 1), and do not fully cover the

areas of the paths. We believe this is due to every state sample be-

ing given only one pattern label in sampling while in SVI each state

sample has a distribution over all patterns. Also, after 1.95 hours, a total of 198 patterns are learned, and the number continues to rise, reaching 735 after 40 hours, which clearly shows that the sampler has not yet converged. More patterns are available in the supplementary

material.

We also compared the performance of SVI with sampling. SVI

is faster mainly because in every iteration, it uses a batch number

that is usually much smaller than the number of data segments. In contrast, sampling uses all of them. Figure 11 shows how quickly our SV-DHDP converges. We show plots for two examples. For both synthetic and real data, our model converges within 20 to 60 iterations.

Figure 9: Top 3 patterns for the Park dataset that cover more than 90% of weights. Each pattern is shown for 4 directions in a group (4 rows; Blue is omitted because no significant pattern was found for that direction). Column 1: Top patterns from real dataset. Columns 2-5: Top patterns from four simulated datasets. Similarity scores using the real data as reference are shown in the brackets next to the name of the method. They are log likelihoods; the larger (closer to 0), the better. At the bottom of each group, weights for corresponding patterns are given. The percentages are computed by KL-divergence between a reference pattern and a simulation pattern, then normalized to 0-100. Intensity represents probability: the higher the intensity, the higher the probability.

Figure 10: The top pattern in different velocity domains learned by sampling. More patterns are available in the supplementary material.

Figure 11: Model fitness plot with iterations. a: Train Station. b: ONDREJ10 in Section 5.1.1.
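The stopping rule used during training (stop when the per-state log likelihood becomes stable, Section 4.1) can be approximated by a simple check on the most recent fitness values. The window length and tolerance below are hypothetical; the paper judges stability from the plot and does not state numeric thresholds.

```python
def has_converged(history, window=5, tolerance=1e-3):
    """Return True when the last `window` fitness values span less than `tolerance`.

    window and tolerance are assumptions of this sketch.
    """
    if len(history) < window:
        return False
    recent = history[-window:]
    return max(recent) - min(recent) < tolerance
```

A training loop would append the per-state log likelihood after each iteration and stop as soon as `has_converged(history)` returns `True`.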

6 Similarity Analysis

In this section, we show how our similarity metric can be used

to provide meaningful comparisons between real reference data and

simulation data.

We used the four models in Section 5.1.1 in combination with

one global path planning method [2004] to simulate the crowds

in the park and the train station. We modelled the environ-

ments by observing the videos carefully, then randomly generating

agents within the entrance areas and randomly selecting destina-

tions within the exit areas. All similarities are computed using the

real dataset as the reference. Snapshots of data segments for both

experiments are shown in the supplementary material. The simi-

larity scores for the park and train station simulations are shown in

Figure 9 and Figure 12. Some top patterns are shown in Figure 9

columns 2-5 and Figure 12 columns 2-5.

First we emphasize that the similarities presented here are not de-

signed to provide any kind of conclusive statement of which simu-

lator is the best. Path patterns are affected by many factors and we

did not exhaustively try all combinations of all parameter settings.

For instance, it is difﬁcult to accurately calibrate parameters includ-

ing accurate entrances and exits, timing of arrival, the proportions

of population in different ﬂows and so on. After ﬁrst looking at the

computed patterns and scores we adjusted the entrances and exits

more carefully to ensure the best performance possible for all algorithms, and we speculate that the simulations could be improved even further by adjusting timing and population density. This also demonstrates how our metric can help to design simulations, because we can identify the key elements to adjust by looking at the visual patterns.

To make good use of our metric for simulation, we suggest two ways to interpret the patterns: using Equation 2, and computing the KL-divergence between pairs of patterns to help interpret the visual data.

Figure 12: Train Station patterns for real and simulated datasets. The layout and scores are computed in the same way as in Figure 9.

Equation 2 shows the average likelihood of the testing data. There are several major factors affecting the score. Firstly, the global path planning has a great influence. One example is Figure 12

(a):Reference, where a wide flow can be seen going from the bottom to the right. Among the simulations, only PETT09 roughly captures it, which contributes to its score. In addition, the relative numbers of agents on each path pattern also influence the similarity. Figure 12 (a):PARIS07 has several flows that are not seen in the real

data pattern. After watching the video, we found that there are only

a few people walking on these paths but the simulation assigned a

large number of agents to them, thus contributing to the low score of

PARIS07. Next, in Figure 12, all simulations other than PARIS07

tend to form narrower paths than the real data. In the park simulation, by contrast, some paths are wider than in the real data, such as Figure 9 (b):ONDREJ10; some are about the same width, such as Figure 9 (f):MOU09; and some are too narrow, such as Figure 9 (b):MOU09. The path width is affected by the simulation method itself as well as by the number of agents on that path. Finally, when it comes down to a single path, some models

tend to form prominent patterns more than others, as seen in the

bi-directional ﬂow example. This also contributes to the scores.

In addition, the weights are used for two purposes: analysis and

comparison. Within a single dataset, the weights reﬂect the rela-

tive likelihoods of each path pattern. For instance, the likelihood of

observing an agent on Figure 9(a):Reference is more than twice that on Figure 9(e):Reference, as indicated by their respective weights v0 = 0.57 and v1 = 0.21. For comparison, the weights are also taken into account by Equation 2.
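To make the role of the weights concrete, the following sketch treats a dataset as a weighted mixture of patterns and computes the average log-likelihood of a set of observed states. It is only an illustration and not a transcription of Equation 2: the function name `average_log_likelihood`, the representation of each pattern as a discrete distribution over state indices, and the small floor value used to avoid log(0) are all assumptions.

```python
import math
from typing import Sequence


def average_log_likelihood(weights: Sequence[float],
                           patterns: Sequence[Sequence[float]],
                           observations: Sequence[int],
                           floor: float = 1e-12) -> float:
    """Average log-likelihood of observed state indices under a pattern mixture.

    weights      -- one weight per pattern, assumed to sum to 1
    patterns     -- one discrete distribution over states per pattern
    observations -- indices of the observed (discretized) states
    floor        -- assumption: lower bound on probabilities to avoid log(0)
    """
    if not observations:
        raise ValueError("at least one observation is required")
    total = 0.0
    for state in observations:
        # Mixture probability: patterns with larger weights contribute more.
        p = sum(w * pattern[state] for w, pattern in zip(weights, patterns))
        total += math.log(max(p, floor))
    return total / len(observations)
```

With the weights quoted above, the ratio 0.57 / 0.21 is roughly 2.7, which is why an agent is more than twice as likely to be observed on the first pattern. In a mixture of this kind, a simulation that places many agents on a low-weight pattern is penalized even if the pattern itself is well formed.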

Aside from Equation 2, the user might simply want to focus on the similarity of particular patterns. This can be computed as the KL-divergence between pairs of patterns. In both Figure 9 and Figure 12, each pattern is given a score comparing it with the corresponding pattern

in the reference. We normalize the values to 0-100, where bigger is

better. The results of this metric may sometimes seem to contradict the previous one because the focus is different. For instance, in Figure 9, ONDREJ10 has the lowest similarity score, but under the KL-divergence similarity its first three patterns outperform those of the other datasets. This means that ONDREJ10 reproduces some major flows of the reference data faithfully, but does not do well at capturing the other sub-dominant flows. If the user just wants to reproduce the major flows, ONDREJ10 is therefore a good option in this case. PETT09 and MOU09 also capture the flows in the second group well, so they might be the choice if those flows are to be reproduced. For the KL-divergence metric, the weights are less meaningful because it can be applied to any pair of patterns from different datasets, depending on the application.
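The pairwise comparison can be sketched as follows. The code treats each pattern as a histogram over discretized states, computes the KL-divergence from the reference pattern to each candidate, and rescales the results to 0-100 so that bigger is better. The direction of the divergence, the smoothing constant `eps`, and the min-max rescaling are assumptions made for this illustration; the exact normalization behind the scores in Figure 9 and Figure 12 is not restated here, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def normalize(hist: Sequence[float], eps: float = 1e-12) -> List[float]:
    """Smooth a histogram with eps and rescale it to sum to 1."""
    smoothed = [h + eps for h in hist]
    total = sum(smoothed)
    return [h / total for h in smoothed]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for two histograms defined over the same states."""
    p_n, q_n = normalize(p), normalize(q)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p_n, q_n))


def similarity_scores(reference: Sequence[float],
                      candidates: Sequence[Sequence[float]]) -> List[float]:
    """Map KL-divergences to 0-100, where bigger is better.

    Assumption: min-max rescaling over the candidates, so the closest
    candidate scores 100 and the farthest scores 0.
    """
    kls = [kl_divergence(reference, c) for c in candidates]
    worst, best = max(kls), min(kls)
    if worst == best:
        return [100.0 for _ in kls]
    return [100.0 * (worst - k) / (worst - best) for k in kls]
```

Because the score depends only on the two patterns being compared, it can be applied to any pair of patterns, which is why the pattern weights play no role here.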

Overall, the two metrics here focus on different aspects of the data.

The similarity score gives overall performance, measured as the per-state likelihood. The KL-divergence similarity places more emphasis on visual similarity. Together, they provide richer information

for different use cases.

7 Conclusions and Future Work

We propose a new perspective for comparing crowd data. We

present a non-parametric hierarchical Bayesian model to automatically extract a desirable set of patterns. We also propose a similarity metric for comparison.

Our metric is environment-based. The main reason is that the reference data is almost always affected by the environment in real-world applications. When only local flow patterns are needed, our metric still works well, as shown in the bidirectional flow example. A global shift of flows will give low scores, but we argue that the flows can always be aligned first if a rotation/translation-invariant comparison is needed.

Our method has some inherent limitations. Firstly, it does not directly measure individual trajectories and thus does not reflect individual visual similarities. Our patterns reflect information at a higher level than individual trajectories. Secondly, it does

not capture temporal information such as changes of patterns over

time. Lastly, our truncation-based stochastic variational inference is sensitive to initialization, even though the stochasticity in the gradient helps to some extent. In our experiments, we used grid search to find good initializations.

One future direction will be an extension of the current model into

a dynamic model. Currently, all data are considered at once. But

in real situations, the path patterns and their respective weights

can change over time. To capture this effect, a dynamic model

is needed. Another direction is to introduce pattern merge and delete operations during optimization to find better solutions. To use our metric to guide simulation more automatically, we could use patterns as guiding flows for crowd simulation and improve the scores with methods such as [Berseth et al. 2014]. Currently, we only capture flows, although individual trajectories may also influence perceptual realism. A good direction is to try to capture information on both

levels. Finally, we would like to add social activity and environmental information, such as talking and pouring a cup of coffee, so that the model becomes a behavior pattern model. We believe this will further help in simulating realistic crowds with diverse behaviors.

References

ALI, S., AND SHAH, M. 2007. A Lagrangian Particle Dynamics Approach for Crowd Flow Segmentation and Stability Analysis. In IEEE Conference on Computer Vision and Pattern Recognition, 2007. CVPR ’07, 1–6.

BERSETH, G., KAPADIA, M., HAWORTH, B., AND FALOUTSOS, P. 2014. SteerFit: Automated Parameter Fitting for Steering Algorithms. In Proceedings of the ACM SIGGRAPH/Eurographics Symposium on Computer Animation, Eurographics Association, Aire-la-Ville, Switzerland, SCA ’14, 113–122.

BISHOP, C. 2007. Pattern Recognition and Machine Learning.

Springer, New York.

BLEI, D. M., NG, A. Y., AND JORDAN, M. I. 2003. Latent Dirichlet Allocation. J. Mach. Learn. Res. 3, 993–1022.

CHARALAMBOUS, P., KARAMOUZAS, I., GUY, S. J., AND CHRYSANTHOU, Y. 2014. A Data-Driven Framework for Visual Crowd Analysis. Comp. Graph. Forum 33, 7, 41–50.

CURTIS, S., BEST, A., AND MANOCHA, D. 2014. Menge: A modular framework for simulating crowd movement. University of North Carolina at Chapel Hill, Tech. Rep.

ENNIS, C., PETERS, C., AND O’SULLIVAN, C. 2011. Perceptual Effects of Scene Context and Viewpoint for Virtual Pedestrian Crowds. ACM Trans. Appl. Percept. 8, 2, 10:1–10:22.

FEI-FEI, L., AND PERONA, P. 2005. A Bayesian hierarchical model for learning natural scene categories. In IEEE CVPR 2005, 524–531.

FUNGE, J., TU, X., AND TERZOPOULOS, D. 1999. Cognitive Modeling: Knowledge, Reasoning and Planning for Intelligent Characters. In SIGGRAPH’99, ACM Press/Addison-Wesley Publishing Co., 29–38.

GOLAS, A., NARAIN, R., AND LIN, M. 2013. Hybrid Long-range Collision Avoidance for Crowd Simulation. In I3D 2013, 29–36.

GUY, S. J., KIM, S., LIN, M. C., AND MANOCHA, D. 2011. Simulating Heterogeneous Crowd Behaviors Using Personality Trait Theory. In SCA 2011, 43–52.

GUY, S. J., VAN DEN BERG, J., LIU, W., LAU, R., LIN, M. C., AND MANOCHA, D. 2012. A Statistical Similarity Measure for Aggregate Crowd Dynamics. ACM Trans. Graph. 31, 6, 190:1–190:11.

HELBING, D., AND MOLNÁR, P. 1995. Social force model for pedestrian dynamics. Phys. Rev. E 51, 5, 4282–4286.

HOFFMAN, M. D., BLEI, D. M., WANG, C., AND PAISLEY, J. 2013. Stochastic Variational Inference. J. Mach. Learn. Res. 14, 1, 1303–1347.

IKEDA, T., CHIGODO, Y., REA, D., ZANLUNGO, F., SHIOMI, M., AND KANDA, T. 2013. Modeling and prediction of pedestrian behavior based on the sub-goal concept. Robotics, 137.

JU, E., CHOI, M. G., PARK, M., LEE, J., LEE, K. H., AND TAKAHASHI, S. 2010. Morphable Crowds. ACM Trans. Graph. 29, 6, 140:1–140:10.

KAPADIA, M., WANG, M., SINGH, S., REINMAN, G., AND FALOUTSOS, P. 2011. Scenario Space: Characterizing Coverage, Quality, and Failure of Steering Algorithms. In Proceedings of the 2011 ACM SIGGRAPH/Eurographics Symposium on Computer Animation, ACM, New York, NY, USA, SCA ’11, 53–62.

KARAMOUZAS, I., HEIL, P., BEEK, P. V., AND OVERMARS, M. H. 2009. A Predictive Collision Avoidance Model for Pedestrian Simulation. In Motion in Games, 41–52.

KAUFMAN, L., AND ROUSSEEUW, P. J. 2005. Finding Groups in Data: An Introduction to Cluster Analysis. Wiley-Interscience.

KHATIB, O. 1985. Real-time obstacle avoidance for manipulators

and mobile robots. In 1985 IEEE International Conference on

Robotics and Automation. Proceedings, vol. 2, 500–505.

KIM, S., GUY, S. J., MANOCHA, D., AND LIN, M. C. 2012. Interactive Simulation of Dynamic Crowd Behaviors Using General Adaptation Syndrome Theory. In I3D 2012, 55–62.

KIM, S., GUY, S. J., AND MANOCHA, D. 2013. Velocity-based Modeling of Physical Interactions in Multi-agent Simulations. In SCA 2013, 125–133.

LAMARCHE, F., AND DONIKIAN, S. 2004. Crowd of virtual humans: a new approach for real time navigation in complex and structured environments. Comp. Graph. Forum 23, 3, 509–518.

LATOMBE, J.-C. 1991. Robot Motion Planning. Kluwer Academic Publishers, Norwell, MA, USA.

LEE, K. H., CHOI, M. G., HONG, Q., AND LEE, J. 2007. Group Behavior from Video: A Data-driven Approach to Crowd Simulation. In SCA 2007, Eurographics Association, 109–118.

LEMERCIER, S., JELIC, A., KULPA, R., HUA, J., FEHRENBACH, J., DEGOND, P., APPERT-ROLLAND, C., DONIKIAN, S., AND PETTRÉ, J. 2012. Realistic Following Behaviors for Crowd Simulation. Comp. Graph. Forum 31, 2, 489–498.

LERNER, A., CHRYSANTHOU, Y., SHAMIR, A., AND COHEN-OR, D. 2009. Data Driven Evaluation of Crowds. In Motion in Games, 75–83.

LERNER, A., FITUSI, E., CHRYSANTHOU, Y., AND COHEN-OR, D. 2009. Fitting Behaviors to Pedestrian Simulations. In SCA 2009, 199–208.

MACQUEEN, J. 1967. Some methods for classification and analysis of multivariate observations. In Berkeley Symp. on Math. Statist. and Prob., 281–297.

MCDONNELL, R., LARKIN, M., DOBBYN, S., COLLINS, S., AND O’SULLIVAN, C. 2008. Clone Attack! Perception of Crowd Variety. ACM Trans. Graph. 27, 3, 26:1–26:8.

MOUSSAÏD, M., HELBING, D., GARNIER, S., JOHANSSON, A., COMBE, M., AND THERAULAZ, G. 2009. Experimental study of the behavioural mechanisms underlying self-organization in human crowds. Proc. Biol. Sci. 276, 1668, 2755–2762.

MUSSE, S. R., CASSOL, V. J., AND JUNG, C. R. 2012. Towards a Quantitative Approach for Comparing Crowds. Comp. Anim. Virt. Worlds 23, 1, 49–57.

NARAIN, R., GOLAS, A., CURTIS, S., AND LIN, M. C. 2009. Aggregate Dynamics for Dense Crowd Simulation. ACM Trans. Graph. 28, 5, 122:1–122:8.

NIEBLES, J. C., WANG, H., AND FEI-FEI, L. 2008. Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words. Int. J. Comp. Vision 79, 3, 299–318.

ONDŘEJ, J., PETTRÉ, J., OLIVIER, A.-H., AND DONIKIAN, S. 2010. A Synthetic-vision Based Steering Approach for Crowd Simulation. ACM Trans. Graph. 29, 4, 123:1–123:9.

PARIS, S., PETTRÉ, J., AND DONIKIAN, S. 2007. Pedestrian Reactive Navigation for Crowd Simulation: a Predictive Approach. Comp. Graph. Forum 26, 3, 665–674.

PETTRÉ, J., ONDŘEJ, J., OLIVIER, A.-H., CRETUAL, A., AND DONIKIAN, S. 2009. Experiment-based Modeling, Simulation and Validation of Interactions Between Virtual Walkers. In SCA 2009, ACM, 189–198.

SHI, J., AND MALIK, J. 2000. Normalized Cuts and Image Segmentation. IEEE Trans. Patt. Anal. Mach. Intell. 22, 8, 888–905.

SINGH, S., KAPADIA, M., FALOUTSOS, P., AND REINMAN, G. 2009. SteerBench: a benchmark suite for evaluating steering behaviors. Comp. Anim. Virt. Worlds 20, 5-6, 533–548.

SIVIC, J., RUSSELL, B. C., EFROS, A. A., ZISSERMAN, A., AND FREEMAN, W. T. 2005. Discovering object categories in image collections. ICCV 2005.

SNOOK, G. 2000. Simplified 3D Movement and Pathfinding Using Navigation Meshes. In Game Programming Gems, M. DeLoura, Ed. Charles River Media, 288–304.

SUDDERTH, E. B., TORRALBA, A., FREEMAN, W. T., AND WILLSKY, A. S. 2007. Describing Visual Scenes Using Transformed Objects and Parts. Int. J. Comp. Vision 77, 1-3, 291–330.

TEH, Y. W., JORDAN, M. I., BEAL, M. J., AND BLEI, D. M. 2006. Hierarchical Dirichlet Processes. J. Am. Stat. Assoc. 101, 476, 1566–1581.

TEH, Y. W., KURIHARA, K., AND WELLING, M. 2008. Collapsed Variational Inference for HDP. In NIPS 2008.

TREUILLE, A., COOPER, S., AND POPOVIĆ, Z. 2006. Continuum Crowds. ACM Trans. Graph. 25, 3, 1160–1168.

VAN DEN BERG, J., LIN, M., AND MANOCHA, D. 2008. Reciprocal Velocity Obstacles for real-time multi-agent navigation. In IEEE International Conference on Robotics and Automation, 2008. ICRA 2008, 1928–1935.

VAN DEN BERG, J., GUY, S. J., LIN, M., AND MANOCHA, D. 2011. Reciprocal n-Body Collision Avoidance. In Robotics Research, C. Pradalier, R. Siegwart, and G. Hirzinger, Eds., no. 70 in Springer Tracts in Advanced Robotics. Springer Berlin Heidelberg, 3–19. DOI: 10.1007/978-3-642-19457-3_1.

WANG, X., MA, X., AND GRIMSON, W. 2009. Unsupervised Activity Perception in Crowded and Complicated Scenes Using Hierarchical Bayesian Models. IEEE Trans. Patt. Anal. Mach. Intell. 31, 3, 539–555.

WANG, C., PAISLEY, J., AND BLEI, D. M. 2011. Online variational inference for the hierarchical Dirichlet process. In AISTATS.

WOLINSKI, D., GUY, S. J., OLIVIER, A.-H., LIN, M. C., MANOCHA, D., AND PETTRÉ, J. 2014. Parameter estimation and comparative evaluation of crowd simulations. Comp. Graph. Forum 33, 2, 303–312.

ZHONG, J., CAI, W., LUO, L., AND YIN, H. 2015. Learning Behavior Patterns from Video: A Data-driven Framework for Agent-based Crowd Modeling. In Autonomous Agents and Multiagent Systems, 801–809.

ZHOU, B., WANG, X., AND TANG, X. 2011. Random field topic model for semantic region analysis in crowded scenes from tracklets. In IEEE CVPR 2011, 3441–3448.

ZHOU, B., WANG, X., AND TANG, X. 2012. Understanding collective crowd behaviors: Learning a mixture model of dynamic pedestrian-agents. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, IEEE, 2871–2878.