# supplement

Supplementary Material of Trending Paths: A New
Semantic-level Metric for Comparing Simulated and
Real Crowd Data
He Wang, Jan Ondřej and Carol O’Sullivan
| Terms | Notation | Meaning |
|---|---|---|
| Agent State | $w$ | $w = \{p, v\}$, where $p$ and $v$ are the position and orientation of an agent |
| Data Segments | $d$ | Trajectory and velocity data of the crowd, $d = \{w_i\}$ |
| Path Pattern | $\beta$ | A mixture of paths |
| DP Atoms | $h_{li}$, $c_{dj}$ | |
| DP Weight Parameters | $v_k$, $g_l$, $\epsilon_{li}$, $\pi_{dj}$ | |
| Dirichlet Parameter | $\eta$ | |
| Beta Parameter | $a$, $b$, $\omega$, $\alpha$ | |
| Component Indices | $j$, $i$, $l$, $k$ | Their totals are $J$, $I$, $L$, $K$ |

TABLE I
TERMINOLOGY AND PARAMETERS
I. SV-DHDP MODEL
We first briefly review Dirichlet Processes (DPs) and Dependent Dirichlet Processes (DDPs). A DP can be seen as a probability distribution over distributions, which means any draw from a DP is itself a probability distribution. In the stick-breaking representation [1] of a DP:

$$G = \sum_{k=1}^{\infty} \sigma_k(v)\,\beta_k, \quad \sigma_k(v) = v_k \prod_{j=1}^{k-1} (1 - v_j), \quad \sum_{k=1}^{\infty} \sigma_k(v) = 1, \quad \beta_k \sim H(\eta).$$

The $\beta_k$ are DP atoms drawn from some base distribution $H$. In our problem, they are Multinomials drawn from a Dirichlet($\eta$). The $\sigma_k(v)$ are called stick proportions or DP weights because the construction mimics breaking a stick iteratively in the following way. Assuming the length of the stick is 1, in each iteration a proportion $v_k$ of what is left of the stick, $\prod_{j=1}^{k-1} (1 - v_j)$, is broken away. Each $v_k$ is Beta-distributed.
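The stick-breaking construction can be illustrated with a minimal sketch. It assumes a finite truncation (the last proportion is set to 1 so the weights sum to 1, a common convention that is not stated in the text) and $v_k \sim \mathrm{Beta}(1, \omega)$; the function name `stick_breaking_weights` and the parameter values are hypothetical.

```python
import random

def stick_breaking_weights(omega, truncation, rng):
    """Return truncated DP weights sigma_k(v) with v_k ~ Beta(1, omega)."""
    weights = []
    remaining = 1.0  # length of the stick that has not been broken away yet
    for k in range(truncation):
        # Assumption: the last proportion is fixed to 1 so the weights sum to 1.
        v_k = 1.0 if k == truncation - 1 else rng.betavariate(1.0, omega)
        weights.append(v_k * remaining)  # sigma_k = v_k * prod_{j<k} (1 - v_j)
        remaining *= 1.0 - v_k
    return weights

weights = stick_breaking_weights(omega=2.0, truncation=10, rng=random.Random(0))
```

A larger `omega` tends to produce smaller proportions $v_k$, so the mass is spread over more atoms.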
A DDP [2] generalizes the concept of a DP by replacing its weights and atoms with stochastic processes. In our context, a DDP can be represented as $G_l = \sum_{i=1}^{\infty} \sigma_i(v)\, G_{li}$, where everything is the same as in the DP representation except that the atoms are now $\{G_{li}\}$. Each $G_{li}$ is itself a DP and all $\{G_{li}\}$ are draws from the same base DP. Both DPs and DDPs are ideal priors for modeling infinite clusters.
H. Wang is with the University of Leeds, Leeds, UK. E-mail: h.e.wang@leeds.ac.uk. ORCID: orcid.org/0000-0002-2281-5679
J. Ondřej and C. O’Sullivan are with Trinity College Dublin, Dublin, Ireland. E-mail: Jan.Ondrej@scss.tcd.ie (ORCID: orcid.org/0000-0002-5409-1521), Carol.OSullivan@scss.tcd.ie (corresponding author, ORCID: orcid.org/0000-0003-3772-4961)
The work started when all the authors were with Disney Research, Los Angeles, USA.

With the terminology defined in Table I and equipped with DPs and DDPs, we are ready to fully define our model. In a standard hierarchical Bayesian setting, a tree is constructed in an attempt to explain the observations through a hierarchy of factors. In our problem, the observations are agent states. We segment them into equal-length data segments along the time domain. Our goal is to find a set of path patterns $\{\beta_k\}$ that, when combined with their respective weights, best describe all the segments in terms of their likelihoods.
As shown in the toy example in the paper, only a subset of $\{\beta_k\}$ is needed to describe a data segment, while $\{\beta_k\}$ are shared across all segments. A two-layer tree is used to model this phenomenon. The root node is $\{\beta_k\}$, governed by a global DP prior. Each leaf node represents a data segment with a DP drawn from the global DP prior to model its own pattern set $\{\beta_k\}_d \subset \{\beta_k\}$. This is a standard two-layer HDP.
Further, imagine that some data segments share a bigger subset of $\{\beta_k\}$, namely $\{\beta_k\}_c$, so that $\{\beta_k\}_d \subset \{\beta_k\}_c \subset \{\beta_k\}$ and they form a segment cluster. There can also be infinitely many such clusters. We need a middle layer to capture this effect. At this layer, there is a nested clustering. First, each $\{\beta_k\}_c$ can contain infinitely many elements. Second, the number of clusters can be infinitely large. This effect can be captured by adding a DDP layer immediately below $\{\beta_k\}$ but above the leaf nodes. After constructing such a tree structure, we can compute $\{\beta_k\}$ by clustering the agent states layer by layer up to the top.
Such a tree structure is shown in Figure 1. Each sharp-cornered rectangle is a DP. $G$ on the right is the global DP over $\{\beta_k\}$. The bottom-left segment-level distribution, $G_d$, is the local DP over $\{\beta_k\}_d$. $G_l$ is the DDP. The number of atoms in $G_l$ is the number of segment clusters. Each atom $G_{li}$ is a DP governing $\{\beta_k\}_c$. All stick proportions sum to 1, s.t. $\sum \sigma(v) = 1$, $\sum \sigma(g) = 1$, $\sum \sigma(\epsilon) = 1$ and $\sum \sigma(\pi) = 1$. $\beta_k$, $h_{li}$ and $c_{dj}$ are DP atoms.
This model explains how the observations, $w$ (shaded), are generated from $\beta_k$ through a hierarchy of factors between $w_n$ and $\beta_k$. Algorithm 1 details this dependency between the observed agent states and the latent path patterns we are solving for.
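The generative process of Algorithm 1 can be sketched for a single data segment as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: every level is truncated to a finite size (`K`, `L`, `I`, `J`), agent states are discretised into `S` symbols, one cluster index is drawn per segment, and the index chain from a pattern assignment to a global pattern is read as `h[e][c[z]]`. All function names and default values are hypothetical.

```python
import random

def sticks(concentration, size, rng):
    """Truncated stick-breaking weights; the last proportion is set to 1."""
    out, remaining = [], 1.0
    for k in range(size):
        v = 1.0 if k == size - 1 else rng.betavariate(1.0, concentration)
        out.append(v * remaining)
        remaining *= 1.0 - v
    return out

def draw(weights, rng):
    """Draw one index from a categorical distribution."""
    return rng.choices(range(len(weights)), weights=weights)[0]

def dirichlet(eta, size, rng):
    """Symmetric Dirichlet draw via normalised Gamma variates."""
    g = [rng.gammavariate(eta, 1.0) for _ in range(size)]
    total = sum(g)
    return [x / total for x in g]

def generate_segment(rng, K=6, L=3, I=4, J=3, N=20, S=5,
                     eta=0.5, omega=1.0, b=1.0, a=1.0, alpha=1.0):
    # Global DP: patterns beta_k and their weights sigma(v).
    beta = [dirichlet(eta, S, rng) for _ in range(K)]
    sigma_v = sticks(omega, K, rng)
    # DDP layer: cluster weights sigma(g); each atom G_li holds a pattern
    # index h_li, and each cluster l has its own weights sigma(eps_l).
    sigma_g = sticks(b, L, rng)
    h = [[draw(sigma_v, rng) for _ in range(I)] for _ in range(L)]
    sigma_eps = [sticks(a, I, rng) for _ in range(L)]
    # Segment level: cluster index e_d, group indices c_dj, weights sigma(pi).
    e_d = draw(sigma_g, rng)
    c_d = [draw(sigma_eps[e_d], rng) for _ in range(J)]
    sigma_pi = sticks(alpha, J, rng)
    states = []
    for _ in range(N):
        z = draw(sigma_pi, rng)            # pattern assignment z_dn
        k = h[e_d][c_d[z]]                 # resolve the index chain to a pattern
        states.append(draw(beta[k], rng))  # discretised agent state w_n
    return states

segment = generate_segment(random.Random(1))
```

Each call returns `N` state indices in `range(S)`; inference reverses this process to recover the patterns from the observed states.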
Algorithm 1: Data sample generation in SV-DHDP. Dirichlet, Beta and Multinomial stand for their eponymous distributions.
For the global DP, build $G$ by:
  Drawing an infinite number of patterns, $\beta_k \sim \mathrm{Dirichlet}(\eta)$ for $k \in \{1, 2, 3, \dots\}$
  Drawing the stick proportions, $\sigma_k(v)$, where $v_k \sim \mathrm{Beta}(1, \omega)$ for $k \in \{1, 2, 3, \dots\}$
For each segment cluster $l$, build a $G_l$ by:
  1) Drawing stick proportions, $g_l \sim \mathrm{Beta}(1, b)$ for $l \in \{1, 2, 3, \dots\}$
  2) Since the atoms of $G_l$ are DPs, build each atom $G_{li}$ by:
    a) Drawing an infinite number of pattern indices, $h_{li} \sim \mathrm{Multinomial}(\sigma(v))$ for $i \in \{1, 2, 3, \dots\}$
    b) Drawing the stick proportions, $\sigma_i(\epsilon)$, where $\epsilon_{li} \sim \mathrm{Beta}(1, a)$ for $i \in \{1, 2, 3, \dots\}$
    c) For each data segment $d$, build $G_d$ by:
      i) Drawing an infinite number of data segment cluster indices, $e^l_d \sim \mathrm{Multinomial}(\sigma(g))$ for $l \in \{1, 2, 3, \dots\}$
      ii) Drawing an infinite number of group pattern indices, $c_{dj} \sim \mathrm{Multinomial}(\sigma(\epsilon_l))$ for $j \in \{1, 2, 3, \dots\}$
      iii) Drawing the stick proportions, $\sigma_j(\pi)$, where $\pi_{dj} \sim \mathrm{Beta}(1, \alpha)$ for $j \in \{1, 2, 3, \dots\}$
      iv) For each data sample $w$:
        A) Draw a pattern assignment, $z_{dn} \sim \mathrm{Multinomial}(\sigma(\pi))$.
        B) Generate a data sample $w_n \sim \mathrm{Multinomial}(\beta_{h_u})$, where $u = e_x$, $x = c_y$ and $y = z_{dn}$.

Fig. 1. SV-DHDP model. Sharp-cornered rectangles ($G$, $G_d$, $G_{li}$ and $G_l$) are DPs in which squares are weights and circles are atoms. Rounded rectangles are data samples or data segments. Hexagons are pattern assignments. $N$ is the number of agent states in segment $d$.

II. VARIATIONAL INFERENCE FOR SV-DHDP
In this section we give details of the variational inference of our SV-DHDP model. Variational Inference (VI) [3] approximates a target distribution by solving an optimization problem. When the target distribution is intractable, VI uses a family of tractable distributions (variational distributions) to approximate the target distribution. By optimizing for the parameters of the variational distributions, the target distribution can be approximated. The optimization is done by minimizing the Kullback-Leibler (KL) divergence between the posterior distribution and the variational distribution $q(\beta, \Omega)$, which amounts to maximizing the evidence lower bound (ELBO), a lower bound on the logarithm of the marginal probability of the observations, $\log p(w)$:

$$\mathcal{L}(q) = \mathbb{E}_q[\log p(w, \beta, \Omega)] - \mathbb{E}_q[\log q(\beta, \Omega)] \tag{1}$$
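The link between Equation 1 and the KL divergence can be made explicit with the standard decomposition of the log evidence. The block below is a short worked identity rather than part of the original derivation; $p(\beta, \Omega \mid w)$ denotes the exact posterior.

```latex
\begin{align*}
\log p(w)
  &= \mathbb{E}_q\!\left[\log \frac{p(w, \beta, \Omega)}{q(\beta, \Omega)}\right]
   + \mathbb{E}_q\!\left[\log \frac{q(\beta, \Omega)}{p(\beta, \Omega \mid w)}\right] \\
  &= \mathcal{L}(q)
   + \mathrm{KL}\big(q(\beta, \Omega) \,\|\, p(\beta, \Omega \mid w)\big)
\end{align*}
```

Because the KL term is non-negative and $\log p(w)$ does not depend on $q$, maximizing $\mathcal{L}(q)$ is equivalent to minimizing the KL divergence.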
The mean-field family is the simplest for approximating the posterior. It assumes each model parameter is only conditioned on its own hyper-parameters:

$$q(\beta, \Omega) = \prod_{k=1}^{K} q(\beta_k \mid \lambda_k) \left( \prod_{m=1}^{M} \prod_{r=1}^{R} q(\Omega_{mr} \mid \xi_{mr}) \right) \tag{2}$$

where $\lambda$ is the parameter governing the distribution of the global parameter. Parameter $\xi_{mr}$ governs the distribution of the local parameter $\Omega_{mr}$ in the $m$th context (e.g., the $m$th data segment or the $m$th cluster). Here, $M$ and $R$ do not have specific meanings and are only for illustration purposes.
We then optimize Equation 1 for $\lambda$ and $\xi$. Since all the distributions in SV-DHDP are from the exponential family, we assume that $q(\Omega \mid \xi)$ and $q(\beta \mid \lambda)$ are also from the exponential family, which has the general form:

$$p(\beta \mid w, \Omega, \eta) = h(\beta) \exp\{\rho_g(w, \Omega, \eta)^T t(\beta) - a_g(\rho_g(w, \Omega, \eta))\} \tag{3}$$

where the scalar functions $h(\cdot)$ and $a(\cdot)$ are the base measure and the log normalizer; the vector functions $\rho(\cdot)$ and $t(\cdot)$ are the natural parameter and the sufficient statistics. To optimize Equation 1 with respect to $\lambda$, we take the gradient:

$$\nabla_\lambda \mathcal{L} = \nabla^2_\lambda a_g(\lambda) \left( \mathbb{E}_q[\rho_g(w, \Omega, \eta)] - \lambda \right) \tag{4}$$

and we can set it to zero by setting:

$$\lambda = \mathbb{E}_q[\rho_g(w, \Omega, \eta)] \tag{5}$$

The optimization for $\xi$ is similar to Equation 5.
Since we are trying to optimize the parameters to minimize the KL-divergence, it is more reasonable to consider the information geometry of the parameter space, using a Riemannian metric. Following [4], a natural gradient can be computed by pre-multiplying the gradient by the inverse of the Riemannian metric, $G(\lambda)^{-1}$:

$$\hat{\nabla}_\lambda f(\lambda) \triangleq G(\lambda)^{-1} \nabla_\lambda f(\lambda) \tag{6}$$

where $G(\lambda)$ is the Fisher information matrix of $q(\lambda)$. When $q(\beta \mid \lambda)$ is from the exponential family, $G(\lambda) = \nabla^2_\lambda a_g(\lambda)$ and $\hat{\nabla}_\lambda \mathcal{L} = \mathbb{E}[\rho_g(w, \Omega, \eta)] - \lambda$. The natural gradient of $\mathcal{L}$ with respect to $\xi$ has a similar form, but depends only on its local context.
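The simplification of the natural gradient follows from substituting Equation 4 into Equation 6. The worked step below writes $\Omega$ for the local parameters, which is an assumed reading of the notation, and uses $G(\lambda) = \nabla^2_\lambda a_g(\lambda)$ for the exponential family:

```latex
\begin{align*}
\hat{\nabla}_\lambda \mathcal{L}
  &= G(\lambda)^{-1} \nabla_\lambda \mathcal{L} \\
  &= \left(\nabla^2_\lambda a_g(\lambda)\right)^{-1} \nabla^2_\lambda a_g(\lambda)
     \left(\mathbb{E}_q[\rho_g(w, \Omega, \eta)] - \lambda\right) \\
  &= \mathbb{E}_q[\rho_g(w, \Omega, \eta)] - \lambda
\end{align*}
```

The Hessian of the log normalizer cancels, so the natural gradient is cheaper to evaluate than the ordinary gradient.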
B. Stochastic Optimization
Optimizing Equation 1 for $\lambda$ and $\xi$ with a traditional coordinate ascent algorithm involves nested iteration loops. The inner loop iterates over all data segments to update $\xi$ until it converges, then jumps out to the outer loop to make one update on $\lambda$; the iteration then starts over until $\lambda$ also converges. This is very slow, especially when the number of data segments is large, because before updating $\lambda$ by one step, the inner loop has to compute the gradient at every data segment in the dataset.
To further speed up the training, we employ stochastic optimization. Stochastic optimization uses noisy gradient estimates with a decreasing step size to discover good local optima. Noisy gradients are usually cheaper to compute and help avoid low-quality local optima. With certain conditions on the step size, it provably converges to an optimum [5]. Stochastic optimization uses a noisy gradient distribution $B(\lambda)$ so that $\mathbb{E}_q[B(\lambda)] = \nabla_\lambda f(\lambda)$. It allows us to update $\lambda$:

$$\lambda^{(t)} = (1 - \rho_t)\,\lambda^{(t-1)} + \rho_t\, b_t(\lambda^{(t-1)}) \tag{7}$$

where $b_t$ is an independent draw from the noisy gradient $B$, $t$ is the time step and the step size $\rho_t$ satisfies:

$$\sum \rho_t = \infty; \quad \sum \rho_t^2 < \infty \tag{8}$$

Specifically, we use:

$$\rho_t = (t + \tau)^{-\kappa} \tag{9}$$

where $\tau$ down-weights the early iterations and $\kappa$, the forgetting rate, controls how much the new information is valued in each iteration. From Equation 7, we can sample the gradient on one data segment at each iteration.
We further extend Equation 7 to a mini-batch version. In each iteration, we sample $D$ data segments, compute Equation 7 for each of them, and average the results as the final update:

$$\lambda^{(t)} = (1 - \rho_t)\,\lambda^{(t-1)} + \frac{\rho_t}{D} \sum_{d} b^t_d(\lambda^{(t-1)}) \tag{10}$$

where $b^t_d(\cdot)$ is the stochastic gradient computed from sample $d$ and $D$ is the number of samples. Since the mini-batch version is highly parallelizable and gives better estimates of the gradient, it further speeds up the computation and improves the results.
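The step-size schedule of Equation 9 and the mini-batch update of Equation 10 can be sketched as below. The function names, the default values of `tau` and `kappa`, and the per-segment estimates are hypothetical. For this schedule, the conditions of Equation 8 hold when $\kappa \in (0.5, 1]$.

```python
def step_size(t, tau=1.0, kappa=0.7):
    """Equation 9: rho_t = (t + tau)^(-kappa)."""
    return (t + tau) ** (-kappa)

def minibatch_update(lam, estimates, rho):
    """Equation 10: blend the old parameter with the averaged noisy estimates."""
    dim = len(lam)
    mean = [sum(est[i] for est in estimates) / len(estimates) for i in range(dim)]
    return [(1.0 - rho) * lam[i] + rho * mean[i] for i in range(dim)]

lam = [1.0, 1.0]
estimates = [[3.0, 1.0], [5.0, 3.0]]  # hypothetical per-segment estimates b_d^t
lam = minibatch_update(lam, estimates, step_size(t=0))
```

With `tau=1.0` the first step size is exactly 1, so the first update replaces the initial value with the batch average; later steps weigh new estimates less and less.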
In practice, we cannot perform computations for an infinite number of path patterns, so a truncation number is given at each level. This number is the maximum cluster number modeled at its level. It is set bigger than needed so that only a part of the clusters is used in the clustering. The truncation number for each layer is much smaller than the one above it because we expect a much smaller number of path patterns in a child node than in its parent. We emphasize that this is fundamentally different from giving a pre-defined cluster number, and the model can still automatically compute the desirable number of clusters.
Given $D$ data segments, each containing $N$ agent states, we assume that the whole data set contains $k$ path patterns where $k < K$. Data segments can be clustered into $l$ clusters where $l < L$, each of which contains $i$ path pattern indices where $i < I$. Finally, in each data segment $d$, the agent states can be clustered into $j$ groups where $j < J$. We give the overall algorithm in Algorithm 2; the function subroutines (Algorithms 3 to 6) and the mathematical deduction are given in Section II-C.
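Algorithm 2: VI Optimization
Initialize $\lambda^0$, set $o^1 = 1$ and $o^2 = \omega$, $p^1 = 1$, $p^2 = b$, $q^1 = 1$, $q^2 = a$; set up step size $\rho_t$, set initial $t = 0$;
while not converged do
  sample a data segment $w_d$;
  $[\varepsilon^0, \mu_d, \zeta_d, \phi_d]$ = initLocal($w_d$, $\lambda$) (Algorithm 3);
  $[\mu_d, \zeta_d, \phi_d]$ = opLocal($w_d$, $\varepsilon$, $\zeta_d$, $\phi_d$, $\beta$, $g$) (Algorithm 4);
  $[\epsilon, \zeta_d, \phi_d]$ = updateCluster($w_d$, $\varepsilon$, $\zeta_d$, $\phi_d$, $v$, $\beta$) (Algorithm 5);
  $[\lambda^{(t)}, o^{1,(t)}, o^{2,(t)}]$ = updateGlobal($w_d$, $\eta$, $\varepsilon$, $\zeta_d$, $\phi_d$, $\rho_t$, $\lambda^{(t-1)}$, $o^{1,(t-1)}$, $o^{2,(t-1)}$) (Algorithm 6);
  $t = t + 1$;
  update $\rho$ with $t$ (Equation 9);
end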
C. Computational Details
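Our computation is based on Figure 1 and Equation 2, as explained in the paper. However, Equation 2 only serves to explain Variational Inference in the paper and does not contain all the details. To do variational inference, we condition our model parameters on their own hyper-parameters. Here we expand it into:

$$
\begin{aligned}
q(\beta, \Omega) = {} & \left( \prod_{k=1}^{K} q(\beta_k \mid \lambda_k)\, q(v_k \mid \omega_k) \right) \\
& \left( \prod_{l=1}^{L} q(g_l \mid p_l) \left( \prod_{i=1}^{I} q(\epsilon_{li} \mid q_{li})\, q(h_{li} \mid \varepsilon_{li}) \right) \right) \\
& \left( \prod_{d=1}^{D} q(e_d \mid \mu_d) \prod_{j=1}^{J} q(c_{dj} \mid \zeta_{dj})\, q(\pi_{dj} \mid \alpha_{dj}) \prod_{n=1}^{N} q(z_{dn} \mid \phi_{dn}) \right)
\end{aligned}
\tag{11}
$$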
This is the complete variational distribution. From this, we can deduce the complete conditional for every parameter. A complete conditional is the distribution of a parameter given all the other parameters. We also assume that the conditional distributions of the parameters given their hyper-parameters are from the same exponential families. So $q(z \mid \phi)$, $q(c \mid \zeta)$ and $q(h \mid \varepsilon)$ are Multinomial distributions; $q(\pi \mid \alpha_1, \alpha_2)$, $q(\epsilon \mid q_1, q_2)$, $q(g \mid p_1, p_2)$ and $q(v \mid \omega_1, \omega_2)$ are Beta distributions. Finally, $q(\beta \mid \lambda)$ is a Dirichlet distribution.
We abuse the notation a bit here and convert our notation into vector indicators. For instance, we treat $w$ as a vector of size $S$. So if the $n$th agent state in the $d$th data segment is $v$, it can be represented by $w^v_{dn} = 1$. $z^j_{dn} = 1$ means the $n$th agent state in the $d$th data segment is classified into the $j$th group in this segment. Similarly, $c^{li}_{dj} = 1$ means the $j$th group in the $d$th data segment is assigned to the $i$th component in cluster $l$. Finally, $h^k_{li} = 1$ means the $i$th component in the $l$th cluster is assigned to the $k$th pattern. So the complete conditionals for the Multinomial nodes are:

$$P(z^j_{dn} = 1 \mid \pi_d, w_{dn}, c_d, e_d, h, \beta) \propto \exp\Big\{ \log \sigma_j(\pi_d) + \sum_{l=1}^{L} e^l_d \sum_{i=1}^{I} c^{li}_{dj} \sum_{k=1}^{K} h^k_{li} \log \beta_{k, w_{dn}} \Big\} \tag{12}$$

$$P(c^{li}_{dj} = 1 \mid w_d, z_d, e_d, h, \epsilon, \beta) \propto \exp\Big\{ \log \sigma_i(\epsilon_l) + e^l_d \sum_{n=1}^{N} z^j_{dn} \sum_{k=1}^{K} h^k_{li} \log \beta_{k, w_{dn}} \Big\} \tag{13}$$

$$P(e^l_d = 1 \mid g, w_d, z_d, c_d, h, \beta) \propto \exp\Big\{ \log \sigma_l(g) + \sum_{i=1}^{I} \sum_{j=1}^{J} c^{li}_{dj} \sum_{n=1}^{N} z^j_{dn} \sum_{k=1}^{K} h^k_{li} \log \beta_{k, w_{dn}} \Big\} \tag{14}$$

$$P(h^k_{li} = 1 \mid v, w, z, e, c, \beta) \propto \exp\Big\{ \log \sigma_k(v) + \sum_{d=1}^{D} e^l_d \sum_{j=1}^{J} c^{li}_{dj} \sum_{n=1}^{N} z^j_{dn} \log \beta_{k, w_{dn}} \Big\} \tag{15}$$
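Aside from the Multinomial nodes, we also have Beta nodes:

$$P(v_k \mid h, \omega) = \mathrm{Beta}\Big( 1 + \sum_{l=1}^{L} \sum_{i=1}^{I} h^k_{li},\; \omega + \sum_{l=1}^{L} \sum_{i=1}^{I} \sum_{m>k} h^m_{li} \Big) \tag{16}$$

$$P(g_l \mid b, e) = \mathrm{Beta}\Big( 1 + \sum_{d=1}^{D} e^l_d,\; b + \sum_{d=1}^{D} \sum_{m>l} e^m_d \Big) \tag{17}$$

$$P(\epsilon_{li} \mid a, c) = \mathrm{Beta}\Big( 1 + \sum_{d=1}^{D} \sum_{j=1}^{J} c^{li}_{dj},\; a + \sum_{d=1}^{D} \sum_{j=1}^{J} \sum_{m>i} c^{lm}_{dj} \Big) \tag{18}$$

$$P(\pi_{dj} \mid \alpha, z_d) = \mathrm{Beta}\Big( 1 + \sum_{n=1}^{N} z^j_{dn},\; \alpha + \sum_{n=1}^{N} \sum_{m>j} z^m_{dn} \Big) \tag{19}$$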
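The Beta conditionals above share one pattern: the first parameter is 1 plus the count assigned to a component, and the second is the prior parameter plus the total count assigned to all later components. A minimal sketch of that bookkeeping, with a hypothetical helper name and made-up counts:

```python
def beta_parameters(counts, prior):
    """Return (1 + n_k, prior + sum_{m>k} n_m) for each component k."""
    params = []
    tail = 0.0  # running sum of the counts of all components after k
    for n_k in reversed(counts):
        params.append((1.0 + n_k, prior + tail))
        tail += n_k
    params.reverse()
    return params

params = beta_parameters([3, 0, 2], prior=1.5)
```

Accumulating the tail sum from the last component backwards avoids recomputing the inner sum over $m > k$ for every $k$.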
Finally, the path patterns are Dirichlet distributions:

$$P(\beta_k \mid w, z, c, e, h, \eta) = \mathrm{Dirichlet}\Big( \eta + \sum_{d=1}^{D} \sum_{l=1}^{L} e^l_d \sum_{i=1}^{I} h^k_{li} \sum_{j=1}^{J} c^{li}_{dj} \sum_{n=1}^{N} z^j_{dn} w_{dn} \Big) \tag{20}$$
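Given the complete conditionals, we can now compute the hyper-parameters. We first give the hyper-parameters of the Multinomial distributions:

$$\phi^j_{dn} = \mathbb{E}[z^j_{dn}] \propto \exp\Big\{ \log \sigma_j(\pi_d) + \sum_{l=1}^{L} \mu^l_d \sum_{i=1}^{I} \zeta^{li}_{dj} \sum_{k=1}^{K} \varepsilon^k_{li}\, \mathbb{E}[\log \beta_{k, w_{dn}}] \Big\} \tag{21}$$

$$\zeta^{li}_{dj} = \mathbb{E}[c^{li}_{dj}] \propto \exp\Big\{ \log \sigma_i(\epsilon_l) + \mu^l_d \sum_{n=1}^{N} \phi^j_{dn} \sum_{k=1}^{K} \varepsilon^k_{li}\, \mathbb{E}[\log \beta_{k, w_{dn}}] \Big\} \tag{22}$$

$$\mu^l_d = \mathbb{E}[e^l_d] \propto \exp\Big\{ \log \sigma_l(g) + \sum_{i=1}^{I} \sum_{j=1}^{J} \zeta^{li}_{dj} \sum_{n=1}^{N} \phi^j_{dn} \sum_{k=1}^{K} \varepsilon^k_{li}\, \mathbb{E}[\log \beta_{k, w_{dn}}] \Big\} \tag{23}$$

$$\varepsilon^k_{li} = \mathbb{E}[h^k_{li}] \propto \exp\Big\{ \log \sigma_k(v) + \sum_{d=1}^{D} \mu^l_d \sum_{j=1}^{J} \zeta^{li}_{dj} \sum_{n=1}^{N} \phi^j_{dn}\, \mathbb{E}[\log \beta_{k, w_{dn}}] \Big\} \tag{24}$$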
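The Multinomial hyper-parameters are only defined up to proportionality, so each one has to be normalized over its index. The exponents are sums of many log-probabilities and can be very negative, so a common choice is to normalize in log space. The source does not state how it normalizes; this sketch, with a hypothetical function name and inputs, shows one stable way to do it:

```python
import math

def normalize_log_weights(log_weights):
    """Turn unnormalized log-weights into probabilities without underflow."""
    m = max(log_weights)  # subtracting the maximum keeps exp() in range
    exps = [math.exp(x - m) for x in log_weights]
    total = sum(exps)
    return [e / total for e in exps]

probs = normalize_log_weights([-1000.0, -1001.0, -1002.0])
```

Exponentiating these inputs directly would underflow to zero for every entry, whereas the shifted version keeps their ratios intact.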
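We also give the hyper-parameters of the Beta distributions:

$$o_k = \Big( 1 + \sum_{l=1}^{L} \sum_{i=1}^{I} \varepsilon^k_{li},\; \omega + \sum_{l=1}^{L} \sum_{i=1}^{I} \sum_{m>k} \varepsilon^m_{li} \Big) \tag{26}$$

$$p_l = \Big( 1 + \sum_{d=1}^{D} \mu^l_d,\; b + \sum_{d=1}^{D} \sum_{m>l} \mu^m_d \Big) \tag{27}$$

$$q_{li} = \Big( 1 + \sum_{d=1}^{D} \sum_{j=1}^{J} \zeta^{li}_{dj},\; a + \sum_{d=1}^{D} \sum_{j=1}^{J} \sum_{m>i} \zeta^{lm}_{dj} \Big) \tag{28}$$

$$\gamma_{dj} = \Big( 1 + \sum_{n=1}^{N} \phi^j_{dn},\; \alpha + \sum_{n=1}^{N} \sum_{m>j} \phi^m_{dn} \Big) \tag{29}$$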
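Finally, for the sake of completeness, we give the equations to calculate $\mathbb{E}[\log \sigma_k(v)]$ and $\mathbb{E}[\log \beta]$:

$$
\begin{aligned}
\mathbb{E}[\log v_k] &= \Psi(o^1_k) - \Psi(o^1_k + o^2_k) \\
\mathbb{E}[\log(1 - v_k)] &= \Psi(o^2_k) - \Psi(o^1_k + o^2_k) \\
\mathbb{E}[\log \sigma_k(v)] &= \mathbb{E}[\log v_k] + \sum_{l=1}^{k-1} \mathbb{E}[\log(1 - v_l)]
\end{aligned}
\tag{30}
$$

$$\mathbb{E}[\log \beta_{kv}] = \Psi(\lambda_{kv}) - \Psi\Big( \sum_{v'} \lambda_{kv'} \Big) \tag{31}$$

where $\Psi$ is the digamma function.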
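The two expectations can be evaluated as sketched below. To stay self-contained, the sketch approximates the digamma function with its recurrence and a standard asymptotic series; a numerical library routine could be used instead. The function names are hypothetical, and the stick expectation sums $\mathbb{E}[\log(1 - v_l)]$ over the earlier sticks $l < k$.

```python
import math

def digamma(x):
    """Digamma for x > 0 via psi(x) = psi(x + 1) - 1/x and an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result + math.log(x) - 0.5 / x - series

def expected_log_sticks(o1, o2):
    """E[log sigma_k(v)] for Beta(o1[k], o2[k]) variational factors."""
    out, running = [], 0.0  # running = sum over l < k of E[log(1 - v_l)]
    for a_k, b_k in zip(o1, o2):
        total = digamma(a_k + b_k)
        out.append(digamma(a_k) - total + running)
        running += digamma(b_k) - total
    return out

def expected_log_beta(lam_k):
    """E[log beta_kv] for a Dirichlet(lam_k) variational factor."""
    total = digamma(sum(lam_k))
    return [digamma(x) - total for x in lam_k]

log_sticks = expected_log_sticks([1.0, 1.0], [1.0, 1.0])
log_beta = expected_log_beta([1.0, 1.0])
```

For Beta(1, 1) and Dirichlet(1, 1) factors the expectations reduce to $\Psi(1) - \Psi(2) = -1$, which gives a quick sanity check.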
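Algorithm 3: initLocal
Data: $w_d$
Result: $\varepsilon$, $\zeta_d$, $\phi_d$
for $l \in \{1, \dots, L\}$ do
  for $i \in \{1, \dots, I\}$ do
    $\varepsilon^k_{li} \propto \exp\{\sum_{n=1}^{N} \mathbb{E}[\log \beta_{k, w_{dn}}]\}$, $k \in \{1, \dots, K\}$;
  end
end
for $l \in \{1, \dots, L\}$ do
  $\mu^l_d \propto \exp\{\sum_{i=1}^{I} \sum_{k=1}^{K} \varepsilon^k_{li} \sum_{n=1}^{N} \mathbb{E}[\log \beta_{k, w_{dn}}]\}$;
end
for $j \in \{1, \dots, J\}$ do
  for $l \in \{1, \dots, L\}$ do
    $\zeta^{li}_{dj} \propto \exp\{\mu^l_d \sum_{k=1}^{K} \varepsilon^k_{li} \sum_{n=1}^{N} \mathbb{E}[\log \beta_{k, w_{dn}}]\}$, $i \in \{1, \dots, I\}$;
  end
end
for $n \in \{1, \dots, N\}$ do
  $\phi^j_{dn} \propto \exp\{\sum_{l=1}^{L} \mu^l_d \sum_{i=1}^{I} \zeta^{li}_{dj} \sum_{k=1}^{K} \varepsilon^k_{li}\, \mathbb{E}[\log \beta_{k, w_{dn}}]\}$, $j \in \{1, \dots, J\}$;
end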
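Algorithm 4: opLocal
Data: $w_d$, $\varepsilon$, $\zeta_d$, $\phi_d$, $\beta$, $g$
Result: $\zeta_d$, $\phi_d$, $\mu_d$
while $\mu_d$ not converged do
  while $\gamma_d$, $\zeta_d$, $\phi_d$ not converged do
    for $j \in \{1, \dots, J\}$ do
      $\gamma^1_{dj} = 1 + \sum_{n=1}^{N} \phi^j_{dn}$;
      $\gamma^2_{dj} = \alpha + \sum_{n=1}^{N} \sum_{m>j} \phi^m_{dn}$;
      $\zeta^{li}_{dj} \propto \exp\{\mathbb{E}[\log \sigma_l(g)] + \mathbb{E}[\log \sigma_i(\epsilon_l)] + \sum_{n=1}^{N} \phi^j_{dn} \sum_{k=1}^{K} \varepsilon^k_{li}\, \mathbb{E}[\log \beta_{k, w_{dn}}]\}$, $l \in \{1, \dots, L\}$, $i \in \{1, \dots, I\}$;
    end
    for $n \in \{1, \dots, N\}$ do
      $\phi^j_{dn} \propto \exp\{\mathbb{E}[\log \sigma_j(\pi_d)] + \sum_{l=1}^{L} \sum_{i=1}^{I} \zeta^{li}_{dj} \sum_{k=1}^{K} \varepsilon^k_{li}\, \mathbb{E}[\log \beta_{k, w_{dn}}]\}$, $j \in \{1, \dots, J\}$;
    end
  end
  $\mu^l_d \propto \exp\{\mathbb{E}[\log \sigma_l(g)] + \sum_{i=1}^{I} \sum_{j=1}^{J} \zeta^{li}_{dj} \sum_{n=1}^{N} \phi^j_{dn} \sum_{k=1}^{K} \varepsilon^k_{li}\, \mathbb{E}[\log \beta_{k, w_{dn}}]\}$, $l \in \{1, \dots, L\}$
end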
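Algorithm 5: updateCluster
Data: $w_d$, $\varepsilon_d$, $\zeta_d$, $\phi_d$, $v$, $\beta$
Result: $\varepsilon$, $p$, $q$
Set initial step size $\rho^{t'}_l$, set initial $t' = 0$;
while $p$ not converged do
  Set initial step size $\rho^{t'_o}_l$, set initial $t'_o = 0$;
  while $q$, $\varepsilon$ not converged do
    $[\zeta_d, \phi_d]$ = opLocal($w_d$, $\varepsilon_d$, $\zeta_d$, $\phi_d$, $p$, $q$) (Algorithm 4);
    for $l \in \{1, \dots, L\}$ do
      for $i \in \{1, \dots, I\}$ do
        $\hat{q}^1_{li} = 1 + D \sum_{j=1}^{J} \zeta^{li}_{dj}$;
        $\hat{q}^2_{li} = a + D \sum_{j=1}^{J} \sum_{m>i} \zeta^{lm}_{dj}$;
        $\hat{\varepsilon}^k_{li} \propto \exp\{\mathbb{E}[\log \sigma_k(v)] + D \mu^l_d \sum_{j=1}^{J} \zeta^{li}_{dj} \sum_{n=1}^{N} \phi^j_{dn}\, \mathbb{E}[\log \beta_{k, w_{dn}}]\}$, $k \in \{1, \dots, K\}$;
      end
    end
    $t'_o = t'_o + 1$;
    update $\rho^{t'_o}_l$ with $t'_o$ (Equation 9);
    for $l \in \{1, \dots, L\}$ do
      for $i \in \{1, \dots, I\}$ do
        $q^{(1, t'_o)}_{li} = (1 - \rho^{t'_o}_l)\, q^{(1, t'_o - 1)}_{li} + \rho^{t'_o}_l\, \hat{q}^1_{li}$;
        $q^{(2, t'_o)}_{li} = (1 - \rho^{t'_o}_l)\, q^{(2, t'_o - 1)}_{li} + \rho^{t'_o}_l\, \hat{q}^2_{li}$;
        $\varepsilon^{(k, t'_o)}_{li} = (1 - \rho^{t'_o}_l)\, \varepsilon^{(k, t'_o - 1)}_{li} + \rho^{t'_o}_l\, \hat{\varepsilon}^k_{li}$, $k \in \{1, \dots, K\}$;
      end
    end
  end
  $t' = t' + 1$;
  update $\rho^{t'}_l$ with $t'$ (Equation 9);
  for $l \in \{1, \dots, L\}$ do
    $\hat{p}^1_l = 1 + D \sum_{i=1}^{I} \mathbb{E}(\epsilon_{li}) \sum_{j=1}^{J} \zeta^{li}_{dj}$;
    $\hat{p}^2_l = b + D \sum_{i=1}^{I} \mathbb{E}(\epsilon_{li}) \sum_{j=1}^{J} \sum_{m>l} \zeta^{mi}_{dj}$;
  end
  for $l \in \{1, \dots, L\}$ do
    $p^{(1, t')}_l = (1 - \rho^{t'}_l)\, p^{(1, t' - 1)}_l + \rho^{t'}_l\, \hat{p}^1_l$;
    $p^{(2, t')}_l = (1 - \rho^{t'}_l)\, p^{(2, t' - 1)}_l + \rho^{t'}_l\, \hat{p}^2_l$;
  end
end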
Algorithm 6: updateGlobal
Data: $w_d$, $\eta$, $\varepsilon$, $\mu_d$, $\zeta_d$, $\phi_d$, $\rho_t$, $\lambda^{(t-1)}$, $o^{1,(t-1)}$, $o^{2,(t-1)}$
Result: $\lambda^{(t)}$, $o^{1,(t)}$, $o^{2,(t)}$
for $k \in \{1, \dots, K\}$ do
  $\hat{\lambda}_{kv} = \eta + D \sum_{l=1}^{L} \mu^l_d \sum_{i=1}^{I} \varepsilon^k_{li} \sum_{j=1}^{J} \zeta^{li}_{dj} \sum_{n=1}^{N} \phi^j_{dn} w^v_{dn}$;
  $\hat{o}^1_k = 1 + \sum_{l=1}^{L} \sum_{i=1}^{I} \varepsilon^k_{li}$;
  $\hat{o}^2_k = \omega + \sum_{l=1}^{L} \sum_{i=1}^{I} \sum_{m>k} \varepsilon^m_{li}$;
end
$\lambda^{(t)} = (1 - \rho_t)\, \lambda^{(t-1)} + \rho_t\, \hat{\lambda}$;
$o^{1,(t)} = (1 - \rho_t)\, o^{1,(t-1)} + \rho_t\, \hat{o}^1$;
$o^{2,(t)} = (1 - \rho_t)\, o^{2,(t-1)} + \rho_t\, \hat{o}^2$;
return $\lambda^{(t)}$, $o^{1,(t)}$, $o^{2,(t)}$

A. Bi-directional Flows
Here, we show some data segments of the simulations done for our bi-directional flow example. They are shown in Figure 2.
Fig. 2. Data segment samples from PARIS07, ONDREJ10, PETT09 and MOU09.
B. Park
Trajectories and data segments of the park dataset are shown in Figure 3.
C. Train Station
1) Patterns learned by SV-DHDP: Figure 4 shows some snapshots of the data segments of the train station dataset. Some additional patterns for the train station dataset are shown in Figure 5.
2) Patterns learned by Gibbs Sampling: The top 32 patterns learned by Gibbs Sampling for the train station dataset are shown in Figure 6 and Figure 7.
IV. SIMILARITY
A. Park Simulation
Here we show, in Figure 8, some data segments of the
four simulations we used in similarity computation in the park
example.
B. Train Station Simulation
Here we show, in Figure 9, some data segments of the
four simulations we used in similarity computation in the train
station example. Learned patterns can be found in the main
paper.
Fig. 3. a: All trajectories. The red dots are cameras. The blue circles are exits/entrances. b-d: data segments. All data segments span 5 seconds.
Fig. 4. Two data segments in the train station dataset.
Fig. 5. Additional patterns learned by SV-DHDP from the train station dataset.
Fig. 6. Patterns learned by Gibbs Sampling from the train station dataset.
Fig. 7. Patterns learned by Gibbs Sampling from the train station dataset.
Fig. 8. Data segment samples for the park simulation from PARIS07, ONDREJ10, PETT09 and MOU09.
Fig. 9. Data segment samples for the train station simulation from PARIS07, ONDREJ10, PETT09 and MOU09.
REFERENCES
[1] J. Sethuraman, “A Constructive Definition of Dirichlet Priors,” Statistica Sinica, vol. 4, pp. 639–650, 1994.
[2] S. MacEachern, “Dependent Nonparametric Processes,” in ASA Bayesian Stat. Sci., 1999.
[3] C. Bishop, Pattern Recognition and Machine Learning. New York: Springer, 2007.
[4] S.-I. Amari, “Natural Gradient Works Efficiently in Learning,” Neural Comp., vol. 10, no. 2, pp. 251–276, 1998.
[5] H. Robbins and S. Monro, “A Stochastic Approximation Method,” Ann. Math. Statist., vol. 22, no. 3, pp. 400–407, 1951.