Who are my neighbors?
A perception model for selecting neighbors of
pedestrians in crowds
Fangkai Yang∗
KTH Royal Institute of Technology
Stockholm, Sweden
fangkai@kth.se
Himangshu Saikia
KTH Royal Institute of Technology
Stockholm, Sweden
saikia@kth.se
Christopher Peters
KTH Royal Institute of Technology
Stockholm, Sweden
chpeters@kth.se
ABSTRACT
Pedestrian trajectory prediction is a challenging problem. One of the aspects that makes it so challenging is the fact that the future positions of an agent are determined not only by its previous positions, but also by the agent's interactions with its neighbors. Previous methods, like Social Attention, have considered all agents as neighbors in these interactions. However, this ends up assigning high attention weights to agents who are far away from the queried agent and/or moving in the opposite direction, even though such agents might have little to no impact on the queried agent's trajectory. Furthermore, trajectory prediction for a queried agent involving all agents in a large crowded scenario is not efficient. In this paper, we propose a novel approach for selecting the neighbors of an agent by modeling its perception as a combination of a location and a locomotion model. We demonstrate the performance of our method by comparing it with the existing state-of-the-art method on publicly available datasets. The results show that our neighbor selection model overall improves the accuracy of trajectory prediction and enables prediction in scenarios with large numbers of agents, in which other methods do not scale well.
KEYWORDS
perception, virtual agents, trajectory prediction, machine learning
1 INTRODUCTION
Pedestrians are able to perceive and act according to a variety of information present in the environment. Such information includes the surroundings and the relative positions, velocities, and perceived intent of other pedestrians (some located quite far away), all of which is processed in a streaming fashion to successfully navigate a path. Designing virtual agents that perform similar processing of their surroundings and evaluate their path trajectories correctly is of great importance in social robotics [11], abnormality detection in crowds [12], urban planning for public safety [2] and realistic simulation of virtual crowds [10], among others.
A seminal work on modeling the behavior of agents in a crowd of multiple agents is the Social Forces model [6], where the interactions of an agent with its surrounding agents are modeled by means of invisible (social) forces. Interacting Gaussian Processes (IGP) is another method used to model the joint distribution of the trajectories of all interacting agents in a crowd [22]. Following the recent advancements in machine learning based methods, Recurrent Neural Networks (RNNs) became very popular for sequence prediction problems. In particular, RNNs equipped with Long Short-Term Memory (LSTM) cells performed much better than traditional RNNs at remembering important features far back in time [18]. LSTM networks were used to predict human trajectories in crowds with Social-LSTM [1]. Learning useful features in this way is more general than hand-coding such features. However, Social-LSTM only used a local discretized neighborhood around an agent and ignored all other agents outside this neighborhood. The Social Attention model was introduced in [24], which improved trajectory prediction compared with the Social-LSTM model by considering a spatio-temporal graph representation of the relationship between all pairs of agents.

∗ This is the corresponding author.
All of these past methods achieve varying degrees of accuracy but suffer from assumptions which detract from their generality. Methods which do not learn from human trajectory information usually suffer from the insufficiency of hand-coded features. Methods which do learn from human trajectories either consider only very local regions or naïvely cover the entire space, which in turn does not scale well.
We present a learning-based approach related to the Social Attention method, which uses a spatio-temporal representation between all pairs of agents. An LSTM is then trained using this representation to predict the future trajectories of humans in real scenarios. Though it performs better than the Social-LSTM method, considering all agents as neighboring agents is not optimal, as this does not scale for large crowds. To this end, we propose a novel perception model based on the human visual system [19, 20] which combines a location and a locomotion model to determine the neighboring agents of a queried one. Our approach helps in pruning out unimportant neighboring agents, thereby also making it scalable to larger datasets.
2 RELATED WORK
2.1 Pedestrian Trajectory Prediction
Social Forces was presented in [6] to capture the interactions of an agent with its surrounding agents by means of invisible (social) forces: attractive forces towards destinations and repulsive forces from obstacles and other agents, which led to an efficient collision avoidance strategy during navigation. However, this is only enough to simulate very simple behavior and cannot take into account complex interactions, especially further back in time. In [21, 22], it was argued that agents exhibit cooperative behavior with regard to collision avoidance. The authors used Interacting Gaussian Processes (IGP) to model the joint distribution of the trajectories of all interacting agents in the crowd. Again, this method only accounted for relative positioning, not relative velocities or accelerations.
Following the recent advancements in learning based methods using neural networks, automatic feature detection without having to handcraft individual features became a huge success in many diverse fields. For sequence prediction problems such as speech recognition and synthesis, RNNs became very popular [3]. However, RNNs suffer from the vanishing gradient problem [14] and hence are difficult to train. LSTM cells were then introduced as a specific building unit for RNNs, which performed much better than traditional RNNs [18]. A key feature of LSTMs is the ability to learn from features observed long ago in the sequence. In [1], LSTM networks were used to predict multiple correlated sequences corresponding to human trajectories in crowds with an approach called Social-LSTM. Using a neural network to learn useful features of social interaction from real data is indeed more general than hand-coding such features as in Social Forces or IGP. However, Social-LSTM in its current form, and some of its derivatives, only used a local discretized neighborhood around an agent and ignored all other agents outside this neighborhood. The Social Attention model was introduced in [24], which improved upon the Social-LSTM model by considering a spatio-temporal graph representation of the relationship between all pairs of agents in a crowd.
3 METHOD
When pedestrians walk in crowds, their trajectories are affected by the motions of other pedestrians. Some work considers this influence to be local [1, 11]. However, as shown in [24], not only positions but also other features like velocity and acceleration play important roles in influencing the queried pedestrian's trajectory. Keeping this in mind, we propose a neighbor selection model which considers not only the position of an agent, but also its speed, forward orientation, and bearing angles with other agents.
3.1 The Social Attention Model
Vemula et al. [24] proposed the Social Attention model, which used a Structural RNN (S-RNN) [7] to model both the spatial and temporal dynamics of trajectories in crowds. Human-human interactions are modeled using a soft attention model over all pedestrians in the crowd. When predicting the future trajectory of a target pedestrian, other pedestrians in the crowd who have a higher attention weight should have a larger influence on the prediction. By computing a soft attention over the hidden states of the spatial edges for each agent, they trained an LSTM network to predict future trajectories. Since they aimed to find out which surrounding agents humans attend to, they built spatial edges between all pairs of agents. This, however, is expensive and (as we show later) does not scale for large crowds. In some cases, their model assigned a high attention weight to agents who are far away from the queried agent or to agents that are almost static. Also, the bearing angle between agents did not seem to influence the attention weights in their model. In order to address these issues, we propose a perception model based on the human visual system to better select the important agents from all surrounding agents rather than every pedestrian in the crowd. In this paper, we use the same S-RNN architecture to train an LSTM network as shown in [24], but we use our perception model (see Section 3.2) to prune out the unimportant spatial edges.
3.2 Model Architecture
The overall architecture of our model consists of two parts. The
location model selects interesting neighbors out of all the agents
based on their proximity and bearing angles. The locomotion model
selects the agents with high risk of future collisions based on their
angular and tangential velocities.
3.2.1 Location Model. People perceive their surroundings with a sense of vision and proximity. Therefore, we applied a unified agent-sensing model proposed in [17]. As shown in Figure 1(a), for an agent A_i it consists of an ellipse E_i and a sector S_i. Unlike using multiple vision cones to simulate human vision [8, 15], which results in blind spots near the cones' intersection, the ellipse covers blind spots and simulates the reduction of vision sensitivity as distance increases. The ellipse foci F_1, F_2 are calculated as:

F_1, F_2 = x_i + f_i (a − d ± c)    (1)

where x_i is the position and f_i is the forward direction of A_i, d is the intimate distance within which the agent can sense neighbors from behind, a is the semi-major axis of the ellipse, and c is the focal distance. We consider the semi-minor axis b = a tan(π/6) here, and hence c is given by c = √(a² − b²) ≈ 0.817a.
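As a concrete illustration, the ellipse of Equation 1 can be constructed and tested as follows (a minimal sketch; the function names are ours, with the constants from Section 4.2, d = 0.15 m and a = 1.2 m):

```python
import numpy as np

def ellipse_foci(x_i, f_i, a=1.2, d=0.15):
    """Foci of the perception ellipse for agent A_i at position x_i
    with forward direction f_i (Equation 1)."""
    f_i = f_i / np.linalg.norm(f_i)      # unit forward direction
    b = a * np.tan(np.pi / 6)            # semi-minor axis, b = a tan(pi/6)
    c = np.sqrt(a ** 2 - b ** 2)         # focal distance, ~0.817a
    F1 = x_i + f_i * (a - d + c)
    F2 = x_i + f_i * (a - d - c)
    return F1, F2

def in_ellipse(x_j, F1, F2, a=1.2):
    """A point lies inside the ellipse iff the sum of its distances
    to the two foci is at most 2a."""
    return np.linalg.norm(x_j - F1) + np.linalg.norm(x_j - F2) <= 2 * a
```

With x_i at the origin and f_i = (1, 0), both foci lie on the forward axis, so the ellipse extends along the gaze direction while still covering the intimate distance d behind the agent.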
Agents within the ellipse are marked as fully perceived. However, for agents outside of the ellipse, the probability of being perceived varies based on their proximity and orientation with respect to the queried agent. The perceived probability of the Location Model is therefore modeled as:

p^i_LM(x_j; α, β) =
    1,    for A_j ∈ E_i
    cos^α( (π/2) · ||x_j − x_i|| / R_S ) · cos^β( (π/2) · θ(A_j, A_i) / Θ_S ),    for A_j ∈ S_i ∧ A_j ∉ E_i
    0,    for A_j ∉ S_i    (2)

where R_S is the sector radius, Θ_S is the central angle of the sector, and ||x_j − x_i|| and θ(A_j, A_i) are the distance term and the orientation term between agents A_j and A_i respectively. The orientation term θ(A_j, A_i), or simply θ_ji, is the bearing angle between the two agents, given by ∠(x_j − x_i, f_i). α and β are parameters which control the influence of the distance term and the orientation term respectively. An example of the Location Model is shown in Figure 2.
The proximity parameters in Equation 1 and Equation 2 were chosen following Proxemics Theory [5]. Although the perception of proximities is culturally determined, to simplify the model we set these parameters to constants (d = 0.15 m, a = 1.2 m, R_S = 3.5 m). Also, human eyes have around a 200° vision angle as proposed in [25], thus we set Θ_S = 200°.
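Equation 2 with the constants above can be sketched as follows; the function name and the half-angle sector test (θ ≤ Θ_S/2 on each side of the gaze) are our assumptions, not given explicitly in the paper:

```python
import numpy as np

def p_location(x_i, f_i, x_j, alpha=2.0, beta=2.0,
               R_S=3.5, Theta_S=np.deg2rad(200.0), a=1.2, d=0.15):
    """Perceived probability p_LM of agent A_j from agent A_i (Equation 2)."""
    f_i = f_i / np.linalg.norm(f_i)
    dist = np.linalg.norm(x_j - x_i)
    # bearing angle between the forward direction and the line of sight to j
    cos_t = np.dot(f_i, x_j - x_i) / max(dist, 1e-9)
    theta = np.arccos(np.clip(cos_t, -1.0, 1.0))
    # ellipse membership via the foci of Equation 1
    b = a * np.tan(np.pi / 6)
    c = np.sqrt(a ** 2 - b ** 2)
    F1, F2 = x_i + f_i * (a - d + c), x_i + f_i * (a - d - c)
    if np.linalg.norm(x_j - F1) + np.linalg.norm(x_j - F2) <= 2 * a:
        return 1.0                                  # A_j in E_i: fully perceived
    if dist <= R_S and theta <= Theta_S / 2:        # A_j in the vision sector S_i
        return (np.cos(np.pi / 2 * dist / R_S) ** alpha
                * np.cos(np.pi / 2 * theta / Theta_S) ** beta)
    return 0.0                                      # outside both: not perceived
```

An agent 1 m directly ahead falls inside the ellipse (probability 1), one 3 m ahead falls in the sector with a reduced probability, and one directly behind is outside both regions (probability 0).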
3.2.2 Locomotion Model. As stated in [24], agents in the immediate vicinity of the queried agent and moving in the same direction might sometimes not be as important as agents located far away but moving towards the queried agent. In such cases, the location model alone would fail to consider potential neighbors which may have an influence on the queried agent's trajectory. [13] evaluated the risk and dangerousness of future collisions by using the bearing
Figure 1: (a) The Location Model. For an agent A_i, this consists of an ellipse E_i and a sector S_i. (b) The effect of the bearing angle velocity on the risk of collision. Assume two agents (represented by the green and blue circles), one moving vertically up and the other horizontally towards the left. If the bearing angle between the two reduces with time (i.e. θ̇ < 0), the blue agent will pass in front of the green agent and not collide. If the bearing angle increases with time (i.e. θ̇ > 0), the green agent passes the blue agent and again there is no collision. However, if θ̇ ∼ 0, the two agents will probably collide.
Figure 2: Example of the Location Model. (Left) Three Location Models calculated using different α and β. (Right) The probability along two horizontal lines (a) and (b).
angle velocity, given by θ̇_ij = θ²_ij − θ¹_ij, and the remaining time-to-interaction, t_ij = ||x_j − x_i|| / |v^r_ij|, relative to the agent. Here θ²_ij and θ¹_ij are the bearing angles in the current frame and the previous one respectively, and v^r_ij is the relative tangential velocity which points towards agent A_i (the queried agent).
Figure 3: Trajectories for 20 time steps in the ETH Hotel dataset. The queried agent, whose trajectory is being predicted, is shown in red. The blue diamond markers represent the current positions of the various agents. The circular radii represent the weights from our neighbor selection model. The current frame of the original video of the dataset is superimposed in the background.
As shown in Figure 1(b), if |θ̇_ij| is low, agents A_i and A_j have a high risk of collision in the future [13]. Also, a smaller t_ij means a higher dangerousness of future collision. Thus, the influence probability of the Locomotion Model is given as:

p^i_CM(x_j; γ) = exp( −γ t²_ij − (1 − γ) |θ̇_ij|² )    (3)

where γ is a weighting parameter.
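Equation 3 is a direct exponential decay in the two risk terms. A minimal sketch (function and parameter names are ours; the bearing angles and the relative speed towards A_i are assumed to be given by the caller):

```python
import numpy as np

def p_locomotion(x_i, x_j, theta_prev, theta_curr, v_rel, gamma=0.5):
    """Influence probability p_CM of agent A_j on agent A_i (Equation 3).
    theta_prev, theta_curr: bearing angles theta^1_ij and theta^2_ij;
    v_rel: magnitude of the relative tangential velocity towards A_i."""
    theta_dot = theta_curr - theta_prev                  # bearing angle velocity
    t_ij = np.linalg.norm(np.asarray(x_j) - np.asarray(x_i)) / max(abs(v_rel), 1e-9)
    return float(np.exp(-gamma * t_ij ** 2 - (1 - gamma) * abs(theta_dot) ** 2))
```

An agent that is close, approaching quickly, and whose bearing angle barely changes (θ̇ ∼ 0) yields a probability near 1, i.e. a high collision risk; distant agents or agents whose bearing angle changes rapidly are discounted.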
The final combined model is thus represented based on the agent's location and locomotion by combining the corresponding models p_LM and p_CM as follows:

P_i(x_j; Θ) = λ p^i_LM(x_j; α, β) + (1 − λ) p^i_CM(x_j; γ)    (4)

where Θ = [α, β, γ, λ] are weighting parameters for the different terms. For the queried agent A_i, we only select an agent A_j if j ≠ i and P_i(x_j) is above a specified threshold τ. As shown in Figure 3, our model assigns higher attention weights to those agents which might be involved in future collisions with high dangerousness.
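Given per-agent probabilities from the two models, neighbor selection reduces to thresholding the combined score of Equation 4. A minimal sketch (the function name and array-based interface are our assumptions):

```python
import numpy as np

def select_neighbors(p_lm, p_cm, lam=0.4, tau=0.2, query_idx=0):
    """Combine p_LM and p_CM per Equation 4 and return the indices of agents
    whose combined score P_i exceeds the threshold tau, excluding A_i itself."""
    P = lam * np.asarray(p_lm) + (1.0 - lam) * np.asarray(p_cm)
    return [j for j in range(len(P)) if j != query_idx and P[j] > tau]
```

With λ = 0.4 and τ = 0.2 (the values used in Section 4.2), an agent survives the pruning only if it is well perceived, at high collision risk, or both.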
4 EVALUATION
4.1 Datasets and Metrics
We evaluated our model on three publicly available datasets: ETH [16], UCY [9], and Pedestrian Walking Path (PWP) [27]. The ETH and UCY datasets contain 5 crowd sets with a total of 1536 pedestrians. The PWP dataset contains the labeled walking paths of 12684 pedestrians. As shown in [24], Social Attention performs better than other methods such as LSTM and Social-LSTM [1] on the ETH and UCY datasets. Thus, we chose Social Attention as the baseline against which to compare the performance of our model. However, unlike Social Attention, which modeled the influence of all agents in the crowd, we used our model to select only those neighboring agents with potential influence. Hence, we also tested a new dataset, PWP, in our work. Compared with the ETH and UCY datasets, which have roughly 10 agents per frame, PWP has roughly 100 agents per frame, which results in higher computational overhead if all agents in the environment are considered. We preprocessed the datasets with the homography matrices used in [26] to normalize all datasets to a perspective top-down view.
To compute the prediction error, we used two metrics: Average Displacement Error (ADE) [1] and Final Displacement Error (FDE) [24]. ADE calculates the mean squared error between all the estimated points in a trajectory and the ground truth. FDE calculates the Euclidean distance between the final predicted position of a trajectory and the ground truth.
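The two metrics can be sketched as follows, with ADE implemented as the mean squared displacement as described above (some later works instead average plain Euclidean distances; the function names are ours):

```python
import numpy as np

def ade(pred, gt):
    """Average Displacement Error: mean squared distance between all
    predicted points of a trajectory and the ground-truth points."""
    pred, gt = np.asarray(pred), np.asarray(gt)
    return float(np.mean(np.sum((pred - gt) ** 2, axis=-1)))

def fde(pred, gt):
    """Final Displacement Error: Euclidean distance between the final
    predicted position and the final ground-truth position."""
    pred, gt = np.asarray(pred), np.asarray(gt)
    return float(np.linalg.norm(pred[-1] - gt[-1]))
```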
4.2 Implementation
In the process of training and testing, we perform a two-level approach: we set ETH and UCY as the low-density level, and PWP as the high-density level. Similar to [24], we used the same leave-one-out method, training and validating on 4 sets from the low-density datasets and testing on the remaining set. This was repeated for all 5 sets in ETH and UCY. For validation, each set was divided in a 4:1 ratio between training and validation. As for PWP, we divided it into training (80%) and testing (20%) parts. To match the annotation frequency of the low-density datasets (annotated every 0.4 seconds), we interpolated the PWP dataset. We also set the same time steps for the observed trajectory (T_obs = 8 time steps, 3.2 seconds) and the predicted trajectory (T_pred = 12 time steps, 4.8 seconds). The dimension of the hidden states was set to 128 for temporal edges and 256 for spatial edges. The embedding layers embedded the input into a 64-dimensional vector with a ReLU nonlinearity. The model was trained on a single GTX-1070 GPU on a personal computer with 16 GB RAM.
The weighting parameters Θ = [α, β, γ, λ] used in this paper were set to [2, 2, 0.5, 0.4]. To find a good threshold τ, we tested values varying from 0.1 to 0.7 in steps of 0.1 on the ETH Hotel dataset. The Final Displacement Error was used to select the best threshold among all these values. The FDE was observed to be small near τ = 0.2, which was the value chosen for the entire training process.
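The threshold search described above amounts to a simple grid sweep over τ, keeping the value with the lowest validation FDE. A sketch, where `fde_for_tau` is a hypothetical placeholder for evaluating the model at a given threshold:

```python
import numpy as np

def pick_threshold(fde_for_tau, taus=np.arange(0.1, 0.71, 0.1)):
    """Return the tau in [0.1, 0.7] (step 0.1) with the smallest FDE."""
    errors = [fde_for_tau(t) for t in taus]   # one validation run per tau
    return float(taus[int(np.argmin(errors))])

# dummy stand-in for the real validation error, minimal near tau = 0.2
best_tau = pick_threshold(lambda t: (t - 0.2) ** 2)
```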
5 RESULTS
5.1 Quantitative Results
The prediction errors of the two models are shown in Figure 4. Our model performed better on most low-density datasets, except for UCY Zara 1. It might be the case that our model over-pruned neighbors which actually exerted some influence. On datasets which have pedestrians standing still or walking across, like ETH Hotel and UCY Zara 2, our model performed much better than the Social Attention model, since the latter assigned higher attention to static and distant agents than to those agents walking by.

The high-density dataset (PWP) tested the scalability of our model. On this dataset, the Social Attention model ran out of memory in the training stage, because it tried to build spatial edges between every pair of agents in this environment, which contains almost 100 agents per frame. However, our model enabled the process to scale to large crowds while conserving the important interactions.
5.2 Qualitative Results
Figure 5 shows an example scenario where the Social Attention model did not perform optimally, but where, with the neighbor selection model, the trajectories could be successfully predicted. The trajectories predicted by the Social Attention model diverge further than the ones predicted using our model, and it falsely predicts the trajectories of static agents (purple trajectory and dots in Figure 5(a)). From the weights in these two methods, we can see that the Social Attention model assigned a high attention to the agent who moves backwards relative to the queried agent (green trajectory and dots in Figure 5(a)),
(a) Average Displacement Error. (b) Final Displacement Error.

Figure 4: The average and final displacement errors (in meters) on several datasets for the Social Attention method and ours. The Social Attention method runs out of memory in the training stage on the PWP dataset. As can be seen, our method performs better than Social Attention on most datasets and even scales to larger datasets.
and to the agent who stands still (the purple one in Figure 5(a)). However, these agents should not exert much influence on the queried agent.

In Figure 6, we list three representative cases (Figure 6(a), (c) and (e)) where the Social Attention model did not perform optimally. Figure 6(a) and Figure 6(c) show that the Social Attention model assigned a high attention to an agent who is far behind, but a very low attention to an agent close by. Figure 6(e) shows that the Social Attention model assigned similar attention weights to both close-by and far-away neighbors who could hardly exert any influence. Figures 6(b), (d) and (f) show the weights from our model, which assigned a relatively high attention to the pedestrian agents who could exert an important influence on the trajectory to be predicted in the exact same scenarios.
6 DISCUSSION
We observe that our model performs better than the Social Attention model, which in turn outperforms state-of-the-art methods, e.g. Social-LSTM, as shown in [24]. This leads us to believe that our model performs better than the current state of the art. Since our model is a neighbor selection model, it could be integrated with
(a) The Social Attention model. (b) Our model.

Figure 5: An example illustrating the difference between the Social Attention model and our model. (1) Prediction accuracy: the solid dots represent the ground truth positions and the '+' markers represent the predicted positions. As can be clearly observed, our method predicts future positions more accurately and there is less deviation between the true positions and the predicted positions. (2) Neighbor importance: the queried agent, whose neighbors are being estimated for importance, is shown in red. The blue diamond markers represent the current positions of the various agents. The circular radii represent their attention weights. The Social Attention model assigns high attention weights to agents who are far away from the queried agent and/or moving in the opposite direction. Our model successfully prunes such agents and assigns high weights only to those agents who are likely to influence the queried agent's trajectory.
Figure 6: The weights in the Social Attention model ((a), (c), (e)) and in our model ((b), (d), (f)). The solid dots represent the ground truth positions. The queried agent, whose trajectory is being predicted, is shown in red. The blue diamond markers represent the current positions of the various agents. The circular radii represent the attention weights.
other methods which select neighbors to predict trajectories, and its performance tested. For example, it could replace the grid-based neighbor selection model in Social-LSTM [1] in order to select the neighbors with a potentially higher effect on trajectory predictions. Moreover, our method could be extended beyond trajectory prediction. For example, in crowd simulation, state-of-the-art methods (e.g. ORCA [23] and the Social Force model [6]) consider neighbors (normally within a circle centered on the queried agent) in order to avoid collisions. Our method could be integrated in order to better select neighbors and assign attention weights based on perception. For simplicity, our model is homogeneous, assuming that all individuals have the same ability to perceive neighbors. One possible solution is to learn the personality of an agent from its trajectory [4], which in turn yields heterogeneous perceptual abilities. Also, the weighting parameters (as shown in Section 4.2) could potentially be tuned to give better performance. It would be interesting to train optimal weighting parameters based on trajectory prediction feedback.
7 CONCLUSIONS
In this paper, we presented a novel method for selecting the neighbors of a pedestrian by modeling its vision and perception. It consists of a Location Model and a Locomotion Model, which account for both relative positions and velocities. The model was used to prune out unimportant agents and was shown to perform better than the Social Attention model, the current state-of-the-art method using LSTMs for trajectory prediction. We showed that our model performs better on most low-density datasets and also scales to larger datasets. As discussed in Section 6, one direction of future work is to extend our model to other methods requiring neighbor selection, for example trajectory prediction and crowd simulation methods, and compare performance. Another is to train generally optimal weighting parameters and to incorporate the personalities of pedestrians based on gait and other features, to enable better prediction accuracy.
REFERENCES
[1] A. Alahi, K. Goel, V. Ramanathan, A. Robicquet, L. Fei-Fei, and S. Savarese. 2016. Social LSTM: Human Trajectory Prediction in Crowded Spaces. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 961–971. https://doi.org/10.1109/CVPR.2016.110
[2] Michael Batty, Jake Desyllas, and Elspeth Duxbury. 2003. Safety in Numbers? Modelling Crowds and Designing Control for the Notting Hill Carnival. Urban Studies 40, 8 (2003), 1573–1590. https://doi.org/10.1080/0042098032000094432
[3] Alex Graves, Abdel-rahman Mohamed, and Geoffrey Hinton. 2013. Speech recognition with deep recurrent neural networks. In Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on. IEEE, 6645–6649.
[4] Stephen J. Guy, Sujeong Kim, Ming C. Lin, and Dinesh Manocha. 2011. Simulating heterogeneous crowd behaviors using personality trait theory. In Proceedings of the 2011 ACM SIGGRAPH/Eurographics Symposium on Computer Animation. ACM, 43–52.
[5] Edward Twitchell Hall. 1966. The Hidden Dimension.
[6] Dirk Helbing and Peter Molnar. 1995. Social force model for pedestrian dynamics. Physical Review E 51, 5 (1995), 4282.
[7] A. Jain, A. R. Zamir, S. Savarese, and A. Saxena. 2016. Structural-RNN: Deep Learning on Spatio-Temporal Graphs. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 5308–5317. https://doi.org/10.1109/CVPR.2016.573
[8] Tom Leonard. 2003. Building an AI Sensory System: Examining the Design of Thief: The Dark Project. Game Development Conference (GDC 2003).
[9] Alon Lerner, Yiorgos Chrysanthou, and Dani Lischinski. 2007. Crowds by Example. Computer Graphics Forum 26, 3 (2007), 655–664. https://doi.org/10.1111/j.1467-8659.2007.01089.x
[10] C. Loscos, D. Marchal, and A. Meyer. 2003. Intuitive Crowd Behavior in Dense Urban Environments using Local Laws. In Proceedings of Theory and Practice of Computer Graphics, 2003. 122–129. https://doi.org/10.1109/TPCG.2003.1206939
[11] Matthias Luber, Johannes A. Stork, Gian Diego Tipaldi, and Kai O. Arras. 2010. People tracking with human motion predictions from social forces. In Robotics and Automation (ICRA), 2010 IEEE International Conference on. IEEE, 464–469.
[12] Ramin Mehran, Alexis Oyama, and Mubarak Shah. 2009. Abnormal crowd behavior detection using social force model. In Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on. IEEE, 935–942.
[13] Jan Ondřej, Julien Pettré, Anne-Hélène Olivier, and Stéphane Donikian. 2010. A synthetic-vision based steering approach for crowd simulation. ACM Transactions on Graphics 29, 4 (2010), 1. https://doi.org/10.1145/1778765.1778860
[14] Razvan Pascanu, Tomas Mikolov, and Yoshua Bengio. 2013. On the difficulty of training recurrent neural networks. In International Conference on Machine Learning. 1310–1318.
[15] Claudio Pedica and Hannes Högni Vilhjálmsson. 2010. Spontaneous avatar behavior for human territoriality. Applied Artificial Intelligence 24, 6 (2010), 575–593.
[16] Stefano Pellegrini, Andreas Ess, Konrad Schindler, and Luc Van Gool. 2009. You'll never walk alone: Modeling social behavior for multi-target tracking. In Computer Vision, 2009 IEEE 12th International Conference on. IEEE, 261–268.
[17] Steve Rabin and Michael Delp. 2008. Designing a Realistic and Unified Agent-Sensing Model. Game Programming Gems 7 (2008), 217–228.
[18] Haşim Sak, Andrew Senior, and Françoise Beaufays. 2014. Long short-term memory recurrent neural network architectures for large scale acoustic modeling. In Fifteenth Annual Conference of the International Speech Communication Association.
[19] Jill Sardegna. 2002. The Encyclopedia of Blindness and Vision Impairment. Infobase Publishing.
[20] Hans Strasburger, Ingo Rentschler, and Martin Jüttner. 2011. Peripheral vision and pattern recognition: A review. Journal of Vision 11, 5 (2011), 13–13.
[21] Peter Trautman and Andreas Krause. 2010. Unfreezing the robot: Navigation in dense, interacting crowds. In Intelligent Robots and Systems (IROS), 2010 IEEE/RSJ International Conference on. IEEE, 797–803.
[22] Peter Trautman, Jeremy Ma, Richard M. Murray, and Andreas Krause. 2013. Robot navigation in dense human crowds: the case for cooperation. In Robotics and Automation (ICRA), 2013 IEEE International Conference on. IEEE, 2153–2160.
[23] Jur Van Den Berg, Stephen J. Guy, Ming Lin, and Dinesh Manocha. 2011. Reciprocal n-body collision avoidance. In Robotics Research. Springer, 3–19.
[24] Anirudh Vemula, Katharina Mülling, and Jean Oh. 2017. Social Attention: Modeling Attention in Human Crowds. CoRR abs/1710.04689 (2017). arXiv:1710.04689 http://arxiv.org/abs/1710.04689
[25] Brian A. Wandell. 1995. Foundations of Vision. Sinauer Associates.
[26] Kota Yamaguchi, Alexander C. Berg, Luis E. Ortiz, and Tamara L. Berg. 2011. Who are you with and where are you going?. In Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on. IEEE, 1345–1352.
[27] Shuai Yi, Hongsheng Li, and Xiaogang Wang. 2015. Understanding pedestrian behaviors from stationary crowd groups. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 3488–3496.