Chapter

Human Trajectory Prediction via Neural Social Physics

Authors:
To read the full-text of this research, you can request a copy directly from the authors.

Abstract

Trajectory prediction has been widely pursued in many fields, and many model-based and model-free methods have been explored. The former include rule-based, geometric or optimization-based models, and the latter are mainly comprised of deep learning approaches. In this paper, we propose a new method combining both methodologies based on a new Neural Differential Equation model. Our new model (Neural Social Physics or NSP) is a deep neural network within which we use an explicit physics model with learnable parameters. The explicit physics model serves as a strong inductive bias in modeling pedestrian behaviors, while the rest of the network provides a strong data-fitting capability in terms of system parameter estimation and dynamics stochasticity modeling. We compare NSP with 15 recent deep learning methods on 6 datasets and improve the state-of-the-art performance by 5.56%–70%. Besides, we show that NSP has better generalizability in predicting plausible trajectories in drastically different scenarios where the density is 2–5 times as high as the testing data. Finally, we show that the physics model in NSP can provide plausible explanations for pedestrian behaviors, as opposed to black-box deep learning. Code is available: https://github.com/realcrane/Human-Trajectory-Prediction-via-Neural-Social-Physics.KeywordsHuman trajectory predictionNeural differential equations

No full-text available

Request Full-text Paper PDF

To read the full-text of this research,
you can request a copy directly from the authors.

... Most simulation systems focus on vehicle flows and exclude the role of pedestrians. More recent work includes the modeling of vehicle-pedestrian interaction such as Social Force Model [4], which has been adopted for the prediction of pedestrian motions [15,33]. ...
... Recurrent Neural Network or Transformer) which autoregressively generates future predictions based on past observations [3,12]. Some models take a generative approach and predict the embeddings of future trajectories from latent distributions to account for varying data patterns or noise using Generative Adversarial Networks (GANs) [14,19], Generative Adversarial Imitation Learning (GAIL) [6,8] or Conditional Variational Auto-Encoders (CVAEs) [5,23,30,33]. Specially designed modules are introduced when modeling interactions in multi-agent scenarios by pooling [3,13], attention operation [12,29], or Graph Neural Networks [20,23,30]. Many architectures choose to incorporate auxiliary supervision using coarse way-points [9,22] or final destinations [17,31] of agent trajectories to boost model performances. ...
Preprint
Full-text available
We present a novel data-driven simulation environment for modeling traffic in metropolitan street intersections. Using real-world tracking data collected over an extended period of time, we train trajectory forecasting models to learn agent interactions and environmental constraints that are difficult to capture conventionally. Trajectories of new agents are first coarsely generated by sampling from the spatial and temporal generative distributions, then refined using state-of-the-art trajectory forecasting models. The simulation can run either autonomously, or under explicit human control conditioned on the generative distributions. We present the experiments for a variety of model configurations. Under an iterative prediction scheme, the way-point-supervised TrajNet++ model obtained 0.36 Final Displacement Error (FDE) in 20 FPS on an NVIDIA A100 GPU.
... Recently, multiple approaches integrating SFM into neural network schemes were proposed. For example, Yue et al. [198] integrated SFM and a deep neural network in their Neural Social Physics model with learnable parameters. Gil and Sanfeliu [199] presented Social Force Generative Adversarial Network (SoFGAN) that uses a GAN and SFM to generate different plausible people trajectories reducing collisions in a scene. ...
Article
Full-text available
Navigation lies at the core of social robotics, enabling robots to navigate and interact seamlessly in human environments. The primary focus of human-aware robot navigation is minimizing discomfort among surrounding humans. Our review explores user studies, examining factors that cause human discomfort, to perform the grounding of social robot navigation requirements and to form a taxonomy of elementary necessities that should be implemented by comprehensive algorithms. This survey also discusses human-aware navigation from an algorithmic perspective, reviewing the perception and motion planning methods integral to social navigation. Additionally, the review investigates different types of studies and tools facilitating the evaluation of social robot navigation approaches, namely datasets, simulators, and benchmarks. Our survey also identifies the main challenges of human-aware navigation, highlighting the essential future work perspectives. This work stands out from other review papers, as it not only investigates the variety of methods for implementing human awareness in robot control systems but also classifies the approaches according to the grounded requirements regarded in their objectives.
Article
Full-text available
Pedestrian trajectory prediction is widely used in various applications, such as intelligent transportation systems, autonomous driving, and social robotics. Precisely forecasting surrounding pedestrians’ future trajectories can assist intelligent agents in achieving better motion planning. Currently, deep learning-based trajectory prediction methods have demonstrated superior prediction performance to traditional approaches by learning from trajectory data. However, these methods still face many challenges in improving prediction accuracy, efficiency, and reliability. In this survey, we research the main challenges in deep learning-based pedestrian trajectory prediction methods and study this problem and its solutions through literature collection and analysis. Specifically, we first investigate and analyze the existing literature and surveys on pedestrian trajectory prediction. On this basis, we summarize several main challenges faced by deep learning-based pedestrian trajectory prediction, including motion uncertainty, interaction modeling, scene understanding, data-related issues, and the interpretability of prediction models. We then summarize solutions for each challenge. Subsequently, we introduce mainstream trajectory prediction datasets and analyze the state-of-the-art (SOTA) results reported on them. Finally, we discuss potential research prospects in trajectory prediction, aiming to promote the trajectory prediction community.
Article
Full-text available
The integration of pedestrian movement analysis with Unmanned Aerial Vehicle (UAV)-based remote sensing enables comprehensive monitoring and a deeper understanding of human dynamics within urban environments, thereby facilitating the optimization of urban planning and public safety strategies. However, human behavior inherently involves uncertainty, particularly in the prediction of pedestrian trajectories. A major challenge lies in modeling the multimodal nature of these trajectories, including varying paths and targets. Current methods often lack a theoretical framework capable of fully addressing the multimodal uncertainty inherent in trajectory predictions. To tackle this, we propose a novel approach that models uncertainty from two distinct perspectives: (1) the behavioral factor, which reflects historical motion patterns of pedestrians, and (2) the stochastic factor, which accounts for the inherent randomness in future trajectories. To this end, we introduce a global framework named LSN-GTDA, which consists of a pair of symmetrical U-Net networks. This framework symmetrically distributes the semantic segmentation and trajectory prediction modules, enhancing the overall functionality of the network. Additionally, we propose a novel thermal diffusion process, based on signal and system theory, which manages uncertainty by utilizing the full response and providing interpretability to the network. Experimental results demonstrate that the LSN-GTDA method outperforms state-of-the-art approaches on benchmark datasets such as SDD and ETH-UCY, validating its effectiveness in addressing the multimodal uncertainty of pedestrian trajectory prediction.
Conference Paper
Traffic simulation is needed for planning safe routes of self-driving cars and in analyzing traffic situations of a given area. Commonly supervised learning methods of vehicle, bicycle, and pedestrian traffic models have several limitations such as drifting errors and weak generalization to novel scenarios. Reinforcement learning can address these issues but it is much slower to converge due to the large state and action spaces involved in real-world traffic. To overcome this challenge, a hybrid methodology that combines supervised learning for short-term agents kinematics and reinforcement learning for long-term trajectory planning is developed in this work, and then tested on two distinct heterogeneous traffic datasets: 1) InD dataset of intersections traffic and 2) UniD dataset of shared space traffic. The results showcase the effectiveness of this method as it outperforms the supervised learning baseline models in terms of lower average displacement errors, increased success rate, and higher survival time for simulated agents. Additionally, the generalization of this approach was demonstrated by testing it on both regular intersection traffic and shared space traffic. This approach combines the benefits of supervised learning’s ability to learn complicated systems like vehicle kinematics and reinforcement learning’s potential for long-term planning in real-world traffic situations (code and video demonstrations: https://github.com/engyasin/SLRL).
Article
Robots need to predict and react to human motions to navigate through a crowd without collisions. Many existing methods decouple prediction from planning, which does not account for the interaction between robot and human motions and can lead to the robot getting stuck. We propose SICNav, a Model Predictive Control (MPC) method that jointly solves for robot motion and predicted crowd motion in closed-loop. We model each human in the crowd to be following an Optimal Reciprocal Collision Avoidance (ORCA) scheme and embed that model as a constraint in the robot's local planner, resulting in a bilevel nonlinear MPC optimization problem. We use a KKT-reformulation to cast the bilevel problem as a single level and use a nonlinear solver to optimize. Our MPC method can influence pedestrian motion while explicitly satisfying safety constraints in a single-robot multi-human environment. We analyze the performance of SICNav in two simulation environments and indoor experiments with a real robot to demonstrate safe robot motion that can influence the surrounding humans. We also validate the trajectory forecasting performance of ORCA on a human trajectory dataset. Code: https://github.com/sepsamavi/safe-interactive-crowdnav.git .
Article
We present a new large dataset of indoor human and robot navigation and interaction, called THÖR-MAGNI, that is designed to facilitate research on social human navigation: for example, modeling and predicting human motion, analyzing goal-oriented interactions between humans and robots, and investigating visual attention in a social interaction context. THÖR-MAGNI was created to fill a gap in available datasets for human motion analysis and HRI. This gap is characterized by a lack of comprehensive inclusion of exogenous factors and essential target agent cues, which hinders the development of robust models capable of capturing the relationship between contextual cues and human behavior in different scenarios. Unlike existing datasets, THÖR-MAGNI includes a broader set of contextual features and offers multiple scenario variations to facilitate factor isolation. The dataset includes many social human–human and human–robot interaction scenarios, rich context annotations, and multi-modal data, such as walking trajectories, gaze-tracking data, and lidar and camera streams recorded from a mobile robot. We also provide a set of tools for visualization and processing of the recorded data. THÖR-MAGNI is, to the best of our knowledge, unique in the amount and diversity of sensor data collected in a contextualized and socially dynamic environment, capturing natural human–robot interactions.
Chapter
Full-text available
In previous chapters, we have explored advanced plenoptic imaging and reconstruction techniques, enabling images and videos to reach gigapixel-level resolution. This breakthrough unlocks new possibilities for a wide range of applications and industries. However, traditional computer vision methods, tailored for megapixel-level data, are ill-equipped to handle the complexities of gigapixel-level data, which often feature large-scale scenes with hundreds of objects and intricate interactions. As a result, these methods face significant limitations in both precision and efficiency.
Article
Pedestrian trajectory prediction in a first-person view has recently attracted much attention due to its importance in autonomous driving. Recent work utilizes pedestrian character information, i.e., action and appearance, to improve the learned trajectory embedding and achieves state-of-the-art performance. However, it neglects the invalid and negative pedestrian character information, which is harmful to trajectory representation and thus leads to performance degradation. To address this issue, we present a two-stream sparse-character-based network (TSNet) for pedestrian trajectory prediction. Specifically, TSNet learns the negative-removed characters in the sparse character representation stream to improve the trajectory embedding obtained in the trajectory representation stream. Moreover, to model the negative-removed characters, we propose a novel sparse character graph, including the sparse category and sparse temporal character graphs, to learn the different effects of various characters in category and temporal dimensions, respectively. Extensive experiments on two first-person view datasets, PIE and JAAD, show that our method outperforms existing state-of-theart methods. In addition, ablation studies demonstrate different effects of various characters and prove that TSNet outperforms approaches without eliminating negative characters
Article
Recently, heatmap-oriented approaches have demonstrated their state-of-the-art performance in pedestrian trajectory prediction by exploiting scene information from input images before running the encoder. To align the image and trajectory information, existing methods centre the scene images to agents' last observed locations or convert trajectory sequences into images. Such alignment processes cause repetitive executions of the scene encoder for each pedestrian in an input image while there are often many pedestrians in an image, thus leading to significant memory consumption. In this paper, we address this problem by fully decoupling scene and trajectory feature extractions so that the scene information is only encoded once for an input image regardless of the number of pedestrians in the image. To do this, we directly extract temporal information from trajectories in a global pixel coordinate system. Then, we propose a transformer-based heatmap decoder to model the complex interaction between high-level trajectory and image features via trajectory self-attention, trajectory-to-image cross-attention and image-to-trajectory cross-attention layers. We also introduce scene counterfactual learning to alleviate the over-focusing on the trajectory features and knowledge transfer from Segment Anything Model to simplify the training. Our experiments show that our framework shows highly competitive performance on multiple benchmarks, demonstrating scene-compliant predictions on complex terrains and much less memory consumption when handling multi-pedestrians. Code is publicly available at https://github.com/HRHLALALA/Decouple-Traj .
Article
Full-text available
In this paper, we assess the state of the art in pedestrian trajectory prediction within the context of generating single trajectories, a critical aspect aligning with the requirements in autonomous systems. The evaluation is conducted on the widely-used ETH/UCY dataset where the Average Displacement Error (ADE) and the Final Displacement Error (FDE) are reported. Alongside this, we perform an ablation study to investigate the impact of the observed motion history on prediction performance. To evaluate the scalability of each approach when confronted with varying amounts of agents, the inference time of each model is measured. Following a quantitative analysis, the resulting predictions are compared in a qualitative manner, giving insight into the strengths and weaknesses of current approaches. The results demonstrate that although a constant velocity model (CVM) provides a good approximation of the overall dynamics in the majority of cases, additional features need to be incorporated to reflect common pedestrian behavior observed. Therefore, this study presents a data-driven analysis with the intent to guide the future development of pedestrian trajectory prediction algorithms.
Article
Forecasting the motion of others in shared spaces is a key for intelligent agents to operate safely and smoothly. We present an approach for probabilistic prediction of pedestrian motion incorporating various context cues. Our approach is based on goal-oriented prediction, yielding interpretable results for the predicted pedestrian intention, even without the prior knowledge of goal positions. By using Markov chains, the resulting probability distribution is deterministic—a beneficial property for motion planning or risk assessment in automated and assisted driving. Our approach outperforms a physics-based approach and improves over state-of-the-art approaches by reducing standard deviations of prediction errors and improving robustness against realistic, noisy measurements.
Chapter
In the development of autonomous driving systems, pedestrian trajectory prediction plays a crucial role. Existing models still face some challenges in capturing the accuracy of complex pedestrian actions in different environments and in handling large-scale data and real-time prediction efficiency. To address this, we have designed a novel Complex Gated Recurrent Unit (CGRU) model, cleverly combining the spatial expressiveness of complex numbers with the efficiency of Gated Recurrent Unit networks to establish a lightweight model. Moreover, we have incorporated a social force model to further develop a Social Complex Gated Recurrent Unit (S-CGRU) model specifically for predicting pedestrian trajectories. To improve computational efficiency, we conducted an in-depth study of the pedestrian’s attention field of view in different environments to optimize the amount of information processed and increase training efficiency. Experimental verification on six public datasets confirms that S-CGRU model significantly outperforms other baseline models not only in prediction accuracy but also in computational efficiency, validating the practical value of our model in pedestrian trajectory prediction.
Chapter
Pedestrian trajectory prediction is a fundamental task in applications such as autonomous driving, robot navigation, and advanced video surveillance. Since human motion behavior is inherently unpredictable, resembling a process of decision-making and intrinsic motivation, it naturally exhibits multimodality and uncertainty. Therefore, predicting multi-modal future trajectories in a reasonable manner poses challenges. The goal of multi-modal pedestrian trajectory prediction is to forecast multiple socially plausible future motion paths based on the historical motion paths of agents. In this paper, we propose a multi-modal pedestrian trajectory prediction method based on conditional variational auto-encoder. Specifically, the core of the proposed model is a conditional variational auto-encoder architecture that learns the distribution of future trajectories of agents by leveraging random latent variables conditioned on observed past trajectories. The encoder models the channel and temporal dimensions of historical agent trajectories sequentially, incorporating channel attention and self-attention to dynamically extract spatio-temporal features of observed past trajectories. The decoder is bidirectional, first estimating the future trajectory endpoints of the agents and then using the estimated trajectory endpoints as the starting position for the backward decoder to predict future trajectories from both directions, reducing cumulative errors over longer prediction ranges. The proposed model is evaluated on the widely used ETH/UCY pedestrian trajectory prediction benchmark and achieves state-of-the-art performance.
Article
This article proposes a means of autonomous mobile robot navigation in dense crowds based on predicting pedestrians’ future trajectories. The method includes a pedestrian trajectory prediction for a running mobile robot and spatiotemporal path planning for when the path crosses with pedestrians. The predicted trajectories are converted into a time series of cost maps, and the robot achieves smooth navigation without dodging to the right or left in crowds; the path planner does not require a long-term prediction. The results of an evaluation implementing this method in a real robot in a science museum show that the trajectory prediction works. Moreover, the proposed planning’s arrival times is 26.4% faster than conventional 2D path planning’s arrival time in a simulation of navigation in a crowd of 50 people.
Conference Paper
Full-text available
Differentiable physics modeling combines physics models with gradient-based learning to provide model explicability and data efficiency. It has been used to learn dynamics, solve inverse problems and facilitate design, and is at its inception of impact. Current successes have concentrated on general physics models such as rigid bodies, deformable sheets, etc, assuming relatively simple structures and forces. Their granularity is intrinsically coarse and therefore incapable of modelling complex physical phenomena. Fine-grained models are still to be developed to incorporate sophisticated material structures and force interactions with gradient-based learning. Following this motivation, we propose a new dif-ferentiable fabrics model for composite materials such as cloths, where we dive into the granularity of yarns and model individual yarn physics and yarn-to-yarn interactions. To this end, we propose several differentiable forces, whose counterparts in empirical physics are indifferentiable, to facilitate gradient-based learning. These forces, albeit applied to cloths, are ubiquitous in various physical systems. Through comprehensive evaluation and comparison, we demonstrate our model's explicability in learning meaningful physical parameters, versatility in incorporating complex physical structures and heterogeneous materials, data-efficiency in learning, and high-fidelity in capturing subtle dynamics. Code is available in: https://github.com/realcrane/Fine-grained-Differentiable-P hysics-A-Yarn-level-Model-for-Fabrics.git
Article
Full-text available
Despite the significant progress over the last 50 years in simulating flow problems using numerical discretization of the Navier–Stokes equations (NSE), we still cannot incorporate seamlessly noisy data into existing algorithms, mesh-generation is complex, and we cannot tackle high-dimensional problems governed by parametrized NSE. Moreover, solving inverse flow problems is often prohibitively expensive and requires complex and expensive formulations and new computer codes. Here, we review flow physics-informed learning, integrating seamlessly data and mathematical models, and implement them using physics-informed neural networks (PINNs). We demonstrate the effectiveness of PINNs for inverse problems related to three-dimensional wake flows, supersonic flows, and biomedical flows.
Article
Full-text available
Pedestrian trajectory prediction is one of the main concerns of computer vision problems in the automotive industry, especially in the field of advanced driver assistance systems. The ability to anticipate the next movements of pedestrians on the street is a key task in many areas, e.g., self-driving auto vehicles, mobile robots or advanced surveillance systems, and they still represent a technological challenge. The performance of state-of-the-art pedestrian trajectory prediction methods currently benefits from the advancements in sensors and associated signal processing technologies. The current paper reviews the most recent deep learning-based solutions for the problem of pedestrian trajectory prediction along with employed sensors and afferent processing methodologies, and it performs an overview of the available datasets, performance metrics used in the evaluation process, and practical applications. Finally, the current work exposes the research gaps from the literature and outlines potential new research directions.
Article
Full-text available
Despite great progress in simulating multiphysics problems using the numerical discretization of partial differential equations (PDEs), one still cannot seamlessly incorporate noisy data into existing algorithms, mesh generation remains complex, and high-dimensional problems governed by parameterized PDEs cannot be tackled. Moreover, solving inverse problems with hidden physics is often prohibitively expensive and requires different formulations and elaborate computer codes. Machine learning has emerged as a promising alternative, but training deep neural networks requires big data, not always available for scientific problems. Instead, such networks can be trained from additional information obtained by enforcing the physical laws (for example, at random points in the continuous space-time domain). Such physics-informed learning integrates (noisy) data and mathematical models, and implements them through neural networks or other kernel-based regression networks. Moreover, it may be possible to design specialized network architectures that automatically satisfy some of the physical invariants for better accuracy, faster training and improved generalization. Here, we review some of the prevailing trends in embedding physics into machine learning, present some of the current capabilities and limitations and discuss diverse applications of physics-informed learning both for forward and inverse problems, including discovering hidden physics and tackling high-dimensional problems.
Article
Full-text available
We describe a new algorithm for the generation of high quality tetrahedral meshes using artificial neural networks. The goal is to generate close-to-optimal meshes in the sense that the error in the computed finite element (FE) solution (for a target system of partial differential equations (PDEs)) is as small as it could be for a prescribed number of nodes or elements in the mesh. In this paper we illustrate and investigate our proposed approach by considering the equations of linear elasticity, solved on a variety of three-dimensional geometries. This class of PDE is selected due to its equivalence to an energy minimization problem, which therefore allows a quantitative measure of the relative accuracy of different meshes (by comparing the energy associated with the respective FE solutions on these meshes). Once the algorithm has been introduced it is evaluated on a variety of test problems, each with its own distinctive features and geometric constraints, in order to demonstrate its effectiveness and computational efficiency.
Article
Full-text available
The real‐time simulation of human crowds has many applications. Simulating how the people in a crowd move through an environment is an active and ever‐growing research topic. Most research focuses on microscopic (or ‘agent‐based’) crowd‐simulation methods that model the behavior of each individual person, from which collective behavior can then emerge. This state‐of‐the‐art report analyzes how the research on microscopic crowd simulation has advanced since the year 2010. We focus on the most popular research area within the microscopic paradigm, which is local navigation, and most notably collision avoidance between agents. We discuss the four most popular categories of algorithms in this area (force‐based, velocity‐based, vision‐based, and data‐driven) that have either emerged or grown in the last decade. We also analyze the conceptual and computational (dis)advantages of each category. Next, we extend the discussion to other types of behavior or navigation (such as group behavior and the combination with path planning), and we review work on evaluating the quality of simulations. Based on the observed advancements in the 2010s, we conclude by predicting how the research area of microscopic crowd simulation will evolve in the future. Overall, we expect a significant growth in the area of data‐driven and learning‐based agent navigation, and we expect an increasing number of methods that re‐group multiple ‘levels’ of behavior into one principle. Furthermore, we observe a clear need for new ways to analyze (real or simulated) crowd behavior, which is important for quantifying the realism of a simulation and for choosing the right algorithms at the right time.
Article
Full-text available
When overpopulated cities face frequent crowded events like strikes, demonstrations, parades or other sorts of people gatherings, they are confronted to multiple security issues. To mitigate these issues, security forces are often involved to monitor the gatherings and to ensure the security of their participants. However, when access to technology is limited, the security forces can quickly become overwhelmed. Fortunately, more and more important smart cities are adopting the concept of intelligent surveillance systems. In these situations, intelligent surveillance systems require the most advanced techniques of crowd analysis to monitor crowd events properly. In this review, we explore various studies related to crowd analysis. Crowd analysis is commonly broken down into two major branches: crowd statistics and crowd behavior analysis. When crowd statistics determines the Level Of Service (LoS) of a crowded scene, crowd behavior analysis describes the motion patterns and the activities that are observed in a scene. One of the hottest topics of crowd analysis is anomaly detection. Although a unanimous definition of anomaly has not yet been met, each of crowd analysis subtopics can be subjected to abnormality. The purpose of our review is to find subareas, in crowd analysis, that are still unexplored or that seem to be rarely addressed through the prism of Deep Learning.
Article
Full-text available
Due to the interaction and external interference, the crowds will constantly and dynamically adjust their evacuation path in the evacuation process to achieve the purpose of rapid evacuation. The information from previous process can be used to modify the current evacuation control information to achieve a better evacuation effect, and iterative learning control can achieve an effective prediction of the expected path within a limited running time. In order to depict this process, the social force model is improved based on an iterative extended state observer so that the crowds can move along the optimal evacuation path. First, the objective function of the optimal evacuation path is established in the improved model, and an iterative extended state observer is designed to get the estimated value. Second, the above model is verified through simulation experiments. The results show that, as the number of iterations increases, the evacuation time shows a trend of first decreasing and then increasing.
Chapter
Full-text available
We introduce a novel approach to automatic unstructured mesh generation using machine learning to predict an optimal finite element mesh for a previously unseen problem. The framework that we have developed is based around training an artificial neural network (ANN) to guide standard mesh generation software, based upon a prediction of the required local mesh density throughout the domain. We describe the training regime that is proposed, based upon the use of a posteriori error estimation, and discuss the topologies of the ANNs that we have considered. We then illustrate performance using two standard test problems, a single elliptic partial differential equation (PDE) and a system of PDEs associated with linear elasticity. We demonstrate the effective generation of high quality meshes for arbitrary polygonal geometries and a range of material parameters, using a variety of user-selected error norms.
Article
Full-text available
Fig. 1. Overview of our framework. Crowd simulation is a central topic in several fields including graphics. To achieve high-fidelity simulations, data has been increasingly relied upon for analysis and simulation guidance. However, the information in real-world data is often noisy, mixed and unstructured, making it difficult for effective analysis, therefore has not been fully utilized. With the fast-growing volume of crowd data, such a bottleneck needs to be addressed. In this paper, we propose a new framework which comprehensively tackles this problem. It centers at an unsupervised method for analysis. The method takes as input raw and noisy data with highly mixed multi-dimensional (space, time and dynamics) information, and automatically structure it by learning the correlations among these dimensions. The dimensions together with their correlations fully describe the scene semantics which consists of recurring activity patterns in a scene, manifested as space flows with temporal and dynamics profiles. The effectiveness and robustness of the analysis have been * Corresponding author † Corresponding author tested on datasets with great variations in volume, duration, environment and crowd dynamics. Based on the analysis, new methods for data visual-ization, simulation evaluation and simulation guidance are also proposed. Together, our framework establishes a highly automated pipeline from raw data to crowd analysis, comparison and simulation guidance. Extensive experiments and evaluations have been conducted to show the flexibility, versatility and intuitiveness of our framework.
Preprint
Full-text available
In the context of science, the well-known adage "a picture is worth a thousand words" might well be "a model is worth a thousand datasets." Scientific models, such as Newtonian physics or biological gene regulatory networks, are human-driven simplifications of complex phenomena that serve as surrogates for the countless experiments that validated the models. Recently, machine learning has been able to overcome the inaccuracies of approximate modeling by directly learning the entire set of nonlinear interactions from data. However, without any predetermined structure from the scientific basis behind the problem, machine learning approaches are flexible but data-expensive, requiring large databases of homogeneous labeled training data. A central challenge is reconciling data that is at odds with simplified models without requiring "big data". In this work we develop a new methodology, universal differential equations (UDEs), which augments scientific models with machine-learnable structures for scientifically-based learning. We show how UDEs can be utilized to discover previously unknown governing equations, accurately extrapolate beyond the original data, and accelerate model simulation, all in a time and data-efficient manner. This advance is coupled with open-source software that allows for training UDEs which incorporate physical constraints, delayed interactions, implicitly-defined events, and intrinsic stochasticity in the model. Our examples show how a diverse set of computationally-difficult modeling issues across scientific disciplines, from automatically discovering biological mechanisms to accelerating climate simulations by 15,000x, can be handled by training UDEs.
Article
Full-text available
Steering and navigation are important components of character animation systems to enable them to autonomously move in their environment. In this work, we propose a synthetic vision model that uses visual features to steer agents through dynamic environments. Our agents perceive optical flow resulting from their relative motion with the objects of the environment. The optical flow is then segmented and processed to extract visual features such as the focus of expansion and time‐to‐collision. Then, we establish the relations between these visual features and the agent motion, and use them to design a set of control functions which allow characters to perform object‐dependent tasks, such as following, avoiding and reaching. Control functions are then combined to let characters perform more complex navigation tasks in dynamic environments, such as reaching a goal while avoiding multiple obstacles. Agent's motion is achieved by local minimization of these functions. We demonstrate the efficiency of our approach through a number of scenarios. Our work sets the basis for building a character animation system which imitates human sensorimotor actions. It opens new perspectives to achieve realistic simulation of human characters taking into account perceptual factors, such as the lighting conditions of the environment.
Article
Full-text available
Controlling a crowd using multi-touch devices appeals to the computer games and animation industries, as such devices provide a high dimensional control signal that can effectively define the crowd formation and movement. However, existing works relying on pre-defined control schemes require the users to learn a scheme that may not be intuitive. We propose a data-driven gesture-based crowd control system, in which the control scheme is learned from example gestures provided by different users. In particular, we build a database with pairwise samples of gestures and crowd motions. To effectively generalize the gesture style of different users, such as the use of different numbers of fingers, we propose a set of gesture features for representing a set of hand gesture trajectories. Similarly, to represent crowd motion trajectories of different numbers of characters over time, we propose a set of crowd motion features that are extracted from a Gaussian mixture model. Given a run-time gesture, our system extracts the K nearest gestures from the database and interpolates the corresponding crowd motions in order to generate the run-time control. Our system is accurate and efficient, making it suitable for real-time applications such as real-time strategy games and interactive animation controls.
Conference Paper
Full-text available
We present a real-time algorithm to automatically classify the behavior or personality of a pedestrian based on his or her movements in a crowd video. Our classification criterion is based on Personality Trait theory. We present a statistical scheme that dynamically learns the behavior of every pedestrian and computes its motion model. This model is combined with global crowd characteristics to compute the movement patterns and motion dynamics and use them for crowd prediction. Our learning scheme is general and we highlight its performance in identifying the personality of different pedestrians in low and high density crowd videos. We also evaluate the accuracy by comparing the results with a user study.
Conference Paper
Full-text available
Automatically recognizing activities in video is a classic problem in vision and helps to understand behaviors, describe scenes and detect anomalies. We propose an unsupervised method for such purposes. Given video data, we discover recurring activity patterns that appear, peak, wane and disappear over time. By using non-parametric Bayesian methods, we learn coupled spatial and temporal patterns with minimum prior knowledge. To model the temporal changes of patterns, previous works compute Markovian progressions or locally continuous motifs whereas we model time in a globally continuous and non-Markovian way. Visually, the patterns depict flows of major activities. Temporally, each pattern has its own unique appearance-disappearance cycles. To compute compact pattern representations, we also propose a hybrid sampling method. By combining these patterns with detailed environment information, we interpret the semantics of activities and report anomalies. Also, our method fits data better and detects anomalies that were difficult to detect previously.
Thesis
Full-text available
To well understand crowd behavior, microscopic models have been developed in recent decades, in which an individual’s behavioral/psychological status can be modeled and simulated. A well-known model is the social-force model innovated by physicists (Helbing and Molnar, 1995; Helbing, Farkas and Vicsek, 2000; Helbing et al., 2002). This model has been widely accepted and mainly used in simulation of crowd evacuation in the past decade. A problem, however, is that the testing results of the model were not explained in consistency with the psychological findings, resulting in misunderstanding of the model by psychologists. This paper will bridge the gap between psychological studies and physical explanation about this model. We reinterpret this physics-based model from a psychological perspective, clarifying that the model is consistent with psychological studies on stress, including time-related stress and interpersonal stress. Based on the conception of stress, we renew the model at both micro-and-macro level, referring to multi-agent simulation in a microscopic sense and fluid-based analysis in a macroscopic sense. The cognition and behavior of individual agents are critically modeled as response to environmental stimuli. Existing simulation results such as faster-is-slower effect will be reinterpreted by Yerkes–Dodson law, and herding and grouping effect are further discussed by integrating attraction into the social force. In brief the social-force model exhibits a bridge between the physics laws and psychological principles regarding crowd motion, and this paper will renew and reinterpret the model on the foundations of psychological studies.
Conference Paper
Full-text available
Crowd simulation has been an active and important area of research in the field of interactive 3D graphics for several decades. However, only recently has there been an increased focus on evaluating the fidelity of the results with respect to real-world situations. The focus to date has been on analyzing the properties of low-level features such as pedestrian trajectories, or global features such as crowd densities. We propose a new approach based on finding latent Path Patterns in both real and simulated data in order to analyze and compare them. Unsupervised clustering by non-parametric Bayesian inference is used to learn the patterns, which themselves provide a rich visualization of the crowd's behaviour. To this end, we present a new Stochastic Variational Dual Hierarchical Dirichlet Process (SV-DHDP) model. The fidelity of the patterns is then computed with respect to a reference, thus allowing the outputs of different algorithms to be compared with each other and/or with real data accordingly.
Article
Full-text available
Pedestrian crowds often have been modeled as many-particle system including microscopic multi-agent simulators. One of the key challenges is to unearth governing principles that can model pedestrian movement, and use them to reproduce paths and behaviors that are frequently observed in human crowds. To that effect, we present a novel crowd simulation algorithm that generates pedestrian trajectories that exhibit the speed-density relationships expressed by the Fundamental Diagram. Our approach is based on biomechanical principles and psychological factors. The overall formulation results in better utilization of free space by the pedestrians and can be easily combined with well-known multi-agent simulation techniques with little computational overhead.We are able to generate human-like dense crowd behaviors in large indoor and outdoor environments and validate the results with captured real-world crowd trajectories.
Conference Paper
Full-text available
Car navigation systems usually offer different choices to route calculation, for example the fastest, the shortest, and nowadays also the one with minimum energy consumption. For pedestrians the choice typically is shortest route, but it is not clear if the shortest path is the best one. A pedestrian has a tendency to energy minimization, for example when choosing the walking speed. This paper presents a novel energy based routing algorithm for pedestrians. The energy model is derived from energy expenditure measurements found from literature. Route planning is based on a road database utilizing an accurate digital elevation model.
Article
We present LCollision, a learning-based method that synthesizes collision-free 3D human poses. At the crux of our approach is a novel deep architecture that simultaneously decodes new human poses from the latent space and predicts colliding body parts. These two components of our architecture are used as the objective function and surrogate hard constraints in a constrained optimization for collision-free human pose generation. A novel aspect of our approach is the use of a bilevel autoencoder that decomposes whole-body collisions into groups of collisions between localized body parts. By solving the constrained optimizations, we show that a significant amount of collision artifacts can be resolved. Furthermore, in a large test set of 2.5 × 10 6 randomized poses from SCAPE, our architecture achieves a collision-prediction accuracy of 94.1% with 80× speedup over exact collision detection algorithms. To the best of our knowledge, LCollision is the first approach that accelerates collision detection and resolves penetrations using a neural network.
Article
Trajectory prediction aims to predict the movement trend of the agents like pedestrians, bikers, vehicles. It is helpful to analyze and understand human activities in crowded spaces and widely applied in many areas such as surveillance video analysis and autonomous driving systems. Thanks to the success of deep learning, trajectory prediction has made significant progress. The current methods are dedicated to studying the agents’ future trajectories under the social interaction and the sceneries’ physical constraints. Moreover, how to deal with these factors still catches researchers’ attention. However, they ignore the Semantic Shift Phenomenon when modeling these interactions in various prediction sceneries. There exist several kinds of semantic deviations inner or between social and physical interactions, which we call the “Gap”. In this paper, we propose a Contextual Semantic Consistency Network (CSCNet) to predict agents’ future activities with powerful and efficient context constraints. We utilize a well-designed context-aware transfer to obtain the intermediate representations from the scene images and trajectories. Then we eliminate the differences between social and physical interactions by aligning activity semantics and scene semantics to cross the Gap. Experiments demonstrate that CSCNet performs better than most of the current methods quantitatively and qualitatively.
Chapter
Reasoning about human motion is an important prerequisite to safe and socially-aware robotic navigation. As a result, multi-agent behavior prediction has become a core component of modern human-robot interactive systems, such as self-driving cars. While there exist many methods for trajectory forecasting, most do not enforce dynamic constraints and do not account for environmental information (e.g., maps). Towards this end, we present Trajectron++, a modular, graph-structured recurrent model that forecasts the trajectories of a general number of diverse agents while incorporating agent dynamics and heterogeneous data (e.g., semantic maps). Trajectron++ is designed to be tightly integrated with robotic planning and control frameworks; for example, it can produce predictions that are optionally conditioned on ego-agent motion plans. We demonstrate its performance on several challenging real-world trajectory forecasting datasets, outperforming a wide array of state-of-the-art deterministic and generative methods.
Chapter
This paper studies the problem of predicting future trajectories of people in unseen cameras of novel scenarios and views. We approach this problem through the real-data-free setting in which the model is trained only on 3D simulation data and applied out-of-the-box to a wide variety of real cameras. We propose a novel approach to learn robust representation through augmenting the simulation training data such that the representation can better generalize to unseen real-world test data. The key idea is to mix the feature of the hardest camera view with the adversarial feature of the original view. We refer to our method as SimAug. We show that SimAug achieves promising results on three real-world benchmarks using zero real training data, and state-of-the-art performance in the Stanford Drone and the VIRAT/ActEV dataset when using in-domain training data. Code and models are released at https://next.cs.cmu.edu/simaug.
Chapter
Human trajectory forecasting with multiple socially interacting agents is of critical importance for autonomous navigation in human environments, e.g., for self-driving cars and social robots. In this work, we present Predicted Endpoint Conditioned Network (PECNet) for flexible human trajectory prediction. PECNet infers distant trajectory endpoints to assist in long-range multi-modal trajectory prediction. A novel non-local social pooling layer enables PECNet to infer diverse yet socially compliant trajectories. Additionally, we present a simple “truncation-trick” for improving diversity and multi-modal trajectory prediction performance. We show that PECNet improves state-of-the-art performance on the Stanford Drone trajectory prediction benchmark by ∼20.9% and on the ETH/UCY benchmark by ∼40.8% (Code available at project homepage: https://karttikeya.github.io/publication/htf/).
Conference Paper
Over the last two decades there has been a proliferation of methods for simulating crowds of humans. As the number of different methods and their complexity increases, it becomes increasingly unrealistic to expect researchers and users to keep up with all the possible options and trade-offs. We therefore see the need for tools that can facilitate both domain experts and non-expert users of crowd simulation in making high-level decisions about the best simulation methods to use in different scenarios. In this paper, we leverage trajectory data from human crowds and machine learning techniques to learn a manifold which captures representative local navigation scenarios that humans encounter in real life. We show the applicability of this manifold in crowd research, including analyzing trends in simulation accuracy, and creating automated systems to assist in choosing an appropriate simulation method for a given scenario.
Article
We introduce physics-informed neural networks – neural networks that are trained to solve supervised learning tasks while respecting any given laws of physics described by general nonlinear partial differential equations. In this work, we present our developments in the context of solving two main classes of problems: data-driven solution and data-driven discovery of partial differential equations. Depending on the nature and arrangement of the available data, we devise two distinct types of algorithms, namely continuous time and discrete time models. The first type of models forms a new family of data-efficient spatio-temporal function approximators, while the latter type allows the use of arbitrarily accurate implicit Runge–Kutta time stepping schemes with unlimited number of stages. The effectiveness of the proposed framework is demonstrated through a collection of classical problems in fluids, quantum mechanics, reaction–diffusion systems, and the propagation of nonlinear shallow-water waves.
Article
Understanding human motion behavior is critical for autonomous moving platforms (like self-driving cars and social robots) if they are to navigate human-centric environments. This is challenging because human motion is inherently multimodal: given a history of human motion paths, there are many socially plausible ways that people could move in the future. We tackle this problem by combining tools from sequence prediction and generative adversarial networks: a recurrent sequence-to-sequence model observes motion histories and predicts future behavior, using a novel pooling mechanism to aggregate information across people. We predict socially plausible futures by training adversarially against a recurrent discriminator, and encourage diverse predictions with a novel variety loss. Through experiments on several datasets we demonstrate that our approach outperforms prior work in terms of accuracy, variety, collision avoidance, and computational complexity.
Article
Robots that navigate through human crowds need to be able to plan safe, efficient, and human predictable trajectories. This is a particularly challenging problem as it requires the robot to predict future human trajectories within a crowd where everyone implicitly cooperates with each other to avoid collisions. Previous approaches to human trajectory prediction have modeled the interactions between humans as a function of proximity. However, that is not necessarily true as some people in our immediate vicinity moving in the same direction might not be as important as other people that are further away, but that might collide with us in the future. In this work, we propose Social Attention, a novel trajectory prediction model that captures the relative importance of each person when navigating in the crowd, irrespective of their proximity. We demonstrate the performance of our method against a state-of-the-art approach on two publicly available crowd datasets and analyze the trained attention model to gain a better understanding of which surrounding agents humans attend to, when navigating in a crowd.
Article
Large dense crowds show aggregate behavior with reduced individual freedom of movement. We present a novel, scalable approach for simulating such crowds, using a dual representation both as discrete agents and as a single continuous system. In the continuous setting, we introduce a novel variational constraint called unilateral incompressibility, to model the large-scale behavior of the crowd, and accelerate inter-agent collision avoidance in dense scenarios. This approach makes it possible to simulate very large, dense crowds composed of up to a hundred thousand agents at nearinteractive rates on desktop computers.
Conference Paper
Humans navigate crowded spaces such as a university campus by following common sense rules based on social etiquette. In this paper, we argue that in order to enable the design of new target tracking or trajectory forecasting methods that can take full advantage of these rules, we need to have access to better data in the first place. To that end, we contribute a new large-scale dataset that collects videos of various types of targets (not just pedestrians, but also bikers, skateboarders, cars, buses, golf carts) that navigate in a real world outdoor environment such as a university campus. Moreover, we introduce a new characterization that describes the “social sensitivity” at which two targets interact. We use this characterization to define “navigation styles” and improve both forecasting models and state-of-the-art multi-target tracking–whereby the learnt forecasting models help the data association step.
Article
In this work, we propose an unsupervised approach for crowd scene anomaly detection and localization using a social network model. Using a window-based approach, a video scene is first partitioned at spatial and temporal levels, and a set of spatio-temporal cuboids is constructed. Objects exhibiting scene dynamics are detected and the crowd behavior in each cuboid is modeled using local social networks (LSN). From these local social networks, a global social network (GSN) is built for the current window to represent the global behavior of the scene. As the scene evolves with time, the global social network is updated accordingly using LSNs, to detect and localize abnormal behaviors. We demonstrate the effectiveness of the proposed Social Network Model (SNM) approach on a set of benchmark crowd analysis video sequences. The experimental results reveal that the proposed method outperforms the majority, if not all, of the state-of-the-art methods in terms of accuracy of anomaly detection.
Article
We present a novel approach for analyzing the quality of multi-agent crowd simulation algorithms. Our approach is data-driven, taking as input a set of user-defined metrics and reference training data, either synthetic or from video footage of real crowds. Given a simulation, we formulate the crowd analysis problem as an anomaly detection problem and exploit state-of-the-art outlier detection algorithms to address it. To that end, we introduce a new framework for the visual analysis of crowd simulations. Our framework allows us to capture potentially erroneous behaviors on a per-agent basis either by automatically detecting outliers based on individual evaluation metrics or by accounting for multiple evaluation criteria in a principled fashion using Principle Component Analysis and the notion of Pareto Optimality. We discuss optimizations necessary to allow real-time performance on large datasets and demonstrate the applicability of our framework through the analysis of simulations created by several widely-used methods, including a simulation from a commercial game.