Figure 1
(A) The New York Grand Central station. Two semantic regions learned by our algorithm are plotted on the background image. They correspond to paths of pedestrians. Colors indicate different moving directions of pedestrians. Activities observed on the same semantic region have similar semantic interpretations, such as "pedestrians enter the hall from entrance a and leave from exit b". (B) Examples of tracklets collected in the scene. The goal of this work is to learn semantic regions from tracklets.

Source publication
Conference Paper
Full-text available
In this paper, a Random Field Topic (RFT) model is proposed for semantic region analysis from motions of objects in crowded scenes. Unlike existing approaches, which learn semantic regions either from optical flows or from complete trajectories, our model assumes that fragments of trajectories (called tracklets) are observed in crowded scene...
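The abstract above describes learning semantic regions from tracklets. As a rough illustration of the kind of preprocessing such topic-model-based approaches rely on (not the RFT model itself), the sketch below quantizes tracklet points into location-direction codewords; the cell size and the four-direction quantization are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def tracklet_to_codewords(points, frame_w, cell=10, n_dirs=4):
    """Quantize a tracklet (sequence of (x, y) points) into discrete
    location-direction codewords, a common preprocessing step before
    topic-model-based semantic region learning.

    Cell size and the 4-direction quantization are illustrative choices.
    """
    points = np.asarray(points, dtype=float)
    n_cells_x = int(np.ceil(frame_w / cell))
    words = []
    for (x0, y0), (x1, y1) in zip(points[:-1], points[1:]):
        # Spatial bin of the starting point.
        cx, cy = int(x0 // cell), int(y0 // cell)
        # Moving direction quantized into n_dirs sectors.
        angle = np.arctan2(y1 - y0, x1 - x0) % (2 * np.pi)
        d = int(angle / (2 * np.pi / n_dirs)) % n_dirs
        # One integer codeword per (cell, direction) pair.
        words.append((cy * n_cells_x + cx) * n_dirs + d)
    return words

# Example: a short tracklet moving roughly to the right.
print(tracklet_to_codewords([(12, 40), (18, 41), (25, 43)], frame_w=720))
```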

Contexts in source publication

Context 1
... semantic regions correspond to different paths commonly taken by objects, and activities observed in the same semantic region have similar semantic interpretation. Some examples are shown in Figure 1 (A). Semantic regions can be used for activity analysis in a single camera view [21,9,10,24,20] or in multiple camera views [13,11,23] at later stages. ...
Context 2
... worked well for traffic scenes where, at different times, different subsets of activities were observed. However, our experiments show that it fails in a scene like Figure 1 (A), where all types of activities happen together most of the time with significant temporal overlaps. In this type of scene, the temporal co-occurrence information is not discriminative enough. ...

Citations

... For trajectory prediction, there are datasets that only contain pedestrians, such as the Subway Station dataset [129] and the CUHK Crowd Dataset [130] used by Xu et al. [30], and the Town Center Dataset [131] used by Xue et al. [49]. Besides, there are several datasets that contain urban traffic, such as ApolloScape [132] as used by Ma et al. [59], Interaction Dataset [133] as used by Li et al. [47], and nuScenes [134] as used by Yao et al. [25]. ...
Article
Full-text available
The prediction of pedestrian behavior is essential for automated driving in urban traffic and has attracted increasing attention in the vehicle industry. This task is challenging because pedestrian behavior is driven by various factors, including their individual properties, the interactions with other road users, and the interactions with the environment. Deep learning approaches have become increasingly popular because of their superior performance in complex scenarios compared to traditional approaches such as the social force or constant velocity models. In this paper, we provide a comprehensive review of deep learning-based approaches for pedestrian behavior prediction. We review and categorize a large selection of scientific contributions covering both trajectory and intention prediction from the last five years. We categorize existing works by prediction tasks, input data, model features, and network structures. Besides, we provide an overview of existing datasets and the evaluation metrics. We analyze, compare, and discuss the performance of existing work. Finally, we point out the research gaps and outline possible directions for future research.
... The research directions related to changes and dynamics have received, and are still receiving, a substantial amount of attention within the computer vision (CV) community. The research topics include, but are not limited to, surveillance and anomaly detection (Owens and Hunter 2000; Weiming Hu et al. 2004; Piciarelli et al. 2005; Zhouyu Fu et al. 2005; Weiming Hu et al. 2006; Piciarelli and Foresti 2006; Anjum and Cavallaro 2008; Piotto et al. 2009; Khan et al. 2016; Atev et al. 2006; Nawaz et al. 2014), crowd analysis (Cheriyadat and Radke 2008; Khan et al. 2016; Zhou et al. 2011; Sharma and Guha 2016), and appearance change (Lowry et al. 2016). Not all these research directions play an equally important role in the context of MoDs; thus, in the following, we will focus on the three fields presenting the highest intersection with MoDs, namely (i) anomaly detection, (ii) crowd monitoring, and (iii) trajectory clustering. It is important to emphasize that the cut between fields is very often arbitrary and contributions exist at the intersection of fields. ...
Article
Full-text available
Robotic mapping provides spatial information for autonomous agents. Depending on the tasks they seek to enable, the maps created range from simple 2D representations of the environment geometry to complex, multilayered semantic maps. This survey article is about maps of dynamics (MoDs), which store semantic information about typical motion patterns in a given environment. Some MoDs use trajectories as input, and some can be built from short, disconnected observations of motion. Robots can use MoDs, for example, for global motion planning, improved localization, or human motion prediction. Accounting for the increasing importance of maps of dynamics, we present a comprehensive survey that organizes the knowledge accumulated in the field and identifies promising directions for future work. Specifically, we introduce field-specific vocabulary, summarize existing work according to a novel taxonomy, and describe possible applications and open research problems. We conclude that the field is mature enough, and we expect that maps of dynamics will be increasingly used to improve robot performance in real-world use cases. At the same time, the field is still in a phase of rapid development where novel contributions could significantly impact this research area.
... (3) Leveraging the generated spatiotemporal crowd data, we develop real-time congestion alerts and future-time prediction visualizations to assist with manual crowd congestion monitoring. Furthermore, to quantify the benefits of our proposed framework, we validate our implementation with a fully annotated publicly available dataset from New York's Grand Central Station, a busy public urban location [9]. In order to demonstrate the generalizability and capacity in providing qualitative analysis, we also collect an unannotated video from a stadium during a crowded football game. ...
... To quantify the performance of the modular framework, the three components, namely the trajectory generation, congestion prediction, and congestion visualization were experimented on with the New York Grand Central Station (GCS) dataset, collected by Zhou et al. [9]. Point-wise individual trajectories were manually annotated by Yi et al. [43]. ...
... We also design a GCN-GRU model that demonstrates how CMGraphs may be utilized for spatiotemporal forecasting. To evaluate the framework, quantitative experiments are conducted on an annotated public dataset at the Grand Central Station, which has been widely used by researchers studying crowd scenes [9,43]. To further illustrate the practical application of the framework, qualitative congestion analysis is conducted on supplementary video captured at Stanford Stadium. ...
Article
Full-text available
Crowd congestion is one of the main causes of modern public safety issues such as stampedes. Conventional crowd congestion monitoring using closed-circuit television (CCTV) video surveillance relies on manual observation, which is tedious and often error-prone in public urban spaces where crowds are dense and occlusions are prominent. With the aim of managing crowded spaces safely, this study proposes a framework that combines spatial and temporal information to automatically map the trajectories of individual occupants, as well as to assist in real-time congestion monitoring and prediction. Through exploiting both features from CCTV footage and spatial information of the public space, the framework fuses raw CCTV video and floor plan information to create visual aids for crowd monitoring, as well as a sequence of crowd mobility graphs (CMGraphs) to store spatiotemporal features. This framework uses deep learning-based computer vision models, geometric transformations, and Kalman filter-based tracking algorithms to automate the retrieval of crowd congestion data, specifically the spatiotemporal distribution of individuals and the overall crowd flow. The resulting collective crowd movement data is then stored in the CMGraphs, which are designed to facilitate congestion forecasting at key exit/entry regions. We demonstrate our framework on two videos, one from a public train station dataset and the other recorded at a stadium following a crowded football game. Using both qualitative and quantitative insights from the experiments, we demonstrate that the suggested framework can be useful in helping urban planners and infrastructure operators manage congestion hazards.
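The framework above mentions Kalman filter-based tracking of individuals. As a hedged illustration of the basic mechanism (a generic constant-velocity filter, not the paper's exact tracker or noise settings), the sketch below runs one predict/update cycle on a 2D pedestrian position.

```python
import numpy as np

# Constant-velocity Kalman filter for a 2D pedestrian track.
# State: [x, y, vx, vy]; measurement: [x, y]. Noise levels are illustrative.
dt = 1.0
F = np.array([[1, 0, dt, 0],
              [0, 1, 0, dt],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)          # state transition
H = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)          # measurement model
Q = np.eye(4) * 0.01                               # process noise
R = np.eye(2) * 1.0                                # measurement noise

def kalman_step(x, P, z):
    """One predict/update cycle given state x, covariance P, measurement z."""
    # Predict
    x = F @ x
    P = F @ P @ F.T + Q
    # Update
    y = z - H @ x                                  # innovation
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)                 # Kalman gain
    x = x + K @ y
    P = (np.eye(4) - K @ H) @ P
    return x, P

x, P = np.array([100.0, 200.0, 0.0, 0.0]), np.eye(4)
x, P = kalman_step(x, P, np.array([102.0, 198.5]))
print(x[:2])   # filtered position estimate
```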
... Based on the principle of crowd motion detection [4,29], the existing methods of crowd motion segmentation and clustering can be classified into three main categories, including the flow field model-based [30][31][32], probability model-based [33][34][35], and similarity-based methods [36][37][38][39][40]. The first category uses flow field models to simulate image spatial segmentation and produce spatially continuous segments consequently. ...
Article
Full-text available
Coherent motions depict the individuals' collective movements in widely existing moving crowds in physical, biological, and other systems. In recent years, similarity-based clustering algorithms, particularly the Coherent Filtering (CF) clustering approach, have achieved wide-scale popularity and acceptance in the field of coherent motion detection. In this work, a tracklet-before-clustering initialization strategy is introduced to enhance coherent motion detection. Moreover, a Hierarchical Tracklet Association (HTA) algorithm is proposed to address the disconnected KLT tracklets problem of the input motion feature, thereby properly repairing trajectories to optimize the CF performance of moving-crowd clustering. The experimental results showed that the proposed method is effective and capable of extracting significant motion patterns from crowd scenes. Quantitative evaluations, such as Purity, Normalized Mutual Information Index (NMI), Rand Index (RI), and F-measure (Fm), were conducted on real-world data using a large number of video clips. This work has established a key initial step toward achieving rich pattern recognition.
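The abstract reports Purity, NMI, and Rand Index for evaluating coherent-motion clustering. The sketch below shows how such external clustering metrics can be computed from ground-truth and predicted cluster labels with scikit-learn; the toy label arrays are made up for illustration, and the F-measure variant used in the paper is not reproduced.

```python
import numpy as np
from sklearn import metrics

def purity(labels_true, labels_pred):
    """Purity: fraction of samples assigned to the majority true class
    of their predicted cluster."""
    cm = metrics.cluster.contingency_matrix(labels_true, labels_pred)
    return cm.max(axis=0).sum() / cm.sum()

# Toy ground-truth and predicted cluster assignments (illustrative only).
y_true = np.array([0, 0, 0, 1, 1, 1, 2, 2])
y_pred = np.array([0, 0, 1, 1, 1, 1, 2, 2])

print("Purity:", purity(y_true, y_pred))
print("NMI:   ", metrics.normalized_mutual_info_score(y_true, y_pred))
print("RI:    ", metrics.rand_score(y_true, y_pred))
```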
... In this paper, we propose a general yet elegantly simple graph-based algorithm that can achieve high accuracy in an efficient and effective manner. An overview of our framework is illustrated in Fig. 2. First, we extract movement features as tracklets by applying a KLT feature point tracker [23], where a tracklet is a fragment of a trajectory obtained over a short period [24]. These tracklets are then used to build dynamic graph sequences based on spatial and temporal features in order to model crowd behavior. ...
... The motion features are then extracted and represented as tracklets, where a tracklet is a fragment of a trajectory obtained over a short period [24], using a KLT tracker [25] over a small window of consecutive frames. The feature detectors we have considered include the Scale Invariant Feature Transform (SIFT) and Speeded Up Robust Features (SURF). ...
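Both excerpts above describe extracting tracklets with a KLT tracker over a short window of consecutive frames. A minimal OpenCV sketch of that idea is given below; the detector settings and window length are assumptions, not the cited papers' exact configuration.

```python
import cv2
import numpy as np

def extract_tracklets(frames, window=10, max_corners=300):
    """Track corner features through a short window of grayscale frames
    with pyramidal Lucas-Kanade (KLT); each surviving feature yields one
    tracklet (a short fragment of a trajectory)."""
    prev = frames[0]
    pts = cv2.goodFeaturesToTrack(prev, maxCorners=max_corners,
                                  qualityLevel=0.01, minDistance=5)
    tracklets = [[tuple(p.ravel())] for p in pts]
    alive = list(range(len(tracklets)))
    for frame in frames[1:window]:
        nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev, frame, pts, None)
        new_pts, new_alive = [], []
        for i, (p, ok) in enumerate(zip(nxt, status.ravel())):
            if ok:                       # keep only successfully tracked points
                tracklets[alive[i]].append(tuple(p.ravel()))
                new_pts.append(p)
                new_alive.append(alive[i])
        if not new_pts:
            break
        pts, alive, prev = np.array(new_pts), new_alive, frame
    return tracklets
```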
Article
Full-text available
Detecting anomalous crowd behavioral patterns from videos is an important task in video surveillance and maintaining public safety. In this work, we propose a novel architecture to detect anomalous patterns of crowd movements via graph networks. We represent individuals as nodes and individual movements with respect to other people as the node-edge relationship of an evolving graph network. We then extract the motion information of individuals using optical flow between video frames and represent their motion patterns using graph edge weights. In particular, we detect the anomalies in crowded videos by modeling pedestrian movements as graphs and then by identifying the network bottlenecks through a max-flow/min-cut pedestrian flow optimization scheme (MFMCPOS). The experiment demonstrates that the proposed framework achieves superior detection performance compared to other recently published state-of-the-art methods. Moreover, our proposed approach has relatively low computational complexity and can be used in real-time environments, which is crucial for present-day video analytics for automated surveillance.
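The abstract above identifies congestion bottlenecks via a max-flow/min-cut formulation over a pedestrian-flow graph. The sketch below is a generic illustration of that step using networkx; the graph, capacities, and source/sink choices are invented for the example, and the paper's MFMCPOS scheme is not reproduced here.

```python
import networkx as nx

# Toy pedestrian-flow graph: nodes are scene regions, edge capacities are
# the observed pedestrian throughput between regions (illustrative numbers).
G = nx.DiGraph()
G.add_edge("entrance", "hall", capacity=30)
G.add_edge("hall", "corridor", capacity=8)     # narrow passage
G.add_edge("hall", "stairs", capacity=12)
G.add_edge("corridor", "exit", capacity=25)
G.add_edge("stairs", "exit", capacity=25)

# Max-flow equals min-cut capacity; the cut edges are the flow bottlenecks.
flow_value, (reachable, non_reachable) = nx.minimum_cut(G, "entrance", "exit")
bottlenecks = [(u, v) for u in reachable for v in G[u] if v in non_reachable]
print("max flow:", flow_value)
print("bottleneck edges:", bottlenecks)
```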
... In the following, we first introduce the two datasets and the pre-processing. [34] is a dataset collected from Grand Central Station in New York. The videos were originally collected by [53], and the trajectory dataset was later created by [34]. The dataset is much larger, meeting the requirements of deep learning; it contains both video frames and density heatmaps that can be converted from trajectory data. ...
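The excerpt notes that density heatmaps can be converted from trajectory data. A simple way to perform that conversion (bin the pedestrian positions of one frame into a grid and smooth with a Gaussian kernel) is sketched below; the grid size and kernel width are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def density_heatmap(positions, height, width, sigma=4.0):
    """Convert pedestrian positions (x, y) observed in one frame into a
    smoothed crowd-density heatmap of shape (height, width)."""
    heat = np.zeros((height, width), dtype=float)
    for x, y in positions:
        xi, yi = int(round(x)), int(round(y))
        if 0 <= yi < height and 0 <= xi < width:
            heat[yi, xi] += 1.0                 # one count per person
    return gaussian_filter(heat, sigma=sigma)   # spread counts into a density

# Illustrative positions for a single frame.
hm = density_heatmap([(120.3, 80.7), (125.0, 84.2), (300.5, 210.0)], 480, 720)
print(hm.shape, hm.sum())   # total mass is (approximately) preserved by the blur
```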
Preprint
Full-text available
Monitoring and predicting crowd movements in indoor environments are of great importance in crowd management to prevent crushing and trampling. Existing works mostly focus on individual trajectory forecasting in less crowded scenes, or on crowd counting and density estimation. Only a few works predict the crowd density distribution; however, these studies either fail to realize multi-step prediction or exploit only the density-heatmap modality, ignoring the complementary information in the corresponding video frames. Therefore, we are motivated to predict the crowd density distribution over multiple time steps to facilitate long-term prediction. In this paper, a Multi-Step Crowd Density Predictor (MSCDP), which fuses video frame sequences and the corresponding density heatmaps, is proposed to accurately forecast future crowd density heatmaps. To capture long-term periodic movement features, a long-term optical flow context memory (LOFCM) module is designed to store learnable patterns. We conducted extensive experiments on two real-world datasets. Evaluation results show that our MSCDP outperforms state-of-the-art baseline techniques and MSCDP variants in terms of various prediction errors, demonstrating the effectiveness of MSCDP and each of its key components in multi-step crowd density prediction.
... Statistical machine learning has been used for trajectory analysis in computer vision [42,15,67,26,65,11]. These methods aim to learn individual motion dynamics [76], structured latent patterns in data [65,64], anomalies [12,11], etc. They provide a certain level of explainability, but are limited in model capacity for learning from large amounts of data. ...
Preprint
Full-text available
Trajectory prediction has been widely pursued in many fields, and many model-based and model-free methods have been explored. The former include rule-based, geometric or optimization-based models, and the latter are mainly comprised of deep learning approaches. In this paper, we propose a new method combining both methodologies based on a new Neural Differential Equation model. Our new model (Neural Social Physics or NSP) is a deep neural network within which we use an explicit physics model with learnable parameters. The explicit physics model serves as a strong inductive bias in modeling pedestrian behaviors, while the rest of the network provides a strong data-fitting capability in terms of system parameter estimation and dynamics stochasticity modeling. We compare NSP with 15 recent deep learning methods on 6 datasets and improve the state-of-the-art performance by 5.56%-70%. Besides, we show that NSP has better generalizability in predicting plausible trajectories in drastically different scenarios where the density is 2-5 times as high as the testing data. Finally, we show that the physics model in NSP can provide plausible explanations for pedestrian behaviors, as opposed to black-box deep learning. Code is available: https://github.com/realcrane/Human-Trajectory-Prediction-via-Neural-Social-Physics.
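NSP embeds an explicit physics model with learnable parameters inside a neural network. The sketch below is not the NSP architecture; it only illustrates the general idea with a classic social-force-style update whose parameters (tau, A, B) are made learnable in PyTorch, so gradient descent could fit them to data.

```python
import torch
import torch.nn as nn

class SocialForceStep(nn.Module):
    """One Euler step of a social-force-style pedestrian model with
    learnable parameters (illustrative, not the NSP formulation)."""
    def __init__(self):
        super().__init__()
        self.tau = nn.Parameter(torch.tensor(0.5))   # relaxation time
        self.A = nn.Parameter(torch.tensor(2.0))     # repulsion strength
        self.B = nn.Parameter(torch.tensor(0.3))     # repulsion range

    def forward(self, pos, vel, goal_vel, dt=0.4):
        # Driving force: relax toward the desired (goal) velocity.
        force = (goal_vel - vel) / self.tau
        # Pairwise repulsion from other pedestrians.
        diff = pos.unsqueeze(1) - pos.unsqueeze(0)          # (N, N, 2)
        dist = diff.norm(dim=-1, keepdim=True).clamp(min=1e-6)
        rep = self.A * torch.exp(-dist / self.B) * diff / dist
        mask = 1.0 - torch.eye(pos.size(0)).unsqueeze(-1)   # ignore self-pairs
        force = force + (rep * mask).sum(dim=1)
        vel = vel + dt * force
        return pos + dt * vel, vel

step = SocialForceStep()
pos = torch.rand(5, 2) * 10          # 5 pedestrians
vel = torch.zeros(5, 2)
goal = torch.tensor([[1.0, 0.0]]).expand(5, 2)
new_pos, new_vel = step(pos, vel, goal)
print(new_pos.shape)                 # torch.Size([5, 2])
```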
... The objective was for behaviors in each segmented semantic region to have similar characteristics and to be represented by some kind of atomic video events. Zhou et al. [35] proposed a random field topic (RFT) model including sources and sinks as high-level semantic priors to learn semantic regions from tracklets (fragments of trajectories). Similar to Wang et al. [31], these semantic regions corresponded to common paths taken by objects, whose motions in the same semantic region have similar semantic interpretations. ...
Article
Full-text available
Group behavior pattern mining in traffic scenarios is a challenging problem due to group variability and behavioral regionality. Most methods are either based on trajectory data stored in static databases regardless of the variability of group members or do not consider the influence of scene structures on behaviors. However, in traffic scenarios, information about group members may change over time, and objects' motions show regional characteristics owing to scene structures. To address these issues, we present a general framework of a moving cluster with scene constraints (MCSC) discovery consisting of semantic region segmentation, mapping, and an MCSC decision. In the first phase, a hidden Markov chain is adopted to model the evolution of behaviors along a video clip sequence, and a Markov topic model is proposed for semantic region analysis. During the mapping procedure, to generate snapshot clusters, moving objects are mapped into the corresponding sets of moving objects according to the semantic regions where they are located at each timestamp. In the MCSC decision phase, a candidate MCSC recognition algorithm and screening algorithm are designed to incrementally identify and output MCSCs. The effectiveness of the proposed approach is verified by experiments carried out using public road traffic data.
... Vision-based motion trajectory prediction is essential for practical applications such as visual surveillance and self-driving cars (see Fig. 21), in which reasoning about the future motion patterns of a pedestrian is critical. A large body of work learns motion patterns by clustering trajectories (Zhou et al., 2011; Morris & Trivedi, 2011; Kim et al., 2011; Hu et al., 2007). However, forecasting the future motion trajectory of a person is challenging, as the prediction cannot be made in isolation. ...
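The excerpt above refers to work that learns motion patterns by clustering trajectories. A minimal version of that idea (resample every trajectory to a fixed number of points and cluster the flattened coordinates with k-means) is sketched below; the resampling length and the number of clusters are arbitrary choices, not taken from the cited papers.

```python
import numpy as np
from sklearn.cluster import KMeans

def resample(traj, n=16):
    """Resample a trajectory (list of (x, y) points) to n equally spaced
    points along its arc length so trajectories become comparable vectors."""
    traj = np.asarray(traj, dtype=float)
    seg = np.linalg.norm(np.diff(traj, axis=0), axis=1)
    s = np.concatenate([[0.0], np.cumsum(seg)])
    t = np.linspace(0.0, s[-1], n)
    x = np.interp(t, s, traj[:, 0])
    y = np.interp(t, s, traj[:, 1])
    return np.stack([x, y], axis=1).ravel()       # shape (2n,)

def cluster_trajectories(trajectories, k=3):
    """Cluster fixed-length trajectory vectors into k motion patterns."""
    X = np.stack([resample(t) for t in trajectories])
    return KMeans(n_clusters=k, n_init=10).fit_predict(X)

# Illustrative trajectories: two go right, one goes up.
trajs = [[(0, 0), (5, 1), (10, 2)],
         [(0, 1), (6, 1), (11, 1)],
         [(0, 0), (1, 5), (2, 10)]]
print(cluster_trajectories(trajs, k=2))
```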
Article
Full-text available
Derived from rapid advances in computer vision and machine learning, video analysis tasks have been moving from inferring the present state to predicting the future state. Vision-based action recognition and prediction from videos are such tasks, where action recognition infers human actions (present state) from complete action executions, and action prediction predicts human actions (future state) from incomplete action executions. These two tasks have become particularly prevalent topics recently because of their rapidly emerging real-world applications, such as visual surveillance, autonomous driving vehicles, entertainment, and video retrieval. Many attempts have been made over the last few decades to build a robust and effective framework for action recognition and prediction. In this paper, we survey the state-of-the-art techniques in action recognition and prediction. Existing models, popular algorithms, technical difficulties, popular action databases, evaluation protocols, and promising future directions are also provided with systematic discussions.
... Interframe-motion-based methods [9][10][11] rely on using motion information of targets between two consecutive frames to extract motion patterns. These methods generally perform more robustly in crowded scenarios but are often considered not suitable for extracting long-range motion patterns [12]. Multiframe-motion-based methods [1][2][3]13] instead use motion information across multiple frames, e.g. ...
... short-duration tracks or complete trajectories, as estimated with video tracking, and are regarded as more helpful in extracting long-range patterns. These methods generally involve first estimating tracked trajectories of targets [1][2][3][12], followed by encoding trajectory information into feature space(s) [2,13-15], and performing trajectory clustering [1,2,4,13,14] or classification [3,15] to determine dominant patterns. Some of these approaches [2,4,13,14] work in an offline manner, as they assume the availability of all estimated trajectories a priori and cannot incorporate new trajectories incrementally at the clustering or classification stage. ...
Article
Full-text available
This paper investigates the use of Siamese networks for trajectory similarity analysis in surveillance tasks. Specifically, the proposed approach uses an auto-encoder as part of training a discriminative twin (Siamese) network to perform trajectory similarity analysis, thus presenting an end-to-end framework for online motion-pattern extraction in the scene, with the ability to incorporate new incoming trajectories incrementally. The effectiveness of the proposed method is evaluated on four challenging public real-world datasets containing both vehicle and person targets, and compared with five existing methods. The proposed method consistently shows performance better than or comparable to the existing methods on all datasets.
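The abstract describes training a Siamese network (with an auto-encoder component) for trajectory similarity. The sketch below only illustrates the Siamese half of such a design: a shared GRU encoder maps two trajectories to embeddings whose distance serves as the similarity score. The architecture sizes, contrastive margin, and omission of the auto-encoder branch are assumptions for illustration, not the paper's configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TrajectoryEncoder(nn.Module):
    """Shared encoder of a Siamese pair: a GRU over (x, y) points,
    embedding each trajectory as a projection of its final hidden state."""
    def __init__(self, hidden=32, embed=16):
        super().__init__()
        self.rnn = nn.GRU(input_size=2, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, embed)

    def forward(self, traj):                 # traj: (batch, T, 2)
        _, h = self.rnn(traj)
        return self.head(h[-1])              # (batch, embed)

def contrastive_loss(z1, z2, same, margin=1.0):
    """Pull embeddings of similar trajectories together, push dissimilar
    ones apart by at least `margin`."""
    d = F.pairwise_distance(z1, z2)
    return (same * d.pow(2) + (1 - same) * F.relu(margin - d).pow(2)).mean()

encoder = TrajectoryEncoder()
a = torch.rand(4, 20, 2)                     # 4 trajectories, 20 points each
b = torch.rand(4, 20, 2)
same = torch.tensor([1.0, 0.0, 1.0, 0.0])    # toy similarity labels
loss = contrastive_loss(encoder(a), encoder(b), same)
loss.backward()
print(float(loss))
```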