Conference Paper

Speed Estimation and Abnormality Detection from Surveillance Cameras

... To detect vehicles, all of the teams opted for deep-learning-based object detectors, which we describe in section 2.2. Most teams [168,170,171,173,174] used Faster R-CNN [18], followed by several teams [166,169,167] using Mask R-CNN [98], and finally only three teams [164,172] used YOLO v2 [103]. One team [170] additionally constructed a 3D bounding box around the vehicle using a contour extraction network [175]. ...
... To perform tracking, many teams [165,168,171,172,173,174] base their approach on the IoU between successive detections, with additional processing via graphs [165], histogram matching [171], optical flow [172], a Kalman filter [173], or correlation-based filtering [174]. The Kalman filter [42] has also been used on its own [170]. ...
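The IoU-based association that most of these teams build on can be sketched in a few lines; the greedy matching below is an illustrative simplification, not any particular team's implementation.

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def associate(prev_boxes, curr_boxes, thresh=0.3):
    """Greedy IoU association between detections in successive frames.

    Returns a list of (prev_idx, curr_idx) matches, best overlaps first."""
    pairs = sorted(
        ((iou(p, c), i, j)
         for i, p in enumerate(prev_boxes)
         for j, c in enumerate(curr_boxes)),
        reverse=True)
    used_p, used_c, matches = set(), set(), []
    for score, i, j in pairs:
        if score < thresh or i in used_p or j in used_c:
            continue
        used_p.add(i)
        used_c.add(j)
        matches.append((i, j))
    return matches
```

Unmatched current-frame detections would start new tracks, and unmatched previous tracks would be aged out; the papers cited above add Kalman filtering or optical flow on top of this core step.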
Thesis
Full-text available
Intelligent Transportation Systems are advanced systems integrating various information technologies with the goal of providing services for the efficient, informed, and safe use and development of transportation networks. Such systems require efficient large-scale data collection. Recent developments in camera technology have made traffic cameras a viable source of such data. This thesis deals with the automated analysis of traffic camera footage. We present our own pipeline for vehicle counting, which consists of existing state-of-the-art methods. We also present a novel contribution to the task of detecting 3D bounding boxes of vehicles. We show that when this method is used for vehicle speed estimation, the resulting mean error is only 0.75 km/h, which is 32% less than the error of the best competing method. We also present a contribution to semi-automatic traffic camera calibration based on detecting the vanishing points of individual vehicles in traffic camera footage. We show that the results of this method are on par with the best existing approach while suffering from fewer limitations.
... Hard calibration involves jointly estimating both intrinsic and extrinsic parameters with the camera already installed. It can be performed either manually [20,38,58] or automatically [12,14,18,19,23,33,78,85,88,101,108]. ...
... Multiple lanes, both directions. Very high meter-to-pixel ratio [88]. (b) Medium focal length, medium relative distances. ...
... One of the most common ways to compute the camera's extrinsic parameters in the operating environment is using vanishing points [12,14,15,19,22,23,27,33,42,85,88,89,91,108]: sets of parallel lines in the 3D real-world coordinate system project to vanishing points in the image. The translation matrix T is then obtained using knowledge of the real-world dimensions of some object or region in the image. ...
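As a sketch of the vanishing-point idea: in homogeneous coordinates, the image line through two points is their cross product, and two lines intersect at the cross product of their line vectors. The helper names below are illustrative, not from any cited method.

```python
import numpy as np

def line_through(p, q):
    """Homogeneous line through two image points (cross product)."""
    return np.cross([p[0], p[1], 1.0], [q[0], q[1], 1.0])

def vanishing_point(line1, line2):
    """Intersection of two homogeneous image lines, returned as (x, y).

    Assumes the lines are not parallel in the image (v[2] != 0)."""
    v = np.cross(line1, line2)
    return v[:2] / v[2]
```

For example, two lane markings that converge in the image, one through (0, 100) and (40, 60) and one through (100, 100) and (60, 60), meet at the vanishing point (50, 50); a robust system would average many such intersections from detected vehicle edges.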
Article
Full-text available
The need to accurately estimate the speed of road vehicles is becoming increasingly important for at least two main reasons. First, the number of speed cameras installed worldwide has been growing in recent years, as the introduction and enforcement of appropriate speed limits are considered one of the most effective means of increasing road safety. Second, traffic monitoring and forecasting in road networks plays a fundamental role in managing traffic, emissions, and energy consumption in smart cities, with vehicle speed being one of the most relevant parameters of the traffic state. Among the technologies available for the accurate detection of vehicle speed, vision-based systems bring great challenges to be solved, but also great potential advantages, such as a drastic reduction in costs due to the absence of expensive range sensors, and the possibility of identifying vehicles accurately. This paper provides a review of vision-based vehicle speed estimation. The terminology and application domains are described, and a complete taxonomy of a large selection of works that categorizes all stages involved is proposed. An overview of performance evaluation metrics and available datasets is provided. Finally, current limitations and future directions are discussed.
... Hua et al. [32] proposed speed estimation to deal with collisions on the road by combining detection and tracking methods. They used the NVIDIA AI City Challenge [14] dataset [33]-[38] for testing. Kumar et al. [39] proposed semi-automatic 2D rectification using vanishing points together with a scaling factor to estimate speed. ...
... MSE comparison across methods: FULLACC [41]: 104.98; OptScale [9]: 9.66; OptScaleVP2 [9]: 411.67; OptCalib [9]: 8.71; OptCalibVP2 [9]: 15.63; 3DCNN [12]: 14.62; Proposed method I: 7.07; Proposed method II: 6.56; [33]: 36.49; UIUC [34]: 44.45; Stevens IT [14]: 46.91; ColumbiaU [35]: 63.52; VietnamUN [40]: 79.46; UMaryland [39]: 91.03; BrnoUT [14]: 63.52; Iowa SU [36]: 74.11; SJSU [32]: 146.62; UAlbany [37]: 106.91; Iowa SU [14]: 167.88; CERTH [38]: 745.39 ...
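For reference, the MSE figures above compare per-vehicle speed estimates against ground truth; a minimal computation sketch, assuming speeds in km/h:

```python
import numpy as np

def speed_mse(pred_kmh, true_kmh):
    """Mean squared error between predicted and ground-truth speeds (km/h)."""
    pred = np.asarray(pred_kmh, dtype=float)
    true = np.asarray(true_kmh, dtype=float)
    return float(np.mean((pred - true) ** 2))
```

Because the error is squared, a handful of badly tracked vehicles can dominate the score, which is one reason the table spans two orders of magnitude.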
Article
Full-text available
Vehicle speed estimation is one of the most critical issues in intelligent transportation system (ITS) research, and defining distance and identifying direction have become an inseparable part of it. Despite the success of traditional and deep learning approaches to estimating vehicle speed, the high cost of deploying hardware to collect the related sensor data, such as infrared/ultrasonic devices, Global Positioning System (GPS) receivers, Light Detection and Ranging (LiDAR) systems, and magnetic devices, has been the key barrier to improvement in previous studies. In this paper, our proposed model consists of two main components: 1) a vehicle detection and tracking component, designed to reliably detect and track each object without calibration; and 2) a homography transformation regression network, which resolves occlusion issues and estimates vehicle speed accurately and efficiently. Experimental results on two datasets show that the proposed method outperforms the state-of-the-art methods, reducing the mean square error (MSE) from 14.02 to 6.56 among deep learning approaches. Our test code and model are available on GitHub at https://github.com/ervinyo/Speed-Estimation-Using-Homography-Transformation-and-Regression-Network.
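Independently of the paper's learned regression network, the homography step at the core of such pipelines can be sketched as follows: given a 3x3 image-to-road-plane homography H (assumed known here, e.g. from calibration), image positions are mapped to metric road coordinates and differenced over time.

```python
import numpy as np

def to_road_plane(H, pts):
    """Map Nx2 image points to road-plane coordinates via a 3x3 homography H."""
    pts = np.asarray(pts, dtype=float)
    homog = np.column_stack([pts, np.ones(len(pts))]) @ H.T
    return homog[:, :2] / homog[:, 2:3]  # dehomogenize

def speed_kmh(H, p0, p1, dt):
    """Speed from two image positions of the same vehicle, dt seconds apart.

    Assumes H maps pixels to metres on the road plane."""
    a, b = to_road_plane(H, [p0, p1])
    return float(np.linalg.norm(b - a) / dt * 3.6)  # m/s -> km/h
```

With the identity homography (pixels already metric), a 10 m displacement over one second yields 36 km/h; in practice H is estimated from known road geometry or, as in this paper, predicted by a network.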
... Graph neural networks (GNNs) are also deemed a promising solution for anomaly detection [47,75], though we argue that their learned features are not well suited to road traffic surveillance. Recent anomaly detection frameworks can be divided into supervised learning methods [4,11,22,27,29,32,33,40,50,56,72,76,77,80,86], which require a certain level of manually processed data, and unsupervised learning-based methods [7,13,14,17,37,42,46,51,55,57,64,65,70,78,79], which require virtually no labeled data. Supervised learning has been deemed obsolete in recent years due to its low robustness and labor-intensive annotation. ...
... Note that RMSE is normalized by 300 to set the maximum acceptable time error in detection. Many works [7,10,14,15,21,22,50,54,65,80,87] are evaluated by the three metrics above. ...
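Assuming the normalization works by capping the detection-time RMSE at 300 s and dividing by that cap (an assumption of this sketch, not a statement of the official challenge metric), the computation is:

```python
import numpy as np

def normalized_rmse(pred_t, true_t, cap=300.0):
    """RMSE of predicted anomaly start times (seconds), normalized so that
    errors at or beyond `cap` seconds score 1.0."""
    err = np.asarray(pred_t, dtype=float) - np.asarray(true_t, dtype=float)
    rmse = float(np.sqrt(np.mean(err ** 2)))
    return min(rmse, cap) / cap
```

A perfect detector scores 0.0 and any detector that is off by five minutes or more scores 1.0, so the score stays in [0, 1] regardless of outliers.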
Article
Full-text available
Occasions such as stalled vehicles or crashes caused by abnormal trajectories should be instantly identified and then dealt with quickly by the city traffic management system for the sake of road safety. However, a fast and accurate automatic detection system based on machine learning generally meets great challenges from the shortage of recorded accident data, resulting in low detection accuracy. Many existing studies implement a two-level detection approach: stalled vehicles are detected at the stationary level, while abnormal trajectories are detected at the mobile level. This paper proposes a novel triple-layer framework that distributes these two levels across three parallel layers for maximum efficiency. A straightforward background extraction algorithm is applied at the beginning of the framework to separate moving from stationary content. Layer 1 implements a lightweight optical-flow-based feature extraction algorithm to convert mobile visual features into learnable data. With a clustering algorithm that learns the common trajectories in an unsupervised manner, abnormal trajectories are detected in Layer 2. Simultaneously, in Layer 3, a custom-trained object detection algorithm is applied to detect stalled/crashed vehicles. The computational efficiency is improved and the detection accuracy is boosted. Experiments conducted on the Nvidia AI City Challenge dataset demonstrate the effectiveness of our LRATD (Lightweight Real-Time Abnormal Trajectory Detection) framework, with a 104% gain in detection speed compared to the fastest entry while achieving a 0.935 S4-score, only 2.1% less than the current state-of-the-art method. Overall, the performance of LRATD opens the possibility of its real-life application.
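The abstract does not detail its "straightforward background extraction"; a common choice is a temporal running average with thresholded foreground, sketched here under that assumption (parameter values are illustrative).

```python
import numpy as np

def background_model(frames, alpha=0.05):
    """Running-average background estimate over a sequence of grayscale frames."""
    bg = np.asarray(frames[0], dtype=float)
    for f in frames[1:]:
        bg = (1 - alpha) * bg + alpha * np.asarray(f, dtype=float)
    return bg

def foreground_mask(frame, bg, thresh=25):
    """Pixels deviating from the background by more than `thresh` intensity levels."""
    return np.abs(np.asarray(frame, dtype=float) - bg) > thresh
```

Stalled vehicles gradually absorb into the background under such a model, which is why frameworks like the one above handle the stationary case in a separate layer.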
... Event recognition is of particular importance for traffic safety analysis since it can be used for detecting abnormal events and traffic violations and their associations with crash rates [348], car behaviour analysis [354,355], and identification of pedestrians' crossings [388,391,392]. Recently, some works [337,393,394] have tried to predict anomalous actions using generative adversarial networks (GANs). ...
... We summarise important DL-based event recognition methods in Table 13 and present some popular datasets for event recognition tasks in Table 18. One observation is that unsupervised, semi-supervised, and self-supervised models are becoming more prevalent in recent works [337,338,348] to mitigate the costly and tedious job of video annotation and to simplify volume video processing. ...
Article
Full-text available
This paper explores deep learning (DL) methods that are used or have the potential to be used for traffic video analysis, emphasising driving safety for both autonomous vehicles and human-operated vehicles. A typical processing pipeline is presented, which can be used to understand and interpret traffic videos by extracting operational safety metrics and providing general hints and guidelines to improve traffic safety. This processing framework includes several steps: video enhancement, video stabilisation, semantic and incident segmentation, object detection and classification, trajectory extraction, speed estimation, event analysis, modelling, and anomaly detection. The main goal is to guide traffic analysts in developing their own custom-built processing frameworks by selecting the best choices for each step and offering new designs for the lacking modules, providing a comparative analysis of the most successful conventional and DL-based algorithms proposed for each step. Existing open-source tools and public datasets that can help train DL models are also reviewed. To be more specific, exemplary traffic problems are reviewed and the required steps are mentioned for each problem. Besides, connections to the closely related research areas of drivers' cognition evaluation, crowd-sourcing-based monitoring systems, edge computing in roadside infrastructures, and vehicles equipped with automated driving systems are investigated, and the missing gaps are highlighted. Finally, commercial implementations of traffic monitoring systems, their future outlook, and open problems and remaining challenges for widespread use of such systems are reviewed.
... In this project, the "behaviour" of the detected agents (vehicles) refers to attributes derived from vectorial physical quantities (such as magnitude and direction), their combinations (speed), and their derivatives (displacement, acceleration), in particular their variations across the observed fields (the images that compose the footage). Given this definition, anomaly detection in road traffic is a suitable field for research, and [10] delivered a survey exploring several approaches, among which [11] stands out by applying LSTMs to discriminate behaviours; [12] defines outliers as abnormal behaviours, identified by K-means clustering, and applies a hidden Markov model to detect them; [13] uses objects' dimensions and displacement to identify vehicles on sidewalks and persons on roads; and [14] uses speed information to detect traffic-law infringements. All of these approaches use surveillance security cameras as the image source. ...
... Figure 1 shows the ordered flowchart of the methodological process applied in this project: 1) image capture, 2) labeling of the vehicles present in the images, 3) segmentation of the data into training and validation sets, 4) modeling of the CNN for vehicle detection and training with the prepared data, 5) running the CNN to detect vehicles in road videos and tracking the detected objects, 6) time-series extraction from the tracked vehicles, 7) exploration and analysis of the time-series sets to extract behavioural characteristics, 8) classification of the vehicles' time series into normal and abnormal behaviours, 9) modeling of LSTMs and training with the time series, and 10) evaluation of the LSTMs with cross-validation methods. This approach therefore adopts features applied by [12][13][14], considering footage from surveillance cameras as well as from aerial, non-static drone perspectives. ...
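Step 6 of the pipeline above, time-series extraction from tracked vehicles, can be sketched as converting each track of centroids into per-frame speed and heading features; the particular feature choice here is illustrative, not the project's exact one.

```python
import numpy as np

def track_features(centroids, fps=25.0):
    """Per-frame displacement features from one vehicle track.

    centroids: Nx2 array of image positions, one per frame.
    Returns an (N-1)x2 array: [speed in pixels/s, heading in radians]."""
    c = np.asarray(centroids, dtype=float)
    d = np.diff(c, axis=0)                      # frame-to-frame displacement
    speed = np.linalg.norm(d, axis=1) * fps     # pixels per second
    heading = np.arctan2(d[:, 1], d[:, 0])      # direction of motion
    return np.column_stack([speed, heading])
```

Sequences of such feature vectors are exactly the kind of input an LSTM classifier (steps 8-10) consumes to separate normal from abnormal behaviours.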
Conference Paper
Full-text available
Relying on computer vision, many clever things become possible to make the world safer and to optimize resource management, especially when time and attention are treated as manageable resources, since the modern world abounds with cameras, from inside our pockets to above our heads as we cross the street. Thus, automated solutions based on computer vision techniques to detect, react to, or even prevent relevant events such as robbery, car crashes, and traffic jams can be implemented for the sake of both logistical and surveillance improvements. In this paper, we present an approach for detecting vehicles' abnormal behaviours in highway footage, in which vectorial data of the vehicles' displacement are extracted directly from surveillance camera footage through object detection and tracking with a deep convolutional neural network and fed into a long short-term memory neural network for behaviour classification. The results show that the behaviour classifications are consistent and that the same principles may be applied to other trackable objects and scenarios as well.
... Event recognition is of particular importance for traffic safety analysis since it can be used for detecting abnormal events and traffic violations and their associations with crash rates [357], car behavior analysis [358,359], and identification of pedestrians' crossings [360,354,361]. Recently, some works [362,363,364] have tried to predict anomalous actions using Generative Adversarial Networks (GANs). ...
... We summarize important DL-based event recognition methods in Table 13 and present some popular datasets for event recognition tasks in Table 18. One observation is that unsupervised, semi-supervised, and self-supervised models are becoming more prevalent in recent works [357,368,364] to mitigate the costly and tedious job of video annotation and to simplify volume video processing. These purely data-driven methods often focus on detection problems. ...
Preprint
Full-text available
This paper explores Deep Learning (DL) methods that are used or have the potential to be used for traffic video analysis, emphasizing driving safety for both Autonomous Vehicles (AVs) and human-operated vehicles. We present a typical processing pipeline, which can be used to understand and interpret traffic videos by extracting operational safety metrics and providing general hints and guidelines to improve traffic safety. This processing framework includes several steps: video enhancement, video stabilization, semantic and incident segmentation, object detection and classification, trajectory extraction, speed estimation, event analysis, modeling, and anomaly detection. Our main goal is to guide traffic analysts in developing their own custom-built processing frameworks by selecting the best choices for each step and offering new designs for the lacking modules, providing a comparative analysis of the most successful conventional and DL-based algorithms proposed for each step. We also review existing open-source tools and public datasets that can help train DL models. To be more specific, we review exemplary traffic problems and mention the required steps for each problem. Besides, we investigate connections to the closely related research areas of drivers' cognition evaluation, crowd-sourcing-based monitoring systems, edge computing in roadside infrastructures, and ADS-equipped AVs, and highlight the missing gaps. Finally, we review commercial implementations of traffic monitoring systems, their future outlook, and open problems and remaining challenges for widespread use of such systems.
... Authors of [104] have used a 3D-tube representation of trajectories as features, using the contextual proximity of neighboring trajectories to learn normal trajectories. In [52], a Fisher vector for each trajectory, obtained from the object's optical flow and position, has been used. A histogram of optical flow and motion has been used in [207] over sub-trajectories with multi-instance learning, together with a nearest-neighborhood approach using a Hausdorff-distance-based threshold for anomaly detection. ...
... A new feature descriptor, HOFME, can handle diverse anomaly scenarios compared with conventional features. Giannakeris (2018) [52] uses trajectory features encoded as a Fisher vector with an SVM; the anomaly score is derived from the Fisher vector using a one-class SVM (OCSVM). ...
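The Hausdorff-distance thresholding mentioned above can be sketched as follows; the trajectory format (N x 2 point arrays) and the nearest-neighbor decision rule are assumptions of this illustration.

```python
import numpy as np

def hausdorff(A, B):
    """Symmetric Hausdorff distance between two trajectories (Nx2, Mx2 arrays)."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)  # pairwise distances
    return max(d.min(axis=1).max(), d.min(axis=0).max())

def is_anomalous(traj, normal_trajs, thresh):
    """Flag a trajectory whose nearest normal trajectory is farther than `thresh`."""
    return min(hausdorff(traj, n) for n in normal_trajs) > thresh
```

The threshold would typically be tuned on the distribution of distances among the normal trajectories themselves.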
Preprint
Computer vision has evolved in the last decade as a key technology for numerous applications replacing human supervision. In this paper, we present a survey of visual surveillance research relevant to anomaly detection in public places, focusing primarily on roads. First, we revisit the surveys done in the last 10 years in this field. Since the underlying building block of typical anomaly detection is learning, we place particular emphasis on learning methods applied to video scenes. We then summarize the important contributions made during the last six years on anomaly detection, focusing primarily on features, underlying techniques, applied scenarios, and types of anomalies using a single static camera. Finally, we discuss the challenges in computer vision-related anomaly detection techniques and some important future possibilities.
... The information collected by the cameras is diverse and should be determined by the system administrator. This can range from monitoring pedestrian movement to prevent crimes to observing vehicle traffic patterns for optimization purposes [23,24,25,26]. The network is the medium used by cameras to send their data. ...
Preprint
Full-text available
High crime rates plague many nations worldwide, bringing with them the problem of insufficient security. The lack of an efficient monitoring system allows people to commit the most varied types of crimes. Deploying minimally intelligent monitoring systems helps to a great extent to minimize the problem, but such systems are very costly and must be well designed to avoid unnecessary expenses. A point of great importance in planning an intelligent monitoring system is measuring its availability to ensure a configuration that provides the highest possible uptime. In this context, this paper proposes models for evaluating the dependability of an intelligent monitoring system. The proposed models are stochastic Petri net (SPN) models that allow evaluation of the availability and reliability of the proposed monitoring architecture. The models investigate the impact of maintenance issues on proposals to increase availability. When analyzing the maintenance routines, it was possible to perceive the peculiarities of each one, as well as some negative points in one of them. This study can help speed up the planning of smart monitoring systems by showing a low-cost method compared to a real test.
... The potential applications include vehicle detection and classification (Lu and Dai 2023), human detection and recognition (Nikouei et al. 2018), incident detection (Sharma and Sungheetha 2021), and illegal activity detection (Bhatti et al. 2021). If the deployed surveillance cameras are properly calibrated, the applicability of the surveillance system can be further widened, enabling capabilities such as speed measurement (Yang et al. 2019) and behavior understanding (Giannakeris et al. 2018). ...
... The simplest method for finding the vehicle position is using the center of the bounding box given by object detectors [15][16][17]. This method is computationally efficient and easy to implement because it simply reuses conventional object detection results. ...
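The bounding-box-center position estimate described above, and the bottom-center variant often preferred when projecting onto the road plane (the latter is a common convention, not something this snippet states), reduce to a couple of lines:

```python
def box_center(box):
    """Center of an axis-aligned bounding box (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

def box_bottom_center(box):
    """Bottom-center point, approximating where the vehicle touches the road."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, float(y2))
```

Both reuse the detector output directly, which is exactly why the center-based method is cheap; the BFQ approaches discussed below trade that simplicity for a better estimate of the vehicle's ground contact.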
Article
Full-text available
In intelligent transportation systems, it is essential to estimate the vehicle position accurately. To this end, it is preferred to detect vehicles as a bottom face quadrilateral (BFQ) rather than an axis-aligned bounding box. Although there have been some methods for detecting the vehicle BFQ using vehicle-mounted cameras, few studies have been conducted using surveillance cameras. Therefore, this paper conducts a comparative study on various approaches for detecting the vehicle BFQ in surveillance camera environments. Three approaches were selected for comparison, including corner-based, position/size/angle-based, and line-based. For comparison, this paper suggests a way to implement the vehicle BFQ detectors by simply adding extra heads to one of the most widely used real-time object detectors, YOLO. In experiments, it was shown that the vehicle BFQ can be adequately detected by using the suggested implementation, and the three approaches were quantitatively evaluated, compared, and analyzed.
... A novel vehicle classification technique has been developed based on multiple pavement strains caused by moving traffic loads [33]. The overlap of vehicle classification feature parameters belonging to different classes suggested the need for a pattern recognition technique to separate vehicles into different groups [34]. To improve classification accuracy and robustness, centralized and distributed fusion schemes based on two popular multi-class SVM algorithms were used to fuse multiple sensor data [35]. ...
... IoT in transportation is very important because several modes of transportation, such as online motorcycle taxis, already rely on connected media; with IoT, data can be transmitted and received quickly and processed immediately [34]. ...
... The community also has taken advantage of the recent developments in DL-based image/video processing that yield superior performance far beyond the conventional methods [26]. DL methods have also enabled developing well-annotated volume datasets (e.g., highD dataset 2018) [27] that in turn led to developing even more powerful video processing methods for autonomous and safe driving applications, such as vehicle detection [28], [29], plate recognition [30], [31], traffic sign classification [28], [29], lane detection [32], [33], and abnormal driving detection from surveillance video [34], [35]. ...
Article
Full-text available
Driving safety analysis has recently experienced unprecedented improvements thanks to technological advances in precise positioning sensors, artificial intelligence (AI)-based safety features, autonomous driving systems, connected vehicles, high-throughput computing, and edge computing servers. In particular, deep learning (DL) methods have empowered volume video processing to extract safety-related features from massive videos captured by roadside units (RSU). Safety metrics are commonly used measures to investigate crashes and near-conflict events. However, these metrics provide limited insight into overall network-level traffic management. On the other hand, some safety assessment efforts are devoted to processing crash reports and identifying spatial and temporal patterns of crashes that correlate with road geometry, traffic volume, and weather conditions. This approach relies merely on crash reports and ignores the rich information in traffic videos that can help identify the role of safety violations in crashes. To bridge these two perspectives, we define a new set of network-level safety metrics (NSM) to assess the overall safety profile of traffic flow by processing imagery taken by RSU cameras. Our analysis suggests that NSMs show significant statistical associations with crash rates. This approach differs from simply generalizing the results of individual crash analyses, since all vehicles contribute to calculating NSMs, not only the ones involved in crash incidents. This perspective considers the traffic flow as a complex dynamic system in which the actions of some nodes can propagate through the network and influence the crash risk for other nodes. The analysis is carried out using six video cameras in the state of Arizona along with a 5-year crash report obtained from the Arizona Department of Transportation (ADOT). The results confirm that NSMs modulate the baseline crash probability. Therefore, online monitoring of NSMs can be used by traffic management teams and AI-based traffic monitoring systems for risk analysis and traffic control.
... Ahmadi et al. [151] clustered optical flow features to learn motion patterns in the traffic phase for detecting abnormal vehicle behavior, e.g., abnormal driving and not following traffic laws. In [152], the global amplitudes of optical flow in each frame were utilized to obtain the optical flow descriptors of traffic scenes. The descriptors and Fisher vector representing the spatiotemporal visual volumes were employed to detect traffic violations. ...
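A global optical-flow amplitude descriptor of the kind described in [152] can be sketched as a per-frame magnitude histogram; the bin count and magnitude range below are illustrative, and the flow field itself is assumed to be precomputed by any standard method.

```python
import numpy as np

def flow_descriptor(flow, bins=8, max_mag=20.0):
    """Normalized histogram of optical-flow magnitudes for one frame.

    flow: HxWx2 array of per-pixel (dx, dy) displacements (assumed precomputed)."""
    mag = np.linalg.norm(flow, axis=2).ravel()          # per-pixel flow amplitude
    hist, _ = np.histogram(mag, bins=bins, range=(0.0, max_mag))
    return hist / max(hist.sum(), 1)                    # normalize to sum to 1
```

Sequences of such descriptors can then be clustered to learn dominant motion patterns, with frames far from every cluster flagged as abnormal.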
Article
Full-text available
A semantic understanding of road traffic can help people understand road traffic flow situations and emergencies more accurately and provide a more accurate basis for anomaly detection and traffic prediction. At present, overviews of computer vision in traffic mainly focus on the static detection of vehicles and pedestrians. There are few in-depth studies on the semantic understanding of road traffic using visual methods. This paper reviews recent approaches to the semantic understanding of road traffic using vision sensors to bridge this gap. First, it classifies traffic monitoring analysis methods from the two perspectives of macro traffic flow and micro road behavior. Next, the techniques for each class of methods are reviewed and discussed in detail. Finally, we analyze the existing traffic monitoring challenges and corresponding solutions.
... Giannakeris et al. [6] introduced a fully automatic camera calibration algorithm to estimate the speed of vehicles. ...
Article
Full-text available
In this article, an effective solution is presented to assist a driver in making overtaking decisions under adverse, dark night-time conditions on a two-lane single-carriageway road. An awkward road situation is considered in which one vehicle is just in front of the test vehicle, traveling in the same direction, and another vehicle approaches from the opposite direction. As the environment is very dark, only the headlights and taillights of any vehicle are visible. Estimating distance and speed with great accuracy, especially at night when vehicles are not visible, is a truly challenging task. The proposed assistance system can estimate the actual and relative speed and the distance of the slow vehicle in front of the test vehicle and of the vehicle coming from the opposite direction by observing their taillights and headlights, respectively. Subsequently, the required gap, road condition level, speed, and acceleration for safe overtaking are estimated. Finally, the overtaking decision is made in such a way that there should not be any collision between vehicles. Several real-time experiments reveal that the estimation achieves great accuracy under safe conditions compared with state-of-the-art techniques, using a low-cost 2D camera.
... Authors of [114] have used a 3D-tube representation of trajectories as features, using the contextual proximity of neighboring trajectories to learn normal trajectories. In [125], a Fisher vector for each trajectory, obtained from the object's optical flow and position, has been used. A histogram of optical flow and motion entropy (HOFME) has been used in [91]. ...
Article
Full-text available
Computer vision has evolved in the last decade as a key technology for numerous applications replacing human supervision. Timely detection of traffic violations and abnormal behavior of pedestrians at public places through computer vision and visual surveillance can be highly effective for maintaining traffic order in cities. However, despite a handful of computer vision–based techniques proposed in recent times to understand the traffic violations or other types of on-road anomalies, no methodological survey is available that provides a detailed insight into the classification techniques, learning methods, datasets, and application contexts. Thus, this study aims to investigate the recent visual surveillance–related research on anomaly detection in public places, particularly on road. The study analyzes various vision-guided anomaly detection techniques using a generic framework such that the key technical components can be easily understood. Our survey includes definitions of related terminologies and concepts, judicious classifications of the vision-guided anomaly detection approaches, detailed analysis of anomaly detection methods including deep learning–based methods, descriptions of the relevant datasets with environmental conditions, and types of anomalies. The study also reveals vital gaps in the available datasets and anomaly detection capability in various contexts, and thus gives future directions to the computer vision–guided anomaly detection research. As anomaly detection is an important step in automatic road traffic surveillance, this survey can be a useful resource for interested researchers working on solving various issues of Intelligent Transportation Systems (ITS).
... Reference distances measured or assumed from features such as road markings or standard lane widths (Huang, 2018; Tran et al., 2018), combined with "vanishing points" at which parallel lines meet in the image domain, provide parameters enabling camera calibration through algorithmic optimisation (Tang et al., 2018). Vanishing point methods have also been adapted for fully automatic application with the use of vehicle dimension estimation, vehicle motion analysis, and diamond space accumulation algorithms (Dubská et al., 2015; Giannakeris and Briassouli, 2018; Sochor et al., 2017). However, with the use of background modelling and cluster analysis of vehicle trajectories derived from video footage, perspective transformation is not strictly necessary for deriving vehicle speed estimates (Xiong, 2018). ...
Article
Full-text available
A workflow is devised in this paper by which vehicle speeds are estimated semi-automatically via a fixed DSLR camera. The deep learning algorithm YOLOv2 was used for vehicle detection, while the Simple Online Realtime Tracking (SORT) algorithm enabled tracking of vehicles. Perspective projection and scale factor were dealt with by remotely mapping corresponding image and real-world coordinates through a homography. The ensuing transformation of camera footage to the British National Grid coordinate system allowed the derivation of real-world distances on the planar road surface and subsequent simultaneous vehicle speed estimates. As monitoring took place in a heavily urbanised environment, where vehicles frequently change speed, estimates were determined consecutively between frames. Speed estimates were validated against a reference dataset containing precise trajectories from a GNSS- and IMU-equipped vehicle platform. The estimates achieved an average root mean square error of 0.625 m/s and a mean absolute percentage error of 20.922%. The robustness of the method was tested in a real-world context under real environmental conditions.
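The validation metrics reported above, root mean square error and mean absolute percentage error, are standard; a minimal computation sketch over per-measurement speed pairs:

```python
import numpy as np

def rmse(pred, true):
    """Root mean square error between predicted and reference values."""
    pred = np.asarray(pred, dtype=float)
    true = np.asarray(true, dtype=float)
    return float(np.sqrt(np.mean((pred - true) ** 2)))

def mape(pred, true):
    """Mean absolute percentage error; assumes no zero reference values."""
    pred = np.asarray(pred, dtype=float)
    true = np.asarray(true, dtype=float)
    return float(np.mean(np.abs((pred - true) / true)) * 100.0)
```

Note that MAPE inflates errors at low reference speeds, which partly explains how a 0.625 m/s RMSE can coexist with a 20.922% MAPE in stop-and-go urban traffic.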
Article
A road event (e.g., a traffic accident) usually causes traffic jams, especially during rush hours, and drivers are keen to see what happened and how it develops. We propose a framework named CoSense for collecting photos of road events from vehicle-mounted cameras. Unlike existing applications, CoSense collects photos opportunistically in a human-machine collaborative way. First, after an event is reported from people's smartphones, its location can be estimated. Aided by this location information, a vehicle-mounted smart camera (e.g., a driving recorder) opportunistically photographs the event as soon as the vehicle passes by. Because vehicles often move too fast to photograph road events reliably, CoSense uses a self-adjusting photo collection method: by predicting the vehicle's position in advance, the camera's shooting direction can be adjusted to obtain more photos covering the event. Second, to push high-quality event pictures to drivers, CoSense selects photos by analysing both the reliability of the photo-taking command and the photos' timestamps. CoSense is evaluated under different conditions, and experimental results on eighty real-world datasets demonstrate its effectiveness.
Article
With the rapid development of connected autonomous vehicles (CAVs), both road infrastructure and transport are experiencing a profound transformation. In recent years, the cooperative perception and control supported infrastructure-vehicle system (IVS) has attracted increasing attention in the field of intelligent transportation systems (ITS). Perception information about surrounding objects can be obtained from various types of sensors or communication networks, and control commands generated by CAVs or infrastructure can be executed promptly and accurately to improve the overall performance of the transportation system in terms of safety, efficiency, comfort and energy saving. This study presents a comprehensive review of the research progress achieved on cooperative perception and control supported IVS over the past decade. Focusing on the essential interactions between infrastructure and CAVs, and among CAVs themselves, infrastructure-vehicle cooperative perception and control methods are summarized and analyzed. Furthermore, a mining site, as a closed scenario, is used to show the current application of IVS. Finally, the issues in implementing cooperative perception and control technology are discussed, and recommendations for future research directions are proposed.
Article
Video processing solutions for motion analysis are key tasks in many computer vision applications, ranging from human activity recognition to object detection. In particular, speed estimation algorithms are relevant in contexts such as street monitoring and environment surveillance. In most realistic scenarios, the projection of a framed object of interest onto the image plane is affected by dynamic changes, mainly related to perspective transformations or periodic behaviours. Advanced speed estimation techniques therefore need to rely on robust object detection algorithms that can deal with such geometric modifications. The proposed method comprises a sequence of pre-processing operations that aim to reduce or remove perspective effects affecting the objects of interest, followed by an estimation phase based on the Maximum Likelihood (ML) principle, in which the speed of the foreground objects is estimated. ML estimation is a consolidated statistical tool that can be exploited to obtain reliable results. The performance of the proposed algorithm is evaluated on a set of real video recordings and compared with a block-matching motion estimation algorithm. The obtained results indicate that the proposed method shows good and robust performance.
Article
Full-text available
Detection of abnormal events in traffic scenes is very challenging and is a significant problem in video surveillance. The authors propose a novel scheme called super orientation optical flow (SOOF)-based clustering for identifying abnormal activities. The key idea behind the proposed SOOF features is to efficiently reproduce the motion information of a moving vehicle with respect to a super-orientation motion descriptor within a sequence of frames. The authors adopt the mean absolute temporal difference to identify anomalies by motion block (MB) selection and localisation. SOOF features obtained from MBs are used as motion descriptors for both normal and abnormal events. Simple and efficient K-means clustering is used to learn the normal motion flow during training, and abnormal events are identified with a nearest-neighbour search in the testing phase. Experimental results show that the proposed work detects anomalies effectively and gives better results than state-of-the-art techniques.
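A minimal sketch of the train/test idea: K-means learns the "normal" motion clusters, and the distance to the nearest cluster centre serves as the anomaly score. The two-dimensional (speed, orientation) features below stand in for the SOOF descriptor; all data and thresholds are invented.

```python
import numpy as np

def kmeans(X, k, iters=25):
    """Lloyd's algorithm with farthest-point initialisation (deterministic)."""
    centers = [X[0]]
    while len(centers) < k:
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[int(np.argmax(d))])   # farthest point from current set
    centers = np.array(centers)
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(axis=2), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers

def anomaly_score(x, centers):
    """Distance from a motion feature to the nearest 'normal' cluster centre."""
    return float(np.sqrt(((centers - x) ** 2).sum(axis=1)).min())

# Toy (speed, orientation) features describing two typical traffic flows.
rng = np.random.default_rng(1)
normal = np.vstack([rng.normal([1.0, 0.0], 0.05, size=(50, 2)),
                    rng.normal([0.0, 1.0], 0.05, size=(50, 2))])
centers = kmeans(normal, k=2)
```

A feature far from every centre (say, a vehicle moving against the learned flows) then scores well above the scores of training-like features, and a threshold on that score flags the anomaly.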
Conference Paper
Full-text available
This paper introduces an auto-calibration mechanism for an Automatic Number Plate Recognition camera dedicated to vehicle speed measurement. The calibration task is formulated as a multi-objective optimization problem and solved with the Non-dominated Sorting Genetic Algorithm. For simplicity, a uniform motion profile is assumed for the majority of vehicles. The proposed speed estimation method is based on tracing licence-plate quadrangles recognized in video frames. The results are compared with concurrent measurements performed with piezoelectric sensors.
Article
Full-text available
In this paper, we study the trade-off between accuracy and speed when building an object detection system based on convolutional neural networks. We consider three main families of detectors --- Faster R-CNN, R-FCN and SSD --- which we view as "meta-architectures". Each of these can be combined with different kinds of feature extractors, such as VGG, Inception or ResNet. In addition, we can vary other parameters, such as the image resolution, and the number of box proposals. We develop a unified framework (in Tensorflow) that enables us to perform a fair comparison between all of these variants. We analyze the performance of many different previously published model combinations, as well as some novel ones, and thus identify a set of models which achieve different points on the speed-accuracy tradeoff curve, ranging from fast models, suitable for use on a mobile phone, to a much slower model that achieves a new state of the art on the COCO detection challenge.
Article
Full-text available
In recent years, numerous effective multi-object tracking (MOT) methods have been developed because of the wide range of applications. Existing performance evaluations of MOT methods usually separate the object tracking step from the object detection step by using the same fixed object detection results for comparisons. In this work, we perform a comprehensive quantitative study of the effects of object detection accuracy on overall MOT performance, using the new large-scale University at Albany DETection and tRACking (UA-DETRAC) benchmark dataset. The UA-DETRAC benchmark consists of 100 challenging video sequences captured from real-world traffic scenes (over 140,000 frames with rich annotations, including occlusion, weather, vehicle category, truncation, and vehicle bounding boxes) for object detection, object tracking and MOT systems. We evaluate complete MOT systems constructed from combinations of state-of-the-art object detection and object tracking methods. Our analysis shows the complex effects of object detection accuracy on MOT system performance. Based on these observations, we propose new evaluation tools and metrics for MOT systems that consider both object detection and object tracking for comprehensive analysis.
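MOT systems of the kind evaluated here commonly associate detections across frames by bounding-box overlap. A minimal greedy IoU-association step might look like the sketch below (boxes as (x1, y1, x2, y2); the 0.3 threshold is an arbitrary choice, not a value from the benchmark):

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

def associate(tracks, dets, thresh=0.3):
    """Greedily match previous-frame tracks to current detections by IoU."""
    matches, used = [], set()
    for ti, tb in enumerate(tracks):
        best, bi = thresh, None
        for di, db in enumerate(dets):
            if di not in used and iou(tb, db) > best:
                best, bi = iou(tb, db), di
        if bi is not None:
            matches.append((ti, bi))
            used.add(bi)
    return matches
```

Unmatched detections would spawn new tracks and unmatched tracks would age out; a Kalman filter or optical flow can be layered on top, as several teams mentioned in the citing thesis do.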
Article
Full-text available
This paper presents a hierarchical framework for detecting local and global anomalies via hierarchical feature representation and Gaussian process regression (GPR) which is fully non-parametric and robust to the noisy training data, and supports sparse features. While most research on anomaly detection has focused more on detecting local anomalies, we are more interested in global anomalies that involve multiple normal events interacting in an unusual manner such as car accidents. To simultaneously detect local and global anomalies, we cast the extraction of normal interactions from the training videos as a problem of finding the frequent geometric relations of the nearby sparse spatio-temporal interest points (STIPs). A codebook of interaction templates is then constructed and modeled using GPR, based on which a novel inference method for computing the likelihood of an observed interaction is also developed. Thereafter, these local likelihood scores are integrated into globally consistent anomaly masks, from which anomalies can be succinctly identified. To the authors' best knowledge, it is the first time GPR is employed to model the relationship of the nearby STIPs for anomaly detection. Simulations based on four widespread datasets show that the new method outperforms the main state-of-the-art methods with lower computational burden.
Conference Paper
Full-text available
Vehicle speed estimation using Closed-Circuit Television (CCTV) is an interesting problem in computer vision, and various approaches have been used to automate it. In this study, vehicle detection with a Gaussian Mixture Model (GMM) is improved with a hole-filling (HF) method. Speed estimation was performed under various scenarios, the best of which gives a deviation of 7.63 km/h. The GMM fused with the hole-filling algorithm and combined with the pinhole camera model showed the best results compared with the other scenarios.
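A simplified sketch of this detection stage, with a single-Gaussian per-pixel background model standing in for the full GMM and a border flood-fill standing in for the hole-filling method (all data and thresholds below are illustrative):

```python
import numpy as np

def foreground_mask(frames, frame, k=2.5):
    """Pixels further than k sigma from the per-pixel background mean."""
    mu, sigma = frames.mean(axis=0), frames.std(axis=0) + 1e-6
    return np.abs(frame - mu) > k * sigma

def fill_holes(mask):
    """Flood-fill background from the border; unreachable pixels are holes."""
    h, w = mask.shape
    stack = [(i, j) for i in range(h) for j in (0, w - 1) if not mask[i, j]]
    stack += [(i, j) for j in range(w) for i in (0, h - 1) if not mask[i, j]]
    reached = np.zeros_like(mask, dtype=bool)
    for p in stack:
        reached[p] = True
    while stack:
        i, j = stack.pop()
        for ni, nj in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
            if 0 <= ni < h and 0 <= nj < w and not mask[ni, nj] and not reached[ni, nj]:
                reached[ni, nj] = True
                stack.append((ni, nj))
    return mask | ~reached           # holes = background the fill never reached

# A static background, plus one frame with a bright blob that has a gap inside.
frames = np.zeros((5, 7, 7))
frame = np.zeros((7, 7))
frame[2:5, 2:5] = 255.0
frame[3, 3] = 0.0                    # the detection hole
mask = foreground_mask(frames, frame)
filled = fill_holes(mask)
```

Filling the hole turns the fragmented blob into a single solid region, which is what makes the subsequent centroid tracking and pinhole-model speed computation stable.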
Article
Full-text available
This paper introduces a novel probabilistic activity modeling approach that mines recurrent sequential patterns called motifs from documents given as word $\times $ time count matrices (e.g., videos). In this model, documents are represented as a mixture of sequential activity patterns (our motifs) where the mixing weights are defined by the motif starting time occurrences. The novelties are manifold. First, unlike previous approaches where topics modeled only the co-occurrence of words at a given time instant, our motifs model the co-occurrence and temporal order in which the words occur within a temporal window. Second, unlike traditional Dynamic Bayesian networks (DBN), our model accounts for the important case where activities occur concurrently in the video (but not necessarily in synchrony), i.e., the advent of activity motifs can overlap. The learning of the motifs in these difficult situations is made possible thanks to the introduction of latent variables representing the activity starting times, enabling us to implicitly align the occurrences of the same pattern during the joint inference of the motifs and their starting times. As a third novelty, we propose a general method that favors the recovery of sparse distributions, a highly desirable property in many topic model applications, by adding simple regularization constraints on the searched distributions to the data likelihood optimization criteria. We substantiate our claims with experiments on synthetic data to demonstrate the algorithm behavior, and on four video datasets with significant variations in their activity content obtained from static cameras. We observe that using low-level motion features from videos, our algorithm is able to capture sequential patterns that implicitly represent typical trajectories of scene objects.
Article
Full-text available
This paper addresses the problem of fully automated mining of public space video data, a highly desirable capability under contemporary commercial and security considerations. This task is especially challenging due to the complexity of the object behaviors to be profiled, the difficulty of analysis under the visual occlusions and ambiguities common in public space video, and the computational challenge of doing so in real-time. We address these issues by introducing a new dynamic topic model, termed a Markov Clustering Topic Model (MCTM). The MCTM builds on existing dynamic Bayesian network models and Bayesian topic models, and overcomes their drawbacks on sensitivity, robustness and efficiency. Specifically, our model profiles complex dynamic scenes by robustly clustering visual events into activities and these activities into global behaviours with temporal dynamics. A Gibbs sampler is derived for offline learning with unlabeled training data and a new approximation to online Bayesian inference is formulated to enable dynamic scene understanding and behaviour mining in new video data online in real-time. The strength of this model is demonstrated by unsupervised learning of dynamic scene models for four complex and crowded public scenes, and successful mining of behaviors and detection of salient events in each.
Article
Full-text available
During the last years, the task of automatic event analysis in video sequences has gained an increasing attention among the research community. The application domains are disparate, ranging from video surveillance to automatic video annotation for sport videos or TV shots. Whatever the application field, most of the works in event analysis are based on two main approaches: the former based on explicit event recognition, focused on finding high-level, semantic interpretations of video sequences, and the latter based on anomaly detection. This paper deals with the second approach, where the final goal is not the explicit labeling of recognized events, but the detection of anomalous events differing from typical patterns. In particular, the proposed work addresses anomaly detection by means of trajectory analysis, an approach with several application fields, most notably video surveillance and traffic monitoring. The proposed approach is based on single-class support vector machine (SVM) clustering, where the novelty detection SVM capabilities are used for the identification of anomalous trajectories. Particular attention is given to trajectory classification in absence of a priori information on the distribution of outliers. Experimental results prove the validity of the proposed approach.
Article
Full-text available
We propose a novel method to model and learn the scene activity, observed by a static camera. The proposed model is very general and can be applied for solution of a variety of problems. The motion patterns of objects in the scene are modeled in the form of a multivariate nonparametric probability density function of spatiotemporal variables (object locations and transition times between them). Kernel Density Estimation is used to learn this model in a completely unsupervised fashion. Learning is accomplished by observing the trajectories of objects by a static camera over extended periods of time. It encodes the probabilistic nature of the behavior of moving objects in the scene and is useful for activity analysis applications, such as persistent tracking and anomalous motion detection. In addition, the model also captures salient scene features, such as the areas of occlusion and most likely paths. Once the model is learned, we use a unified Markov Chain Monte Carlo (MCMC)-based framework for generating the most likely paths in the scene, improving foreground detection, persistent labeling of objects during tracking, and deciding whether a given trajectory represents an anomaly to the observed motion patterns. Experiments with real-world videos are reported which validate the proposed approach.
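The density model at the heart of this approach can be sketched with a plain Gaussian-kernel KDE over toy (x, y, transit-time) samples; low log-density under the learned model flags anomalous motion. The bandwidth and data below are invented, not the paper's.

```python
import numpy as np

def kde_logpdf(train, x, h=0.3):
    """Log-density of x under a Gaussian kernel density estimate, bandwidth h."""
    d = train.shape[1]
    u = (train - x) / h
    k = np.exp(-0.5 * (u ** 2).sum(axis=1))
    norm = (2 * np.pi) ** (d / 2) * h ** d
    return float(np.log(k.mean() / norm + 1e-300))   # guard against log(0)

# Toy (x, y, transit-time) samples from two habitual paths through the scene.
rng = np.random.default_rng(0)
paths = np.vstack([rng.normal([2.0, 5.0, 1.0], 0.2, size=(200, 3)),
                   rng.normal([8.0, 5.0, 2.0], 0.2, size=(200, 3))])
typical = kde_logpdf(paths, np.array([2.0, 5.0, 1.0]))
anomalous = kde_logpdf(paths, np.array([5.0, 1.0, 9.0]))
```

Being fully non-parametric, the estimate needs no assumed number of paths or motion model; it simply assigns low likelihood to transitions never observed during the unsupervised learning phase.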
Conference Paper
Full-text available
We present two novel methods to automatically learn spatio-temporal dependencies of moving agents in complex dynamic scenes. They allow to discover temporal rules, such as the right of way between different lanes or typical traffic light sequences. To extract them, sequences of activities need to be learned. While the first method extracts rules based on a learned topic model, the second model called DDP-HMM jointly learns co-occurring activities and their time dependencies. To this end we employ Dependent Dirichlet Processes to learn an arbitrary number of infinite Hidden Markov Models. In contrast to previous work, we build on state-of-the-art topic models that allow to automatically infer all parameters such as the optimal number of HMMs necessary to explain the rules governing a scene. The models are trained offline by Gibbs Sampling using unlabeled training data.
Article
Full-text available
Suppose you are given some data set drawn from an underlying probability distribution P and you want to estimate a “simple” subset S of input space such that the probability that a test point drawn from P lies outside of S equals some a priori specified value between 0 and 1. We propose a method to approach this problem by trying to estimate a function f that is positive on S and negative on the complement. The functional form of f is given by a kernel expansion in terms of a potentially small subset of the training data; it is regularized by controlling the length of the weight vector in an associated feature space. The expansion coefficients are found by solving a quadratic programming problem, which we do by carrying out sequential optimization over pairs of input patterns. We also provide a theoretical analysis of the statistical performance of our algorithm. The algorithm is a natural extension of the support vector algorithm to the case of unlabeled data.
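For reference, the quadratic programme sketched in this abstract is the ν-one-class SVM; its primal form is

```latex
\min_{w,\,\xi,\,\rho}\ \frac{1}{2}\|w\|^{2}
  + \frac{1}{\nu \ell}\sum_{i=1}^{\ell}\xi_{i} - \rho
\quad \text{s.t.} \quad
\langle w, \Phi(x_{i})\rangle \ge \rho - \xi_{i},
\qquad \xi_{i} \ge 0,\ i = 1,\dots,\ell,
```

with decision function $f(x) = \operatorname{sgn}(\langle w, \Phi(x)\rangle - \rho)$, where $\Phi$ is the kernel feature map and $\nu \in (0, 1]$ upper-bounds the fraction of training points allowed to fall outside the estimated support $S$.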
Conference Paper
Full-text available
We propose a novel unsupervised learning framework for activity perception. To understand activities in complicated scenes from visual data, we propose a hierarchical Bayesian model to connect three elements: low-level visual features, simple "atomic" activities, and multi-agent interactions. Atomic activities are modeled as distributions over low-level visual features, and interactions are modeled as distributions over atomic activities. Our models improve existing language models such as latent Dirichlet allocation (LDA) and hierarchical Dirichlet process (HDP) by modeling interactions without supervision. Our data sets are challenging video sequences from crowded traffic scenes with many kinds of activities co-occurring. Our approach provides a summary of typical atomic activities and interactions in the scene. Unusual activities and interactions are found, with natural probabilistic explanations. Our method supports flexible high-level queries on activities and interactions using atomic activities as components.
Conference Paper
State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations. Advances like SPPnet [7] and Fast R-CNN [5] have reduced the running time of these detection networks, exposing region proposal computation as a bottleneck. In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals. An RPN is a fully-convolutional network that simultaneously predicts object bounds and objectness scores at each position. RPNs are trained end-to-end to generate high-quality region proposals, which are used by Fast R-CNN for detection. With a simple alternating optimization, RPN and Fast R-CNN can be trained to share convolutional features. For the very deep VGG-16 model [18], our detection system has a frame rate of 5fps (including all steps) on a GPU, while achieving state-of-the-art object detection accuracy on PASCAL VOC 2007 (73.2% mAP) and 2012 (70.4% mAP) using 300 proposals per image. The code will be released.
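The RPN scores and regresses a fixed grid of anchor boxes. Their enumeration, with the paper's default scales and ratios, can be sketched as below (this is only the anchor generation, not the trained network; the convention r = width/height is an implementation choice):

```python
import numpy as np

def make_anchors(feat_h, feat_w, stride=16,
                 scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Anchor boxes (x1, y1, x2, y2) at every feature-map cell, RPN-style."""
    boxes = []
    for i in range(feat_h):
        for j in range(feat_w):
            cx, cy = (j + 0.5) * stride, (i + 0.5) * stride
            for s in scales:
                for r in ratios:                 # r = width / height
                    w, h = s * np.sqrt(r), s / np.sqrt(r)
                    boxes.append([cx - w / 2, cy - h / 2,
                                  cx + w / 2, cy + h / 2])
    return np.array(boxes)
```

Each of the 9 anchors per cell (3 scales × 3 ratios) gets an objectness score and four box-regression offsets from the shared convolutional features, which is what makes the proposals nearly cost-free.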
Article
In this paper, we focus on fully automatic traffic surveillance camera calibration, which we use for speed measurement of passing vehicles. We improve over a recent state-of-the-art camera calibration method for traffic surveillance based on two detected vanishing points. More importantly, we propose a novel automatic scene scale inference based on matching bounding boxes of rendered 3D models of vehicles with detected bounding boxes in the image. The proposed method can be used from an arbitrary viewpoint and has no constraints on camera placement. We evaluate our method on the recent comprehensive dataset for speed measurement, BrnoCompSpeed. Experiments show that our automatic camera calibration based on two detected vanishing points reduces the error by 50% compared to the previous state-of-the-art method. We also show that our scene scale inference method is much more precise (mean speed measurement error 1.10 km/h), outperforming both the state-of-the-art automatic calibration method (error reduction of 86% -- mean error 7.98 km/h) and manual calibration (error reduction of 19% -- mean error 1.35 km/h). We also present qualitative results of the automatic camera calibration method on video sequences obtained from real surveillance cameras at various places and under different lighting conditions (night, dawn, day).
Article
We propose a method for fully automatic calibration of traffic surveillance cameras. This method allows for calibration of the camera, including scale, without any user input, from only several minutes of input surveillance video. The targeted applications include speed measurement, measurement of vehicle dimensions, vehicle classification, etc. The first step of our approach is camera calibration by determining three vanishing points defining the stream of vehicles. The second step is construction of 3D bounding boxes of individual vehicles and their measurement up to scale. We propose to first construct the projection of the bounding boxes and then, using the camera calibration obtained earlier, create their 3D representation. In the third step, we use the dimensions of the 3D bounding boxes for calibration of the scene scale. We collected a dataset with ground-truth speed and distance measurements and evaluate our approach on it. The achieved mean accuracy of speed and distance measurement is below 2%. Our efficient C++ implementation runs in real time on a low-end processor (Core i3) with a safe margin, even for full-HD videos.
Article
Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers---8x deeper than VGG nets but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.
Article
This paper deals with automatic calibration of roadside surveillance cameras. We focus on parameters necessary for measurements in traffic-surveillance applications. Contrary to the existing solutions, our approach requires no a priori knowledge, and it works with a very wide variety of road settings (number of lanes, occlusion, quality of ground marking), as well as with practically unlimited viewing angles. The main contribution is that our solution works fully automatically—without any per-camera or per-video manual settings or input whatsoever—and it is computationally inexpensive. Our approach uses tracking of local feature points and analyzes the trajectories in a manner based on cascaded Hough transform and parallel coordinates. An important assumption for the vehicle movement is that at least a part of the vehicle motion is approximately straight—we discuss the impact of this assumption on the applicability of our approach and show experimentally that this assumption does not limit the usability of our approach severely. We efficiently and robustly detect vanishing points, which define the ground plane and vehicle movement, except for the scene scale. Our algorithm also computes parameters for radial distortion compensation. Experiments show that the obtained camera parameters allow for measurements of relative lengths (and potentially speed) with ~2% mean accuracy. The processing is performed easily in real time, and typically, a 2-min-long video is sufficient for stable calibration.
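The core geometric fact these methods exploit is that images of parallel lane edges meet at a vanishing point. Setting aside the cascaded Hough transform and parallel-coordinates machinery, the least-squares intersection of detected lines illustrates the idea (the three lines below are invented and converge at a hypothetical point):

```python
import numpy as np

def vanishing_point(lines):
    """Least-squares intersection of lines a*x + b*y + c = 0 (rows of `lines`)."""
    A, b = lines[:, :2], -lines[:, 2]
    vp, *_ = np.linalg.lstsq(A, b, rcond=None)
    return vp

# Three trajectory lines converging (noiselessly) at the point (400, 120).
lines = np.array([[1.0, 0.0, -400.0],    # x = 400
                  [0.0, 1.0, -120.0],    # y = 120
                  [1.0, 1.0, -520.0]])   # x + y = 520
```

With noisy tracked-feature trajectories, the least-squares solution spreads the residual across all lines; accumulator-based schemes such as the diamond space are the robust, real-time alternative the paper actually uses.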
Article
In this paper, we propose a method for modeling trajectory patterns with both regional and velocity observations through the probabilistic topic model. By embedding Gaussian models into the discrete topic model framework, our method uses continuous velocity as well as regional observations unlike existing approaches. In addition, the proposed framework combined with Hidden Markov Model can cover the temporal transition of the scene state, which is useful in checking a violation of the rule that some conflict topics (e.g. two cross-traffic patterns) should not occur at the same time. To achieve online learning even with the complexity of the proposed model, we suggest a novel learning scheme instead of collapsed Gibbs sampling. The proposed two-stage greedy learning scheme is not only efficient at reducing the search space but also accurate in a way that the accuracy of online learning becomes not worse than that of the batch learning. To validate the performance of our method, experiments were conducted on various datasets. Experimental results show that our model explains satisfactorily the trajectory patterns with respect to scene understanding, anomaly detection, and prediction.
Conference Paper
We present a new dataset with the goal of advancing the state-of-the-art in object recognition by placing the question of object recognition in the context of the broader question of scene understanding. This is achieved by gathering images of complex everyday scenes containing common objects in their natural context. Objects are labeled using per-instance segmentations to aid in understanding an object's precise 2D location. Our dataset contains photos of 91 object types that would be easily recognizable by a 4-year-old, along with per-instance segmentation masks. With a total of 2.5 million labeled instances in 328k images, the creation of our dataset drew upon extensive crowd worker involvement via novel user interfaces for category detection, instance spotting and instance segmentation. We present a detailed statistical analysis of the dataset in comparison to PASCAL, ImageNet, and SUN. Finally, we provide baseline performance analysis for bounding box and segmentation detection results using a Deformable Parts Model.
Article
The core component of most modern trackers is a discriminative classifier, tasked with distinguishing between the target and the surrounding environment. To cope with natural image changes, this classifier is typically trained with translated and scaled sample patches. Such sets of samples are riddled with redundancies -- any overlapping pixels are constrained to be the same. Based on this simple observation, we propose an analytic model for datasets of thousands of translated patches. By showing that the resulting data matrix is circulant, we can diagonalize it with the Discrete Fourier Transform, reducing both storage and computation by several orders of magnitude. Interestingly, for linear regression our formulation is equivalent to a correlation filter, used by some of the fastest competitive trackers. For kernel regression, however, we derive a new Kernelized Correlation Filter (KCF), that unlike other kernel algorithms has the exact same complexity as its linear counterpart. Building on it, we also propose a fast multi-channel extension of linear correlation filters, via a linear kernel, which we call Dual Correlation Filter (DCF). Both KCF and DCF outperform top-ranking trackers such as Struck or TLD on a 50 videos benchmark, despite running at hundreds of frames-per-second, and being implemented in a few lines of code (Algorithm 1). To encourage further developments, our tracking framework was made open-source.
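The single-channel linear special case (a MOSSE-style filter) shows the circulant/DFT trick that the KCF builds on: because translated patches form a circulant matrix, ridge regression diagonalises in the Fourier domain, so training reduces to a per-frequency division and the response peak locates the target's translation. A sketch, not the full KCF:

```python
import numpy as np

def train_filter(x, y, lam=1e-4):
    """Closed-form ridge regression per frequency: a linear correlation filter."""
    X, Y = np.fft.fft2(x), np.fft.fft2(y)
    return np.conj(X) * Y / (np.conj(X) * X + lam)

def response(F, z):
    """Correlation response map; its peak gives the target's translation."""
    return np.real(np.fft.ifft2(F * np.fft.fft2(z)))

rng = np.random.default_rng(0)
patch = rng.standard_normal((32, 32))        # target appearance at train time
label = np.zeros((32, 32))
label[0, 0] = 1.0                            # desired peak at zero shift
F = train_filter(patch, label)
shifted = np.roll(patch, (2, 3), axis=(0, 1))  # target moved by (2, 3) pixels
peak = np.unravel_index(np.argmax(response(F, shifted)), (32, 32))
```

The KCF replaces the linear inner products with kernel evaluations while keeping the same per-frequency cost, which is why it runs at hundreds of frames per second.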
Article
Vehicle speed measurement (VSM) based on video images represents the development direction of speed measurement in intelligent transportation systems (ITS). This paper presents a novel vehicle speed measurement method combining an improved three-frame difference algorithm with a proposed gray-constraint optical flow algorithm. With the improved three-frame difference algorithm, the contour of moving vehicles can be detected exactly. With the proposed gray-constraint optical flow algorithm, the optical flow value of the vehicle contour, i.e., the speed of the vehicle in the image in pixels/s, can be computed accurately. The velocity (km/h) of the vehicles is then calculated from the optical flow value of the vehicle's contour and the ratio of image pixels to the width of the road. The method yields a better optical flow field by reducing the influence of changing lighting and shadow. Moreover, it reduces computation considerably, since it only calculates the optical flow of the moving target's contour. Experimental comparisons with other VSM methods show that the proposed approach estimates vehicle speed satisfactorily.
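The plain three-frame difference underlying the paper's improved variant intersects the changes of two consecutive frame pairs, which localises the moving object in the middle frame. A sketch with an invented threshold and toy frames:

```python
import numpy as np

def three_frame_diff(f0, f1, f2, thresh=20):
    """Motion mask for the middle frame: change against both neighbours."""
    d1 = np.abs(f1.astype(int) - f0.astype(int)) > thresh
    d2 = np.abs(f2.astype(int) - f1.astype(int)) > thresh
    return d1 & d2

# A bright 2x2 'vehicle' sliding right by 2 px per frame over a dark road.
road = np.zeros((8, 10), dtype=np.uint8)
f0, f1, f2 = road.copy(), road.copy(), road.copy()
f0[3:5, 2:4] = 255
f1[3:5, 4:6] = 255
f2[3:5, 6:8] = 255
mask = three_frame_diff(f0, f1, f2)
```

Only the pixels occupied by the vehicle in the middle frame change in both differences, so the mask isolates its current contour; optical flow then only needs to be evaluated on those pixels, which is where the computational saving comes from.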
Conference Paper
In this paper, we present a new algorithm for estimating individual vehicle speed from two consecutive images captured by a traffic safety camera system. Its principles are: first, both images are transformed from the image plane to 3D world coordinates based on the calibrated camera parameters. Second, the difference of the two transformed images is calculated, eliminating the background and mapping the vehicles from both images onto one image. Finally, a block feature of the vehicle closest to the ground is matched to estimate vehicle travel distance and speed. Experimental results show that the proposed method exhibits good and consistent performance. Compared with measurements from a speed radar, average estimation errors are 3.27% and 8.51% for day-time and night-time test examples respectively, which are better than other previously published results. The proposed algorithm can easily be extended to work on image sequences.
Article
Compared to other anomalous video event detection approaches that analyze object trajectories only, we propose a context-aware method to detect anomalies. By tracking all moving objects in the video, three different levels of spatiotemporal contexts are considered, i.e., point anomaly of a video object, sequential anomaly of an object trajectory, and co-occurrence anomaly of multiple video objects. A hierarchical data mining approach is proposed. At each level, frequency-based analysis is performed to automatically discover regular rules of normal events. Events deviating from these rules are identified as anomalies. The proposed method is computationally efficient and can infer complex rules. Experiments on real traffic video validate that the detected video anomalies are hazardous or illegal according to traffic regulations.
Estimating the support of a high-dimensional distribution
  • B. Schölkopf
  • J. C. Platt
  • J. Shawe-Taylor
  • A. J. Smola
  • R. C. Williamson