... However, surveillance cameras capture a huge number of videos, and reviewing them imposes a heavy labor burden on human operators. With the development of artificial intelligence, this problem can be resolved using automated anomaly detectors [1][2][3]. Extensive work, including supervised models [5][6][7][8][9][10] and unsupervised models [15,16,18,22,24], has been proposed in recent years [4]. ...
... For online GNG, we attempt to answer four specific questions: (1) when and how to insert a new neuron, (2) how to adjust learning rates to learn input samples efficiently, (3) how to define and delete useless neurons, (4) when to stop the learning. Accordingly, we propose a series of online neighbor-related strategies that utilize the relationships between input samples and network neurons. ...
... Following [16,45,51,53,52], we plot the Receiver Operating Characteristic (ROC) curves using frame-level and pixel-level measurements, as given in Fig. 12. The compared approaches are abbreviated as follows: Hist+SVDD 2016 [53], OADC-SA 2015 [52], CCUKL 2014 [51], SS 2013 [61], MDT 2013 [45], Fast SC 2013 [16], SRC 2013 [15], SF 2009 [44], SF+MPPCA 2009 [60], online SOM 2010 [24], original GNG, and our online GNG. In Fig. 12, it is clear that our online GNG demonstrates the best performance in both scenes. The main reason could be that most other approaches use limited training samples for modeling, while our online model continues training to improve itself even during the testing stage. ...
... Several studies that analyze the state of the art of surveillance systems can be found in the literature (e.g. Hu et al., 2004; Valera & Velastin, 2006; Kang & Deng, 2007; Kumar et al., 2008; Haering et al., 2008). These works focus on revising the architecture of surveillance systems and the algorithms used for visual surveillance. ...
... The functionality offered by surveillance systems improves as new technologies emerge and as users demand new solutions to their problems. From a historical point of view, it is acknowledged that the evolution of surveillance systems has gone through three generations (Valera & Velastin, 2006). In the first generation (1960–1980), closed-circuit television (CCTV) analog systems were used, which consisted of several cameras connected to a series of monitors. ...
... According to Attwood and Watson (2004), it is impossible to define a single general-purpose architecture that is optimal for intelligent surveillance, since there are too many variables and restrictions, which vary depending on the specific installation and the user's requirements. The systems previously described do not use agents, whereas a distributed multiagent approach can, as a matter of fact, offer several advantages for developing a surveillance system (Valera & Velastin, 2006). First, intelligent cooperation between agents may allow the use of less expensive sensors, and therefore a larger number of sensors may be deployed over a greater area. ...
This article reviews the state of the art of the application of agent technology within the scope of surveillance systems. Thus, the potential of the practical use of the concepts and technologies of the agent paradigm can be identified and evaluated in this domain. Current surveillance systems typically use several, often heterogeneous, devices distributed across the observed scenario, while incorporating a certain degree of intelligence so as to alert the operator proactively to what is going on and spare the operator from having to observe the monitors continuously. The basic characteristics of agents (autonomy, reactivity, proactiveness and social ability), along with the characteristics of multiagent systems (distributed data management, low coupling, robustness, communication and coordination between autonomous entities), suggest that agent technology is a good choice for solving the problems which appear and are dealt with in surveillance systems.
Anomaly detection is still a challenging task for video surveillance due to complex environments and unpredictable human behaviors. Most existing approaches train offline detectors using manually labeled data and predefined parameters, and are hard-pressed to model changing scenes. This paper introduces a neural network based model called online Growing Neural Gas (online GNG) to perform unsupervised learning. Unlike a parameter-fixed GNG, our model updates learning parameters continuously, for which we propose several online neighbor-related strategies. Specific operations, namely neuron insertion, deletion, learning rate adaptation and stopping criteria selection, are upgraded to online modes. In the anomaly detection stage, behavior patterns far away from our model are labeled as anomalous, where "far away" is measured by a time-varying threshold. Experiments are implemented on three surveillance datasets, namely the UMN, UCSD Ped1/Ped2 and Avenue datasets. All datasets have changing scenes due to mutable crowd density and behavior types. Anomaly detection results show that our model can adapt to the current scene rapidly and reduce false alarms while still detecting most anomalies. Quantitative comparisons with 12 recent approaches further confirm our superiority.
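The detection rule summarized above — label a pattern anomalous when it lies far from the learned model, with "far" set by a time-varying threshold — can be sketched as follows. This is a minimal illustration rather than the authors' implementation; the neuron positions, the threshold update rule, and the constants `alpha` and `margin` are all assumptions.

```python
import math

def nearest_distance(sample, neurons):
    """Euclidean distance from a feature sample to its nearest neuron."""
    return min(math.dist(sample, w) for w in neurons)

def detect(sample, neurons, threshold):
    """Label a sample anomalous if it lies farther than the current
    threshold from every neuron in the (continuously updated) model."""
    d = nearest_distance(sample, neurons)
    return d > threshold, d

def update_threshold(threshold, d, alpha=0.05, margin=2.0):
    """Time-varying threshold: a smoothed multiple of recent normal
    distances, so the decision boundary tracks the changing scene."""
    return (1 - alpha) * threshold + alpha * margin * d

# toy model with two neurons; a far-away sample is flagged
neurons = [(0.0, 0.0), (1.0, 1.0)]
is_anomalous, d = detect((5.0, 5.0), neurons, threshold=1.5)
```

In a full online GNG the neuron set itself would also be inserted into, pruned, and moved after every sample; only the scoring step is shown here.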
... The surveillance systems range from simple, analogue Closed-Circuit Television (CCTV) systems to sophisticated networks of infra-red and motion sensors in sensitive areas such as banks and museums. The London Underground and London Heathrow Airport have more than 5000 cameras [1], for instance. Simultaneously monitoring multiple image streams becomes tedious and monotonous for human operators, who typically have short attention spans and cognitive limits on how many screens they can attentively observe at once. ...
... A range of algorithms have been proposed for so-called third-generation systems [3]. The goals of current research are to develop algorithms that attract the attention of a human operator in real-time based on end-user requirements, process information arriving from a multi-sensor environment at high rates, and use low-cost standard components [1,3]. ...
Many schemes have been presented over the years to develop automated visual surveillance systems. However, these schemes typically need custom equipment, or involve significant complexity and storage requirements. In this paper we present three software-based agents built using kernel machines to perform automated, real-time intruder detection in surveillance systems. Kernel machines provide a powerful data mining technique that may be used for pattern matching in the presence of complex data. They work by first mapping the raw input data onto a (often much) higher-dimensional feature space, and then clustering in the feature space instead. The reasoning is that mapping onto the (higher-dimensional) feature space enables the comparison of additional, higher-order correlations in determining patterns between the raw data points. The agents proposed here have been built using algorithms that are adaptive, portable, do not require any expensive or sophisticated components, and are lightweight and efficient having run times of the order of hundredths of a second. Through application to real image streams from a simple, run-of-the-mill closed-circuit television surveillance system, and direct quantitative performance comparison with some existing schemes, we show that it is possible to easily obtain high detection accuracy with low computational and storage complexities.
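The kernel-machine idea described above — comparing raw data points through an implicit higher-dimensional feature map — can be illustrated with a toy novelty score. This is a hedged sketch, not the agents from the paper: the RBF kernel choice, the `gamma` value, and the mean-similarity score are assumptions.

```python
import math

def rbf_kernel(x, y, gamma=0.5):
    """RBF kernel: an inner product in an implicit, very high-dimensional
    feature space, computed without ever constructing that space."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def novelty_score(sample, normal_frames, gamma=0.5):
    """Mean kernel similarity of a frame's features to the normal
    training frames; a low score suggests an intruder-like frame."""
    return sum(rbf_kernel(sample, f, gamma) for f in normal_frames) / len(normal_frames)

# toy 2-D feature vectors extracted from "empty scene" frames
normal = [(0.0, 0.1), (0.1, 0.0), (0.05, 0.05)]
s_normal = novelty_score((0.05, 0.08), normal)   # close to training data
s_intruder = novelty_score((3.0, 3.0), normal)   # far from training data
```

A deployed agent would threshold such a score, and would typically use a trained kernel machine (e.g. a one-class formulation) rather than a raw similarity average.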
... The intelligent video surveillance system is a convergence technology including detecting and tracking objects, analyzing their movements, and responding to them [3,13]. We propose a method for detecting and tracking multiple moving objects, which includes the basic technologies of intelligent video surveillance systems. ...
... Fig. 13 shows the error between the actual position and the predicted position of groups formed by the moving objects. ...
This paper deals with an intelligent image processing method for video surveillance systems. We propose a technology for detecting and tracking multiple moving objects, which can be applied to consumer electronics such as home and business surveillance systems consisting of an internet protocol (IP) camera and a network video recorder (NVR). A real-time surveillance system needs to detect moving objects robustly against noise and environmental changes. So the proposed method uses red-green-blue (RGB) color background modeling with a sensitivity parameter to extract moving regions, morphology to eliminate noise, and blob-labeling to group moving objects. To track moving objects fast, the proposed method predicts the velocity and the direction of the groups formed by moving objects. Finally, the experiments show that the proposed method has the robustness against environmental influences and the speed that are suitable for a real-time surveillance system.
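The extraction-and-grouping pipeline described — background differencing with a sensitivity parameter, then blob-labeling of the moving regions — might be sketched on toy grayscale frames like this (the morphology step is omitted, and all values, including the sensitivity of 30, are illustrative assumptions):

```python
def foreground_mask(frame, background, sensitivity=30):
    """Pixels differing from the background model by more than the
    sensitivity parameter are marked as moving-region candidates."""
    return [[1 if abs(p - b) > sensitivity else 0
             for p, b in zip(fr, br)]
            for fr, br in zip(frame, background)]

def label_blobs(mask):
    """Group foreground pixels into blobs via 4-connected flood fill."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    blob = 0
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not labels[y][x]:
                blob += 1
                stack = [(y, x)]
                while stack:
                    cy, cx = stack.pop()
                    if 0 <= cy < h and 0 <= cx < w and mask[cy][cx] and not labels[cy][cx]:
                        labels[cy][cx] = blob
                        stack += [(cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)]
    return labels, blob

# a flat background and a frame with two disconnected bright regions
background = [[10] * 6 for _ in range(4)]
frame = [row[:] for row in background]
frame[1][1] = frame[1][2] = 200   # one moving object
frame[3][5] = 200                 # another, disconnected
mask = foreground_mask(frame, background)
labels, count = label_blobs(mask)  # count == 2
```

The per-blob labels are what a tracker would then use to estimate each group's velocity and direction across frames.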
... The use of surveillance systems has grown exponentially during the last decade, and has been applied in many different environments [29]. A distributed configuration is mandatory to get scalable and robust surveillance applications ([28], [27], [18], [22]). ...
A great number of methodologies to develop multi-agent systems (MAS) have been proposed in the last few years. But a unique methodology cannot be general enough to be useful for everyone without some level of customization. To our knowledge, existing agent-based surveillance systems have been developed ad hoc, without following any methodology. We are interested in creating tools that allow monitoring environments to be modeled and generated. This has motivated the selection of the Prometheus and INGENIAS methodologies, to take advantage of both approaches in developing agent-based applications. In this paper a collection of equivalences between the concepts used in both methodologies is described extensively.
... For all these situations, automation and the tracking of objects provide an opportunity to deploy staff only where they are really needed. This enables security personnel to be freed from having to acquire and manually track targets [19]. • Learning Camera Topology: An intelligent surveillance system must capture and track objects to establish a history of their behaviour, classify the objects (as people or vehicles, etc., of particular types), and establish their trajectories in a 3D space. ...
The development and capabilities of closed circuit television surveillance systems in association with distributed computing systems are reviewed, and the applications to various aspects of surveillance are described.
Unusual crowd activity detection is a challenging problem in surveillance video applications because feature extraction is a difficult process in crowded scenes. The main objective of this research work is to detect unusual crowd activities and unusual splits of moving objects. Various methods have been employed to address these challenges. However, there is still a lack of appropriate handling of this problem due to frames having occlusion, noise, and congestion. This paper proposes a novel clustering approach to detect unusual crowd activities. The proposed method consists of five phases: foreground extraction, foreground enhancement, foreground estimation, clustering crowds, and the Unusual Crowd Activities (UCA) model. The UCA model can find unusual crowd activities and unusual splits of moving objects using a Laplacian Matrix formulation. Two public datasets, viz. PETS 2009 and the UMN dataset, are used for evaluating the proposed methodology. To estimate the effectiveness of the proposed work, several unusual event detection methods are compared against the proposed method's results. The experimental results revealed that the proposed method gives better results than the existing methods.
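The abstract does not spell out its Laplacian Matrix formulation, but the standard graph Laplacian L = D − A it presumably builds on can be sketched: nodes are tracked objects, edges link interacting ones, and a split of a group shows up as the graph falling into disconnected components. The adjacency values below are illustrative assumptions.

```python
def laplacian(adjacency):
    """Graph Laplacian L = D - A for a crowd-interaction graph:
    nodes are tracked objects, edges link interacting ones."""
    n = len(adjacency)
    degree = [sum(row) for row in adjacency]
    return [[(degree[i] if i == j else 0) - adjacency[i][j]
             for j in range(n)] for i in range(n)]

# two groups that have split apart: {0, 1} and {2, 3}
A = [[0, 1, 0, 0],
     [1, 0, 0, 0],
     [0, 0, 0, 1],
     [0, 0, 1, 0]]
L = laplacian(A)
# the multiplicity of L's zero eigenvalue equals the number of
# connected components, so a split is detectable spectrally
```

Every row of a Laplacian sums to zero, which is a quick sanity check on the construction.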
In this paper we investigate five traditional techniques to extract features within a face image, and we evaluate them by applying the Kernel-based Online Anomaly Detection (KOAD) algorithm. The main objective of this work is to explore the various fundamental feature extraction techniques that can be used to identify whether a person’s face is covered by a mask or not. Although face covering or wearing a mask is recommended during this global COVID-19 pandemic, deliberate face occlusion is considered to be a suspicious activity in a normal scenario. Even during this pandemic, it may be considered suspicious if someone is covering his/her face inside an ATM booth or an apartment complex during odd hours, for instance. Our proposed framework detects such intrusion activity by combining a traditional face detection algorithm with KOAD. Comparative analysis is performed for each filter used and we show that our proposed system achieves high detection accuracy with low computational complexity, while also providing the added benefits of being adaptive, portable, and involving low infrastructural costs.
Visual surveillance networks are installed in many sensitive places in the present world. Human security officers are required to continuously stare at large numbers of monitors simultaneously, and for lengths of time at a stretch. Constant alert vigilance for hours on end is difficult to maintain for human beings. It is thus important to remove the onus of detecting unwanted activity from the human security officer to an automated system. While many researchers have proposed solutions to this problem in the recent past, significant gaps remain in existing knowledge. Most existing algorithms involve high complexities. No quantitative performance analysis is provided by most researchers. Most commercial systems require expensive equipment. This work proposes algorithms where the complexities are independent of time, making the algorithms naturally suited to online use. In addition, the proposed methods have been shown to work with the simplest surveillance systems that may already be publicly deployed. Furthermore, direct quantitative performance comparisons are provided.
We propose a new algorithm based on machine learning techniques for automatic intruder detection in visual surveillance networks. The proposed algorithm is theoretically founded on the concept of Minimum Volume Sets. Through application to image sequences from two different scenarios and comparison with existing algorithms, we show that it is possible for our proposed algorithm to easily obtain high detection accuracy with low false alarm rates.
Many types of automated visual surveillance systems have been presented in the recent literature. Most of the schemes require custom equipment, or involve significant complexity and storage needs. After studying the area in detail, this work presents four novel algorithms to perform automated, real-time intruder detection in surveillance networks. Built using machine learning techniques, the proposed algorithms are adaptive and portable, do not require any expensive or sophisticated component, are lightweight, and efficient with runtimes of the order of hundredths of a second. Two of the proposed algorithms have been developed by us. With application to two complementary data sets and quantitative performance comparisons with two representative existing schemes, we show that it is possible to easily obtain high detection accuracy with low false positives.
Due to the development and market expansion of image analysis and recognition technology, video security devices such as CCTV cameras and digital storage devices are required for real-time monitoring systems and intelligent video security systems. This includes the development of more advanced technologies. A rotatable PTZ camera in a CCTV camera system has a zoom function, so a precise picture can be acquired. However, it can cause blind spots and cannot monitor two or more moving objects at the same time. This study concerns the intelligent tracking of multiple moving objects, CCTV systems, and methods of video surveillance. An intelligent video surveillance system is proposed. It can accurately shoot broad areas and track multiple objects at the same time, much more effectively than using one fixed camera for an entire area or two or more PTZ cameras.
In this chapter we target the problem of monitoring an environment with a team of mobile robots that have on-board video cameras, together with fixed stereo cameras available within the environment. Current research regards homogeneous robots, whereas in this chapter we study highly heterogeneous systems and consider the problem of patrolling an area with a dynamic set of agents. The system presented in the chapter provides enhanced multi-robot coordination and vision-based activity monitoring techniques. The main objective is the integration and development of coordination techniques for multi-robot environment coverage, with the goal of maximizing the quality of information gathered from a given area, thus implementing a heterogeneous, mobile and reconfigurable multi-camera video-surveillance system.
This paper presents a methodology for evaluating the performance of video surveillance tracking systems. We introduce a novel framework for performance evaluation using pseudo-synthetic video, which employs data captured online and stored in a surveillance database. Tracks are automatically selected from the surveillance database and then used to generate ground truthed video sequences with a controlled level of perceptual complexity that can be used to quantitatively characterise the quality of the tracking algorithms.
The paper describes a novel integrated vision system in which two autonomous visual modules are combined to interpret a dynamic scene. The first module employs a 3D model-based scheme to track rigid objects such as vehicles. The second module uses a 2D deformable model to track non-rigid objects such as people. The principal contribution is a novel method for handling occlusion between objects within the context of this hybrid tracking system. The practical aim of the work is to derive a scene description that is sufficiently rich to be used in a range of surveillance tasks. The paper describes each of the modules in outline before detailing the method of integration and the handling of occlusion in particular. Experimental results are presented to illustrate the performance of the system in a dynamic outdoor scene involving cars and people.
The recognition of activities from sensory data is important in advanced surveillance systems to enable prediction of high-level goals and intentions of the target under surveillance. The problem is complicated by sensory noise and complex activity spanning large spatial and temporal extents. The paper presents a system for recognizing high-level human activities from multi-camera video data in complex spatial environments. The Abstract Hidden Markov mEmory Model (AHMEM) is used to deal with noise and scalability. The AHMEM is an extension of the Abstract Hidden Markov Model (AHMM) that allows us to represent a richer class of both state-dependent and context-free behaviors. The model also supports integration with low-level sensory models and efficient probabilistic inference. We present experimental results showing the ability of the system to perform real-time monitoring and recognition of complex behaviors of people from observing their trajectories within a real, complex indoor environment.
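The AHMEM itself is beyond a short snippet, but the HMM machinery it extends rests on the standard forward pass, sketched here for a two-state toy model of hidden activities. All the probabilities below are illustrative assumptions.

```python
def forward(obs, pi, A, B):
    """Standard HMM forward algorithm: the likelihood of an observation
    sequence, summed over all hidden activity-state paths."""
    n = len(pi)
    # initialize with the prior weighted by the first emission
    alpha = [pi[s] * B[s][obs[0]] for s in range(n)]
    for o in obs[1:]:
        # propagate through transitions, then weight by the emission
        alpha = [B[s][o] * sum(alpha[t] * A[t][s] for t in range(n))
                 for s in range(n)]
    return sum(alpha)

pi = [0.5, 0.5]                 # prior over two hidden activities
A = [[0.9, 0.1], [0.1, 0.9]]    # "sticky" activity transitions
B = [[0.8, 0.2], [0.2, 0.8]]    # emission probabilities per activity
likelihood = forward([0, 0], pi, A, B)
```

Hierarchical variants such as the AHMM and AHMEM layer abstract policies above such state chains, but inference still reduces to message passing of this kind.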
This work describes an active vision framework that is able to perform visual monitoring tasks involving attention control and pattern categorization behaviors. We use an articulated stereo vision platform and an image processing device, which provides abstracted information about the environment. As a practical result of this work, the system can select a region of interest in its environment, perform attention shifts involving saccadic movements, and perform efficient feature extraction and recognition. Also, attentional maps of the scene are incrementally constructed, and the maps are kept consistent with the current perception of the world. Another important result for the attentional mechanism is that the system is capable of analyzing all regions of its world, selected according to salience maps.
The requirement for flexible operation is becoming increasingly important in modern industrial systems. This requirement has to be supported at all system levels, including the field level in process industry, as well as the cell and machine control levels in manufacturing industry, where fieldbus-based communication systems are commonly found. Furthermore, typical applications at these levels require both time- and event-triggered communication services, in most cases under stringent timing constraints, to convey state data in the former case and alarms and management data in the latter. However, neither the requirement for flexible operation under guaranteed timeliness nor for joint support of time and event-triggered traffic are efficiently fulfilled by most of existing fieldbus systems. This paper presents a new protocol, flexible time-triggered communication on controller area network, which fulfills both requirements: it supports time-triggered communication in a flexible way as well as being an efficient combination of both time- and event-triggered traffic with temporal isolation. These types of traffic are handled by two complementary subsystems, the synchronous and the asynchronous messaging systems, respectively. The paper includes a justification for the new protocol as well as its description and worst case temporal analysis for both subsystems. This analysis shows the capability of the protocol to convey real-time traffic of either type.
Research in the surveillance domain was confined for years to the military domain. Recently, as military spending for this kind of research was reduced and the technology matured, the attention of the research and development community turned to commercial applications of surveillance. In this paper we describe a state-of-the-art monitoring system developed by a corporate R&D lab in cooperation with the corresponding security business units. It represents a sizable effort to transfer some of the best results produced by computer vision research into a viable commercial product. Our description spans both practical and technical issues. From the practical point of view we analyze the state of the commercial security market, typical cultural differences between the research team and the business team, and the perspective of the potential users of the technology. These are important issues that have to be dealt with, or the surveillance technology will remain in the lab for a long time. From the technical point of view we analyze our algorithmic and implementation choices. We describe the improvements we introduced to the original algorithms reported in the literature in response to some problems that arose during field testing. We also provide extensive experimental results that highlight the strong points and some weaknesses of the prototype system.
We propose a method for learning models of people's motion behaviors in indoor environments. As people move through their environments, they do not move randomly. Instead, they often engage in typical motion patterns, related to specific locations that they might be interested in approaching and specific trajectories that they might follow in doing so. Knowledge about such patterns may enable a mobile robot to develop improved people following and obstacle avoidance skills. This paper proposes an algorithm that learns collections of typical trajectories that characterize a person's motion patterns. Data, recorded by mobile robots equipped with laser-range finders, is clustered into different types of motion using the popular expectation maximization algorithm, while simultaneously learning multiple motion patterns. Experimental results, obtained using data collected in a domestic residence and in an office building, illustrate that highly predictive models of human motion patterns can be learned.
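The clustering step described above — expectation maximization over recorded motion data — can be sketched in one dimension with two motion patterns and fixed unit variance. Real trajectory clustering is higher-dimensional and learns variances and mixture weights too; the initialization and iteration count here are assumptions.

```python
import math

def em_1d(points, iters=50):
    """Two-pattern EM sketch: soft-assign each point to a motion pattern
    (E-step), then re-estimate each pattern's mean (M-step)."""
    mu = [min(points), max(points)]  # crude initialization
    for _ in range(iters):
        # E-step: responsibility of each pattern for each point
        resp = []
        for p in points:
            w = [math.exp(-0.5 * (p - m) ** 2) for m in mu]
            s = sum(w)
            resp.append([wi / s for wi in w])
        # M-step: weighted mean per pattern
        for k in range(2):
            total = sum(r[k] for r in resp)
            mu[k] = sum(r[k] * p for r, p in zip(resp, points)) / total
    return mu

# two typical "trajectories" summarized as 1-D features
points = [0.0, 0.2, 0.1, 5.0, 5.2, 4.9]
mu = em_1d(points)
```

The soft assignment is what distinguishes EM from hard clustering such as k-means: ambiguous motion data contributes fractionally to every pattern.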
If robotic agents are to act autonomously they must have the ability to construct and reason about models of their physical environment. For example, planning to achieve goals requires knowledge of how the robot's actions affect the state of the world over time. The traditional approach of handcoding this knowledge is often quite difficult, especially for robotic agents with rich sensing abilities that exist in dynamic and uncertain environments. Ideally, robots would acquire knowledge of their environment and then use this knowledge to act. We present an unsupervised learning method that allows a robotic agent to identify and represent qualitatively different outcomes of actions. Experiments with a Pioneer-1 mobile robot demonstrate the utility of the approach with respect to capturing the structure and dynamics of a complex, real-world environment, and show that the models acquired by the robot correlate surprisingly well with human models of the environment.
A framework for multi-user distributed virtual environments (DVEs) has been proposed. The proposed framework, incorporating two models (the functional model and the interconnection model), attempts to represent the common functionality, communication issues and requirements found in multi-user DVEs. The functional model concentrates on the DVE's functionality, while the interconnection model concentrates on how the components are interconnected to realize the required functionality. The models have been specified using the Unified Modeling Language (UML). An experimental case study demonstrates the applicability and generality of the proposed approach.
In this paper we explore the use of priority progress streaming (PPS) for video surveillance applications. PPS is an adaptive streaming technique for the delivery of continuous media over variable bit-rate channels. It is based on the simple idea of reordering media components within a time window into priority order before transmission. The main concern when using PPS for live video streaming is the time delay introduced by reordering. In this paper we describe how PPS can be extended to support live streaming and show that the delay inherent in the approach can be tuned to satisfy a wide range of latency constraints while supporting fine-grain adaptation.
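The core reordering idea can be sketched in a few lines: group media components into fixed time windows and send each window's contents in descending priority order, so the added latency is bounded by the window length. The component representation and field names below are assumptions, not the paper's data model.

```python
def priority_progress_order(components, window):
    """Sort by (time window, descending priority): components never
    leave their window, so reordering delay is at most one window."""
    return sorted(components, key=lambda c: (c["t"] // window, -c["priority"]))

frames = [
    {"id": "b", "t": 0, "priority": 1},   # low-priority enhancement data
    {"id": "a", "t": 1, "priority": 3},   # high-priority base data
    {"id": "c", "t": 5, "priority": 2},   # falls in the next window
]
order = [c["id"] for c in priority_progress_order(frames, window=4)]
# order == ['a', 'b', 'c']
```

Under bandwidth pressure, the sender would simply truncate the tail of each window, dropping the lowest-priority components first; shrinking the window tightens the live-streaming latency at the cost of coarser adaptation.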
This paper presents the work done on video sequence interpretation. We propose a framework based on two kinds of a priori knowledge: predefined scenarios and contextual information. This approach has been applied to video sequences of the AVS-PV European visual surveillance project.
A system architecture and method for tracking people is presented for a sports application. The system input is video data from static cameras with overlapping fields-of-view at a football stadium. The output is the real-world, real-time positions of football players during a match. The system comprises two processing stages, operating on data from first a single camera and then multiple cameras. The organisation of processing is designed to achieve sufficient synchronisation between cameras, using a request-response pattern, invoked by the second stage multi-camera tracker. The single-view processing includes change detection against an adaptive background and image-plane tracking to improve the reliability of measurements of occluded players. The multiview process uses Kalman trackers to model the player position and velocity, to which the multiple measurements input from the single-view stage are associated. Results are demonstrated on real data.
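The multi-view stage's Kalman trackers model player position and velocity; a deliberately simplified scalar predict/update cycle conveys the idea. Real trackers are at least 2-D with a full covariance matrix, and the noise constants here are assumptions.

```python
def kalman_step(x, v, p, z, dt=1.0, q=0.01, r=1.0):
    """One simplified predict/update cycle of a constant-velocity
    Kalman tracker (scalar position for brevity).
    x, v: position/velocity estimates; p: position variance;
    z: a measured position associated from the single-view stage;
    q, r: process and measurement noise (illustrative values)."""
    # predict: advance the state with the constant-velocity model
    x_pred = x + v * dt
    p_pred = p + q
    # update: blend prediction and measurement via the Kalman gain
    k = p_pred / (p_pred + r)
    innovation = z - x_pred
    x_new = x_pred + k * innovation
    v_new = v + k * innovation / dt
    p_new = (1.0 - k) * p_pred
    return x_new, v_new, p_new

# a stationary estimate pulled toward a measurement at position 10
x, v, p = kalman_step(0.0, 0.0, 1.0, z=10.0)
```

The innovation-weighted blend is what lets the tracker ride out occlusions: with no associated measurement, the predict step alone carries the player forward.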