

Video surveillance systems have become an indispensable tool for the security and organization of public and private areas. Most current commercial video surveillance systems rely on a classical client/server architecture to perform face and object recognition. To support the more complex and advanced video surveillance systems proposed in recent years, companies are required to invest resources in maintaining the servers dedicated to the recognition tasks. In this work, we propose a novel distributed protocol for a face recognition system that exploits the computational capabilities of the surveillance devices (i.e., cameras) to perform person recognition. The cameras fall back to a centralized server if their hardware capabilities are insufficient to perform the recognition. To evaluate the proposed algorithm, we simulate and test the 1NN and weighted kNN classification algorithms via extensive experiments on a freely available dataset. As prototype surveillance devices, we consider Raspberry Pi devices. By means of simulations, we show that our algorithm can reduce the server load by up to 50% with no negative impact on the quality of the surveillance service.
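The abstract mentions two classifiers (1NN and weighted kNN) and a fallback from camera to server. The sketch below illustrates both ideas under assumptions that are ours, not the paper's: face descriptors are plain float vectors, votes are weighted by inverse distance, and the hypothetical `can_run_locally` check with its `max_gallery_size` threshold stands in for whatever hardware test a camera would actually perform.

```python
import math
from collections import defaultdict


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(query, gallery, k=1, eps=1e-9):
    """Distance-weighted kNN over (vector, label) pairs; k=1 is plain 1NN.

    Each of the k nearest neighbors votes for its label with weight
    1 / (distance + eps), so closer faces count more (assumed weighting).
    """
    nearest = sorted(gallery, key=lambda item: euclidean(query, item[0]))[:k]
    votes = defaultdict(float)
    for vec, label in nearest:
        votes[label] += 1.0 / (euclidean(query, vec) + eps)
    return max(votes, key=votes.get)


def can_run_locally(gallery, max_gallery_size):
    """Hypothetical capability check: a camera only scans galleries it can hold."""
    return len(gallery) <= max_gallery_size


def recognize(query, local_gallery, server_gallery, k=3, max_gallery_size=1000):
    """Recognize on the camera when possible, otherwise fall back to the server."""
    if can_run_locally(local_gallery, max_gallery_size):
        return knn_predict(query, local_gallery, k), "camera"
    return knn_predict(query, server_gallery, k), "server"
```

For example, with `g = [([0.0, 0.0], "alice"), ([0.1, 0.0], "alice"), ([1.0, 1.0], "bob")]`, calling `recognize([0.05, 0.0], g, g, k=3, max_gallery_size=5)` returns `("alice", "camera")`, while lowering `max_gallery_size` to 2 routes the request to the server instead. Setting `k=1` reduces the weighted vote to plain 1NN.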
Distributed Video Surveillance Using Smart Cameras
Hanna Kavalionak · Claudio Gennaro · Giuseppe Amato · Claudio Vairo · Costantino Perciante · Carlo Meghini · Fabrizio Falchi
Received: 28 November 2017 / Accepted: 19 September 2018
© Springer Nature B.V. 2018
H. Kavalionak
University of Florence, via Morgagni 65, Firenze, Italy
C. Gennaro · G. Amato · C. Vairo · C. Perciante · C. Meghini · F. Falchi
Information Science and Technologies Institute, National Research Council (ISTI-CNR), Pisa, Italy
Keywords Distributed architectures · Internet of things · Video surveillance
1 Introduction
Video surveillance is of paramount importance in areas such as law enforcement, the military, and even commercial environments. One way to perform surveillance is to stream the data from the cameras to the displays of human operators, who are responsible for analyzing the video. The human resources employed in video surveillance services are both costly and unreliable. A person who is supposed to follow and analyze surveillance video cannot stay concentrated for a long time, and can miss
J Grid Computing (2019) 17:59–77
Published online: 25 October 2018
... Digital image and video processing is currently considered a very dynamic area with critical applications often encountered in our daily lives, such as military and commercial drones, computer vision, surveillance of sensitive areas, identification in access control, automated industrial inspection, and many other areas [1]-[3]. While some applications do not require high data-processing capacity, other embedded applications impose strong constraints such as real-time operation, power, cost, and so on [4]. ...
... As shown in Fig. 12, the IP includes two ports of type "Stream" and must work with another IP of type DMA (Direct Memory Access). The DMA IP has the advantage of being directly linked to the RAM of the FPGA platform, so it handles the incoming and outgoing pixel streams of our convolution IP quickly. The proposed architecture also includes: 1) Block [1], the convolution IP block, which is the designed HLS architecture; 2) Block [2], the Zynq7 Processing System block, which contains the embedded ARM processor used for the software architecture; 3) Block [3], the AXI Interconnect0 block, which handles the interconnections between the different blocks of the hardware architecture; and 4) Block [4], the AXI DMA block, which reads and writes the pixel stream between the IP core and the RAM. ...
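The snippet describes a convolution IP that receives pixels as a stream through an AXI DMA block. The Python model below is a minimal behavioral sketch of that data flow under our own assumptions (a 3x3 kernel, a row-major pixel stream, and output only for fully covered windows); it is not the cited HLS code, and `stream_conv3x3` is a hypothetical name. Two line buffers hold the previous rows, so each pixel is read from the stream exactly once, a common pattern in streaming FPGA image pipelines.

```python
from collections import deque


def stream_conv3x3(pixels, width, kernel):
    """Behavioral model of a streaming 3x3 convolution (valid region only).

    `pixels` is a flat row-major stream, as a DMA engine would deliver it.
    Two line buffers keep the previous rows, mimicking an HLS line-buffer
    design. Like most image IPs, this computes correlation (kernel not flipped).
    """
    line_buffers = deque(maxlen=2)  # the two most recent complete rows
    current = []
    out = []
    for p in pixels:
        current.append(p)
        if len(current) == width:  # a full row has arrived
            if len(line_buffers) == 2:
                rows = [line_buffers[0], line_buffers[1], current]
                for x in range(width - 2):
                    acc = 0
                    for ky in range(3):
                        for kx in range(3):
                            acc += kernel[ky][kx] * rows[ky][x + kx]
                    out.append(acc)
            line_buffers.append(current)  # oldest row is dropped automatically
            current = []
    return out
```

With a 4x4 image holding the values 0 to 15 and an all-ones kernel, `stream_conv3x3(list(range(16)), 4, [[1] * 3] * 3)` yields the four window sums `[45, 54, 81, 90]`.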
... In the scheme that we propose in this paper, stream producers autonomously and adaptively self-organize in order to gather similar streams, thus allowing standing queries to be optimally placed throughout the network.
Cai et al. [10]: Load-Balancing; decentralized system. Clusters similar services together and allows services to be deployed closer to the users that need them. Limitations: emphasis on services rather than data.
Kavalionak et al. [35]: Load-Balancing; decentralized system. Allows the coordination of distributed entities to balance the computational cost of running services. Limitations: tailored to a specific use case; data is analyzed only for a recognition task, and other query mechanisms are not supported. ...
... As another example, Kavalionak et al. [35] present a distributed mechanism that allows geographically distributed streaming devices (i.e., surveillance cameras) to coordinate (among themselves and with a remote server) in order to carry out recognition activities by sharing and balancing the associated computational cost. ...
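The mechanism attributed to Kavalionak et al. [35] lets cameras share recognition work among themselves and a remote server. One simple way to picture such a placement policy, assuming each node exposes an integer capacity and a current load (the `Node` class and `assign_task` function are hypothetical), is: run locally if possible, otherwise pick the least-loaded neighbor with spare capacity, and only then fall back to the server. This is an illustration, not the protocol from the paper.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    capacity: int  # assumed: max concurrent recognition tasks
    load: int = 0

    def has_room(self):
        return self.load < self.capacity


def assign_task(origin, neighbors, server):
    """Place one recognition task: origin camera first, then the least-loaded
    neighbor with spare capacity, and only then the remote server
    (the server is assumed to accept every task)."""
    if origin.has_room():
        target = origin
    else:
        candidates = [n for n in neighbors if n.has_room()]
        if candidates:
            target = min(candidates, key=lambda n: n.load / n.capacity)
        else:
            target = server
    target.load += 1
    return target.name
```

Keeping the server as the last resort mirrors the goal stated in the abstract: recognition load moves to the server only when the cameras cannot absorb it.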
NOA-AID (Network Overlays for Adaptive information Aggregation, Indexing and Discovery on the fog) is an approach for decentralized indexing, aggregation, and discovery of data belonging to streams. It is organized into two network layers. The upper layer delivers information discovery by providing a distributed index structure. The lower layer is devoted to resource aggregation based on epidemic protocols designed for highly dynamic environments and well suited to stream-oriented scenarios. It defines a flexible approach to express queries targeting highly heterogeneous data, as well as a self-organizing dynamic system that allows queries to be optimally resolved on the most suitable stream producers. The paper also presents a theoretical study and discusses the costs related to information management operations; it also gives an empirical validation of the findings. Finally, it reports an extended experimental evaluation demonstrating that NOA-AID is effective and efficient for retrieving information extracted from streams in highly dynamic and distributed processing architectures.
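NOA-AID's lower layer relies on epidemic protocols for aggregation. A classic example of such a protocol, shown here as a generic illustration rather than NOA-AID's actual algorithm, is push-pull gossip averaging: each node repeatedly averages its estimate with a random peer. The sketch assumes a fully connected overlay with at least two nodes, a fixed number of rounds, and a hypothetical `gossip_average` function.

```python
import random


def gossip_average(values, rounds, seed=0):
    """Push-pull gossip averaging on a fully connected overlay (n >= 2).

    In every round each node contacts one random peer and both adopt the mean
    of their two estimates. Each exchange preserves the sum, so all estimates
    converge towards the global average.
    """
    rng = random.Random(seed)
    est = list(values)
    n = len(est)
    for _ in range(rounds):
        for i in range(n):
            j = rng.randrange(n - 1)
            if j >= i:
                j += 1  # skip i itself so the peer is always a different node
            mean = (est[i] + est[j]) / 2
            est[i] = est[j] = mean
    return est
```

Because no node ever needs a global view, this style of aggregation tolerates the churn of stream-oriented settings; the price is that results are estimates that sharpen over rounds rather than exact values.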
... • Ensuring safety in public spaces by identifying suspicious activities with specialized software that infers attributes of individuals (e.g., gender and age estimation), detecting left luggage, and monitoring unauthorized access, using automatic behavior analysis and event detection for crime prevention [195,206,119,198,8]. ...
The world has recently witnessed a surge of criminal and terrorist activities that have taken the lives of many innocent people. Although CCTV cameras are becoming ubiquitous and intrusive, being widely deployed to survey public and strategic areas such as airports, metro stations, and shopping malls, identifying suspects with automated methods is still a challenging task in stopping further terrorist attacks or preventing crimes. Law enforcement agencies can use surveillance systems for the safety of our neighborhoods and for crime prevention or resolution. In fact, regardless of the size of the workforce recruited, it is impossible to monitor and analyze the immense amount of CCTV footage recorded either offline or streamed in real time. The use of surveillance technology should undoubtedly help lessen the risks and number of crimes by serving as a deterrent. Biometric technologies can be a major milestone in improving the automation of visual surveillance, in order to recognize criminal offenders and track them across different places. Gait, defined as the way we walk, has recently been considered a more suitable modality for people recognition in surveillance scenarios, because it can be captured non-intrusively and covertly from a distance, even with poor-resolution imagery. Gait biometrics can benefit not only identity recognition but can also play a vital role in enhancing the automation of surveillance systems, including re-identification and people tracking across different cameras. Moreover, a biometric signature constructed from the rhythmic motion pattern of gait is considered the only likely identification method that is suitable for covert surveillance and reliably not prone to spoofing attacks and signature forgery.
Numerous research studies have confirmed the potential of using gait for people identification in surveillance and forensic scenarios, but only a few studies have investigated the contribution of motion-based features to the recognition process. In our work, we explore the use of optical flow estimated from consecutive frames to construct a discriminative biometric signature for gait recognition. A set of different local and global optical-flow-based features is proposed, and a set of experiments is carried out on the CASIA-B dataset to assess the discriminatory potency of the analyzed motion-based features for gait identification under different covariate factors, including clothing and carrying conditions. Further experiments explore the effects of dataset size, number of frames, and viewpoint on the classification process. On a dataset containing 1240 video sequences of 124 individuals, higher recognition rates are achieved using the KNN and neural network classifiers without incorporating static and anthropometric measurements. This confirms that gait identification using motion-based features is feasible with acceptable recognition rates, even under different covariate factors and real-world environmental conditions.
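The abstract does not list the exact optical-flow features. As one illustrative global motion descriptor (our assumption, not the authors' feature set), a histogram of flow orientations weighted by flow magnitude can be computed from a per-pixel flow field, here given as a list of `(dx, dy)` vectors produced by any optical-flow estimator; `flow_orientation_histogram` is a hypothetical name. Descriptors like this, pooled over a gait cycle, could then be compared with a KNN classifier.

```python
import math


def flow_orientation_histogram(flow, bins=8):
    """Global motion descriptor: histogram of flow orientations, each vector
    weighted by its magnitude, then L1-normalized (a HOF-style feature)."""
    hist = [0.0] * bins
    for dx, dy in flow:
        mag = math.hypot(dx, dy)
        if mag == 0:
            continue  # static pixels carry no orientation information
        angle = math.atan2(dy, dx) % (2 * math.pi)  # map to [0, 2*pi)
        b = min(int(angle / (2 * math.pi) * bins), bins - 1)  # guard rounding
        hist[b] += mag
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else hist
```

Weighting by magnitude lets fast-moving limbs dominate the descriptor, while normalization removes the dependence on how many pixels the walker covers.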
... The problem of detecting and recognizing people in images or videos has become of central importance in many video surveillance applications [3][4][5][16]. The issue of facial recognition from a drone perspective has, however, been addressed more recently in the literature [8]. ...
... Digital image and video processing is a dynamic area with time-critical applications. These are encountered every day in cases such as military and commercial drones, computer vision systems, surveillance of sensitive areas, access control, and automated industrial inspection, among others [1]-[3]. Although the processing rate is important in these applications, there are other constraints, such as real-time operation, power consumption, or the system's level of integration. ...
The application of image processing systems in edge computing is increasingly attractive and necessary. However, the demands in terms of power consumption and high performance prevent standard processing platforms from being used. In this respect, FPGAs are a good option for developing computer vision systems because of their ability to exploit parallelism. Moreover, the design flow of current FPGA synthesis tools accepts high-abstraction languages as input descriptions, as opposed to hardware description languages. High-level synthesis (HLS) automates the design process by transforming the algorithmic description into digital hardware while satisfying the design constraints. However, image processing experts may find it complex to integrate the resulting hardware with the rest of the system components, such as capture and display interfaces. In this work, we present a base design for building Zynq-based image processing applications. We also provide a methodology that enables the efficient and agile development of embedded image processing solutions.
... With the advent of deep learning and artificial intelligence methods, there are important breakthroughs in various surveillance based computer vision tasks, such as pedestrian detection and video summarization [6]- [8]. From the viewpoint of system design, the distributed learning paradigm has been used in developing an edge computing based Distributed Intelligent Video Surveillance (DIVS) system [9], [10]. When edge computing meets the DIVS systems, the huge data communication overhead, high latency, and severe packet loss limitations can be readily solved. ...
Conference Paper
Building on the mutual empowerment of two rapidly developing technologies, artificial intelligence and edge computing, we propose a tailored Edge Intelligent Video Surveillance (EIVS) system. It is a scalable edge computing architecture that uses multitask deep learning for relevant computer vision tasks. Because the potential applications of different surveillance devices vary widely, we adopt a smart IoT module to normalize the video data of different cameras, so that the EIVS system can conveniently find the proper data for a specific task. In addition, the deep learning models can be deployed at every EIVS node to perform computer vision tasks on the normalized data. Meanwhile, because the training and deployment of deep learning models are usually separated, for related tasks in the same scenario we propose to collaboratively train the deep learning models in a multitask paradigm on the cloud server. Simulation results on publicly available datasets show that the system continuously supports intelligent monitoring tasks, has good scalability, and can improve performance through multitask learning.
The wide availability of heterogeneous resources at the Edge of the network is gaining a central role in defining and developing new computing paradigms for both infrastructures and applications. However, optimizing the system's behaviour becomes challenging due to the Edge's highly distributed and dynamic nature. Recent solutions propose new decentralized, self-adaptive approaches to meet the needs of this scenario. One of the most challenging aspects is the optimization of the system's energy consumption. In this paper, we propose a fully decentralized solution that limits the energy consumed by the system without failing to meet user expectations, defined as the services' Quality of Experience (QoE). Specifically, we propose a scheme in which the autonomous coordination of entities at the Edge reduces energy consumption by reducing the number of application instances executed in the system. This result is achieved without violating the services' QoE, expressed in terms of latency. Experimental evaluations through simulations conducted with PureEdgeSim demonstrate the effectiveness of the approach.
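The energy-saving idea is to run fewer application instances while keeping latency within the QoE bound. The function below is a centralized, greedy simulation of that trade-off under assumed inputs (a latency table between users and edge nodes and a single latency bound); the hypothetical `consolidate` function is not the cited scheme, which is fully decentralized, so treat it only as an illustration of the constraint being enforced.

```python
def consolidate(instances, users, latency, max_latency):
    """Greedily switch off application instances while every user still has
    an active instance within `max_latency` (the assumed QoE bound).

    instances: node names currently hosting an instance
    users: user ids
    latency: dict mapping (user, node) -> latency in ms
    Returns the set of instances that stay active.
    """
    active = set(instances)
    for node in sorted(instances):  # deterministic order for the sketch
        trial = active - {node}
        if trial and all(
            any(latency[(u, m)] <= max_latency for m in trial) for u in users
        ):
            active = trial  # this node can power down its instance
    return active
```

For instance, with three edge nodes and two users, an instance is only released when every user still reaches some remaining instance within the bound, so the result is a smaller set that never violates the latency constraint.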
Purpose: The paper proposes a privacy-preserving, artificial intelligence-enabled video surveillance technology to monitor social distancing in public spaces.
Design/methodology/approach: The paper proposes a new Responsible Artificial Intelligence Implementation Framework to guide the design and development of the proposed solution. It defines responsible artificial intelligence criteria that the solution needs to meet and provides checklists to enforce the criteria throughout the process. To preserve data privacy, the proposed system incorporates a federated learning approach, which performs computation on edge devices to limit the movement of sensitive and identifiable data and to eliminate the dependency on cloud computing at a central server.
Findings: The proposed system is evaluated through a case study of monitoring social distancing at an airport. The results discuss how the system can fully address the case study's requirements in terms of its reliability, its usefulness when deployed to the airport's cameras, and its compliance with responsible artificial intelligence.
Originality/value: The paper makes three contributions. First, it proposes a real-time social distancing breach detection system on the edge that builds on a combination of cutting-edge people detection and tracking algorithms to achieve robust performance. Second, it proposes a design approach for developing responsible artificial intelligence in video surveillance contexts. Third, it presents results and discussion from a comprehensive evaluation in the context of an airport case study, demonstrating the proposed system's robust performance and practical usefulness.
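Once people are detected and tracked, a social distancing breach reduces to a pairwise distance test. The sketch assumes that positions have already been projected to ground-plane coordinates in meters (for example via a camera-to-ground homography) and uses a hypothetical 2.0 m threshold and a hypothetical `distancing_breaches` function; it is not the detection pipeline of the cited paper.

```python
import math
from itertools import combinations


def distancing_breaches(positions, min_distance=2.0):
    """Return index pairs of people standing closer than `min_distance`.

    `positions` are ground-plane (x, y) coordinates in meters, assumed to come
    from a people detector and tracker plus a camera-to-ground projection.
    """
    return [
        (i, j)
        for (i, p), (j, q) in combinations(enumerate(positions), 2)
        if math.dist(p, q) < min_distance
    ]
```

The check is quadratic in the number of people, which is usually acceptable for a single camera view on an edge device.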
Internet of Things (IoT) has become an important network paradigm, and many smart devices are connected through IoT. IoT systems produce massive amounts of data, and thus more and more IoT applications and services are emerging. Machine learning, another important area, has achieved great success in several research fields such as computer vision, computer graphics, natural language processing, speech recognition, decision-making, and intelligent control. It has also been introduced in networking research. Many studies investigate how to use machine learning to solve networking problems, including routing, traffic engineering, resource allocation, and security. Recently, there has been a rising trend of employing machine learning to improve IoT applications and provide IoT services such as traffic engineering, network management, security, Internet traffic classification, and quality of service optimization. This survey paper provides an overview of the application of machine learning in the domain of IoT. We provide a comprehensive survey highlighting recent progress in machine learning techniques for IoT and describe various IoT applications. Applying machine learning to IoT enables users to obtain deep analytics and develop efficient intelligent IoT applications. This paper differs from previously published survey papers in focus, scope, and breadth; specifically, we have written it to emphasize the application of machine learning for IoT and to cover the most recent advances. It attempts to cover the major applications of machine learning for IoT and the relevant techniques, including traffic profiling, IoT device identification, security, edge computing infrastructure, network management, and typical IoT applications. We also discuss research challenges and open issues.
Conference Paper
In this paper, a framework for collaborative face recognition from video sequences in a multi-camera environment is proposed. Collaboration between cameras allows for higher recognition performance in both the common and non-common field-of-view (FOV) cases. For the latter, the appearance of an object in a nearby camera is predicted using the last tracked position of the object paired with a time-of-arrival model between camera pairs. An experiment using four cameras in an office environment confirms the applicability and performance gains of the proposed framework.
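The time-of-arrival idea can be made concrete with one common modeling choice, which we assume here rather than take from the paper: the transit time between each camera pair is Gaussian with a known mean and standard deviation (the values in the hypothetical `TRANSIT` table are made up). The functions predict where and when an object leaving one camera should reappear, and score how well a new detection fits.

```python
import math

# Hypothetical transit-time statistics in seconds between camera pairs:
# (from_cam, to_cam) -> (mean, standard deviation)
TRANSIT = {("cam1", "cam2"): (12.0, 3.0), ("cam1", "cam3"): (30.0, 6.0)}


def arrival_window(from_cam, exit_time, z=2.0):
    """For each camera reachable from `from_cam`, the window [start, end] in
    which the object is expected to reappear (mean +/- z standard deviations)."""
    return {
        to: (exit_time + mu - z * sd, exit_time + mu + z * sd)
        for (src, to), (mu, sd) in TRANSIT.items()
        if src == from_cam
    }


def arrival_likelihood(from_cam, to_cam, exit_time, seen_time):
    """Gaussian likelihood that a detection in `to_cam` at `seen_time` is the
    object that left `from_cam` at `exit_time`."""
    mu, sd = TRANSIT[(from_cam, to_cam)]
    dt = seen_time - exit_time
    return math.exp(-((dt - mu) ** 2) / (2 * sd ** 2)) / (sd * math.sqrt(2 * math.pi))
```

A likelihood like this can be combined with face-matching scores, so that a weak visual match arriving at a plausible time is still considered, while an implausible arrival time lowers confidence.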
Conference Paper
Tracking several objects across multiple cameras is essential for collaborative monitoring in distributed camera networks. The tractability of the related optimization, which aims at tracking a maximal number of important targets, decreases with the growing number of objects moving across cameras. To tackle this issue, a viable model and a sound object representation are required that can leverage the power of existing tools at run-time for fast computation of a solution. In this paper, we provide a formalism for object tracking across multiple cameras. An initial assignment of objects to cameras is performed at start-up to initialize a set of distributed trackers in embedded cameras. We model the run-time self-coordination problem with target handover by encoding it as a run-time binding of objects to cameras. This approach has been used successfully in high-level system synthesis. Our model of distributed tracking is based on Answer Set Programming, a declarative programming paradigm that helps formulate the distribution and target handover problem as a search problem, such that, by using existing answer set solvers, we produce stable solutions in real time by incrementally solving time-based encoded ASP problems. The effectiveness of the proposed approach is demonstrated on a 3-node camera network deployment.
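The cited work solves the object-to-camera binding with Answer Set Programming and incremental solving. As a much simpler stand-in, which swaps ASP for exhaustive search and is only practical for a handful of objects, the sketch below enumerates bindings under per-camera capacity limits and keeps one that maximizes the total priority of tracked targets. The function `assign_objects` and all parameter names are hypothetical.

```python
from itertools import product


def assign_objects(objects, cameras, visible, capacity, priority):
    """Exhaustively search object-to-camera bindings (None = untracked) and
    return one that maximizes the total priority of tracked objects.

    visible: dict object -> set of cameras that can see it
    capacity: dict camera -> max objects it can track at once
    priority: dict object -> importance weight
    Exponential in len(objects): a brute-force stand-in for an ASP solver.
    """
    best, best_score = None, -1
    options = [sorted(visible[o]) + [None] for o in objects]
    for choice in product(*options):
        load = {c: 0 for c in cameras}
        for cam in choice:
            if cam is not None:
                load[cam] += 1
        if any(load[c] > capacity[c] for c in cameras):
            continue  # binding violates a camera's tracking capacity
        score = sum(priority[o] for o, cam in zip(objects, choice) if cam is not None)
        if score > best_score:
            best, best_score = dict(zip(objects, choice)), score
    return best, best_score
```

The exponential blow-up of this enumeration as objects are added is the tractability problem the abstract describes, and it is why a declarative solver with incremental solving is attractive.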
Continuous mobile vision is limited by the inability to efficiently capture image frames and process vision features. This is largely due to the energy burden of analog readout circuitry, data traffic, and intensive computation. To promote efficiency, we shift early vision processing into the analog domain. This results in RedEye, an analog convolutional image sensor that performs layers of a convolutional neural network in the analog domain before quantization. We design RedEye to mitigate analog design complexity, using a modular column-parallel design to promote physical design reuse and algorithmic cyclic reuse. RedEye uses programmable mechanisms to admit noise for tunable energy reduction. Compared to conventional systems, RedEye reports an 85% reduction in sensor energy, 73% reduction in cloudlet-based system energy, and a 45% reduction in computation-based system energy.
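dragon linearizes the multi-attribute space with space-filling curves; the snippet does not say which curve, so the Z-order (Morton) curve used here is an assumption (Hilbert curves are another common choice), and `morton_encode` is a hypothetical name. For two non-negative integer attributes, the key is obtained by interleaving their bits. A useful property for range queries is that aligned power-of-two blocks of the attribute space map to contiguous key ranges.

```python
def morton_encode(x, y, bits=16):
    """Interleave the bits of two non-negative integer attributes into one
    Z-order (Morton) key: bit i of x goes to position 2*i, bit i of y to 2*i+1."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key
```

For instance, `morton_encode(3, 5)` is `39`, and the four cells with `x` and `y` in `{2, 3}` map to the contiguous keys 12 to 15, so a query over that block becomes a single one-dimensional range.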
Distributed query processing is of paramount importance in next-generation distribution services, such as the Internet of Things (IoT) and cyber-physical systems. Although several supports for multi-attribute range queries have been proposed for peer-to-peer systems, these solutions must be rethought to fully meet the requirements of new computational paradigms for IoT, such as fog computing. This paper proposes dragon, an efficient support for distributed multi-dimensional range query processing that targets efficient query resolution on highly dynamic data. In dragon, nodes at the edges of the network collect and publish multi-dimensional data. The nodes collectively manage an aggregation tree storing data digests, which are then exploited, when resolving queries, to prune the sub-trees containing few or no relevant matches. Multi-attribute queries are managed by linearizing the attribute space through space-filling curves. We extensively analysed different aggregation and query resolution strategies in a wide spectrum of experimental set-ups. We show that dragon efficiently manages fast-changing data values. Further, we show that dragon resolves queries by contacting fewer nodes than a similar approach in the state of the art.
We describe a distributed face recognition strategy that is implemented across a set of independent wireless sensor nodes that have no shared storage. Each node contains a camera of limited aperture, whose images are matched against a randomly selected feature of each face in a statically defined training set. The training set is represented in a distributed in-network storage device that emerges based on a shared protocol running on the sensor nodes. Nodes communicate their findings to other nodes in their immediate vicinity, and eventually the network aggregates these findings to conclude on the identity of the observed target. The independence of the recognition processes on the nodes gives rise to the possibility of conflicting results. We compare two consensus-building strategies that we have designed to allow the entire network to arrive at a consistent conclusion using efficient communications. Our consensus strategies are based, respectively, on an optimization of gossiping and on a distributed feature aggregation process that allows the recognition confidence on each node to increase monotonically.
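The second consensus strategy relies on a feature aggregation process in which confidence can only grow. A minimal way to obtain that property is sketched below, under assumptions of our own: each node records, per identity, the ids of training features it has matched; neighboring nodes merge their records with set union; and confidence is the matched fraction of a hypothetical `features_per_face` count. All function names are hypothetical. Because union never removes elements, each node's confidence is monotonically non-decreasing across rounds.

```python
def merge_evidence(mine, theirs):
    """Union per-identity sets of matched feature ids; the result never shrinks."""
    merged = {ident: set(f) for ident, f in mine.items()}
    for ident, feats in theirs.items():
        merged.setdefault(ident, set()).update(feats)
    return merged


def confidence(evidence, features_per_face):
    """Fraction of each identity's training features matched in the network."""
    return {ident: len(f) / features_per_face for ident, f in evidence.items()}


def identify(evidence, features_per_face):
    """Identity with the highest confidence, or None if nothing was matched."""
    conf = confidence(evidence, features_per_face)
    return max(conf, key=conf.get) if conf else None


def gossip_round(nodes, edges):
    """One round of pairwise exchanges along `edges` (applied in order);
    both endpoints of each edge end up with the merged evidence."""
    for a, b in edges:
        merged = merge_evidence(nodes[a], nodes[b])
        nodes[a] = merged
        nodes[b] = {k: set(v) for k, v in merged.items()}
    return nodes
```

Repeating rounds until no node's evidence changes lets every connected node reach the same record, and therefore the same identity decision.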