ABSTRACT: In Intelligent Video Systems, most recent advanced performance evaluation metrics include a stage that maps the system results onto the ground truth. This paper reviews these metrics within a proposed framework, focusing on metrics for event detection, object detection and object tracking systems.
Advanced Video and Signal Based Surveillance (AVSS), 2010 Seventh IEEE International Conference on; 10/2010
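The mapping stage between system results and ground truth can take many forms. As one common illustration (an assumption, not necessarily one of the reviewed metrics), the Python sketch below greedily maps detected boxes to ground-truth boxes by intersection-over-union (IoU). The box format `(x1, y1, x2, y2)`, the 0.5 threshold and the function names are illustrative choices.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), illustrative format


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_match(gt: List[Box], det: List[Box], thr: float = 0.5):
    """Map detections to ground truth, highest-overlap pairs first.

    Returns (matches, missed_gt, false_alarms) as index lists."""
    pairs = sorted(
        ((iou(g, d), i, j) for i, g in enumerate(gt) for j, d in enumerate(det)),
        reverse=True,
    )
    used_gt, used_det, matches = set(), set(), []
    for score, i, j in pairs:
        if score < thr:
            break  # remaining pairs overlap too little
        if i in used_gt or j in used_det:
            continue  # each object may be mapped only once
        used_gt.add(i)
        used_det.add(j)
        matches.append((i, j))
    missed = [i for i in range(len(gt)) if i not in used_gt]
    false_alarms = [j for j in range(len(det)) if j not in used_det]
    return matches, missed, false_alarms
```

Unmatched ground-truth boxes count as misses and unmatched detections as false alarms; an optimal assignment (e.g. the Hungarian algorithm) is a common alternative to this greedy pass.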
ABSTRACT: This paper builds on an interactive streaming architecture that supports both user feedback interpretation and temporal juxtaposition of multiple video bitstreams in a single streaming session. As an original contribution, we explain how these functionalities can be exploited to offer an improved viewing experience when accessing high-resolution or multi-view video content through individual and potentially bandwidth-constrained connections. This is done by letting the client interactively select a preferred version among the multiple streams offered to render the scene. An instance of this architecture has been implemented by extending the liveMedia streaming library and using the x264 video encoder. Automatic methods have been designed and implemented to generate the multiple versions of the streamed content. In a surveillance scenario, the versions are constructed by sub-sampling the original high-resolution image, or by cropping the image sequence to focus on regions of interest in a temporally consistent way. In a soccer game context, zoomed-in versions of far-view shots are computed and offered as alternatives to the sub-sampled sequence. We demonstrate the feasibility and relevance of the approach through subjective experiments.
Multimedia and Expo (ICME), 2010 IEEE International Conference on; 08/2010
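The abstract does not say how temporal consistency is enforced when cropping to regions of interest. One simple possibility, assumed here purely for illustration, is to smooth the crop-window centre over time with an exponential moving average and clamp the window inside the frame. The function name `smooth_crops` and the smoothing factor `alpha` are hypothetical.

```python
from typing import Iterable, List, Tuple


def smooth_crops(centers: Iterable[Tuple[float, float]], frame_w: int, frame_h: int,
                 crop_w: int, crop_h: int,
                 alpha: float = 0.2) -> List[Tuple[int, int, int, int]]:
    """Return one (x, y, w, h) crop per frame that follows the ROI centres smoothly.

    Hypothetical scheme: alpha in (0, 1]; smaller values give a steadier but
    laggier window."""
    crops = []
    cx = cy = None
    for px, py in centers:
        if cx is None:
            cx, cy = px, py  # initialise on the first ROI centre
        else:
            cx += alpha * (px - cx)  # exponential moving average of the centre
            cy += alpha * (py - cy)
        # Clamp so that the crop window always stays inside the frame.
        x = int(round(min(max(cx - crop_w / 2, 0), frame_w - crop_w)))
        y = int(round(min(max(cy - crop_h / 2, 0), frame_h - crop_h)))
        crops.append((x, y, crop_w, crop_h))
    return crops
```

A smaller `alpha` trades responsiveness for stability; a production system might also limit the window's per-frame velocity.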
ABSTRACT: This paper builds on an interactive streaming architecture that supports both user feedback interpretation and temporal juxtaposition of multiple video bitstreams in a single streaming session. As an original contribution, we show how these functionalities can be exploited to offer an improved viewing experience when accessing football content through individual and potentially bandwidth-constrained connections. Starting from conventional broadcast content, our system automatically splits the native content into non-overlapping and semantically consistent segments. Each segment is then divided into shots, based on conventional view boundary detection. Shots are finally split into small clips. These clips support our browsing capabilities throughout playback in a temporally consistent way. Multiple versions are automatically created to render each clip. Versioning depends on the view type of the initial shot, and typically corresponds to the generation of zoomed-in and spatially or temporally sub-sampled video streams. Clips are encoded independently, so that the server can decide on the fly which version to send as a function of the semantic relevance of the segments (on a user-transparent basis, as inferred from video analysis or metadata) and of the interactive user requests. Replays of certain game actions are also offered on request: the stream automatically switches to the requested event, and playback later resumes without any offset. The capabilities of our system rely on the H.264/AVC standard. We validate our framework on soccer videos in subjective experiments that show the feasibility and relevance of the system.
2010 Second International Conferences on Advances in Multimedia, Athens, Greece, June 13-19
ABSTRACT: Stand-alone cameras and CCTV networks are nowadays common in public areas such as city centers and stores, and more recently in transportation infrastructures. Meanwhile, automatic processing of video data is attracting considerable attention in the pattern recognition community; state-of-the-art advances in this area enable the reliable extraction of features and the investigation of numerous applications dedicated to Intelligent Transport Systems (ITS). An obvious first field of application of Video Content Analysis (VCA) is improving safety and security in the transport context. VCA embedded in vehicles can track pedestrians to avoid collisions, improving safety. Deployed in railway stations, VCA systems can detect left luggage, enhancing security. Video streams available from such installations may also be a useful source of information for statistical transportation applications, e.g. monitoring road traffic conditions or providing accurate counting statistics in railway/subway stations. This paper gives an overview of VCA applications in terms of safety, security and efficiency for ITS, with a specific focus on the usability of such VCA systems (emerging research topics, state-of-the-art studies, commercially available applications, etc.).
Intelligent Transport Systems Telecommunications (ITST), 2009 9th International Conference on; 11/2009
ABSTRACT: For the 9000 train accidents reported each year in the European Union, the Recording Strip (RS) and Filling-Card (FC) related to the train activities are the only usable evidence for SNCF (the French railway operator) and most national authorities. More precisely, the RS contains information about the train journey, speed and related Driving Events (DE) such as emergency brakes, while the FC gives details on the departure/arrival stations. In this context, French law enforcement authorities recently voted for a complete check of 100% of the RS (instead of the 5% currently checked), which raised the question of an automated and efficient inspection of this huge amount of recordings. To this end, we propose a machine vision prototype consisting of cassettes that receive the RS and FC to be digitized. A video analysis module first determines the type of RS among eight possible types; time/speed curves are then extracted to estimate the covered distance, speed and stops; finally, the associated DE are detected using a convolution process. A detailed evaluation on 15 RS (8000 kilometers and 7000 DE) shows very good results (100% correct detection of the strip type, and only 0.28% missed detections for the DE). An exhaustive evaluation on a panel of about 100 RS is left as future work.
Proceedings of SPIE - The International Society for Optical Engineering 02/2009;
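Estimating covered distance and stops from a time/speed curve amounts to integrating speed over time and finding intervals of (near-)zero speed. The sketch below assumes the digitized curve is available as sampled (time in s, speed in km/h) pairs; the trapezoidal rule, the `v_eps` stop threshold and the `min_dur_s` minimum duration are hypothetical choices, not the prototype's actual parameters.

```python
from typing import List, Optional, Tuple


def distance_km(t_s: List[float], v_kmh: List[float]) -> float:
    """Covered distance by trapezoidal integration of speed over time."""
    total = 0.0
    for i in range(1, len(t_s)):
        dt_h = (t_s[i] - t_s[i - 1]) / 3600.0  # seconds -> hours
        total += 0.5 * (v_kmh[i] + v_kmh[i - 1]) * dt_h
    return total


def find_stops(t_s: List[float], v_kmh: List[float], v_eps: float = 0.5,
               min_dur_s: float = 10.0) -> List[Tuple[float, float]]:
    """(start, end) intervals where speed stays below v_eps for at least min_dur_s.

    v_eps and min_dur_s are illustrative thresholds."""
    stops: List[Tuple[float, float]] = []
    start: Optional[float] = None
    end: Optional[float] = None
    for t, v in zip(t_s, v_kmh):
        if v < v_eps:
            if start is None:
                start = t  # a candidate stop begins
            end = t
        elif start is not None:
            if end - start >= min_dur_s:
                stops.append((start, end))
            start = None  # the train is moving again
    if start is not None and end - start >= min_dur_s:
        stops.append((start, end))  # stop lasting until the end of the recording
    return stops
```

For example, 60 km/h held for one hour integrates to 60 km; the stop detector ignores brief dips below `v_eps` shorter than `min_dur_s`.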
ABSTRACT: The major drawback of interactive retrieval systems is the potential frustration of the user caused by excessive labelling work. Active learning has proven helpful in solving this issue by carefully selecting the examples presented to the user. In this context, the design of the user interface plays a critical role, since it should invite the user to label the examples selected by active learning. This paper presents the design and evaluation of an innovative user interface for image retrieval. It has been validated using real-life IEEE PETS video surveillance data. In particular, we investigated the most appropriate split of the display area between the retrieved video frames and the active learning examples, taking both objective and subjective user satisfaction parameters into account. The flexibility of the interface relies on a scalable representation of the video content, such as JPEG 2000.
ABSTRACT: This paper presents a new fusion scheme that enhances result quality by combining multiple different detectors. We present a study of the fusion of multiple video analysis detectors, such as "detecting unattended luggage" in video sequences. One problem is the time jitter between detectors: typically, one system can trigger an event several seconds before another. Another issue is computing an adequate fusion of the realigned events. We propose a fusion system that overcomes these problems by (i) matching off-line, in the learning stage, the ground-truth events with the detector events using a dynamic programming scheme; (ii) learning the relation between ground truth and detector results; and (iii) fusing in real time the events from the different detectors, based on the learning stage, in order to maximize the global quality of the result. We show promising results by combining the outputs of different video analysis detector technologies.
Proceedings of SPIE - The International Society for Optical Engineering 01/2009;
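The off-line matching step relies on dynamic programming to pair ground-truth events with detector events despite time jitter. The following is a minimal, order-preserving alignment in the spirit of edit distance, written under assumed costs: a match costs its jitter normalised by a hypothetical `max_jitter`, and every unmatched event costs `miss_cost`. It is an illustrative sketch, not the authors' scheme.

```python
from typing import List, Tuple


def align_events(gt: List[float], det: List[float], max_jitter: float = 5.0,
                 miss_cost: float = 1.0) -> List[Tuple[int, int]]:
    """Order-preserving alignment of sorted ground-truth and detector event times.

    Assumed costs: matching gt[i] with det[j] costs |gt[i] - det[j]| / max_jitter
    (allowed only within max_jitter); any unmatched event costs miss_cost."""
    n, m = len(gt), len(det)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            c = cost[i][j]
            if c == inf:
                continue
            # Ground-truth event i is missed by the detector.
            if i < n and c + miss_cost < cost[i + 1][j]:
                cost[i + 1][j] = c + miss_cost
                back[i + 1][j] = (i, j, False)
            # Detector event j is a false alarm.
            if j < m and c + miss_cost < cost[i][j + 1]:
                cost[i][j + 1] = c + miss_cost
                back[i][j + 1] = (i, j, False)
            # Match the two events if their time jitter is tolerable.
            if i < n and j < m and abs(gt[i] - det[j]) <= max_jitter:
                mc = c + abs(gt[i] - det[j]) / max_jitter
                if mc < cost[i + 1][j + 1]:
                    cost[i + 1][j + 1] = mc
                    back[i + 1][j + 1] = (i, j, True)
    # Trace back from (n, m) to recover the matched pairs.
    pairs, i, j = [], n, m
    while (i, j) != (0, 0):
        pi, pj, matched = back[i][j]
        if matched:
            pairs.append((pi, pj))
        i, j = pi, pj
    return pairs[::-1]
```

The matched pairs (and their time offsets) could then serve as training data for learning the relation between ground truth and each detector.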
ABSTRACT: On-board video analysis has attracted a lot of interest over the last two decades, mainly for safety improvement (e.g. through obstacle detection or driver assistance). In this context, our study aims at providing a real-time, video-based understanding of urban road traffic. Considering a video camera fixed on the front of a public bus, we propose a cost-effective approach to estimate the speed of vehicles on the adjacent lanes when the bus operates on its reserved lane. We work on 1-D segments drawn in the image space and aligned with the road lanes. The relative speed of the vehicles is computed by detecting and tracking features along each of these segments, while their absolute speed is estimated from the relative one using odometer and/or GPS data. Using pre-defined speed thresholds, the traffic can be classified in real time into categories such as "fluid" or "congestion". As demonstrated in the evaluation stage, the proposed solution offers both good performance and low computational complexity, and is compatible with cheap video cameras, which makes it suitable for adoption by city traffic management authorities.
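To make the 1-D segment idea concrete, here is a sketch under simplifying assumptions that do not come from the abstract: feature tracking is replaced by matching whole intensity profiles sampled along one segment in two consecutive frames (brute-force cross-correlation), and a constant metres-per-pixel scale along the segment is assumed (a real perspective camera would need calibration). The relative speed is then added to the bus speed (e.g. from odometer/GPS) and compared with a hypothetical 15 km/h threshold. All function names are hypothetical.

```python
import numpy as np


def segment_shift(prev, curr, max_shift: int) -> int:
    """Displacement (in samples) that best aligns two intensity profiles read
    along the same 1-D segment in consecutive frames (cross-correlation stand-in
    for feature tracking). Requires max_shift < len(prev)."""
    prev = np.asarray(prev, dtype=float) - np.mean(prev)
    curr = np.asarray(curr, dtype=float) - np.mean(curr)
    n = len(prev)
    best_shift, best_score = 0, -np.inf
    for s in range(-max_shift, max_shift + 1):
        # Overlapping parts of the two profiles for candidate shift s.
        a, b = (prev[: n - s], curr[s:]) if s >= 0 else (prev[-s:], curr[: n + s])
        score = float(np.dot(a, b)) / len(a)  # mean product over the overlap
        if score > best_score:
            best_shift, best_score = s, score
    return best_shift


def absolute_speed_kmh(shift_px: float, fps: float, metres_per_px: float,
                       bus_speed_kmh: float) -> float:
    """Bus speed plus relative speed; a positive shift is assumed to point in the
    bus's direction of travel, and the ground scale is assumed constant."""
    rel_ms = shift_px * metres_per_px * fps
    return bus_speed_kmh + rel_ms * 3.6


def classify_traffic(speed_kmh: float, thresholds=((15.0, "congestion"),),
                     default: str = "fluid") -> str:
    """Return the first category whose (hypothetical) upper limit exceeds the speed."""
    for limit, label in thresholds:
        if speed_kmh < limit:
            return label
    return default
```

With 25 fps video, a 7-pixel shift and 0.05 m per pixel give a relative speed of 8.75 m/s (31.5 km/h); added to a bus speed of 20 km/h, the vehicle moves at 51.5 km/h and is classified as "fluid". Further categories can be added as extra `(limit, label)` pairs in increasing order.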
ABSTRACT: Scientific advances in the development of video processing algorithms now allow various distributed and collaborative vision-based applications. However, the lack of recognised standards in this area drives system developers to build specific systems, preventing, e.g., upgrades of content analysis components or reuse of the system in different environments. As a result, the need for a generic, context-independent and adaptive system for storing and managing video analysis results becomes evident. To address this issue, we propose a data-schema-independent data warehouse backed by a multi-agent system. This system relies on the semantic web knowledge representation format, namely RDF, to guarantee maximum adaptability and flexibility regarding schema transformation and knowledge retrieval. The storage system itself, the data warehouse, builds on state-of-the-art knowledge management technologies, providing efficient analysis and reporting capabilities within the monitoring system.
ABSTRACT: On-board video analysis has attracted a lot of interest over the last two decades, with the main goal of improving safety by detecting obstacles or assisting the driver. Our study aims at providing a real-time understanding of urban road traffic. Considering a video camera fixed on the front of a public bus, we propose a cost-effective approach to estimate the speed of vehicles on the adjacent lanes when the bus operates on a dedicated lane. We work on 1-D segments drawn in the image space and aligned with the road lanes. The relative speed of the vehicles is computed by detecting and tracking features along each of these segments. The absolute speed can be estimated from the relative speed if the camera speed is known, e.g. from an odometer and/or GPS. Using pre-defined speed thresholds, the traffic can be classified into categories such as 'fluid' or 'congestion'. The solution offers both good performance and low computational complexity, and is compatible with cheap video cameras, which makes it suitable for adoption by city traffic management authorities.
Proceedings of SPIE - The International Society for Optical Engineering 03/2008;
ABSTRACT: Even though the number of accidents involving the railway system is decreasing thanks to technical progress, the statistics are still too high. For instance, 9309 accidents were reported in the EU-25 in 2004, including 142 in France (ERA). For each of these accidents in France, one element can be used as legal evidence: the recording strip and its associated filling-card, or the so-called ATESS file recorded by the digital Juridical Recording Units (JRU) introduced in the mid-1980s. The strip contains all the information concerning the train journey, speed and time recording, and all the driving events (such as emergency braking). The card features additional information on the train driver, departure/arrival stations, number of trains, etc. These two elements are currently checked manually. The aim of this project is to simplify the procedure and perform the checking as automatically as possible. This paper presents a complete system for the Automatic Read of Recording Strips (ARRS).
ABSTRACT: We present a point-based reconstruction and transmission pipeline for a collaborative tele-immersion system. Two or more users in different locations collaborate in a shared, simulated environment as if they were in the same physical room. Each user perceives point-based models of distant users along with collaborative data such as molecule models. Disparity maps, computed by a commercial stereo solution, are filtered and transformed into clouds of 3D points. The clouds are compressed and transmitted over the network to distant users. On the receiving side, the clouds are decompressed and incorporated into the 3D scene. The viewpoint used to display the 3D scene depends on the position of the user's head. Collaborative data are manipulated through natural hand gestures. We analyse the performance of the system in terms of computation time, latency and photorealistic quality of the reconstructed models.
Proceedings of SPIE - The International Society for Optical Engineering 02/2007;
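Turning a disparity map into a 3D point cloud is standard rectified-stereo back-projection: with focal length f (pixels), baseline B and disparity d, depth is Z = f*B/d, and X = (u - cx)*Z/f, Y = (v - cy)*Z/f. The NumPy sketch below assumes a rectified pinhole setup; the filtering (minimum disparity, maximum depth), the function name and all parameter values are illustrative, since the abstract does not describe the commercial solution's calibration or filter.

```python
import numpy as np


def disparity_to_points(disp: np.ndarray, f: float, baseline: float,
                        cx: float, cy: float, min_disp: float = 1.0,
                        max_depth: float = 5.0) -> np.ndarray:
    """Back-project a rectified-stereo disparity map (pixels) to an (N, 3)
    point cloud in the units of `baseline` (illustrative filter thresholds)."""
    h, w = disp.shape
    v, u = np.mgrid[0:h, 0:w]  # pixel row (v) and column (u) coordinates
    valid = disp >= min_disp  # drop unmatched or unreliable pixels
    z = np.zeros_like(disp, dtype=float)
    z[valid] = f * baseline / disp[valid]  # depth from disparity
    valid &= z <= max_depth  # drop far-away background
    x = (u - cx) * z / f
    y = (v - cy) * z / f
    return np.stack([x[valid], y[valid], z[valid]], axis=1)
```

The resulting array is a natural input for a subsequent compression and transmission stage, with one row per surviving pixel.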
ABSTRACT: Today's video analysis technologies use state-of-the-art systems and formalisms such as ontologies and data warehousing to handle the huge amounts of data generated, from low-level to high-level descriptors. In the IST CARETAKER project, we develop a multi-dimensional database with distributed features to provide a central data view of the scene, shared between all the sensors of a network. We propose to extend the possibilities of this kind of system by delegating intelligence to many other entities, known as "agents": small specialized applications able to move across the network and work on dedicated sets of data related to their core domain. In other words, the complexity of the analysis can be reduced or increased by adding feature-specific agents or not, and processing is limited to the data concerned. This article explains how to design and develop an agent-oriented system that can be used by a video analysis data warehouse. We also describe how this methodology distributes intelligence over the system, and how the system can be extended into a self-reasoning architecture using cooperative agents. We will demonstrate this approach.
Proceedings of SPIE - The International Society for Optical Engineering 02/2007;
ABSTRACT: Nowadays, videoconferencing is increasingly advantageous because of the economic and ecological cost of transport. Several platforms exist. The goal of the TIFANIS immersive platform is to let users interact as if they were physically together. Unlike previous tele-immersion systems, TIFANIS uses generic hardware to achieve an economically realistic implementation. The basic functions of the system are to capture the scene, transmit it through digital networks to other partners, and then render it according to each partner's viewing characteristics. The image processing part should run in real time. We propose to analyze the whole system. It can be split into different services, such as central processing unit (CPU), graphical rendering, direct memory access (DMA), and communications through the network. Most of the processing is done by the CPU. It consists of the 3D reconstruction and the detection and tracking of faces from the video stream. However, the processing needs to be parallelized into several threads with as few dependencies as possible. In this paper, we present these issues and the way we deal with them.
Proceedings of SPIE - The International Society for Optical Engineering 02/2007;
ABSTRACT: This paper tackles the challenge of interactively retrieving visual scenes within surveillance sequences acquired with a fixed camera. Unlike today's solutions, we assume that no a-priori knowledge is available, so the system must progressively learn the target scenes through interactive labelling of a few frames by the user. The proposed method is based on very low-cost feature extraction and integrates relevance feedback, multiple-instance SVM classification and active learning. Each of these three steps runs iteratively over the session and takes advantage of the progressively increasing training set. Repeatable experiments on both simulated and real data demonstrate the efficiency of the approach and show that it reaches high retrieval performance.
2007 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2007), 18-23 June 2007, Minneapolis, Minnesota, USA; 01/2007
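The active-learning step can be illustrated with uncertainty sampling, a common strategy assumed here (the abstract does not specify the selection rule): given decision values from an already trained classifier, such as a multiple-instance SVM (not implemented here), the unlabelled frames closest to the decision boundary are proposed for labelling. A bag score following the usual multiple-instance convention (a bag is as positive as its most positive instance) is also shown; both function names are hypothetical.

```python
import numpy as np


def bag_scores(instance_scores: np.ndarray, bag_ids: np.ndarray, n_bags: int) -> np.ndarray:
    """Multiple-instance convention: each bag takes its most positive instance score."""
    out = np.full(n_bags, -np.inf)
    np.maximum.at(out, bag_ids, instance_scores)  # unbuffered per-bag maximum
    return out


def select_for_labelling(scores: np.ndarray, labelled: np.ndarray, k: int) -> np.ndarray:
    """Uncertainty sampling: indices of the k unlabelled items whose decision
    values lie closest to the boundary (score 0)."""
    candidates = np.flatnonzero(~labelled)
    order = np.argsort(np.abs(scores[candidates]), kind="stable")
    return candidates[order[:k]]
```

After the user labels the selected frames, the classifier would be retrained on the enlarged training set and the selection repeated, matching the iterative loop described in the abstract.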