ABSTRACT: We propose a hybrid personalized summarization framework that combines adaptive fast-forwarding and content truncation to generate comfortable and compact video summaries. We formulate video summarization as a discrete optimization problem, where the optimal summary is determined by adopting Lagrangian relaxation and convex-hull approximation to solve a resource allocation problem. To trade off playback speed against perceptual comfort, we consider information associated with the still content of the scene, which is essential to evaluate the relevance of a video, and information associated with the scene activity, which is more relevant for visual comfort. We perform clip-level fast-forwarding by selecting playback speeds from discrete options, which naturally include content truncation as a special case with infinite playback speed. We demonstrate the proposed summarization framework in two use cases, namely the summarization of broadcast soccer videos and of surveillance videos. Objective and subjective experiments demonstrate the relevance and efficiency of the proposed method.
IEEE Transactions on Multimedia 01/2014; 16(2):455-469. · 1.75 Impact Factor
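The clip-level speed selection described in the abstract above can be illustrated as a Lagrangian resource-allocation loop. The following is a minimal sketch, not the paper's implementation: the `summarize` function, the discomfort costs, and the bisection bounds are all hypothetical, and the convex-hull approximation step is omitted.

```python
def summarize(clips, speeds, budget, iters=50):
    """clips: list of (length_sec, discomfort) where discomfort maps
    each candidate playback speed to a perceptual-discomfort cost.
    Bisect on the Lagrange multiplier until the summary fits the budget."""
    lo, hi = 0.0, 1e6
    best = None
    for _ in range(iters):
        lam = (lo + hi) / 2.0
        choice, total = [], 0.0
        for length, discomfort in clips:
            # Pick the speed minimizing discomfort + lam * played duration;
            # an infinite speed (duration 0) models content truncation.
            s = min(speeds, key=lambda v: discomfort[v] + lam * length / v)
            choice.append(s)
            total += length / s
        if total <= budget:
            best, hi = (choice, total), lam  # feasible: try a smaller penalty
        else:
            lo = lam
    return best
```

Each bisection step solves the relaxed problem independently per clip, which is what makes the Lagrangian decomposition attractive for this kind of allocation.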
ABSTRACT: To evaluate multi-target video tracking results, one needs to quantify the accuracy of the estimated target size and the cardinality error, as well as to measure the frequency of ID changes. In this paper, we survey existing multi-target tracking performance scores and, after discussing their limitations, propose three parameter-independent measures for evaluating multi-target video tracking. The measures take into account target-size variations, combine accuracy and cardinality errors, quantify long-term tracking accuracy at different accuracy levels, and evaluate ID changes relative to the duration of the track in which they occur. We conduct an extensive experimental validation of the proposed measures by comparing them with existing ones and by evaluating four state-of-the-art trackers on challenging, publicly available real-world datasets. The software implementing the proposed measures is made available online to facilitate its use by the research community.
IEEE Transactions on Image Processing 11/2013; · 3.20 Impact Factor
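To make concrete how a single score can combine accuracy and cardinality errors, here is a toy per-frame measure: greedily match estimated boxes to ground truth by overlap, sum the IoUs, and normalize by the larger set size so both missed and spurious targets lower the score. This is an illustrative sketch only, not the measures proposed in the paper.

```python
def iou(a, b):
    # Boxes as (x1, y1, x2, y2); returns intersection-over-union.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def frame_score(gt, est):
    """Sum of IoUs of greedily matched pairs, normalized by
    max(len(gt), len(est)): both missed targets (cardinality too
    low) and false targets (too high) reduce the score."""
    pairs = sorted(((iou(g, e), i, j) for i, g in enumerate(gt)
                    for j, e in enumerate(est)), reverse=True)
    used_g, used_e, total = set(), set(), 0.0
    for s, i, j in pairs:
        if i not in used_g and j not in used_e and s > 0:
            used_g.add(i); used_e.add(j); total += s
    n = max(len(gt), len(est))
    return total / n if n else 1.0
```

For example, a perfect single-target frame scores 1.0, while adding one spurious detection halves the score.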
ABSTRACT: We propose a generic online multi-target track-before-detect (MT-TBD) method that is applicable to confidence maps used as observations. The proposed tracker is based on particle filtering and automatically initializes tracks. The main novelty is the inclusion of the target ID in the particle state, enabling the algorithm to deal with an unknown and large number of targets. To overcome the problem of mixing the IDs of targets close to each other, we propose a probabilistic model of target birth and death based on a Markov Random Field (MRF) applied to the particle IDs. Each particle ID is managed using the information carried by neighboring particles. The assignment of IDs to targets is performed using Mean-Shift clustering and supported by a Gaussian Mixture Model. We also show that the computational complexity of MT-TBD is proportional only to the number of particles. To compare our method with recent state-of-the-art works, we include a postprocessing stage suited for multi-person tracking. We validate the method on real-world and crowded scenarios, and demonstrate its robustness in scenes with different perspective views and targets very close to each other.
ABSTRACT: Person re-identification aims to recognize the same person viewed by disjoint cameras at different time instants and locations. In this paper, after an extensive review of state-of-the-art approaches, we propose a re-identification method that takes into account the appearance of people, the spatial location of cameras, and the potential paths a person can choose to follow. This choice is modeled with a set of areas of interest (landmarks) that constrain the propagation of people's trajectories in the non-observed regions between the fields of view of cameras. We represent people with a selective patch around the upper body so as to work in crowded scenes where occlusions are frequent. We demonstrate the proposed method in a challenging scenario from London Gatwick airport and compare it to well-known person re-identification methods, highlighting their strengths and limitations. Finally, we show, using the Cumulative Matching Characteristic curve, that the best performance is obtained by combining the modeling of people's movements in non-observed regions with appearance methods, achieving an average improvement of 6% over using appearance alone and 15% over using motion alone for the association of people across cameras.
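The Cumulative Matching Characteristic (CMC) curve used for evaluation above is straightforward to compute: for each probe, record the rank at which the true gallery identity appears, then accumulate. A minimal sketch, with hypothetical input conventions:

```python
def cmc(ranked_galleries, true_ids):
    """ranked_galleries[i]: gallery IDs sorted by descending
    similarity to probe i; true_ids[i]: correct ID for probe i.
    Returns the cumulative match rate at each rank."""
    n = len(true_ids)
    max_rank = max(len(r) for r in ranked_galleries)
    hits = [0] * max_rank
    for ranking, tid in zip(ranked_galleries, true_ids):
        if tid in ranking:
            hits[ranking.index(tid)] += 1
    # Cumulative sum: probability the true match is in the top k.
    curve, c = [], 0
    for h in hits:
        c += h
        curve.append(c / n)
    return curve
```

The value at rank k answers "how often is the correct person among the top-k candidates", which is why CMC curves are the standard way to compare re-identification methods.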
ABSTRACT: We introduce the concept of depth-based blurring to achieve an aesthetically acceptable distortion when reducing the bitrate in image coding. The proposed depth-based blurring is a prefiltering that reduces high-frequency components by mimicking the limited depth-of-field effect that occurs in cameras. To cope with the challenge of avoiding intensity leakage at the boundaries of objects when blurring at different depth levels, we introduce a selective blurring algorithm that simulates occlusion effects as they occur in natural blurring. The proposed algorithm can handle any number of blurring and occlusion levels. Subjective experiments show that the proposed algorithm outperforms foveation filtering, which is the dominant approach for bitrate reduction by space-variant prefiltering.
IEEE Transactions on Image Processing 12/2011; · 3.20 Impact Factor
ABSTRACT: We present a motion classification approach to detect movements of interest (abnormal motion) based on local feature modeling within spatio-temporal detectors. The modeling is performed using motion vectors and local detectors. The detectors are trained independently to learn abnormal motion from labeled samples. Each detector is assigned an abnormality score, both in space and time, which is the basis of the final classification. The spatial relationship across detectors is used to discriminate simultaneous occurrences of abnormal motion. The performance of the proposed method is evaluated on 52 hours of the multi-camera surveillance dataset of the TRECVID 2010 challenge.
Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011, May 22-27, 2011, Prague Congress Center, Prague, Czech Republic; 01/2011 · 4.63 Impact Factor
ABSTRACT: We present a novel algorithm for enhancing an image or video frame with depth of field. The algorithm handles occlusive blurring effects as they occur in real cameras and supports blur that varies continuously, up to the quantization granularity of the blur levels, with asymptotic complexity O(N log2 N) in both time and memory for an N-pixel image, irrespective of the complexity of the variation in blur. The proposed algorithm is a postfiltering approach which, unlike prior algorithms, does not suffer from intensity leakage. Experimental results show the algorithm to be 3 to 4 times faster than an existing algorithm of the same asymptotic complexity.
18th IEEE International Conference on Image Processing, ICIP 2011, Brussels, Belgium, September 11-14, 2011; 01/2011
ABSTRACT: We present a novel algorithm for automated video production based on content ranking. The proposed algorithm generates videos by performing camera selection while minimizing the number of inter-camera switches. We model the problem as a finite-horizon Partially Observable Markov Decision Process over temporal windows and use a multivariate Gaussian distribution to represent the content-quality score of each camera. The performance of the proposed approach is demonstrated on a multi-camera setup of fixed cameras with partially overlapping fields of view. Subjective experiments based on the Turing test confirmed the quality of the automatically produced videos. The proposed approach is also compared with recent methods based on Recursive Decision and on Dynamic Bayesian Networks, and it outperforms both.
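The core trade-off above, maximizing per-camera quality while penalizing switches, can be sketched as a Viterbi-style dynamic program over a temporal window. This is a simplified deterministic stand-in for the paper's POMDP formulation; the function name and the fixed `switch_cost` are hypothetical.

```python
def select_cameras(scores, switch_cost):
    """scores[t][c]: content-quality score of camera c at time t.
    Returns the camera sequence maximizing total score minus
    switch_cost per inter-camera switch."""
    T, C = len(scores), len(scores[0])
    best = list(scores[0])
    back = [[0] * C for _ in range(T)]
    for t in range(1, T):
        new = [0.0] * C
        for c in range(C):
            # Best predecessor camera, charging the switch penalty.
            prev = max(range(C),
                       key=lambda p: best[p] - (switch_cost if p != c else 0))
            new[c] = scores[t][c] + best[prev] - (switch_cost if prev != c else 0)
            back[t][c] = prev
        best = new
    # Trace back the optimal camera sequence.
    seq = [max(range(C), key=lambda c: best[c])]
    for t in range(T - 1, 0, -1):
        seq.append(back[t][seq[-1]])
    return seq[::-1]
```

Raising `switch_cost` makes the output "stickier": small quality gains no longer justify a cut, which is exactly the behavior that reduces inter-camera switches.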
ABSTRACT: We present an algorithm for non-overlapping camera network localization using trajectory estimation. The localization refers to the extrinsic calibration of a network, i.e., the recovery of the relative position and orientation of each camera in the network on a common ground-plane coordinate system. To this end, Kalman filtering is initially used to model the observed trajectories in each camera's field of view. This information is then used to estimate the missing trajectory information in the unobserved regions by integrating the results of forward and backward linear regression estimation from adjacent cameras. These estimated trajectories are then filtered and used to recover the relative position and orientation of the cameras by analyzing the estimated and observed exit and entry points of an object in each camera's field of view. We fix one camera as a reference and find the final configuration of the network by adjusting the remaining cameras with respect to this reference. We evaluate the performance of the algorithm on both simulated and real data and compare the results with state-of-the-art approaches.
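The forward linear-regression step mentioned above, extending an observed trajectory into an unobserved region, can be sketched as fitting x(t) and y(t) by least squares and evaluating past the last observation. A minimal illustrative version, with hypothetical function names and unit frame indices:

```python
def fit_line(points):
    """Least-squares fit of x(t) and y(t) as linear functions of
    the frame index t = 0 .. len(points)-1."""
    n = len(points)
    ts = list(range(n))
    mt = sum(ts) / n
    def slope_intercept(vals):
        mv = sum(vals) / n
        denom = sum((t - mt) ** 2 for t in ts)
        a = sum((t - mt) * (v - mv) for t, v in zip(ts, vals)) / denom
        return a, mv - a * mt
    ax, bx = slope_intercept([p[0] for p in points])
    ay, by = slope_intercept([p[1] for p in points])
    return lambda t: (ax * t + bx, ay * t + by)

def extrapolate(track, steps):
    """Predict positions in the unobserved region by extending
    the fitted line `steps` frames past the last observation."""
    f = fit_line(track)
    n = len(track)
    return [f(n - 1 + k) for k in range(1, steps + 1)]
```

The backward pass from the adjacent camera would mirror this, and the two predictions would then be fused before recovering exit/entry points.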
ABSTRACT: The growing interest in developing video tracking algorithms has not been accompanied by the development of commonly used evaluation criteria to assess and compare their performance. Researchers often present trackers' results on different datasets and evaluate them with different performance measures, thus hindering both formative and summative quality assessment. In this paper, we present a protocol to evaluate the performance of tracking algorithms that tests video trackers using a set of trials and a pre-defined set of sequences, and that enables objective and reproducible performance evaluation of trackers using ground-truth information. Each trial highlights strengths and weaknesses of a tracker on simulated test scenarios over real sequences that represent real-world conditions. Moreover, a new evaluation measure is introduced that allows us to summarize the performance of a tracker based on the lost-track-ratio curve. The validity and effectiveness of the proposed protocol are demonstrated experimentally on three trackers, and its implementation is made available online to the research community.
18th IEEE International Conference on Image Processing, ICIP 2011, Brussels, Belgium, September 11-14, 2011; 01/2011
ABSTRACT: We propose a simulation environment for Wireless Multimedia Sensor Networks (WMSNs), i.e., networks with sensors capturing complex vectorial data such as video and audio. The proposed simulation environment allows us to model the communication layers, the sensing, and the distributed applications of a WMSN. This Wireless Simulation Environment for Multimedia Networks (WiSE-MNet) is based on Castalia/Omnet++ and is available as open source to the research community. The environment is designed to be flexible and extensible, and has a simple camera model that enables the simulation of distributed computer-vision algorithms at a high level of abstraction. We demonstrate the effectiveness of WiSE-MNet with a distributed tracking application.
ABSTRACT: Online abnormality detection in video without the use of object detection and tracking is a desirable task in surveillance. We address this problem for the case when labeled information about normal events is limited and information about abnormal events is not available. We formulate this problem as one-class classification, where multiple local novelty classifiers (detectors) are used first to learn normal actions based on motion information and then to detect abnormal instances. Each detector is associated with a small region of interest and is trained over labeled samples projected on an appropriate subspace. We discover this subspace by using both labeled and unlabeled segments. We investigate the use of subspace learning and compare two methodologies, based on linear (Principal Component Analysis) and non-linear (Locality Preserving Projections) subspace learning, respectively. Experimental results on a real underground-station dataset show that the linear approach is better suited for cases where subspace learning is restricted to the labeled samples, whereas the non-linear approach is preferable in the presence of additional unlabeled data.
Advanced Video and Signal Based Surveillance (AVSS), 2010 Seventh IEEE International Conference on; 10/2010
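The linear (PCA) variant of the one-class scheme above can be illustrated in a few lines: learn a low-dimensional subspace from normal samples, and score a new sample by its reconstruction error outside that subspace. This is a generic sketch under assumed conventions, not the paper's detector; `fit_pca_detector` is a hypothetical name.

```python
import numpy as np

def fit_pca_detector(normal, k):
    """Learn a k-dimensional PCA subspace from normal motion
    features; the abnormality score of a new sample is its
    distance to the subspace (reconstruction error)."""
    mean = normal.mean(axis=0)
    U, S, Vt = np.linalg.svd(normal - mean, full_matrices=False)
    basis = Vt[:k]  # top-k principal directions (rows)
    def score(x):
        d = x - mean
        # Residual after projecting onto the learned subspace.
        return float(np.linalg.norm(d - (basis @ d) @ basis))
    return score
```

A non-linear variant would replace the SVD with Locality Preserving Projections, which is where the unlabeled segments become useful.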
ABSTRACT: Video analytics, loosely defined as the autonomous understanding of events occurring in a scene monitored by multiple video cameras, has been evolving rapidly in the last two decades. Despite this effort, practical surveillance systems deployed today are not yet capable of autonomous analysis of complex events in the field of view of cameras. This is a serious deficiency, as video feeds from millions of surveillance cameras worldwide are not analyzed in real time and thus cannot help with accident, crime, or terrorism prevention and mitigation, issues critical to contemporary society. Today, these feeds are, at best, recorded to facilitate post-event video forensics.
IEEE Signal Processing Magazine 10/2010; · 3.37 Impact Factor
ABSTRACT: This paper presents a computationally efficient algorithm for smoothly space-variant Gaussian blurring of images. The proposed algorithm uses a specialized filter bank with optimal filters computed through principal component analysis. This filter bank approximates perfect space-variant Gaussian blurring to arbitrarily high accuracy and at greatly reduced computational cost compared to the brute-force approach of employing a separate low-pass filter at each image location. This is particularly important for spatially variant image processing such as foveated coding. Experimental results show that the proposed algorithm typically provides 10 to 15 dB better approximation of perfect Gaussian blurring than the blended Gaussian pyramid approach when using a bank of just eight filters.
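The PCA filter-bank idea above can be sketched in 1D: sample Gaussian kernels over the desired sigma range, keep the top principal components as basis filters, and approximate any intermediate-sigma kernel as a linear combination. This is a simplified illustration under assumed parameters (kernel radius, sigma range), not the paper's optimized construction.

```python
import numpy as np

def gaussian_kernel(sigma, radius=8):
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()

def pca_filter_bank(sigmas, n_filters=8, radius=8):
    """Stack Gaussian kernels over the sigma range and keep the
    top principal components as a small bank of basis filters."""
    K = np.stack([gaussian_kernel(s, radius) for s in sigmas])
    mean = K.mean(axis=0)
    U, S, Vt = np.linalg.svd(K - mean, full_matrices=False)
    return mean, Vt[:n_filters]

def approx_kernel(sigma, mean, basis, radius=8):
    """Approximate a Gaussian of arbitrary sigma as the mean
    kernel plus a linear combination of the basis filters."""
    target = gaussian_kernel(sigma, radius) - mean
    coeffs = basis @ target  # orthonormal rows: projection coefficients
    return mean + coeffs @ basis
```

In the space-variant setting, the image is filtered once per basis filter and the outputs are blended per pixel with the sigma-dependent coefficients, replacing a per-pixel low-pass filter with a handful of convolutions.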
ABSTRACT: We present a scene understanding strategy for video sequences based on clustering object trajectories. In this chapter, we discuss a set of relevant feature spaces for trajectory representation and critically analyze their relative merits. Next, we examine various trajectory clustering methods that can be employed to learn activity models, based on their classification into hierarchical and partitional algorithms. In particular, we focus on parametric and non-parametric partitional algorithms and discuss the limitations of existing approaches. To overcome the limitations of state-of-the-art approaches, we present a soft partitional algorithm based on non-parametric Mean-Shift clustering. The proposed algorithm is validated on real datasets and compared with state-of-the-art approaches using objective evaluation metrics.
ABSTRACT: We present a technique for estimating the location of the ball during a basketball game without using a detector. The technique is based on the analysis of the dynamics in the scene and allows us to overcome the challenges posed by frequent occlusions of the ball and its similarity in appearance to the background. Based on the assumption that the ball is the point of focus of the game and that the motion flow of the players depends on its position during attack actions, the most probable candidates for the ball location are extracted from each frame. These candidates are then validated over time using a Kalman filter. Experimental results on a real basketball dataset show that the location of the ball can be estimated with an average accuracy of 82%.
Proceedings of the International Conference on Image Processing, ICIP 2010, September 26-29, Hong Kong, China; 01/2010
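The temporal validation step above, running candidate locations through a Kalman filter, can be sketched with a constant-velocity model: predict the ball position, pick the nearest candidate, and update. This is a generic illustration; the gating rule (nearest candidate) and the noise parameters are hypothetical, not the paper's.

```python
import numpy as np

def make_cv_kalman(dt=1.0, q=1e-2, r=1.0):
    # Constant-velocity model with state [x, y, vx, vy].
    F = np.array([[1, 0, dt, 0],
                  [0, 1, 0, dt],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]], float)
    H = np.array([[1, 0, 0, 0],
                  [0, 1, 0, 0]], float)
    return F, H, q * np.eye(4), r * np.eye(2)

def track_ball(candidates_per_frame, init_xy):
    """At each frame, predict with the Kalman filter and update
    with the candidate nearest the prediction."""
    F, H, Q, R = make_cv_kalman()
    x = np.array([init_xy[0], init_xy[1], 0, 0], float)
    P = np.eye(4)
    out = []
    for cands in candidates_per_frame:
        x, P = F @ x, F @ P @ F.T + Q          # predict
        z = min(cands, key=lambda c: np.hypot(c[0] - x[0], c[1] - x[1]))
        y = np.array(z) - H @ x                # innovation
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ y                          # update
        P = (np.eye(4) - K @ H) @ P
        out.append((x[0], x[1]))
    return out
```

The motion model is what lets the tracker reject visually plausible but dynamically inconsistent candidates, which matters when the ball resembles the background.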