A. Cavallaro

Queen Mary, University of London, London, England, United Kingdom

Publications (110) · 100.41 Total Impact

  • E. Sariyanidi, H. Gunes, A. Cavallaro
    ABSTRACT: Face images in a video sequence should be registered accurately before being analysed; otherwise, registration errors may be interpreted as facial activity. Subpixel accuracy is crucial for the analysis of subtle actions. In this paper we present PSTR (Probabilistic Subpixel Temporal Registration), a framework that achieves high registration accuracy. …
    Asian Conference on Computer Vision (ACCV'14), Singapore; 11/2014
  • ABSTRACT: Consensus-based target tracking in camera networks faces three major problems: non-linearity in the measurement model, temporary lack of measurements (naivety) due to the limited field of view (FOV), and redundancy in the iterative exchange of information. In this paper we propose two consensus-based distributed algorithms for non-linear systems that use the Extended Information Filter as the underlying filter to handle the non-linearity in the camera measurement model. The first algorithm is an Extended Information Consensus Filter (EICF) that overcomes the effects of naivety and non-linearity without requiring knowledge of other nodes in the network. The second algorithm is an Extended Information Weighted Consensus Filter (EIWCF) that overcomes all three major problems (naivety, redundancy and non-linearity) but requires knowledge of the number of cameras (Nc) in the network. The basic principle of these algorithms is weighting node estimates based on their covariance information. When Nc is not available, EICF can be used at the cost of not handling the redundancy problem. Simulations with highly maneuvering targets show that the two proposed distributed non-linear consensus filters outperform the related state of the art by achieving higher accuracy and faster convergence to the centralised estimates computed by simultaneously considering the information from all the nodes.
    17th International Conference on Information Fusion (FUSION), 2014, Salamanca, Spain; 07/2014
  • Juan C. SanMiguel, Andrea Cavallaro
    ABSTRACT: We present an approach for determining the temporal consistency of particle filters in video tracking based on model validation of their uncertainty over sliding windows. The filter uncertainty is related to the consistency of the dispersion of the filter hypotheses in the state space. We learn an uncertainty model via a mixture of Gamma distributions whose optimum number is selected by modified information-based criteria. The time-accumulated model is estimated as the sequential convolution of the uncertainty model. Model validation is performed by verifying whether the output of the filter belongs to the convolution model through its approximated cumulative distribution function. Experimental results and comparisons show that the proposed approach improves both precision and recall over competing approaches such as Gaussian-based online model extraction, a bank of Kalman filters and empirical thresholding. We combine the proposed approach with a state-of-the-art online performance estimator for video tracking and show that it improves accuracy compared to the same estimator with manually tuned thresholds while reducing the overall computational cost.
    Computer Vision and Image Understanding 07/2014; · 1.23 Impact Factor
  • F. Chen, C. De Vleeschouwer, A. Cavallaro
    ABSTRACT: We propose a hybrid personalized summarization framework that combines adaptive fast-forwarding and content truncation to generate comfortable and compact video summaries. We formulate video summarization as a discrete optimization problem, where the optimal summary is determined by adopting Lagrangian relaxation and convex-hull approximation to solve a resource allocation problem. To trade off playback speed and perceptual comfort, we consider information associated with the still content of the scene, which is essential to evaluate the relevance of a video, and information associated with the scene activity, which is more relevant for visual comfort. We perform clip-level fast-forwarding by selecting the playback speeds from discrete options, which naturally include content truncation as a special case with infinite playback speed. We demonstrate the proposed summarization framework in two use cases, namely summarization of broadcast soccer videos and of surveillance videos. Objective and subjective experiments are performed to demonstrate the relevance and efficiency of the proposed method.
    IEEE Transactions on Multimedia 01/2014; 16(2):455-469. · 1.75 Impact Factor
  • ABSTRACT: Camera networks that reconfigure while performing multiple tasks have unique requirements, such as concurrent task allocation with limited resources, the sharing of data among fields of view across the network, and coordination among heterogeneous devices.
    Computer 01/2014; 47(5):67-73. · 1.68 Impact Factor
  • Tahir Nawaz, Fabio Poiesi, Andrea Cavallaro
    ABSTRACT: To evaluate multi-target video tracking results, one needs to quantify the accuracy of the estimated target-size and the cardinality error as well as measure the frequency of occurrence of ID changes. In this paper we survey existing multi-target tracking performance scores and, after discussing their limitations, we propose three parameter-independent measures for evaluating multi-target video tracking. The measures take into account target-size variations, combine accuracy and cardinality errors, quantify long-term tracking accuracy at different accuracy levels, and evaluate ID changes relative to the duration of the track in which they occur. We conduct an extensive experimental validation of the proposed measures by comparing them with existing ones and by evaluating four state-of-the-art trackers on challenging real-world publicly-available datasets. The software implementing the proposed measures is made available online to facilitate their use by the research community.
    IEEE Transactions on Image Processing 11/2013; · 3.20 Impact Factor
  • Fabio Poiesi, Riccardo Mazzon, Andrea Cavallaro
    ABSTRACT: We propose a generic online multi-target track-before-detect (MT-TBD) method that is applicable to confidence maps used as observations. The proposed tracker is based on particle filtering and automatically initializes tracks. The main novelty is the inclusion of the target ID in the particle state, enabling the algorithm to deal with an unknown and large number of targets. To overcome the problem of mixing the IDs of targets close to each other, we propose a probabilistic model of target birth and death based on a Markov Random Field (MRF) applied to the particle IDs. Each particle ID is managed using the information carried by neighboring particles. The assignment of the IDs to the targets is performed using Mean-Shift clustering and supported by a Gaussian Mixture Model. We also show that the computational complexity of MT-TBD is proportional only to the number of particles. To compare our method with recent state-of-the-art works, we include a post-processing stage suited for multi-person tracking. We validate the method on real-world, crowded scenarios and demonstrate its robustness in scenes with different perspective views and targets very close to each other.
    Computer Vision and Image Understanding 10/2013; 117(10):1257-1272. · 1.23 Impact Factor
  • ABSTRACT: In this paper, we propose to use local Zernike Moments (ZMs) for facial affect recognition and introduce a representation scheme based on performing non-linear encoding on ZMs via quantization. Local ZMs provide a useful and compact description of image discontinuities and texture. We demonstrate the use of this ZM-based representation for posed and discrete as well as naturalistic and continuous affect recognition on standard datasets, and show that ZM-based representations outperform well-established alternative approaches for both tasks. To the best of our knowledge, the performance we achieved on the CK+ dataset is superior to all results reported to date.
    British Machine Vision Conference (BMVC), Bristol, UK; 09/2013
  • Fan Chen, Andrea Cavallaro
    ABSTRACT: We propose a method for detecting group interactions for groups with a varying number of objects. We model each object as a moving agent with a direction-aware interest map, and group interactions as mutual interests between objects. After grouping objects into unit interactions individually in each frame, we solve the temporal association problem by tracking group interactions over consecutive frames. Optimal grouping is obtained by finding the maximum weight spanning tree of a directed graph formed by objects and their potential interactions. Experimental results show that our method obtains recall rates of around 80% on two publicly available datasets.
    International Conference on Acoustics, Speech, and Signal Processing; 05/2013
  • Riccardo Mazzon, Andrea Cavallaro
    ABSTRACT: Tracking across non-overlapping cameras is a challenging open problem in video surveillance. In this paper, we propose a novel target re-identification method that models movements in non-observed areas with a modified Social Force Model (SFM) by exploiting the map of the site under surveillance. The SFM is developed with a goal-driven approach that models the desire of people to reach specific interest points (goals) of the site such as exits, shops, seats and meeting points. These interest points work as attractors for people movements and guide the path predictions in the non-observed areas. We also model key regions that are potential intersections of different paths where people can change the direction of motion. Finally, the predictions are linked to the trajectories observed in the next camera view where people reappear. We validate our multi-camera tracking method on the challenging i-LIDS dataset from the London Gatwick airport and show the benefits of the Multi-Goal Social Force Model.
    Neurocomputing 01/2013; 100:41–50.
  • Andrea Cavallaro, Andres Kwasinski
    ABSTRACT: Presents an editorial for this issue of IEEE Signal Processing Magazine.
    IEEE Signal Processing Magazine 01/2013; 30(1):4-4. · 3.37 Impact Factor
  • ABSTRACT: This Special Issue offers an overview of ongoing research on intelligent video surveillance (IVS) techniques, and brings together cutting-edge research work on security and privacy problems with respect to technological, behavioral, legal, and cultural aspects. We received 34 submissions and each submission was rigorously reviewed by at least two experts in the related fields based on the criteria of originality, significance, quality, and clarity. Eventually, 12 papers were accepted for the Special Issue, spanning a variety of topics including privacy protection, background modeling, tracking, action/activity analysis, and crowd behavior perception. The papers constituting this issue are then briefly summarized.
    IEEE Transactions on Information Forensics and Security 01/2013; 8(10):1559-1561. · 1.90 Impact Factor
  • ABSTRACT: Understanding human behaviors is a challenging problem in computer vision that has recently seen important advances. Human behavior understanding combines image and signal processing, feature extraction, machine learning, and 3-D geometry. Application scenarios range from surveillance to indexing and retrieval, from patient care to industrial safety and sports analysis. Given the broad set of techniques used in video-based behavior understanding and the fast progress in this area, in this paper we organize and survey the corresponding literature, define unambiguous key terms, and discuss links among fundamental building blocks ranging from human detection to action and interaction recognition. The advantages and the drawbacks of the methods are critically discussed, providing a comprehensive coverage of key aspects of video-based human behavior understanding, available datasets for experimentation and comparisons, and important open research issues.
    IEEE Transactions on Circuits and Systems for Video Technology 01/2013; · 1.82 Impact Factor
  • R. Mazzon, F. Poiesi, A. Cavallaro
    ABSTRACT: We propose a method to detect and track interacting people by employing a framework based on a Social Force Model (SFM). The method embeds plausible human behaviors to predict interactions in a crowd by iteratively minimizing the error between predictions and measurements. We model people approaching a group and restrict the group formation based on the relative velocity of candidate group members. The detected groups are then tracked by linking their interaction centers over time using a buffered graph-based tracker. We show how the proposed framework outperforms existing group localization techniques on three publicly available datasets, with improvements of up to 13% on group detection.
    10th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), 2013; 01/2013
  • T. Nawaz, A. Cavallaro
    ABSTRACT: The absence of a commonly adopted performance evaluation framework is hampering advances in the design of effective video trackers. In this paper, we present a single-score evaluation measure and a protocol to objectively compare trackers. The proposed measure evaluates tracking accuracy and failure, and combines them for both summative and formative performance assessment. The proposed protocol is composed of a set of trials that evaluate the robustness of trackers on a range of test scenarios representing several real-world conditions. The protocol is validated on a set of sequences with a diversity of targets (head, vehicle, person) and challenges (occlusions, background clutter, pose changes, scale changes) using six state-of-the-art trackers, highlighting their strengths and weaknesses on more than 187,000 frames. The software implementing the protocol and the evaluation results are made available online and new results can be included, thus facilitating the comparison of trackers.
    IEEE Transactions on Image Processing 11/2012; · 3.20 Impact Factor
  • Riccardo Mazzon, Syed Fahad Tahir, Andrea Cavallaro
    ABSTRACT: Person re-identification aims to recognize the same person viewed by disjoint cameras at different time instants and locations. In this paper, after an extensive review of state-of-the-art approaches, we propose a re-identification method that takes into account the appearance of people, the spatial location of cameras and potential paths a person can choose to follow. This choice is modeled with a set of areas of interest (landmarks) that constrain the propagation of people's trajectories in non-observed regions between the fields of view of cameras. We represent people with a selective patch around their upper body to work in crowded scenes where occlusions are frequent. We demonstrate the proposed method in a challenging scenario from London Gatwick airport and compare it to well-known person re-identification methods, highlighting their strengths and limitations. Finally, we show with Cumulative Matching Characteristic curves that the best performance is obtained by modeling people's movements in non-observed regions combined with appearance methods, achieving an average improvement of 6% over using appearance only and 15% over using motion only for the association of people across cameras.
    Pattern Recognition Letters 10/2012; 33(14):1828–1837. · 1.27 Impact Factor
  • Samuele Salti, Andrea Cavallaro, Luigi Di Stefano
    ABSTRACT: Long-term video tracking is of great importance for many applications in real-world scenarios. A key component for achieving long-term tracking is the tracker's capability of updating its internal representation of targets (the appearance model) to changing conditions. Given the rapid but fragmented development of this research area, we propose a unified conceptual framework for appearance model adaptation that enables a principled comparison of different approaches. Moreover, we introduce a novel evaluation methodology that enables simultaneous analysis of tracking accuracy and tracking success, without the need to set application-dependent thresholds. Based on the proposed framework and this novel evaluation methodology, we conduct an extensive experimental comparison of trackers that perform appearance model adaptation. Theoretical and experimental analyses allow us to identify the most effective approaches as well as to highlight design choices that favor resilience to errors during the update process. We conclude the paper with a list of key open research challenges that have been singled out by means of our experimental comparison.
    IEEE Transactions on Image Processing 06/2012; 21(10):4334-48. · 3.20 Impact Factor
  • Juan C. SanMiguel, Andrea Cavallaro, José M. Martínez
    ABSTRACT: We propose an adaptive framework to estimate the quality of video tracking algorithms without ground-truth data. The framework is divided into two main stages, namely, the estimation of the tracker condition to identify temporal segments during which a target is lost, and the measurement of the quality of the estimated track when the tracker is successful. A key novelty of the proposed framework is the capability of evaluating video trackers with multiple failures and recoveries over long sequences. Successful tracking is identified by analyzing the uncertainty of the tracker, whereas track recovery from errors is determined based on the time-reversibility constraint. The proposed approach is demonstrated on a particle filter tracker over a heterogeneous data set. Experimental results show the effectiveness and robustness of the proposed framework, which improves on state-of-the-art approaches in the presence of tracking challenges such as occlusions, illumination changes and clutter, and on sequences containing multiple tracking errors and recoveries.
    IEEE Transactions on Image Processing 01/2012; 21(5):2812-23. · 3.20 Impact Factor
  • G. Kayumbi, P.L. Mazzeo, A. Cavallaro
    ABSTRACT: Multi-camera tracking often involves the projection of image data onto a ground plane. In this paper, we analyze the propagation of object tracking errors after trajectory transformation from multiple views onto the ground plane. In particular, we contrast a deterministic and a probabilistic algorithm and present an empirical study of their multiple-object tracking results on a dataset of 18,000 frames. By measuring tracking accuracy, we highlight the processes that generate the most significant errors and how these errors affect the estimation of the final object location. Ultimately, the propagation of these errors from the image-plane to the ground-plane trajectories gives insights for future enhancement of the algorithms employed.
    Eighth International Conference on Signal Image Technology and Internet Based Systems (SITIS), 2012; 01/2012
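
The consensus-filter abstract above (FUSION 2014) describes weighting node estimates by their covariance information so that distributed filters converge to the centralised estimate. The following is a minimal sketch of that weighting principle only, under assumptions that are not from the paper: scalar states, a fixed undirected camera graph, and plain average consensus in place of the Extended Information Filter. The function names are hypothetical.

```python
# Sketch of covariance (inverse-variance) weighting with average consensus.
# Assumptions, not from the paper: scalar states, a fixed undirected graph,
# and plain average consensus; this is NOT the EICF or EIWCF algorithm.


def centralised_fusion(estimates, variances):
    """Fuse all node estimates at once, weighting each by 1/variance."""
    info = sum(1.0 / v for v in variances)
    info_state = sum(x / v for x, v in zip(estimates, variances))
    return info_state / info, 1.0 / info


def consensus_fusion(estimates, variances, neighbours, steps=200, eps=0.2):
    """Each node repeatedly averages its information pair (1/v, x/v) with its
    neighbours only. On a connected graph, eps below 1/(max node degree) is a
    sufficient condition for convergence."""
    y = [1.0 / v for v in variances]                    # information
    z = [x / v for x, v in zip(estimates, variances)]   # information state
    n = len(y)
    for _ in range(steps):
        # Both lists are rebuilt from the previous iteration's values.
        y = [y[i] + eps * sum(y[j] - y[i] for j in neighbours[i]) for i in range(n)]
        z = [z[i] + eps * sum(z[j] - z[i] for j in neighbours[i]) for i in range(n)]
    # z[i]/y[i] tends to sum(z)/sum(y), the centralised estimate, without
    # knowing n; the fused variance 1/sum(y) = 1/(n * y[i]) does need n.
    return [z[i] / y[i] for i in range(n)], [1.0 / (n * y[i]) for i in range(n)]


if __name__ == "__main__":
    ring = [[1, 3], [0, 2], [1, 3], [2, 0]]   # four cameras in a ring
    est, var = [1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 4.0, 1.0]
    print(centralised_fusion(est, var))
    print(consensus_fusion(est, var, ring))
```

With these four nodes every local estimate approaches the centralised inverse-variance average 6.75 / 2.75 ≈ 2.45, while recovering the fused variance requires the node count n.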

Publication Stats

815 Citations
100.41 Total Impact Points

Institutions

  • 2004–2014
    • Queen Mary, University of London
      • Centre for Cell Signalling
      • School of Electronic Engineering and Computer Science
      London, England, United Kingdom
  • 2010
    • Universidad Autónoma de Madrid
      • High Technical College
      Madrid, Madrid, Spain
    • Stanford University
      Palo Alto, California, United States
  • 2005–2009
    • University of London
      London, England, United Kingdom
    • WWF United Kingdom
      London, England, United Kingdom
  • 1999–2004
    • École Polytechnique Fédérale de Lausanne
      • Laboratoire de traitement des signaux
      Lausanne, VD, Switzerland
  • 2000
    • Eawag: Das Wasserforschungs-Institut des ETH-Bereichs
      Duebendorf, Zurich, Switzerland