Andrea Cavallaro

Università degli Studi di Genova, Genova, Liguria, Italy

Publications (127) · 163.13 Total Impact

  • Fabio Poiesi, Andrea Cavallaro
    ABSTRACT: We present a framework for multitarget detection and tracking that infers candidate target locations in videos containing a high density of homogeneous targets. We propose a gradient-climbing technique and an isocontour slicing approach for intensity maps to localize targets. The former uses Markov chain Monte Carlo to iteratively fit a shape model onto the target locations, whereas the latter uses the intensity values at different levels to find consistent object shapes. We generate trajectories by recursively associating detections with a hierarchical graph-based tracker on temporal windows. The solution to the graph is obtained with a greedy algorithm that accounts for false-positive associations. The edges of the graph are weighted with a likelihood function based on location information. We evaluate the performance of the proposed framework on challenging datasets containing videos with a high density of targets and compare it with six alternative trackers.
    IEEE Transactions on Circuits and Systems for Video Technology 04/2015; 25(4):623-637. DOI:10.1109/TCSVT.2014.2344509 · 2.26 Impact Factor
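The greedy, likelihood-weighted association step described in the abstract can be illustrated with a toy two-frame sketch. The pairwise Gaussian location likelihood, the sigma value and the hard threshold for leaving likely false positives unmatched are simplifying assumptions; the paper's hierarchical, windowed tracker is considerably richer.

```python
import math

def location_likelihood(a, b, sigma=10.0):
    """Gaussian likelihood that two (x, y) detections are the same target."""
    d2 = (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
    return math.exp(-d2 / (2.0 * sigma ** 2))

def greedy_associate(prev, curr, min_likelihood=0.1):
    """Greedily link detections across two frames in descending likelihood
    order, leaving low-likelihood pairs (potential false positives) unmatched."""
    edges = sorted(
        ((location_likelihood(p, c), i, j)
         for i, p in enumerate(prev) for j, c in enumerate(curr)),
        reverse=True)
    used_p, used_c, links = set(), set(), []
    for w, i, j in edges:
        if w < min_likelihood:
            break  # remaining edges are even less likely
        if i not in used_p and j not in used_c:
            links.append((i, j))
            used_p.add(i)
            used_c.add(j)
    return links
```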
  • Juan C. SanMiguel, Andrea Cavallaro
    ABSTRACT: We propose an approach to create camera coalitions in resource-constrained camera networks and demonstrate it for collaborative target tracking. We cast coalition formation as a decentralized resource allocation process where the best cameras among those viewing a target are assigned to a coalition based on marginal utility theory. A manager is dynamically selected to negotiate with cameras whether they will join the coalition and to coordinate the tracking task. This negotiation is based not only on the utility brought by each camera to the coalition, but also on the associated cost (i.e. additional processing and communication). Experimental results and comparisons using simulations and real data show that the proposed approach outperforms related state-of-the-art methods by improving tracking accuracy in cost-free settings. Moreover, under resource limitations, the proposed approach controls the tradeoff between accuracy and cost, and achieves energy savings with only a minor reduction in accuracy.
    IEEE Sensors Journal 01/2015; 15(5):2657-2668. DOI:10.1109/JSEN.2014.2367015 · 1.85 Impact Factor
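The marginal-utility admission rule at the heart of the coalition formation can be caricatured with a greedy sketch. The additive utility, scalar costs and resource budget are illustrative assumptions; the negotiation between a dynamically selected manager and candidate cameras is not modelled here.

```python
def form_coalition(cameras, budget):
    """Greedily admit cameras whose marginal utility exceeds their cost,
    while the coalition stays within the resource budget.
    cameras: list of (name, utility, cost); utility is treated as additive
    for simplicity, whereas a real coalition utility need not be."""
    ranked = sorted(cameras, key=lambda c: c[1] - c[2], reverse=True)
    coalition, spent = [], 0.0
    for name, utility, cost in ranked:
        if utility - cost > 0 and spent + cost <= budget:
            coalition.append(name)
            spent += cost
    return coalition
```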
  • Source
    E. Sariyanidi, H. Gunes, A. Cavallaro
    ABSTRACT: Face images in a video sequence should be registered accurately before being analysed, otherwise registration errors may be interpreted as facial activity. Subpixel accuracy is crucial for the analysis of subtle actions. In this paper we present PSTR (Probabilistic Subpixel Temporal Registration), a framework that achieves high registration accuracy. […]
    Asian Computer Vision Conference (ACCV'14), Singapore; 11/2014
  • Source
    ABSTRACT: The choice of the most suitable fusion scheme for smart camera networks depends on the application as well as on the available computational and communication resources. In this paper we discuss and compare the resource requirements of five fusion schemes, namely centralised fusion, flooding, consensus, token passing and dynamic clustering. The Extended Information Filter is applied to each fusion scheme to perform target tracking. Token passing and dynamic clustering involve negotiation among viewing nodes (cameras observing the same target) to decide which node should perform the fusion process, whereas flooding and consensus do not include this negotiation. Negotiation helps limit the number of participating cameras and reduces the resources required for the fusion process itself, but requires additional communication. Consensus has the highest communication and computation costs, but it is the only scheme that can be applied when not all viewing nodes are connected directly and routing tables are not available.
    8th ACM / IEEE International Conference on Distributed Smart Cameras (ICDSC 2014); 11/2014
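The consensus scheme compared above can be illustrated, in its plainest form, by scalar average consensus over a connected graph; the actual schemes run an Extended Information Filter, which this toy iteration omits entirely.

```python
def consensus(values, neighbours, eps=0.2, iters=200):
    """Plain average-consensus iteration: each node nudges its estimate
    towards its neighbours'. With a connected graph and eps smaller than
    1 / max_degree, all nodes converge to the network-wide average."""
    x = list(values)
    for _ in range(iters):
        # the comprehension reads the old x; rebinding happens afterwards
        x = [xi + eps * sum(x[j] - xi for j in neighbours[i])
             for i, xi in enumerate(x)]
    return x
```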
  • ABSTRACT: We present a framework for improving probabilistic tracking of an extended object with a set of model points. The framework combines the tracker with an online performance measure and a correction technique. We correlate model point trajectories to improve, online, the accuracy of a failed or uncertain tracker. A model point tracker gets assistance from neighboring trackers whenever a degradation in its performance is detected using the online performance measure. The correction of the model point state is based on correlation information from the state of other trackers. Partial Least Squares (PLS) regression is used to adaptively model the correlation of point tracker states from short windowed trajectories. Experimental results on data obtained from optical motion capture systems show the improvement in tracking performance of the proposed framework compared to the baseline tracker and other state-of-the-art trackers.
    Neurocomputing 10/2014; DOI:10.1016/j.neucom.2014.10.057 · 2.01 Impact Factor
  • Source
    Evangelos Sariyanidi, Hatice Gunes, Andrea Cavallaro
    IEEE Transactions on Pattern Analysis and Machine Intelligence 10/2014; DOI:10.1109/TPAMI.2014.2366127 · 5.69 Impact Factor
  • ABSTRACT: We propose a tracker-level fusion framework for robust visual tracking. The framework combines trackers addressing different tracking challenges to improve the overall performance. A novelty of the proposed framework is the inclusion of an online performance measure to identify the track quality level of each tracker so as to guide the fusion. The fusion is then based on appropriately mixing the prior state of the trackers. Moreover, the track-quality level is used to update the target appearance model. We demonstrate the framework with two Bayesian trackers on video sequences with various challenges and show its robustness compared to the independent use of the two individual trackers, and also compared to state-of-the-art trackers that use tracker-level fusion.
    IEEE Transactions on Circuits and Systems for Video Technology 09/2014; DOI:10.1109/TCSVT.2014.2360027 · 2.26 Impact Factor
  • Sophia Bano, Andrea Cavallaro
    ABSTRACT: We propose a framework for the automatic grouping and alignment of unedited multi-camera User-Generated Videos (UGVs) within a database. The proposed framework analyzes the sound in order to match and cluster UGVs that capture the same spatio-temporal event and estimate their relative time-shift to temporally align them. We design a descriptor derived from the pairwise matching of audio chroma features of UGVs. The descriptor facilitates the definition of a classification threshold for automatic query-by-example event identification. We evaluate the proposed identification and synchronization framework on a database of 263 multi-camera recordings of 48 real-world events and compare it with state-of-the-art methods. Experimental results show the effectiveness of the proposed approach in the presence of various audio degradations.
    Information Sciences 08/2014; 302:108-121. DOI:10.1016/j.ins.2014.08.026 · 3.89 Impact Factor
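The core of the synchronization step, estimating a relative time shift by pairwise matching of feature sequences, can be sketched as a lag search over a correlation score. The 1-D sequences and the overlap-averaged score below are simplifications of the paper's chroma-based descriptor.

```python
def estimate_shift(a, b, max_lag=50):
    """Estimate the lag of sequence b relative to a by maximising the mean
    product over their overlap; if b[t] == a[t + k], the result is k."""
    def corr_at(lag):
        if lag >= 0:
            pairs = list(zip(a[lag:], b))
        else:
            pairs = list(zip(a, b[-lag:]))
        if not pairs:
            return float("-inf")  # no overlap at this lag
        return sum(x * y for x, y in pairs) / len(pairs)
    return max(range(-max_lag, max_lag + 1), key=corr_at)
```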
  • S.F. Tahir, Andrea Cavallaro
    ABSTRACT: Networks of smart cameras share large amounts of data to accomplish tasks such as reidentification. We propose a feature-selection method that minimizes the data needed to represent the appearance of objects by learning the most appropriate feature set for the task at hand (person reidentification). The computational cost for feature extraction and the cost for storing the feature descriptor are considered jointly with feature performance to select cost-effective good features. This selection allows us to improve intercamera reidentification while reducing the bandwidth that is necessary to share data across the camera network. We also rank the selected features in the order of effectiveness for the task to enable a further reduction of the feature set by dropping the least effective features when application constraints require this adaptation. We compare the proposed approach with state-of-the-art methods on the iLIDS and VIPeR datasets and show that the proposed approach considerably reduces network traffic due to intercamera feature sharing while keeping the reidentification performance at an equivalent or better level compared with the state of the art.
    IEEE Transactions on Circuits and Systems for Video Technology 08/2014; 24(8):1362-1374. DOI:10.1109/TCSVT.2014.2305511 · 2.26 Impact Factor
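The joint performance/cost selection can be caricatured with a simple penalised score and a ranking from which the tail can be dropped under bandwidth constraints. The feature triple (accuracy gain, extraction cost, storage cost) and the weight alpha are illustrative assumptions, not the paper's criterion.

```python
def rank_features(features, alpha=0.5):
    """Rank features by accuracy gain penalised by extraction plus storage
    cost; under tighter constraints, features are dropped from the tail.
    features: dict name -> (accuracy_gain, extraction_cost, storage_cost)."""
    def score(item):
        _, (gain, extract, store) = item
        return gain - alpha * (extract + store)
    return [name for name, _ in
            sorted(features.items(), key=score, reverse=True)]
```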
  • Source
    ABSTRACT: Consensus-based target tracking in camera networks faces three major problems: non-linearity in the measurement model, temporary lack of measurements (naivety) due to the limited field of view (FOV) and redundancy in the iterative exchange of information. In this paper we propose two consensus-based distributed algorithms for non-linear systems using the Extended Information Filter as underlying filter to handle the non-linearity in the camera measurement model. The first algorithm is an Extended Information Consensus Filter (EICF) that overcomes the effect of naivety and non-linearity without requiring knowledge of other nodes in the network. The second algorithm is an Extended Information Weighted Consensus Filter (EIWCF) that overcomes all the three major problems (naivety, redundancy and non-linearity) but requires knowledge of the number of cameras (Nc) in the network. The basic principle of these algorithms is weighting node estimates based on their covariance information. When Nc is not available, EICF can be used at the cost of not handling the redundancy problem. Simulations with highly maneuvering targets show that the two proposed distributed non-linear consensus filters outperform the related state of the art by achieving higher accuracy and faster convergence to the centralised estimates computed by simultaneously considering the information from all the nodes.
    17th International Conference on Information Fusion (FUSION), 2014, Salamanca, Spain; 07/2014
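The covariance-based weighting that underlies both proposed filters can be shown in a scalar toy form: estimates are fused in proportion to their information (inverse variance). The real EIWCF operates on information vectors and matrices and iterates over the network, which this sketch omits.

```python
def info_weighted_fuse(estimates, variances):
    """Fuse scalar estimates by weighting each with its information
    (inverse variance); uncertain nodes contribute less."""
    infos = [1.0 / v for v in variances]
    total = sum(infos)
    return sum(w * x for w, x in zip(infos, estimates)) / total
```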
  • Juan C. SanMiguel, Andrea Cavallaro
    ABSTRACT: We present an approach for determining the temporal consistency of particle filters in video tracking based on model validation of their uncertainty over sliding windows. The filter uncertainty is related to the consistency of the dispersion of the filter hypotheses in the state space. We learn an uncertainty model via a mixture of Gamma distributions whose optimum number is selected by modified information-based criteria. The time-accumulated model is estimated as the sequential convolution of the uncertainty model. Model validation is performed by verifying whether the output of the filter belongs to the convolution model through its approximated cumulative density function. Experimental results and comparisons show that the proposed approach improves both precision and recall of competitive approaches such as Gaussian-based online model extraction, bank of Kalman filters and empirical thresholding. We combine the proposed approach with a state-of-the-art online performance estimator for video tracking and show that it improves accuracy compared to the same estimator with manually tuned thresholds while reducing the overall computational cost.
    Computer Vision and Image Understanding 07/2014; 131. DOI:10.1016/j.cviu.2014.06.016 · 1.36 Impact Factor
  • Marco Del Coco, Andrea Cavallaro
    ABSTRACT: The complexity of multi-target tracking grows faster than linearly as the number of objects increases, which makes the design of real-time trackers a challenging task for scenarios with a large number of targets. The Probability Hypothesis Density (PHD) filter is known to help reduce this complexity. However, this reduction may not suffice in critical situations when the number of targets, dimension of the state vector, clutter conditions and sample rate are high. To address this problem, we propose a parallelization scheme for the particle PHD filter. The proposed scheme exploits the knowledge of mutually interacting targets in the scene to help fragmentation and to reduce the workload of individual processors. We compare the proposed approach with alternative parallelization schemes and discuss its advantages and limitations using the results obtained on two multi-target tracking datasets.
    ICASSP 2014 - 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP); 05/2014
  • Syed Fahad Tahir, Andrea Cavallaro
    ABSTRACT: We propose an object matching approach aimed at smartphone cameras that exploits the well-known concept of local sets of features for object representation. We also enable the temporal alignment of cameras by exploiting the frames of detected objects to group objects that appeared in the same time interval for the assignment within each camera. The proposed approach does not need training, which makes it suitable for matching over short temporal intervals. We use both outdoor and indoor datasets for the evaluation, and show that the proposed method reduces by up to 95% the amount of information to be stored and communicated.
    ICASSP 2014 - 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP); 05/2014
  • Raul Mohedano, Andrea Cavallaro, Narciso Garcia
    ABSTRACT: We propose a new Bayesian framework for automatically determining the position (location and orientation) of an uncalibrated camera using the observations of moving objects and a schematic map of the passable areas of the environment. Our approach takes advantage of static and dynamic information on the scene structures through prior probability distributions for object dynamics. The proposed approach restricts plausible positions where the sensor can be located while taking into account the inherent ambiguity of the given setting. The proposed framework samples from the posterior probability distribution for the camera position via data-driven MCMC, guided by an initial geometric analysis that restricts the search space. A Kullback-Leibler divergence analysis is then used to yield the final camera position estimate, while explicitly isolating ambiguous settings. The proposed approach is evaluated in synthetic and real environments, showing its satisfactory performance in both ambiguous and unambiguous settings.
    IEEE Transactions on Pattern Analysis and Machine Intelligence 04/2014; 36(4):684-697. DOI:10.1109/TPAMI.2013.243 · 5.69 Impact Factor
  • Anna Llagostera Casanovas, Andrea Cavallaro
    ABSTRACT: We present a multimodal method for the automatic synchronization of audio-visual recordings captured with a set of independent cameras. The proposed method jointly processes data from audio and video channels to estimate inter-camera delays that are used to temporally align the recordings. Our approach is composed of three main steps. First we extract from each recording temporally sharp audio-visual events. These audio-visual events are short and characterized by an audio onset happening jointly to a well-localized spatio-temporal change in the video data. Then, we estimate the inter-camera delays by assessing the co-occurrence of the events in the various recordings. Finally, we use a cross-validation procedure that combines the results for all camera pairs and aligns the recordings in a global timeline. An important feature of the proposed method is the estimation of the confidence level on the results that allows us to automatically reject recordings that are not reliable for the alignment. Results show that our method outperforms state-of-the-art approaches based on audio-only or video-only analysis with both fixed and hand-held moving cameras.
    Multimedia Tools and Applications 02/2014; 74(4). DOI:10.1007/s11042-014-1872-y · 1.06 Impact Factor
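The co-occurrence step, estimating an inter-camera delay from matching audio-visual events, can be sketched by voting over pairwise differences of event timestamps (integer frame indices here). The paper additionally estimates a confidence level to reject unreliable recordings, which this sketch omits.

```python
from collections import Counter

def estimate_delay(events_a, events_b):
    """Estimate the inter-camera delay as the most frequent pairwise
    difference between event timestamps; returns (delay, vote count)."""
    diffs = Counter(b - a for a in events_a for b in events_b)
    delay, votes = diffs.most_common(1)[0]
    return delay, votes
```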
  • Source
    ABSTRACT: We propose a hybrid personalized summarization framework that combines adaptive fast-forwarding and content truncation to generate comfortable and compact video summaries. We formulate video summarization as a discrete optimization problem, where the optimal summary is determined by adopting Lagrangian relaxation and convex-hull approximation to solve a resource allocation problem. To trade off playback speed and perceptual comfort we consider information associated to the still content of the scene, which is essential to evaluate the relevance of a video, and information associated to the scene activity, which is more relevant for visual comfort. We perform clip-level fast-forwarding by selecting the playback speeds from discrete options, which naturally include content truncation as a special case with infinite playback speed. We demonstrate the proposed summarization framework in two use cases, namely summarization of broadcasted soccer videos and surveillance videos. Objective and subjective experiments are performed to demonstrate the relevance and efficiency of the proposed method.
    IEEE Transactions on Multimedia 02/2014; 16(2-2):455-469. DOI:10.1109/TMM.2013.2291967 · 1.78 Impact Factor
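Once the Lagrangian multiplier is fixed, the clip-level choice of playback speed decomposes into independent per-clip minimisations, which can be sketched as follows. The quadratic discomfort term is an invented toy model, not the paper's distortion measure.

```python
def pick_speeds(clips, speeds, lam):
    """Per clip, pick the playback speed minimising perceptual discomfort
    plus lam * resulting duration (Lagrangian relaxation of a duration
    budget). clips: list of (length, activity); in this toy model, higher
    activity makes fast playback more uncomfortable."""
    plan = []
    for length, activity in clips:
        best = min(speeds,
                   key=lambda s: activity * (s - 1.0) ** 2 + lam * length / s)
        plan.append(best)
    return plan
```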
  • Luca Zini, Francesca Odone, Andrea Cavallaro
    ABSTRACT: We address the problem of multi-view association of articulated objects observed by potentially moving and handheld cameras. Starting from trajectory data, we encode the temporal evolution of the objects and perform matching without making assumptions on scene geometry and with only weak assumptions on the field-of-view overlaps. After generating a viewpoint invariant representation using Self Similarity Matrices, we put in correspondence the spatio-temporal object descriptions using spectral methods on the resulting matching graph. We validate the proposed method on three publicly available real-world datasets, and compare it with alternative approaches. Moreover we present an extensive analysis of the accuracy of the proposed method in different contexts, with varying noise levels on the input data, varying amount of overlap between the fields of view, and varying duration of the available observations.
    IEEE Transactions on Circuits and Systems for Video Technology 01/2014; 24(99). DOI:10.1109/TCSVT.2014.2302547 · 2.26 Impact Factor
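A Self Similarity Matrix for a trajectory can be computed directly as the matrix of pairwise point distances, which is what makes the representation invariant to rigid motions of the viewpoint. A minimal sketch (Euclidean distance on raw 2-D points; the paper builds its descriptors from richer trajectory data):

```python
import math

def self_similarity_matrix(trajectory):
    """Pairwise Euclidean distances between all trajectory points; the
    result is unchanged under rotation and translation of the input."""
    n = len(trajectory)
    return [[math.dist(trajectory[i], trajectory[j]) for j in range(n)]
            for i in range(n)]
```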
  • ABSTRACT: Camera networks that reconfigure while performing multiple tasks have unique requirements, such as concurrent task allocation with limited resources, the sharing of data among fields of view across the network, and coordination among heterogeneous devices.
    Computer 01/2014; 47(5):67-73. DOI:10.1109/MC.2014.133 · 1.44 Impact Factor
  • Fabio Poiesi, Andrea Cavallaro
    ABSTRACT: We present an extensive survey of methods for recognizing human interactions and propose a method for predicting rendezvous areas in observable and unobservable regions using sparse motion information. Rendezvous areas indicate where people are likely to interact with each other or with static objects (e.g., a door, an information desk or a meeting point). The proposed method infers the direction of movement by calculating prediction lines from displacement vectors and temporally accumulates intersecting locations generated by prediction lines. The intersections are then used as candidate rendezvous areas and modeled as spatial probability density functions using Gaussian Mixture Models. We validate the proposed method to predict dynamic and static rendezvous areas on real-world datasets and compare it with related approaches.
    Journal of Real-Time Image Processing 01/2014; DOI:10.1007/s11554-014-0428-8 · 1.11 Impact Factor
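The accumulation of candidate rendezvous areas rests on intersecting prediction lines obtained from displacement vectors; the basic geometric step can be sketched as a 2-D line intersection (no temporal accumulation or GMM modelling, which the paper adds on top).

```python
def prediction_line_intersection(p1, d1, p2, d2, eps=1e-9):
    """Intersect two prediction lines, each given by a position p and a
    displacement vector d; returns None for (near-)parallel lines."""
    det = d1[0] * d2[1] - d1[1] * d2[0]  # cross product d1 x d2
    if abs(det) < eps:
        return None
    # solve p1 + t*d1 == p2 + s*d2 for t
    t = ((p2[0] - p1[0]) * d2[1] - (p2[1] - p1[1]) * d2[0]) / det
    return (p1[0] + t * d1[0], p1[1] + t * d1[1])
```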

Publication Stats

1k Citations
163.13 Total Impact Points

Institutions

  • 2014
    • Università degli Studi di Genova
      • School of Mathematical, Physical and Natural Sciences
      Genova, Liguria, Italy
    • University of Udine
      Udine, Friuli Venezia Giulia, Italy
    • AVACO AG
      Basel-Landschaft, Switzerland
  • 2004–2014
    • Queen Mary, University of London
      • School of Electronic Engineering and Computer Science
      London, England, United Kingdom
  • 2010
    • Stanford University
      Stanford, California, United States
  • 2005–2009
    • University of London
      London, England, United Kingdom