About
157
Publications
46,200
Reads
1,914
Citations
Additional affiliations
January 2009 - December 2010
Queen Mary, University of London
January 2005 - December 2010
Publications (157)
In this paper we present a novel method to detect the presence of social interactions occurring in a surveillance scenario. The algorithm we propose complements motion features with proxemics cues, so as to link the human motion with the contextual and environmental information. The extracted features are analyzed through a multi-class SVM. Testing...
In this paper we propose a novel method to analyze trajectories in surveillance scenarios relying on automatically learned Context-Free Grammars. Given a training corpus of trajectories associated with a set of actions, an initial processing step is carried out to extract the syntactical structure of the activities; then, the rules characterizing differen...
In this paper, we propose a novel approach to automatically plan the positioning of video cameras in indoor environments for surveillance applications. In order to ensure maximum coverage of the observed scene, we have implemented an ad-hoc tool based on particle swarm optimization. The camera is modeled as a 2D function. A Rayleigh distribution is us...
The task of human pose estimation (HPE) deals with the ill-posed problem of estimating the 3D position of human joints directly from images and videos. In recent literature, most works tackle the problem using convolutional neural networks (CNNs), which are capable of achieving state-of-the-art results on most datasets. We show how...
In this article, we present a flow-based framework for multi-modal trajectory prediction, which is able to provide an accurate and explicit inference of the latent representations on trajectory data. Unlike other typical generative models (such as GANs and VAEs), flow-based models aim at learning the data distribution explicitly through...
As of today, the field of networked musical XR is in its infancy. While the next generation networks keep pushing the available bandwidth towards new frontiers, promoting the deployment of new services and applications, a limited amount of residual latency still hinders the possibility for musicians to seamlessly interact over the Internet. In fact...
The recovery of motion abilities after a physical or neurological trauma is a long and winding road that is usually supported by medical staff. In particular, occupational therapists (OTs) play a fundamental role in assessing the performances of patients in daily life tasks and suggesting better practices and aids. The goal of OTs is to promote the...
Gender recognition from images is generally approached by extracting the salient visual features of the observed subject, either focusing on the facial appearance or by analyzing the full body. In real-world scenarios, image-based gender recognition approaches tend to fail, providing unreliable results. Face-based methods are compromised by environ...
The increasing popularity of social networks and users’ tendency towards sharing their feelings, expressions, and opinions in text, visual, and audio content have opened new opportunities and challenges in sentiment analysis. While sentiment analysis of text streams has been widely explored in the literature, sentiment analysis from images and vide...
Deep neural networks achieve outstanding results in a large variety of tasks, often outperforming human experts. However, a known limitation of current neural architectures is the difficulty of understanding and interpreting the network response to a given input. This is directly related to the huge number of variables and the associated non-line...
The Visual Sentiment Analysis task is being offered for the first time at MediaEval. The main purpose of the task is to predict the emotional response to images of natural disasters shared on social media. Disaster-related images are generally complex and often evoke an emotional response, making them an ideal use case of visual sentiment analysis....
Disaster analysis in social media content is one of the interesting research domains having an abundance of data. However, there is a lack of labeled data that can be used to train machine learning models for disaster analysis applications. Active learning is one of the possible solutions to such a problem. To this aim, in this paper, we propose an...
Indoor environment modeling has become a relevant topic in several application fields, including augmented, virtual, and extended reality. With the digital transformation, many industries have investigated two possibilities: generating detailed models of indoor environments, allowing viewers to navigate through them; and mapping surfaces so as to i...
Human Pose Estimation (HPE) aims at retrieving the 3D position of human joints from images or videos. We show that current 3D HPE methods suffer from a lack of viewpoint equivariance, namely they tend to fail or perform poorly when dealing with viewpoints unseen at training time. Deep learning methods often rely on either scale-invariant, translation-in...
In this paper, we present a hierarchical framework for multi-modal trajectory forecasting, which can provide for each pedestrian in the scene the distributions for the next moves at every time step. The overall architecture adopts a standard encoder-decoder paradigm, where the encoder is based on a self-attention mechanism to extract the temporal f...
In this article, we propose a framework for crowd behavior prediction in complicated scenarios. The fundamental framework is designed using the standard encoder-decoder scheme, which is built upon the long short-term memory module to capture the temporal evolution of crowd behaviors. To model interactions among humans and environments, we embed bot...
Recurrent neural networks have shown good abilities in learning the spatio-temporal dependencies of moving agents in crowded scenes. Recently, they have been adopted to predict the motion of pedestrians by learning the relative motion of each individual in the crowd with respect to its neighbors. Crowded scenes present a wide variety of situations,...
Camera calibration is a necessary preliminary step in computer vision for the estimation of the position of objects in the 3D world. Although the intrinsic camera parameters can be easily computed offline, extrinsic parameters need to be computed each time a camera changes its position, thus not allowing for fast and dynamic network re-configuration...
The paper presents our proposed solutions for the MediaEval 2020 Flood-Related Multimedia Task, which aims to analyze and detect flooding events in multimedia content shared over Twitter. In total, we proposed four different solutions including a multi-modal solution combining textual and visual information for the mandatory run, and three single m...
The increasing popularity of social networks and users' tendency towards sharing their feelings, expressions and opinions in text, visual, and audio content have opened new opportunities and challenges in sentiment analysis. While sentiment analysis of text streams has been widely explored in the literature, sentiment analysis from images and videos i...
Crowd surveillance plays a key role in ensuring safety and security in public areas. Surveillance systems traditionally rely on fixed camera networks, which suffer from limitations such as coverage of the monitored area, video resolution and analytic performance. On the other hand, a smart camera network provides the ability to reconfigure the sensing in...
Sentiment analysis aims to extract and express a person's perception, opinions and emotions towards an entity, object, product or service, enabling businesses to obtain feedback from consumers. The increasing popularity of social networks and users' tendency towards sharing their feelings, expressions and opinions in text, visual and aud...
This article aims at introducing to the readers of the AES Magazine the recently constituted technical panel: “Glue Technologies for Space Systems.” A short overview of the technologies considered in the panel will be provided, along with panel vision and perspectives shared with the founder members. Some information about panel meetings and partic...
The analysis of natural disaster-related multimedia content has received great attention in recent years. As one of the most important sources of information, social media have been crawled over the years to collect and analyze disaster-related multimedia content. Satellite imagery has also been widely explored for disaster analysis. In this paper, we s...
Social media have been widely exploited to detect and gather relevant information about opinions and events. However, the relevance of the information is very subjective and rather depends on the application and the end-users. In this article, we tackle a specific facet of social media data processing, namely the sentiment analysis of disaster-rela...
In this paper we present our methods for the MediaEval 2019 Multimedia Satellite Task, which aims to extract complementary information associated with adverse events from Social Media and satellites. For the first challenge, we propose a framework jointly utilizing colour, object and scene-level information to predict whether the topic of an ar...
Disaster analysis in social media content is one of the interesting research domains with an abundance of data. However, there is a lack of labeled data that can be used to train machine learning models for disaster analysis applications. Active learning is one of the possible solutions to such a problem. To this aim, in this paper we propose and asse...
We propose a solution to increase the privacy of people recorded with security cameras without decreasing the details stored in the videos. We strongly believe that CCTV recordings are a necessary and precious source of information to be analyzed when a crime or other unfortunate events happen; for this reason, we would like to have powerful survei...
Camera resectioning is essential in computer vision and 3D reconstruction to estimate the position of matching pinhole cameras in 3D worlds. While the internal camera parameters are usually known or can be easily computed offline, in camera networks extrinsic parameters need to be computed each time a camera changes position, thus not allowing for...
Social modeling of pedestrian dynamics is a key element to understand the behavior of crowded scenes. Existing crowd models like the Social Force Model and the Reciprocal Velocity Obstacle, traditionally rely on empirically-defined functions to characterize the dynamics of a crowd. On the other hand, frameworks based on deep learning, like the Soci...
A successful rehabilitation process always requires both medical and infrastructural support. In this paper we focus on paraplegic wheelchair users, aiming at understanding the correlation between accuracy in guidance and muscular fatigue, while moving on a known training path. In particular, we study the trajectories performed and the correspondin...
Nowadays, the automobile industry is investing considerable effort in self-driving cars. One of the most relevant areas researchers are focusing on is the estimation of the position of a vehicle. This thesis proposes some of the core methodologies required to determine the location of a car when the GNSS receiver does not provide useful in...
This paper addresses the problem of flood classification and flood aftermath detection based on both social media and satellite imagery. Automatic detection of disasters such as floods is still a very challenging task. The focus lies on identifying passable routes or roads during floods. Two novel solutions are presented, which were developed for...
The analysis of crowded scenes is one of the most challenging scenarios in visual surveillance, and a variety of factors need to be taken into account, such as the structure of the environments, and the presence of mutual occlusions and obstacles. Traditional prediction methods (such as RNN, LSTM, VAE, etc.) focus on anticipating individual’s futur...
Event recognition is one of the areas in multimedia that is attracting great attention from researchers. Because it applies to a wide range of scenarios, from personal to collective events, a number of interesting solutions for event recognition using multimedia information sources have been proposed. On the other hand, following their immense succe...
This paper addresses the problem of flood classification and flood aftermath detection utilizing both social media and satellite imagery. Automatic detection of disasters such as floods is still a very challenging task. The focus lies on identifying passable routes or roads during floods. Two novel solutions are presented, which were developed fo...
This paper presents the method proposed by team UTAOS for MediaEval 2018 Multimedia Satellite Task: Emergency Response for Flooding Events. In the first challenge, we mainly rely on object and scene level features extracted through multiple deep models pre-trained on the ImageNet and Places datasets. The object and scene-level features are combined...
We present a new multi-modal technique for assisting visually impaired people in recognizing objects in public indoor environments. Unlike common methods, which aim to solve the problem of multi-class object recognition with a traditional single-label strategy, a comprehensive approach is developed here allowing samples to take more than one label at a...
Crowd surveillance will play a fundamental role in the coming generation of video surveillance systems, in particular for improving public safety and security. However, traditional camera networks are mostly not able to closely survey the entire monitoring area due to limitations in coverage, resolution and analytics performance. A smart camera net...
Marker-less skeleton tracking methods are being widely used for applications such as computer animation, human action recognition, human-robot collaboration and humanoid robot motion control. Regarding robot motion control, using the humanoid’s 3D camera and a robust and accurate tracking algorithm, vision-based tracking could be a wise solution. I...
In this article, we address the problem of recognizing an event from a single related picture. Given the large number of event classes and the limited information contained in a single shot, the problem is known to be particularly hard. To achieve a reliable detection, we propose a combination of multiple classifiers, and we compare three alternati...
Being able to automatically link social media and satellite imagery holds great opportunities for research, with a potentially considerable impact on society. The possibility of integrating different information sources in fact opens up new scenarios where the wide coverage of satellite imaging can be used as a collector of the fine-grained details...
The paper addresses the problem of adverse events (natural disasters) recognition in user-generated images from social media, addressing the problem from two complementary perspectives. On one side, we aim to provide a comprehensive comparative analysis of different feature extraction and classification algorithms, relying on two different families...
Object recognition methods usually tend to focus on single cues coming from traditional vision-based systems and fail to incorporate multi-modal data. With the advent of RGB-D sensors, which provide synchronized multi-modal data of good quality, new opportunities have emerged. In this paper, we make use of RGB and depth images to prop...
In this paper, we propose an active-learning-based approach to event recognition in personal photo collections to tackle the challenges due to weakly labeled data and the presence of irrelevant pictures in such collections. Conventional approaches relying on supervised learning cannot identify the relevant samples in training albums...
The automated analysis of crowds and the identification of crowd behaviors are important for predicting dangerous situations during events, for appropriately designing public spaces, and for the real-time management of people flows. This chapter covers models and algorithms for the analysis of crowds captured in videos for facilitating personal mob...
Over the last few years, a rapid growth has been witnessed in the number of digital photos produced per year. This rapid process poses challenges in the organization and management of multimedia collections, and one viable solution consists of arranging the media on the basis of the underlying events. However, album-level annotation and the presenc...
This paper proposes a novel two-stage framework for event recognition in still images. First, for a generic event image, deep features, obtained via different pre-trained models, are fed into an ensemble of classifiers, whose posterior classification probabilities are thereafter fused by means of an order-induced scheme, which penalizes the yielded...
This paper presents the method proposed by team UTAOS for the MediaEval 2017 challenge on Multimedia and Satellite. In the first task, we mainly rely on different Convolutional Neural Network (CNN) models combined with two different late fusion methods. We also utilize the additional information available in the form of metadata. The average and...
Over the last few years, a number of interesting solutions covering different aspects of event recognition have been proposed for event-based multimedia analysis. Existing approaches mostly focus on an efficient representation of the image and advanced classification schemes. However, it would be desirable to focus on the event-specific information...